├── .gitignore
├── DOCS.md
├── LICENSE
├── README.md
├── TODO.md
├── demos
    ├── objdump
    │   └── objdump.py
    └── patching
    │   ├── patch.py
    │   ├── patched
    │   ├── thing
    │   └── thing.c
├── dispatch
    ├── __init__.py
    ├── analysis
    │   ├── __init__.py
    │   ├── arm_analyzer.py
    │   ├── base_analyzer.py
    │   └── x86_analyzer.py
    ├── constructs.py
    ├── enums.py
    ├── formats
    │   ├── SectionDoubleP.py
    │   ├── __init__.py
    │   ├── base_executable.py
    │   ├── elf_executable.py
    │   ├── macho_executable.py
    │   ├── pe_executable.py
    │   └── section.py
    └── util
    │   ├── __init__.py
    │   └── trie.py
├── setup.py
└── tests
    ├── analyze_one.py
    ├── binaries
        ├── arm32
        │   ├── conditions-static.elf
        │   ├── conditions.elf
        │   ├── functions-static.elf
        │   ├── functions.elf
        │   ├── hello-static.elf
        │   ├── hello.elf
        │   ├── switch-static.elf
        │   ├── switch.elf
        │   ├── test2-static.elf
        │   └── test2.elf
        ├── src
        │   ├── conditions.c
        │   ├── functions.c
        │   ├── hello.c
        │   ├── switch.c
        │   └── test2.c
        ├── x86
        │   ├── conditions-static.elf
        │   ├── conditions.elf
        │   ├── conditions.macho
        │   ├── conditions.pe
        │   ├── functions-static.elf
        │   ├── functions.elf
        │   ├── functions.macho
        │   ├── functions.pe
        │   ├── hello-static.elf
        │   ├── hello.elf
        │   ├── hello.macho
        │   ├── hello.pe
        │   ├── switch-static.elf
        │   ├── switch.elf
        │   ├── switch.macho
        │   ├── switch.pe
        │   ├── test2-static.elf
        │   ├── test2.elf
        │   ├── test2.macho
        │   └── test2.pe
        └── x86_64
        │   ├── conditions-static.elf
        │   ├── conditions.elf
        │   ├── conditions.macho
        │   ├── conditions.pe
        │   ├── functions-static.elf
        │   ├── functions.elf
        │   ├── functions.macho
        │   ├── functions.pe
        │   ├── hello-static.elf
        │   ├── hello.elf
        │   ├── hello.macho
        │   ├── hello.pe
        │   ├── switch-static.elf
        │   ├── switch.elf
        │   ├── switch.macho
        │   ├── switch.pe
        │   ├── test2-static.elf
        │   ├── test2.elf
        │   ├── test2.macho
        │   └── test2.pe
    ├── test_analysis.py
    └── test_injection.py


/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | .idea/
3 | build/
4 | dist/
5 | dispatch.egg-info/
6 | 


--------------------------------------------------------------------------------
/DOCS.md:
--------------------------------------------------------------------------------
  1 | # dispatch Docs
  2 | 
  3 | Though this code is reasonably well commented/logged (esp. the base classes), I wanted to write up some basic theory and broad docs to make contributing easier.
  4 | 
  5 | So here we go.
  6 | 
  7 | ## Class breakdown
  8 | 
  9 | Larger classes (i.e. Executable and Analyzer) have a single base class which defines required functions for the subclasses and also provides basic implementations for a few helper functions.
 10 | 
 11 | ### Executable
 12 | 
 13 | The executable class should be subclassed for each file format to be supported. Currently, we provide executable parsers for the 3 most common binary types:
 14 | 
 15 | * [ELF](./dispatch/formats/elf_executable.py) (used by Linux, Solaris, the BSDs, etc.)
 16 | * [PE](./dispatch/formats/pe_executable.py) (used by Windows)
 17 | * [MachO](./dispatch/formats/macho_executable.py) (used by OS X)
 18 | 
 19 | It is preferred to use existing (license compatible) libraries to do the low-level executable parsing to reduce errors that we could make and keep the codebase small.
 20 | 
 21 | #### Purpose
 22 | 
 23 | The executable classes are responsible for parsing the executable, handing off the "chunks" of the binary to the analyzer, and doing the binary rewriting part of the patching.
 24 | 
 25 | The executable classes currently extract and keep the following:
 26 | 
 27 | * Segments/sections
 28 | * Referenced libraries
 29 | * Symbol table(s)
 30 | * Strings
 31 | 
 32 | The executable classes also keep an array for the functions of the binary; however, it is up to the analyzer to identify and store those.
 33 | 
 34 | ### Analyzer
 35 | 
 36 | The analyzer class should be subclassed for each architecture to be supported. If two architectures are very similar (e.g. x86/i386 and x86\_64), they should be put into one file. Also if possible, a superset architecture (e.g. x86\_64) should subclass the "simpler" subset architecture (e.g. x86).
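
A rough sketch of that subset/superset layout, not the project's actual classes: `AnalyzerSketch`, `X86Sketch`, `X86_64Sketch`, `address_size()` and `nop_fill()` are hypothetical names invented for illustration. The only outside fact relied on is that `0x90` encodes a one-byte NOP on both x86 and x86\_64.

```python
class AnalyzerSketch(object):
    """Hypothetical base class: declares what every subclass must provide."""

    NOP_INSTRUCTION = None  # each architecture supplies its own encoding

    def address_size(self):
        # Required hook: subclasses must override this.
        raise NotImplementedError()

    def nop_fill(self, length):
        # Shared helper that relies only on the per-architecture constant.
        nop = self.NOP_INSTRUCTION
        if length % len(nop):
            raise ValueError('length is not a multiple of the NOP size')
        return nop * (length // len(nop))


class X86Sketch(AnalyzerSketch):
    NOP_INSTRUCTION = b'\x90'

    def address_size(self):
        return 4


class X86_64Sketch(X86Sketch):
    # The superset architecture inherits everything and overrides
    # only what differs from the subset architecture.
    def address_size(self):
        return 8
```

Keeping both classes in one file, with the 64-bit class deriving from the 32-bit one, means shared constants and helpers are written once.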
 37 | 
 38 | We currently provide analysis classes for 4 architectures:
 39 | 
 40 | * [x86](./dispatch/analysis/x86_analyzer.py)
 41 | * [x86\_64](./dispatch/analysis/x86_analyzer.py)
 42 | * [ARM](./dispatch/analysis/arm_analyzer.py)
 43 | * [AArch64](./dispatch/analysis/arm_analyzer.py) (a.k.a. ARM64)
 44 | 
 45 | Currently, all of these analyzers are based around the [capstone engine](https://github.com/aquynh/capstone), but any disassembler could be used with minimal effort required to switch.
 46 | 
 47 | #### Purpose
 48 | 
 49 | The analyzer classes are responsible for doing the actual analysis of binaries:
 50 | 
 51 | * Disassembling the binary
 52 | * Identifying constructs in a binary (e.g. functions, basic blocks, jump tables)
 53 | * Generating CFGs
 54 | 
 55 | The analyzer also provides architecture-specific helper methods and constants for use in patching (e.g. `REG_NAMES`, `IP_REGS`, `SP_REGS`, `NOP_INSTRUCTION`)
 56 | 
 57 | ## Loading & Analysis Flow
 58 | 
 59 | The following is a breakdown of what happens when a binary is loaded and analyzed:
 60 | 
 61 | 1. `read_executable` (in [\_\_init\_\_.py](./dispatch/__init__.py)) identifies the binary format based on starting magic bytes.
 62 | 2. The initializer for the found format is called which loads the binary into its helper (e.g. [pyelftools](https://github.com/eliben/pyelftools)) for parsing
 63 | 3. The format initializer parses out some basic information from the loaded binary and stores it for further use (e.g. the sections/segments of the binary, which segment is the main read&executable segment, etc.)
 64 | 4. `analyze()` (defined in [base\_analyzer.py](./dispatch/analysis/base_analyzer.py)) is called by a script on the returned executable instance, which...
 65 | 5. Disassembles the binary into a Trie for quick lookups
 66 | 6. Asks the executable to parse and store the symbol table
 67 | 7. Identifies functions through a couple of methods (see below)
 68 | 8. Populates the (empty) functions with Instructions
 69 | 9. Does basic block analysis on the (now populated) Functions
 70 | 10. Marks cross-references
 71 | 11. Marks strings
 72 | 
 73 | Once this is done, everything in the binary has been set up and can be used.
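
Step 1 can be illustrated with a small standalone sketch that works on an in-memory header instead of a file. The format names are plain strings chosen for illustration (the project uses its own `FORMAT` enum), and `identify_format` is a hypothetical helper, not the library's function.

```python
# Leading magic bytes mapped to an illustrative format name.
MAGICS = {
    b'\x7fELF': 'ELF',
    b'MZ': 'PE',
    b'\xfe\xed\xfa\xce': 'MachO',  # 32-bit Mach-O
    b'\xfe\xed\xfa\xcf': 'MachO',  # 64-bit Mach-O
    b'\xce\xfa\xed\xfe': 'MachO',  # 32-bit Mach-O, byte-swapped
    b'\xcf\xfa\xed\xfe': 'MachO',  # 64-bit Mach-O, byte-swapped
}


def identify_format(header):
    """Return the format name whose magic prefixes `header`, or None."""
    for magic, fmt in MAGICS.items():
        if header.startswith(magic):
            return fmt
    return None
```

Because no magic in the table is a prefix of another, the iteration order of the dictionary does not affect the result.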
 74 | 
 75 | ## Implementation Notes
 76 | 
 77 | ### Function analysis
 78 | 
 79 | Currently, functions are marked in two ways:
 80 | 
 81 | 1. Through symbol tables (if applicable)
 82 | 2. Through prologue/epilogue matching
 83 | 
 84 | Since symbol tables and prologue/epilogue matching occur at different times, the binaries' `.functions` array is filled with what are essentially placeholder functions (i.e. functions without instructions stored) until the functions are formally populated (step 8 above).
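
As a minimal sketch of the prologue/epilogue approach, assuming instructions are already available as `(address, size, text)` tuples: `find_functions` and the two predicate arguments are hypothetical names, and the real analyzers work on richer instruction objects.

```python
def find_functions(instructions, is_prologue, is_epilogue):
    """
    instructions: iterable of (address, size, text) tuples in address order.
    Returns a list of (start_address, size_in_bytes), one per matched function.
    """
    functions = []
    start = None  # start address of the function being scanned, if any

    for address, size, text in instructions:
        if start is None and is_prologue(text):
            start = address
        elif start is not None and is_epilogue(text):
            # The epilogue instruction is part of the function body.
            functions.append((start, address + size - start))
            start = None

    return functions


# Toy x86-style listing used only to exercise the sketch.
listing = [
    (0x1000, 1, 'push ebp'),
    (0x1001, 2, 'mov ebp, esp'),
    (0x1003, 1, 'pop ebp'),
    (0x1004, 1, 'ret'),
    (0x1005, 1, 'nop'),
    (0x1006, 1, 'push ebp'),
    (0x1007, 1, 'ret'),
]
found = find_functions(listing,
                       is_prologue=lambda text: text == 'push ebp',
                       is_epilogue=lambda text: text == 'ret')
```

Here `found` holds one entry per prologue/epilogue pair; instructions between functions (the `nop` above) are ignored.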
 85 | 
 86 | The need for this two-step find-and-fill process will be removed soon, once a single structure represents all bytes in the binary along with what they represent.
 87 | Instead of a Function holding its own array, the array will just be a view into this backing data structure (since the offset and size are already known).
 88 | This will fix a lot of potential issues stemming from arrays falling out of sync, and will allow for something like the following to work:
 89 | 
 90 | ```python
 91 | main = executable.function_named('main')
 92 | main.bbs[0].instructions[0] = '\xcc'
 93 | main.save('modified')
 94 | ```
 95 | 
 96 | 
 97 | ### Patching
 98 | 
 99 | #### ELF
100 | 
101 | As noted, we use a method derived from [http://vxheavens.com/lib/vsc01.html](http://vxheavens.com/lib/vsc01.html).
102 | 
103 | #### MachO
104 | 
105 | MachO binaries are very kind and provide us with room to just drop in a new section because of the large amount of padding after the headers and before the rest of the binary.
106 | All we have to do is create the new load command and have it point to the end of the executable where we drop our (address aligned) injected code.
107 | 
108 | #### PE
109 | 
110 | Since we are already using [pefile](https://github.com/erocarrera/pefile), we are able to let [SectionDoubleP](http://git.n0p.cc/?p=SectionDoubleP.git;a=summary) do the heavy lifting of adding a new section.
111 | 
112 | 
113 | ### Why a Trie?
114 | 
115 | Because it gives us fast (i.e. sub-linear time) lookups, while also providing a way to get ranges of the binary without a linear search.
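
To make the idea concrete, here is a small sketch of a trie keyed by fixed-width addresses. It is not the implementation in `dispatch/util/trie.py`; `AddressTrie`, its 32-bit key width, and the `range()` method are assumptions made for illustration.

```python
class AddressTrie(object):
    """Hypothetical nibble trie keyed by 32-bit integer addresses."""

    NIBBLES = 8  # 8 nibbles = 32 bits

    def __init__(self):
        self._root = {}

    def _path(self, addr):
        # Most significant nibble first, so in-order traversal is address order.
        return [(addr >> (4 * i)) & 0xF for i in reversed(range(self.NIBBLES))]

    def __setitem__(self, addr, value):
        path = self._path(addr)
        node = self._root
        for nib in path[:-1]:
            node = node.setdefault(nib, {})
        node[path[-1]] = value

    def __getitem__(self, addr):
        node = self._root
        for nib in self._path(addr):
            node = node[nib]  # raises KeyError if the address is absent
        return node

    def __contains__(self, addr):
        try:
            self[addr]
            return True
        except KeyError:
            return False

    def range(self, start, end):
        """Yield (addr, value) for start <= addr < end, in ascending order."""
        return self._walk(self._root, 0, 0, start, end)

    def _walk(self, node, depth, prefix, start, end):
        shift = 4 * (self.NIBBLES - depth - 1)
        for nib in sorted(node):
            lo = prefix | (nib << shift)
            hi = lo + (1 << shift)  # exclusive upper bound of this subtree
            if hi <= start or lo >= end:
                continue  # subtree lies entirely outside the range
            if depth == self.NIBBLES - 1:
                yield lo, node[nib]
            else:
                for item in self._walk(node[nib], depth + 1, lo, start, end):
                    yield item
```

A lookup touches at most `NIBBLES` nodes regardless of how many instructions are stored, and a range query prunes whole subtrees that fall outside the requested window.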
116 | 
117 | ### X-Ref detection
118 | 
119 | Currently we do _very_ simplistic x-ref detection by finding any instruction operands that happen to be immediates (i.e. set values) and that happen to land in mapped virtual memory.
120 | While this is potentially error-prone, it seems to work very well in practice, and so we haven't seen a need to improve it yet.
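
The heuristic can be sketched in a few lines, assuming each instruction has been reduced to its address and a list of immediate operand values. `find_xrefs` and the half-open `(start, end)` range format are hypothetical, not the library's API.

```python
def find_xrefs(instructions, mapped_ranges):
    """
    instructions: iterable of (address, [immediate, ...]) pairs.
    mapped_ranges: list of half-open (start, end) virtual address ranges.
    Returns {target_address: set(of referencing instruction addresses)}.
    """
    xrefs = {}
    for ins_addr, immediates in instructions:
        for imm in immediates:
            # Any immediate that lands in mapped memory counts as a reference.
            if any(start <= imm < end for start, end in mapped_ranges):
                xrefs.setdefault(imm, set()).add(ins_addr)
    return xrefs
```

A small constant such as `0x10` is ignored unless it happens to fall inside a mapped range, which is exactly where false positives can come from.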
121 | 
122 | ### String detection
123 | 
124 | Similar to x-ref detection, string detection is very simplistic: any run of 3 or more consecutive printable characters in certain sections is marked as a string.
125 | Again, while this is definitely error-prone, it seems to end up working just fine in almost all cases so far, so we haven't seen a need to improve it.
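
A minimal sketch of this rule over a raw section buffer, assuming "printable" means ASCII `0x20` through `0x7e` (the project's exact character set may differ). `find_strings` and its `base_vaddr` parameter are hypothetical names.

```python
import re

# Runs of 3 or more printable ASCII bytes (space through tilde).
PRINTABLE_RUN = re.compile(br'[\x20-\x7e]{3,}')


def find_strings(data, base_vaddr=0):
    """Return [(vaddr, text), ...] for each printable run found in `data`."""
    return [(base_vaddr + m.start(), m.group().decode('ascii'))
            for m in PRINTABLE_RUN.finditer(data)]
```

For example, scanning `b'\x00\x01Winner!\x00ab\x00Nope!\xff'` reports `Winner!` and `Nope!` but skips `ab`, which is shorter than the threshold.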


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2016 NYU OSIRIS Lab
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | Dispatch
 2 | ========
 3 | 
 4 | Programmatic binary disassembly and patching
 5 | 
 6 | ## Features
 7 | * Support for all 3 common executable formats (ELF, MachO, PE)
 8 | * Support for x86(-64) and ARM (including AArch64)
 9 |     * MIPS eventually
10 | 
11 | ## Quick Example
12 | ```python
13 | import dispatch
14 | ex = dispatch.read_executable('/bin/cat')
15 | ex.analyze()
16 | print ex.functions
17 | ```
18 | 


--------------------------------------------------------------------------------
/TODO.md:
--------------------------------------------------------------------------------
 1 | * Store read/written registers in Instruction
 2 |     * Use these to properly implement references_ip() and references_sp()
 3 | * Change MachO and PE replace_instructions() to the new format (args: vaddr, asm)
 4 | * Load binary from stream
 5 | * Stop CFG flow after call to exit()
 6 | * ARM analysis
 7 | * Improve x86 function analysis with flow analysis
 8 | * Generators for common instructions for all platforms (jump, call)
 9 | * Shift to having a single mmap backed instance of the binary with everything providing views into that data
10 | 


--------------------------------------------------------------------------------
/demos/objdump/objdump.py:
--------------------------------------------------------------------------------
 1 | from dispatch import *
 2 | from sys import argv
 3 | 
 4 | def main():
 5 |     if len(argv) < 2:
 6 |         print "Usage: python objdump.py [binary]"
 7 |         return
 8 | 
 9 |     exe = read_executable(argv[1])
10 |     exe.analyze()
11 | 
12 |     for function in exe.iter_functions():
13 |         print "{:08x} <{}>:".format(function.address, function.name)
14 |         for ins in function.instructions:
15 |             ins_bytes = ' '.join(["{:02x}".format(x) for x in ins.raw])
16 |             print "    {:08x}\t{:<20}\t{!s}".format(ins.address, ins_bytes, ins)
17 |         print "" # newline for space
18 | 
19 | if __name__ == '__main__':
20 |     main()
21 | 


--------------------------------------------------------------------------------
/demos/patching/patch.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | from dispatch import *
 3 | 
 4 | # Read in our executable
 5 | exe = read_executable('thing')
 6 | # ... and analyze it
 7 | exe.analyze()
 8 | 
 9 | # Find the main function (main for linux, _main for OS X)
10 | main = exe.function_named('main') or exe.function_named('_main')
11 | 
12 | for i in main.instructions:
13 |     # Find the first jne which happens to be the "winner" check
14 |     if i.mnemonic == 'jne':
15 |         ins = i
16 |         exe.replace_at(i.address, '') # NOP it out
17 |         exe.save('patched') # Save
18 |         os.system("chmod +x patched") # and make the patched binary executable
19 | 
20 |         break
21 | 


--------------------------------------------------------------------------------
/demos/patching/patched:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/demos/patching/patched


--------------------------------------------------------------------------------
/demos/patching/thing:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/demos/patching/thing


--------------------------------------------------------------------------------
/demos/patching/thing.c:
--------------------------------------------------------------------------------
 1 | #include <stdio.h>
 2 | 
 3 | int main() {
 4 |     int i;
 5 |     scanf("%d", &i);
 6 |     if (i == 0x1337) {
 7 |         printf("Winner!");
 8 |     } else {
 9 |         printf("Nope!");
10 |     }
11 |     return 0;
12 | }
13 | 


--------------------------------------------------------------------------------
/dispatch/__init__.py:
--------------------------------------------------------------------------------
 1 | import logging
 2 | import os
 3 | 
 4 | from .formats.elf_executable import ELFExecutable
 5 | from .formats.pe_executable import PEExecutable
 6 | from .formats.macho_executable import MachOExecutable
 7 | 
 8 | from .enums import *
 9 | 
10 | MAGICS = {'\x7f\x45\x4c\x46': FORMAT.ELF,
11 |           '\x4d\x5a': FORMAT.PE,
12 |           '\x50\x45\x00\x00': FORMAT.PE,
13 |           '\xFE\xED\xFA\xCE': FORMAT.MACH_O,
14 |           '\xFE\xED\xFA\xCF': FORMAT.MACH_O,
15 |           '\xCE\xFA\xED\xFE': FORMAT.MACH_O,
16 |           '\xCF\xFA\xED\xFE': FORMAT.MACH_O}
17 | 
18 | def _identify_format(fh):
19 |     maxlen = max([len(m) for m in MAGICS])
20 | 
21 |     fh.seek(0)
22 |     header = fh.read(maxlen)
23 | 
24 |     for m in MAGICS:
25 |         if header.startswith(m):
26 |             return MAGICS[m]
27 | 
28 |     return None
29 | 
30 | def read_executable(file_path):
31 |     if not os.path.exists(file_path):
32 |         raise Exception('No such file')
33 | 
34 |     with open(file_path, 'rb') as fh:
35 |         fmt = _identify_format(fh)
36 |     if fmt == FORMAT.ELF:
37 |         exe = ELFExecutable(file_path)
38 |     elif fmt == FORMAT.PE:
39 |         exe = PEExecutable(file_path)
40 |     elif fmt == FORMAT.MACH_O:
41 |         exe = MachOExecutable(file_path)
42 |     else:
43 |         raise Exception('Could not determine executable format.')
44 | 
45 |     logging.info('Extracting symbol table')
46 |     exe._extract_symbol_table()
47 | 
48 |     return exe


--------------------------------------------------------------------------------
/dispatch/analysis/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/dispatch/analysis/__init__.py


--------------------------------------------------------------------------------
/dispatch/analysis/arm_analyzer.py:
--------------------------------------------------------------------------------
  1 | import capstone
  2 | from capstone import *
  3 | from capstone.arm_const import *
  4 | from Queue import Queue
  5 | import struct
  6 | import logging
  7 | from ..constructs import *
  8 | from .base_analyzer import BaseAnalyzer
  9 | 
 10 | class ARM_Analyzer(BaseAnalyzer):
 11 |     def __init__(self, executable):
 12 |         super(ARM_Analyzer, self).__init__(executable)
 13 | 
 14 |         if self.executable.entry_point() & 0x1:
 15 |             self._disassembler = Cs(CS_ARCH_ARM, CS_MODE_THUMB)
 16 |         else:
 17 |             self._disassembler = Cs(CS_ARCH_ARM, CS_MODE_ARM)
 18 | 
 19 |         self._disassembler.detail = True
 20 |         self._disassembler.skipdata = True
 21 | 
 22 |         self.REG_NAMES = dict([(v,k[8:].lower()) for k,v in capstone.arm_const.__dict__.iteritems() if k.startswith('ARM_REG')])
 23 |         self.IP_REGS = set([11])
 24 |         self.SP_REGS = set([12])
 25 |         self.NOP_INSTRUCTION = '\x00\x00\x00\x00'
 26 | 
 27 |     def _gen_ins_map(self):
 28 |         # Again, since ARM binaries can have code using both instruction sets, we basically have to make a CFG and
 29 |         # disassemble each BB as we find them.
 30 | 
 31 |         # vaddr -> disassembly type
 32 |         bb_disasm_mode = {}
 33 | 
 34 |         # If we find a constants table (used for pc-relative ld's), mark it as a known end because it always comes after
 35 |         # the end of a BB/function
 36 |         known_ends = set()
 37 | 
 38 |         entry = self.executable.entry_point()
 39 | 
 40 |         if entry & 0b1:
 41 |             initial_mode = CS_MODE_THUMB
 42 |         else:
 43 |             initial_mode = CS_MODE_ARM
 44 | 
 45 |         entry &= ~0b1
 46 | 
 47 |         to_analyze = Queue()
 48 |         to_analyze.put((entry, initial_mode, ))
 49 | 
 50 |         bb_disasm_mode[entry] = initial_mode
 51 | 
 52 |         # TODO: make this much cleaner, not use raw mnemonic checks, etc
 53 |         while not to_analyze.empty():
 54 |             start_vaddr, mode = to_analyze.get()
 55 | 
 56 |             self._disassembler.mode = mode
 57 | 
 58 |             logging.debug('Analyzing code at address {} in {} mode'
 59 |                           .format(hex(start_vaddr), 'thumb' if mode == CS_MODE_THUMB else 'arm'))
 60 | 
 61 |             # Stop at either the next BB listed or the end of the section
 62 |             cur_section = self.executable.section_containing_vaddr(start_vaddr)
 63 |             section_end_vaddr = cur_section.vaddr + cur_section.size
 64 |             end_vaddr = min([a for a in bb_disasm_mode if a > start_vaddr] or [section_end_vaddr])
 65 | 
 66 |             # Force the low bit 0
 67 |             start_vaddr &= ~0b1
 68 | 
 69 |             code = self.executable.get_binary_vaddr_range(start_vaddr, end_vaddr)
 70 | 
 71 |             for ins in self._disassembler.disasm(code, start_vaddr):
 72 |                 if ins.id == 0:  # We hit a data byte, so we must have gotten to the end of this bb/function
 73 |                     break
 74 |                 elif ins.address in known_ends:  # At a constants table, so we know we're at the end of a bb/function
 75 |                     break
 76 | 
 77 |                 our_ins = instruction_from_cs_insn(ins, self.executable)
 78 |                 self.ins_map[ins.address] = our_ins
 79 | 
 80 |                 if self._insn_is_epilogue(our_ins):
 81 |                     break
 82 | 
 83 |                 # Branch immediate
 84 |                 if ins.mnemonic.startswith('b') and ins.operands[-1].type == CS_OP_IMM:
 85 |                     jump_dst = ins.operands[-1].imm
 86 | 
 87 |                     if self.executable.vaddr_is_executable(jump_dst) and jump_dst not in bb_disasm_mode:
 88 |                         if 'x' in ins.mnemonic:
 89 |                             next_mode = CS_MODE_THUMB if jump_dst & 0x1 else CS_MODE_ARM
 90 |                         else:
 91 |                             next_mode = mode
 92 | 
 93 |                         jump_dst &= ~0b1
 94 | 
 95 |                         logging.debug('Found branch to address {} in instruction at {}'
 96 |                                       .format(hex(int(jump_dst)), hex(int(ins.address))))
 97 |                         bb_disasm_mode[jump_dst] = next_mode
 98 |                         to_analyze.put((jump_dst, next_mode, ))
 99 | 
100 |                 # load/move function address as in the case of libc_start_main
101 |                 elif ins.mnemonic.startswith('ld') or ins.mnemonic.startswith('mov'):
102 |                     # load/move immediate
103 |                     if ins.operands[-1].type == CS_OP_IMM and self.executable.vaddr_is_executable(ins.operands[-1].imm):
104 |                         referenced_addr = ins.operands[-1].imm
105 |                         if referenced_addr not in bb_disasm_mode:
106 |                             logging.debug('Found reference to executable address {} in instruction at {}'
107 |                                           .format(hex(int(referenced_addr)), hex(int(ins.address))))
108 | 
109 |                             next_mode = CS_MODE_THUMB if referenced_addr & 0x1 else CS_MODE_ARM
110 |                             referenced_addr &= ~0b1
111 |                             bb_disasm_mode[referenced_addr] = next_mode
112 |                             to_analyze.put((referenced_addr, next_mode, ))
113 | 
114 |                     # load/move PC-relative entry
115 |                     elif ins.operands[-1].type == CS_OP_MEM and ins.operands[-1].mem.base == ARM_REG_PC:
116 |                         '''
117 |                         ARM THUMB Instruction Set sec. 5.6.1:
118 | 
119 |                         Note: The value specified by #Imm is a full 10-bit address, but must always be word-aligned
120 |                         (ie with bits 1:0 set to 0), since the assembler places #Imm >> 2 in field Word8.
121 | 
122 |                         Note: The value of the PC will be 4 bytes greater than the address of this instruction, but bit
123 |                         1 of the PC is forced to 0 to ensure it is word aligned.
124 |                         '''
125 |                         ptr = (ins.address + 4 + ins.operands[-1].mem.disp) & (~0b11)
126 | 
127 |                         known_ends.add(ptr)
128 | 
129 |                         referenced_bytes = self.executable.get_binary_vaddr_range(ptr, ptr + self.executable.address_length())
130 |                         referenced_addr = struct.unpack(self.executable.pack_endianness + self.executable.address_pack_type,
131 |                                                         referenced_bytes)[0]
132 | 
133 |                         if self.executable.vaddr_is_executable(referenced_addr):
134 |                             logging.debug('Found reference to address {} through const table at {} in instruction at {}'
135 |                                           .format(hex(int(referenced_addr)), hex(int(ptr)), hex(int(ins.address))))
136 | 
137 |                             if referenced_addr not in bb_disasm_mode:
138 |                                 next_mode = CS_MODE_THUMB if referenced_addr & 0x1 else CS_MODE_ARM
139 |                                 referenced_addr &= ~0b1
140 |                                 bb_disasm_mode[referenced_addr] = next_mode
141 |                                 to_analyze.put((referenced_addr, next_mode, ))
142 | 
143 |         self._disasm_mode = bb_disasm_mode
144 | 
145 |     def disassemble_range(self, start_vaddr, end_vaddr):
146 |         if start_vaddr & 0x1:
147 |             self._disassembler.mode = CS_MODE_THUMB
148 |         else:
149 |             self._disassembler.mode = CS_MODE_ARM
150 | 
151 |         start_vaddr &= ~0b1
152 | 
153 |         size = end_vaddr - start_vaddr
154 |         self.executable.binary.seek(self.executable.vaddr_binary_offset(start_vaddr))
155 | 
156 |         instructions = []
157 | 
158 |         for ins in self._disassembler.disasm(self.executable.binary.read(size), start_vaddr):
159 |             if ins.id:
160 |                 instructions.append(instruction_from_cs_insn(ins, self.executable))
161 | 
162 |         return instructions
163 | 
164 |     def _insn_is_epilogue(self, ins):
165 |         """
166 |         Determines whether the instruction is a typical function epilogue
167 |         :param ins: Instruction to test
168 |         :return: True if the instruction is an epilogue
169 |         """
170 | 
171 |         # b** {..., lr}
172 |         if ins.mnemonic.startswith('b') and ins.operands[0].type == Operand.REG and \
173 |             ins.operands[0].reg == ARM_REG_LR:
174 |             return True
175 | 
176 |         # pop {..., pc}
177 |         elif ins.mnemonic == 'pop' and \
178 |             any(o.reg == ARM_REG_PC for o in ins.operands if o.type == Operand.REG):
179 |             return True
180 | 
181 |         return False
182 | 
183 |     def _identify_functions(self):
184 |         STATE_NOT_IN_FUNC, STATE_IN_FUNCTION = 0, 1
185 | 
186 |         state = STATE_NOT_IN_FUNC
187 | 
188 |         cur_func = None
189 | 
190 |         for cur_ins in self.ins_map:
191 |             if cur_ins.address in self.executable.functions:
192 |                 state = STATE_IN_FUNCTION
193 |                 cur_func = self.executable.functions[cur_ins.address]
194 | 
195 |                 logging.debug('Analyzing function {} with pre-populated size {}'.format(cur_func, cur_func.size))
196 | 
197 |                 if not cur_func.size:
198 |                     # Function from symtab has no size, so start to keep track of it
199 |                     cur_func.size += cur_ins.size
200 | 
201 |             elif cur_func and cur_func.contains_address(cur_ins.address):
202 |                 # ARM sometimes stores pointers to various things after the function body, but this data is included in
203 |                 # ELF's (and maybe others) symbol size, so we have to actively look for the actual end of the function.
204 | 
205 |                 if self._insn_is_epilogue(cur_ins):
206 |                     state = STATE_NOT_IN_FUNC
207 |                     logging.debug('Identified function epilogue at {}'.format(hex(cur_ins.address)))
208 |                     cur_func.size -= (cur_func.address + cur_func.size) - (cur_ins.address + cur_ins.size)
209 |                     cur_func = None
210 | 
211 |             elif state == STATE_NOT_IN_FUNC and cur_ins.mnemonic == 'push' and \
212 |                     any(o.reg == ARM_REG_LR for o in cur_ins.operands if o.type == Operand.REG):
213 | 
214 |                 state = STATE_IN_FUNCTION
215 |                 logging.debug(
216 |                     'Identified function by prologue at {} with prologue instruction {}'.format(hex(cur_ins.address),
217 |                                                                                                 cur_ins))
218 | 
219 |                 cur_func = Function(cur_ins.address,
220 |                                     cur_ins.size,
221 |                                     'sub_' + hex(cur_ins.address)[2:],
222 |                                     self.executable)
223 | 
224 |             elif state == STATE_IN_FUNCTION and self._insn_is_epilogue(cur_ins):
225 |                 state = STATE_NOT_IN_FUNC
226 |                 cur_func.size += cur_ins.size
227 | 
228 |                 logging.debug('Identified function epilogue at {}'.format(hex(cur_ins.address)))
229 | 
230 |                 self.executable.functions[cur_func.address] = cur_func
231 | 
232 |                 cur_func = None
233 | 
234 |             elif state == STATE_IN_FUNCTION:
235 |                 cur_func.size += cur_ins.size
236 | 
237 |     def cfg(self):
238 |         edges = set()
239 | 
240 |         for f in self.executable.iter_functions():
241 |             if f.type == Function.NORMAL_FUNC:
242 |                 for ins in f.instructions:
243 |                     if ins.is_call() and ins.operands[-1].type == Operand.IMM:
244 |                         call_addr = ins.operands[-1].imm
245 |                         if self.executable.vaddr_is_executable(call_addr):
246 |                             edge = CFGEdge(ins.address, call_addr, CFGEdge.CALL)
247 |                             edges.add(edge)
248 | 
249 |                 for cur_bb in f.bbs:
250 |                     last_ins = cur_bb.instructions[-1]
251 | 
252 |                     if last_ins.is_jump():
253 |                         if last_ins.operands[-1].type == Operand.IMM:
254 |                             jmp_addr = last_ins.operands[-1].imm
255 | 
256 |                             if self.executable.vaddr_is_executable(jmp_addr):
257 |                                 if last_ins.mnemonic == 'b' or last_ins.mnemonic == 'bx':
258 |                                     edge = CFGEdge(last_ins.address, jmp_addr, CFGEdge.DEFAULT)
259 |                                     edges.add(edge)
260 |                                 else:  # Conditional jump
261 |                                     # True case
262 |                                     edge = CFGEdge(last_ins.address, jmp_addr, CFGEdge.COND_JUMP, True)
263 |                                     edges.add(edge)
264 | 
265 |                                     # Default/fall-through case
266 |                                     next_addr = last_ins.address + last_ins.size
267 |                                     edge = CFGEdge(last_ins.address, next_addr, CFGEdge.COND_JUMP, False)
268 |                                     edges.add(edge)
269 |                     elif last_ins != f.instructions[-1]:
270 |                         # Otherwise, if we're just at the end of a BB that's not the end of the function, just fall
271 |                         # through to the next of the instruction
272 |                         edge = CFGEdge(last_ins.address, last_ins.address + last_ins.size, CFGEdge.DEFAULT)
273 |                         edges.add(edge)
274 | 
275 |         return edges
276 | 
277 | 
278 | class ARM_64_Analyzer(ARM_Analyzer):
279 |     def __init__(self, executable):
280 |         super(ARM_64_Analyzer, self).__init__(executable)
281 | 
282 |         # AArch64 has no Thumb state, so the low bit of the entry point carries no
283 |         # mode information; Capstone accepts only CS_MODE_ARM (optionally combined
284 |         # with an endianness flag) for CS_ARCH_ARM64.
285 |         self._disassembler = Cs(CS_ARCH_ARM64, CS_MODE_ARM)
286 | 
287 |         self._disassembler.detail = True
288 |         self._disassembler.skipdata = True
289 | 
290 |         self.REGISTER_NAMES = dict([(v,k[10:].lower()) for k,v in capstone.arm64_const.__dict__.iteritems() if k.startswith('ARM64_REG')])
291 |         self.IP_REGS = set()
292 |         self.SP_REGS = set([4, 5])
293 |         self.NOP_INSTRUCTION = '\x1F\x20\x03\xD5'
294 | 
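The basic-block loop in `cfg()` above emits one edge for an unconditional branch and two for a conditional one (taken and fall-through). A minimal standalone sketch of that rule follows; `Edge` and `branch_edges` are hypothetical stand-ins for illustration and are not part of dispatch, whose real edge type is `CFGEdge`.

```python
from collections import namedtuple

# Hypothetical stand-in for dispatch's CFGEdge, for illustration only
Edge = namedtuple('Edge', 'src dst kind taken')


def branch_edges(addr, size, target, conditional):
    """Edges leaving a branch at `addr` that jumps to the immediate `target`."""
    if not conditional:
        return [Edge(addr, target, 'default', None)]
    return [
        Edge(addr, target, 'cond_jump', True),        # branch taken
        Edge(addr, addr + size, 'cond_jump', False),  # fall through to the next instruction
    ]


edges = branch_edges(0x8000, 4, 0x8040, conditional=True)
```

Both edges share the branch instruction's address as their source, which mirrors how the analyzers key edges on `last_ins.address`.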


--------------------------------------------------------------------------------
/dispatch/analysis/base_analyzer.py:
--------------------------------------------------------------------------------
  1 | import logging
  2 | import re
  3 | import string
  4 | from ..util.trie import Trie
  5 | 
  6 | from ..constructs import *
  7 | 
  8 | class BaseAnalyzer(object):
  9 |     '''
 10 |     The analyzers are responsible for taking raw instructions from the executable and transforming them
 11 |     into higher-level constructs. This includes identifying functions, basic blocks, etc.
 12 | 
 13 |     The analyzers also provide some helper methods (ins_*) which are quick ways to determine what an instruction does.
 14 |     This can include determining if an instruction is sensitive to location, is a call/jump, etc.
 15 |     '''
 16 |     def __init__(self, executable):
 17 |         self.executable = executable
 18 | 
 19 |         self.ins_map = Trie()
 20 |     
 21 |     def __repr__(self):
 22 |         return '<{} for {} {} \'{}\'>'.format(self.__class__.__name__,
 23 |                                               self.executable.architecture,
 24 |                                               self.executable.__class__.__name__,
 25 |                                               self.executable.fp)
 26 | 
 27 |     def _gen_ins_map(self):
 28 |         '''
 29 |         Generates the instruction lookup dictionary
 30 |         :return: None
 31 |         '''
 32 |         raise NotImplementedError()
 33 | 
 34 |     def disassemble_range(self, start_vaddr, end_vaddr):
 35 |         '''
 36 |         Return an array of instructions disassembled between start and end
 37 |         :param start_vaddr: The virtual address to start disassembly at
 38 |         :param end_vaddr: The last virtual address to disassemble
 39 |         :return: Array of disassembled instructions
 40 |         '''
 41 |         raise NotImplementedError()
 42 | 
 43 |     def _identify_functions(self):
 44 |         '''
 45 |         Iterates through instructions and identifies functions by prologues and epilogues
 46 |         :return: None
 47 |         '''
 48 |         raise NotImplementedError()
 49 | 
 50 |     def _populate_func_instructions(self):
 51 |         '''
 52 |         Iterates through all found functions and adds the instructions inside each function to its Function object
 53 |         :return: None
 54 |         '''
 55 |         for f in self.executable.iter_functions():
 56 |             # some formats (such as macho) have special functions
 57 |             # that don't actually exist in the binary, so we ignore them
 58 |             if f.address in self.ins_map:
 59 |                 f.instructions = self.ins_map[f.address : f.address+f.size]
 60 |             else:
 61 |                 f.instructions = []
 62 | 
 63 |     def _identify_strings(self):
 64 |         '''
 65 |         Extracts all strings from the executable and stores them in the strings dict (addr -> string)
 66 |         :return: None
 67 |         '''
 68 |         # https://stackoverflow.com/questions/6804582/extract-strings-from-a-binary-file-in-python
 69 |         chars = string.printable
 70 |         shortest_run = 2
 71 |         regexp = '[%s]{%d,}' % (re.escape(chars), shortest_run)  # escape so that ']' and '\' stay literal inside the class
 72 |         pattern = re.compile(regexp)
 73 | 
 74 |         for section in self.executable.iter_string_sections():
 75 |             for found_string in pattern.finditer(section.raw):
 76 |                 vaddr = section.vaddr + found_string.start()
 77 |                 self.executable.strings[vaddr] = String(found_string.group(), vaddr, self.executable)
 78 | 
 79 | 
 80 |     def _mark_xrefs(self):
 81 |         '''
 82 |         Identify all the xrefs from the executable and store them in the xrefs dict (addr -> set of referencing addrs)
 83 |         :return: None
 84 |         '''
 85 |         for ins in self.ins_map:
 86 |             for operand in ins.operands:
 87 |                 if operand is not None and operand.type == Operand.IMM and self.executable.vaddr_binary_offset(operand.imm) is not None:
 88 |                     if operand.imm in self.executable.xrefs:
 89 |                         self.executable.xrefs[operand.imm].add(ins.address)
 90 |                     else:
 91 |                         self.executable.xrefs[operand.imm] = set([ins.address])
 92 | 
 93 |     def analyze(self):
 94 |         '''
 95 |         Run the analysis subroutines.
 96 |         Generates the instruction map, extracts the symbol table, identifies functions and basic blocks, marks xrefs, and extracts strings
 97 |         :return: None
 98 |         '''
 99 |         logging.info('Generating instruction map')
100 |         self._gen_ins_map()
101 | 
102 |         logging.info('Extracting symbol table')
103 |         self.executable._extract_symbol_table()
104 | 
105 |         logging.info('Identifying functions')
106 |         self._identify_functions()
107 | 
108 |         # TODO: CFA
109 | 
110 |         logging.info('Populating function instructions')
111 |         self._populate_func_instructions()
112 |         logging.info('Identifying basic blocks')
113 |         for func in self.executable.iter_functions():
114 |             func.do_bb_analysis()
115 |         logging.info('Marking XRefs')
116 |         self._mark_xrefs()
117 | 
118 |         logging.info('Identifying strings')
119 |         self._identify_strings()
120 | 
121 |     def cfg(self):
122 |         '''
123 |         Creates a control flow graph for the binary
124 |         :return: List of CFGEdges that describe the edges of the graph.
125 |         '''
126 |         raise NotImplementedError()
127 | 
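A minimal standalone sketch of the printable-run scan that `_identify_strings` performs, assuming a section is just a byte string plus a base virtual address. `find_strings` and its parameters are hypothetical names for illustration, not part of dispatch.

```python
import re
import string


def find_strings(raw, base_vaddr, shortest_run=2):
    """Map vaddr -> printable run found in `raw` (hypothetical helper)."""
    # re.escape keeps ']', backslash, '^' and '-' literal inside the character class
    charset = re.escape(string.printable).encode('ascii')
    run_len = str(shortest_run).encode('ascii')
    pattern = re.compile(b'[' + charset + b']{' + run_len + b',}')
    return dict((base_vaddr + m.start(), m.group()) for m in pattern.finditer(raw))


found = find_strings(b'\x00\x01hello\x00a\\b\x00x\xff', 0x1000)
```

Without the escaping step, the backslash in `string.printable` would escape the `]` that follows it, so strings containing a backslash would be split at that character.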


--------------------------------------------------------------------------------
/dispatch/analysis/x86_analyzer.py:
--------------------------------------------------------------------------------
  1 | import capstone
  2 | from capstone import *
  3 | from capstone.x86_const import *
  4 | import logging
  5 | import collections
  6 | import struct
  7 | 
  8 | from ..constructs import *
  9 | from .base_analyzer import BaseAnalyzer
 10 | 
 11 | class X86_Analyzer(BaseAnalyzer):
 12 |     def __init__(self, executable):
 13 |         super(X86_Analyzer, self).__init__(executable)
 14 | 
 15 |         self._disassembler = Cs(CS_ARCH_X86, CS_MODE_32)
 16 |         self._disassembler.detail = True
 17 |         self._disassembler.skipdata = True
 18 | 
 19 |         self.REG_NAMES = dict([(v,k[8:].lower()) for k,v in capstone.x86_const.__dict__.iteritems() if k.startswith('X86_REG')])
 20 |         self.IP_REGS = set([26, 34, 41])
 21 |         self.SP_REGS = set([6, 7, 20, 30, 36, 44, 47, 48])
 22 |         self.NOP_INSTRUCTION = '\x90'
 23 | 
 24 |     def _gen_ins_map(self):
 25 |         for section in self.executable.sections_to_disassemble():
 26 |             for ins in self._disassembler.disasm(section.raw, section.vaddr):
 27 |                 if ins.id: # .byte "instructions" have an id of 0
 28 |                     self.ins_map[ins.address] = instruction_from_cs_insn(ins, self.executable)
 29 | 
 30 |     def disassemble_range(self, start_vaddr, end_vaddr):
 31 |         size = end_vaddr - start_vaddr
 32 |         self.executable.binary.seek(self.executable.vaddr_binary_offset(start_vaddr))
 33 | 
 34 |         instructions = []
 35 | 
 36 |         for ins in self._disassembler.disasm(self.executable.binary.read(size), start_vaddr):
 37 |             if ins.id:
 38 |                 instructions.append(instruction_from_cs_insn(ins, self.executable))
 39 |             else:
 40 |                 logging.debug('Skipping data bytes at {}'.format(hex(ins.address)))
 41 | 
 42 |         return instructions
 43 | 
 44 |     def ins_modifies_esp(self, instruction):
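 45 |         ops = instruction.operands  # may be empty (e.g. `ret`); compare the register id, not the Operand object
 46 |         return 'pop' in instruction.mnemonic or 'push' in instruction.mnemonic or bool(ops and ops[0].type == Operand.REG and ops[0].reg in self.SP_REGS)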
 47 | 
 48 |     def _identify_functions(self):
 49 |         """
 50 |         This has to take into account 3 possibilities:
 51 | 
 52 |         1) No symbols whatsoever. Here we basically end up just doing basic prologue/epilogue analysis and hoping that
 53 |         the functions aren't weird and are relatively predictable.
 54 | 
 55 |         2) Symbols with no size. We use the symbols we have as known starting points (replacing the prologue) but still
 56 |         look for an epilogue (or the start of another function) to signal the end of the function.
 57 | 
 58 |         3) Symbols with size.
 59 |         """
 60 | 
 61 |         STATE_NOT_IN_FUNC, STATE_IN_PROLOGUE, STATE_IN_FUNCTION = 0, 1, 2
 62 | 
 63 |         state = STATE_NOT_IN_FUNC
 64 | 
 65 |         cur_func = None
 66 | 
 67 |         ops = []
 68 | 
 69 |         for cur_ins in self.ins_map:
 70 |             if cur_ins.address in self.executable.functions:
 71 |                 state = STATE_IN_FUNCTION
 72 |                 cur_func = self.executable.functions[cur_ins.address]
 73 | 
 74 |                 logging.debug('Analyzing function {} with pre-populated size {}'.format(cur_func, cur_func.size))
 75 | 
 76 |                 if not cur_func.size:
 77 |                     # Function from symtab has no size, so start to keep track of it
 78 |                     cur_func.size += cur_ins.size
 79 | 
 80 |             elif cur_func and cur_func.contains_address(cur_ins.address):
 81 |                 # Current function under analysis has a pre-populated size so just continue on until we get to the end
 82 |                 continue
 83 | 
 84 |             # Windows sometimes puts `mov edi, edi` as the first instruction in a function for hot patching, so we check
 85 |             # for this case to make sure the function we detect starts at the correct address.
 86 |             #  https://blogs.msdn.microsoft.com/oldnewthing/20110921-00/?p=9583
 87 |             elif state == STATE_NOT_IN_FUNC and cur_ins.mnemonic == 'mov' and \
 88 |                     cur_ins.operands[0].type == Operand.REG and \
 89 |                     cur_ins.operands[0].reg == X86_REG_EDI and \
 90 |                     cur_ins.operands[1].type == Operand.REG and \
 91 |                     cur_ins.operands[1].reg == X86_REG_EDI:
 92 | 
 93 |                 state = STATE_IN_PROLOGUE
 94 |                 ops.append(cur_ins)
 95 | 
 96 |             elif state in (STATE_NOT_IN_FUNC, STATE_IN_PROLOGUE) and cur_ins.mnemonic == 'push' and \
 97 |                     cur_ins.operands[0].type == Operand.REG and \
 98 |                     cur_ins.operands[0].reg in (X86_REG_EBP, X86_REG_RBP):
 99 | 
100 |                 state = STATE_IN_PROLOGUE
101 |                 ops.append(cur_ins)
102 | 
103 |             elif state == STATE_IN_PROLOGUE and \
104 |                             cur_ins.mnemonic == 'mov' and \
105 |                             cur_ins.operands[0].type == Operand.REG and \
106 |                             cur_ins.operands[0].reg in (X86_REG_EBP, X86_REG_RBP) and \
107 |                             cur_ins.operands[1].type == Operand.REG and \
108 |                             cur_ins.operands[1].reg in self.SP_REGS:
109 | 
110 | 
111 |                 state = STATE_IN_FUNCTION
112 |                 ops.append(cur_ins)
113 | 
114 |                 logging.debug('Identified function by prologue at {} with prologue ops {}'.format(hex(cur_ins.address), ops))
115 |                 cur_func = Function(ops[0].address,
116 |                                     sum(i.size for i in ops),
117 |                                     'sub_'+hex(ops[0].address)[2:],
118 |                                     self.executable)
119 |                 ops = []
120 | 
121 |             elif state == STATE_IN_FUNCTION and 'ret' in cur_ins.mnemonic:
122 |                 state = STATE_NOT_IN_FUNC
123 |                 cur_func.size += cur_ins.size
124 | 
125 |                 logging.debug('Identified function epilogue at {}'.format(hex(cur_ins.address)))
126 | 
127 |                 self.executable.functions[cur_func.address] = cur_func
128 | 
129 |                 cur_func = None
130 | 
131 |             elif state == STATE_IN_FUNCTION:
132 |                 cur_func.size += cur_ins.size
133 | 
134 | 
135 |     def cfg(self):
136 |         edges = set()
137 | 
138 |         for f in self.executable.iter_functions():
139 |             if f.type == Function.NORMAL_FUNC:
140 |                 for ins in f.instructions:
141 |                     #TODO: understand non-immediates here
142 |                     if ins.is_call() and ins.operands[-1].type == Operand.IMM:
143 |                         call_addr = ins.operands[-1].imm
144 |                         if self.executable.vaddr_is_executable(call_addr):
145 |                             edge = CFGEdge(ins.address, call_addr, CFGEdge.CALL)
146 |                             edges.add(edge)
147 | 
148 |                 for cur_bb in f.bbs:
149 |                     last_ins = cur_bb.instructions[-1]
150 | 
151 |                     if last_ins.is_jump():
152 |                         if last_ins.operands[-1].type == Operand.IMM:
153 |                             jmp_addr = last_ins.operands[-1].imm
154 | 
155 |                             if self.executable.vaddr_is_executable(jmp_addr):
156 |                                 if last_ins.mnemonic == 'jmp':
157 |                                     edge = CFGEdge(last_ins.address, jmp_addr, CFGEdge.DEFAULT)
158 |                                     edges.add(edge)
159 |                                 else:  # Conditional jump
160 |                                     # True case
161 |                                     edge = CFGEdge(last_ins.address, jmp_addr, CFGEdge.COND_JUMP, True)
162 |                                     edges.add(edge)
163 | 
164 |                                     # Default/fall-through case
165 |                                     next_addr = last_ins.address + last_ins.size
166 |                                     edge = CFGEdge(last_ins.address, next_addr, CFGEdge.COND_JUMP, False)
167 |                                     edges.add(edge)
168 |                     elif last_ins != f.instructions[-1]:
169 |                         # Otherwise, if we're at the end of a BB that isn't the end of the function, just fall
170 |                         # through to the next instruction
171 |                         edge = CFGEdge(last_ins.address, last_ins.address + last_ins.size, CFGEdge.DEFAULT)
172 |                         edges.add(edge)
173 | 
174 |             edges.update(self._do_jump_table_detection(f))
175 | 
176 |         return edges
177 | 
178 |     def _do_jump_table_detection(self, f):
179 |         # Basic idea is to label each BB as one of these types based on its contents
180 |         class BB_TYPE:
181 |             NONE  = 0  # Seemingly not associated with a switch
182 |             VALUE = 1  # A simple value compare (cmp and jmp)
183 |             RANGE = 2  # A range compare (cmp and jl/jle/jg/jge)
184 |             TABLE = 3  # A jump to a jump table (anything with [_+_*(4,8)])
185 | 
186 | 
187 |         # BB address -> (type, important instruction)
188 |         bb_types = {}
189 | 
190 |         for bb in f.iter_bbs():
191 |             bb_type = (BB_TYPE.NONE, None)
192 | 
193 |             # Table detection
194 |             # NOTE: We *should* do full register tracing if this instruction is a mov/lea,
195 |             #  but we can relatively safely assume that the jump at the end of the BB
196 |             #  will be a `jmp {reg}` if this is indeed a jump table
197 |             # NOTE: Value tables will be marked as a TABLE, but sanity checking later on
198 |             #  prevents values from being interpreted as jump destinations
199 |             for i, ins in enumerate(bb.instructions):
200 |                 if any(o.type == Operand.MEM and o.scale in [4,8] for o in ins.operands):
201 |                     bb_type = (BB_TYPE.TABLE, ins)
202 |                     break
203 | 
204 |             # Range detection
205 |             cmp_ins = None
206 |             for i, ins in enumerate(bb.instructions):
207 |                 if ins.mnemonic == 'cmp': # Anything else? Is sub used in ranges in clang?
208 |                     cmp_ins = ins
209 | 
210 |                 # TODO: replace `in ...` with `not in`
211 |                 elif cmp_ins and ins.is_jump() and ins.mnemonic in ('jb','jnae','jnb','jae','jbe',
212 |                                                                     'jna','ja','jnbe','jl','jnge',
213 |                                                                     'jge','jnl','jle','jng','jg','jnle'):
214 |                     bb_type = (BB_TYPE.RANGE, cmp_ins)
215 | 
216 |             # Value detection
217 |             cmp_ins = None
218 |             for i, ins in enumerate(bb.instructions):
219 |                 if ins.mnemonic in ('cmp', 'test', 'sub'): # TODO: Properly check for clang's use of `sub`
220 |                     cmp_ins = ins
221 | 
222 |                 elif cmp_ins and ins.mnemonic in ('je', 'jne'):
223 |                     bb_type = (BB_TYPE.VALUE, cmp_ins)
224 | 
225 |             logging.debug("Marking BB at {} as type {}".format(hex(bb.address), bb_type))
226 |             bb_types[bb.address] = bb_type
227 | 
228 | 
229 |         # Start address of table -> (type, scale, {relative location})
230 |         table_types = {}
231 | 
232 |         class TABLE_TYPE:
233 |             ADDR_REL = 0  # Values in the table are relative to a constant loaded elsewhere
234 |             ABS = 1       # Values in the table are absolute
235 | 
236 |         ins_to_table = []
237 | 
238 |         # TODO: Look for _CSWTCH symbols
239 | 
240 |         for bb in f.iter_bbs():
241 |             if bb_types[bb.address][0] == BB_TYPE.TABLE:
242 |                 for ins in bb.instructions:
243 |                     # Special-case the various ways of doing a jump table
244 | 
245 |                     # Option 1 (seemingly most common): lea {reg}, {ip-rel const}
246 |                     # NOTE: This could either be a jump table or a value table
247 |                     if ins.mnemonic == 'lea' and ins.operands[1].type == Operand.MEM:
248 |                         insn_with_mem_op = bb_types[bb.address][1]
249 |                         if len(insn_with_mem_op.operands) > 1 and insn_with_mem_op.operands[1].type == Operand.MEM:
250 |                             table_scale = insn_with_mem_op.operands[1].scale
251 | 
252 |                             table_addr = ins.address + ins.size + ins.operands[1].disp
253 | 
254 |                             logging.debug("Marking table at {} as an ADDR_REL table".format(hex(table_addr)))
255 |                             table_types[table_addr] = (TABLE_TYPE.ADDR_REL, table_scale, ins.address + ins.size)
256 |                             ins_to_table.append((ins.address, table_addr))
257 |                             break
258 | 
259 |                     # Option 2: offset is directly in the jump's memory operand
260 |                     elif bb_types[bb.address][1].operands[-1].type == Operand.MEM:
261 |                         mem_offset = bb_types[bb.address][1].operands[-1].disp
262 |                         if mem_offset:
263 |                             logging.debug("Marking table at {} as an ABS table".format(hex(mem_offset)))
264 |                             table_types[mem_offset] = (TABLE_TYPE.ABS, bb_types[bb.address][1].operands[-1].scale)
265 |                             ins_to_table.append((ins.address, mem_offset))
266 |                             break
267 | 
268 |                 else:  # for/else: reached only when no instruction in this BB matched (no `break`)
269 |                     logging.debug("Couldn't find anything with a table offset in BB at {}".format(hex(bb.address)))
270 | 
271 |         # Add the end of the segment as an upper bound
272 | 
273 |         table_types[self.executable.executable_segment_vaddr() + self.executable.executable_segment_size()] = None
274 | 
275 |         # http://stackoverflow.com/questions/32030412/twos-complement-sign-extension-python
276 |         def sign_extend(value, bits):
277 |             sign_bit = 1 << (bits - 1)
278 |             return (value & (sign_bit - 1)) - (value & sign_bit)
279 | 
280 |         # Start address of table -> [destination addresses]
281 |         table_values = collections.defaultdict(list)
282 | 
283 |         table_addrs = sorted(table_types.keys())
284 | 
285 |         for start_a, end_a in zip(table_addrs[:-1], table_addrs[1:]):
286 |             if not table_types[start_a]:
287 |                 continue
288 |             t_type = table_types[start_a][0]
289 |             scale = table_types[start_a][1]
290 |             for addr in range(start_a, end_a, scale):
291 |                 # sometimes, our addr+scale ends up not being in the executable,
292 |                 # usually because they compute a relative offset and then add a
293 |                 # base address to it. For now, we'll just skip the address.
294 |                 # TODO: Is there a way to do this without basically implementing
295 |                 #       symbolic execution?
296 |                 try:
297 |                     raw = self.executable.get_binary_vaddr_range(addr, addr+scale)
298 |                     data_val = struct.unpack(self.executable.pack_endianness+('i' if scale == 4 else 'q'), raw)[0]
299 |                 except KeyError:
300 |                     logging.warning("Invalid vaddrs requested during jump table analysis, skipping this vaddr: {:08x}".format(addr))
301 |                     continue
302 |                 if t_type == TABLE_TYPE.ADDR_REL:
303 |                     addr_bit_len = 8*self.executable.address_length()
304 |                     abs_val = (start_a+sign_extend(data_val, addr_bit_len)) & (2**addr_bit_len - 1)
305 |                 else:
306 |                     abs_val = data_val
307 | 
308 |                 # Only add the values if they land us in the executable segment
309 |                 # TODO: Be smarter here. Add restrictions to make sure that the table doesn't extend
310 |                 #  past the end of a section/segment before making sure the address is valid
311 |                 valid_start = self.executable.executable_segment_vaddr()
312 |                 valid_end = valid_start + self.executable.executable_segment_size()
313 | 
314 |                 if valid_start <= abs_val < valid_end:
315 |                     table_values[start_a].append(abs_val)
316 |                 else:
317 |                     break
318 | 
319 |         edges = set()
320 |         for addr, table in ins_to_table:
321 |             for dst in table_values[table]:
322 |                 edges.add(CFGEdge(addr, dst, CFGEdge.SWITCH))
323 | 
324 |         return edges
325 | 
326 | class X86_64_Analyzer(X86_Analyzer):
327 |     def __init__(self, executable):
328 |         super(X86_64_Analyzer, self).__init__(executable)
329 | 
330 |         self._disassembler = Cs(CS_ARCH_X86, CS_MODE_64)
331 |         self._disassembler.detail = True
332 |         self._disassembler.skipdata = True
333 | 
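The `ADDR_REL` branch of `_do_jump_table_detection` treats each table entry as a signed offset from the table's start address and wraps the sum to the address width. A minimal sketch of that decoding step, assuming a little-endian target; `decode_relative_table` and its parameters are hypothetical and not part of dispatch.

```python
import struct


def decode_relative_table(raw, table_vaddr, scale=4, addr_bits=64):
    """Turn signed table entries into absolute addresses (hypothetical helper)."""
    fmt = '<' + ('i' if scale == 4 else 'q')   # assumes a little-endian target
    mask = (1 << addr_bits) - 1                # wrap the sum to the address width
    targets = []
    for off in range(0, len(raw) - scale + 1, scale):
        (entry,) = struct.unpack(fmt, raw[off:off + scale])
        targets.append((table_vaddr + entry) & mask)
    return targets


table = struct.pack('<3i', -0x40, -0x10, 0x20)
targets = decode_relative_table(table, 0x401000)
```

Because `struct.unpack` with `i` or `q` already returns a signed value, no separate sign-extension step is needed in this sketch; the analyzer additionally discards targets that fall outside the executable segment.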


--------------------------------------------------------------------------------
/dispatch/constructs.py:
--------------------------------------------------------------------------------
  1 | import subprocess
  2 | import logging
  3 | import capstone
  4 | import string
  5 | from .enums import *
  6 | 
  7 | import ctypes
  8 | 
  9 | class Function(object):
 10 |     NORMAL_FUNC = 0
 11 |     DYNAMIC_FUNC = 1
 12 | 
 13 |     def __init__(self, address, size, name, executable, type=NORMAL_FUNC):
 14 |         self.address = int(address)
 15 |         self.size = int(size)
 16 |         self.name = name
 17 |         self.type = type
 18 |         self._executable = executable
 19 | 
 20 |         # BELOW: Helpers used to explore the binary.
 21 |         # NOTE: These should *not* be directly modified at this time.
 22 |         # Instead, executable.replace_at should be used.
 23 |         self.instructions = [] # Sequential list of instructions
 24 |         self.bbs = [] # Sequential list of basic blocks. BB instructions are auto-populated from our instructions
 25 | 
 26 |     def __repr__(self):
 27 |         return '<Function \'{}\' at {}>'.format(self.name, hex(self.address))
 28 | 
 29 |     def do_bb_analysis(self):
 30 |         if self.instructions:
 31 |             bb_ends = set([self.instructions[-1].address + self.instructions[-1].size])
 32 | 
 33 |             for i in range(len(self.instructions) - 1):
 34 |                 cur = self.instructions[i]
 35 |                 next_ins = self.instructions[i + 1]  # avoid shadowing the builtin `next`
 36 | 
 37 |                 if cur.is_jump():
 38 |                     bb_ends.add(next_ins.address)
 39 |                     if cur.operands[-1].type == Operand.IMM:  # the destination is always the last operand
 40 |                         bb_ends.add(cur.operands[-1].imm)
 41 | 
 42 |             bb_ends = sorted(list(bb_ends))
 43 |             bb_instructions = []
 44 | 
 45 |             for ins in self.instructions:
 46 |                 if ins.address in bb_ends and bb_instructions:
 47 |                     bb = BasicBlock(self,
 48 |                                     bb_instructions[0].address,
 49 |                                     bb_instructions[-1].address + bb_instructions[-1].size - bb_instructions[0].address)
 50 |                     bb.instructions = bb_instructions
 51 |                     self.bbs.append(bb)
 52 | 
 53 |                     # bb_ends is not consumed: a jump target outside this function must not block later splits
 54 |                     bb_instructions = [ins]
 55 |                 else:
 56 |                     bb_instructions.append(ins)
 57 | 
 58 |             # There will always be one BB left over which "ends" at the first address of the next function, so be
 59 |             # sure to add it
 60 | 
 61 |             bb = BasicBlock(self,
 62 |                             bb_instructions[0].address,
 63 |                             bb_instructions[-1].address + bb_instructions[-1].size - bb_instructions[0].address)
 64 |             bb.instructions = bb_instructions
 65 |             self.bbs.append(bb)
 66 | 
 67 |     def contains_address(self, address):
 68 |         return self.address <= address < self.address + self.size
 69 | 
 70 |     def iter_bbs(self):
 71 |         for bb in self.bbs:
 72 |             yield bb
 73 | 
 74 |     def print_disassembly(self):
 75 |         for i in self.instructions:
 76 |             print hex(i.address) + ' ' + str(i)
 77 | 
 78 |     def demangle(self):
 79 |         if self.name.startswith('_Z'):
 80 |             p = subprocess.Popen(['c++filt', '-n', self.name], stdout=subprocess.PIPE)
 81 |             demangled, _ = p.communicate()
 82 |             return demangled.replace('\n','')
 83 |         elif self.name.startswith('@'):
 84 |             # TODO: MSVC demangling (look at wine debugger source)
 85 |             return self.name
 86 |         else:
 87 |             logging.debug('Call to demangle with a non-reserved function name')
 88 | 
 89 | 
 90 | class BasicBlock(object):
 91 |     def __init__(self, parent_func, address, size):
 92 |         self.parent = parent_func
 93 |         self.address = int(address)
 94 |         self.size = int(size)
 95 |         self.offset = self.parent.address - self.address
 96 |         self.instructions = []
 97 | 
 98 |     def __repr__(self):
 99 |         return '<Basic block at {}>'.format(hex(self.address))
100 | 
101 |     def print_disassembly(self):
102 |         for i in self.instructions:
103 |             print hex(i.address) + ' ' + str(i)
104 | 
105 | 
106 | class Instruction(object):
107 |     GRP_CALL = 0
108 |     GRP_JUMP = 1
109 | 
110 |     def __init__(self, address, size, raw, mnemonic, operands, groups, backend_instruction, executable):
111 |         self.address = int(address)
112 |         self.size = int(size)
113 |         self.raw = raw
114 |         self.mnemonic = mnemonic
115 |         self.operands = operands
116 |         self.groups = groups
117 |         self._backend_instruction = backend_instruction
118 |         self._executable = executable
119 | 
120 |         self.comment = ''
121 | 
122 |     def __repr__(self):
123 |         return '<Instruction at {}>'.format(hex(self.address))
124 | 
125 |     def __str__(self):
126 |         s = self.mnemonic + ' ' + self.nice_op_str()
127 |         if self.comment:
128 |             s += '; "{}"'.format(self.comment)
129 |         if self.address in self._executable.xrefs:
130 |             s += '; XREF={}'.format(', '.join(hex(a).rstrip('L') for a in self._executable.xrefs[self.address]))
131 |             # TODO: Print nice function relative offsets if the xref is in a function
132 | 
133 |         return s
134 | 
135 |     def is_call(self):
136 |         return Instruction.GRP_CALL in self.groups
137 | 
138 |     def is_jump(self):
139 |         return Instruction.GRP_JUMP in self.groups
140 | 
141 |     def redirects_flow(self):
142 |         return self.is_jump() or self.is_call()
143 | 
144 |     def references_ip(self):
145 |         implicit_read, implicit_written = self._backend_instruction.regs_access()
146 |         ops_direct = [op.used_regs() for op in self.operands]
147 |         if ops_direct:
148 |             explicit_accessed = set.union(*ops_direct)
149 |         else:
150 |             explicit_accessed = set()
151 |         all_accessed = set.union(explicit_accessed, implicit_read, implicit_written)
152 |         return bool(self._executable.analyzer.IP_REGS.intersection(all_accessed))
153 | 
154 |     def references_sp(self):
155 |         implicit_read, implicit_written = self._backend_instruction.regs_access()
156 |         ops_direct = [op.used_regs() for op in self.operands]
157 |         if ops_direct:
158 |             explicit_accessed = set.union(*ops_direct)
159 |         else:
160 |             explicit_accessed = set()
161 |         all_accessed = set.union(explicit_accessed, implicit_read, implicit_written)
162 |         return bool(self._executable.analyzer.SP_REGS.intersection(all_accessed))
163 | 
164 |     def references_seg_reg(self):
165 |         '''
166 |         Returns whether our instruction uses segment registers (fs, gs, etc. on x86[_64]).
167 |         Mostly seen in x86[_64] stack canary checks.
168 |         :return: Whether this instruction references a segment register
169 |         '''
170 |         operand_refs_seg_reg = lambda op: op.type == Operand.MEM and op.seg_reg
171 | 
172 |         return any(operand_refs_seg_reg(op) for op in self.operands)
173 | 
174 |     def op_str(self):
175 |         return ', '.join(str(op) for op in self.operands)
176 | 
177 |     def nice_op_str(self):
178 |         '''
179 |         Returns the operand string "nicely formatted." I.e. replaces addresses with function names (and function
180 |         relative offsets) if appropriate.
181 |         :return: The nicely formatted operand string
182 |         '''
183 |         op_strings = [str(op) for op in self.operands]
184 | 
185 |         # If this is an immediate call or jump, try to put a name to where we're calling/jumping to
186 |         if self.is_call() or self.is_jump():
187 |             # jump/call destination will always be the last operand (even with conditional ARM branch instructions)
188 |             operand = self.operands[-1]
189 |             # TODO: Don't only do this when we've got an IMM operand
190 |             if operand.type == Operand.IMM:
191 |                 if operand.imm in self._executable.functions:
192 |                     op_strings[-1] = self._executable.functions[operand.imm].name
193 |                 elif self._executable.vaddr_is_executable(operand.imm):
194 |                     for func in self._executable.iter_functions():
195 |                         if func.contains_address(operand.imm):
196 |                             diff = operand.imm - func.address
197 |                             op_strings[-1] = func.name+'+'+hex(diff)
198 |                             break
199 |         else: # TODO: Limit this to only be sensible instructions (e.g. mov, push, etc.)
200 |             for i, operand in enumerate(self.operands):
201 |                 if operand.type == Operand.IMM and operand.imm in self._executable.strings:
202 |                     referenced_string = self._executable.strings[operand.imm]
203 |                     op_strings[i] = referenced_string.short_name
204 |                     self.comment = referenced_string.string.strip()
205 | 
206 |         return ', '.join(op_strings)
207 | 
208 | 
209 | class Operand(object):
210 |     IMM = 0
211 |     FP = 1
212 |     REG = 2
213 |     MEM = 3
214 | 
215 |     def __init__(self, type, size, instruction, **kwargs):
216 |         self.type = type
217 |         self.size = size
218 |         self._instruction = instruction
219 |         if self.type == Operand.IMM:
220 |             self.imm = int(kwargs.get('imm'))
221 |         elif self.type == Operand.FP:
222 |             self.fp = float(kwargs.get('fp'))
223 |         elif self.type == Operand.REG:
224 |             self.reg = kwargs.get('reg')
225 |         elif self.type == Operand.MEM:
226 |             self.base = kwargs.get('base')
227 |             self.index = kwargs.get('index')
228 |             self.scale = int(kwargs.get('scale', 1))
229 |             self.disp = int(kwargs.get('disp', 0))
230 |             self.seg_reg = kwargs.get('seg_reg')
231 |         else:
232 |             raise ValueError('Type is not one of Operand.{IMM,FP,REG,MEM}')
233 | 
234 |     def _get_simplified(self):
235 |         # Auto-simplify ip-relative operands to their actual address
236 |         if self.type == Operand.MEM and self.base in self._instruction._executable.analyzer.IP_REGS and self.index == 0:
237 |             addr = self._instruction.address + self._instruction.size + self.index * self.scale + self.disp
238 |             return Operand(Operand.MEM, self.size, self._instruction, disp=addr)
239 | 
240 |         return self
241 | 
242 |     def used_regs(self):
243 |         if self.type == Operand.REG:
244 |             return set([self.reg])
245 |         elif self.type == Operand.MEM:
246 |             return set([self.base, self.index])
247 |         else:
248 |             return set()
249 | 
250 |     def __str__(self):
251 |         sizes = {
252 |                 1: 'byte ptr',
253 |                 2: 'word ptr',
254 |                 4: 'dword ptr',
255 |                 8: 'qword ptr'
256 |                 }
257 |         if self.type == Operand.IMM:
258 |             return sizes.get(self.size, '') + ' ' + hex(self.imm)
259 |         elif self.type == Operand.FP:
260 |             return str(self.fp)
261 |         elif self.type == Operand.REG:
262 |             return self._instruction._executable.analyzer.REG_NAMES[self.reg]
263 |         elif self.type == Operand.MEM:
264 |             simplified = self._get_simplified()
265 |             
266 |             s = ''
267 |             if self.seg_reg:
268 |                 s += self._instruction._executable.analyzer.REG_NAMES[self.seg_reg]
269 |                 s += ':'
270 | 
271 |             s += '['
272 | 
273 |             show_plus = False
274 |             if simplified.base:
275 |                 s += self._instruction._executable.analyzer.REG_NAMES[simplified.base]
276 |                 show_plus = True
277 |             if simplified.index:
278 |                 if show_plus:
279 |                     s += ' + '
280 | 
281 |                 s += self._instruction._executable.analyzer.REG_NAMES[simplified.index]
282 |                 if simplified.scale > 1:
283 |                     s += '*'
284 |                     s += str(simplified.scale)
285 | 
286 |                 show_plus = True
287 |             if simplified.disp:
288 |                 if show_plus:
289 |                     s += ' + '
290 |                 s += hex(simplified.disp)
291 | 
292 |             s += ']'
293 | 
294 |             return sizes.get(self.size, '') + ' ' + s
295 | 
296 | 
297 | def operand_from_cs_op(csOp, instruction):
298 |     size = csOp.size if hasattr(csOp, 'size') else None
299 |     if csOp.type == capstone.CS_OP_IMM:
300 |         return Operand(Operand.IMM, size, instruction, imm=csOp.imm)
301 |     elif csOp.type == capstone.CS_OP_FP:
302 |         return Operand(Operand.FP, size, instruction, fp=csOp.fp)
303 |     elif csOp.type == capstone.CS_OP_REG:
304 |         return Operand(Operand.REG, size, instruction, reg=csOp.reg)
305 |     elif csOp.type == capstone.CS_OP_MEM:
306 |         return Operand(Operand.MEM, size, instruction, base=csOp.mem.base, index=csOp.mem.index, scale=csOp.mem.scale, disp=csOp.mem.disp, seg_reg=csOp.reg)
307 | 
308 | 
309 | def instruction_from_cs_insn(csInsn, executable):
310 |     groups = []
311 | 
312 |     if executable.architecture in (ARCHITECTURE.ARM, ARCHITECTURE.ARM_64):
313 |         if csInsn.mnemonic.startswith('bl'):
314 |             groups.append(Instruction.GRP_CALL)
315 |         elif csInsn.mnemonic.startswith('b'):
316 |             groups.append(Instruction.GRP_JUMP)
317 |     else:
318 |         if capstone.CS_GRP_JUMP in csInsn.groups:
319 |             groups.append(Instruction.GRP_JUMP)
320 |         if capstone.CS_GRP_CALL in csInsn.groups:
321 |             groups.append(Instruction.GRP_CALL)
322 | 
323 |     instruction = Instruction(csInsn.address, csInsn.size, csInsn.bytes, csInsn.mnemonic, [], groups, csInsn, executable)
324 | 
325 |     # We manually pull out the instruction details here so that capstone doesn't deepcopy everything which burns time
326 |     # and memory
327 |     detail = ctypes.cast(csInsn._raw.detail, ctypes.POINTER(capstone._cs_detail)).contents
328 | 
329 |     if executable.architecture == ARCHITECTURE.X86 or executable.architecture == ARCHITECTURE.X86_64:
330 |         detail = detail.arch.x86
331 |     elif executable.architecture == ARCHITECTURE.ARM:
332 |         detail = detail.arch.arm
333 |     elif executable.architecture == ARCHITECTURE.ARM_64:
334 |         detail = detail.arch.arm64
335 | 
336 |     operands = [operand_from_cs_op(detail.operands[i], instruction) for i in range(detail.op_count)]
337 | 
338 |     instruction.operands = operands
339 | 
340 |     return instruction
341 | 
342 | 
343 | class String(object):
344 |     def __init__(self, s, vaddr, executable):
345 |         self.string = s
346 |         self.short_name = reduce(lambda s, r: s.replace(r, ''), ' '+string.punctuation, self.string)[:8]
347 |         self.vaddr = vaddr
348 |         self._executable = executable
349 | 
350 |     def __repr__(self):
351 |         return '<String \'{}\' at {}>'.format(self.string, hex(self.vaddr))
352 | 
353 |     def __str__(self):
354 |         return self.string
355 | 
356 | 
357 | class CFGEdge(object):
358 |     # Edge with no special information. Could be from a default fall-through, unconditional jump, etc.
359 |     DEFAULT = 0
360 | 
361 |     # Edge from a conditional jump. Two of these should be added for each cond. jump, one for the True, and one for False
362 |     COND_JUMP = 1
363 | 
364 |     # Edge from a switch/jump table. One edge should be added for each entry, and the corresponding key set as the value
365 |     SWITCH = 2
366 | 
367 |     # Edge from a call instruction.
368 |     CALL = 3
369 | 
370 |     def __init__(self, src, dst, type, value=None):
371 |         self.src = src
372 |         self.dst = dst
373 |         self.type = type
374 |         self.value = value
375 | 
376 |     def __eq__(self, other):
377 |         if isinstance(other, CFGEdge) and self.src == other.src and self.dst == other.dst and self.type == other.type:
378 |             return True
379 |         return False
380 | 
381 |     def __ne__(self, other):
382 |         return not self.__eq__(other)
383 | 
384 |     def __repr__(self):
385 |         return '<CFGEdge from {} to {}>'.format(hex(self.src), hex(self.dst))
386 | 
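
The IP-relative simplification in `Operand._get_simplified` and the bracketed formatting in `Operand.__str__` can be sketched in isolation. This is a minimal standalone sketch rather than the library's code: `resolve_ip_relative` and `format_mem` are hypothetical helpers, register names are passed as plain strings instead of being looked up through `REG_NAMES`, and the address arithmetic assumes x86-64 semantics (the instruction pointer refers to the end of the current instruction).

```python
def resolve_ip_relative(insn_address, insn_size, disp):
    # On x86-64 the instruction pointer already points past the current
    # instruction when the operand is evaluated, so the target is
    # end-of-instruction + displacement.
    return insn_address + insn_size + disp


def format_mem(base=None, index=None, scale=1, disp=0):
    # Build "[base + index*scale + disp]", omitting any absent part.
    parts = []
    if base:
        parts.append(base)
    if index:
        parts.append(index + ('*' + str(scale) if scale > 1 else ''))
    if disp:
        parts.append(hex(disp))
    return '[' + ' + '.join(parts) + ']'


# Hypothetical 7-byte instruction at 0x400500 with a displacement of 0x200
target = resolve_ip_relative(0x400500, 7, 0x200)
print(hex(target))
print(format_mem(disp=target))
print(format_mem(base='rbp', index='rax', scale=8, disp=0x10))
```

Once the absolute address is known, the operand can be rendered as a plain displacement, which is why the simplified operand carries only `disp`.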


--------------------------------------------------------------------------------
/dispatch/enums.py:
--------------------------------------------------------------------------------
 1 | class FORMAT:
 2 |     ELF = 'ELF'
 3 |     PE = 'PE'
 4 |     MACH_O = 'MachO'
 5 | 
 6 | class ARCHITECTURE:
 7 |     X86 = 'x86'
 8 |     X86_64 = 'x86-64'
 9 |     ARM = 'ARM'
10 |     ARM_64 = 'ARM64'


--------------------------------------------------------------------------------
/dispatch/formats/SectionDoubleP.py:
--------------------------------------------------------------------------------
  1 | """ Tested with pefile 1.2.10-123 on 32bit PE executable files.
  2 | 
  3 |     An implementation to push or pop a section header to the section table of a PE file.
  4 |     For further information refer to the docstrings of pop_back/push_back.
  5 | 
  6 |     by n0p
  7 | """
  8 | 
  9 | import pefile
 10 | 
 11 | class SectionDoublePError(Exception):
 12 |     pass
 13 | 
 14 | class SectionDoubleP:
 15 |     def __init__(self, pe):
 16 |         self.pe = pe
 17 | 
 18 |     def __adjust_optional_header(self):
 19 |         """ Recalculates the SizeOfImage, SizeOfCode, SizeOfInitializedData and
 20 |             SizeOfUninitializedData of the optional header.
 21 |         """
 22 | 
 23 |         # SizeOfImage = ((VirtualAddress + VirtualSize) of the new last section)
 24 |         self.pe.OPTIONAL_HEADER.SizeOfImage = (self.pe.sections[-1].VirtualAddress +
 25 |                                                 self.pe.sections[-1].Misc_VirtualSize)
 26 | 
 27 |         self.pe.OPTIONAL_HEADER.SizeOfCode = 0
 28 |         self.pe.OPTIONAL_HEADER.SizeOfInitializedData = 0
 29 |         self.pe.OPTIONAL_HEADER.SizeOfUninitializedData = 0
 30 | 
 31 |         # Recalculating the sizes by iterating over every section and checking if
 32 |         # the appropriate characteristics are set.
 33 |         for section in self.pe.sections:
 34 |             if section.Characteristics & 0x00000020:
 35 |                 # Section contains code.
 36 |                 self.pe.OPTIONAL_HEADER.SizeOfCode += section.SizeOfRawData
 37 |             if section.Characteristics & 0x00000040:
 38 |                 # Section contains initialized data.
 39 |                 self.pe.OPTIONAL_HEADER.SizeOfInitializedData += section.SizeOfRawData
 40 |             if section.Characteristics & 0x00000080:
 41 |                 # Section contains uninitialized data.
 42 |                 self.pe.OPTIONAL_HEADER.SizeOfUninitializedData += section.SizeOfRawData
 43 | 
 44 |     def __add_header_space(self):
 45 |         """ To make space for a new section header a buffer filled with nulls is added at the
 46 |             end of the headers. The buffer has the size of one file alignment.
 47 |             The data between the last section header and the end of the headers is copied to
 48 |             the new space (everything moved by the size of one file alignment). If any data
 49 |             directory entry points to the moved data the pointer is adjusted.
 50 |         """
 51 | 
 52 |         FileAlignment = self.pe.OPTIONAL_HEADER.FileAlignment
 53 |         SizeOfHeaders = self.pe.OPTIONAL_HEADER.SizeOfHeaders
 54 | 
 55 |         data = '\x00' * FileAlignment
 56 | 
 57 |         # Adding the null buffer.
 58 |         self.pe.__data__ = (self.pe.__data__[:SizeOfHeaders] + data +
 59 |                             self.pe.__data__[SizeOfHeaders:])
 60 | 
 61 |         section_table_offset = (self.pe.DOS_HEADER.e_lfanew + 4 +
 62 |                         self.pe.FILE_HEADER.sizeof() + self.pe.FILE_HEADER.SizeOfOptionalHeader)
 63 | 
 64 |         # Copying the data between the last section header and SizeOfHeaders to the newly allocated
 65 |         # space.
 66 |         new_section_offset = section_table_offset + self.pe.FILE_HEADER.NumberOfSections*0x28
 67 |         size = SizeOfHeaders - new_section_offset
 68 |         data = self.pe.get_data(new_section_offset, size)
 69 |         self.pe.set_bytes_at_offset(new_section_offset + FileAlignment, data)
 70 | 
 71 |         # Filling the space the data was copied from with NULLs.
 72 |         self.pe.set_bytes_at_offset(new_section_offset, '\x00' * FileAlignment)
 73 | 
 74 |         data_directory_offset = section_table_offset - self.pe.OPTIONAL_HEADER.NumberOfRvaAndSizes * 0x8
 75 | 
 76 |         # Checking data directories if anything points to the space between the last section header
 77 |         # and the former SizeOfHeaders. If that's the case the pointer is increased by FileAlignment.
 78 |         for data_offset in xrange(data_directory_offset, section_table_offset, 0x8):
 79 |             data_rva = self.pe.get_dword_from_offset(data_offset)
 80 | 
 81 |             if new_section_offset <= data_rva and data_rva < SizeOfHeaders:
 82 |                 self.pe.set_dword_at_offset(data_offset, data_rva + FileAlignment)
 83 | 
 84 |         SizeOfHeaders_offset = (self.pe.DOS_HEADER.e_lfanew + 4 +
 85 |                         self.pe.FILE_HEADER.sizeof() + 0x3C)
 86 | 
 87 |         # Adjusting the SizeOfHeaders value.
 88 |         self.pe.set_dword_at_offset(SizeOfHeaders_offset, SizeOfHeaders + FileAlignment)
 89 | 
 90 |         section_raw_address_offset = section_table_offset + 0x14
 91 | 
 92 |         # The raw addresses of the sections are adjusted.
 93 |         for section in self.pe.sections:
 94 |             if section.PointerToRawData != 0:
 95 |                 self.pe.set_dword_at_offset(section_raw_address_offset, section.PointerToRawData+FileAlignment)
 96 | 
 97 |             section_raw_address_offset += 0x28
 98 | 
 99 |         # All changes in this method were made to the raw data (__data__). To make these changes
100 |         # accessible in self.pe, __data__ has to be parsed again. Since a new pefile is parsed during
101 |         # the init method, the easiest way is to replace self.pe with a new pefile based on __data__
102 |         # of the old self.pe.
103 |         self.pe = pefile.PE(data=self.pe.__data__)
104 | 
105 |     def __is_null_data(self, data):
106 |         """ Checks if the given data contains just null bytes.
107 |         """
108 | 
109 |         for char in data:
110 |             if char != '\x00':
111 |                 return False
112 |         return True
113 | 
114 |     def push_back(self, Name=".NewSec", VirtualSize=0x00000000, VirtualAddress=0x00000000,
115 |                 RawSize=0x00000000, RawAddress=0x00000000, RelocAddress=0x00000000,
116 |                 Linenumbers=0x00000000, RelocationsNumber=0x0000, LinenumbersNumber=0x0000,
117 |                 Characteristics=0xE00000E0, Data=""):
118 |         """ Adds the section, specified by the function's parameters, at the end of the section
119 |             table.
120 |             If the space to add an additional section header is insufficient, a buffer is inserted
121 |             after SizeOfHeaders. Data between the last section header and the end of SizeOfHeaders
122 |             is copied to +1 FileAlignment. Data directory entries pointing to this data are fixed.
123 | 
124 |             A call with no parameters creates the same section header as LordPE does. But for the
125 |             binary to be executable without errors a VirtualSize > 0 has to be set.
126 | 
127 |             If a RawSize > 0 is set or Data is given the data gets aligned to the FileAlignment and
128 |             is attached at the end of the file.
129 |         """
130 | 
131 |         if self.pe.FILE_HEADER.NumberOfSections == len(self.pe.sections):
132 | 
133 |             FileAlignment = self.pe.OPTIONAL_HEADER.FileAlignment
134 |             SectionAlignment = self.pe.OPTIONAL_HEADER.SectionAlignment
135 | 
136 |             if len(Name) > 8:
137 |                 raise SectionDoublePError("The name is too long for a section.")
138 | 
139 |             if (    VirtualAddress < (self.pe.sections[-1].Misc_VirtualSize +
140 |                                         self.pe.sections[-1].VirtualAddress)
141 |                 or  VirtualAddress % SectionAlignment != 0):
142 | 
143 |                 if (self.pe.sections[-1].Misc_VirtualSize % SectionAlignment) != 0:
144 |                     VirtualAddress =    \
145 |                         (self.pe.sections[-1].VirtualAddress + self.pe.sections[-1].Misc_VirtualSize -
146 |                         (self.pe.sections[-1].Misc_VirtualSize % SectionAlignment) + SectionAlignment)
147 |                 else:
148 |                     VirtualAddress =    \
149 |                         (self.pe.sections[-1].VirtualAddress + self.pe.sections[-1].Misc_VirtualSize)
150 | 
151 |             if VirtualSize < len(Data):
152 |                 VirtualSize = len(Data)
153 | 
154 |             if (len(Data) % FileAlignment) != 0:
155 |                 # Padding the data of the section.
156 |                 Data += '\x00' * (FileAlignment - (len(Data) % FileAlignment))
157 | 
158 |             if RawSize != len(Data):
159 |                 if (    RawSize > len(Data)
160 |                     and (RawSize % FileAlignment) == 0):
161 |                     Data += '\x00' * (RawSize - (len(Data) % RawSize))
162 |                 else:
163 |                     RawSize = len(Data)
164 | 
165 | 
166 |             section_table_offset = (self.pe.DOS_HEADER.e_lfanew + 4 +
167 |                 self.pe.FILE_HEADER.sizeof() + self.pe.FILE_HEADER.SizeOfOptionalHeader)
168 | 
169 |             # If the new section header exceeds the SizeOfHeaders there won't be enough space
170 |             # for an additional section header. Besides that it's checked if the 0x28 bytes
171 |             # (size of one section header) after the last current section header are filled
172 |             # with nulls, i.e. are free to use.
173 |             if (        self.pe.OPTIONAL_HEADER.SizeOfHeaders <
174 |                         section_table_offset + (self.pe.FILE_HEADER.NumberOfSections+1)*0x28
175 |                 or not  self.__is_null_data(self.pe.get_data(section_table_offset +
176 |                         (self.pe.FILE_HEADER.NumberOfSections)*0x28, 0x28))):
177 | 
178 |                 # Checking if more space can be added.
179 |                 if self.pe.OPTIONAL_HEADER.SizeOfHeaders < self.pe.sections[0].VirtualAddress:
180 | 
181 |                     self.__add_header_space()
182 |                 else:
183 |                     raise SectionDoublePError("No more space can be added for the section header.")
184 | 
185 | 
186 |             # The validity check of RawAddress is done after space for a new section header may
187 |             # have been added because if space had been added the PointerToRawData of the previous
188 |             # section would have changed.
189 |             if (RawAddress != (self.pe.sections[-1].PointerToRawData +
190 |                                     self.pe.sections[-1].SizeOfRawData)):
191 |                     RawAddress =     \
192 |                         (self.pe.sections[-1].PointerToRawData + self.pe.sections[-1].SizeOfRawData)
193 | 
194 | 
195 |             # Appending the data of the new section to the file.
196 |             if len(Data) > 0:
197 |                 self.pe.__data__ = (self.pe.__data__[:RawAddress] + Data + \
198 |                                     self.pe.__data__[RawAddress:])
199 | 
200 |             section_offset = section_table_offset + self.pe.FILE_HEADER.NumberOfSections*0x28
201 | 
202 |             # Manually writing the data of the section header to the file.
203 |             self.pe.set_bytes_at_offset(section_offset, Name)
204 |             self.pe.set_dword_at_offset(section_offset+0x08, VirtualSize)
205 |             self.pe.set_dword_at_offset(section_offset+0x0C, VirtualAddress)
206 |             self.pe.set_dword_at_offset(section_offset+0x10, RawSize)
207 |             self.pe.set_dword_at_offset(section_offset+0x14, RawAddress)
208 |             self.pe.set_dword_at_offset(section_offset+0x18, RelocAddress)
209 |             self.pe.set_dword_at_offset(section_offset+0x1C, Linenumbers)
210 |             self.pe.set_word_at_offset(section_offset+0x20, RelocationsNumber)
211 |             self.pe.set_word_at_offset(section_offset+0x22, LinenumbersNumber)
212 |             self.pe.set_dword_at_offset(section_offset+0x24, Characteristics)
213 | 
214 |             self.pe.FILE_HEADER.NumberOfSections +=1
215 | 
216 |             # Parsing the section table of the file again to add the new section to the sections
217 |             # list of pefile.
218 |             self.pe.parse_sections(section_table_offset)
219 | 
220 |             self.__adjust_optional_header()
221 |         else:
222 |             raise SectionDoublePError("The NumberOfSections specified in the file header and the " +
223 |                 "size of the sections list of pefile don't match.")
224 | 
225 |         return self.pe
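
The alignment arithmetic that `push_back` performs (rounding the new section's virtual address up to `SectionAlignment` and padding its data to `FileAlignment`) can be sketched on its own. This is an illustrative sketch, not part of SectionDoubleP: `align_up` and `pad_to_alignment` are hypothetical helper names, the sample addresses are made up, and the sketch assumes the last section's `VirtualAddress` is itself aligned.

```python
def align_up(value, alignment):
    # Round value up to the next multiple of alignment
    # (returned unchanged if it is already aligned).
    remainder = value % alignment
    return value if remainder == 0 else value - remainder + alignment


def pad_to_alignment(data, alignment):
    # Append null bytes so that len(data) becomes a multiple of alignment.
    return data + b'\x00' * (align_up(len(data), alignment) - len(data))


# Hypothetical last section: VirtualAddress 0x3000, VirtualSize 0x1234
next_va = align_up(0x3000 + 0x1234, 0x1000)      # SectionAlignment 0x1000
padded = pad_to_alignment(b'\x90' * 10, 0x200)   # FileAlignment 0x200
print(hex(next_va), len(padded))
```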


--------------------------------------------------------------------------------
/dispatch/formats/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/dispatch/formats/__init__.py


--------------------------------------------------------------------------------
/dispatch/formats/base_executable.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import logging
  3 | 
  4 | try:
  5 |     from keystone import *
  6 | except ImportError:
  7 |     logging.warning('Keystone assembler not found so assembling will not work')
  8 | 
  9 | from StringIO import StringIO
 10 | 
 11 | from ..analysis.x86_analyzer import *
 12 | from ..analysis.arm_analyzer import *
 13 | 
 14 | from ..enums import *
 15 | 
 16 | class BaseExecutable(object):
 17 |     '''
 18 |     The executable classes expose the raw binary in higher-level chunks.
 19 |     They automatically lift the .text segment (or equiv.) for quick use, keep a map of offset->vaddrs for lookups in
 20 |     the rewriting process, and more. You can think of them as the middle-man that sits between the disassembly and the
 21 |     underlying binary.
 22 |     '''
 23 |     def __init__(self, file_path):
 24 |         if not os.path.exists(file_path):
 25 |             raise Exception('No such file')
 26 | 
 27 |         self.fp = file_path
 28 |         self.binary = StringIO(open(self.fp, 'rb').read())
 29 | 
 30 |         self.architecture = None
 31 |         self.pack_endianness = None
 32 | 
 33 |         self.helper = None
 34 | 
 35 |         self.analyzer = None
 36 |         self.libraries = []
 37 |         self.functions = {} # Vaddr: Function
 38 |         self.strings = {}
 39 |         self.xrefs = {}
 40 | 
 41 |         self.next_injection_vaddr = None
 42 | 
 43 |     def __repr__(self):
 44 |         return '<{} {} \'{}\'>'.format(self.architecture, self.__class__.__name__, self.fp)
 45 |     
 46 |     def _identify_arch(self):
 47 |         '''
 48 |         Identifies the architecture that the executable is compiled for.
 49 |         :return: None
 50 |         '''
 51 |         raise NotImplementedError()
 52 |     
 53 |     def is_64_bit(self):
 54 |         '''
 55 |         Determines if the executable is 64 bit or 32 bit.
 56 |         :return: True if the executable is 64 bit, otherwise false.
 57 |         '''
 58 |         return self.architecture in (ARCHITECTURE.X86_64, ARCHITECTURE.ARM_64)
 59 | 
 60 |     def address_length(self):
 61 |         '''
 62 |         :return: Number of bytes an address in this executable will have (i.e. 4 for 32 bit, 8 for 64 bit)
 63 |         '''
 64 |         return 8 if self.is_64_bit() else 4
 65 | 
 66 |     def entry_point(self):
 67 |         '''
 68 |         Gets the entry point of the executable.
 69 |         :return: The entry point of the executable.
 70 |         '''
 71 |         raise NotImplementedError()
 72 | 
 73 |     def sections_to_disassemble(self):
 74 |         '''
 75 |         Iterates through each section in the executable that is supposed to be disassembled.
 76 |         :return: Iterator
 77 |         '''
 78 |         for s in self.sections:
 79 |             if s.executable:
 80 |                 yield s
 81 | 
 82 |     def iter_string_sections(self):
 83 |         '''
 84 |         Returns the section(s) with strings in this executable
 85 |         :return: Section(s) with strings.
 86 |         '''
 87 |         raise NotImplementedError()
 88 | 
 89 |     def vaddr_is_executable(self, vaddr):
 90 |         '''
 91 |         Determine if the given virtual address is in a mapped executable memory segment.
 92 |         :param vaddr: Virtual address to check
 93 |         :return: True if the vaddr is in an executable segment, False otherwise
 94 |         '''
 95 |         for section in self.sections:
 96 |             if section.executable and section.contains_vaddr(vaddr):
 97 |                 return True
 98 | 
 99 |         return False
100 | 
101 |     def section_containing_vaddr(self, vaddr):
102 |         for section in self.sections:
103 |             if section.contains_vaddr(vaddr):
104 |                 return section
105 | 
106 |         return None
107 | 
108 |     def function_containing_vaddr(self, vaddr):
109 |         for f in self.iter_functions():
110 |             if f.contains_address(vaddr):
111 |                 return f
112 | 
113 |         return None
114 | 
115 |     def bb_containing_vaddr(self, vaddr):
116 |         for f in self.iter_functions():
117 |             for bb in f.bbs:
118 |                 if bb.address <= vaddr < bb.address + bb.size:
119 |                     return bb
120 | 
121 |         return None
122 | 
123 |     def vaddr_binary_offset(self, vaddr):
124 |         '''
125 |         Gets the offset in the binary file for a given virtual address.
126 |         :param vaddr: The virtual address to get the offset for.
127 |         :return: The offset in the binary of the virtual address.
128 |         '''
129 |         for section in self.sections:
130 |             if section.contains_vaddr(vaddr):
131 |                 return section.offset + vaddr - section.vaddr
132 | 
133 |         return None
134 | 
135 |     def _extract_symbol_table(self):
136 |         '''
137 |         Extracts the symbol table from the binary and creates named functions as appropriate.
138 |         Called from the analyzer in the main analysis function.
139 |         :return: None
140 |         '''
141 |         raise NotImplementedError()
142 | 
143 |     def get_binary(self):
144 |         '''
145 |         Gets the entire binary.
146 |         :return: The raw bytes of the entire binary.
147 |         '''
148 |         return self.binary.getvalue()
149 | 
150 |     def get_binary_vaddr_range(self, start, end):
151 |         '''
152 |         Gets the raw bytes from the binary within a virtual address range
153 |         :param start: Starting virtual address
154 |         :param end: Ending virtual address
155 |         :raises: KeyError if either the start or end virtual addresses do not actually exist in the binary
156 |         :return: The bytes in the binary between the two virtual addresses
157 |         '''
158 |         start_offset = self.vaddr_binary_offset(start)
159 |         end_offset = self.vaddr_binary_offset(end)
160 |         # if either of these returns None we don't want to slice up -- raise an error
161 |         if start_offset is not None and end_offset is not None:
162 |             return self.get_binary()[start_offset:end_offset]
163 | 
164 |         bad_addr = start if start_offset is None else end # which address triggered our error
165 |         raise KeyError("Vaddr is not in binary: {:x}".format(bad_addr))
166 |     
167 |     def analyze(self):
168 |         '''
169 |         Creates an analyzer for the binary and then runs the initial analysis routine.
170 |         :return: The created analyzer object.
171 |         '''
172 |         if self.architecture == ARCHITECTURE.X86:
173 |             self.analyzer = X86_Analyzer(self)
174 |         elif self.architecture == ARCHITECTURE.X86_64:
175 |             self.analyzer = X86_64_Analyzer(self)
176 |         elif self.architecture == ARCHITECTURE.ARM:
177 |             self.analyzer = ARM_Analyzer(self)
178 |         elif self.architecture == ARCHITECTURE.ARM_64:
179 |             self.analyzer = ARM_64_Analyzer(self)
180 | 
181 |         if self.analyzer:
182 |             self.analyzer.analyze()
183 |         else:
184 |             logging.error('Could not create analyzer for {}'.format(self))
185 | 
186 |         return self.analyzer
187 | 
188 |     def _ks_symbol_resolver(self, symbol, value):
189 |         f = self.function_named(symbol)
190 | 
191 |         if f:
192 |             value[0] = f.address # assumes keystone passes `value` as a ctypes pointer to the output slot
193 |             return True
194 | 
195 |         return False
196 | 
197 |     def assemble(self, s, vaddr=0):
198 |         '''
199 |         Assemble the given string relative to the given virtual address
200 |         :param s: String of assembly commands to be assembled
201 |         :param vaddr: Virtual address the code is assembled relative to
202 |         :return: A bytearray with the resulting machine code
203 |         '''
204 |         if self.architecture == ARCHITECTURE.X86:
205 |             ks = Ks(KS_ARCH_X86, KS_MODE_32)
206 |         elif self.architecture == ARCHITECTURE.X86_64:
207 |             ks = Ks(KS_ARCH_X86, KS_MODE_64)
208 |         elif self.architecture == ARCHITECTURE.ARM:
209 |             ks = Ks(KS_ARCH_ARM, KS_MODE_ARM)
210 |         elif self.architecture == ARCHITECTURE.ARM_64:
211 |             ks = Ks(KS_ARCH_ARM64, KS_MODE_LITTLE_ENDIAN)
212 |         else:
213 |             logging.error('Could not create assembler for {}'.format(self))
214 |             raise Exception('Architecture not supported')
215 | 
216 |         ks.sym_resolver =  self._ks_symbol_resolver
217 | 
218 |         encoding, count = ks.asm(s, vaddr)
219 | 
220 |         return bytearray(encoding)
221 | 
222 |     def prepare_for_injection(self):
223 |         '''
224 |         Prepares the binary for code injection, creating sections/segments where needed.
225 |         This should *always* be called before inject() is called, as it provides the initial values for
226 |         next_injection_vaddr which may be required to do certain IP-relative computations.
227 |         :return: None
228 |         '''
229 |         raise NotImplementedError()
230 | 
231 |     def inject(self, asm, update_entry=False):
232 |         '''
233 |         Injects the given assembly into the binary, optionally updating the entry point if the injected code is to run
234 |         before initialization.
235 |         :param asm: The assembly to inject.
236 |         :param update_entry: Whether or not to update the binary entry point to point to the injected code.
237 |         :return: (offset of injected assembly in binary, virtual address of injected assembly)
238 |         '''
239 |         raise NotImplementedError()
240 | 
241 |     def hook(self, vaddr, asm):
242 |         '''
243 |         Patches the given binary to call `asm` at `vaddr`.
244 |         :param vaddr: The virtual address of the instruction to hook/patch
245 |         :param asm: The assembly (either string of assembly, bytearray of assembled opcodes, or list of Instructions) to
246 |         be written
247 |         :return: The virtual address of the created hook
248 |         '''
249 | 
250 |         # TODO: Move below to its own function and use in replace_instruction and inject
251 |         if type(asm) not in [str, list, bytearray]:
252 |             raise ValueError('asm is not a valid type. Must be str, list, or bytearray')
253 | 
254 |         if self.next_injection_vaddr is None:
255 |             self.prepare_for_injection()
256 | 
257 |         # We first replace the original instruction with a call to a new code chunk
258 |         jmp = self.assemble('call '+hex(self.next_injection_vaddr), vaddr) #TODO: Architecture independent
259 |         overwritten_instructions = self.replace_at(vaddr, jmp)
260 | 
261 |         if type(asm) == str:
262 |             # Assemble with keystone
263 |             pulled_list = [x.mnemonic + ' ' + x.op_str() for x in overwritten_instructions]
264 |             asm = ' ; '.join(pulled_list) + ' ; ' + asm
265 |             asm = self.assemble(asm, self.next_injection_vaddr)
266 |             new_chunk = asm
267 |         else:
268 |             # TODO: reassemble to fix offsets in overwritten instructions
269 |             if type(asm) == list:
270 |                 asm = sum((ins.raw for ins in asm), bytearray())  # Join the raw bytes of each Instruction
271 |             new_chunk = sum((ins.raw for ins in overwritten_instructions), bytearray()) + asm  # asm is a bytearray here
272 | 
273 |         # Then we inject that new code chunk. This is composed of the instructions we wrote over to create the jump
274 |         # as well as the assembly we actually want to call
275 |         hook_addr = self.inject(new_chunk)
276 |         logging.debug('Replaced instruction at {} with jump to {}'.format(vaddr, hook_addr))
277 | 
278 |         return hook_addr
279 | 
280 | 
281 |     def iter_functions(self):
282 |         '''
283 |         Iterates over the functions in this executable
284 |         :return: Iterator
285 |         '''
286 |         return iter(self.functions.values())
287 | 
288 |     def function_named(self, name):
289 |         '''
290 |         Finds a function with a given name if it exists.
291 |         :param name: The name of the function to search for.
292 |         :return: The function if it is found, else None.
293 |         '''
294 |         for func in self.iter_functions():
295 |             if func.name == name or func.name == 'sub_'+name or func.name == name+'@PLT':
296 |                 return func
297 | 
298 |         return None
299 | 
300 |     def replace_at(self, vaddr, new_asm):
301 |         '''
302 |         Replaces an instruction with the given assembly.
303 |         :param vaddr: The address of the existing instruction(s) to overwrite.
304 |         :param new_asm: The new assembly that will be written over the old instruction.
305 |         :return: The original instruction(s) that was/were overwritten
306 |         '''
307 | 
308 |         if not vaddr in self.analyzer.ins_map:
309 |             raise Exception('Starting virtual address to replace must be an existing instruction')
310 | 
311 |         # Find all instructions we will be overwriting, and warn if they are referenced elsewhere.
312 | 
313 |         # If an instruction is being referenced elsewhere (most likely in a jump), it's possible that the jump
314 |         # (or whatever it is) will end up going to the middle of our replaced asm which can obviously make the program
315 |         # behave unexpectedly.
316 |         overwritten_insns = self.analyzer.ins_map[vaddr:vaddr + max(len(new_asm), 1)]
317 |         for ins in overwritten_insns:
318 |             if ins.address in self.xrefs:
319 |                 logging.warning('{} will be overwritten but there are xrefs to it: {}'.format(ins,
320 |                                                                                               self.xrefs[ins.address]))
321 | 
322 |         # Write the new bytes
323 |         offset = self.vaddr_binary_offset(vaddr)
324 |         self.binary.seek(offset)
325 |         logging.debug('Replacing instruction(s) at offset {}'.format(offset))
326 |         self.binary.write(new_asm)
327 | 
328 |         # Find how much is left over in the original instruction(s) and NOP them out
329 |         overwritten_size = sum(i.size for i in overwritten_insns)
330 |         padding = self.analyzer.NOP_INSTRUCTION * ((overwritten_size - len(new_asm)) // len(self.analyzer.NOP_INSTRUCTION))
331 |         self.binary.write(padding)
332 | 
333 |         # Disassemble the new instructions
334 |         new_instructions = self.analyzer.disassemble_range(vaddr, vaddr + len(new_asm))
335 | 
336 |         func = self.function_containing_vaddr(vaddr)
337 | 
338 |         insert_point = func.instructions.index(overwritten_insns[0])
339 | 
340 |         # Remove the old instructions from the function
341 |         for ins in overwritten_insns:
342 |             func.instructions.remove(ins)
343 | 
344 |         # Insert the new instructions where we just removed the old ones
345 |         func.instructions = func.instructions[:insert_point] + new_instructions + func.instructions[insert_point:]
346 | 
347 |         # Re-analyze the function for BBs
348 |         func.do_bb_analysis()
349 | 
350 |         # Finally clear the instructions out from the global instruction map
351 |         for ins in overwritten_insns:
352 |             del self.analyzer.ins_map[ins.address]
353 | 
354 |         for ins in new_instructions:
355 |             self.analyzer.ins_map[ins.address] = ins
356 | 
357 |         return overwritten_insns
358 | 
359 |     def save(self, file_name):
360 |         with open(file_name, 'wb') as f:
361 |             f.write(self.get_binary())
362 | 
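The NOP-padding step in `replace_at` above can be sketched in isolation. This is a minimal illustration, not the project's code: `nop_padding` and its parameters are hypothetical names, and the single-byte x86 NOP (`0x90`) is an assumed default standing in for `analyzer.NOP_INSTRUCTION`.

```python
def nop_padding(overwritten_size, new_len, nop=b'\x90'):
    """Return the NOP bytes needed to fill what is left of the overwritten instructions.

    overwritten_size: total size in bytes of the instructions being replaced
    new_len: size in bytes of the replacement code
    nop: encoding of one NOP instruction (assumed x86 here)
    """
    remaining = overwritten_size - new_len
    if remaining < 0:
        raise ValueError('replacement is larger than the overwritten instructions')
    # Floor division: only whole NOP instructions are emitted
    return nop * (remaining // len(nop))


# A 5-byte call written over a 3-byte and a 4-byte instruction leaves 2 bytes to pad
padding = nop_padding(3 + 4, 5)
```

On architectures with a multi-byte NOP (for example 4 bytes on ARM), any remainder smaller than one NOP is left unpadded, which matches the floor division used in `replace_at`.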


--------------------------------------------------------------------------------
/dispatch/formats/elf_executable.py:
--------------------------------------------------------------------------------
  1 | from elftools.elf.elffile import ELFFile
  2 | from elftools.elf.enums import *
  3 | from elftools.elf.constants import *
  4 | from elftools.elf.sections import SymbolTableSection
  5 | import logging
  6 | 
  7 | from .base_executable import *
  8 | from .section import *
  9 | 
 10 | INJECTION_SIZE = 0x1000
 11 | 
 12 | class ELFExecutable(BaseExecutable):
 13 |     def __init__(self, file_path):
 14 |         super(ELFExecutable, self).__init__(file_path)
 15 | 
 16 |         self.helper = ELFFile(self.binary)
 17 | 
 18 |         self.architecture = self._identify_arch()
 19 | 
 20 |         if self.architecture is None:
 21 |             raise Exception('Architecture is not recognized')
 22 | 
 23 |         logging.debug('Initialized {} {} with file \'{}\''.format(self.architecture, type(self).__name__, file_path))
 24 | 
 25 |         self.pack_endianness = '<' if self.helper.little_endian else '>'
 26 |         self.address_pack_type = 'I' if self.helper.elfclass == 32 else 'Q'
 27 | 
 28 |         self.sections = [section_from_elf_section(s) for s in self.helper.iter_sections()]
 29 | 
 30 |         self.executable_segment = [s for s in self.helper.iter_segments() if s['p_type'] == 'PT_LOAD' and s['p_flags'] & 0x1][0]
 31 | 
 32 |         dyn = self.helper.get_section_by_name('.dynamic')
 33 |         if dyn:
 34 |             self.libraries = [t.needed for t in dyn.iter_tags() if t['d_tag'] == 'DT_NEEDED']
 35 | 
 36 |         self.next_injection_offset = None
 37 | 
 38 |     def _identify_arch(self):
 39 |         machine = self.helper.get_machine_arch()
 40 |         if machine == 'x86':
 41 |             return ARCHITECTURE.X86
 42 |         elif machine == 'x64':
 43 |             return ARCHITECTURE.X86_64
 44 |         elif machine == 'ARM':
 45 |             return ARCHITECTURE.ARM
 46 |         elif machine == 'AArch64':
 47 |             return ARCHITECTURE.ARM_64
 48 |         else:
 49 |             return None
 50 | 
 51 |     def entry_point(self):
 52 |         return self.helper['e_entry']
 53 | 
 54 |     def executable_segment_vaddr(self):
 55 |         return self.executable_segment['p_vaddr']
 56 | 
 57 |     def executable_segment_size(self):
 58 |         # TODO: Maybe limit this because we use this as part of our injection method?
 59 |         return self.executable_segment['p_memsz']
 60 | 
 61 |     def iter_string_sections(self):
 62 |         STRING_SECTIONS = ['.rodata', '.data', '.bss']
 63 |         for s in self.sections:
 64 |             if s.name in STRING_SECTIONS:
 65 |                 yield s
 66 | 
 67 |     def _extract_symbol_table(self):
 68 |         # Add in symbols from the PLT/rela.plt
 69 |         # .rela.plt contains indexes to reference both .dynsym (symbol names) and .plt (jumps to GOT)
 70 | 
 71 |         if self.is_64_bit():
 72 |             reloc_section = self.helper.get_section_by_name('.rela.plt')
 73 |         else:
 74 |             reloc_section = self.helper.get_section_by_name('.rel.plt')
 75 | 
 76 |         if reloc_section:
 77 |             dynsym = self.helper.get_section(reloc_section['sh_link']) # .dynsym
 78 |             if isinstance(dynsym, SymbolTableSection):
 79 |                 plt = self.helper.get_section_by_name('.plt')
 80 |                 for idx, reloc in enumerate(reloc_section.iter_relocations()):
 81 |                     # Get the symbol's name from dynsym
 82 |                     symbol_name = dynsym.get_symbol(reloc['r_info_sym']).name
 83 | 
 84 |                     # The address of this function in the PLT is the base PLT offset + the index of the relocation.
 85 |                     # However, since there is the extra "trampoline" entity at the top of the PLT, we need to add one to the
 86 |                     # index to account for it.
 87 | 
 88 |                     # While sh_entsize is sometimes defined, it appears to be incorrect in some cases so we just ignore that
 89 |                     # and calculate it based off of the total size / num_relocations (plus the trampoline entity)
 90 |                     entsize = (plt['sh_size'] // (reloc_section.num_relocations() + 1))
 91 | 
 92 |                     plt_addr = plt['sh_addr'] + ((idx+1) * entsize)
 93 | 
 94 |                     logging.debug('Directly adding PLT function {} at vaddr {}'.format(symbol_name, hex(plt_addr)))
 95 | 
 96 |                     f = Function(plt_addr,
 97 |                                  entsize,
 98 |                                  symbol_name + '@PLT',
 99 |                                  self,
100 |                                  type=Function.DYNAMIC_FUNC)
101 |                     self.functions[plt_addr] = f
102 |             else:
103 |                 logging.debug('.rel(a).plt section had sh_link to {}. Not parsing symbols...'.format(dynsym))
104 | 
105 |         if self.helper.get_section_by_name('.dynsym'):
106 |             for symbol in self.helper.get_section_by_name('.dynsym').iter_symbols():
107 |                 if symbol.entry['st_info']['type'] == 'STT_FUNC' and symbol.entry['st_size'] > 0:
108 |                     vaddr = symbol.entry['st_value']
109 |                     if vaddr not in self.functions:
110 |                         logging.debug('Adding function from .dynsym directly at vaddr {}'.format(vaddr))
111 |                         f = Function(vaddr,
112 |                                      symbol.entry['st_size'],
113 |                                      symbol.name,
114 |                                      self,
115 |                                      type=Function.DYNAMIC_FUNC)
116 |                         self.functions[vaddr] = f
117 | 
118 | 
119 |         # Some things in the symtab have st_size = 0 which confuses analysis later on. To solve this, we keep track of
120 |         # where each address is in the `function_vaddrs` set and go back after all symbols have been iterated to compute
121 |         # size by taking the difference between the current address and the next recorded address.
122 | 
123 |         # We do this for each executable section so that the produced functions cannot span multiple sections.
124 | 
125 |         for section in self.helper.iter_sections():
126 |             if self.executable_segment.section_in_segment(section):
127 |                 name_for_addr = {}
128 | 
129 |                 function_vaddrs = set([section['sh_addr'] + section['sh_size']])
130 | 
131 |                 symbol_table = self.helper.get_section_by_name('.symtab')
132 |                 if symbol_table:
133 |                     for symbol in symbol_table.iter_symbols():
134 |                         if symbol['st_info']['type'] == 'STT_FUNC' and symbol['st_shndx'] != 'SHN_UNDEF':
135 |                             if section['sh_addr'] <= symbol['st_value'] < section['sh_addr'] + section['sh_size']:
136 |                                 name_for_addr[symbol['st_value']] = symbol.name
137 |                                 function_vaddrs.add(symbol['st_value'])
138 | 
139 |                                 if symbol['st_size']:
140 |                                     logging.debug('Eagerly adding function {} from .symtab at vaddr {} with size {}'
141 |                                                   .format(symbol.name, hex(symbol['st_value']), hex(symbol['st_size'])))
142 |                                     f = Function(symbol['st_value'],
143 |                                                  symbol['st_size'],
144 |                                                  symbol.name,
145 |                                                  self)
146 |                                     self.functions[symbol['st_value']] = f
147 | 
148 | 
149 |                 function_vaddrs = sorted(list(function_vaddrs))
150 | 
151 |                 for cur_addr, next_addr in zip(function_vaddrs[:-1], function_vaddrs[1:]):
152 |                     # If st_size was set, we already added the function above, so don't add it again.
153 |                     if cur_addr not in self.functions:
154 |                         func_name = name_for_addr[cur_addr]
155 |                         size = next_addr - cur_addr
156 |                         logging.debug('Lazily adding function {} from .symtab at vaddr {} with size {}'
157 |                                       .format(func_name, hex(cur_addr), hex(size)))
158 |                         f = Function(cur_addr,
159 |                                      size,
160 |                                      func_name,
161 |                                      self,
162 |                                      type=Function.DYNAMIC_FUNC)
163 |                         self.functions[cur_addr] = f
164 | 
165 |         # TODO: Automatically find and label main from call to libc_start_main
166 | 
167 |     def prepare_for_injection(self):
168 |         """
169 |         Derived from http://vxheavens.com/lib/vsc01.html
170 |         """
171 |         modified = StringIO(self.binary.getvalue())
172 | 
173 |         # Add INJECTION_SIZE to the section header list offset to make room for our injected code
174 |         elf_hdr = self.helper.header.copy()
175 |         elf_hdr.e_shoff += INJECTION_SIZE
176 |         logging.debug('Changing e_shoff to {}'.format(elf_hdr.e_shoff))
177 | 
178 |         modified.seek(0)
179 |         modified.write(self.helper.structs.Elf_Ehdr.build(elf_hdr))
180 | 
181 |         # Find the main RX LOAD segment and also adjust other segment offsets along the way
182 |         executable_segment = None
183 | 
184 |         for segment_idx, segment in enumerate(self.helper.iter_segments()):
185 |             segment_hdr = segment.header.copy()
186 |             segment_hdr_offset = self.helper._segment_offset(segment_idx)
187 | 
188 |             if executable_segment is not None:
189 |                 # Already past the executable segment, so just update the offset if needed (i.e. don't update things
190 |                 # that come before the expanded section)
191 |                 if segment_hdr.p_offset > last_exec_section['sh_offset']:
192 |                     segment_hdr.p_offset += INJECTION_SIZE
193 | 
194 |             elif segment['p_type'] == 'PT_LOAD' and segment['p_flags'] & P_FLAGS.PF_X:
195 |                 # Found the executable LOAD segment.
196 |                 # Make room for our injected code.
197 | 
198 |                 logging.debug('Found executable LOAD segment at index {}'.format(segment_idx))
199 |                 executable_segment = segment
200 | 
201 |                 last_exec_section_idx = max([idx for idx in range(self.helper.num_sections()) if
202 |                                              executable_segment.section_in_segment(self.helper.get_section(idx))])
203 |                 last_exec_section = self.helper.get_section(last_exec_section_idx)
204 | 
205 |                 segment_hdr.p_flags |= P_FLAGS.PF_X | P_FLAGS.PF_W | P_FLAGS.PF_R
206 |                 segment_hdr.p_filesz += INJECTION_SIZE
207 |                 segment_hdr.p_memsz += INJECTION_SIZE
208 | 
209 |                 logging.debug('Rewriting segment filesize and memsize to {} and {}'.format(
210 |                     segment_hdr.p_filesz, segment_hdr.p_memsz)
211 |                 )
212 | 
213 |             modified.seek(segment_hdr_offset)
214 |             modified.write(self.helper.structs.Elf_Phdr.build(segment_hdr))
215 | 
216 |         if executable_segment is None:
217 |             logging.error("Could not locate an executable LOAD segment. Cannot continue injection.")
218 |             return False
219 | 
220 |         logging.debug('Last section in executable LOAD segment is at index {} ({})'.format(last_exec_section_idx,
221 |                                                                                            last_exec_section.name))
222 | 
223 |         self.next_injection_offset = last_exec_section['sh_offset'] + last_exec_section['sh_size']
224 |         self.next_injection_vaddr = last_exec_section['sh_addr'] + last_exec_section['sh_size']
225 | 
226 |         # Update sh_size for the section we grew
227 |         section_header_offset = self.helper._section_offset(last_exec_section_idx)
228 |         section_header = last_exec_section.header.copy()
229 | 
230 |         section_header.sh_flags |= SH_FLAGS.SHF_WRITE | SH_FLAGS.SHF_EXECINSTR # Section headers use sh_flags (there is no pflags field)
231 |         section_header.sh_size += INJECTION_SIZE
232 | 
233 |         modified.seek(section_header_offset)
234 |         modified.write(self.helper.structs.Elf_Shdr.build(section_header))
235 | 
236 |         # Update sh_offset for each section past the last section in the executable segment
237 |         for section_idx in range(last_exec_section_idx + 1, self.helper.num_sections()):
238 |             section_header_offset = self.helper._section_offset(section_idx)
239 |             section_header = self.helper.get_section(section_idx).header.copy()
240 | 
241 |             section_header.sh_offset += INJECTION_SIZE
242 |             logging.debug('Rewriting section {}\'s offset to {}'.format(section_idx, section_header.sh_offset))
243 | 
244 |             modified.seek(section_header_offset)
245 |             modified.write(self.helper.structs.Elf_Shdr.build(section_header))
246 | 
247 |         # TODO: Architecture-specific padding
248 |         # Should be something that won't immediately crash, but can be caught (e.g. SIGTRAP on x86)
249 |         modified = StringIO(modified.getvalue()[:self.next_injection_offset] +
250 |                             '\xCC'*INJECTION_SIZE +
251 |                             modified.getvalue()[self.next_injection_offset:])
252 | 
253 |         self.binary = modified
254 |         self.helper = ELFFile(self.binary)
255 | 
256 |         return True
257 | 
258 |     def inject(self, asm, update_entry=False):
259 |         if self.next_injection_offset is None or self.next_injection_vaddr is None:
260 |             logging.warning(
261 |                 'prepare_for_injection() was not called before inject(). Calling now, but this may cause unexpected behavior')
262 |             self.prepare_for_injection()
263 | 
264 |         for segment in self.helper.iter_segments():
265 |             if segment['p_type'] == 'PT_LOAD' and segment['p_flags'] & P_FLAGS.PF_X:
266 |                 injection_section_idx = max(i for i in range(self.helper.num_sections()) if segment.section_in_segment(self.helper.get_section(i)))
267 |                 break
268 | 
269 |         injection_section = self.helper.get_section(injection_section_idx)
270 | 
271 |         # If we haven't injected code before or need to expand the section again for this injection, go ahead and
272 |         # shift stuff around.
273 |         if injection_section['sh_offset'] + injection_section['sh_size'] < self.next_injection_offset + len(asm):
274 |             logging.debug('Automatically expanding injection section to accommodate for assembly')
275 | 
276 |             # NOTE: Could this change the destination address for the code that gets injected?
277 |             self.prepare_for_injection()
278 |             injection_section = self.helper.get_section(injection_section_idx)
279 | 
280 |             used_code_len = len(injection_section.data().rstrip('\xCC'))
281 |             self.next_injection_offset = injection_section['sh_offset'] + used_code_len
282 |             self.next_injection_vaddr = injection_section['sh_addr'] + used_code_len
283 | 
284 |         # "Inject" the assembly
285 |         logging.debug('Injecting {} bytes of assembly at offset {}'.format(len(asm), self.next_injection_offset))
286 |         self.binary.seek(self.next_injection_offset)
287 |         self.binary.write(asm)
288 | 
289 |         # Update e_entry if requested
290 |         if update_entry:
291 |             logging.debug('Rewriting ELF entry address to {}'.format(self.next_injection_vaddr))
292 |             elf_hdr = self.helper.header
293 |             elf_hdr.e_entry = self.next_injection_vaddr
294 | 
295 |             self.binary.seek(0)
296 |             self.binary.write(self.helper.structs.Elf_Ehdr.build(elf_hdr))
297 | 
298 |         self.helper = ELFFile(self.binary)
299 | 
300 |         self.next_injection_vaddr += len(asm)
301 |         self.next_injection_offset += len(asm)
302 | 
303 |         return self.next_injection_vaddr - len(asm)
304 | 
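The size inference used for `.symtab` entries with `st_size == 0` in `_extract_symbol_table` above can be sketched on plain integers. This is an illustrative sketch only: `infer_function_sizes` and its arguments are hypothetical names, and it assumes each function runs up to the next known symbol (or to the end of its section).

```python
def infer_function_sizes(symbol_addrs, section_addr, section_size):
    """Map each function start address to an inferred size.

    The size of a function is the distance to the next known start address,
    with the section end acting as the final boundary so that no function
    spans more than one section.
    """
    section_end = section_addr + section_size
    # Keep only the addresses that fall inside this section, plus the end sentinel
    bounds = set(a for a in symbol_addrs if section_addr <= a < section_end)
    bounds.add(section_end)
    ordered = sorted(bounds)
    return dict((cur, nxt - cur) for cur, nxt in zip(ordered[:-1], ordered[1:]))


sizes = infer_function_sizes([0x1040, 0x1000, 0x1010, 0x9000], 0x1000, 0x100)
```

The address `0x9000` lies outside the section and is ignored; the last function in the section is sized against the section end rather than against a symbol.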


--------------------------------------------------------------------------------
/dispatch/formats/macho_executable.py:
--------------------------------------------------------------------------------
  1 | import logging
  2 | import struct
  3 | 
  4 | from macholib.MachO import MachO
  5 | from macholib.mach_o import *
  6 | 
  7 | from .base_executable import *
  8 | from .section import *
  9 | 
 10 | INJECTION_SEGMENT_NAME = 'INJECT'
 11 | INJECTION_SECTION_NAME = 'inject'
 12 | 
 13 | class MachOExecutable(BaseExecutable):
 14 |     def __init__(self, file_path):
 15 |         super(MachOExecutable, self).__init__(file_path)
 16 | 
 17 |         self.helper = MachO(self.fp)
 18 | 
 19 |         if self.helper.fat:
 20 |             raise Exception('MachO fat binaries are not supported at this time')
 21 | 
 22 |         self.architecture = self._identify_arch()
 23 | 
 24 |         if self.architecture is None:
 25 |             raise Exception('Architecture is not recognized')
 26 | 
 27 |         logging.debug('Initialized {} {} with file \'{}\''.format(self.architecture, type(self).__name__, file_path))
 28 | 
 29 |         self.pack_endianness = self.helper.headers[0].endian
 30 | 
 31 |         self.sections = []
 32 |         for lc, cmd, data in self.helper.headers[0].commands:
 33 |             if lc.cmd in (LC_SEGMENT, LC_SEGMENT_64):
 34 |                 for section in data:
 35 |                     self.sections.append(section_from_macho_section(section, cmd))
 36 | 
 37 |         self.executable_segment = [cmd for lc, cmd, _ in self.helper.headers[0].commands
 38 |                                    if lc.cmd in (LC_SEGMENT, LC_SEGMENT_64) and cmd.initprot & 0x4][0]
 39 | 
 40 |         self.libraries = [fp.rstrip('\x00') for lc, cmd, fp in self.helper.headers[0].commands if lc.cmd == LC_LOAD_DYLIB]
 41 | 
 42 |     def _identify_arch(self):
 43 |         if self.helper.headers[0].header.cputype == 0x7:
 44 |             return ARCHITECTURE.X86
 45 |         elif self.helper.headers[0].header.cputype == 0x01000007:
 46 |             return ARCHITECTURE.X86_64
 47 |         elif self.helper.headers[0].header.cputype == 0xc:
 48 |             return ARCHITECTURE.ARM
 49 |         elif self.helper.headers[0].header.cputype == 0x0100000c:
 50 |             return ARCHITECTURE.ARM_64
 51 |         else:
 52 |             return None
 53 | 
 54 |     def executable_segment_vaddr(self):
 55 |         return self.executable_segment.vmaddr
 56 | 
 57 |     def executable_segment_size(self):
 58 |         return self.executable_segment.vmsize
 59 | 
 60 |     def entry_point(self):
 61 |         for lc, cmd, _ in self.helper.headers[0].commands:
 62 |             if lc.cmd == LC_MAIN:
 63 |                 return cmd.entryoff
 64 |         return
 65 | 
 66 |     def _extract_symbol_table(self):
 67 |         ordered_symbols = []
 68 | 
 69 |         symtab_command = self.helper.headers[0].getSymbolTableCommand()
 70 | 
 71 |         if symtab_command:
 72 |             self.binary.seek(symtab_command.stroff)
 73 |             symbol_strings = self.binary.read(symtab_command.strsize)
 74 | 
 75 |             self.binary.seek(symtab_command.symoff)
 76 | 
 77 |             for i in range(symtab_command.nsyms):
 78 |                 if self.is_64_bit():
 79 |                     symbol = nlist_64.from_fileobj(self.binary, _endian_=self.pack_endianness)
 80 |                 else:
 81 |                     symbol = nlist.from_fileobj(self.binary, _endian_=self.pack_endianness)
 82 | 
 83 |                 symbol_name = symbol_strings[symbol.n_un:].split('\x00')[0]
 84 | 
 85 |                 if symbol.n_type & N_STAB == 0:
 86 |                     is_ext = symbol.n_type & N_EXT and symbol.n_value == 0
 87 | 
 88 |                     # Ignore Apple's hack for radar bug 5614542
 89 |                     if not is_ext and symbol_name != 'radr://5614542':
 90 |                         size = 0
 91 |                         logging.debug('Adding function {} from the symtab at vaddr {} with size {}'
 92 |                                       .format(symbol_name, hex(symbol.n_value), hex(size)))
 93 |                         f = Function(symbol.n_value, size, symbol_name, self)
 94 |                         self.functions[symbol.n_value] = f
 95 | 
 96 |                 ordered_symbols.append(symbol_name)
 97 | 
 98 |         dysymtab_command = self.helper.headers[0].getDynamicSymbolTableCommand()
 99 |         if dysymtab_command:
100 |             self.binary.seek(dysymtab_command.indirectsymoff)
101 |             indirect_symbols = self.binary.read(dysymtab_command.nindirectsyms*4)
102 | 
103 |             sym_offsets = struct.unpack(self.pack_endianness + 'I'*dysymtab_command.nindirectsyms, indirect_symbols)
104 | 
105 |             for lc, cmd, sections in self.helper.headers[0].commands:
106 |                 if lc.cmd in (LC_SEGMENT, LC_SEGMENT_64) and cmd.initprot & 0x4:
107 |                     for section in sections:
108 |                         if section.flags & S_NON_LAZY_SYMBOL_POINTERS == S_NON_LAZY_SYMBOL_POINTERS \
109 |                             or section.flags & S_LAZY_SYMBOL_POINTERS == S_LAZY_SYMBOL_POINTERS \
110 |                             or section.flags & S_SYMBOL_STUBS == S_SYMBOL_STUBS:
111 | 
112 |                             logging.debug('Parsing dynamic entries in {}.{}'.format(section.segname, section.sectname))
113 | 
114 |                             if section.flags & S_SYMBOL_STUBS:
115 |                                 stride = section.reserved2
116 |                             else:
117 |                                 stride = (8 if self.is_64_bit() else 4)  # Pointer size in bytes
118 | 
119 |                             count = section.size // stride
120 | 
121 |                             for i in range(count):
122 |                                 addr = self.executable_segment.vmaddr + section.offset + (i * stride)
123 |                                 idx = sym_offsets[i + section.reserved1]
124 |                                 if idx == 0x40000000:
125 |                                     symbol_name = "INDIRECT_SYMBOL_ABS"
126 |                                 elif idx == 0x80000000:
127 |                                     symbol_name = "INDIRECT_SYMBOL_LOCAL"
128 |                                 else:
129 |                                     symbol_name = ordered_symbols[idx]
130 |                                 logging.debug('Adding function {} from the dynamic symtab at vaddr {} with size {}'
131 |                                               .format(symbol_name, hex(addr), hex(stride)))
132 |                                 f = Function(addr, stride, symbol_name, self, type=Function.DYNAMIC_FUNC)
133 |                                 self.functions[addr] = f
134 | 
135 |     def iter_string_sections(self):
136 |         STRING_SECTIONS = ['__const', '__cstring', '__objc_methname', '__objc_classname']
137 |         for s in self.sections:
138 |             if s.name in STRING_SECTIONS:
139 |                 yield s
140 | 
141 |     def prepare_for_injection(self):
142 |         # Total size of the stuff we're going to be adding in the middle of the binary
143 |         offset = 72+80 if self.is_64_bit() else 56+68  # 1 segment header + 1 section header
144 | 
145 |         fileoff = (self.binary.len & ~0xfff) + 0x1000
146 | 
147 |         vmaddr = self.function_named('__mh_execute_header').address + fileoff
148 | 
149 |         logging.debug('Creating new MachOSegment at vaddr {}'.format(hex(vmaddr)))
150 |         new_segment = segment_command_64() if self.is_64_bit() else segment_command()
151 |         new_segment._endian_ = self.pack_endianness
152 |         new_segment.segname = INJECTION_SEGMENT_NAME
153 |         new_segment.fileoff = fileoff
154 |         new_segment.filesize = 0
155 |         new_segment.vmaddr = vmaddr
156 |         new_segment.vmsize = 0x1000
157 |         new_segment.maxprot = 0x7 #RWX
158 |         new_segment.initprot = 0x5 # RX
159 |         new_segment.flags = 0
160 |         new_segment.nsects = 1
161 | 
162 |         logging.debug('Creating new MachOSection at vaddr {}'.format(hex(vmaddr)))
163 |         new_section = section_64() if self.is_64_bit() else section()
164 |         new_section._endian_ = self.pack_endianness
165 |         new_section.sectname = INJECTION_SECTION_NAME
166 |         new_section.segname = new_segment.segname
167 |         new_section.addr = new_segment.vmaddr
168 |         new_section.size = 0
169 |         new_section.offset = new_segment.fileoff
170 |         new_section.align = 4
171 |         new_section.flags = 0x80000400  # S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS
172 | 
173 |         lc = load_command()
174 |         lc._endian_ = self.pack_endianness
175 |         lc.cmd = LC_SEGMENT_64 if self.is_64_bit() else LC_SEGMENT
176 |         lc.cmdsize = offset
177 | 
178 |         self.helper.headers[0].commands.append((lc, new_segment, [new_section]))
179 | 
180 |         self.helper.headers[0].header.ncmds += 1
181 |         self.helper.headers[0].header.sizeofcmds += offset
182 |         self.next_injection_vaddr = vmaddr  # hook() relies on this being set before the first inject()
183 |         return new_segment
184 | 
185 |     def inject(self, asm, update_entry=False):
186 |         found = [s for lc,s,_ in self.helper.headers[0].commands if lc.cmd in (LC_SEGMENT, LC_SEGMENT_64) and s.segname == INJECTION_SEGMENT_NAME]
187 |         if found:
188 |             injection_vaddr = found[0].vmaddr + found[0].filesize  # next free byte, not the segment start
189 |         else:
190 |             logging.warning(
191 |                 'prepare_for_injection() was not called before inject(). This may cause unexpected behavior')
192 |             inject_seg = self.prepare_for_injection()
193 |             injection_vaddr = inject_seg.vmaddr
194 | 
195 |         if update_entry:
196 |             for lc, cmd, _ in self.helper.headers[0].commands:
197 |                 if lc.cmd == LC_MAIN:
198 |                     cmd.entryoff = injection_vaddr
199 |                     break
200 | 
201 |         self.binary.seek(0)
202 | 
203 |         for lc, segment, sections in self.helper.headers[0].commands:
204 |             if lc.cmd in (LC_SEGMENT, LC_SEGMENT_64) and segment.segname == INJECTION_SEGMENT_NAME:
205 |                 injection_offset = segment.fileoff + segment.filesize
206 |                 segment.filesize += len(asm)
207 |                 if segment.filesize > segment.vmsize:  # filesize already includes len(asm)
208 |                     segment.vmsize += 0x1000
209 |                 for section in sections:
210 |                     if section.sectname == INJECTION_SECTION_NAME:
211 |                         section.size += len(asm)
212 |                         self.next_injection_vaddr = section.addr + section.size
213 | 
214 |         self.helper.headers[0].write(self.binary)
215 | 
216 |         self.binary.seek(injection_offset)
217 |         self.binary.write(asm)
218 | 
219 |         return injection_vaddr
220 | 
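The file offset chosen for the injected Mach-O segment above is the next 4 KiB page boundary past the end of the file, and the mapped size grows in page-sized steps as bytes are appended. A minimal sketch of that arithmetic follows. `next_page_offset` and `grown_vmsize` are hypothetical helper names used only for illustration and are not part of dispatch. The loop in `grown_vmsize` is an assumption about the intended behaviour, since the code above grows `vmsize` by at most one page per call.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages, matching the 0x1000 constant used above


def next_page_offset(length):
    """Round down to a page boundary, then step one full page forward.

    An already-aligned length still advances by a whole page, mirroring
    (self.binary.len & ~0xfff) + 0x1000.
    """
    return (length & ~(PAGE_SIZE - 1)) + PAGE_SIZE


def grown_vmsize(filesize, vmsize):
    """Hypothetical helper: grow vmsize in whole pages until it covers filesize."""
    while filesize > vmsize:
        vmsize += PAGE_SIZE
    return vmsize
```

For example, a 0x1234-byte file would place the new segment at file offset 0x2000.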


--------------------------------------------------------------------------------
/dispatch/formats/pe_executable.py:
--------------------------------------------------------------------------------
  1 | import pefile
  2 | from .SectionDoubleP import SectionDoubleP
  3 | 
  4 | from .base_executable import *
  5 | from .section import *
  6 | 
  7 | SECTION_SIZE = 0x1000
  8 | 
  9 | class PEExecutable(BaseExecutable):
 10 |     def __init__(self, file_path):
 11 |         super(PEExecutable, self).__init__(file_path)
 12 | 
 13 |         self.helper = pefile.PE(self.fp)
 14 | 
 15 |         self.architecture = self._identify_arch()
 16 | 
 17 |         if self.architecture is None:
 18 |             raise Exception('Architecture is not recognized')
 19 | 
 20 |         logging.debug('Initialized {} {} with file \'{}\''.format(self.architecture, type(self).__name__, file_path))
 21 | 
 22 |         self.pack_endianness = '<'
 23 | 
 24 |         self.sections = [section_from_pe_section(s, self.helper) for s in self.helper.sections]
 25 | 
 26 |         if hasattr(self.helper, 'DIRECTORY_ENTRY_IMPORT'):
 27 |             self.libraries = [dll.dll for dll in self.helper.DIRECTORY_ENTRY_IMPORT]
 28 |         else:
 29 |             self.libraries = []
 30 |     
 31 |     def _identify_arch(self):
 32 |         machine = pefile.MACHINE_TYPE[self.helper.FILE_HEADER.Machine]
 33 |         if machine == 'IMAGE_FILE_MACHINE_I386':
 34 |             return ARCHITECTURE.X86
 35 |         elif machine == 'IMAGE_FILE_MACHINE_AMD64':
 36 |             return ARCHITECTURE.X86_64
 37 |         elif machine == 'IMAGE_FILE_MACHINE_ARM':
 38 |             return ARCHITECTURE.ARM
 39 |         else:
 40 |             return None
 41 | 
 42 |     def entry_point(self):
 43 |         return self.helper.OPTIONAL_HEADER.AddressOfEntryPoint
 44 | 
 45 |     def get_binary(self):
 46 |         return self.helper.write()
 47 | 
 48 |     def iter_string_sections(self):
 49 |         STRING_SECTIONS = ['.rdata']
 50 |         for s in self.sections:
 51 |             if s.name in STRING_SECTIONS:
 52 |                 yield s
 53 | 
 54 |     def _extract_symbol_table(self):
 55 |         # Load in stuff from the IAT if it exists
 56 |         if hasattr(self.helper, 'DIRECTORY_ENTRY_IMPORT'):
 57 |             for dll in self.helper.DIRECTORY_ENTRY_IMPORT:
 58 |                 for imp in dll.imports:
 59 |                     if imp.name:
 60 |                         name = imp.name + '@' + dll.dll
 61 |                     else:
 62 |                         name = 'ordinal_' + str(imp.ordinal) + '@' + dll.dll
 63 | 
 64 |                     self.functions[imp.address] = Function(imp.address,
 65 |                                                            self.address_length(),
 66 |                                                            name,
 67 |                                                            self)
 68 | 
 69 |         # Load in information from the EAT if it exists
 70 |         if hasattr(self.helper, 'DIRECTORY_ENTRY_EXPORT'):
 71 |             for symbol in self.helper.DIRECTORY_ENTRY_EXPORT.symbols:
 72 |                 if symbol.address not in self.functions:
 73 |                     self.functions[symbol.address] = Function(symbol.address,
 74 |                                                               0,
 75 |                                                               symbol.name,
 76 |                                                               self)
 77 |                 else:
 78 |                     self.functions[symbol.address].name = symbol.name
 79 | 
 80 |     def prepare_for_injection(self):
 81 |         sdp = SectionDoubleP(self.helper)
 82 |         to_inject = '\x00' * SECTION_SIZE
 83 |         self.helper = sdp.push_back(Name='.inject', Characteristics=0x60000020, Data=to_inject)
 84 |         self.next_injection_vaddr = self.helper.sections[-1].VirtualAddress + self.helper.OPTIONAL_HEADER.ImageBase
 85 | 
 86 |     def inject(self, asm, update_entry=False):
 87 |         has_injection_section = [s for s in self.helper.sections if s.Name.startswith('.inject')]
 88 | 
 89 |         if not has_injection_section:
 90 |             logging.warning(
 91 |                 'prepare_for_injection() was not called before inject(). This may cause unexpected behavior')
 92 |             self.prepare_for_injection()
 93 | 
 94 |         inject_rva = self.next_injection_vaddr - self.helper.OPTIONAL_HEADER.ImageBase
 95 |         self.helper.set_bytes_at_rva(inject_rva, asm)
 96 | 
 97 |         if update_entry:
 98 |             self.helper.OPTIONAL_HEADER.AddressOfEntryPoint = inject_rva
 99 | 
100 |         self.next_injection_vaddr += len(asm)
101 | 
102 |         return inject_rva + self.helper.OPTIONAL_HEADER.ImageBase
103 | 
104 |     def replace_at(self, vaddr, new_asm):
105 |         # Identical to the implementation in base_executable except for the commented section
106 | 
107 |         if vaddr not in self.analyzer.ins_map:
108 |             raise Exception('Starting virtual address to replace must be an existing instruction')
109 | 
110 |         overwritten_insns = self.analyzer.ins_map[vaddr:vaddr + max(len(new_asm), 1)]
111 |         for ins in overwritten_insns:
112 |             if ins.address in self.xrefs:
113 |                 logging.warning('{} will be overwritten but there are xrefs to it: {}'.format(ins,
114 |                                                                                               self.xrefs[ins.address]))
115 | 
116 |         logging.debug('Replacing instruction(s) at vaddr {}'.format(vaddr))
117 | 
118 |         # Since we're using pefile to keep track of the (changed) binary, use pefile's methods to write the new asm
119 |         self.helper.set_bytes_at_rva(vaddr - self.helper.OPTIONAL_HEADER.ImageBase, new_asm)
120 | 
121 |         overwritten_size = sum(i.size for i in overwritten_insns)
122 |         padding = self.analyzer.NOP_INSTRUCTION * ((overwritten_size - len(new_asm)) // len(self.analyzer.NOP_INSTRUCTION))
123 |         self.helper.set_bytes_at_rva(vaddr - self.helper.OPTIONAL_HEADER.ImageBase + len(new_asm), padding)
124 | 
125 |         new_instructions = self.analyzer.disassemble_range(vaddr, vaddr + len(new_asm))
126 | 
127 |         func = self.function_containing_vaddr(vaddr)
128 | 
129 |         insert_point = func.instructions.index(overwritten_insns[0])
130 | 
131 |         for ins in overwritten_insns:
132 |             func.instructions.remove(ins)
133 | 
134 |         func.instructions = func.instructions[:insert_point] + new_instructions + func.instructions[insert_point:]
135 | 
136 |         func.do_bb_analysis()
137 | 
138 |         for ins in overwritten_insns:
139 |             del self.analyzer.ins_map[ins.address]
140 | 
141 |         for ins in new_instructions:
142 |             self.analyzer.ins_map[ins.address] = ins
143 | 
144 |         return overwritten_insns
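The padding step in `replace_at` fills whatever is left of the overwritten instructions with NOPs so that the following instruction still starts at its original address. The sketch below shows that idea in isolation. `pad_with_nops` is a hypothetical name, the default `nop` is the single-byte x86 NOP (0x90), and raising on an oversized patch is a choice made for this sketch rather than something dispatch does.

```python
def pad_with_nops(new_asm, overwritten_size, nop=b"\x90"):
    """Return new_asm followed by as many whole NOPs as fit in the gap.

    overwritten_size is the total size of the instructions being replaced.
    With a multi-byte NOP, a gap that is not a multiple of len(nop) leaves
    trailing bytes unpadded, just as integer division does in replace_at.
    """
    if len(new_asm) > overwritten_size:
        raise ValueError("patch is larger than the instructions it replaces")
    gap = overwritten_size - len(new_asm)
    return new_asm + nop * (gap // len(nop))
```

Replacing a 5-byte instruction with a 2-byte one therefore yields the 2 new bytes followed by three `0x90` bytes.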


--------------------------------------------------------------------------------
/dispatch/formats/section.py:
--------------------------------------------------------------------------------
 1 | class Section(object):
 2 |     '''
 3 |     Represents a section from an executable. ELF, Mach-O and PE all share nearly the same notion of a
 4 |     section, so it is wrapped in one unified class for easy, consistent access
 5 |     '''
 6 |     def __init__(self):
 7 |         self.name = ''
 8 |         self.vaddr = 0
 9 |         self.offset = 0
10 |         self.size = 0
11 |         self.raw = None
12 | 
13 |         self.readable = False
14 |         self.writable = False
15 |         self.executable = False
16 | 
17 |         self.orig_section = None
18 | 
19 |     def __repr__(self):
20 |         return '<Section {} at vaddr {}>'.format(self.name, hex(self.vaddr))
21 | 
22 |     def contains_vaddr(self, vaddr):
23 |         return self.vaddr <= vaddr < self.vaddr + self.size
24 | 
25 | def section_from_elf_section(elf_section):
26 |     s = Section()
27 |     s.name = elf_section.name
28 |     s.vaddr = elf_section['sh_addr']
29 |     s.offset = elf_section['sh_offset']
30 |     s.size = elf_section['sh_size']
31 |     s.raw = elf_section.data()
32 | 
33 |     s.writable = bool(elf_section['sh_flags'] & 0x1)    # SHF_WRITE
34 |     s.readable = bool(elf_section['sh_flags'] & 0x2)    # SHF_ALLOC: loaded into memory, treated as readable
35 |     s.executable = bool(elf_section['sh_flags'] & 0x4)  # SHF_EXECINSTR
36 | 
37 |     s.orig_section = elf_section
38 | 
39 |     return s
40 | 
41 | def section_from_macho_section(macho_section, macho_segment):
42 |     s = Section()
43 |     s.name = macho_section.sectname.rstrip('\x00')
44 |     s.vaddr = macho_section.addr
45 |     s.offset = macho_section.offset
46 |     s.size = macho_section.size
47 |     if hasattr(macho_section, 'section_data'):
48 |         s.raw = macho_section.section_data
49 |     else:
50 |         s.raw = ''
51 |     s.readable = bool(macho_segment.initprot & 0x1)
52 |     s.writable = bool(macho_segment.initprot & 0x2)
53 |     s.executable = bool(macho_segment.initprot & 0x4)
54 | 
55 |     s.orig_section = macho_section
56 | 
57 |     return s
58 | 
59 | def section_from_pe_section(pe_section, pe):
60 |     s = Section()
61 |     s.name = pe_section.Name.strip('\x00')
62 |     s.vaddr = pe_section.VirtualAddress + pe.OPTIONAL_HEADER.ImageBase
63 |     s.offset = pe_section.PointerToRawData
64 |     s.size = pe_section.SizeOfRawData
65 |     s.raw = pe_section.get_data()
66 | 
67 |     s.writable = bool(pe_section.Characteristics & 0x80000000)
68 |     s.readable = bool(pe_section.Characteristics & 0x40000000)
69 |     s.executable = bool(pe_section.Characteristics & 0x20000000)
70 | 
71 |     s.orig_section = pe_section
72 | 
73 |     return s
74 | 
75 | 
76 | 
77 | 
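The permission bits tested in `section_from_pe_section` are the standard PE section characteristics. As a rough illustration, the sketch below decodes a characteristics value into the same three booleans. `pe_permissions` is a hypothetical helper, not part of dispatch or pefile. The constant names follow the PE/COFF specification.

```python
# PE section characteristic flags (values from the PE/COFF specification).
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def pe_permissions(characteristics):
    """Hypothetical helper: map a Characteristics value to r/w/x booleans."""
    return {
        'readable': bool(characteristics & IMAGE_SCN_MEM_READ),
        'writable': bool(characteristics & IMAGE_SCN_MEM_WRITE),
        'executable': bool(characteristics & IMAGE_SCN_MEM_EXECUTE),
    }
```

Applied to the `0x60000020` value that `prepare_for_injection` passes for the `.inject` section, this gives a readable, executable, non-writable code section.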


--------------------------------------------------------------------------------
/dispatch/util/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/dispatch/util/__init__.py


--------------------------------------------------------------------------------
/dispatch/util/trie.py:
--------------------------------------------------------------------------------
 1 | from ..constructs import Instruction
 2 | 
 3 | class Trie(object):
 4 |     BUCKET_LEN = 1
 5 |     BUCKET_MASK = (2**BUCKET_LEN)-1
 6 |     def __init__(self):
 7 |         self.children = [None for _ in range(2**Trie.BUCKET_LEN)]
 8 |         self.value = None
 9 | 
10 |     def __setitem__(self, key, value):
11 |         assert type(value) == Instruction
12 | 
13 |         node = self
14 |         for bucket in [(key >> i) & Trie.BUCKET_MASK for \
15 |                        i in range(64, -1, -Trie.BUCKET_LEN)]:
16 |             if not node.children[bucket]:
17 |                 node.children[bucket] = Trie()
18 |             node = node.children[bucket]
19 | 
20 |         node.value = value
21 | 
22 |     def __getitem__(self, item):
23 |         if type(item) in (int, long):
24 |             node = self
25 |             for bucket in [(item >> i) & Trie.BUCKET_MASK for \
26 |                            i in range(64, -1, -Trie.BUCKET_LEN)]:
27 |                 if not node.children[bucket]:
28 |                     raise KeyError()
29 |                 node = node.children[bucket]
30 | 
31 |             return node.value
32 | 
33 |         elif type(item) == slice:
34 |             start = item.start
35 |             stop = item.stop
36 |             if start is None:
37 |                 start = 0
38 |             if stop is None:
39 |                 # 64 bits max address. Seems big enough for practical purposes
40 |                 stop = 0xFFFFFFFFFFFFFFFF
41 |             uncommon_bits = (stop ^ start).bit_length()
42 | 
43 |             node = self
44 |             for bucket in [(start >> i) & Trie.BUCKET_MASK for \
45 |                            i in range(64, uncommon_bits, -Trie.BUCKET_LEN)]:
46 |                 if not node.children[bucket]:
47 |                     raise KeyError()
48 |                 node = node.children[bucket]
49 | 
50 |             return [v for v in iter(node) if start <= v.address < stop][::item.step]
51 | 
52 |     def __iter__(self):
53 |         if self.value:
54 |             yield self.value
55 |         for child in filter(None, self.children):
56 |             for v in child:
57 |                 yield v
58 | 
59 |     def __contains__(self, item):
60 |         node = self
61 |         for bucket in [(item >> i) & Trie.BUCKET_MASK for \
62 |                        i in range(64, -1, -Trie.BUCKET_LEN)]:
63 |             if not node.children[bucket]:
64 |                 return False
65 |             node = node.children[bucket]
66 |         return node.value is not None  # __delitem__ leaves empty nodes behind
67 | 
68 |     def __delitem__(self, key):
69 |         node = self
70 |         for bucket in [(key >> i) & Trie.BUCKET_MASK for \
71 |                        i in range(64, -1, -Trie.BUCKET_LEN)]:
72 |             if not node.children[bucket]:
73 |                 raise KeyError()
74 |             node = node.children[bucket]
75 | 
76 |         node.value = None
77 | 
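The slice lookup in `Trie.__getitem__` works by descending only through the high bits that `start` and `stop` have in common, then filtering the values found in that subtree. The following is a simplified, standalone sketch of the same technique, not the class above. `BitTrie`, `insert` and `in_range` are hypothetical names, keys are assumed to fit in 64 bits, and values are stored as `(key, value)` pairs so the example does not depend on `Instruction`.

```python
class BitTrie(object):
    """Binary trie keyed by non-negative integers of at most KEY_BITS bits."""
    KEY_BITS = 64

    def __init__(self):
        self.children = [None, None]
        self.value = None  # (key, value) pair stored at the leaf

    def insert(self, key, value):
        node = self
        # Walk from the most significant bit to the least significant bit.
        for i in range(self.KEY_BITS - 1, -1, -1):
            bit = (key >> i) & 1
            if node.children[bit] is None:
                node.children[bit] = BitTrie()
            node = node.children[bit]
        node.value = (key, value)

    def _walk(self):
        # Visiting child 0 before child 1 yields keys in ascending order.
        if self.value is not None:
            yield self.value
        for child in self.children:
            if child is not None:
                for pair in child._walk():
                    yield pair

    def in_range(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        if start >= stop:
            return []
        # Bits at index >= shared are identical for every key in the range.
        shared = (start ^ (stop - 1)).bit_length()
        node = self
        for i in range(self.KEY_BITS - 1, shared - 1, -1):
            node = node.children[(start >> i) & 1]
            if node is None:
                return []
        return [pair for pair in node._walk() if start <= pair[0] < stop]
```

Unlike the class above, this sketch returns an empty list rather than raising `KeyError` when the shared prefix is absent, which is a design choice made here for simpler calling code.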


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | from setuptools import setup
 2 | 
 3 | setup(
 4 |     name='dispatch',
 5 |     version='0.9',
 6 |     author='NYU OSIRIS Lab',
 7 |     url='https://github.com/osirislab/dispatch',
 8 |     description='Programmatic disassembly and patching from NYU\'s OSIRIS lab',
 9 |     packages=['dispatch', 'dispatch.util', 'dispatch.formats', 'dispatch.analysis'],
10 |     install_requires=[
11 |         'capstone>3.0',
12 |         'keystone-engine',
13 |         'pyelftools',
14 |         'pefile',
15 |         'macholib'
16 |     ]
17 | )
18 | 


--------------------------------------------------------------------------------
/tests/analyze_one.py:
--------------------------------------------------------------------------------
1 | from dispatch import *
2 | import logging, sys
3 | logging.basicConfig(level=logging.INFO)
4 | 
5 | exe = read_executable(sys.argv[1])
6 | exe.analyze()
7 | exe.analyzer.cfg()
8 | print("passed")
9 | 


--------------------------------------------------------------------------------
/tests/binaries/arm32/conditions-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/conditions-static.elf


--------------------------------------------------------------------------------
/tests/binaries/arm32/conditions.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/conditions.elf


--------------------------------------------------------------------------------
/tests/binaries/arm32/functions-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/functions-static.elf


--------------------------------------------------------------------------------
/tests/binaries/arm32/functions.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/functions.elf


--------------------------------------------------------------------------------
/tests/binaries/arm32/hello-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/hello-static.elf


--------------------------------------------------------------------------------
/tests/binaries/arm32/hello.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/hello.elf


--------------------------------------------------------------------------------
/tests/binaries/arm32/switch-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/switch-static.elf


--------------------------------------------------------------------------------
/tests/binaries/arm32/switch.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/switch.elf


--------------------------------------------------------------------------------
/tests/binaries/arm32/test2-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/test2-static.elf


--------------------------------------------------------------------------------
/tests/binaries/arm32/test2.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/arm32/test2.elf


--------------------------------------------------------------------------------
/tests/binaries/src/conditions.c:
--------------------------------------------------------------------------------
 1 | #include <stdio.h>
 2 | 
 3 | int main() {
 4 |     int x = 10;
 5 |     if (x == 100) {
 6 |         printf("x is 100\n");
 7 |     } else {
 8 |         printf("x is not 100\n");
 9 |     }
10 |     return 0;
11 | }
12 | 


--------------------------------------------------------------------------------
/tests/binaries/src/functions.c:
--------------------------------------------------------------------------------
 1 | #include <stdio.h>
 2 | 
 3 | // To make Visual Studio happy ;)
 4 | int mut_rec2(int);
 5 | 
 6 | int add_two(int x) {
 7 |     return x + 2;
 8 | }
 9 | 
10 | int subtract_two(int x) {
11 |     return add_two(x) - 4;
12 | }
13 | 
14 | int fib(int n) {
15 |     if (n == 0) return 0;
16 |     if (n == 1) return 1;
17 |     return fib(n-1) + fib(n-2);
18 | }
19 | 
20 | int mut_rec1(int n) {
21 |     if (n == 0) return 0;
22 |     return mut_rec2(n-1);
23 | }
24 | 
25 | int mut_rec2(int n) {
26 |     if (n == 0) return 1;
27 |     return mut_rec1(n-1);
28 | }
29 | 
30 | int main() {
31 |     fib(10);
32 |     printf("%d\n", add_two(1000));
33 |     printf("%d\n", subtract_two(10));
34 |     mut_rec1(10);
35 | }
36 | 


--------------------------------------------------------------------------------
/tests/binaries/src/hello.c:
--------------------------------------------------------------------------------
1 | #include <stdio.h>
2 | 
3 | int main() {
4 | 	int a = 1000000;
5 | 	a += 10000;
6 | 	printf("Hello World\n");
7 | }
8 | 


--------------------------------------------------------------------------------
/tests/binaries/src/switch.c:
--------------------------------------------------------------------------------
 1 | int main() {
 2 |     int x = 10;
 3 |     switch (x) {
 4 |         case 0:
 5 |             x = 1000003;
 6 |             break;
 7 |         case 1:
 8 |             x = 10;
 9 |             break;
10 |         case 2:
11 |             x = 320;
12 |             break;
13 |         case 3:
14 |             x = 12021;
15 |             break;
16 |         case 4:
17 |             x = 11983;
18 |             break;
19 |         case 5:
20 |             x = 12028;
21 |             break;
22 |         case 6:
23 |             x = 11985;
24 |             break;
25 |         case 7:
26 |             x = 12002;
27 |             break;
28 |         case 8:
29 |             x = 12019;
30 |             break;
31 |         case 9:
32 |             x = 12048;
33 |             break;
34 |         case 10:
35 |             x = 12082;
36 |             break;
37 |         case 11:
38 |             x = 12100;
39 |             break;
40 |         case 12:
41 |             x = 12106;
42 |             break;
43 |         case 13:
44 |             x = 12173;
45 |             break;
46 |         case 14:
47 |             x = 12235;
48 |             break;
49 |         case 15:
50 |             x = 12248;
51 |             break;
52 |         case 16:
53 |             x = 12333;
54 |             break;
55 |         default:
56 |             break;
57 |     }
58 |     return 0;
59 | }
60 | 


--------------------------------------------------------------------------------
/tests/binaries/src/test2.c:
--------------------------------------------------------------------------------
 1 | #include <stdio.h>
 2 | 
 3 | int main() {
 4 |     int i = 101;
 5 |     if (i < 100) {
 6 |         printf("lesser!\n");
 7 |     } else {
 8 |         printf("Greater!\n");
 9 |     }
10 |     return 0;
11 | }
12 | 


--------------------------------------------------------------------------------
/tests/binaries/x86/conditions-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/conditions-static.elf


--------------------------------------------------------------------------------
/tests/binaries/x86/conditions.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/conditions.elf


--------------------------------------------------------------------------------
/tests/binaries/x86/conditions.macho:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/conditions.macho


--------------------------------------------------------------------------------
/tests/binaries/x86/conditions.pe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/conditions.pe


--------------------------------------------------------------------------------
/tests/binaries/x86/functions-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/functions-static.elf


--------------------------------------------------------------------------------
/tests/binaries/x86/functions.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/functions.elf


--------------------------------------------------------------------------------
/tests/binaries/x86/functions.macho:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/functions.macho


--------------------------------------------------------------------------------
/tests/binaries/x86/functions.pe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/functions.pe


--------------------------------------------------------------------------------
/tests/binaries/x86/hello-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/hello-static.elf


--------------------------------------------------------------------------------
/tests/binaries/x86/hello.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/hello.elf


--------------------------------------------------------------------------------
/tests/binaries/x86/hello.macho:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/hello.macho


--------------------------------------------------------------------------------
/tests/binaries/x86/hello.pe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/hello.pe


--------------------------------------------------------------------------------
/tests/binaries/x86/switch-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/switch-static.elf


--------------------------------------------------------------------------------
/tests/binaries/x86/switch.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/switch.elf


--------------------------------------------------------------------------------
/tests/binaries/x86/switch.macho:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/switch.macho


--------------------------------------------------------------------------------
/tests/binaries/x86/switch.pe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/switch.pe


--------------------------------------------------------------------------------
/tests/binaries/x86/test2-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/test2-static.elf


--------------------------------------------------------------------------------
/tests/binaries/x86/test2.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/test2.elf


--------------------------------------------------------------------------------
/tests/binaries/x86/test2.macho:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/test2.macho


--------------------------------------------------------------------------------
/tests/binaries/x86/test2.pe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86/test2.pe


--------------------------------------------------------------------------------
/tests/binaries/x86_64/conditions-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/conditions-static.elf


--------------------------------------------------------------------------------
/tests/binaries/x86_64/conditions.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/conditions.elf


--------------------------------------------------------------------------------
/tests/binaries/x86_64/conditions.macho:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/conditions.macho


--------------------------------------------------------------------------------
/tests/binaries/x86_64/conditions.pe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/conditions.pe


--------------------------------------------------------------------------------
/tests/binaries/x86_64/functions-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/functions-static.elf


--------------------------------------------------------------------------------
/tests/binaries/x86_64/functions.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/functions.elf


--------------------------------------------------------------------------------
/tests/binaries/x86_64/functions.macho:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/functions.macho


--------------------------------------------------------------------------------
/tests/binaries/x86_64/functions.pe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/functions.pe


--------------------------------------------------------------------------------
/tests/binaries/x86_64/hello-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/hello-static.elf


--------------------------------------------------------------------------------
/tests/binaries/x86_64/hello.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/hello.elf


--------------------------------------------------------------------------------
/tests/binaries/x86_64/hello.macho:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/hello.macho


--------------------------------------------------------------------------------
/tests/binaries/x86_64/hello.pe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/hello.pe


--------------------------------------------------------------------------------
/tests/binaries/x86_64/switch-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/switch-static.elf


--------------------------------------------------------------------------------
/tests/binaries/x86_64/switch.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/switch.elf


--------------------------------------------------------------------------------
/tests/binaries/x86_64/switch.macho:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/switch.macho


--------------------------------------------------------------------------------
/tests/binaries/x86_64/switch.pe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/switch.pe


--------------------------------------------------------------------------------
/tests/binaries/x86_64/test2-static.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/test2-static.elf


--------------------------------------------------------------------------------
/tests/binaries/x86_64/test2.elf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/test2.elf


--------------------------------------------------------------------------------
/tests/binaries/x86_64/test2.macho:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/test2.macho


--------------------------------------------------------------------------------
/tests/binaries/x86_64/test2.pe:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/osirislab/dispatch/c73765c0a2586cc6ef388eb43ba929bcb6c06b24/tests/binaries/x86_64/test2.pe


--------------------------------------------------------------------------------
/tests/test_analysis.py:
--------------------------------------------------------------------------------
 1 | import dispatch
 2 | import glob, logging
 3 | 
 4 | logging.basicConfig(level=logging.INFO)
 5 | binary_types = ['macho', 'elf', 'pe']
 6 | for bin_type in binary_types:
 7 |     print("~~ Testing binary type: {} ~~".format(bin_type))
 8 |     for f in glob.glob('binaries/*/*.{}'.format(bin_type)):
 9 |         print("Testing {}...".format(f))
10 |         executable = dispatch.read_executable(f)
11 |         executable.analyze()
12 |         executable.analyzer.cfg()
13 |         print("Passed {}!".format(f))
14 |         print('')
15 |     print('')
16 | 


--------------------------------------------------------------------------------
/tests/test_injection.py:
--------------------------------------------------------------------------------
 1 | import dispatch
 2 | 
 3 | import logging, sys
 4 | 
 5 | if len(sys.argv) != 3:
 6 |     print("Usage: {} input_binary output_binary".format(sys.argv[0]))
 7 |     sys.exit(1)
 8 | 
 9 | 
10 | logging.basicConfig(level=logging.DEBUG)
11 | 
12 | # Load in the executable with read_executable (pass filename)
13 | executable = dispatch.read_executable(sys.argv[1])
14 | 
15 | # Invoke the analyzer to find functions
16 | executable.analyze()
17 | 
18 | # Prepare the executable for code injection
19 | executable.prepare_for_injection()
20 | 
21 | instrumentation = b'\xcc\xc3' # Sample x86 instrumentation bytes - INT 3 (SIGTRAP), RET
22 | instrumentation_vaddr = executable.inject(instrumentation)
23 | logging.debug('Injected instrumentation asm at {}'.format(hex(instrumentation_vaddr)))
24 | 
25 | for function in executable.iter_functions():
26 |     replaced_instruction = None
27 |     for instruction in function.instructions:
28 |         if instruction.size >= 5 \
29 |                 and not instruction.redirects_flow() \
30 |                 and not instruction.references_sp() \
31 |                 and not instruction.references_ip():
32 |             logging.debug('In {} - Found candidate replacement instruction at {}: {} {}'
33 |                           .format(function, hex(instruction.address), instruction.mnemonic, instruction.op_str()))
34 | 
35 |             replaced_instruction = instruction
36 |             break
37 | 
38 |     if not replaced_instruction:
39 |         logging.warning('Could not find instruction to replace in {}'.format(function))
40 |     else:
41 |         # Given a candidate instruction, replace it with a call to a new "function" that contains just that one
42 |         # instruction and a jmp to the instrumentation code.
43 | 
44 |         hook_addr = executable.hook(replaced_instruction.address, 'jmp {}'.format(instrumentation_vaddr))
45 |         logging.info('Replaced instruction at address {} to call hook at {}'.format(hex(replaced_instruction.address),
46 |                                                                                     hex(hook_addr)))
47 | 
48 | executable.save(sys.argv[2])
49 | 
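
Note on the `instruction.size >= 5` check in the script above: a plausible reason for the threshold, assumed here rather than confirmed by the dispatch sources in this listing, is that an x86 near `call`/`jmp` with a 32-bit relative displacement occupies exactly 5 bytes (one opcode byte plus a 4-byte little-endian signed displacement), so the replaced instruction must be at least that long to be overwritten in place. The helper below is a hypothetical illustration of that encoding only; `encode_rel32_branch` is not part of dispatch, and it does not show how `executable.hook()` is implemented.

```python
import struct

REL32_BRANCH_SIZE = 5  # 1 opcode byte + 4-byte signed displacement


def encode_rel32_branch(src_addr, dst_addr, opcode=0xE8):
    """Encode an x86 near branch placed at src_addr that targets dst_addr.

    opcode 0xE8 is `call rel32`, 0xE9 is `jmp rel32`.
    Hypothetical helper for illustration; not a dispatch API.
    """
    # The displacement is measured from the end of the branch instruction,
    # i.e. from the address of the instruction that follows it.
    disp = dst_addr - (src_addr + REL32_BRANCH_SIZE)
    if not -2**31 <= disp < 2**31:
        raise ValueError("target is out of rel32 range")
    # '<Bi' = little-endian, unsigned opcode byte, signed 32-bit displacement
    return struct.pack('<Bi', opcode, disp)


if __name__ == '__main__':
    patch = encode_rel32_branch(0x1000, 0x2000, opcode=0xE9)
    print(' '.join('{:02x}'.format(b) for b in bytearray(patch)))
```

An instruction longer than 5 bytes would leave trailing bytes after such a patch; padding them (for example with `nop`, 0x90) is one common way to handle that, again stated as an assumption about hooking in general rather than about this library.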


--------------------------------------------------------------------------------