├── LICENSE
├── README.md
├── amnesia.py
├── cortex_m_firmware.py
└── reobjc.py

/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) 2017 Duo Security, Inc. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its
   contributors may be used to endorse or promote products derived from
   this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Duo Labs IDAPython Repository

This IDAPython repository contains Python modules developed by researchers at Duo Labs for use with IDA Pro. It currently contains three modules. The Cortex M Firmware and Amnesia modules are discussed on the [Duo Security blog](https://duo.com/blog/examining-personal-protection-devices-hardware-and-firmware-research-methodology-in-action) and in the associated paper [Examining Personal Protection Devices
Hardware & Firmware Research Methodology in Action](https://duo.com/assets/ebooks/Duo-Labs-Personal-Protection-Devices.pdf); REobjc is covered in its own blog post, linked in its section.

We also thank two contributors who discussed ARM code detection heuristics with us during development of this code: Luis Miras and Josh Mitchell.

### Cortex M Firmware (cortex_m_firmware.py)
The Cortex M Firmware module prepares an IDA Pro database containing firmware from an ARM Cortex M microcontroller. It annotates the firmware's vector table, which holds pointers to the reset, fault, and interrupt handlers; annotating the table causes IDA Pro to auto-analyze the functions those pointers reference. The module also calls into the Amnesia module, whose heuristics discover additional code in the firmware image.
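As background on what annotating the vector table involves, the following is a minimal, IDA-independent sketch. It assumes a raw little-endian firmware image whose vector table starts at `vtoffset`; `decode_vector_table` and `SYSTEM_VECTORS` are hypothetical names for illustration, not part of this repository. Word 0 holds the initial stack pointer rather than a code address, and handler addresses have bit 0 set to mark Thumb code, which is why the module creates code at `dword - 1` (the sketch clears the bit with `& ~1`, which is equivalent when the bit is set).

```python
import struct

# Names for the 16 system slots of an ARMv7-M vector table, in the same
# order as the first 16 entries of CortexMFirmware.annotations.
SYSTEM_VECTORS = [
    "initial_sp", "reset", "nmi", "hard_fault", "mm_fault", "bus_fault",
    "usage_fault", "reserved", "reserved", "reserved", "reserved",
    "svcall", "reserved_debug", "reserved", "pendsv", "systick",
]


def decode_vector_table(image, vtoffset=0, num_irqs=32):
    """Hypothetical helper: list the non-zero entries of a vector table.

    Reads 32-bit little-endian words from a raw firmware image and returns
    (name, entry_offset, value) tuples. For handler entries the Thumb bit
    (bit 0) is cleared, giving the function start address; entry 0 is the
    initial stack pointer and is returned unchanged.
    """
    names = SYSTEM_VECTORS + ["irq_%d" % i for i in range(num_irqs)]
    entries = []
    for index, name in enumerate(names):
        offset = vtoffset + 4 * index
        if offset + 4 > len(image):
            break  # image ends before the full table
        (word,) = struct.unpack_from("<I", image, offset)
        if word == 0:
            continue  # unused slot; the module also skips zero entries
        if index == 0:
            entries.append((name, offset, word))  # stack pointer, not code
        else:
            entries.append((name, offset, word & ~1))  # clear the Thumb bit
    return entries


# Example: a tiny image holding initial SP, Reset, NMI and an empty HardFault slot.
image = struct.pack("<4I", 0x20002000, 0x000000C1, 0x000000C3, 0)
for name, offset, value in decode_vector_table(image):
    print("%-12s entry 0x%02x -> 0x%08x" % (name, offset, value))
```

With 32 IRQ slots the table has 48 entries, matching the length of `self.annotations` in cortex_m_firmware.py.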

This example shows the most common usage, loading a firmware image whose vector table is located at address 0x0:
```python
from cortex_m_firmware import *
cortex = CortexMFirmware(auto=True)
```

This example shows how to annotate multiple vector tables in a firmware image:
```python
from cortex_m_firmware import *
cortex = CortexMFirmware()
cortex.annotate_vector_table(0x4000)
cortex.annotate_vector_table(0x10000)
cortex.find_functions()
```

### Amnesia (amnesia.py)
Amnesia is an IDAPython module that uses byte-level heuristics to find ARM Thumb instructions among undefined bytes in an IDA Pro database. Its heuristics currently find code in two ways. Some functions identify and define new code by looking for common byte sequences that correspond to particular ARM opcodes (for example, `70 47` for Thumb `BX LR`). Others define new functions based on sequences of already-defined instructions.

```python
class Amnesia:
    def find_function_epilogue_bxlr(self, makecode=False): ...
    def find_pushpop_registers_thumb(self, makecode=False): ...
    def find_pushpop_registers_arm(self, makecode=False): ...
    def make_new_functions_heuristic_push_regs(self, makefunction=False): ...
    def nonfunction_first_instruction_heuristic(self, makefunction=False): ...
```

### REobjc (reobjc.py)
REobjc is an IDAPython module that creates proper cross-references from calling functions to the Objective-C methods they invoke through `objc_msgSend`. The module currently supports x64 only; ARM support is planned.

```python
import idaapi
idaapi.require("reobjc")
r = reobjc.REobjc(autorun=True)
```

The module is described in detail in the Duo blog post [Reversing Objective-C Binaries With the REobjc Module for IDA Pro](https://duo.com/blog/reversing-objective-c-binaries-with-the-reobjc-module-for-ida-pro).
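REobjc's core step, `resolve_register_backwalk_ea()` in reobjc.py, walks backward from an `objc_msgSend` call to find where RSI (the selector) and RDI (the receiver) were loaded. The sketch below is a simplified, IDA-free illustration of that idea, not the module's code: instructions are modeled as hypothetical `(address, mnemonic, dst, src)` tuples instead of being read with `idc.GetMnem()`/`idc.GetOpnd()`, and the operand strings are made up. Like the module, it only follows `mov`/`lea` chains in straight-line code and ignores multiple incoming edges and register aliasing (RAX/EAX).

```python
def backwalk_register(instructions, call_index, reg):
    """Hypothetical, IDA-free sketch of a backward register walk.

    `instructions` is a list of (address, mnemonic, dst, src) tuples in
    program order, standing in for idc.GetMnem()/idc.GetOpnd() on each
    head. Starting just before instructions[call_index], follow mov/lea
    assignments into `reg` back to their ultimate source operand.
    Returns a dict with keys "target", "value", "target_ea", or None.
    """
    target = reg
    result = None
    for index in range(call_index - 1, -1, -1):
        address, mnem, dst, src = instructions[index]
        if dst == target and mnem in ("mov", "lea"):
            # The source operand is now the thing whose value we chase.
            target = src
            result = {"target": reg, "value": src, "target_ea": address}
    return result


# Illustrative x64 sequence before an objc_msgSend call; operand strings
# mimic IDA's display style but are invented for this example.
code = [
    (0x1000, "mov", "rax", "cs:selRef_alloc"),
    (0x1007, "mov", "rsi", "rax"),
    (0x100A, "mov", "rdi", "cs:classRef_NSObject"),
    (0x1011, "call", "cs:_objc_msgSend_ptr", ""),
]
selector = backwalk_register(code, 3, "rsi")
receiver = backwalk_register(code, 3, "rdi")
print(selector["value"], hex(selector["target_ea"]))
print(receiver["value"], hex(receiver["target_ea"]))
```

In reobjc.py, the `target_ea` of the selector result is the address that `objc_msgsend_xref()` follows through `__objc_selrefs` and `__objc_methname` to the method implementation before adding the code cross-reference.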

--------------------------------------------------------------------------------
/amnesia.py:
--------------------------------------------------------------------------------
import idc
import ida_bytes
import ida_funcs
import ida_search
import idautils

import re

class Amnesia:
    '''
    Filename: amnesia.py
    Description: IDA Python module for finding code in ARM binaries.
    Contributors: tmanning@duo.com, luis@ringzero.net, jmitch

    Notes:
    ------
    This code currently focuses more on Thumb detection.
    Lots more work to do here on ARM and Thumb detection.
    For ARM Cortex, this code works pretty well. It also
    gave some good results with ARM Mach-o binaries.

    This code will undergo continued development. Development might break scripts.
    '''

    printflag = False

    def find_function_epilogue_bxlr(self, makecode=False):
        '''
        Find opcode bytes corresponding to BX LR.
        This is a common way to return from a function call.
        Using the IDA API, convert these opcodes to code. This kicks off IDA analysis.
        '''
        EAstart = idc.MinEA()
        EAend = idc.MaxEA()

        ea = EAstart
        length = 2 # this code isn't tolerant to values other than 2 right now

        fmt_string = "Possible BX LR 0x%08x == "
        for i in range(length):
            fmt_string += "%02x "

        while ea < EAend:
            instructions = []
            for i in range(length):
                instructions.append(idc.Byte(ea + i))

            if not ida_bytes.isCode(ida_bytes.getFlags(ea)) and instructions[0] == 0x70 and instructions[1] == 0x47:
                if self.printflag:
                    print fmt_string % (ea, instructions[0], instructions[1])
                if makecode:
                    idc.MakeCode(ea)
            ea = ea + length

    def find_pushpop_registers_thumb(self, makecode=False):
        '''
        Look for opcodes that push registers onto the stack, which are indicators of function prologues.
        Using the IDA API, convert these opcodes to code. This kicks off IDA analysis.
        '''

        # thumb register list from luis@ringzero.net
        thumb_reg_list = [0x00, 0x02, 0x08, 0x0b, 0x0e, 0x10, 0x1c, 0x1f, 0x30, 0x30, 0x38, 0x3e, 0x4e,
                          0x55, 0x70, 0x72, 0x73, 0x7c, 0x7f, 0x80, 0x90, 0xb0, 0xf0, 0xf3, 0xf7, 0xf8, 0xfe, 0xff]

        EAstart = idc.MinEA()
        EAend = idc.MaxEA()

        ea = EAstart
        length = 2 # this code isn't tolerant to values other than 2 right now

        fmt_string = "Possible Function 0x%08x == "
        for i in range(length):
            fmt_string += "%02x "

        while ea < EAend:
            instructions = []
            for i in range(length):
                instructions.append(idc.Byte(ea + i))

            if not ida_bytes.isCode(ida_bytes.getFlags(ea)) and instructions[0] in thumb_reg_list and (instructions[1] == 0xb5 or instructions[1] == 0xbd):
                if self.printflag:
                    print fmt_string % (ea, instructions[0], instructions[1])
                if makecode:
                    idc.MakeCode(ea)
            ea = ea + length

    def find_pushpop_registers_arm(self, makecode=False):
        '''
        Find opcodes for PUSH/POP registers in ARM mode
        Using the IDA API, convert these opcodes to code. This kicks off IDA analysis.

        bigup jmitch
        ** ** 2d e9 and ** ** bd e8
        '''

        EAstart = idc.MinEA()
        EAend = idc.MaxEA()

        ea = EAstart
        length = 2 # this code isn't tolerant to values other than 2 right now

        fmt_string = "Possible %s {REGS} 0x%08x == "
        for i in range(length):
            fmt_string += "%02x "

        while ea < EAend:
            instructions = []
            for i in range(length):
                instructions.append(idc.Byte(ea + i))

            # POP {regs} bytes (bd e8)
            if not ida_bytes.isCode(ida_bytes.getFlags(ea)) and instructions[0] == 0xbd and instructions[1] == 0xe8:
                if self.printflag:
                    print fmt_string % ("POP ", ea, instructions[0], instructions[1])
                if makecode:
                    idc.MakeCode(ea)

            # PUSH {regs} bytes (2d e9)
            if not ida_bytes.isCode(ida_bytes.getFlags(ea)) and instructions[0] == 0x2d and instructions[1] == 0xe9:
                if self.printflag:
                    print fmt_string % ("PUSH", ea, instructions[0], instructions[1])
                if makecode:
                    idc.MakeCode(ea)
            ea = ea + length

    def make_new_functions_heuristic_push_regs(self, makefunction=False):
        '''
        After converting bytes to instructions, look for PUSH instructions that are likely the beginning of functions.
        Convert these code areas to functions.
        '''
        EAstart = idc.MinEA()
        EAend = idc.MaxEA()
        ea = EAstart

        while ea < EAend:
            if self.printflag:
                print "EA %08x" % ea

            ea_function_start = idc.GetFunctionAttr(ea, idc.FUNCATTR_START)

            # If ea is inside a defined function, skip to end of function
            if ea_function_start != idc.BADADDR:
                ea = idc.FindFuncEnd(ea)
                continue

            # If current ea is code
            if ida_bytes.isCode(ida_bytes.getFlags(ea)):
                # Looking for prologues that do PUSH {register/s}
                mnem = idc.GetMnem(ea)

                if mnem == "PUSH":
                    if makefunction:
                        if self.printflag:
                            print "Converting code to function @ %08x" % ea
                        idc.MakeFunction(ea)

                        eanewfunction = idc.FindFuncEnd(ea)
                        if eanewfunction != idc.BADADDR:
                            ea = eanewfunction
                            continue

            nextcode = ida_search.find_code(ea, idc.SEARCH_DOWN)

            if nextcode != idc.BADADDR:
                ea = nextcode
            else:
                ea += 1

    def nonfunction_first_instruction_heuristic(self, makefunction=False):
        '''
        Find code that does not belong to any function. If its first instruction
        looks like a prologue (PUSH, PUSH.W, STM or MOV), convert it to a function.
        '''
        EAstart = idc.MinEA()
        EAend = idc.MaxEA()
        ea = EAstart

        flag_code_outside_function = False
        self.printflag = False

        while ea < EAend:

            # skip functions, next instruction will be the target to inspect
            function_name = idc.GetFunctionName(ea)
            if function_name != "":

                flag_code_outside_function = False

                # skip to end of function and keep going
                # ea = idc.FindFuncEnd(ea)
                # if self.printflag:
                #     print "Skipping function %s" % (function_name)

                ea = ida_search.find_not_func(ea, 1)
                continue

            elif ida_bytes.isCode(ida_bytes.getFlags(ea)):

                # code that is not a function
                # get mnemonic to see if this is a push
                mnem = idc.GetMnem(ea)

                if makefunction and (mnem == "PUSH" or mnem == "PUSH.W" or mnem == "STM" or
                        mnem == "MOV"):
                    if self.printflag:
                        print "nonfunction_first_instruction_heuristic() making function %08x" % ea
                    idc.MakeFunction(ea)
                    flag_code_outside_function = False
                    ea = ida_search.find_not_func(ea, 1)
                    continue

                else:
                    if self.printflag:
                        print "nonfunction_first_instruction_heuristic() other instruction %08x\t'%s'" % (ea, mnem)
                    ea = idc.NextFunction(ea)
                    continue

            ea += 1

--------------------------------------------------------------------------------
/cortex_m_firmware.py:
--------------------------------------------------------------------------------
import idc
import ida_bytes
import ida_funcs
import ida_name
import idaapi

from amnesia import Amnesia

class CortexMFirmware:
    '''
    Filename: cortex_m_firmware.py
    Description: IDA Python module for loading ARM Cortex M firmware.
    Contributors: tmanning@duo.com

    Example IDA command-line usage:
    -------------------------------
    from cortex_m_firmware import *
    cortex = CortexMFirmware(auto=True)

    The vtoffset parameter is used by annotate_vector_table():
    ----------------------------------------------------------
    from cortex_m_firmware import *
    cortex = CortexMFirmware()
    cortex.annotate_vector_table()
    cortex.annotate_vector_table(0x10000)
    cortex.find_functions()

    Notes:
    ------
    This code will undergo continued development. Development might break scripts.
    '''

    def __init__(self, auto=False):
        '''

        The vector table offset is not passed in at instantiation; pass it to
        annotate_vector_table() instead. Multiple vector tables can exist in a flash image.

        32 is the max number of IRQs allowed in M0
        48 items in self.annotations (16 system entries + 32 IRQs)

        This class will likely change and potentially break scripts.
        It's probably a good idea to back up your IDB file before trying this out.

        '''

        self.auto = auto

        self.annotations = [
            "arm_initial_sp",
            "arm_reset",
            "arm_nmi",
            "arm_hard_fault",
            "arm_mm_fault",
            "arm_bus_fault",
            "arm_usage_fault",
            "arm_reserved", "arm_reserved", "arm_reserved", "arm_reserved",
            "arm_svcall",
            "arm_reserved_debug", "arm_reserved",
            "arm_pendsv",
            "arm_systick",
            "arm_irq_0", "arm_irq_1", "arm_irq_2", "arm_irq_3",
            "arm_irq_4", "arm_irq_5", "arm_irq_6", "arm_irq_7",
            "arm_irq_8", "arm_irq_9", "arm_irq_10", "arm_irq_11",
            "arm_irq_12", "arm_irq_13", "arm_irq_14", "arm_irq_15",
            "arm_irq_16", "arm_irq_17", "arm_irq_18", "arm_irq_19",
            "arm_irq_20", "arm_irq_21", "arm_irq_22", "arm_irq_23",
            "arm_irq_24", "arm_irq_25", "arm_irq_26", "arm_irq_27",
            "arm_irq_28", "arm_irq_29", "arm_irq_30", "arm_irq_31",
        ]

        if not self.verify_processor_settings():
            print "ERROR: Processor architecture is incorrect"
            print "Please set processor type to ARM, and ARM architecture options to ARMv7-M (or other valid Cortex architecture)"
            return None

        if self.auto:
            self.annotate_vector_table()
            self.find_functions()

    def verify_processor_settings(self):
        '''
        The intent here is to validate the processor has been set to ARM.
        In a better world, I would be able to check the processor sub options.
        In a perfect world, I could set these myself, or would know the IDA APIs a little better.
        '''
        info = idaapi.get_inf_structure()
        return info.procName == "ARM"

    def annotate_vector_table(self, vtoffset=0x0):
        '''
        Name the vector table entries according to docs:
        http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0552a/BABIFJFG.html

        Vector tables can appear in multiple places in device flash
        Functions are not renamed because multiple vectors might point to a single function
        Append the address of the VT entry to the name from self.annotations to keep unique names

        '''

        for annotation_index in range(len(self.annotations)):
            entry_addr = vtoffset + 4 * annotation_index
            entry_name = "%s_%08x" % (self.annotations[annotation_index], entry_addr)

            idc.MakeDword(entry_addr)
            ida_name.set_name(entry_addr, entry_name, 0)

            # get the bytes of the vt entry
            dword = idc.Dword(entry_addr)

            # Entry 0 is the initial stack pointer value, not a handler
            # address, so don't try to create code or a function there.
            if dword != 0 and annotation_index != 0:
                # print "ea %08x = 0x%08x" % (entry_addr, dword)
                idc.MakeCode(dword-1)
                idc.MakeFunction(dword-1)
                # TODO fix the offsets created here
                # for thumb, they show to be off by a byte
                # one of the end args controls stuff about this
                idc.OpOffEx(entry_addr, 0, idaapi.REF_OFF32, -1, 0, 0)

                instruction = idc.Word(dword-1)

                # Default handlers are often a branch-to-self (B .), 0xe7fe in Thumb
                if instruction == 0xe7fe:
                    idc.SetFunctionCmt(dword-1, 'Infinite Loop', 1)


    def find_functions(self):
        '''
        Using the Amnesia IDA Python module, find ARM code and create functions
        '''

        a = Amnesia()
        a.find_pushpop_registers_thumb(makecode=True)
        a.find_pushpop_registers_arm(makecode=True)
        a.find_function_epilogue_bxlr(makecode=True)
        a.make_new_functions_heuristic_push_regs(makefunction=True)
        a.nonfunction_first_instruction_heuristic(makefunction=True)

--------------------------------------------------------------------------------
/reobjc.py:
--------------------------------------------------------------------------------
import idc
import idaapi
import ida_bytes
import ida_funcs
import ida_search
import ida_struct
import ida_typeinf
import idautils
import ida_ua
import re
import sys
import traceback

class REobjc:
    '''
    Todd Manning
    tmanning@duo.com
    https://duo.com/blog/reversing-objective-c-binaries-with-the-reobjc-module-for-ida-pro

    Code to assist in reverse engineering MacOS Objective C binaries.
    Currently this code is Intel x64 specific, and doesn't handle ARM/iOS.

    New cross references are made to Objective C methods located in the binary.

    '''


    def __init__(self, autorun=False):
        self.ea = None
        self.printflag = False
        self.verboseflag = False
        self.debugflag = False
        self.printxrefs = True
        self.target_objc_msgsend = []
        self._locate_objc_runtime_functions()
        if autorun:
            self.run()
        return None

    def _locate_objc_runtime_functions(self):
        '''
        Find the references to
            id objc_msgSend(id self, SEL op, ...);
        This is the target of all calls and jmps for ObjC calls.

        RDI == self
        RSI == selector
        X86/64 args: RDI, RSI, RDX, RCX, R8, R9 ...

        This function populates self.target_objc_msgsend with the intention of
        using this array in other functions to find indirect calls to the various
        ways objc_msgsend is referenced in binaries.

        The negative_reg variable below is blank, but is included in case some functions need to be excluded...

        TODO: Handle all other objective c runtime functions, not just objc_msgsend
        TODO: generalize to all architectures
        TODO: check that the matched names are in the proper mach-o sections based on the address in the tuple
        '''
        positive_reg = re.compile('.*_objc_msgsend', re.IGNORECASE)
        negative_reg = re.compile('^$', re.IGNORECASE)

        if self.printflag: print "Finding Objective C runtime functions..."

        for name_tuple in idautils.Names(): # returns a tuple (address, name)
            addr, name = name_tuple
            if positive_reg.match(name) and not negative_reg.match(name):
                if self.printflag: print "0x%08x\t%s" % (addr, name)
                self.target_objc_msgsend.append(name_tuple)

        return None

    def lookup_objc_runtime_function(self, fname):
        '''
        Find a matching function (address,name) tuple in self.target_objc_msgsend
        Sometimes this name can have a register prepended to it, as with 'cs:selRef_setObject_forKey_'
        '''

        register_reg = re.compile('.*:')
        if register_reg.match(fname):
            fname = re.sub(register_reg, '', fname)

        function_ea = idc.get_name_ea_simple(fname)

        # fname is found
        if function_ea != idc.BADADDR:
            if self.debugflag: print "Looking for function %s" % fname

            # iterate over objc runtime functions
            for name_tuple in self.target_objc_msgsend:
                addr, name = name_tuple
                if fname == name:
                    if self.debugflag: print "Found match: 0x%08x\t%s" % (addr, name)
                    return name_tuple

        return None


    def objc_msgsend_xref(self, call_ea, objc_self, objc_selector, create_xref = True):
        '''
        This function will create a code xref to an objc method

        call_ea : location of call/jmp objc_msgsend (regardless of direct/indirect)
        objc_self: ea where RDI is set to static value (or that we find it's from a previous call or the RDI of the current function)
        objc_selector: ea where RSI is
                       set to static value

        This ignores the RDI register, which is the `self` argument to objc_msgsend()
            id objc_msgSend(id self, SEL op, ...);
        So far, this seems to be fine as far as the cross-references are concerned.

        '''

        # get the disassembly at the address, so debug output can confirm
        # it's the expected mov rsi instruction
        instruction = idc.GetDisasm(objc_selector)
        if self.debugflag: print ">>> objc_msgsend_xref 0x%08x %s" % (objc_selector, instruction)

        # get outbound references in the appropriate segment
        # implicit assumption is there is exactly one
        target_selref = None
        for _ref in idautils.DataRefsFrom(objc_selector):
            if idc.SegName(_ref) == "__objc_selrefs":
                target_selref = _ref

        if not target_selref:
            return False

        # get outbound references in the appropriate segment
        # implicit assumption is there is exactly one
        target_methname = None
        for _ref in idautils.DataRefsFrom(target_selref):
            if idc.SegName(_ref) == "__objc_methname":
                target_methname = _ref

        if not target_methname:
            return False

        # get inbound references
        # __objc_const
        # must be a __objc2_meth
        # I hope this method is correct to find __objc2_meth structs
        # BUG: when the binary has multiple objc methods by the same name, this logic fails
        # Track RDI register.
        # Have to figure out what instance/class is referenced.
        objc2_meth_struct_id = ida_struct.get_struc_id("__objc2_meth")
        meth_struct_found = False
        target_method = None
        for _ref in idautils.DataRefsTo(target_methname):
            # multiple may match
            # we care about the __objc2_meth struct found in references
            if idc.SegName(_ref) == "__objc_const":
                # check the outbound references
                for _meth_ref in idautils.DataRefsFrom(_ref):
                    if _meth_ref == objc2_meth_struct_id:
                        meth_struct_found = True

                if meth_struct_found:
                    # only do this once
                    # TODO: check against RDI here to make sure it's the proper class
                    # meth_struct_found = False

                    for _meth_ref in idautils.DataRefsFrom(_ref):
                        # assumption made on function always being in text segment
                        if idc.SegName(_meth_ref) == "__text":
                            # save the method implementation -- this is the function ptr
                            if self.debugflag: print "0x%08x checking for the proper method -- %s" % (_meth_ref, idc.get_name(idc.get_func_attr(_meth_ref, idc.FUNCATTR_START)))
                            target_method = _meth_ref

        if not target_method:
            return False

        # After dereferencing across the IDB file, we finally have a target function.
        # In other words, if there isn't a method **in this binary** no xref is made (IDA only loads one binary?)
        # that is referenced from the mov rsi instruction
        if self.debugflag: print "Found target method 0x%08x" % target_method
        if create_xref: idc.AddCodeXref(objc_selector, target_method, idc.fl_CF)

        return True


    def run(self):
        '''
        This method will iterate over each function
        '''
        for f in idautils.Functions():

            f_start = f # idc.get_func_attr(f, idc.FUNCATTR_START)
            f_end = idc.get_func_attr(f, idc.FUNCATTR_END)

            # reset so the exception message never formats None or a stale ea
            self.ea = f

            try:
                self.find_objc_calls(f)
            except Exception as e:
                fname = idc.get_name(idc.get_func_attr(f, idc.FUNCATTR_START))
                print "\n\n[!!] Exception processing function %s: %s @ ea = 0x%08x (%dL)" % (fname, e, self.ea, self.ea)
                traceback.print_exc()
                print "\n\n"



    # f is an address in a function
    # done so there's not a requirement for f to be the start of a function.
    # f = ScreenEA()
    def find_objc_calls(self, f):

        f_start = idc.get_func_attr(f, idc.FUNCATTR_START)
        f_end = idc.get_func_attr(f, idc.FUNCATTR_END)

        for ea in idautils.Heads(f_start, f_end):
            if self.debugflag: print "0x%08x '%s'" % (ea, idc.GetMnem(ea))

            objc_selector = None
            objc_selector_ea = None

            # TODO ARM branching (B, BL, BX, BLX)
            # TODO ARM registers (R0..R7)
            if idc.GetMnem(ea) == "call" or idc.GetMnem(ea) == "jmp":

                # global tracking of ea (only in this loop) to cite when exceptions are caught
                self.ea = ea

                call_ea = ea
                call_operand = idc.GetOpnd(call_ea, 0)
                call_type = None
                call_target = None

                # is this an indirect CALL through a register, or a direct CALL?
                # call_target is the address of the function being called
                # for indirect calls, resolve the register into a value
                # for direct calls, pull the value from the first operand
                if idc.get_operand_type(call_ea, 0) == ida_ua.o_reg:
                    call_type = "indirect"
                    target_register = call_operand
                    call_target_dict = self.resolve_register_backwalk_ea(call_ea, target_register)
                    if call_target_dict:
                        call_target = call_target_dict["value"]
                else:
                    call_type = "direct"
                    call_target = call_operand

                # check the list of functions from the objc runtime
                # call_target should be validated here, in case something fails with resolve_register_backwalk_ea()
                if call_target and self.lookup_objc_runtime_function(call_target):
                    if self.debugflag: print "%s call, operand_type == %s" % (call_type, idc.get_operand_type(call_ea, 0))

                    # get the argument values at the call
                    # id objc_msgSend(id self, SEL op, ...);
                    # inefficient to get all these if they're not needed
                    # returns dict
                    # TODO Eliminate hardcoded x64
                    objc_self = rdi = self.resolve_register_backwalk_ea(call_ea, "rdi")
                    objc_selector = rsi = self.resolve_register_backwalk_ea(call_ea, "rsi")
                    arg1_selector = rdx = self.resolve_register_backwalk_ea(call_ea, "rdx")
                    arg2_selector = rcx = self.resolve_register_backwalk_ea(call_ea, "rcx")
                    arg3_selector = r8 = self.resolve_register_backwalk_ea(call_ea, "r8")
                    arg4_selector = r9 = self.resolve_register_backwalk_ea(call_ea, "r9")

                    # RDI is the self pointer
                    # if RDI used in objc_msgsend is the same value passed into this function,
                    # resolve_register_backwalk_ea will return {value: rdi...}
                    # RDI is self, so figure out which class that self is
                    # This code presumes that resolve_register_backwalk_ea() will find *some value* for RDI...
                    # what if RDI is None?
                    # If the method is actually a selector, we can presume RDI is self, and pull
                    # the class from the type of the first argument to the current function
                    # TODO add a check for ``not rdi`` here
                    if not objc_self:
                        # if RDI is None, that means the call is being sent to self
                        # the first objc call on self wouldn't need to set an RDI value (because it was already set)

                        # get the class from the function parameter type
                        objc_class = self.resolve_objc_self_to_class(call_ea)

                        # create a faked RDI dict
                        # _faked_ key set here to differentiate from a 'real' RDI dict
                        # TODO eliminate hardcoded x64
                        objc_self = rdi = {"_faked_" : True, "target" : "rdi", "value" : "rdi", "target_ea" : f_start, "ea" : call_ea, "type" : -1}

                    # TODO eliminate hardcoded x64
                    if objc_self and objc_self['value'] == 'rdi':
                        objc_class = self.resolve_objc_self_to_class(objc_self['target_ea'])
                        if self.debugflag: print "### objc_class == %s" % objc_class

                    # objc_selector['target_ea']: address of the mov rsi instruction that loads the selector
                    if objc_self and objc_selector:
                        xref_created = self.objc_msgsend_xref(call_ea, objc_self['target_ea'], objc_selector['target_ea'])
                        if self.printxrefs and xref_created: print "0x%08x Creating xref: %s %s, %s" % (ea, idc.GetMnem(ea), idc.GetOpnd(ea, 0), objc_selector['value'])



    def resolve_register_backwalk_ea(self, ea, target_dest_reg):
        '''
        Starting at ea, walk backward and locate the first instruction assigning a value to the target_register
        Keep walking backward, tracking the ultimate source to some value (register, variable, memory, etc)

        Assumption: ea is inside a function

        In some cases, target_dest_reg might resolve to a register used as a function argument (RDI, RSI, RDX, RCX, R8, R9)
        Callers will have to do something about that.
        Inferring the type and/or instance from the argument registers above

        Return: dict with keys target, value, target_ea, ea, and type (plus previous_call
        when set), or None if no value is found

        Issues:
        1 RAX tracking should track through as many calls as necessary. Right now, if multiple calls are made on an object
          (which is common in objc - alloc, init, ), this code doesn't handle properly.
          Essentially the class used in alloc results in rax of the instance,
          and that RAX returned eventually is the RDI in the call to
          Since this code returns previous_call in the returned dict, you could recursively call resolve_register_backwalk_ea
          using the previous_call EA until RDI is not from a previous call
        2 register tracking breaks when referencing registers of different bitwidths e.g. RAX/EAX/AL
        3 when ea points to a CALL , the value of isn't found. Workaround by passing ea-1
        4 Doesn't handle backwalking basic blocks that have multiple incoming edges.
          In reality, there are cases where the target may have any number of values.
          Handle this by maybe showing all the values... ugh yuk
          For the case of objc methods, I haven't seen this matter much yet, but backtracing e.g. locals is affected more often
        5 LEA. Add instructions for Arm architecture
        6 Proper checking against RDI -- this fixes the issues around multiple subclasses of a common parent sharing method names
        '''

        f_start = idc.get_func_attr(ea, idc.FUNCATTR_START)
        f_end = idc.get_func_attr(ea, idc.FUNCATTR_END)

        curr_ea = ea
        dst = None
        src = None

        target = target_dest_reg
        target_value = None
        target_ea = idc.BADADDR
        target_type = None
        previous_call = None

        ret_dict = {}

        # adjustment for issue 3 above
        # TODO Eliminate hardcoded x64
        if idc.GetMnem(curr_ea) == "call" and idc.GetOpnd(curr_ea, 0) == target_dest_reg:
            curr_ea = idc.prev_head(curr_ea-1, f_start)


        while curr_ea != idc.BADADDR:
            instruction = idc.GetDisasm(curr_ea)

            if self.debugflag: print "0x%08x %s" % (curr_ea, instruction)

            # looking for the previous place this register was assigned a value
            mnem = idc.GetMnem(curr_ea)
            dst = idc.GetOpnd(curr_ea, 0)
            src = idc.GetOpnd(curr_ea, 1)

            # X64 specific
            # TODO: generalize to other architectures
            if dst == target and (mnem == "mov" or mnem == "lea"):
                target = src
                target_value = src
                target_ea = curr_ea
                target_type = idc.get_operand_type(curr_ea, 1)
                if self.debugflag: print "    new target set %s (type=%d)" % (target, idc.get_operand_type(curr_ea, 1))

            # take a stab at tracking calls - this is not the greatest approach, but slightly more correct than doing no tracking
            # call instruction affects RAX if it returns a result
            if dst == target == "rax" and mnem == "call":
                target_value = ""
                previous_call = curr_ea
                break


            # step to previous instruction
            curr_ea = idc.prev_head(curr_ea-1, f_start)

        if target_value:
            if self.verboseflag: print ">>> 0x%08x, %s is set to %s @ 0x%08x" % (ea, target_dest_reg,
                                                                                 target_value, target_ea)
            ret_dict = {"target" : target_dest_reg, "value" : target_value, "target_ea" : target_ea, "ea" : ea, "type" : target_type}

            if previous_call:
                ret_dict["previous_call"] = previous_call

            return ret_dict

        # fall through if nothing is found
        return None


    def resolve_objc_self_to_class(self, ea):
        '''
        Get the objective c class for the current function RDI value
        based on the class of the first argument to the current function
        '''
        f_start = idc.get_func_attr(ea, idc.FUNCATTR_START)

        tif = ida_typeinf.tinfo_t()
        idaapi.get_tinfo2(f_start, tif)
        funcdata = idaapi.func_type_data_t()
        got_data = tif.get_func_details(funcdata)

        # not happy about casting to a string and then regex replacing... but that's the best I could come up with
        if got_data:
            replace_reg = re.compile(' \*', re.IGNORECASE)
            objc_self_type = funcdata[0].type
            return objc_self_type
        else:
            return None
--------------------------------------------------------------------------------