├── .gitattributes ├── README.md └── enoki.py /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | 4 | # Custom for Visual Studio 5 | *.cs diff=csharp 6 | 7 | # Standard to msysgit 8 | *.doc diff=astextplain 9 | *.DOC diff=astextplain 10 | *.docx diff=astextplain 11 | *.DOCX diff=astextplain 12 | *.dot diff=astextplain 13 | *.DOT diff=astextplain 14 | *.pdf diff=astextplain 15 | *.PDF diff=astextplain 16 | *.rtf diff=astextplain 17 | *.RTF diff=astextplain 18 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # What is _Enoki_? 2 | The _Enoki_ script is a wrapper class for [IDAPython](https://www.hex-rays.com/products/ida/support/idapython_docs/). It groups various useful functions for reverse engineering of non-standard 3 | and/or uncommon binaries. Many of the scripts currently available online are geared towards malware analysis of Windows [Portable Executable (PE) 4 | files](https://en.wikipedia.org/wiki/Portable_Executable) and as such, most of their functionality targets Intel-based systems and performs many tasks to detect or 5 | deobfuscate malicious files in well-known formats. _Enoki_ seeks to provide a set of basic functions for analysis of binaries, memory maps 6 | or other non-malware-oriented files for reverse engineering purposes. 7 | 8 | ## Summary 9 | 10 | The _Enoki_ script is a wrapper around many IDAPython functions and is designed for analysts conducting reverse engineering on 11 | non-standard and uncommon files such as firmware of embedded devices or simply plain unknown files for ICS systems. _Enoki_ provides 12 | additional shortcut functions for extracting, searching and analyzing machine code, useful when IDA has trouble parsing the file 13 | or detecting the actual processor. 
14 | 15 | ## Usage 16 | 17 | To use _Enoki_ with [IDA](https://www.hex-rays.com/products/ida/), simply load the _enoki.py_ file into IDA. An instance of the _Enoki_ object will automatically be created in the ```e``` variable, or you can create your own 18 | instance using the following command in the interpreter: 19 | 20 | ``` 21 | e = Enoki() 22 | ``` 23 | 24 | Call any function you need on the instance, for example: 25 | 26 | ``` 27 | Python>hex(e.current_file_offset()) 28 | 0x74fc 29 | ``` 30 | 31 | ## Examples 32 | 33 | This section provides some examples of the functionality provided by the _Enoki_ script. More details can be found by consulting the wiki of the project. 34 | 35 | ### Find a byte string 36 | 37 | One of the functions provided by _Enoki_ is ```find_byte_string```, which allows the analyst to search for a specific sequence of bytes or words in the machine 38 | code. The function returns all locations where the byte string has been found in the range searched. 39 | 40 | ``` 41 | Python>e.find_byte_string(ScreenEA(), ScreenEA() + 0x1000, "7980 ????") 42 | [150, 155, 173, 198, 208] 43 | ``` 44 | 45 | If you need the output as hexadecimal addresses, simply wrap each result with the ```hex()``` function: 46 | 47 | ``` 48 | Python>[hex(i) for i in e.find_byte_string(ScreenEA(), ScreenEA() + 0x1000, "7980 ????")] 49 | ['0x96', '0x9b', '0xad', '0xc6', '0xd0'] 50 | ``` 51 | 52 | ### Compare two code ranges for similarity 53 | 54 | Another available feature is comparing the similarity of two code segments via the ```compare_code``` function. This function 55 | takes two arrays of opcodes or assembly instructions and calculates the similarity of the two sequences. In the example below, 56 | the similarity is only 11%, meaning the 2 code segments are quite different. 
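
The idea of a similarity ratio between two sequences can be illustrated outside IDA with Python's standard `difflib` module, which `enoki.py` imports. This is a minimal sketch only: the helper name `similarity_ratio` and the sample word lists are hypothetical, and it is an assumption that `compare_code` computes its ratio in a comparable way.

```python
import difflib


def similarity_ratio(seq_a, seq_b):
    """Return a similarity ratio in [0.0, 1.0] between two sequences.

    Hypothetical helper for illustration; not part of enoki.py.
    """
    # autojunk=False keeps frequently repeated opcodes from being
    # treated as noise on long sequences.
    matcher = difflib.SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    return matcher.ratio()


# Made-up 16-bit words standing in for two short code ranges.
words_a = [0x7980, 0x0001, 0x4500, 0x4885]
words_b = [0x7980, 0x0002, 0x4500, 0x4886]

# Two of the four positions match, so the ratio is 2 * 2 / 8.
print(similarity_ratio(words_a, words_b))  # 0.5
```

A ratio of 1.0 means the sequences are identical, while a value close to 0 means they share few elements in the same order.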
57 | 58 | ``` 59 | Python>c1 = e.get_words_between(0x2C00, 0x2CFF) 60 | Python>c2 = e.get_words_between(0x8000, 0x80FF) 61 | Python>e.compare_code(c1, c2) 62 | 0.11328125 63 | ``` 64 | 65 | Other functions are available within _Enoki_ and more details can be found in the comments of the script or in the future wiki of the project. 66 | 67 | 68 | ## References 69 | 70 | If you find this script useful for your projects or research, please add a reference or link to this project to help make it better. 71 | 72 | - __URL:__ 73 | - [Enoki](https://github.com/InfectedPacket/Idacraft), https://github.com/InfectedPacket/Idacraft 74 | - __Reference (Chicago):__ 75 | - Racicot, Jonathan. 2016. Enoki (version 1.0.2). Windows/Mac/Linux. Ottawa, Canada. 76 | - __Reference (IEEE):__ 77 | - J. Racicot, Enoki. Ottawa, Canada, 2016. 78 | 79 | -------------------------------------------------------------------------------- /enoki.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # Copyright (C) 2015 Jonathan Racicot 3 | # 4 | # This program is free software: you can redistribute it and/or modify 5 | # it under the terms of the GNU General Public License as published by 6 | # the Free Software Foundation, either version 3 of the License, or 7 | # (at your option) any later version. 8 | # 9 | # This program is distributed in the hope that it will be useful, 10 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 11 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 12 | # GNU General Public License for more details. 13 | # 14 | # You should have received a copy of the GNU General Public License 15 | # along with this program. If not, see . 16 | # 17 | # If you use this program and find it useful, please include a link 18 | # or reference to the project's page in your program and/or document. 19 | # 20 | # Reference (Chicago): 21 | # Racicot, Jonathan. 2016. Enoki (version 1.0.2). 
Windows/Mac/Linux. Ottawa, Canada. 22 | # Reference (IEEE): 23 | # J. Racicot, Enoki. Ottawa, Canada, 2016. 24 | # 25 | # 26 | # Jonathan Racicot 27 | # infectedpacket@gmail.com 28 | # 2016-01-10 29 | # https://github.com/infectedpacket 30 | #////////////////////////////////////////////////////////////////////////////// 31 | # 32 | #////////////////////////////////////////////////////////////////////////////// 33 | # Imports 34 | #////////////////////////////////////////////////////////////////////////////// 35 | # 36 | import re 37 | import idc 38 | import idaapi 39 | import difflib 40 | import idautils 41 | import logging 42 | from time import strftime 43 | #////////////////////////////////////////////////////////////////////////////// 44 | # Enoki class 45 | #////////////////////////////////////////////////////////////////////////////// 46 | class Enoki(object): 47 | """ 48 | Description: 49 | Provides wrapping functions around IDAPython to analyze 50 | and format structures for unknown/difficult architectures. 51 | 52 | Notes: 53 | Tested on IDA Pro v6.5 54 | 55 | Author: 56 | Jonathan Racicot 57 | 58 | Date: 59 | Created: 2015-10-14 60 | Updated: 2016-01-10 61 | """ 62 | 63 | VERSION = "1.0.0" 64 | 65 | #Specifies a 16bit segment 66 | SEG_16 = 0 67 | #Specifies a 32bit segment 68 | SEG_32 = 1 69 | #Specifies a 64bit segment 70 | SEG_64 = 2 71 | 72 | #Segment bitness to use when none has been specified. 73 | DEFAULT_SEGMENT_SIZE = SEG_16 74 | 75 | #Specifies a DATA segment 76 | SEG_DATA = "DATA" 77 | #Specifies a CODE segment 78 | SEG_CODE = "CODE" 79 | 80 | SEG_TYPE_CODE = 2 81 | SEG_TYPE_DATA = 3 82 | 83 | #Used for assessing return values from IDA function calls. 
84 | FAIL = 0 85 | SUCCESS = 1 86 | 87 | # Basic colors 88 | RED = 0x0000FF 89 | GREEN = 0x00FF00 90 | BLUE = 0xFF0000 91 | YELLOW = 0x00FFFF 92 | WHITE = 0xFFFFFF 93 | BLACK = 0x000000 94 | CYAN = 0xFFFF00 95 | # Fancy colors 96 | ABSOLUTE_ZERO = 0xBA4800 97 | AFRICAN_VIOLET = 0xBE84B2 98 | ALIZARIN_CRIMSON = 0x3626E3 99 | AMBER = 0x00BFFF 100 | APPLE_GREEN = 0x00B68D 101 | AZURE = 0xFF7F00 102 | BABY_BLUE = 0xF0CF89 103 | BABY_PINK = 0xC2C2F4 104 | BONE = 0xE3DAC9 105 | CADMIUM_ORANGE = 0x2D87ED 106 | CITRINE = 0xE4D00A 107 | CADET_BLUE = 0x5F9EA0 108 | CHAMOISEE = 0xA0785A 109 | 110 | logger = logging.getLogger(__name__) 111 | 112 | def __init__(self): 113 | """ 114 | Constructor of the Enoki engine. Does nothing. 115 | """ 116 | pass 117 | 118 | def vers(self): 119 | return Enoki.VERSION 120 | 121 | def make_comment(self, _ea, _comment): 122 | """ 123 | Creates a comment at the given address. 124 | 125 | @param _ea: The address where the comment will be created. 126 | @param _comment: The comment 127 | @return Enoki.SUCCESS if the comment was created successfully, 128 | Enoki.FAIL otherwise. 129 | """ 130 | return idc.MakeComm(_ea, _comment) 131 | 132 | def clear_comment(self, _ea): 133 | """ 134 | Removes any comment at the given address. 135 | 136 | @param _ea: The address where the comment will be removed. 137 | @return Enoki.SUCCESS if the comment was removed successfully, 138 | Enoki.FAIL otherwise. 139 | """ 140 | return self.make_comment(_ea, "") 141 | 142 | def clear_all_comments(self, _startea, _endea): 143 | """ 144 | Removes all comments between the given addresses. 145 | 146 | @param _startea: The start address. 147 | @param _endea: The end address. 148 | @return Enoki.SUCCESS if all the comments were removed, 149 | Enoki.FAIL otherwise. 
150 | """ 151 | if (_startea != idc.BADADDR and _endea != idc.BADADDR): 152 | curea = _startea 153 | error = Enoki.SUCCESS 154 | while (curea < _endea): 155 | r = self.clear_comment(curea) 156 | curea = idc.NextHead(curea) 157 | if (r == Enoki.FAIL): 158 | error = Enoki.FAIL 159 | return error 160 | 161 | def clear_function_comments(self, _funcea): 162 | """ 163 | Removes all comments in the function at the specified address. 164 | 165 | @param _funcea: An address within the function 166 | @return Enoki.SUCCESS if all the comments were removed, 167 | Enoki.FAIL otherwise. 168 | """ 169 | func = self.get_function_at(_funcea) 170 | if (func): 171 | return self.clear_all_comments(func.startEA, func.endEA) 172 | else: 173 | return Enoki.FAIL 174 | 175 | def append_comment(self, _ea, _comment): 176 | """ 177 | Appends a new comment to an instruction at the specified address. 178 | 179 | @param _ea: The address where the comment will be appended. 180 | @param _comment: The comment 181 | @return Enoki.SUCCESS if the comment was appended successfully, 182 | Enoki.FAIL otherwise. 183 | """ 184 | if (_ea != idc.BADADDR): 185 | cur_comment = idc.Comment(_ea) 186 | if (cur_comment is not None and len(cur_comment) > 0): 187 | comment = "{:s}\n{:s}".format(cur_comment, _comment) 188 | else: 189 | comment = _comment 190 | return self.make_comment(_ea, comment) 191 | return Enoki.FAIL 192 | 193 | def make_repeat_comment(self, _ea, _comment): 194 | """ 195 | Creates a repeatable comment at the given address. 196 | 197 | @param _ea: The address where the comment will be created. 198 | @param _comment: The comment 199 | @return Enoki.SUCCESS if the comment was created successfully, 200 | Enoki.FAIL otherwise. 201 | """ 202 | return idc.MakeRptCmt(_ea, _comment) 203 | 204 | def backup_database(self): 205 | """ 206 | Backs up the database to a file, similar to 207 | IDA's snapshot function. 
208 | """ 209 | time_string = strftime('%Y%m%d%H%M%S') 210 | file = idc.GetInputFile() 211 | if not file: 212 | raise NoInputFileException('No input file provided') 213 | input_file = file.rsplit('.', 1)[0] 214 | backup_file = "{:s}_{:s}.idb".format(input_file, time_string) 215 | idc.SaveBase(backup_file, idaapi.DBFL_BAK) 216 | 217 | def create_segment(self, _startea, _endea, _name, 218 | _type, _segsize=DEFAULT_SEGMENT_SIZE): 219 | """ 220 | Creates a segment between the provided addresses. 221 | 222 | @param _startea: The start address of the segment. 223 | @param _endea: The end address of the segment. 224 | @param _name: Name to be given to the new segment. 225 | @param _type: Either idaapi.SEG_CODE to specify a code 226 | segment or idaapi.SEG_DATA for a segment containing data. 227 | @param _segsize: Bitness of the segment: Enoki.SEG_16, Enoki.SEG_32 or Enoki.SEG_64. 228 | """ 229 | r = idc.AddSeg(_startea, _endea, 0, _segsize, 1, 2) 230 | if (r == Enoki.SUCCESS): 231 | idc.RenameSeg(_startea, _name) 232 | return idc.SetSegmentType(_startea, _type) 233 | else: 234 | return Enoki.FAIL 235 | 236 | def get_segment(self, _ea): 237 | return idaapi.getseg(_ea) 238 | 239 | def get_segment_type(self, _ea): 240 | return self.get_seg_attribute(_ea, idc.SEGATTR_TYPE) 241 | 242 | def segment_is_code(self, _segea): 243 | return self.get_segment_type(_segea) == self.SEG_TYPE_CODE 244 | 245 | def segment_is_data(self, _segea): 246 | return self.get_segment_type(_segea) == self.SEG_TYPE_DATA 247 | 248 | def get_seg_attribute(self, _segea, _attr): 249 | """ 250 | Gets an attribute of the segment at the given address. 
The available 251 | attributes are: 252 | SEGATTR_START starting address 253 | SEGATTR_END ending address 254 | SEGATTR_ALIGN alignment 255 | SEGATTR_COMB combination 256 | SEGATTR_PERM permissions 257 | SEGATTR_BITNESS bitness (0: 16, 1: 32, 2: 64 bit segment) 258 | SEGATTR_FLAGS segment flags 259 | SEGATTR_SEL segment selector 260 | SEGATTR_ES default ES value 261 | SEGATTR_CS default CS value 262 | SEGATTR_SS default SS value 263 | SEGATTR_DS default DS value 264 | SEGATTR_FS default FS value 265 | SEGATTR_GS default GS value 266 | SEGATTR_TYPE segment type 267 | SEGATTR_COLOR segment color 268 | @param _segea Address within the segment to query. 269 | @param _attr The attribute to read. This is one of the values listed above. 270 | @return The value of the attribute. 271 | """ 272 | return idc.GetSegmentAttr(_segea, _attr) 273 | 274 | def create_selector(self, _sel, _value): 275 | return idc.SetSelector(_sel, _value) 276 | 277 | def create_data_segment(self, _startea, _endea, _name, 278 | _segsize=DEFAULT_SEGMENT_SIZE): 279 | """ 280 | Wrapper around the create_segment function to 281 | create a new data segment. 282 | @param _startea: The start address of the segment. 283 | @param _endea: The end address of the segment. 284 | @param _name: Name to be given to the new segment. 285 | @param _segsize: Bitness of the segment, e.g. 16, 32 or 64 bit. 286 | """ 287 | r = self.create_segment(_startea, _endea, _name, idaapi.SEG_DATA, _segsize) 288 | if (r == Enoki.SUCCESS): 289 | return self.set_seg_class_data(_startea) 290 | return Enoki.FAIL 291 | 292 | def create_code_segment(self, _startea, _endea, _name, 293 | _segsize=DEFAULT_SEGMENT_SIZE): 294 | """ 295 | Wrapper around the create_segment function to 296 | create a new code segment. 297 | @param _startea: The start address of the segment. 298 | @param _endea: The end address of the segment. 299 | @param _name: Name to be given to the new segment. 
300 | @param _segsize: Bitness of the segment, e.g. 16, 32 or 64 bit. 301 | """ 302 | r = self.create_segment(_startea, _endea, _name, idaapi.SEG_CODE, _segsize) 303 | if (r == Enoki.SUCCESS): 304 | return self.set_seg_class_code(_startea) 305 | return Enoki.FAIL 306 | 307 | def set_seg_selector(self, _segea, _sel): 308 | return self.set_seg_attribute(_segea, idc.SEGATTR_SEL, _sel) 309 | 310 | def set_seg_align_para(self, _segea): 311 | """ 312 | Sets the alignment of the segment at the given address as 'paragraph', 313 | i.e. 16-byte boundaries. 314 | 315 | @param _segea Address within the segment to be modified. 316 | """ 317 | return idc.SegAlign(_segea, idc.saRelPara) 318 | 319 | def set_seg_class_code(self, _segea): 320 | """ 321 | Sets the class of the segment at the given address as containing code. 322 | 323 | @param _segea Address within the segment to be modified. 324 | """ 325 | return self.set_seg_class(_segea, "CODE") 326 | 327 | def set_seg_class_data(self, _segea): 328 | """ 329 | Sets the class of the segment at the given address as containing data. 330 | 331 | @param _segea Address within the segment to be modified. 332 | """ 333 | return self.set_seg_class(_segea, "DATA") 334 | 335 | def set_seg_class(self, _segea, _type): 336 | """ 337 | Sets the class of the segment at the given address. 338 | 339 | @param _segea Address within the segment to be modified. 340 | """ 341 | return idc.SegClass(_segea, _type) 342 | 343 | def set_seg_attribute(self, _segea, _attr, _value): 344 | """ 345 | Sets an attribute of the segment at the given address. 
The available 346 | attributes are: 347 | SEGATTR_START starting address 348 | SEGATTR_END ending address 349 | SEGATTR_ALIGN alignment 350 | SEGATTR_COMB combination 351 | SEGATTR_PERM permissions 352 | SEGATTR_BITNESS bitness (0: 16, 1: 32, 2: 64 bit segment) 353 | SEGATTR_FLAGS segment flags 354 | SEGATTR_SEL segment selector 355 | SEGATTR_ES default ES value 356 | SEGATTR_CS default CS value 357 | SEGATTR_SS default SS value 358 | SEGATTR_DS default DS value 359 | SEGATTR_FS default FS value 360 | SEGATTR_GS default GS value 361 | SEGATTR_TYPE segment type 362 | SEGATTR_COLOR segment color 363 | @param _segea Address within the segment to be modified. 364 | @param _attr The attribute to change. This is one of the values listed above. 365 | @param _value The value of the attribute. 366 | """ 367 | return idc.SetSegmentAttr(_segea, _attr, _value) 368 | 369 | def create_string_at(self, _startea, _unicode=False, _terminator="00"): 370 | """ 371 | Creates a StringItem object at the specified location. 372 | @param _startea The start address of the string 373 | @param _unicode Specifies whether the string is ASCII or Unicode 374 | @param _terminator Specifies the terminator character of a sequence. Default is 375 | "00" 376 | """ 377 | # Gets the address of the closest terminator byte/word 378 | strend = self.find_next_byte_string(_startea, _terminator) 379 | if strend is not None and strend != idaapi.BADADDR: 380 | strlen = strend - _startea 381 | if (_unicode): 382 | result = idaapi.make_ascii_string(_startea, strlen, idaapi.ACFOPT_UTF8) 383 | else: 384 | result = idaapi.make_ascii_string(_startea, strlen, idaapi.ACFOPT_ASCII) 385 | if (result == Enoki.FAIL): 386 | print("[-] Failed to create a string at 0x{:x} to 0x{:x}.".format(_startea, strend+1)) 387 | return Enoki.FAIL 388 | return Enoki.SUCCESS 389 | return Enoki.FAIL 390 | 391 | def current_file_offset(self): 392 | """ 393 | Returns the file offset, i.e. 
absolute offset from the beginning of the file, 394 | of the currently selected address. 395 | @return The absolute offset of the selected address. 396 | """ 397 | return idaapi.get_fileregion_offset(ScreenEA()) 398 | 399 | def min_file_offset(self): 400 | """ 401 | Returns the minimum file offset, i.e. absolute offset of the beginning of the file/memory. 402 | @return The absolute minimum offset of the loaded code. 403 | """ 404 | return idaapi.get_fileregion_offset(MinEA()) 405 | 406 | def max_file_offset(self): 407 | """ 408 | Returns the maximum file offset, i.e. absolute offset of the end of the file/memory. 409 | @return The absolute maximum offset of the loaded code. 410 | """ 411 | return idaapi.get_fileregion_offset(MaxEA()) 412 | 413 | def get_byte_at(self, _ea): 414 | return idc.Byte(_ea) 415 | 416 | def get_word_at(self, _ea): 417 | return idc.Word(_ea) 418 | 419 | def get_dword_at(self, _ea): 420 | return idc.Dword(_ea) 421 | 422 | def get_all_bytes_between(self, _startea, _endea): 423 | """ 424 | Returns all bytes between the given addresses. 425 | @param _startea The starting address 426 | @param _endea The ending address 427 | @return A list containing all bytes between the given addresses. 428 | """ 429 | bytes = [] 430 | if (_startea != BADADDR and _endea != BADADDR): 431 | curea = _startea 432 | while (curea < _endea): 433 | bytes.append(self.get_byte_at(curea)) 434 | curea = NextHead(curea) 435 | 436 | return bytes 437 | 438 | def get_all_words_between(self, _startea, _endea): 439 | """ 440 | Returns all words between the given addresses. 441 | @param _startea The starting address 442 | @param _endea The ending address 443 | @return A list containing all words between the given addresses. 
444 | """ 445 | words = [] 446 | if (_startea != BADADDR and _endea != BADADDR): 447 | curea = _startea 448 | while (curea < _endea): 449 | words.append(self.get_word_at(curea)) 450 | curea = NextHead(curea) 451 | 452 | return words 453 | 454 | def get_all_strings(self, _filter='', 455 | _encoding=(Strings.STR_UNICODE | Strings.STR_C)): 456 | """ 457 | Retrieves all strings from the current file matching the 458 | regular expression specified in the filter parameter. If no 459 | filter value is provided, all strings IDA objects with the specified encoding 460 | are returned. To access only the strings and display them in the interpreter, 461 | consult the show_all_strings function. 462 | 463 | Values for the _encoding parameters includes: 464 | - Strings.STR_UNICODE 465 | - Strings.STR_C 466 | 467 | Values for the _encoding parameter can be combined using the | 468 | operator. Example: 469 | 470 | _encoding=(Strings.STR_UNICODE | Strings.STR_C) 471 | 472 | @param _filter Regular expression to filter unneeded strings. 473 | @param _encoding Specified the type of strings to seek. 474 | @return A list of strings IDA objects 475 | """ 476 | strings = [] 477 | string_finder = idautils.Strings(False) 478 | string_finder.setup(strtypes=_encoding) 479 | 480 | for index, string in enumerate(string_finder): 481 | s = str(string) 482 | if len(_filter) > 0 and len(s) > 0: 483 | if re.search(_filter, s): 484 | strings.append(string) 485 | else: 486 | strings.append(string) 487 | return strings 488 | 489 | def show_all_strings(self, _filter='', 490 | _encoding=(Strings.STR_UNICODE | Strings.STR_C)): 491 | """ 492 | This function will display the address and the strings found in the 493 | file. This function differs from get_all_strings by printing the results 494 | into the interpreter and only the strings are returns, while the 495 | get_all_strings function returns the IDA string objects. 496 | 497 | @param _filter Regular expression to filter unneeded strings. 
498 | @param _encoding Specifies the type of strings to seek. 499 | @return A list of strings 500 | """ 501 | strings = [] 502 | strings_objs = self.get_all_strings(_filter, _encoding) 503 | for s in strings_objs: 504 | strings.append(str(s)) 505 | print("[>]\t0x{:x}: {:s}".format(s.ea, str(s))) 506 | return strings 507 | 508 | def get_string_at(self, _ea): 509 | """ 510 | Returns the string, if any, at the specified address. 511 | @param _ea Address of the string 512 | @return The string at the specified address. 513 | """ 514 | if (_ea != idc.BADADDR): 515 | stype = idc.GetStringType(_ea) 516 | return idc.GetString(_ea, strtype=stype) 517 | return "" 518 | 519 | def get_all_comments_at(self, _ea): 520 | """ 521 | Returns both normal and repeatable comments at 522 | the specified address. If both are present, a single 523 | string is returned, with the two comments separated by a 524 | colon (:). 525 | 526 | @param _ea: Address from which to retrieve the comments 527 | @return: A string containing both normal and repeatable comments, 528 | or an empty string if no comments are found. 529 | """ 530 | normal_comment = self.get_normal_comment_at(_ea) 531 | rpt_comment = self.get_repeat_comment(_ea) 532 | comment = normal_comment 533 | 534 | if (comment and rpt_comment): 535 | comment += ":" + rpt_comment 536 | 537 | return comment 538 | 539 | def get_normal_comment_at(self, _ea): 540 | comment = idc.Comment(_ea) 541 | if not comment: 542 | comment = "" 543 | 544 | return comment 545 | 546 | def get_repeat_comment(self, _ea): 547 | comment = idc.RptCmt(_ea) 548 | if not comment: 549 | comment = "" 550 | 551 | return comment 552 | 553 | def get_ea(self, _name): 554 | """ 555 | Returns the address of a named location. Returns Enoki.FAIL 556 | if no address matches the supplied name. 557 | @param _name Name of the location. 558 | @return The address corresponding to the name. 
559 | """ 560 | if (len(_name) > 0): 561 | return idc.LocByName(_name) 562 | return Enoki.FAIL 563 | 564 | def get_ea_label(self, _ea): 565 | """ 566 | Returns the label of an address if any. Returns an empty string 567 | if no label is assigned to the address. 568 | @param _ea Address of the location. 569 | @return The label set to the address if any, empty string otherwise. 570 | """ 571 | return idc.Name(_ea) 572 | 573 | def get_disasm(self, _ea): 574 | """ 575 | Returns the disassembled code at the specified address. 576 | @param _ea Address of the opcode to disassemble. 577 | @return String containing the disassembled code. 578 | """ 579 | return idc.GetDisasm(_ea) 580 | 581 | def get_mnemonic(self, _ea): 582 | """ 583 | Returns the mnemonic of the instruction at the specified address. 584 | @param _ea The address from which to retrieve the instruction. 585 | @return String containing the mnemonic of the instruction. 586 | """ 587 | return idc.GetMnem(_ea) 588 | 589 | def get_first_segment(self): 590 | """ 591 | Returns the address of the first defined 592 | segment of the file. 593 | 594 | @return: Start address of the first segment or 595 | idc.BADADDR if no segments are defined 596 | """ 597 | return idc.FirstSeg() 598 | 599 | def get_next_segment(self, _ea): 600 | """ 601 | Returns the address of the segment following the one defined 602 | at the given address. 603 | 604 | @param _ea: Address of the current segment. 605 | 606 | @return: Start address of the next segment or 607 | idc.BADADDR if there is no next segment 608 | """ 609 | return idc.NextSeg(_ea) 610 | 611 | def get_segment_name(self, _ea): 612 | """ 613 | Returns the name of the segment at the specified address. 614 | @param _ea An address within the segment 615 | @return String containing the name of the segment. 
616 | """ 617 | return idc.Segname(_ea) 618 | 619 | def get_segment_start(self, _ea): 620 | """ 621 | Returns the starting address of the segment located at the specified 622 | address 623 | @param _ea An address within the segment 624 | @return long The starting address of the segment. 625 | """ 626 | return idc.SegStart(_ea) 627 | 628 | def get_segment_end(self, _ea): 629 | """ 630 | Returns the ending address of the segment located at the specified 631 | address 632 | @param _ea An address within the segment 633 | @return long The ending address of the segment. 634 | """ 635 | return idc.SegEnd(_ea) 636 | 637 | def find_next_byte_string(self, _startea, _bytestr, _fileOffset = False, 638 | _bitness=DEFAULT_SEGMENT_SIZE): 639 | """ 640 | This function searches for text representing bytes and/or words in the 641 | machine code of the file from a start address. This function is built on top of the native 642 | FindBinary function. The search is conducted starting at the specified address and downward 643 | for the provided byte string. 644 | 645 | Example: 646 | e.find_next_byte_string(ScreenEA(), "0000 FFFF ???? 0000") 647 | 648 | @param _startea Starting address of the search 649 | @param _bytestr String to search for 650 | @param _fileOffset Specifies whether to return found addresses as relative or absolute 651 | offsets 652 | @param _bitness Specifies the bitness of the segment. 653 | @return The offset of the byte string found, or None if there is no search result. 
654 | """ 655 | offset = None 656 | ea = _startea 657 | if ea == idaapi.BADADDR: 658 | print ("[-] Failed to retrieve starting address.") 659 | offset = None 660 | else: 661 | block = idc.FindBinary(ea, idc.SEARCH_DOWN | idc.SEARCH_CASE, _bytestr, _bitness) 662 | if (block == idc.BADADDR): 663 | offset = None 664 | elif _fileOffset: 665 | offset = idaapi.get_fileregion_offset(block) 666 | else: 667 | offset = block 668 | return offset 669 | 670 | def find_byte_string(self, _startea, _endea, _bytestr, 671 | _fileOffsets = False, _showmsg = False): 672 | """ 673 | This function searches for text representing bytes and/or words in the 674 | machine code of the file between 2 addresses. This function is built on top of the native 675 | FindBinary function. The search is conducted starting at the specified address and downward 676 | for the provided byte string. 677 | 678 | Example: 679 | e.find_byte_string(0x4000, 0x8000, "FF FF AA AA FF FF", True) 680 | 681 | @param _startea Starting address of the search 682 | @param _endea Ending address of the search 683 | @param _bytestr String to search for 684 | @param _fileOffsets Specifies whether to return file offsets (True) or 685 | addresses (False) 686 | @param _showmsg Specifies if the function should print a message with the results 687 | @return An array of addresses corresponding to the start of the byte string. If none found, 688 | returns an empty array. 
689 | """ 690 | try: 691 | offsets = [] 692 | ea = _startea 693 | if ea == idaapi.BADADDR: 694 | print ("[-] Failed to retrieve starting address.") 695 | return [] 696 | else: 697 | block = idc.FindBinary(ea, idc.SEARCH_DOWN | idc.SEARCH_CASE, _bytestr, 16) 698 | if (block == idc.BADADDR): 699 | print("[-] Byte string '{:s}' not found.".format(_bytestr)) 700 | 701 | while (block != idc.BADADDR and block < _endea): 702 | block_file_offset = idaapi.get_fileregion_offset(block) 703 | if _fileOffsets: 704 | offsets.append(block_file_offset) 705 | else: 706 | offsets.append(block) 707 | next_block_offset = idaapi.get_fileregion_ea(block_file_offset+4) 708 | if (_showmsg): 709 | print("[+] Byte string '{:s}' found at offset 0x{:X}, file offset 0x{:X}.".format( 710 | _bytestr, 711 | block, 712 | block_file_offset)) 713 | block = idc.FindBinary(next_block_offset, idc.SEARCH_DOWN | idc.SEARCH_CASE, _bytestr, 16) 714 | return offsets 715 | except Exception as e: 716 | print("[-] An error occurred while seeking byte string {:s}: {:s}".format(_bytestr, str(e))) 717 | return [] 718 | 719 | def get_code_ranges(self, _startea, _endea, _prolog, _epilog): 720 | """ 721 | This function locates the code ranges delimited by the given prolog and 722 | epilog byte strings within the prescribed range. 723 | 724 | Example: 725 | 726 | m = e.get_code_ranges(MinEA(), MaxEA(), "4500 4885", "4886 0090") 727 | print(m) 728 | [[0x2C00, 0x2C15], [0x2C16, 0x2C38]] 729 | 730 | @param _startea The start address of the range to look for code 731 | segment 732 | @param _endea The end address of the range to look for code 733 | segment 734 | @param _prolog Starting byte string of the code segment to look for. 735 | @param _epilog Ending byte string of the code segment to look for. 736 | @return matrix containing the starting and ending addresses of the code 737 | segments found. 
738 | """ 739 | segments = [] 740 | if (_startea != idc.BADADDR and _endea != idc.BADADDR): 741 | prolog_offsets = self.find_byte_string(_startea, _endea, _prolog, False) 742 | for offset_idx in range(0, len(prolog_offsets)): 743 | epilog_offset = self.find_next_byte_string( 744 | prolog_offsets[offset_idx], 745 | _epilog) 746 | if epilog_offset is not None and epilog_offset != idc.BADADDR: 747 | segments.append([prolog_offsets[offset_idx], epilog_offset]) 748 | return segments 749 | 750 | def get_instruction_tokens(self, _ea): 751 | """ 752 | Returns the tokens of the disassembled instruction at the specified address. 753 | 754 | Example: 755 | ... 756 | 0x2C00: pop r1 ; Pops stack into R1 register 757 | ... 758 | s = e.get_instruction_tokens(0x2C00) 759 | print(s) 760 | ['pop', 'r1', ';', 'Pops', 'stack', 'into', 'R1', 'register'] 761 | 762 | @param _ea Address of the instruction to disassemble 763 | @return Array of strings containing the tokens of the disassembled instruction. 764 | """ 765 | if (_ea != idc.BADADDR): 766 | return [t for t in idc.GetDisasm(_ea).split(" ") if t] 767 | 768 | def get_function_at(self, _ea): 769 | """ 770 | Returns the function object at the specified address. 771 | @param _ea An address within the function 772 | @return The native IDA function object at the given address. 773 | """ 774 | if (_ea != idc.BADADDR): 775 | return idaapi.get_func(_ea) 776 | else: 777 | return None 778 | 779 | def set_function_name_at(self, _funcea, _name): 780 | """ 781 | Sets the name of the function located at the specified address, 782 | if any. 783 | 784 | @param _funcea An address within the function 785 | @param _name The new name of the function. Cannot be empty. 786 | @return Enoki.SUCCESS or Enoki.FAIL on error. 
        """
        if (_funcea != BADADDR and len(_name) > 0):
            func = self.get_function_at(_funcea)
            if (func):
                return idc.MakeName(func.startEA, _name)
        return Enoki.FAIL

    def get_function_name_at(self, _ea):
        """
        Returns the name of the function at the given address if one is
        defined. Returns an empty string if no function is defined at the
        address.
        @param _ea An address within the function
        @return The name of the function or an empty string.
        """
        return GetFunctionName(_ea)

    def get_function_disasm(self, _ea):
        """
        This function retrieves all of the disassembled and tokenized instructions
        of the function located at the specified address.

        Example:
            ...
            0x2C00: pop r1
            0x2C01: load acc, 0
            0x2C03: jmp 0x2C0A
            ...
            s = e.get_function_disasm(0x2C00)
            print(s)
            [['pop', 'r1'], ['load', 'acc,', '0'], ['jmp', '0x2C0A']]

        Note that the tokenization is done using white spaces only, so any commas will remain
        as part of the token.

        @param _ea An address within the function.
        @return A matrix (list of lists) containing the tokens of each instruction
            of the function at the specified address.
        """
        matrix_disasm = []
        if (_ea != BADADDR):
            current_func = self.get_function_at(_ea)
            if (current_func):
                func_start = current_func.startEA
                func_end = current_func.endEA
                curea = func_start
                while (curea < func_end):
                    inst_tokens = self.get_instruction_tokens(curea)
                    matrix_disasm.append(inst_tokens)
                    curea = NextHead(curea)
            else:
                print("[-] No function found at 0x{:x}.".format(_ea))
        return matrix_disasm

    def get_function_disasm_with_ea(self, _ea):
        """
        This function retrieves all of the disassembled and tokenized instructions
        of the function located at the specified address, along with the address
        of each instruction.

        Example:
            ...
            0x2C00: pop r1
            0x2C01: load acc, 0
            0x2C03: jmp 0x2C0A
            ...
            s = e.get_function_disasm_with_ea(0x2C00)
            print(s)
            [(0x2C00, ['pop', 'r1']), (0x2C01, ['load', 'acc,', '0']), (0x2C03, ['jmp', '0x2C0A'])]

        Note that the tokenization is done using white spaces only, so any commas will remain
        as part of the token.

        @param _ea An address within the function.
        @return A list of tuples containing the address of the instruction and the
            list of tokens of the instruction, for each instruction of the function
            at the specified address.

        """
        matrix_disasm = []
        if (_ea != BADADDR):
            current_func = self.get_function_at(_ea)
            if (current_func):
                func_start = current_func.startEA
                func_end = current_func.endEA
                curea = func_start
                while (curea < func_end):
                    inst_tokens = self.get_instruction_tokens(curea)
                    matrix_disasm.append((curea, inst_tokens))
                    curea = NextHead(curea)
            else:
                print("[-] No function found at 0x{:x}.".format(_ea))
        return matrix_disasm

    def compare_code(self, _code1, _code2):
        """
        The compare_code function provides a similarity ratio between the provided code
        segments.
        It does so by using the SequenceMatcher from the difflib module, which
        returns a value between 0 and 1, 0 indicating 2 completely different segments and 1
        indicating identical code segments.

        @param _code1 First code segment to compare
        @param _code2 Second code segment to compare
        @return double A value between 0 and 1 indicating the degree of similarity between the
            2 code segments.
        """
        sm = difflib.SequenceMatcher(None, _code1, _code2, autojunk=False)
        return sm.ratio()

    def compare_functions(self, _ea_func1, _ea_func2):
        """
        Compares the code of 2 functions using the compare_code function.

        @param _ea_func1 Address within the first function to compare
        @param _ea_func2 Address within the second function to compare
        @return double A value between 0 and 1, 0 indicating 2 completely different
            functions and 1 indicating identical functions.
        """
        l1 = self.get_function_instructions(_ea_func1)
        l2 = self.get_function_instructions(_ea_func2)
        return self.compare_code(l1, l2)

    def get_function_instructions(self, _ea):
        """
        Retrieves the instructions, without operands, of the function located at the
        specified address.

        Example:
            ...
            0x2C00: pop r1
            0x2C01: load acc, 0
            0x2C03: jmp 0x2C0A
            ...
            s = e.get_function_instructions(0x2C00)
            print(s)
            ['pop', 'load', 'jmp']

        @param _ea Address within the function
        @return Array of strings representing the instructions of the function.
        """
        instr = []
        if (_ea != BADADDR):
            instr_matrix = self.get_function_disasm(_ea)
            for line in instr_matrix:
                # Skip lines without tokens; the first token is the mnemonic.
                if (line):
                    instr.append(line[0])
        return instr

    def get_all_functions_instr(self, _startea, _endea):
        """
        Extracts the instructions of all functions located between the provided
        start and end addresses.
        Returns a dictionary in the format
        <"FunctionName", ['i1', 'i2', ..., 'in']>

        @param _startea Starting address
        @param _endea Ending address
        @return A dictionary object. The keys are the names of the functions found
            within the boundaries, while the values are the arrays of instructions
            of each function.
        """
        f_instr = {}
        func = self.get_function_at(_startea)
        if (not func):
            # _startea is not inside a function: start at the next one, if any.
            func = idaapi.get_next_func(_startea)

        while (func and func.startEA <= _endea):
            name = GetFunctionName(func.startEA)
            f_instr[name] = self.get_function_instructions(func.startEA)
            func = idaapi.get_next_func(func.startEA)
        return f_instr

    def get_all_functions(self, _startea, _endea):
        """
        Gets all function objects between the provided start and end
        addresses. Returns a dictionary in the format <"FunctionName", FunctionObject>.

        @param _startea Starting address
        @param _endea Ending address
        @return A dictionary object. The keys are the names of the functions found
            within the boundaries, while the values are the native Function objects
            of IDA.
        """
        functions = {}
        func = self.get_function_at(_startea)
        if (not func):
            # _startea is not inside a function: start at the next one, if any.
            func = idaapi.get_next_func(_startea)

        # Stop when there is no next function or when it starts past _endea.
        while (func and func.startEA <= _endea):
            functions[GetFunctionName(func.startEA)] = func
            func = idaapi.get_next_func(func.startEA)
        return functions

    def get_all_func_instr_seg(self, _ea=ScreenEA()):
        """
        Returns the instructions of all the functions in the segment specified by
        the provided address.
        Returns a dictionary in the format <"FunctionName", ['i1', 'i2', ..., 'in']>.

        @param _ea An address within the segment. Default is the segment of the current
            instruction.
        @return A dictionary object. The keys are the names of the functions found
            within the segment, while the values are the arrays of instructions
            of each function.
        """
        return self.get_all_functions_instr(SegStart(_ea), SegEnd(_ea))

    def get_closest_previous_instr(self, _ea, _instruction, _max=20):
        """
        Finds the closest instruction matching the specified regular expression
        at or above the specified address.

        Example:
            0x2C00 lacl #FFh
            0x2C01 sacl *+
            0x2C02 sbrk #5
            0x2C03 lar ar1, *-
            0x2C04 call SUB_02CC4
            ...
            Python>e.get_closest_previous_instr(0x2C04, "lac")
            (11264, 'lacl #FFh')

        If found, the function will return the address of the matching instruction
        and the matching instruction. You can specify a maximum number of instructions
        to look at before giving up by setting the _max argument, which is set to
        20 by default.

        @param _ea The reference address to search from
        @param _instruction A regular expression to match the required instruction
        @param _max Maximum number of instructions to look at before giving up.
        @return A tuple containing the address and the matching instruction, or
            (BADADDR, "") if no match was found.
        """
        found_ins = (BADADDR, "")
        if (_ea != BADADDR):
            step = 0
            curea = _ea
            found = False
            while (step < _max and not found and curea != BADADDR):
                ins = GetMnem(curea)
                if (re.search(_instruction, ins)):
                    # Use self rather than the module-level instance 'e'.
                    found_ins = (curea, self.get_disasm(curea))
                    found = True
                step += 1
                curea = PrevHead(curea)

        return found_ins

    def get_closest_next_instr(self, _ea, _instruction, _max=20):
        """
        Finds the closest instruction matching the specified regular expression
        at or below the specified address.

        Example:
            0x2C00 lacl #FFh
            0x2C01 sacl *+
            0x2C02 sbrk #5
            0x2C03 lar ar1, *-
            0x2C04 call SUB_02CC4
            ...
            Python>e.get_closest_next_instr(0x2C00, "call")
            (11268, 'call SUB_02CC4')

        If found, the function will return the address of the matching instruction
        and the matching instruction. You can specify a maximum number of instructions
        to look at before giving up by setting the _max argument, which is set to
        20 by default.

        @param _ea The reference address to search from
        @param _instruction A regular expression to match the required instruction
        @param _max Maximum number of instructions to look at before giving up.
        @return A tuple containing the address and the matching instruction, or
            (BADADDR, "") if no match was found.
        """
        found_ins = (BADADDR, "")
        if (_ea != BADADDR):
            step = 0
            curea = _ea
            found = False
            while (step < _max and not found and curea != BADADDR):
                ins = GetMnem(curea)
                if (re.search(_instruction, ins)):
                    # Use self rather than the module-level instance 'e'.
                    found_ins = (curea, self.get_disasm(curea))
                    found = True
                step += 1
                curea = NextHead(curea)

        return found_ins

    def get_similarity_ratios(self, func1, func2):
        """
        Calculates the similarity ratios between 2 sets of functions and returns
        a matrix of the results. The matrix is in the following format:

        [
            ["f11", "f12", r1],
            ["f21", "f22", r2],
            ...
            ["fn1", "fn2", rn]
        ]

        Note: this function compares every function of the first set with every
        function of the second set (O(n^2) comparisons) and was not designed for
        efficiency, so it can take a while to complete.

        @param func1 First set of functions to compare, as a dictionary in the
            format <"FunctionName", [instructions]>.
        @param func2 Second set of functions to compare, in the same format.
        @return Matrix of similarity ratios for each pair of functions compared.
        """
        ratios = []
        for f1, l1 in func1.items():
            for f2, l2 in func2.items():
                r = self.compare_code(l1, l2)
                ratios.append([f1, f2, r])
        return ratios

    def get_similarity_func(self, ratios, threshold=1.0):
        """
        Returns a matrix of similarity vectors with ratios greater than or equal
        to the specified threshold.

        Example:

        ratios = [
            ["f11", "f12", 1.0],
            ["f21", "f22", 0.64],
            ["f31", "f32", 0.85]
        ]

        m = e.get_similarity_func(ratios, 0.9)
        print(m)
        [["f11", "f12", 1.0]]

        @param ratios Matrix of ratios as returned by function get_similarity_ratios
        @param threshold Minimum threshold desired. Default value is 1.0
        @return Matrix of similarity ratios with ratio greater than or equal to the
            specified threshold.
        """
        funcs = []
        for r in ratios:
            if (r[2] >= threshold):
                funcs.append(r)
        return funcs

    def function_is_leaf(self, _funcea):
        """
        Verifies if the function at the specified address is a leaf function, i.e.
        it does not make any call to other functions.

        @param _funcea An address within the function
        @return True if the function at the address contains no call instructions.
        """
        # Retrieves the calls made by the function at _funcea, without
        # printing them to the console:
        near_calls = self.get_functions_called_by(_funcea, False)
        return len(near_calls) == 0

    def get_functions_called_by(self, _funcea, _display=True):
        """
        Get all functions directly called by the function at the given address. This function
        only extracts functions called at the first level, i.e. this function is not recursive.
        Returns a matrix containing the address originating the call, the destination address
        and the name of the function/address called.

        Example:
            ...
            0x2C00: pop r1
            0x2C01: load acc, 0
            0x2C03: call 0x2CC0
            0x2C05: load acc, 27h
            0x2C07: call 0x2D78
            0x2C09: push r1
            0x2C0A: ret
            ...

            m = e.get_functions_called_by(0x2C00)
            print(m)
            [[0x2C03, 0x2CC0, 'SUB__02CC0'], [0x2C07, 0x2D78, 'SUB__02D78']]

        @param _funcea Address within the function
        @param _display If True, display the results at the console.
        @return Matrix containing the source, destination and name of the functions called.
            Returns an empty list if no function is found at the address.
        """
        # Retrieves the function at _funcea:
        func = self.get_function_at(_funcea)
        if (not func):
            print("[-] No function found at 0x{:x}.".format(_funcea))
            return []
        # Boundaries:
        startea = func.startEA
        endea = func.endEA
        # EA index:
        curea = startea
        # Results here:
        near_calls = []
        while (curea < endea):
            for xref in XrefsFrom(curea):
                # Code 17 is the code for the 'Code_Near_Call' type of XREF
                if (xref.type == 17):
                    # Add the current address, the address of the call and the
                    # name of the function called.
                    call_info = [xref.frm, xref.to, GetFunctionName(xref.to)]
                    near_calls.append(call_info)
                    if (_display):
                        print("[*] 0x{:x}: {:s} -> {:s}.".format(
                            call_info[0],
                            GetFunctionName(call_info[0]),
                            GetFunctionName(call_info[1])))
            # Next instruction in the function
            curea = NextHead(curea)
        return near_calls

    def get_function_flowchart(self, _funcea):
        """
        Returns the flowchart of the function specified at the given address.

        @param _funcea An address within the function
        @return A FlowChart object or Enoki.FAIL if the address given is invalid,
            or no function was found at the address.
        """
        if (_funcea != BADADDR):
            func = self.get_function_at(_funcea)
            if (func):
                return idaapi.FlowChart(func)
        return Enoki.FAIL

    def get_func_block_bounds(self, _funcea):
        """
        Returns all the code blocks of a given function, i.e. code segments
        between branches/returns and other jumps except for calls.

        Example:
            0x2C00 pop *
            0x2C01 load r1, *+
            ...
            0x2C15 jmp 0x2C20
            0x2C16 call 0x03D0
            ...
            0x2C20 jne r2, 0x2C3D
            ...

            Python>c_blks = e.get_func_block_bounds(0x2C00)
            Python>c_blks
            [(0x2C00, 0x2C15), (0x2C15, 0x2C20), ...]

        @param _funcea An address within the function
        @return A list of tuples containing the start of the block (inclusive) and the
            end of the block (exclusive). Returns an empty list on error.
        """
        blks = []
        fc = self.get_function_flowchart(_funcea)
        if (fc != Enoki.FAIL):
            for blk in fc:
                blks.append((blk.startEA, blk.endEA))
        return blks

    def get_block_at(self, _funcea):
        """
        Retrieves the code block at the given address.
        @param _funcea An address within the function
        @return A tuple containing the boundaries of the corresponding code block.
            Returns (BADADDR, BADADDR) if none found.
        """
        found = (BADADDR, BADADDR)
        if (_funcea != BADADDR):
            blks = self.get_func_block_bounds(_funcea)
            for (b_start, b_end) in blks:
                if (_funcea >= b_start and _funcea < b_end):
                    return (b_start, b_end)
        return found

    def get_all_sub_functions_called(self, _funcea, _level=0, _visited=[]):
        """
        Get all functions directly and indirectly called by the function at the given address.
        This function is recursive and will seek all sub function calls as well, therefore this
        function can be time consuming to complete.
        Returns a matrix containing the address originating the call, the destination address,
        the name of the function/address called and the depth of the call from the initial
        function.

        Example:
            ...
            0x2C00: pop r1
            0x2C01: load acc, 0
            0x2C03: call 0x2CC0
            0x2C05: load acc, 27h
            0x2C07: call 0x2D78
            0x2C09: push r1
            0x2C0A: ret
            ...
            0x2CC0 SUB__02CC0:
            0x2CC0 pop r1
            0x2CC1 load acc, 00
            0x2CC2 call 0x3DEE
            ...

            m = e.get_all_sub_functions_called(0x2C00)
            print(m)
            [[0x2C03, 0x2CC0, 'SUB__02CC0', 0], [0x2CC2, 0x3DEE, 'SUB__03DEE', 1],
            [0x2C07, 0x2D78, 'SUB__02D78', 0]]

        @param _funcea Address within the function
        @param _level Depth of the current call; used by the recursion.
        @param _visited Names of the functions already visited. Pass a new list
            (_visited=[]) on each top-level call, because the default list is
            shared between calls.
        @return Matrix containing the source, destination, name of the functions called and
            the depth relative to the first function.
        """
        # Retrieves the function at _funcea:
        func = self.get_function_at(_funcea)
        # Make sure a function object was extracted
        if (not func):
            print("[-] Error getting function at 0x{:x}.".format(_funcea))
            return []
        # Boundaries:
        startea = func.startEA
        endea = func.endEA
        # EA index:
        curea = startea
        # Results here:
        near_calls = []
        while (curea < endea):
            for xref in XrefsFrom(curea):
                # Code 17 is the code for the 'Code_Near_Call' type of XREF
                if (xref.type == 17):
                    # Add the current address, the address of the call and the
                    # name of the function called along with the depth.
                    fname = GetFunctionName(xref.to)
                    if fname not in _visited:
                        _visited.append(fname)
                        call_info = [xref.frm, xref.to, fname, _level]
                        print("[*]{:s}0x{:x}: {:s} -> {:s}.".format(
                            " " * _level,
                            call_info[0],
                            self.get_function_name_at(call_info[0]),
                            self.get_function_name_at(call_info[1])))
                        sub_calls = self.get_all_sub_functions_called(xref.to, _level+1, _visited)
                        # Add calls to current ones
                        near_calls.append(call_info)
                        if (len(sub_calls) > 0):
                            near_calls += sub_calls

            # Next instruction in the function
            curea = NextHead(curea)
        return near_calls

    def get_functions_leading_to(self, _funcea):
        """
        This function returns all the calls made to the function at the
        provided address. This function is not recursive and only returns the
        first level of callers. Returns a matrix containing the address
        originating the call, the destination address and the name of the
        function/address called.

        Example:
            ...
            0x2C00: MAIN:
            0x2C00: pop r1
            0x2C01: load acc, 0
            0x2C03: call 0x2CC0
            0x2C05: load acc, 27h
            0x2C07: call 0x2D78
            0x2C09: push r1
            0x2C0A: ret
            ...
            0x2CC0 SUB__02CC0:
            0x2CC0 pop r1
            0x2CC1 load acc, 00
            0x2CC2 call 0x3DEE
            ...

            m = e.get_functions_leading_to(0x2CC0)
            print(m)
            [[0x2C03, 0x2CC0, 'SUB__02CC0']]

        @param _funcea Address within the function
        @return Matrix containing the source, destination and name of the function
            called, for each call made to the function.
        """
        # Retrieves the function at _funcea:
        func = self.get_function_at(_funcea)
        if (not func):
            print("[-] No function found at 0x{:x}.".format(_funcea))
            return []
        # Boundaries:
        startea = func.startEA
        endea = func.endEA
        # EA index:
        curea = startea
        # Results here:
        near_calls = []
        while (curea < endea):
            for xref in XrefsTo(curea):
                # Code 17 is the code for the 'Code_Near_Call' type of XREF
                if (xref.type == 17):
                    # Add the address of the call, the address called and the
                    # name of the function called.
                    call_info = [xref.frm, xref.to, GetFunctionName(xref.to)]
                    near_calls.append(call_info)
                    print("[*] 0x{:x}: {:s} -> {:s}.".format(
                        call_info[0],
                        GetFunctionName(call_info[0]),
                        GetFunctionName(call_info[1])))
            # Next instruction in the function
            curea = NextHead(curea)
        return near_calls

    def color_all_functions_from(self, _funcea, _color):
        """
        Sets the background color of all functions and sub functions called from the
        root function specified at the given address, i.e. this function is recursive.
        This function can be used to trace the call tree of a function. If it succeeds,
        the function returns a matrix of function calls as returned by the
        get_all_sub_functions_called function.

        Note: You may need to scroll around/refresh the GUI for the change to take
        effect. The background will remain as the default color otherwise.

        The value of the color must be in the following format: 0xBBGGRR. Some colors
        are defined in the header of the Enoki class.

        Example:
            m = e.color_all_functions_from(0x2CC0, Enoki.BABY_BLUE)
            print(m)
            [[0x2CC2, 0x3DEE, 'SUB__03DEE', 0]]

        Unlike the get_all_sub_functions_called function, this function will
        also change the background color in the GUI.

        @param _funcea Address within the root function
        @param _color The background color to set.
        @return Matrix containing the source, destination, name and depth of the
            functions called from the root function. Enoki.FAIL otherwise.
        """
        if (_funcea != BADADDR):
            fct_calls = self.get_all_sub_functions_called(_funcea, _visited=[])
            for fcall in fct_calls:
                # Color both the calling and the called function.
                self.set_function_color(fcall[0], _color)
                self.set_function_color(fcall[1], _color)
            return fct_calls
        else:
            return Enoki.FAIL

    def set_function_color(self, _funcea, _color):
        """
        Sets the background color of the function at the specified address. The value
        of the color must be in the following format: 0xBBGGRR. Some colors
        are defined in the header of the Enoki class.

        Example:
            Red:
            e.set_function_color(0x2C00, 0x0000FF)

            Blue:
            e.set_function_color(0x2C00, 0xFF0000)

            Yellow:
            e.set_function_color(0x2C00, Enoki.YELLOW)

        @param _funcea Address within the function
        @param _color The background color to set.
        @return Enoki.SUCCESS if the background color was changed. Enoki.FAIL otherwise.
        """
        if (_funcea != BADADDR):
            idc.SetColor(_funcea, CIC_FUNC, _color)
            return Enoki.SUCCESS
        return Enoki.FAIL

    def get_bytes_between(self, _startea, _endea):
        """
        Returns bytes located between the provided start and end addresses.

        @param _startea The start address
        @param _endea The end address (inclusive)
        @return An array of bytes located between the addresses specified.
        """
        data = []
        if (_startea != BADADDR and _endea != BADADDR):
            curea = _startea
            while (curea <= _endea):
                b = idaapi.get_byte(curea)
                data.append(b)
                curea += 1
        return data

    def get_words_between(self, _startea, _endea):
        """
        Returns words located between the provided start and end addresses.

        @param _startea The start address
        @param _endea The end address (inclusive)
        @return An array of words located between the addresses specified.
        """
        words = []
        if (_startea != BADADDR and _endea != BADADDR):
            curea = _startea
            while (curea <= _endea):
                w = idaapi.get_16bit(curea)
                words.append(w)
                curea += 1
        return words

    def get_disasm_between(self, _startea, _endea):
        """
        Returns a list of disassembled code between the two addresses
        provided.

        Example:
            Python>a = e.get_disasm_between(0x2C00, 0x2C10)
            Python>a
            ['pop ar0', 'sar ar0, *', 'sar ar1, *', 'lar ar0, #106', ...]

        @param _startea The starting address of the section
        @param _endea The ending address of the section (inclusive)
        @return A list of instructions, returns an empty list ([]) if an error occurred.
        """
        lines = []
        if (_startea != BADADDR and _endea != BADADDR):
            if (_startea > _endea):
                # Swap the boundaries so that the range is always ascending.
                _startea, _endea = _endea, _startea
            curea = _startea

            while (curea <= _endea):
                disasm = self.get_disasm(curea)
                lines.append(disasm)
                curea = NextHead(curea)
        return lines

    def get_disasm_function_line(self, _funcea):
        """
        Returns a list of disassembled instructions from the function at the
        given address.

        Example:
            Python>a = e.get_disasm_function_line(0x2CD0)
            Python>a
            ['popd *+', 'sar ar0, *+', 'sar ar1, *', ...]

        @param _funcea Address within the function
        @return A list of instructions, returns an empty list ([]) if an error occurred.
        """
        if (_funcea != BADADDR):
            func = self.get_function_at(_funcea)
            if (func):
                return self.get_disasm_between(func.startEA, func.endEA-1)
        return []

    def get_disasm_all_functions_from(self, _funcea):
        """
        Retrieves the disassembled code of the function at the specified
        address and of all functions called from that function. This function is recursive
        and can take a while to complete. Depending on the complexity of the root function,
        it may also take considerable memory resources.

        If successful, this function returns a dictionary. The keys are the names
        of the functions and the values are lists of strings containing the instructions
        of each function.

        Example:
            Python>a = e.get_disasm_all_functions_from(0x2C00)
            Python>print(a)
            {'sub_2C00': ['popd *+', 'sar ar0, *+', 'sar ar1, ...],
            ...
            'sub_23CC': ['popd *+', 'sar ar0, *+', 'sar ar1, ...]
            }

        @param _funcea Address within the function
        @return A dictionary using the key-value pair ("function_name", [instructions])
        """
        fdisasm = {}
        if (_funcea != BADADDR):
            froot_disasm = self.get_disasm_function_line(_funcea)
            froot_name = GetFunctionName(_funcea)
            fdisasm[froot_name] = froot_disasm
            fcalled = self.get_all_sub_functions_called(_funcea, _visited=[])
            if (len(fcalled) > 0):
                print("[*] Retrieving assembly from {:d} function(s).".format(len(fcalled)))
                for finfo in fcalled:
                    fea = finfo[1]
                    fname = finfo[2]
                    fcode = self.get_disasm_function_line(fea)
                    fdisasm[fname] = fcode
        return fdisasm

    def function_find_all(self, _funcea, _criteria):
        """
        Retrieves all instructions within the specified function that match
        any of the regular expressions provided in the list '_criteria'.

        Example:
            Python>r = e.function_find_all(0xA8BF, ["popd", "#15Ah"])
            Python>print(r)
            ['popd *+ ; Pop Top of Stack',
            'ldp #15Ah ']

        @param _funcea Address within the function to search
        @param _criteria A list of regular expressions to match against
            every instruction in the function.
        @return A list of instructions matching the provided search criteria
        """
        found_ins = []
        if (_funcea != BADADDR):
            # Accept a single regular expression as well as a list of them.
            if (not isinstance(_criteria, (list, tuple))):
                _criteria = [_criteria]

            fdisasm = self.get_disasm_function_line(_funcea)
            for ins in fdisasm:
                for crit in _criteria:
                    if (re.search(crit, ins)):
                        found_ins.append(ins)
        return found_ins

    def function_find_all_ea(self, _funcea, _criteria):
        """
        Retrieves all instructions within the specified function that match
        any of the regular expressions provided in the list '_criteria', along
        with the address at which each matching instruction was found.

        Example:
            Python>r = e.function_find_all_ea(0xA8BF, ["popd", "#15Ah"])
            Python>print(r)
            [(0xA8C5, 'popd *+ ; Pop Top of Stack'),
            (0xA8D9, 'ldp #15Ah ')]

        @param _funcea Address within the function to search
        @param _criteria A list of regular expressions to match against
            every instruction in the function.
        @return A list of tuples containing the address and the instruction
            matching the provided search criteria
        """
        found_ins = []
        if (_funcea != BADADDR):
            # Accept a single regular expression as well as a list of them.
            if (not isinstance(_criteria, (list, tuple))):
                _criteria = [_criteria]

            func = self.get_function_at(_funcea)
            if (not func):
                return found_ins
            curea = func.startEA
            while (curea < func.endEA):
                ins_disasm = self.get_disasm(curea)

                for c in _criteria:
                    if (re.search(c, ins_disasm)):
                        found_ins.append((curea, ins_disasm))

                curea = NextHead(curea)

        return found_ins

    def function_contains_all(self, _funcea, _criteria):
        """
        Verifies if ALL the regular expressions in the _criteria arguments
        have a matching instruction in the function at the given address.
        If one or more of the regular expressions provided do not match any
        instruction, this function will return False.

        Example:
            popd *+
            sar ar0, *+
            sar ar1, *
            lar ar0, #1
            lar ar0, *0+, ar2 ;(dseg:0001)
            ...

            Python>e.function_contains_all(0xBFDC, ["popd", "lar\\s+ar"])
            True
            Python>e.function_contains_all(0xBFDC, ["popd", "lar\\s+ar7"])
            False

        @param _funcea Address within the function to search
        @param _criteria A list of regular expressions to match against each instruction of
            the function.
        @return True if all regular expressions were matched, False otherwise.
        """
        if (_funcea != BADADDR):
            # Accept a single regular expression as well as a list of them.
            if (not isinstance(_criteria, (list, tuple))):
                _criteria = [_criteria]

            fdisasm = self.get_disasm_function_line(_funcea)

            if (len(fdisasm) > 0):
                for crit in _criteria:
                    # Each criterion must match at least one instruction.
                    if (not any(re.search(crit, ins) for ins in fdisasm)):
                        return False

                return True
        return False

    def find_all_functions_contain(self, _criteria, _startea=MinEA(), _endea=MaxEA()):
        """
        This function will look for all functions between the given boundaries that contain
        instructions matching all regular expressions in the given list.

        Example:
            Python>e.find_all_functions_contain(["popd", "lar\\s+ar"], _startea=0x8F59, _endea=0x9000)
            ['sub_8F42', 'sub_8F97', 'sub_8FE8', ...]

            Python>e.find_all_functions_contain(["popd", "lar\\s+ar7"], _startea=0x8000, _endea=0x8FFF)
            []

        @param _criteria A list of regular expressions to match against each instruction of
            the function.
        @param _startea The starting address of the search.
            If no value is specified, MinEA() is used.
        @param _endea The ending address of the search. If no value is specified, MaxEA() is
            used.
        @return A list containing the names of the matching functions.
        """
        found = []
        f = self.get_function_at(_startea)
        if (not f):
            # _startea is not inside a function: start at the next one, if any.
            f = idaapi.get_next_func(_startea)
        # Stop at the last function or at the first one starting past _endea.
        while (f and f.startEA <= _endea):
            fname = GetFunctionName(f.startEA)
            if (self.function_contains_all(f.startEA, _criteria)):
                found.append(fname)
            f = idaapi.get_next_func(f.startEA)
        return found

    def search_code_all_functions_from(self, _funcea, _search):
        """
        This function searches all the disassembly of the function at the
        given address and all functions called from it for the specified
        regular expression.

        Example:
            Python>a = e.search_code_all_functions_from(0x0800, "1EFh")
            Python>print(a)
            [('sub_0800', 'lacc #1EFh, *+'), ('sub_0800', 'lacc #1EFh, *+')]

        In the example above, two LACC instructions containing "1EFh" are found
        in the same function, hence the function appears twice in the results.

        @param _funcea Address within the function to search
        @param _search A regular expression to search for in the disassembled code.
        @return A list of tuples containing the name of the function and the
            matching instructions.
        """
        results = []
        if (_funcea != BADADDR):
            disasm = self.get_disasm_all_functions_from(_funcea)
            for fname, fcode in disasm.items():
                for ins in fcode:
                    if re.search(_search, ins):
                        results.append((fname, ins))
        return results

    def find_similar_functions_in_tree(self, _funcea, _startea, _threshold=1.0):
        """
        Attempts to find other functions similar to the one specified in the call tree
        of the given function.

        This function will accept the address of a function and navigate the call tree
        of the second address provided.
        The instructions of both functions are compared, and if their similarity
        is at or above the specified threshold, the function from the call tree is
        added to the results.

        The function returns a matrix in the following format:
        [
            [address1, name1, ratio1],
            ...
            [addressN, nameN, ratioN]
        ]

        @param _funcea Address within the function to compare against
        @param _startea Address of the starting function of the call tree
        @param _threshold Minimum similarity ratio required for a function to be
            included in the results. Defaults to 1.0.
        @return A matrix containing the address, name and ratio of the functions
            found.
        """
        results = []
        if (_funcea != BADADDR):
            tree = self.get_all_sub_functions_called(_startea, _visited=[])
            for fcall in tree:
                fcalled_ea = fcall[1]
                fcalled_name = fcall[2]
                ratio = self.compare_functions(_funcea, fcalled_ea)
                if (ratio >= _threshold):
                    results.append([fcalled_ea, fcalled_name, ratio])

        return results

    def save_range_to_file(self, _startea, _endea, _file):
        """
        Saves the chunk of bytes between the given start and end addresses into
        the given file.

        @param _startea The starting address of the chunk
        @param _endea The ending address of the chunk
        @param _file Name of the file to write.
        @return Enoki.SUCCESS if the file was written successfully, Enoki.FAIL
            otherwise.
        """
        if (_startea != BADADDR and _endea != BADADDR):
            try:
                # The length is twice the address span, as in the original code;
                # this presumably reflects 16-bit addressable units on the target.
                chunk = bytearray(idc.GetManyBytes(_startea, ((_endea-_startea)+1)*2))
                print("Exporting {:d} byte chunk from 0x{:05x} to 0x{:05x} into {:s}.".format(len(chunk), _startea, _endea, _file))
                with open(_file, "wb") as f:
                    f.write(chunk)
            except Exception as e:
                # str(e) is used because e.message can be empty (e.g. for IOError).
                print("[-] Error while writing file: {:s}.".format(str(e)))
                return Enoki.FAIL
            return Enoki.SUCCESS
        # Invalid start or end address: nothing was written.
        return Enoki.FAIL

e = Enoki()
print("[+] Enoki {:s} loaded successfully.".format(e.vers()))
--------------------------------------------------------------------------------
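The matching rule used by `function_contains_all` can be tried outside IDA. The sketch below is a minimal standalone approximation and is not part of enoki.py: `contains_all` and `SAMPLE` are hypothetical names, and the instruction list is copied from the docstring example instead of being read from a database.

```python
import re


def contains_all(lines, criteria):
    """Return True if every regex in `criteria` matches at least one line.

    Hypothetical helper: it mimics the matching rule of
    Enoki.function_contains_all on a plain list of disassembly strings.
    """
    # Accept a single pattern as well as a list or tuple of patterns.
    if not isinstance(criteria, (list, tuple)):
        criteria = [criteria]
    # An empty disassembly listing never matches.
    if not lines:
        return False
    return all(any(re.search(crit, line) for line in lines) for crit in criteria)


# Instructions taken from the function_contains_all docstring example.
SAMPLE = [
    "popd *+",
    "sar ar0, *+",
    "sar ar1, *",
    "lar ar0, #1",
    "lar ar0, *0+, ar2",
]

print(contains_all(SAMPLE, ["popd", r"lar\s+ar"]))   # True
print(contains_all(SAMPLE, ["popd", r"lar\s+ar7"]))  # False
```

Each pattern is tested independently against the whole listing, so the order of the criteria does not matter and a single instruction may satisfy more than one of them.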