├── Clear_All_Instruction_Colors.py
├── Minimize_Automatic_Function_Comments.py
├── README.md
├── Highlight_Target_Instructions.py
├── Label_Dynamically_Resolved_Iat_Entries.py
├── LICENSE
├── Utils.py
└── Preview_Function_Capabilities.py


/Clear_All_Instruction_Colors.py:
--------------------------------------------------------------------------------
# Clears all colors applied to instructions in program
#@author https://AGDCServices.com
#@category AGDCservices
#@keybinding
#@menupath
#@toolbar

'''
Removes all highlight colors from current program.
Applied highlighting colors are saved with the ghidra file.
This script can be used to remove the colors prior to exporting
and sharing the ghidra database so that the highlight colors
don't clash with different color schemes used by coworkers
'''

instructions = currentProgram.getListing().getInstructions(True)
for curInstr in instructions:
    clearBackgroundColor(curInstr.getAddress())

--------------------------------------------------------------------------------
/Minimize_Automatic_Function_Comments.py:
--------------------------------------------------------------------------------
# Adds a short repeatable comment to all functions to hide the automatic function comment
#@author https://AGDCServices.com
#@category AGDCservices
#@keybinding
#@menupath
#@toolbar

'''
Adds a single space as a repeatable comment to all functions
within the current program. By default, Ghidra adds a function
prototype as a repeatable comment to all functions. These comments
are very long which will force the code block to expand to its maximum
size within the graph view. These default comments do not add any real value
and decrease the amount of code that can be seen in the graph view.

Currently, there is no way to turn this option off.
A workaround is
to replace the repeatable comment with a single space so that you don't
see any comment by default, and the code block is not expanded out to
its maximum size because of the long function prototype comment.
'''

REPEATABLE_COMMENT = ghidra.program.model.listing.CodeUnit.REPEATABLE_COMMENT
listing = currentProgram.getListing()

commentCount = 0
for func in listing.getFunctions(True):
    listing.getCodeUnitAt(func.getEntryPoint()).setComment(REPEATABLE_COMMENT, ' ')
    commentCount += 1

print('Set {:d} repeatable function comments to a single space to prevent automatic function comments from being displayed'.format(commentCount))

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Ghidra Scripts
Custom scripts to make analyzing malware easier in Ghidra
## Installation
Add these scripts to your Ghidra scripts directory:
1. Open any file in Ghidra for analysis
2. Select the Window / Script Manager menu
3. Click the "Script Directories" icon in the upper right toolbar
4. Add the directory where your scripts are located via the green plus sign
5. All scripts will show up under the AGDCservices folder
## Clear_All_Instruction_Colors.py
Removes all highlight colors from current program. Applied highlighting colors are saved with the ghidra file.
This script can be used to remove the colors prior to exporting and sharing the ghidra database so that the highlight colors don't clash with different color schemes used by coworkers. See script header for more usage details.
## Preview_Function_Capabilities.py
This script will name all unidentified functions with a nomenclature that provides a preview of important capabilities included within the function and all child functions.
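
As a rough illustration of the naming idea only (this is not the script's implementation), the sketch below collects every call reachable from a function and its children and folds the matching ones into a short capability name. It is plain Python rather than Ghidra API code, and everything in it is an assumption made for the example: the `CAPABILITY_TAGS` table, the tag strings, the `f_` prefix, and the helper names `collect_calls` and `preview_name`.

```python
# Hypothetical capability table: API name -> short category tag.
# The real script carries its own hardcoded list; these entries are illustrative only.
CAPABILITY_TAGS = {
    'CreateFileA': 'file',
    'WriteFile': 'file',
    'RegSetValueExA': 'reg',
    'connect': 'net',
    'send': 'net',
}


def collect_calls(func, call_graph, seen=None):
    """Return every name reachable from func through call_graph (children included)."""
    if seen is None:
        seen = set()
    for callee in call_graph.get(func, ()):
        if callee not in seen:  # the seen set also protects against recursive calls
            seen.add(callee)
            collect_calls(callee, call_graph, seen)
    return seen


def preview_name(func, call_graph):
    """Build a short name from the capability categories reachable from func."""
    tags = sorted({CAPABILITY_TAGS[c] for c in collect_calls(func, call_graph)
                   if c in CAPABILITY_TAGS})
    # functions with no matching calls keep their existing name
    return 'f_' + '_'.join(tags) if tags else func


# Toy call graph: FUN_401000 calls one API directly and a child that calls two more.
call_graph = {
    'FUN_401000': ['CreateFileA', 'FUN_401100'],
    'FUN_401100': ['connect', 'send'],
}
print(preview_name('FUN_401000', call_graph))
```

Because the name records categories rather than individual APIs, two network calls in a child function contribute a single `net` tag to the parent's name, which is what keeps the name short.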

The script includes a list of hardcoded important API calls. The script will locate all calls contained in the unidentified function and its child functions. For any of the calls which match the hardcoded API call list, a shorthand name will be applied to indicate which category of important call is contained within the function.

The naming nomenclature is based on capability and does not identify specific APIs. By keeping the syntax short and just for capability, you can get a preview of all the important capabilities within a function without having the name get enormous. See script header for more details.

For a video demonstration of this script, view the video "Ghidra Script To Name Function From Capabilities" on the AGDC Services channel of YouTube, https://youtu.be/s5weitGaKLw
## Highlight_Target_Instructions.py
Script to search all instructions in current program looking for target instructions of interest. When found,
a defined highlighting color will be applied to make it easy to identify target instructions. Target instructions are things like call instructions, potential crypto operations, pointer instructions, etc. Highlighting instructions of interest decreases the chance of missing important instructions when skimming malware code. See script header for more usage details.

**Default color choices are made to work with the AGDC_codeBrowser_##.tool. They can be changed to fit any coloring schema by modifying the defined color constants at the top of the script**
## Minimize_Automatic_Function_Comments.py
Adds a single space as a repeatable comment to all functions within the current program. By default, Ghidra adds a function prototype as a repeatable comment to all functions. These comments are very long which will force the code block to expand to its maximum size within the graph view.
These default comments do not add any real value and decrease the amount of code that can be seen in the graph view.

Currently, there is no way to turn this option off. A workaround is to replace the repeatable comment with a single space so that you don't see any comment by default, and the code block is not expanded out to
its maximum size because of the long function prototype comment. See script header for more usage details.
## Utils.py
A number of commonly used convenience functions to aid in rapid scripting, e.g. Get_Operand_As_Immediate_Value, Get_Next_Target_Instruction, Get_Bytes_List, etc. See script header for more usage details.
## Label_Dynamically_Resolved_Iat_Entries.py
Script to aid in reverse engineering files that dynamically resolve imports. Script will search program for all dynamically resolved imports and label them with the appropriate API name pulled from a provided labeled IAT dump file. Only resolved imports stored in global variables will be identified. This script will not label every resolved global variable, but only those that are used inside a call instruction.

The labeled IAT dump file must be generated by an associated program, "Dump_Labeled_Iat_Memory.exe". This program is located in another repo on this github site called "Misc Malware Anaysis Tools". See script header for more usage details.


--------------------------------------------------------------------------------
/Highlight_Target_Instructions.py:
--------------------------------------------------------------------------------
# Highlights target instructions using custom colors for easy identification
#@author https://AGDCServices.com
#@category AGDCservices
#@keybinding
#@menupath
#@toolbar

'''
Script will search all instructions in current program
looking for target instructions of interest.
When found,
a defined highlighting color will be applied to make it
easy to identify target instructions.

default color choices are made to work with the
AGDC_codeBrowser_14.tool. They can be changed to fit any
coloring schema by modifying the defined color constants
at the top of the program
'''

from java.awt import Color


# define RGB colors for target instructions

# color_default sets non-target instructions colors
# needed to account for bug in graph view
COLOR_DEFAULT = Color(255,255,255) # white
COLOR_CALL = Color(255, 220, 220) #light red
COLOR_POINTER = Color(200, 240, 255) # blue
COLOR_CRYPTO = Color(245, 205, 255) # violet
COLOR_STRING_OPERATION = Color(180,230,170) # green

#
# additional unused colors
#
# Color(255,255,180) #yellow
# Color(220,255,200) #very light green
# Color(255,200,100) #orange
# Color(220, 220, 220) #light grey
# Color(195, 195, 195) # dark grey



REG_TYPE = 512


# loop through all program instructions searching
# for target instructions.
# when found, apply defined
# color
instructions = currentProgram.getListing().getInstructions(True)
for curInstr in instructions:

    bIsTargetInstruction = False

    curMnem = curInstr.getMnemonicString().lower()

    # color call instructions
    if curMnem == 'call':
        bIsTargetInstruction = True
        setBackgroundColor(curInstr.getAddress(), COLOR_CALL)


    # color lea instructions
    if curMnem == 'lea':
        bIsTargetInstruction = True
        setBackgroundColor(curInstr.getAddress(), COLOR_POINTER)


    #
    # color suspected crypto instructions
    #

    # xor that does not zero out the register
    if (curMnem == 'xor') and (curInstr.getOpObjects(0) != curInstr.getOpObjects(1)):
        bIsTargetInstruction = True
        setBackgroundColor(curInstr.getAddress(), COLOR_CRYPTO)


    # common RC4 instructions
    if (curMnem == 'cmp') and (curInstr.getOperandType(0) == REG_TYPE) and (curInstr.getOpObjects(1)[0].toString() == '0x100'):
        bIsTargetInstruction = True
        setBackgroundColor(curInstr.getAddress(), COLOR_CRYPTO)

    # misc math operations
    mathInstrList = ['sar', 'sal', 'shr', 'shl', 'ror', 'rol', 'idiv', 'div', 'imul', 'mul', 'not']
    if curMnem in mathInstrList:
        bIsTargetInstruction = True
        setBackgroundColor(curInstr.getAddress(), COLOR_CRYPTO)


    #
    #
    #



    # color string operations
    # skip instructions that start with 'c' to exclude conditional moves, e.g.
    # cmovs
    if (curMnem.startswith('c') == False) and (curMnem.endswith('x') == False) and ( ('scas' in curMnem) or ('movs' in curMnem) or ('stos' in curMnem) ):
        bIsTargetInstruction = True
        setBackgroundColor(curInstr.getAddress(), COLOR_STRING_OPERATION)




    # fixes ghidra bug in graph mode where if a color is applied to the first instruction of a code block
    # the color will also be applied to the rest of the instructions in that code block
    # by setting the color to every line that's not a target instruction to the default color,
    # target colors should be applied accurately
    # error only appears to be in graph view. colors will be correctly applied in flat view, but incorrect in graph view
    # if you just clear the colors instead of setting all the colors to the default color,
    # the error will still occur. In this case, it may get fixed by redrawing the graph,
    # but you will have to redraw the graph every time you come across an error
    if bIsTargetInstruction == False:
        setBackgroundColor(curInstr.getAddress(), COLOR_DEFAULT)




--------------------------------------------------------------------------------
/Label_Dynamically_Resolved_Iat_Entries.py:
--------------------------------------------------------------------------------
#Find dynamically resolved IAT locations and apply labels from input file
#@author https://AGDCServices.com
#@category AGDCservices
#@keybinding
#@menupath
#@toolbar

'''
Script will search program for all dynamically resolved
imports and label them with the appropriate API name pulled
from a provided labeled IAT dump file. Only resolved imports
stored in global variables will be identified.
This script will
not label every resolved global variable, but only those that
are used inside a call instruction

The labeled IAT dump file must be generated by an associated
program, "Dump_Labeled_Iat_Memory.exe". This program is located
in another repo on this github site called "Misc Malware Anaysis Tools"

usage:
    Run file inside a debugger up to the point where all
    dynamically resolved imports are resolved. At that point,
    run the associated "Dump_Labeled_Iat_Memory.exe" to create
    the labeled Iat dump file.

    Once you have the labeled IAT dump file, run this script.
    The script must be run prior to renaming any of the global
    IAT variables. The script will not overwrite any manually
    named global variables.

'''


def main():

    try:
        fileObject = askFile('Select Labeled Iat Dump File', 'Open')
    except:
        print('file could not be opened')
        quit()

    iatList = Get_Dynamically_Resolved_Iat_Addresses()
    Label_Dynamically_Resolved_Iat_Addresses(iatList, fileObject.getPath())


def Get_Dynamically_Resolved_Iat_Addresses():
    '''
    function will search current program for all
    calls to unresolved global variables and return
    a list of all the global variable addresses.
    '''

    instructions = currentProgram.getListing().getInstructions(True)
    iatSet = set()
    for curInstr in instructions:
        curMnem = curInstr.getMnemonicString().lower()
        if curMnem == 'call':
            operandRef = curInstr.getOperandReferences(0)
            if len(operandRef) != 0:
                operandRefEa = operandRef[0].getToAddress()
                curLabel = getSymbolAt(operandRefEa)
                if curLabel != None: # accounts for non memory references
                    if curLabel.getName().lower().startswith( ('dat_', 'byte_', 'word_', 'dword_', 'qword_') ):
                        iatSet.add(operandRefEa)


    return list(iatSet)


def Label_Dynamically_Resolved_Iat_Addresses(iatList, labeledIatDumpFileName):
    '''
    function will read in file with a format of:
    iatRva\tapiString
    each address in the iatList will be checked to
    see if there is an entry in the labeled Iat Dump File.
    If so, the iat label will be set to the api string
    from the input file

    iatList should be list of address objects
    '''

    with open(labeledIatDumpFileName, 'r') as fp:
        labeledIatList = fp.read().splitlines()

    imageBase = currentProgram.getImageBase().getOffset()
    labeledIatDict = dict()
    for i in labeledIatList:
        curRva, curIatLabel = i.split('\t')
        labeledIatDict[imageBase + int(curRva, 16)] = curIatLabel

    labeledCount = 0
    unresolvedList = []
    for entry in iatList:
        curIatLabel = labeledIatDict.get(entry.getOffset(), None)
        if curIatLabel != None:
            getSymbolAt(entry).setName(curIatLabel, ghidra.program.model.symbol.SourceType.USER_DEFINED)
            labeledCount += 1
        else:
            unresolvedList.append('could not resolve address 0x{:x}'.format(entry.getOffset()))

    # counts are printed in decimal; only addresses are printed in hex
    print('labeled {:d} dynamically resolved IAT entries'.format(labeledCount))

    if len(unresolvedList) != 0:
        print('[*] ERROR, was not able to resolve {:d} entries'.format(len(unresolvedList)))
        print('\n'.join(unresolvedList))





if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License.
      Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution.
      You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability.
      In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format.
      We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

--------------------------------------------------------------------------------
/Utils.py:
--------------------------------------------------------------------------------
from __main__ import *

'''
Utility module of common helper functions used
in building Ghidra scripts

Contained function prototypes below:
    Get_Bytes_List(targetEa, nLen)
    Get_Bytes_String(targetEa, nLen)
    Get_Ascii_String(targetEa)
    Set_Bytes_String(targetEa, patchStr)
    Get_Call_Xrefs_To(targetEa)
    Get_Prev_Target_Instruction(curInstr, mnem, N, MAX_INSTRUCTIONS = 9999)
    Get_Next_Target_Instruction(curInstr, mnem, N, MAX_INSTRUCTIONS = 9999)
    Get_Operand_As_Address(targetInstr, operandIndex)
    Get_Operand_As_Immediate_Value(targetInstr, operandIndex)
    Get_Operand_As_String(targetInstr, operandIndex)

'''

def Get_Bytes_List(targetEa, nLen):
    '''
    gets the bytes from memory, treating as unsigned bytes
    ghidra treats read bytes as signed which is not what
    you normally want when reading memory, e.g.
    if you call
    getBytes on a byte 0xfe, you won't get 0xfe, you'll get -2
    this may not be an issue depending on what operation you
    are performing, or it may, e.g. reading a byte that is
    displayed as a negative value will fail when compared to
    the two's complement hex (-2 != 0xfe). If you're using
    the byte to patch the program, it may work ok.

    returns result as a list
    '''

    signedList = list(getBytes(targetEa, nLen))
    unsignedList = []
    for curByte in signedList:
        if curByte < 0:
            uByte = (0xff - abs(curByte) + 1)
        else:
            uByte = curByte
        unsignedList.append(uByte)

    return unsignedList

def Get_Bytes_String(targetEa, nLen):
    '''
    gets the bytes from memory, treating as unsigned bytes
    ghidra treats read bytes as signed which is not what
    you normally want when reading memory, e.g. if you call
    getBytes on a byte 0xfe, you won't get 0xfe, you'll get -2
    this may not be an issue depending on what operation you
    are performing, or it may, e.g. reading a byte that is
    displayed as a negative value will fail when compared to
    the two's complement hex (-2 != 0xfe). If you're using
    the byte to patch the program, it may work ok.

    returns result as a string
    '''

    signedList = list(getBytes(targetEa, nLen))
    unsignedList = []
    for curByte in signedList:
        if curByte < 0:
            uByte = (0xff - abs(curByte) + 1)
        else:
            uByte = curByte
        unsignedList.append(chr(uByte))

    return ''.join(unsignedList)


def Get_Ascii_String(targetEa):
    '''
    returns the null terminated ascii string starting
    at targetEa.
    Returns a string object and does not
    include the terminating null character

    targetEa must be an address object
    '''

    result = ''
    i = 0
    while True:
        curByte = chr(getByte(targetEa.add(i)))
        if curByte == chr(0): break
        result += curByte
        i += 1

    return result

def Set_Bytes_String(targetEa, patchStr):
    '''
    writes the patchStr to targetEa
    done in a loop with setByte instead of setBytes
    to avoid having to deal with bytearray in jython
    '''

    for i, v in enumerate(patchStr):
        setByte(targetEa.add(i), ord(v))


def Get_Call_Xrefs_To(targetEa):
    '''
    returns list of addresses which call the targetEa

    '''

    callEaList = []
    for ref in getReferencesTo(targetEa):
        # getInstructionAt returns None when the reference comes from data,
        # so skip those instead of failing on the mnemonic lookup
        fromInstr = getInstructionAt(ref.getFromAddress())
        if fromInstr is not None and fromInstr.getMnemonicString().lower() == 'call':
            callEaList.append(ref.getFromAddress())

    return callEaList

def Get_Prev_Target_Instruction(curInstr, mnem, N, MAX_INSTRUCTIONS = 9999):
    '''
    gets N'th previous target instruction from the curInstr
    function will only go back MAX_INSTRUCTIONS
    function will not search outside of current function if the
    current instruction is inside a defined function
    returns None on failure
    '''


    # get address set of current function to use in determining if prev instruction
    # is outside of current function
    try:
        funcBody = getFunctionContaining(curInstr.getAddress()).getBody()
    except:
        funcBody = None


    # get Nth prev instruction
    totalInstructionCount = 0
    targetInstructionCount = 0
    while (totalInstructionCount < MAX_INSTRUCTIONS) and (targetInstructionCount < N):
        curInstr = curInstr.getPrevious()

        if curInstr == None: break
        if funcBody != None:
            if funcBody.contains(curInstr.getAddress()) == False: break

if curInstr.getMnemonicString().lower() == mnem.lower(): targetInstructionCount += 1 146 | 147 | totalInstructionCount += 1 148 | 149 | 150 | # return the results 151 | if targetInstructionCount == N: 152 | result = curInstr 153 | else: 154 | result = None 155 | 156 | return result 157 | 158 | def Get_Next_Target_Instruction(curInstr, mnem, N, MAX_INSTRUCTIONS = 9999): 159 | ''' 160 | gets N'th next target instruction from the curInstr 161 | function will only go forward MAX_INSTRUCTIONS 162 | function will not search outside of current function if the 163 | current instruction is inside defined function 164 | returns None on failure 165 | ''' 166 | 167 | # get address set of current function to use in determining if prev instruction 168 | # is outside of current function 169 | try: 170 | funcBody = getFunctionContaining(curInstr.getAddress()).getBody() 171 | except: 172 | funcBody = None 173 | 174 | 175 | # get Nth next instruction 176 | totalInstructionCount = 0 177 | targetInstructionCount = 0 178 | while (totalInstructionCount < MAX_INSTRUCTIONS) and (targetInstructionCount < N): 179 | curInstr = curInstr.getNext() 180 | 181 | if curInstr == None: break 182 | if funcBody != None: 183 | if funcBody.contains(curInstr.getAddress()) == False: break 184 | 185 | if curInstr.getMnemonicString().lower() == mnem.lower(): targetInstructionCount += 1 186 | 187 | totalInstructionCount += 1 188 | 189 | 190 | # return the results 191 | if targetInstructionCount == N: 192 | result = curInstr 193 | else: 194 | result = None 195 | 196 | return result 197 | 198 | def Get_Operand_As_Address(targetInstr, operandIndex): 199 | ''' 200 | returns the value for the operandIndex operand of the 201 | target instruction treated as an address. if the 202 | target operand can not be treated as an address, 203 | returns None. 
operandIndex starts at 0 204 | 205 | If this is called on jumps or calls, the final 206 | address jumped to / called will be returned 207 | 208 | There are no real checks for validity and it's up to 209 | the author to ensure the target operand should be an address 210 | 211 | ''' 212 | 213 | # error check 214 | if operandIndex >= targetInstr.getNumOperands(): 215 | print('[*] Error in Get_Operand_As_Address. operandIndex is too large at {:s}'.format(targetInstr.getAddress().toString())) 216 | return None 217 | elif targetInstr.getNumOperands() == 0: 218 | return None 219 | 220 | 221 | operand = targetInstr.getOpObjects(operandIndex)[0] 222 | if type(operand) == ghidra.program.model.scalar.Scalar: 223 | targetValue = toAddr(operand.getValue()) 224 | elif type(operand) == ghidra.program.model.address.GenericAddress: 225 | targetValue = operand 226 | else: 227 | targetValue = None 228 | 229 | return targetValue 230 | 231 | def Get_Operand_As_Immediate_Value(targetInstr, operandIndex): 232 | ''' 233 | returns the value for the operandIndex operand of the target instruction 234 | if the target operand is not an immediate value, the function will attempt 235 | to find where the variable was previously set. It will ONLY search within 236 | the current function to find where the variable was previously set. 237 | if operand value can not be determined, returns None 238 | operandIndex starts at 0 239 | ''' 240 | 241 | # operand types are typically different if operand is 242 | # used in a call versus not a call and if there is a 243 | # reference or not 244 | OP_TYPE_IMMEDIATE = 16384 245 | OP_TYPE_NO_CALL_REG = 512 246 | OP_TYPE_NO_CALL_STACK = 4202496 247 | # global variables have numerous reference types 248 | # unsure how to differentiate the different types 249 | 250 | 251 | # error check 252 | if operandIndex >= targetInstr.getNumOperands(): 253 | print('[*] Error in Get_Operand_As_Immediate_Value. 
operandIndex is too large at {:s}'.format(targetInstr.getAddress().toString())) 254 | return None 255 | elif targetInstr.getNumOperands() == 0: 256 | return None 257 | 258 | 259 | # get address set of current function to use in determining 260 | # if prev instruction is outside of current function 261 | try: 262 | funcBody = getFunctionContaining(targetInstr.getAddress()).getBody() 263 | except: 264 | funcBody = None 265 | 266 | 267 | # find the actual operand value 268 | targetValue = None 269 | opType = targetInstr.getOperandType(operandIndex) 270 | # if operand is a direct number 271 | if opType == OP_TYPE_IMMEDIATE: 272 | targetValue = targetInstr.getOpObjects(operandIndex)[0].getValue() 273 | # else if operand is a register 274 | elif opType == OP_TYPE_NO_CALL_REG: 275 | regName = targetInstr.getOpObjects(operandIndex)[0].getName().lower() 276 | 277 | # search for previous location where register value was set 278 | curInstr = targetInstr 279 | while True: 280 | curInstr = curInstr.getPrevious() 281 | 282 | # check to make sure curInstr is valid 283 | if curInstr == None: break 284 | if funcBody != None: 285 | if funcBody.contains(curInstr.getAddress()) == False: break 286 | 287 | # check different variations of how register values get set 288 | curMnem = curInstr.getMnemonicString().lower() 289 | if (curMnem == 'mov') and (curInstr.getOperandType(0) == OP_TYPE_NO_CALL_REG): 290 | if curInstr.getOpObjects(0)[0].getName().lower() == regName: 291 | if curInstr.getOperandType(1) == OP_TYPE_IMMEDIATE: 292 | targetValue = curInstr.getOpObjects(1)[0].getValue() 293 | elif curInstr.getOperandType(1) == OP_TYPE_NO_CALL_REG: 294 | targetValue = Get_Operand_As_Immediate_Value(curInstr, 1) 295 | break 296 | elif (curMnem == 'xor'): 297 | operand1 = curInstr.getOpObjects(0)[0] 298 | operand2 = curInstr.getOpObjects(1)[0] 299 | op1Type = curInstr.getOperandType(0) 300 | op2Type = curInstr.getOperandType(1) 301 | 302 | if (op1Type == OP_TYPE_NO_CALL_REG) and (op2Type == 
OP_TYPE_NO_CALL_REG):
303 |                     if (operand1.getName().lower() == regName) and (operand2.getName().lower() == regName):
304 |                         targetValue = 0
305 |                         break
306 |             elif (curMnem == 'pop') and (curInstr.getOperandType(0) == OP_TYPE_NO_CALL_REG):
307 |                 if curInstr.getOpObjects(0)[0].getName().lower() == regName:
308 |                     # find previous push
309 |                     # NOTE: assumes previous push corresponds to pop but
310 |                     # will fail if there is a function call in-between
311 |                     tmpCurInstr = curInstr.getPrevious()
312 |                     while True:
313 |                         # check to make sure tmpCurInstr is valid
314 |                         if tmpCurInstr == None: break
315 |                         if funcBody != None:
316 |                             if funcBody.contains(tmpCurInstr.getAddress()) == False: break
317 |
318 |                         if tmpCurInstr.getMnemonicString().lower() == 'push':
319 |                             if tmpCurInstr.getOperandType(0) == OP_TYPE_IMMEDIATE:
320 |                                 targetValue = tmpCurInstr.getOpObjects(0)[0].getValue()
321 |                             break
322 |                         tmpCurInstr = tmpCurInstr.getPrevious() # not a push: step back one instruction, otherwise this loop never advances
323 |                     # break out of outer while loop
324 |                     break
325 |     # if operand is a stack variable
326 |     elif opType == OP_TYPE_NO_CALL_STACK:
327 |         stackOffset = targetInstr.getOperandReferences(operandIndex)[0].getStackOffset()
328 |
329 |         # search for previous location where stack variable value was set
330 |         curInstr = targetInstr
331 |         while True:
332 |             curInstr = curInstr.getPrevious()
333 |
334 |             # check to make sure curInstr is valid
335 |             if curInstr == None: break
336 |             if funcBody != None:
337 |                 if funcBody.contains(curInstr.getAddress()) == False: break
338 |
339 |             # find where stack variable was set
340 |             curMnem = curInstr.getMnemonicString().lower()
341 |             if (curMnem == 'mov') and (curInstr.getOperandType(0) == OP_TYPE_NO_CALL_STACK):
342 |                 if curInstr.getOperandReferences(0)[0].getStackOffset() == stackOffset:
343 |                     if curInstr.getOperandType(1) == OP_TYPE_IMMEDIATE:
344 |                         targetValue = curInstr.getOpObjects(1)[0].getValue()
345 |                     break
346 |
347 |
348 |
349 |
350 |     return targetValue
351 |
352 | def Get_Operand_As_String(targetInstr, operandIndex):
353 |     '''
354 |     returns the
value for the operandIndex operand of the
355 |     target instruction treated as a string.
356 |     operandIndex starts at 0
357 |
358 |     If this is called on jumps or calls, the final
359 |     address jumped to / called will be returned
360 |
361 |     '''
362 |
363 |     # error check
364 |     if operandIndex >= targetInstr.getNumOperands():
365 |         print('[*] Error in Get_Operand_As_String. operandIndex is too large at {:s}'.format(targetInstr.getAddress().toString()))
366 |         return None
367 |     elif targetInstr.getNumOperands() == 0:
368 |         return None
369 |
370 |
371 |     operand = targetInstr.getOpObjects(operandIndex)[0]
372 |
373 |     return operand.toString()
374 |
375 |
376 |
377 |
378 |
379 |
--------------------------------------------------------------------------------
/Preview_Function_Capabilities.py:
--------------------------------------------------------------------------------
1 | # Names unidentified functions with a nomenclature that provides a preview of included capabilities within the function
2 | #@author https://AGDCServices.com
3 | #@category AGDCservices
4 | #@keybinding
5 | #@menupath
6 | #@toolbar
7 |
8 | '''
9 | This script will name all unidentified functions with a nomenclature
10 | that provides a preview of important capabilities included within the
11 | function and all child functions.
12 |
13 | The script includes a list of hardcoded important API calls. The
14 | script will locate all calls contained in the unidentified function
15 | and its child functions. For any of the calls which match
16 | the hardcoded API call list, a shorthand name will be applied to
17 | indicate which category of important call is contained within the function.
18 |
19 | The naming nomenclature is based on capability and does not identify
20 | specific APIs. By keeping the syntax short and just for capability,
21 | you can get a preview of all the important capabilities within a function
22 | without having the name get enormous.
23 |
24 | The naming convention is as follows:
25 |     - all functions automatically named will start with f_p__
26 |     - a function will only be renamed if it starts with either the
27 |       Ghidra default function prefix or this script's default function prefix.
28 |       If any other name is found, it is expected the function was either
29 |       manually named or identified by a library signature, and it is
30 |       assumed those names are more accurate than the automated preview name.
31 |     - each category will be separated by a double underscore
32 |     - within each category, a specific capability is identified by a
33 |       single "preview" letter.
34 |     - if the preview letter is uppercase, it means the capability
35 |       was found in the current function. If the preview letter is
36 |       lowercase, it means the capability was found somewhere in a
37 |       child function.
38 |     - the last entry of the preview name will be the function address.
39 |       This is because Ghidra allows duplicate names, but when a name is
40 |       selected, all copies are highlighted based only on the name.
41 |       Because you often get duplicates of the preview name, adding the
42 |       function's address to the end will make each name unique so you can
43 |       easily differentiate functions with the same base preview name.
44 |
45 | One exception to the naming convention is functions which are the
46 | start of a thread. These functions will only have the category
47 | TS applied and will not contain any capability preview.
48 | Because the thread starts are almost like mini-programs, this
49 | identifier is used just to identify the starting functions so you
50 | can manually review them to determine the general capabilities
51 |
52 | The preview letters are all single characters that are typically
53 | the first letter of the capability. The categories and preview
54 | letters used are below.
To see the specific API calls that 55 | correspond to each capability, see the list at the top of the 56 | function, Build_New_Func_Name() 57 | 58 | 59 | TS = thread start (no further capability preview will be applied) 60 | 61 | netw = networking functionality 62 | b = build 63 | c = connect 64 | l = listen 65 | s = send 66 | r = receive 67 | t = terminate 68 | m = modify 69 | 70 | reg = registry functionality 71 | h = handle 72 | r = read 73 | w = write 74 | d = delete 75 | 76 | file = file processing functionality 77 | h = handle 78 | r = read 79 | w = write 80 | d = delete 81 | c = copy 82 | m = move 83 | e = enumerate 84 | 85 | proc = process manipulation functionality 86 | h = handle 87 | e = enumerate 88 | c = create 89 | t = terminate 90 | r = read process memory 91 | w = write process memory 92 | 93 | serv = service manipulation functionality 94 | h = handle 95 | c = create 96 | d = delete 97 | s = start 98 | r = read 99 | w = write 100 | 101 | thread = thread functionality 102 | c = create 103 | o = open 104 | s = suspend 105 | r = resume 106 | 107 | str = string manipulation functionality 108 | c = compare 109 | 110 | zc = there were no call instructions in the function 111 | 112 | xref = number of cross references for the function 113 | 114 | ''' 115 | 116 | 117 | import re 118 | import collections 119 | 120 | 121 | GHIDRA_FUNC_PREFIX = 'FUN_' 122 | CUSTOM_AUTO_FUNC_PREFIX = 'f_p__' 123 | CUSTOM_AUTO_THREAD_FUNC_PREFIX = 'f_p__TS__' 124 | 125 | OP_TYPE_PUSH_REGISTER = 512 126 | #OP_TYPE_CALL_REGISTER_NO_REFERENCE = 516 127 | #OP_TYPE_CALL_REGISTER_WITH_REFERENCE = 8708 128 | OP_TYPE_CALL_STATIC_FUNCTION = 8256 129 | OP_TYPE_CALL_DATA_VARIABLE = 8324 # with or without known reference 130 | #OP_TYPE_CALL_STACK_VARIABLE = 4202500 131 | 132 | 133 | 134 | def main(): 135 | 136 | print('{:s}\n{:s}'.format('=' * 100, 'Function_Preview Script Starting')) 137 | 138 | 139 | 140 | # 141 | # rename thread start functions 142 | # do this first so to potentially 
create new functions 143 | # because often the thread start functions don't get 144 | # analyzed by default 145 | # 146 | 147 | # get initial thread starts 148 | threadRootsList = Get_Thread_Roots() 149 | 150 | # rename thread starts with auto name 151 | for rootEa in threadRootsList: 152 | newFuncName = '{:s}{:s}{:s}'.format(CUSTOM_AUTO_THREAD_FUNC_PREFIX , GHIDRA_FUNC_PREFIX, rootEa.toString()) 153 | 154 | curFunc = getFunctionAt(rootEa) 155 | if curFunc == None: 156 | createFunction(rootEa, newFuncName) 157 | else: 158 | curFunc.setName(newFuncName, ghidra.program.model.symbol.SourceType.USER_DEFINED) 159 | 160 | 161 | 162 | # 163 | # get list of all functions to rename and leaf nodes. Get leaf nodes by 164 | # checking if each function is a parent. leaf nodes will not be a parent functions 165 | # ignore library / thunk functions 166 | # 167 | 168 | 169 | # start with all unidentified functions, i.e. all functions that start with the 170 | # Ghidra standard function prefix or this scripts custom function prefix 171 | # assume any other function name was either named from a library signature or manually by 172 | # a user, and you don't want to overwrite those function names. Also ignore thunk functions 173 | # skip thread start functions because having all of the target functionality added to the 174 | # thread function name is generally overkill. 
175 |     funcList = [f for f in currentProgram.getListing().getFunctions(True) if f.getName().startswith( (GHIDRA_FUNC_PREFIX, CUSTOM_AUTO_FUNC_PREFIX) ) and not f.getName().startswith(CUSTOM_AUTO_THREAD_FUNC_PREFIX)]
176 |     funcList = [f for f in funcList[:] if f.isThunk() == False]
177 |
178 |     # identify all parent nodes within unidentified function set
179 |     parentNodes = set()
180 |     for curFunc in funcList:
181 |         curParentNodes = curFunc.getCallingFunctions(monitor)
182 |         parentNodes.update(curParentNodes)
183 |
184 |
185 |     # store all functions that are not a parent as a leaf node
186 |     leafNodes = [f for f in funcList if f not in parentNodes ]
187 |
188 |
189 |
190 |     #
191 |     # recursively apply renaming to unidentified functions starting from leaf nodes
192 |     # up through parents. This will ensure child functionality is propagated
193 |     # up through the parent functions
194 |     #
195 |     # do recursively until no changes are made. This will ensure that all of the
196 |     # child function capabilities are propagated up through the parents
197 |     #
198 |     while True:
199 |         funcRenamedCount = 0
200 |         nodesTraversed = set()
201 |         curNodes = leafNodes[:]
202 |         while True:
203 |
204 |             # rename each function in current level of nodes
205 |             parentNodes = set()
206 |             for curFunc in curNodes:
207 |
208 |                 # rename function and track if new name is actually different than old name
209 |                 # this count is used to determine when to finish recursively renaming functions
210 |                 oldFuncName = curFunc.getName()
211 |                 newFuncNameProposed = Build_New_Func_Name(curFunc)
212 |                 curFunc.setName(newFuncNameProposed, ghidra.program.model.symbol.SourceType.USER_DEFINED)
213 |                 newFuncNameActual = curFunc.getName()
214 |                 if oldFuncName != newFuncNameActual: funcRenamedCount += 1
215 |
216 |                 # add current function into nodesTraversed so you can check for infinite loops
217 |                 nodesTraversed.add(curFunc)
218 |
219 |                 # get parent nodes that are in the unidentified functions list
220 |                 # ignore any parents
not in that list assuming they are library 221 | # calls or other functions we don't want to overwrite 222 | curParentNodes = curFunc.getCallingFunctions(monitor) 223 | parentNodes.update( curParentNodes & set(funcList) ) 224 | 225 | # remove any functions from the nodesTraversed list to eliminate infinite loops 226 | parentNodes = parentNodes - nodesTraversed 227 | 228 | 229 | # inner whie loop exit condition 230 | if len(parentNodes) == 0: break 231 | 232 | # copy parentNodes to curNodes to rename in next iteration of loop 233 | curNodes = parentNodes.copy() 234 | 235 | # outer while loop exit condition 236 | if funcRenamedCount == 0: break 237 | 238 | 239 | print('{:s}\n{:s}'.format('Function_Preview Script Completed', '=' * 100)) 240 | 241 | 242 | 243 | 244 | def Get_Prev_Target_Instruction(curInstr, mnem, N, MAX_INSTRUCTIONS = 9999): 245 | ''' 246 | gets N'th previous target instruction from the curInstr 247 | function will only go back MAX_INSTRUCTIONS 248 | function will not search outside of current function if the 249 | current instruction is inside a defined function 250 | returns None on failure 251 | ''' 252 | 253 | 254 | # get address set of current function to use in determining if prev instruction 255 | # is outside of current function 256 | try: 257 | funcBody = getFunctionContaining(curInstr.getAddress()).getBody() 258 | except: 259 | funcBody = None 260 | 261 | 262 | # get Nth prev instruction 263 | totalInstructionCount = 0 264 | targetInstructionCount = 0 265 | while (totalInstructionCount < MAX_INSTRUCTIONS) and (targetInstructionCount < N): 266 | curInstr = curInstr.getPrevious() 267 | 268 | if curInstr == None: break 269 | if funcBody != None: 270 | if funcBody.contains(curInstr.getAddress()) == False: break 271 | 272 | if curInstr.getMnemonicString().lower() == mnem.lower(): targetInstructionCount += 1 273 | 274 | totalInstructionCount += 1 275 | 276 | 277 | # return the results 278 | if targetInstructionCount == N: 279 | result = curInstr 
280 | else: 281 | result = None 282 | 283 | return result 284 | 285 | 286 | 287 | 288 | 289 | 290 | def Get_Thread_Roots(): 291 | ''' 292 | returns a list of addresses of the root functions for all threads 293 | found in the program 294 | ''' 295 | 296 | # list of thread creation functions 297 | funcNamesList = ['CreateThread', '_beginthreadex', '__beginthreadex', '_beginthread', '__beginthread'] 298 | 299 | # go through every thread create option 300 | threadStartEaSet = set() 301 | for funcName in funcNamesList: 302 | # set thread start argument because it is different number based on API used 303 | argIndex = 1 if funcName.lstrip('_') == 'beginthread' else 3 304 | 305 | # get list of API references 306 | funcList = list(currentProgram.getSymbolTable().getSymbols(funcName)) 307 | if len(funcList) == 0: continue 308 | 309 | # get all references to target function 310 | funcReferences = funcList[0].getReferences() 311 | 312 | for ref in funcReferences: 313 | 314 | # if reference location is a call instruction 315 | if 'call' not in ref.getReferenceType().getName().lower(): continue 316 | 317 | # find the actual thread start function 318 | refInstr = getInstructionAt(ref.getFromAddress()) 319 | mnemInstr = Get_Prev_Target_Instruction(refInstr, 'push', argIndex, 10) 320 | if mnemInstr == None: continue 321 | 322 | 323 | # get thread start address 324 | if mnemInstr.getOperandType(0) == OP_TYPE_PUSH_REGISTER: 325 | # if thread start was a register, look for root address where register 326 | # value was set 327 | regStr = mnemInstr.getRegister(0).getName().lower() 328 | for i in range(5): 329 | mnemInstr = Get_Prev_Target_Instruction(mnemInstr, 'mov', 1, 10) 330 | if mnemInstr == None: break 331 | 332 | if mnemInstr.getRegister(0).getName().lower() == regStr: 333 | rootEa = mnemInstr.getOperandReferences(1)[0].getToAddress() 334 | if getFunctionContaining(rootEa) != None: threadStartEaSet.add(rootEa) 335 | 336 | break 337 | else: 338 | # assume normal push offset 339 | 
rootEa = mnemInstr.getOperandReferences(0)[0].getToAddress() 340 | threadStartEaSet.add(rootEa) 341 | 342 | 343 | 344 | return threadStartEaSet 345 | 346 | 347 | 348 | def Build_New_Func_Name(func): 349 | ''' 350 | function will return a string for naming functionality based on desired 351 | functionality found 352 | functionality is split into categories. Each category has a 353 | single identifier to indicate a generic capability for that category 354 | e.g. netwCSR = network category, connect, send, and receive capabilities 355 | ''' 356 | 357 | # use ordered dictionary so that categories are always printed 358 | # in the same order 359 | categoryNomenclatureDict = collections.OrderedDict() 360 | categoryNomenclatureDict['netw'] = ['b','c','l','s','r','t','m'] 361 | categoryNomenclatureDict['reg'] = ['h','r','w','d'] 362 | categoryNomenclatureDict['file'] = ['h','r','w','d','c','m','e'] 363 | categoryNomenclatureDict['proc'] = ['h','e','c','t','r','w'] 364 | categoryNomenclatureDict['serv'] = ['h','c','d','s','r','w'] 365 | categoryNomenclatureDict['thread'] = ['c','o','s','r'] 366 | categoryNomenclatureDict['str'] = ['c'] 367 | 368 | 369 | 370 | # for dictionary, list only the basenames, leave off prefixes of '_' 371 | # and any suffix such as Ex, ExA, etc. 
These will be stripped from 372 | # the functions calleed to account for all variations 373 | apiPurposeDict = { 374 | 'socket':'netwB', 375 | 376 | #WSAStartup':'netwC', 377 | 'connect':'netwC', 378 | 'InternetOpen':'netwC', 379 | 'InternetConnect':'netwC', 380 | 'InternetOpenURL':'netwC', 381 | 'HttpOpenRequest':'netwC', 382 | 'WinHttpConnect':'netwC', 383 | 'WinHttpOpenRequest':'netwC', 384 | 385 | 'bind':'netwL', 386 | 'listen':'netwL', 387 | 'accept':'netwL', 388 | 389 | 'send':'netwS', 390 | 'sendto':'netwS', 391 | 'InternetWriteFile':'netwS', 392 | 'HttpSendRequest':'netwS', 393 | 'WSASend':'netwS', 394 | 'WSASendTo':'netwS', 395 | 'WinHttpSendRequest':'netwS', 396 | 'WinHttpWriteData':'netwS', 397 | 398 | 'recv':'netwR', 399 | 'recvfrom':'netwR', 400 | 'InternetReadFile':'netwR', 401 | 'HttpReceiveHttpRequest':'netwR', 402 | 'WSARecv':'netwR', 403 | 'WSARecvFrom':'netwR', 404 | 'WinHttpReceiveResponse':'netwR', 405 | 'WinHttpReadData':'netwR', 406 | 'URLDownloadToFile':'netwR', 407 | 408 | 'inet_addr':'netwM', 409 | 'htons':'netwM', 410 | 'htonl':'netwM', 411 | 'ntohs':'netwM', 412 | 'ntohl':'netwM', 413 | 414 | # to common due to error conditions 415 | # basically becomes background noise 416 | # 417 | #'closesocket':'netwT', 418 | #'shutdown':'netwT', 419 | 420 | 421 | 'RegOpenKey':'regH', 422 | 423 | 'RegQueryValue':'regR', 424 | 'RegGetValue':'regR', 425 | 'RegEnumValue':'regR', 426 | 427 | 'RegSetValue':'regW', 428 | 'RegSetKeyValue':'regW', 429 | 430 | 'RegDeleteValue':'regD', 431 | 'RegDeleteKey':'regD', 432 | 'RegDeleteKeyValue':'regD', 433 | 434 | 'RegCreateKey':'regC', 435 | 436 | 'CreateFile':'fileH', 437 | 'fopen':'fileH', 438 | 439 | 'fscan':'fileR', 440 | 'fgetc':'fileR', 441 | 'fgets':'fileR', 442 | 'fread':'fileR', 443 | 'ReadFile':'fileR', 444 | 445 | 'flushfilebuffers':'fileW', 446 | 'fprintf':'fileW', 447 | 'fputc':'fileW', 448 | 'fputs':'fileW', 449 | 'fwrite':'fileW', 450 | 'WriteFile':'fileW', 451 | 452 | 'DeleteFile':'fileD', 453 | 
454 | 'CopyFile':'fileC', 455 | 456 | 'MoveFile':'fileM', 457 | 458 | 'FindFirstFile':'fileE', 459 | 'FindNextFile':'fileE', 460 | 461 | 'strcmp':'strC', 462 | 'strncmp':'strC', 463 | 'stricmp':'strC', 464 | 'wcsicmp':'strC', 465 | 'mbsicmp':'strC', 466 | 'lstrcmp':'strC', 467 | 'lstrcmpi':'strC', 468 | 469 | 'OpenService':'servH', 470 | 471 | 'QueryServiceStatus':'servR', 472 | 'QueryServiceConfig':'servR', 473 | 474 | 'ChangeServiceConfig':'servW', 475 | 'ChangeServiceConfig2':'servW', 476 | 477 | 'CreateService':'servC', 478 | 479 | 'DeleteService':'servD', 480 | 481 | 'StartService':'servS', 482 | 483 | 'CreateToolhelp32Snapshot':'procE', 484 | 'Process32First':'procE', 485 | 'Process32Next':'procE', 486 | 487 | 'OpenProcess':'procH', 488 | 489 | 'CreateProcess':'procC', 490 | 'CreateProcessAsUser':'procC', 491 | 'CreateProcessWithLogon':'procC', 492 | 'CreateProcessWithToken':'procC', 493 | 'ShellExecute':'procC', 494 | 495 | # to common due to error conditions 496 | # basically becomes background noise 497 | # 498 | #'ExitProcess':'procT', 499 | #'TerminateProcess':'procT', 500 | 501 | 'ReadProcessMemory':'procR', 502 | 503 | 'WriteProcessMemory':'procW', 504 | 505 | 'CreateThread':'threadC', 506 | 'beginthread':'threadC', 507 | 'beginthreadex':'threadC', # EXCEPTION: include ex because it's lowercase and won't be caught by case-sensitive suffix stripper routine later 508 | 509 | 'OpenThread':'threadO', 510 | 511 | 'SuspendThread':'threadS', 512 | 513 | 'ResumeThread':'threadR', 514 | 515 | } 516 | 517 | 518 | # get function info 519 | funcOrigName = func.getName() 520 | funcAddressSet = func.getBody() 521 | 522 | # get count of number of times current function is called 523 | refToCount = getSymbolAt(func.getEntryPoint()).getReferenceCount() 524 | 525 | # get all calls in current function 526 | callList = [] 527 | curInstr = getInstructionAt(func.getEntryPoint()) 528 | while ( (curInstr != None) and (funcAddressSet.contains(curInstr.getAddress()) == True) ): 
529 |         if curInstr.getMnemonicString().lower() == 'call': callList.append(curInstr)
530 |         curInstr = curInstr.getNext()
531 |
532 |
533 |
534 |     # remove any recursive calls, otherwise any functionality in function
535 |     # will also be treated as child functionality and appended to child
536 |     # portion of name
537 |     recursiveList = []
538 |     for curCall in callList:
539 |         curOpRef = curCall.getOperandReferences(0)
540 |
541 |         # skip calls to registers or any type that doesn't store address information
542 |         if len(curOpRef) == 0: continue
543 |
544 |         # check operand reference to make sure it's not recursive
545 |         if curOpRef[0].getToAddress().equals(func.getEntryPoint()) == True:
546 |             recursiveList.append(curCall)
547 |     callList = list(set(callList) - set(recursiveList))
548 |
549 |
550 |
551 |     # if no calls, return appropriate response
552 |     if len(callList) == 0:
553 |         # check if function is a thunk
554 |         if func.isThunk() == True:
555 |             callList.append(getInstructionAt(func.getEntryPoint()))
556 |         else:
557 |             # otherwise, return zero call
558 |             return '{:s}zc_{:s}{:s}__xref_{:02d}'.format(CUSTOM_AUTO_FUNC_PREFIX, GHIDRA_FUNC_PREFIX, func.getEntryPoint().toString(), refToCount)
559 |
560 |
561 |     #
562 |     # if calls are found, try to identify functionality
563 |     #
564 |     apiUsed = set()
565 |
566 |     # process calls with external reference
567 |     for curCall in callList:
568 |         if curCall.getExternalReference(0) != None:
569 |             # extract API basename to ignore prefix/suffix, e.g.
_, Ex, ExA
570 |             curApiName = curCall.getExternalReference(0).getLabel()
571 |             pattern = '^(?:FID_conflict:)?(?:_)*(?P<baseName>.+?)(?:A|W|Ex|ExA|ExW)?(?:@[a-fA-F0-9]+)?$'
572 |             match = re.search(pattern, curApiName)
573 |             curApiName = match.group('baseName')
574 |
575 |             # add current API name to summary set
576 |             apiUsed.add(curApiName)
577 |
578 |
579 |     # process calls to statically linked functions
580 |     for curCall in callList:
581 |         if curCall.getOperandType(0) == OP_TYPE_CALL_STATIC_FUNCTION:
582 |             curApiName = getFunctionAt(curCall.getReferencesFrom()[0].getToAddress()).getName()
583 |             if curApiName.startswith((GHIDRA_FUNC_PREFIX, CUSTOM_AUTO_FUNC_PREFIX, CUSTOM_AUTO_THREAD_FUNC_PREFIX )) == False:
584 |                 # extract API basename to ignore prefix/suffix, e.g. _, Ex, ExA
585 |                 pattern = '^(?:FID_conflict:)?(?:_)*(?P<baseName>.+?)(?:A|W|Ex|ExA|ExW)?(?:@[a-fA-F0-9]+)?$'
586 |                 match = re.search(pattern, curApiName)
587 |                 curApiName = match.group('baseName')
588 |
589 |                 # add current API name to summary set
590 |                 apiUsed.add(curApiName)
591 |
592 |
593 |     # process calls to function pointers stored in data variables
594 |     for curCall in callList:
595 |         if curCall.getOperandType(0) == OP_TYPE_CALL_DATA_VARIABLE:
596 |             curOpEa = curCall.getReferencesFrom()[0].getToAddress()
597 |             curData = getDataAt(curOpEa)
598 |
599 |             # getDataAt should return data object for defined and undefined data,
600 |             # but there seems to be a bug and sometimes returns None on undefined data
601 |             if curData == None: curData = getUndefinedDataAt(curOpEa)
602 |
603 |             # get the data variable label
604 |             if curData.getExternalReference(0) != None:
605 |                 curApiName = curData.getExternalReference(0).getLabel()
606 |             else:
607 |                 curApiName = curData.getLabel()
608 |
609 |
610 |             if curApiName.lower().startswith(('dat_', 'byte_', 'word_', 'dword_', 'qword_')) == False:
611 |                 # extract API basename to ignore prefix/suffix, e.g.
_, Ex, ExA
612 |                 pattern = '^(?:FID_conflict:)?(?:_)*(?P<baseName>.+?)(?:A|W|Ex|ExA|ExW)?(?:@[a-fA-F0-9]+)?$'
613 |                 match = re.search(pattern, curApiName)
614 |                 curApiName = match.group('baseName')
615 |
616 |                 # add current API name to summary set
617 |                 apiUsed.add(curApiName)
618 |
619 |
620 |
621 |
622 |     # map APIs called to functionality to use for naming
623 |     implementedApiPurpose = set()
624 |     for entry in apiUsed:
625 |         implementedApiPurpose.add(apiPurposeDict.get(entry))
626 |
627 |
628 |     # identify functionality from child functions already renamed by this script
629 |     # this will allow api usage to propagate up to the root function
630 |     childFunctionImplementedApiPurpose = dict()
631 |     for curCall in callList:
632 |         if curCall.getOperandType(0) == OP_TYPE_CALL_STATIC_FUNCTION:
633 |             curApiName = getFunctionAt(curCall.getReferencesFrom()[0].getToAddress()).getName()
634 |             if curApiName.startswith(CUSTOM_AUTO_FUNC_PREFIX) == True:
635 |
636 |                 # pull out api capabilities based on naming convention
637 |                 for category in categoryNomenclatureDict:
638 |                     pattern = category + '_' + '([a-zA-Z]+)+_?([a-zA-Z]+)?'
639 | match = re.search(pattern, curApiName) 640 | 641 | # if category is found, save into results 642 | if match is not None: 643 | apiPurpose = set() 644 | if match.group(1) is not None: apiPurpose.update(list(match.group(1).lower())) 645 | if match.group(2) is not None: apiPurpose.update(list(match.group(2).lower())) 646 | if category in childFunctionImplementedApiPurpose: 647 | childFunctionImplementedApiPurpose[category].update(apiPurpose) 648 | else: 649 | childFunctionImplementedApiPurpose[category] = apiPurpose 650 | 651 | 652 | 653 | # 654 | # create function name based on API functionality found 655 | # 656 | 657 | newFuncNamePurpose = '' 658 | 659 | # for each category, loop through all the nomenclature symbols 660 | # if the symbol is found in the current function, add it to the parent string 661 | # if the symbol is found in a child function, add it to child string 662 | for category in categoryNomenclatureDict: 663 | 664 | # build the symbol list for the parent function 665 | parentStr = '' 666 | for symbol in categoryNomenclatureDict[category]: 667 | if (category + symbol.upper()) in implementedApiPurpose: 668 | parentStr += symbol.upper() 669 | 670 | 671 | # build the symbol list for the child functions 672 | childStr = '' 673 | if category in childFunctionImplementedApiPurpose: 674 | for symbol in categoryNomenclatureDict[category]: 675 | if symbol.lower() in childFunctionImplementedApiPurpose[category]: 676 | childStr += symbol.lower() 677 | 678 | # combine the parent / child symbol list into one final string 679 | if (len(parentStr) > 0) or (len(childStr) > 0): 680 | newFuncNamePurpose = newFuncNamePurpose + category 681 | if len(parentStr) > 0: newFuncNamePurpose = newFuncNamePurpose + '_' + parentStr 682 | if len(childStr) > 0: newFuncNamePurpose = newFuncNamePurpose + '_' + childStr 683 | newFuncNamePurpose = newFuncNamePurpose + '__' 684 | 685 | 686 | 687 | 688 | 689 | 690 | # build the final function name 691 | if len(newFuncNamePurpose) > 
0: 692 | # targeted functionality found 693 | finalFuncName = '{:s}{:s}xref_{:02d}_{:s}'.format(CUSTOM_AUTO_FUNC_PREFIX, newFuncNamePurpose, refToCount, func.getEntryPoint().toString()) 694 | else: 695 | # no targeted functionality identified 696 | finalFuncName = '{:s}{:s}{:s}__xref_{:02d}'.format(CUSTOM_AUTO_FUNC_PREFIX, GHIDRA_FUNC_PREFIX, func.getEntryPoint().toString(), refToCount) 697 | 698 | 699 | 700 | return finalFuncName 701 | 702 | 703 | 704 | 705 | 706 | 707 | if __name__ == '__main__': 708 | main() 709 | 710 | --------------------------------------------------------------------------------
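The API-name normalization that Build_New_Func_Name() relies on can be tried outside Ghidra. The sketch below is a minimal standalone illustration, not the script's own code. It assumes the named group in the pattern is `baseName` (which the script's `match.group('baseName')` call implies). The helper names `api_base_name` and `api_purpose` and the small `SAMPLE_PURPOSES` table are hypothetical, introduced only for this example.

```python
import re

# Assumption: the named group is "baseName", as implied by the
# match.group('baseName') calls in Build_New_Func_Name().
API_NAME_PATTERN = re.compile(
    r'^(?:FID_conflict:)?(?:_)*(?P<baseName>.+?)(?:A|W|Ex|ExA|ExW)?(?:@[a-fA-F0-9]+)?$'
)

# Hypothetical excerpt of the purpose table, for illustration only.
SAMPLE_PURPOSES = {
    'CreateFile': 'fileH',
    'RegOpenKey': 'regH',
    'beginthreadex': 'threadC',
    'strcmp': 'strC',
}


def api_base_name(label):
    '''
    hypothetical helper: strips the FID_conflict: prefix, leading
    underscores, a trailing A/W/Ex/ExA/ExW suffix, and a trailing
    @<hex> decoration from a label. Returns the label unchanged
    if the pattern does not match (e.g. an empty string)
    '''
    match = API_NAME_PATTERN.search(label)
    if match is None:
        return label
    return match.group('baseName')


def api_purpose(label):
    '''
    hypothetical helper: maps a raw label to its capability tag,
    or None when the base name is not in the sample table
    '''
    return SAMPLE_PURPOSES.get(api_base_name(label))


if __name__ == '__main__':
    for label in ('CreateFileA', 'RegOpenKeyExW', '__beginthreadex', '_strcmp@8', 'GetTickCount'):
        print('{:s} -> {:s} -> {!s}'.format(label, api_base_name(label), api_purpose(label)))
```

Because the suffix match is case-sensitive, a lowercase `ex` (as in `beginthreadex`) is left in place, which is why the script lists that name explicitly in its table. The same rule means any API whose real name happens to end in an uppercase `A` or `W` would have that letter trimmed, so table keys need to be written in their trimmed form.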