├── .gitignore ├── LICENSE.md ├── README.md ├── extract_decomps ├── README.md ├── assets │ └── image-20220726143115698.png ├── example │ ├── README.md │ ├── assets │ │ ├── image-20220726141542478.png │ │ └── image-20220726141821528.png │ └── test.c ├── extract.py └── requirements.txt └── g3po ├── .gitignore ├── G3PO.png ├── README.md └── g3po.py /.gitignore: -------------------------------------------------------------------------------- 1 | venv 2 | .vscode 3 | .DS_Store 4 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Tenable 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Tenable Ghidra Tools 2 | 3 | This is a repository of Ghidra-related tools and scripts open-sourced by Tenable. 4 | 5 | 6 | 7 | ## Tools 8 | 9 | * [extract.py](https://github.com/tenable/ghidra_tools/tree/main/extract_decomps) - Python script that makes use of the Ghidra Bridge to extract all decompiled functions from Ghidra 10 | * [g3po.py](https://github.com/tenable/ghidra_tools/tree/main/g3po) - Jython script that queries OpenAI's large language models (gpt-3.5-turbo by default, but compatible with gpt-4) for explanatory comments on decompiled functions. 11 | -------------------------------------------------------------------------------- /extract_decomps/README.md: -------------------------------------------------------------------------------- 1 | # Ghidra Decomp Extractor 2 | 3 | Python script to extract all Ghidra function decompilations from the currently loaded program. 4 | 5 | 6 | 7 | ## Setup 8 | 9 | ### Ghidra Setup 10 | 11 | * Install Ghidra Bridge for Python3 via https://github.com/justfoxing/ghidra_bridge 12 | 13 | * Start the Ghidra Bridge background server 14 | 15 | ![image-20220726143115698](./assets/image-20220726143115698.png) 16 | 17 | * Don't forget to shut down the Ghidra Bridge when you're getting ready to close the analysis window! 18 | 19 | 20 | 21 | ### Python Setup 22 | 23 | * Create and initialize your desired base Python environment. 24 | 25 | * Install dependencies 26 | 27 | ``` 28 | pip install -r requirements.txt 29 | ``` 30 | 31 | 32 | 33 | ## Usage 34 | 35 | ``` 36 | $ python extract.py -h 37 | usage: extract.py [-h] [-o OUTPUT] [-v] [-t TIMEOUT] 38 | 39 | Extract ghidra decompilation output for currently loaded program. 
40 | 41 | optional arguments: 42 | -h, --help show this help message and exit 43 | -o OUTPUT, --output OUTPUT 44 | Set output directory (default is current directory + program name) 45 | -v, --verbose Display verbose logging output 46 | -t TIMEOUT, --timeout TIMEOUT 47 | Custom timeout for individual function decompilation (default = 1000) 48 | ``` -------------------------------------------------------------------------------- /extract_decomps/assets/image-20220726143115698.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tenable/ghidra_tools/3d121c22d230dee3386046ad2544cf500365292b/extract_decomps/assets/image-20220726143115698.png -------------------------------------------------------------------------------- /extract_decomps/example/README.md: -------------------------------------------------------------------------------- 1 | This example is provided as a quick sanity check to verify that everything is installed and working correctly. 2 | 3 | ## Build Example 4 | 5 | ``` 6 | $ gcc test.c 7 | $ ./a.out 8 | What is your name? 9 | # dino 10 | Hello, dino! 11 | ``` 12 | 13 | ## Running Extract Script 14 | 15 | 1. Open `a.out` in Ghidra 16 | 17 | 2. Run auto-analysis within Ghidra 18 | 19 | 3. Start the Ghidra Bridge 20 | 21 | ![image-20220726141542478](./assets/image-20220726141542478.png) 22 | 23 | 4. Run the extraction script 24 | 25 | ``` 26 | $ python extract.py 27 | INFO:root:Program Name: a.out 28 | INFO:root:Creation Date: Tue Jul 26 13:51:21 EDT 2022 29 | INFO:root:Language ID: AARCH64:LE:64:AppleSilicon 30 | INFO:root:Compiler Spec ID: default 31 | INFO:root:Using 'a.out_extraction' as output directory... 32 | INFO:root:Extracting decompiled functions... 
33 | INFO:root:Extracted 7 out of 7 functions 34 | 35 | $ tree a.out_extraction 36 | a.out_extraction 37 | ├── ___stack_chk_fail@100003f6c.c 38 | ├── ___stack_chk_fail@10000c000.c 39 | ├── _printf@100003f78.c 40 | ├── _printf@10000c010.c 41 | ├── _scanf@100003f84.c 42 | ├── _scanf@10000c018.c 43 | └── entry@100003edc.c 44 | ``` 45 | 46 | 5. Verify output as needed![image-20220726141821528](./assets/image-20220726141821528.png) -------------------------------------------------------------------------------- /extract_decomps/example/assets/image-20220726141542478.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tenable/ghidra_tools/3d121c22d230dee3386046ad2544cf500365292b/extract_decomps/example/assets/image-20220726141542478.png -------------------------------------------------------------------------------- /extract_decomps/example/assets/image-20220726141821528.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tenable/ghidra_tools/3d121c22d230dee3386046ad2544cf500365292b/extract_decomps/example/assets/image-20220726141821528.png -------------------------------------------------------------------------------- /extract_decomps/example/test.c: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | int main () { 4 | char name[20]; 5 | 6 | printf("What is your name?\n# "); 7 | scanf("%s", name); 8 | printf("Hello, %s!\n", name); 9 | 10 | return(0); 11 | } -------------------------------------------------------------------------------- /extract_decomps/extract.py: -------------------------------------------------------------------------------- 1 | #! 
/usr/bin/env python3 2 | 3 | import argparse 4 | import logging 5 | import os 6 | 7 | import ghidra_bridge 8 | 9 | # Load Ghidra Bridge and make Ghidra namespace available 10 | TIMEOUT = 1000 11 | gb = ghidra_bridge.GhidraBridge(namespace=globals(), response_timeout=TIMEOUT) 12 | 13 | def get_program_info(): 14 | """Gather information for currentProgram in Ghidra.""" 15 | logging.debug("Gathering program information...") 16 | program_info = {} 17 | program_info["program_name"] = currentProgram.getName() 18 | program_info["creation_date"] = gb.remote_eval("currentProgram.getCreationDate()") 19 | program_info["language_id"] = gb.remote_eval("currentProgram.getLanguageID()") 20 | program_info["compiler_spec_id"] = gb.remote_eval("currentProgram.getCompilerSpec().getCompilerSpecID()") 21 | 22 | logging.info(f"Program Name: {program_info['program_name']}") 23 | logging.info(f"Creation Date: {program_info['creation_date']}") 24 | logging.info(f"Language ID: {program_info['language_id']}") 25 | logging.info(f"Compiler Spec ID: {program_info['compiler_spec_id']}") 26 | 27 | return program_info 28 | 29 | def create_output_dir(path): 30 | """ 31 | Create directory to store decompiled functions to. Will error and exit if 32 | the directory already exists and contains files. 
33 | 34 | path: File path to desired directory 35 | """ 36 | logging.info(f"Using '{path}' as output directory...") 37 | 38 | if os.path.isdir(path): 39 | if os.listdir(path): 40 | logging.error(f"{path} already contains files!") 41 | exit() 42 | return path 43 | 44 | os.mkdir(path) 45 | 46 | def extract_decomps(output_dir): 47 | logging.info("Extracting decompiled functions...") 48 | decomp = ghidra.app.decompiler.DecompInterface() 49 | decomp.openProgram(currentProgram) 50 | functions = list(currentProgram.functionManager.getFunctions(True)) 51 | failed_to_extract = [] 52 | count = 0 53 | 54 | for function in functions: 55 | logging.debug(f"Decompiling {function.name}") 56 | decomp_res = decomp.decompileFunction(function, TIMEOUT, monitor) 57 | 58 | if decomp_res.isTimedOut(): 59 | logging.warning(f"Timed out while attempting to decompile '{function.name}'") 60 | elif not decomp_res.decompileCompleted(): 61 | logging.error(f"Failed to decompile {function.name}") 62 | logging.error(" Error: " + decomp_res.getErrorMessage()) 63 | failed_to_extract.append(function.name) 64 | continue 65 | 66 | decomp_src = decomp_res.getDecompiledFunction().getC() 67 | 68 | try: 69 | filename = f"{function.name}@{function.getEntryPoint()}.c" 70 | path = os.path.join(output_dir, filename) 71 | with open(path, "w") as f: 72 | logging.debug(f"Saving to '{path}'") 73 | f.write(decomp_src) 74 | count += 1 75 | except Exception as e: 76 | logging.error(e) 77 | failed_to_extract.append(function.name) 78 | continue 79 | 80 | logging.info(f"Extracted {count} out of {len(functions)} functions") 81 | if failed_to_extract: 82 | logging.warning("Failed to extract the following functions:\n\n - " + "\n - ".join(failed_to_extract)) 83 | 84 | def main(output_dir=None): 85 | """Main function.""" 86 | program_info = get_program_info() 87 | 88 | # Default output directory to current directory + program name + _extraction 89 | if output_dir is None: 90 | output_dir = 
program_info["program_name"] + "_extraction" 91 | 92 | create_output_dir(output_dir) 93 | extract_decomps(output_dir) 94 | 95 | 96 | if __name__ == "__main__": 97 | parser = argparse.ArgumentParser(description="Extract ghidra decompilation output for currently loaded program.") 98 | parser.add_argument("-o", "--output", help="Set output directory (default is current directory + program name)") 99 | parser.add_argument("-v", "--verbose", action="count", help="Display verbose logging output") 100 | parser.add_argument("-t", "--timeout", type=int, help="Custom timeout for individual function decompilation (default = 1000)") 101 | args = parser.parse_args() 102 | 103 | if args.output: 104 | output_dir = args.output 105 | else: 106 | output_dir = None 107 | 108 | if args.verbose: 109 | logging.getLogger().setLevel(logging.DEBUG) 110 | else: 111 | logging.getLogger().setLevel(logging.INFO) 112 | 113 | if args.timeout: 114 | TIMEOUT = args.timeout 115 | 116 | main(output_dir=output_dir) 117 | -------------------------------------------------------------------------------- /extract_decomps/requirements.txt: -------------------------------------------------------------------------------- 1 | ghidra_bridge 2 | -------------------------------------------------------------------------------- /g3po/.gitignore: -------------------------------------------------------------------------------- 1 | scratch.py 2 | g3posay.py 3 | __pycache__ 4 | -------------------------------------------------------------------------------- /g3po/G3PO.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tenable/ghidra_tools/3d121c22d230dee3386046ad2544cf500365292b/g3po/G3PO.png -------------------------------------------------------------------------------- /g3po/README.md: -------------------------------------------------------------------------------- 1 | # G-3PO: A Protocol Droid for Ghidra 2 | 3 | (The acronym probably stands for 
"Ghidra gpt-3 Program Oracle", or something like that.) 4 | 5 | For a detailed writeup on the tool, and its rationale, see [G-3PO: A Protocol Droid for Ghidra](https://medium.com/tenable-techblog/g-3po-a-protocol-droid-for-ghidra-4b46fa72f1ff), on the Tenable TechBlog. 6 | 7 | 8 | ## Installing and Using G-3PO 9 | 10 | G-3PO is ready for use. The only catch is that it does require an OpenAI API key, and the text completion service is unfree (as in beer, and as insofar as the model’s a black box). It is, however, reasonably cheap, and even with heavy use I haven’t spent more than the price of a cup of coffee while developing, debugging, and toying around with this tool. 11 | 12 | To run the script: 13 | - get yourself an OpenAI or Anthropic API key (G-3PO supports LLM backends from both companies) 14 | - add the key as an environment variable by putting export `OPENAI_API_KEY=whateveryourkeyhappenstobe` or `ANTHROPIC_API_KEY=youranthropickeyifyouhaveone` in your `~/.profile` file, or any other file that will be sourced before you launch Ghidra 15 | - copy or symlink g3po.py to your Ghidra scripts directory 16 | - add that directory in the Script Manager window 17 | - visit the decompiler window for a function you’d like some assistance interpreting 18 | - and then either run the script from the Script Manager window by selecting it and hitting the ▶️ icon, or bind it to a hotkey and strike when needed 19 | 20 | Ideally, I’d like to provide a way for the user to twiddle the various parameters used to solicit a response from model, such as the “temperature” in the request (high temperatures — approaching 2.0 — solicit a more adventurous response, while low temperatures instruct the model to respond conservatively), all from within Ghidra. 
There’s bound to be a way to do this, but it seems neither the Ghidra API documentation, Google, nor even ChatGPT are offering me much help in that regard, so for now you can adjust the settings by editing the global variables declared near the beginning of the g3po.py source file: 21 | 22 | ```python 23 | ########################################################################################## 24 | # Script Configuration 25 | ########################################################################################## 26 | MODEL = "gpt-3.5-turbo" # Choose which large language model we query -- gpt-4 and claude-v1.2 also supported 27 | TEMPERATURE = 0.19 # Set higher for more adventurous comments, lower for more conservative 28 | TIMEOUT = 600 # How many seconds should we wait for a response from OpenAI? 29 | MAXTOKENS = 512 # The maximum number of tokens to request from OpenAI 30 | G3POSAY = True # True if you want the cute C-3PO ASCII art, False otherwise 31 | LANGUAGE = "English" # This can also be used as a style parameter. 32 | EXTRA = "" # Extra text appended to the prompt. 33 | LOGLEVEL = INFO # Adjust for more or less line noise in the console. 34 | COMMENTWIDTH = 80 # How wide the comment, inside the little speech balloon, should be. 35 | G3POASCII = r""" 36 | /~\ 37 | |oo ) 38 | _\=/_ 39 | / \ 40 | //|/.\|\\ 41 | || \_/ || 42 | || |\ /| || 43 | # \_ _/ # 44 | | | | 45 | | | | 46 | []|[] 47 | | | | 48 | /_]_[_\ 49 | """ 50 | ########################################################################################## 51 | ``` 52 | 53 | The LANGUAGE and EXTRA parameters provide the user with an easy way to play with the form of the LLM’s commentary. 
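As a rough illustration of that last point, the sketch below shows one way two such settings could be folded into the instruction sent to the model. The helper `style_instruction` and the wording of the instruction are hypothetical and are not taken from g3po.py; only the setting names LANGUAGE and EXTRA, and the sonnet example, come from the script.

```python
# Hypothetical helper, not part of g3po.py: it only illustrates how the
# LANGUAGE and EXTRA settings could shape the text of a request.
LANGUAGE = "the form of a sonnet"  # doubles as a style parameter
EXTRA = "Pay particular attention to string handling."  # appended verbatim


def style_instruction(language, extra=""):
    """Compose a one-line instruction telling the model how to phrase its comment."""
    instruction = "Please explain what the following code does, in {style}.".format(
        style=language)
    if extra:
        # EXTRA is simply tacked on to the end of the instruction.
        instruction += " " + extra
    return instruction


print(style_instruction(LANGUAGE, EXTRA))
```

Leaving EXTRA empty and setting LANGUAGE to a natural language such as "English" gives the plain behaviour; anything more fanciful in either setting changes only the tone of the commentary, not the analysis that is requested.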
54 | 55 | -------------------------------------------------------------------------------- /g3po/g3po.py: -------------------------------------------------------------------------------- 1 | # Query OpenAI for a comment 2 | # @author Olivia Lucca Fraser 3 | # @category Machine Learning 4 | # @keybinding Ctrl-G 5 | # @menupath File.Analysis.G-3PO Analyse function with GPT 6 | # @toolbar G3PO.png 7 | 8 | ########################################################################################## 9 | # Script Configuration 10 | ########################################################################################## 11 | MODEL = "gpt-3.5-turbo" # Choose which large language model we query 12 | MODEL = askChoice("Model", "Please choose a language model to query", ["text-davinci-003", "gpt-3.5-turbo", "gpt-4", "claude-v1.2"], "gpt-3.5-turbo") 13 | # If you have an OpenAI API key, gpt-3.5-turbo gives you the best bang for your buck. 14 | # Use gpt-4 for slightly higher quality results, at a higher cost. 15 | # If you have an Anthropic API key, try claude-v1.2, which also seems to work quite well. 16 | # Set higher for more adventurous comments, lower for more conservative 17 | TEMPERATURE = 0.05 18 | TIMEOUT = 600 # How many seconds should we wait for a response from OpenAI? 19 | MAXTOKENS = 1024 # The maximum number of tokens to request from OpenAI 20 | G3POSAY = True # True if you want the cute C-3PO ASCII art, False otherwise 21 | # LANGUAGE = "the form of a sonnet" # This can also be used as a style parameter for the comment 22 | LANGUAGE = "English" # This can also be used as a style parameter for the comment 23 | EXTRA = "" # Extra text appended to the prompt. 24 | # EXTRA = "but write everything in the form of a sonnet" # for example 25 | # How wide the comment, inside the little speech balloon, should be. 
26 | COMMENTWIDTH = 80 27 | RENAME_FUNCTION = False # Rename function per G3PO's suggestions 28 | RENAME_VARIABLES = True # Rename variables per G3PO's suggestions 29 | OVERRIDE_COMMENTS = True # Override existing comments 30 | G3POASCII = r""" 31 | /~\ 32 | |oo ) 33 | _\=/_ 34 | / \ 35 | //|/.\|\\ 36 | || \_/ || 37 | || |\ /| || 38 | # \_ _/ # 39 | | | | 40 | | | | 41 | []|[] 42 | | | | 43 | /_]_[_\ 44 | """ 45 | SEND_ASSEMBLY = False 46 | ########################################################################################## 47 | 48 | ## 49 | # Note: I've updated this script so that it runs in the Ghidrathon Python 3 environment. 50 | # It should remain backwards-compatible with the Jython 2.7 environment. 51 | ## 52 | 53 | import textwrap 54 | import logging 55 | from logging import DEBUG, INFO, WARNING, ERROR, CRITICAL 56 | import json 57 | import os 58 | import sys 59 | import re 60 | import ghidra 61 | from ghidra.app.script import GhidraScript 62 | from ghidra.program.model.listing import Function, FunctionManager 63 | from ghidra.program.model.mem import MemoryAccessException 64 | from ghidra.util.exception import DuplicateNameException 65 | from ghidra.program.model.symbol import SourceType 66 | from ghidra.program.model.pcode import HighFunctionDBUtil 67 | from ghidra.app.decompiler import DecompileOptions 68 | from ghidra.app.decompiler import DecompInterface 69 | from ghidra.util.task import ConsoleTaskMonitor 70 | from ghidra.program.flatapi import FlatProgramAPI 71 | 72 | LOGLEVEL = INFO # Adjust for more or less line noise in the console. 73 | 74 | # The way we handle the API calls will vary depending on whether we're running jython 75 | # or python3. Jython doesn't have the requests library, so we'll use httplib instead. 
76 | try: 77 | import httplib 78 | def send_https_request(address, path, data, headers): 79 | try: 80 | conn = httplib.HTTPSConnection(address) 81 | json_req_data = json.dumps(data) 82 | conn.request("POST", path, json_req_data, headers) 83 | response = conn.getresponse() 84 | json_data = response.read() 85 | conn.close() 86 | try: 87 | data = json.loads(json_data) 88 | return data 89 | except ValueError as e: 90 | logging.error("Could not parse JSON response: {e}".format(e=e)) 91 | logging.debug(json_data) 92 | return None 93 | except Exception as e: 94 | logging.error("Error sending HTTPS request: {e}".format(e=e)) 95 | return None 96 | except ImportError: 97 | import requests 98 | def send_https_request(address, path, data, headers): 99 | try: 100 | response = requests.post( 101 | "https://{address}{path}".format(address=address, path=path), 102 | json=data, 103 | headers=headers) 104 | try: 105 | data = response.json() 106 | return data 107 | except ValueError as e: 108 | logging.error("Could not parse JSON response: {e}".format(e=e)) 109 | logging.debug(response.text) 110 | return None 111 | except Exception as e: 112 | logging.error("Error sending HTTPS request: {e}".format(e=e)) 113 | return None 114 | 115 | try: 116 | import tiktoken 117 | ENCODING = tiktoken.encoding_for_model(MODEL) 118 | 119 | def estimate_number_of_tokens(s): 120 | if isinstance(s, str): 121 | return len(ENCODING.encode(s)) 122 | elif isinstance(s, list): 123 | # Sum the estimates of every item in the list. 124 | return sum(estimate_number_of_tokens(item) for item in s) 125 | elif isinstance(s, dict): 126 | # Count each value plus a small per-entry overhead of 2 tokens. 
127 | return sum(estimate_number_of_tokens(v) + 2 for v in s.values()) 128 | return 0 129 | 130 | except (ImportError, KeyError): # tiktoken is missing, or it does not recognise MODEL 131 | 132 | def estimate_number_of_tokens(s): 133 | return int(len(s)/2.3) 134 | 135 | 136 | SOURCE = "AI" 137 | TAG = SOURCE + " generated comment, take with a grain of salt:" 138 | FOOTER = "Model: {model}, Temperature: {temperature}".format( 139 | model=MODEL, temperature=TEMPERATURE) 140 | 141 | 
logging.getLogger().setLevel(LOGLEVEL) 142 | 143 | STATE = getState() 144 | PROGRAM = state.getCurrentProgram() 145 | FLATAPI = FlatProgramAPI(PROGRAM) 146 | 147 | 148 | def get_api_key(): 149 | vendor = "ANTHROPIC" if MODEL.startswith("claude") else "OPENAI" 150 | try: 151 | return os.environ[vendor + "_API_KEY"] 152 | except KeyError as ke: 153 | try: 154 | home = os.environ["HOME"] 155 | keyfile = ".{v}_api_key".format(v=vendor.lower()) 156 | with open(os.path.join(home, keyfile)) as f: 157 | line = f.readline().strip() 158 | return line.split("=")[1].strip('"\'') 159 | except Exception as e: 160 | logging.error( 161 | "Could not find {v} API key. Please set the {v}_API_KEY environment variable. Errors: {ke}, {e}".format(ke=ke, e=e, v=vendor)) 162 | sys.exit(1) 163 | 164 | 165 | def flatten_list(l): 166 | return [item for sublist in l for item in sublist] 167 | 168 | 169 | def wordwrap(s, width=COMMENTWIDTH, pad=True): 170 | """Wrap a string to a given number of characters, but don't break words.""" 171 | # first replace single line breaks with double line breaks 172 | lines = [textwrap.TextWrapper(width=width, 173 | break_long_words=False, 174 | break_on_hyphens=True, 175 | replace_whitespace=False).wrap(" " + L) 176 | for L in s.splitlines()] 177 | # now flatten the lines list 178 | lines = flatten_list(lines) 179 | if pad: 180 | lines = [line.ljust(width) for line in lines] 181 | return "\n".join(lines) 182 | 183 | 184 | def boxedtext(text, width=COMMENTWIDTH, tag=TAG): 185 | wrapped = wordwrap(text, width, pad=True) 186 | wrapped = "\n".join([tag.ljust(width), " ".ljust( 187 | width), wrapped, " ".ljust(width), FOOTER.ljust(width)]) 188 | side_bordered = "|" + wrapped.replace("\n", "|\n|") + "|" 189 | top_border = "/" + "-" * (len(side_bordered.split("\n")[0]) - 2) + "\\" 190 | bottom_border = top_border[::-1] 191 | return top_border + "\n" + side_bordered + "\n" + bottom_border 192 | 193 | 194 | def g3posay(text, width=COMMENTWIDTH, character=G3POASCII, 
tag=TAG): 195 | box = boxedtext(text, width, tag=tag) 196 | headwidth = len(character.split("\n")[1]) + 2 197 | return box + "\n" + " "*headwidth + "/" + character 198 | 199 | 200 | def escape_unescaped_single_quotes(s): 201 | return re.sub(r"(? $new 353 | 354 | Then suggest a name for the function by printing it on its own line using the format 355 | 356 | $old :: $new 357 | 358 | If you observe any security vulnerabilities in the code, describe them in detail, and explain how they might be exploited. Do you understand? 359 | """.format(lang=lang, style=LANGUAGE) 360 | system_msg = {"role": "system", "content": intro} 361 | prompt = """Here is code from the function {function_name}:\n\n``` 362 | {code} 363 | ``` 364 | """.format(function_name=function_name, code=code) 365 | ack_msg = {"role": "assistant", "content": "Yes, I understand. Please show me the code."} 366 | prompt_msg = {"role": "user", "content": prompt} 367 | return [system_msg, ack_msg, prompt_msg] 368 | 369 | 370 | 371 | def generate_comment(code, function_name, temperature=0.19, program_info=None, model=MODEL, max_tokens=MAXTOKENS): 372 | prompt = build_prompt_for_function(code, function_name) 373 | logging.debug("Prompt:\n\n{prompt}".format(prompt=prompt)) 374 | response = query( 375 | prompt=prompt, 376 | temperature=temperature, 377 | max_tokens=max_tokens, 378 | model=model) 379 | return response 380 | 381 | 382 | def add_explanatory_comment_to_current_function(temperature=0.19, model=MODEL, max_tokens=MAXTOKENS): 383 | function = get_current_function() 384 | if function is None: 385 | logging.error("Failed to get current function") 386 | return None 387 | function_name = function.getName() 388 | old_comment = function.getComment() 389 | if old_comment is not None: 390 | if OVERRIDE_COMMENTS or SOURCE in old_comment: 391 | logging.info("Removing old comment.") 392 | function.setComment(None) 393 | else: 394 | logging.info("Function {function_name} already has a comment".format( 395 | 
function_name=function_name)) 396 | return None 397 | code = get_code(function) 398 | if code is None: 399 | logging.error("Failed to {action} current function {function_name}".format( 400 | function_name=function_name, action="disassemble" if SEND_ASSEMBLY else "decompile")) 401 | return 402 | approximate_tokens = estimate_number_of_tokens(code) 403 | logging.info("Length of decompiled C code: {code_len} characters, guessing {approximate_tokens} tokens".format( 404 | code_len=len(code), approximate_tokens=approximate_tokens)) 405 | comment = generate_comment(code, function_name=function_name, 406 | temperature=temperature, model=model, max_tokens=max_tokens) 407 | if comment is None: 408 | logging.error("Failed to generate comment") 409 | sys.exit(1) 410 | if G3POSAY: 411 | comment = g3posay(comment) 412 | else: 413 | comment = TAG + "\n" + comment 414 | listing = currentProgram.getListing() 415 | function = listing.getFunctionContaining(currentAddress) 416 | try: 417 | function.setComment(comment) 418 | except DuplicateNameException as e: 419 | logging.error("Failed to set comment: {e}".format(e=e)) 420 | return 421 | logging.info("Added comment to function: {function_name}".format( 422 | function_name=function.getName())) 423 | return comment, code 424 | 425 | 426 | def parse_response_for_vars(comment): 427 | """takes block comment from AI, yields tuple of str old name & new name for each var""" 428 | # The LLM will sometimes wrap variable names in backticks, and sometimes prepend a dollar sign. 429 | # We want to ignore those artifacts. 430 | regex = re.compile(r'[`$]?([A-Za-z_][A-Za-z_0-9]*)`? 
-> [`$]?([A-Za-z_][A-Za-z_0-9]*)`?') 431 | for line in comment.split('\n'): 432 | m = regex.search(line) 433 | if m: 434 | old, new = m.groups() 435 | logging.debug("Found suggestion to rename {old} to {new}".format(old=old, new=new)) 436 | if old == new or new == 'new': 437 | continue 438 | yield old, new 439 | 440 | 441 | def parse_response_for_function_name(comment): 442 | """takes block comment from GPT, yields new function name""" 443 | regex = re.compile('[`$]?([A-Za-z_][A-Za-z_0-9]*)`? :: [$`]?([A-Za-z_][A-Za-z_0-9]*)`?') 444 | for line in comment.split('\n'): 445 | m = regex.search(line) 446 | if m: 447 | logging.debug("Renaming function to {new}".format(new=m.group(2))) 448 | _, new = m.groups() 449 | return new 450 | 451 | 452 | def rename_var(old_name, new_name, variables): 453 | """takes an old and new variable name from listing and renames it 454 | old_name: str, old variable name 455 | new_name: str, new variable name 456 | variables: {str, Variable}, vars in the func we're working in """ 457 | try: 458 | var_to_rename = variables.get(old_name) 459 | if var_to_rename: 460 | var_to_rename.setName(new_name, SourceType.USER_DEFINED) 461 | var_to_rename.setComment( 462 | 'GP3O renamed this from {} to {}'.format(old_name, new_name)) 463 | logging.debug( 464 | 'GP3O renamed variable {} to {}'.format(old_name, new_name)) 465 | else: 466 | logging.debug('GP3O wanted to rename variable {} to {}, but no Variable found'.format( 467 | old_name, new_name)) 468 | 469 | # only deals with listing vars, need to work with decomp to get the rest 470 | except KeyError: 471 | pass 472 | 473 | 474 | # https://github.com/NationalSecurityAgency/ghidra/issues/1561#issuecomment-590025081 475 | def rename_data(old_name, new_name): 476 | """takes an old and new data name, finds the data and renames it 477 | old_name: str, old variable name of the form DAT_{addr} 478 | new_name: str, new variable name""" 479 | new_name = new_name.upper() 480 | address = 
int(old_name[len('DAT_'):], 16) 481 | sym = FLATAPI.getSymbolAt(FLATAPI.toAddr(address)) 482 | sym.setName(new_name, SourceType.USER_DEFINED) 483 | logging.debug('G3PO renamed Data {} to {}'.format(old_name, new_name)) 484 | 485 | 486 | def rename_high_variable(symbols, old_name, new_name, data_type=None): 487 | """takes a map of high variable symbols, an old name, a new name, and, optionally, a data type 488 | and sets the name and data type of the high variable in the program database""" 489 | 490 | if old_name not in symbols: 491 | logging.debug('G3PO wanted to rename variable {} to {}, but no variable found'.format( 492 | old_name, new_name)) 493 | return 494 | hv = symbols[old_name] 495 | 496 | if data_type is None: 497 | data_type = hv.getDataType() 498 | 499 | # if running in Jython, we may need to ensure that the new name is in unicode 500 | try: 501 | new_name = unicode(new_name) 502 | except NameError: 503 | pass 504 | try: 505 | res = HighFunctionDBUtil.updateDBVariable(hv, 506 | new_name, 507 | data_type, 508 | SourceType.ANALYSIS) 509 | logging.debug("Renamed {} to {}".format(old_name, new_name)) 510 | return res 511 | except DuplicateNameException as e: 512 | logging.error("Failed to rename {} to {}: {}".format( 513 | old_name, new_name, e)) 514 | return None 515 | 516 | 517 | 518 | def apply_renaming_suggestions(comment, code): 519 | logging.info('Renaming variables...') 520 | 521 | func = get_current_function() 522 | func_name = func.getName() 523 | new_func_name = None 524 | 525 | if RENAME_VARIABLES: 526 | raw_vars = [v for v in func.getAllVariables()] 527 | variables = {var.getName(): var for var in raw_vars} 528 | logging.debug("Variables: {}".format(variables)) 529 | 530 | # John coming in clutch again 531 | # https://github.com/NationalSecurityAgency/ghidra/issues/2143#issuecomment-665300865 532 | options = DecompileOptions() 533 | monitor = ConsoleTaskMonitor() 534 | ifc = DecompInterface() 535 | ifc.setOptions(options) 536 | 
ifc.openProgram(func.getProgram()) 537 | res = ifc.decompileFunction(func, TIMEOUT, monitor) 538 | high_func = res.getHighFunction() 539 | lsm = high_func.getLocalSymbolMap() 540 | symbols = lsm.getSymbols() 541 | symbols = {var.getName(): var for var in symbols} 542 | logging.debug("Symbols: {}".format(symbols)) 543 | 544 | for old, new in parse_response_for_vars(comment): 545 | if re.match(r"^DAT_[0-9a-f]+$", old): # Globals with default names 546 | # suffix = old.split('_')[-1] # on second thought, we don't want stale address info 547 | # in a non-dynamic variable name 548 | try: 549 | # handy to retain the address info here 550 | rename_data(old, new) 551 | except Exception as e: 552 | logging.error('Failed to rename data: {}'.format(e)) 553 | elif old in symbols and symbols[old] is not None: 554 | try: 555 | rename_high_variable(symbols, old, new) 556 | except Exception as e: 557 | logging.error('Failed to rename variable: {}'.format(e)) 558 | else: 559 | # check for hallucination 560 | if old not in code: 561 | logging.error("G3PO wanted to rename variable {} to {}, but it may have been hallucinating.".format(old, new)) 562 | elif old == func_name: 563 | new_func_name = new 564 | else: 565 | logging.error("G3PO wanted to rename variable {old} to {new}, but {old} was not found in the symbol table.".format(old=old, new=new)) 566 | 567 | if func.getName().startswith('FUN_') or RENAME_FUNCTION: 568 | fn = parse_response_for_function_name(comment) 569 | new_func_name = fn or new_func_name # it may have been named with variable renaming syntax 570 | if new_func_name: 571 | func.setName(new_func_name, SourceType.USER_DEFINED) 572 | logging.debug('G3PO renamed function to {}'.format(new_func_name)) 573 | 574 | 575 | result = add_explanatory_comment_to_current_function(temperature=TEMPERATURE, model=MODEL, max_tokens=MAXTOKENS) # None when no comment was added 576 | 577 | if result is not None and (RENAME_FUNCTION or RENAME_VARIABLES): 578 | 
apply_renaming_suggestions(*result) # result is the (comment, code) pair 579 | 580 | 
--------------------------------------------------------------------------------
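The renaming pass in g3po.py relies on the model answering in the `$old -> $new` format requested by the prompt. That parsing step can be exercised outside Ghidra with a small standalone sketch. The regular expression is copied from `parse_response_for_vars`; the helper name `parse_renames` and the sample reply are invented here for illustration only.

```python
import re

# Same pattern as parse_response_for_vars in g3po.py: each identifier may be
# decorated with a backtick or a dollar sign, and the two are joined by " -> ".
RENAME_RE = re.compile(
    r'[`$]?([A-Za-z_][A-Za-z_0-9]*)`? -> [`$]?([A-Za-z_][A-Za-z_0-9]*)`?')


def parse_renames(reply):
    """Yield (old, new) pairs from a model reply, skipping no-op suggestions."""
    for line in reply.split('\n'):
        m = RENAME_RE.search(line)
        if not m:
            continue
        old, new = m.groups()
        # Skip identical names and echoes of the "$old -> $new" template itself.
        if old == new or new == 'new':
            continue
        yield old, new


# Invented sample reply, for illustration only.
sample = """This function reads a name and greets the user.
`local_28` -> `name_buffer`
$iVar1 -> status
$old -> $new
"""

print(list(parse_renames(sample)))
```

Running the parser on canned replies like this is a cheap way to check how it copes with the decorations a model tends to add before any names are changed in the program database.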