├── .gitignore
├── LICENSE.md
├── README.md
├── extract_decomps
    ├── README.md
    ├── assets
    │   └── image-20220726143115698.png
    ├── example
    │   ├── README.md
    │   ├── assets
    │   │   ├── image-20220726141542478.png
    │   │   └── image-20220726141821528.png
    │   └── test.c
    ├── extract.py
    └── requirements.txt
└── g3po
    ├── .gitignore
    ├── G3PO.png
    ├── README.md
    └── g3po.py


/.gitignore:
--------------------------------------------------------------------------------
1 | venv
2 | .vscode
3 | .DS_Store
4 | 


--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2022 Tenable
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Tenable Ghidra Tools
 2 | 
 3 | This is a repository of Ghidra-related tools and scripts open-sourced by Tenable.
 4 | 
 5 | 
 6 | 
 7 | ## Tools
 8 | 
 9 | * [extract.py](https://github.com/tenable/ghidra_tools/tree/main/extract_decomps) - Python script that uses Ghidra Bridge to extract all decompiled functions from Ghidra.
10 | * [g3po.py](https://github.com/tenable/ghidra_tools/tree/main/g3po) - Jython script that queries OpenAI's large language models (gpt-3.5-turbo by default, but compatible with gpt-4) for explanatory comments on decompiled functions.
11 | 


--------------------------------------------------------------------------------
/extract_decomps/README.md:
--------------------------------------------------------------------------------
 1 | # Ghidra Decomp Extractor
 2 | 
 3 | Python script to extract all Ghidra function decompilations from the currently loaded program.
 4 | 
 5 | 
 6 | 
 7 | ## Setup
 8 | 
 9 | ### Ghidra Setup
10 | 
11 | * Install Ghidra Bridge for Python3 via https://github.com/justfoxing/ghidra_bridge
12 | 
13 | * Start the Ghidra Bridge background server
14 | 
15 |   ![image-20220726143115698](./assets/image-20220726143115698.png)
16 | 
17 | * Don't forget to shut down the Ghidra Bridge when you're getting ready to close the analysis window!
18 | 
19 |   
20 | 
21 | ### Python Setup
22 | 
23 | * Create and initialize your desired base Python environment.
24 | 
25 | * Install dependencies
26 | 
27 |   ```
28 |   pip install -r requirements.txt
29 |   ```
30 | 
31 | 
32 | 
33 | ## Usage
34 | 
35 | ```
36 | $ python extract.py -h
37 | usage: extract.py [-h] [-o OUTPUT] [-v] [-t TIMEOUT]
38 | 
39 | Extract ghidra decompilation output for currently loaded program.
40 | 
41 | optional arguments:
42 |   -h, --help            show this help message and exit
43 |   -o OUTPUT, --output OUTPUT
44 |                         Set output directory (default is current directory + program name)
45 |   -v, --verbose         Display verbose logging output
46 |   -t TIMEOUT, --timeout TIMEOUT
47 |                         Custom timeout for individual function decompilation (default = 1000)
48 | ```


--------------------------------------------------------------------------------
/extract_decomps/assets/image-20220726143115698.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tenable/ghidra_tools/3d121c22d230dee3386046ad2544cf500365292b/extract_decomps/assets/image-20220726143115698.png


--------------------------------------------------------------------------------
/extract_decomps/example/README.md:
--------------------------------------------------------------------------------
 1 | This example is provided as a quick sanity check to verify that everything is installed and working correctly.
 2 | 
 3 | ## Build Example
 4 | 
 5 | ```
 6 | $ gcc test.c
 7 | $ ./a.out
 8 | What is your name?
 9 | # dino
10 | Hello, dino!
11 | ```
12 | 
13 | ## Running Extract Script
14 | 
15 | 1. Open `a.out` in Ghidra
16 | 
17 | 2. Run auto-analysis within Ghidra
18 | 
19 | 3. Start the Ghidra Bridge
20 | 
21 |    ![image-20220726141542478](./assets/image-20220726141542478.png)
22 | 
23 | 4. Run the extraction script
24 | 
25 |    ```
26 |    $ python extract.py
27 |    INFO:root:Program Name: a.out
28 |    INFO:root:Creation Date: Tue Jul 26 13:51:21 EDT 2022
29 |    INFO:root:Language ID: AARCH64:LE:64:AppleSilicon
30 |    INFO:root:Compiler Spec ID: default
31 |    INFO:root:Using 'a.out_extraction' as output directory...
32 |    INFO:root:Extracting decompiled functions...
33 |    INFO:root:Extracted 7 out of 7 functions
34 |    
35 |    $ tree a.out_extraction
36 |    a.out_extraction
37 |    ├── ___stack_chk_fail@100003f6c.c
38 |    ├── ___stack_chk_fail@10000c000.c
39 |    ├── _printf@100003f78.c
40 |    ├── _printf@10000c010.c
41 |    ├── _scanf@100003f84.c
42 |    ├── _scanf@10000c018.c
43 |    └── entry@100003edc.c
44 |    ```
45 | 
46 | 5. Verify output as needed
47 | 
48 |    ![image-20220726141821528](./assets/image-20220726141821528.png)
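
The extracted files can also be checked programmatically. The sketch below assumes the `name@entrypoint.c` naming that `extract.py` uses for its output files; the helper name `parse_extracted_name` is hypothetical and not part of this repository.

```python
import os


def parse_extracted_name(filename):
    """Split 'name@address.c' into (function_name, entry_address).

    Hypothetical helper. The split is on the last '@' in case a
    function name itself contains one. The address is kept as a
    string, since its exact format depends on Ghidra.
    """
    stem, ext = os.path.splitext(filename)
    if ext != ".c" or "@" not in stem:
        raise ValueError("unexpected file name: " + filename)
    name, _, address = stem.rpartition("@")
    return name, address


if __name__ == "__main__":
    # File name taken from the example listing above.
    print(parse_extracted_name("entry@100003edc.c"))
```

Applied to every entry returned by `os.listdir("a.out_extraction")`, this gives a quick list of the functions that were written out.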


--------------------------------------------------------------------------------
/extract_decomps/example/assets/image-20220726141542478.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tenable/ghidra_tools/3d121c22d230dee3386046ad2544cf500365292b/extract_decomps/example/assets/image-20220726141542478.png


--------------------------------------------------------------------------------
/extract_decomps/example/assets/image-20220726141821528.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tenable/ghidra_tools/3d121c22d230dee3386046ad2544cf500365292b/extract_decomps/example/assets/image-20220726141821528.png


--------------------------------------------------------------------------------
/extract_decomps/example/test.c:
--------------------------------------------------------------------------------
 1 | #include <stdio.h>
 2 | 
 3 | int main () {
 4 |    char name[20];
 5 | 
 6 |    printf("What is your name?\n# ");
 7 |    scanf("%s", name);
 8 |    printf("Hello, %s!\n", name);
 9 |    
10 |    return(0);
11 | }


--------------------------------------------------------------------------------
/extract_decomps/extract.py:
--------------------------------------------------------------------------------
  1 | #! /usr/bin/env python3
  2 | 
  3 | import argparse
  4 | import logging
  5 | import os
  6 | 
  7 | import ghidra_bridge
  8 | 
  9 | # Load Ghidra Bridge and make Ghidra namespace available
 10 | TIMEOUT = 1000
 11 | gb = ghidra_bridge.GhidraBridge(namespace=globals(), response_timeout=TIMEOUT)
 12 | 
 13 | def get_program_info():
 14 |     """Gather information for currentProgram in Ghidra."""
 15 |     logging.debug("Gathering program information...")
 16 |     program_info = {}
 17 |     program_info["program_name"] = currentProgram.getName()
 18 |     program_info["creation_date"] = gb.remote_eval("currentProgram.getCreationDate()")
 19 |     program_info["language_id"] = gb.remote_eval("currentProgram.getLanguageID()")
 20 |     program_info["compiler_spec_id"] = gb.remote_eval("currentProgram.getCompilerSpec().getCompilerSpecID()")
 21 |     
 22 |     logging.info(f"Program Name: {program_info['program_name']}")
 23 |     logging.info(f"Creation Date: {program_info['creation_date']}")
 24 |     logging.info(f"Language ID: {program_info['language_id']}")
 25 |     logging.info(f"Compiler Spec ID: {program_info['compiler_spec_id']}")
 26 | 
 27 |     return program_info
 28 | 
 29 | def create_output_dir(path):
 30 |     """
 31 |     Create directory to store decompiled functions to. Will error and exit if
 32 |     the directory already exists and contains files.
 33 | 
 34 |     path: File path to desired directory
 35 |     """
 36 |     logging.info(f"Using '{path}' as output directory...")
 37 | 
 38 |     if os.path.isdir(path):
 39 |         if os.listdir(path):
 40 |             logging.error(f"{path} already contains files!")
 41 |             exit()
 42 |         return path
 43 |     
 44 |     os.mkdir(path)
 45 | 
 46 | def extract_decomps(output_dir):
 47 |     logging.info("Extracting decompiled functions...")
 48 |     decomp = ghidra.app.decompiler.DecompInterface()
 49 |     decomp.openProgram(currentProgram)
 50 |     functions = list(currentProgram.functionManager.getFunctions(True))
 51 |     failed_to_extract = []
 52 |     count = 0
 53 | 
 54 |     for function in functions:
 55 |         logging.debug(f"Decompiling {function.name}")
 56 |         decomp_res = decomp.decompileFunction(function, TIMEOUT, monitor)
 57 | 
 58 |         if decomp_res.isTimedOut():
 59 |             logging.warning(f"Timed out while attempting to decompile '{function.name}'")
 60 |         if not decomp_res.decompileCompleted():  # also catches timed-out results
 61 |             logging.error(f"Failed to decompile {function.name}")
 62 |             logging.error("    Error: " + decomp_res.getErrorMessage())
 63 |             failed_to_extract.append(function.name)
 64 |             continue
 65 |     
 66 |         decomp_src = decomp_res.getDecompiledFunction().getC()
 67 | 
 68 |         try:
 69 |             filename = f"{function.name}@{function.getEntryPoint()}.c"
 70 |             path = os.path.join(output_dir, filename)
 71 |             with open(path, "w") as f:
 72 |                 logging.debug(f"Saving to '{path}'")
 73 |                 f.write(decomp_src)
 74 |                 count += 1
 75 |         except Exception as e:
 76 |             logging.error(e)
 77 |             failed_to_extract.append(function.name)
 78 |             continue
 79 |     
 80 |     logging.info(f"Extracted {str(count)} out of {str(len(functions))} functions")
 81 |     if failed_to_extract:
 82 |         logging.warning("Failed to extract the following functions:\n\n  - " + "\n  - ".join(failed_to_extract))
 83 | 
 84 | def main(output_dir=None):
 85 |     """Main function."""
 86 |     program_info = get_program_info()
 87 | 
 88 |     # Default output directory to current directory + program name + _extraction
 89 |     if output_dir is None:
 90 |         output_dir = program_info["program_name"] + "_extraction"
 91 |     
 92 |     create_output_dir(output_dir)
 93 |     extract_decomps(output_dir)
 94 | 
 95 | 
 96 | if __name__ == "__main__":
 97 |     parser = argparse.ArgumentParser(description="Extract ghidra decompilation output for currently loaded program.")
 98 |     parser.add_argument("-o", "--output", help="Set output directory (default is current directory + program name)")
 99 |     parser.add_argument("-v", "--verbose", action="count", help="Display verbose logging output")
100 |     parser.add_argument("-t", "--timeout", type=int, help="Custom timeout for individual function decompilation (default = 1000)")
101 |     args = parser.parse_args()
102 | 
103 |     if args.output:
104 |         output_dir = args.output
105 |     else:
106 |         output_dir = None
107 |     
108 |     if args.verbose:
109 |         logging.getLogger().setLevel(logging.DEBUG)
110 |     else:
111 |         logging.getLogger().setLevel(logging.INFO)
112 | 
113 |     if args.timeout:
114 |         TIMEOUT = args.timeout
115 |  
116 |     main(output_dir=output_dir)
117 | 


--------------------------------------------------------------------------------
/extract_decomps/requirements.txt:
--------------------------------------------------------------------------------
1 | ghidra_bridge
2 | 


--------------------------------------------------------------------------------
/g3po/.gitignore:
--------------------------------------------------------------------------------
1 | scratch.py
2 | g3posay.py
3 | __pycache__
4 | 


--------------------------------------------------------------------------------
/g3po/G3PO.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tenable/ghidra_tools/3d121c22d230dee3386046ad2544cf500365292b/g3po/G3PO.png


--------------------------------------------------------------------------------
/g3po/README.md:
--------------------------------------------------------------------------------
 1 | # G-3PO: A Protocol Droid for Ghidra
 2 | 
 3 | (The acronym probably stands for "Ghidra gpt-3 Program Oracle", or something like that.)
 4 | 
 5 | For a detailed writeup on the tool, and its rationale, see [G-3PO: A Protocol Droid for Ghidra](https://medium.com/tenable-techblog/g-3po-a-protocol-droid-for-ghidra-4b46fa72f1ff), on the Tenable TechBlog.
 6 | 
 7 | 
 8 | ## Installing and Using G-3PO
 9 | 
10 | G-3PO is ready for use. The only catch is that it does require an API key, and the text completion service is unfree (as in beer, and insofar as the model’s a black box). It is, however, reasonably cheap, and even with heavy use I haven’t spent more than the price of a cup of coffee while developing, debugging, and toying around with this tool.
11 | 
12 | To run the script:
13 | - get yourself an OpenAI or Anthropic API key (G-3PO supports LLM backends from both companies)
14 | - add the key as an environment variable by putting `export OPENAI_API_KEY=whateveryourkeyhappenstobe` or `export ANTHROPIC_API_KEY=youranthropickeyifyouhaveone` in your `~/.profile` file, or any other file that will be sourced before you launch Ghidra
15 | - copy or symlink g3po.py to your Ghidra scripts directory
16 | - add that directory in the Script Manager window
17 | - visit the decompiler window for a function you’d like some assistance interpreting
18 | - and then either run the script from the Script Manager window by selecting it and hitting the ▶️ icon, or bind it to a hotkey and strike when needed
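
To confirm that the key is visible to the environment Ghidra will inherit, a check along these lines may help. It is only a sketch: it mirrors how g3po.py derives the variable name from the model name, but `find_api_key` and its `environ` parameter are hypothetical names introduced here for illustration.

```python
import os


def find_api_key(model, environ=None):
    """Return the API key for the vendor implied by the model name, or None.

    Hypothetical helper: models whose names start with 'claude' are
    assumed to use ANTHROPIC_API_KEY, everything else OPENAI_API_KEY.
    """
    environ = os.environ if environ is None else environ
    vendor = "ANTHROPIC" if model.startswith("claude") else "OPENAI"
    return environ.get(vendor + "_API_KEY")


if __name__ == "__main__":
    for model in ("gpt-3.5-turbo", "claude-v1.2"):
        status = "found" if find_api_key(model) else "missing"
        print("{model}: key {status}".format(model=model, status=status))
```

Run it from the same shell you use to launch Ghidra, so that both see the same environment.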
19 | 
20 | Ideally, I’d like to provide a way for the user to twiddle the various parameters used to solicit a response from the model, such as the “temperature” in the request (high temperatures, approaching 2.0, solicit a more adventurous response, while low temperatures instruct the model to respond conservatively), all from within Ghidra. There’s bound to be a way to do this, but it seems neither the Ghidra API documentation, Google, nor even ChatGPT is offering me much help in that regard, so for now you can adjust the settings by editing the global variables declared near the beginning of the g3po.py source file:
21 | 
22 | ```python
23 | ##########################################################################################
24 | # Script Configuration
25 | ##########################################################################################
26 | MODEL = "gpt-3.5-turbo" # Choose which large language model we query -- gpt-4 and claude-v1.2 also supported
27 | TEMPERATURE = 0.19   # Set higher for more adventurous comments, lower for more conservative
28 | TIMEOUT = 600        # How many seconds should we wait for a response from OpenAI?
29 | MAXTOKENS = 512      # The maximum number of tokens to request from OpenAI
30 | G3POSAY = True       # True if you want the cute C-3PO ASCII art, False otherwise
31 | LANGUAGE = "English" # This can also be used as a style parameter.
32 | EXTRA = ""           # Extra text appended to the prompt.
33 | LOGLEVEL = INFO      # Adjust for more or less line noise in the console.
34 | COMMENTWIDTH = 80    # How wide the comment, inside the little speech balloon, should be.
35 | G3POASCII = r"""
36 |           /~\
37 |          |oo )
38 |          _\=/_
39 |         /     \
40 |        //|/.\|\\
41 |       ||  \_/  ||
42 |       || |\ /| ||
43 |        # \_ _/  #
44 |          | | |
45 |          | | |
46 |          []|[]
47 |          | | |
48 |         /_]_[_\
49 | """
50 | ##########################################################################################
51 | ```
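
For orientation, several of these settings end up in the JSON body sent to the model. The sketch below is illustrative rather than the script's exact code: `build_chat_payload` is a hypothetical name, the field names follow the request that g3po.py builds for chat models, and the range check on the temperature is an added assumption based on the 0 to 2.0 range mentioned above.

```python
import json


def build_chat_payload(messages, model="gpt-3.5-turbo", temperature=0.19, max_tokens=512):
    """Assemble a chat-completion request body (hypothetical helper)."""
    # Assumption: reject out-of-range temperatures instead of sending them.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature should be between 0 and 2.0")
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


if __name__ == "__main__":
    payload = build_chat_payload([{"role": "user", "content": "Explain this function."}])
    print(json.dumps(payload, indent=2))
```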
52 | 
53 | The LANGUAGE and EXTRA parameters provide the user with an easy way to play with the form of the LLM’s commentary.
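
As a rough illustration of how these two settings shape the request, the sketch below builds a shortened version of the instruction text. In g3po.py the LANGUAGE value is substituted into the prompt as the requested style; the prompt wording here is abbreviated, and both the function name `build_intro` and the way EXTRA is appended are assumptions made for this example.

```python
def build_intro(language="English", extra=""):
    """Return an abbreviated instruction prompt (hypothetical helper)."""
    intro = (
        "You are a reverse engineering assistant named G-3PO. "
        "You are to provide a high-level explanation of what the code does, "
        "in {style}."
    ).format(style=language)
    # Assumption: EXTRA is simply appended after the main instructions.
    if extra:
        intro += " " + extra
    return intro


if __name__ == "__main__":
    print(build_intro())
    print(build_intro(language="the form of a sonnet"))
    print(build_intro(extra="Keep the explanation under five sentences."))
```

Because LANGUAGE is dropped into the sentence verbatim, anything that reads naturally after "in" works, which is why it doubles as a style parameter.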
54 | 
55 | 


--------------------------------------------------------------------------------
/g3po/g3po.py:
--------------------------------------------------------------------------------
  1 | # Query OpenAI for a comment
  2 | # @author Olivia Lucca Fraser
  3 | # @category Machine Learning
  4 | # @keybinding Ctrl-G
  5 | # @menupath File.Analysis.G-3PO Analyse function with GPT
  6 | # @toolbar G3PO.png
  7 | 
  8 | ##########################################################################################
  9 | # Script Configuration
 10 | ##########################################################################################
 11 | MODEL = "gpt-3.5-turbo"  # Choose which large language model we query
 12 | MODEL = askChoice("Model", "Please choose a language model to query", ["text-davinci-003", "gpt-3.5-turbo", "gpt-4", "claude-v1.2"], "gpt-3.5-turbo")
 13 | # If you have an OpenAI API key, gpt-3.5-turbo gives you the best bang for your buck.
 14 | # Use gpt-4 for slightly higher quality results, at a higher cost.
 15 | # If you have an Anthropic API key, try claude-v1.2, which also seems to work quite well.
 16 | # Set higher for more adventurous comments, lower for more conservative
 17 | TEMPERATURE = 0.05
 18 | TIMEOUT = 600         # How many seconds should we wait for a response from OpenAI?
 19 | MAXTOKENS = 1024       # The maximum number of tokens to request from OpenAI
 20 | G3POSAY = True        # True if you want the cute C-3PO ASCII art, False otherwise
 21 | # LANGUAGE = "the form of a sonnet"  # This can also be used as a style parameter for the comment
 22 | LANGUAGE = "English"  # This can also be used as a style parameter for the comment
 23 | EXTRA = ""            # Extra text appended to the prompt.
 24 | # EXTRA = "but write everything in the form of a sonnet" # for example
 25 | # How wide the comment, inside the little speech balloon, should be.
 26 | COMMENTWIDTH = 80
 27 | RENAME_FUNCTION = False  # Rename function per G3PO's suggestions
 28 | RENAME_VARIABLES = True  # Rename variables per G3PO's suggestions
 29 | OVERRIDE_COMMENTS = True  # Override existing comments
 30 | G3POASCII = r"""
 31 |           /~\
 32 |          |oo )
 33 |          _\=/_
 34 |         /     \
 35 |        //|/.\|\\
 36 |       ||  \_/  ||
 37 |       || |\ /| ||
 38 |        # \_ _/  #
 39 |          | | |
 40 |          | | |
 41 |          []|[]
 42 |          | | |
 43 |         /_]_[_\
 44 | """
 45 | SEND_ASSEMBLY = False
 46 | ##########################################################################################
 47 | 
 48 | ## 
 49 | # Note: I've updated this script so that it runs in the Ghidrathon Python 3 environment.
 50 | # It should remain backwards-compatible with the Jython 2.7 environment.
 51 | ##
 52 | 
 53 | import textwrap
 54 | import logging
 55 | from logging import DEBUG, INFO, WARNING, ERROR, CRITICAL
 56 | import json
 57 | import os
 58 | import sys
 59 | import re
 60 | import ghidra
 61 | from ghidra.app.script import GhidraScript
 62 | from ghidra.program.model.listing import Function, FunctionManager
 63 | from ghidra.program.model.mem import MemoryAccessException
 64 | from ghidra.util.exception import DuplicateNameException
 65 | from ghidra.program.model.symbol import SourceType
 66 | from ghidra.program.model.pcode import HighFunctionDBUtil
 67 | from ghidra.app.decompiler import DecompileOptions
 68 | from ghidra.app.decompiler import DecompInterface
 69 | from ghidra.util.task import ConsoleTaskMonitor
 70 | from ghidra.program.flatapi import FlatProgramAPI
 71 | 
 72 | LOGLEVEL = INFO       # Adjust for more or less line noise in the console.
 73 | 
 74 | # The way we handle the API calls will vary depending on whether we're running jython
 75 | # or python3. Jython doesn't have the requests library, so we'll use httplib instead.
 76 | try:
 77 |     import httplib
 78 |     def send_https_request(address, path, data, headers):
 79 |         try:
 80 |             conn = httplib.HTTPSConnection(address)
 81 |             json_req_data = json.dumps(data)
 82 |             conn.request("POST", path, json_req_data, headers)
 83 |             response = conn.getresponse()
 84 |             json_data = response.read()
 85 |             conn.close()
 86 |             try:
 87 |                 data = json.loads(json_data)
 88 |                 return data
 89 |             except ValueError as e:
 90 |                 logging.error("Could not parse JSON response: {e}".format(e=e))
 91 |                 logging.debug(json_data)
 92 |                 return None
 93 |         except Exception as e:
 94 |             logging.error("Error sending HTTPS request: {e}".format(e=e))
 95 |             return None
 96 | except ImportError:
 97 |     import requests
 98 |     def send_https_request(address, path, data, headers):
 99 |         try:
100 |             response = requests.post(
101 |                 "https://{address}{path}".format(address=address, path=path),
102 |                 json=data,
103 |                 headers=headers)
104 |             try:
105 |                 data = response.json()
106 |                 return data
107 |             except ValueError as e:
108 |                 logging.error("Could not parse JSON response: {e}".format(e=e))
109 |                 logging.debug(response.text)
110 |                 return None
111 |         except Exception as e:
112 |             logging.error("Error sending HTTPS request: {e}".format(e=e))
113 |             return None
114 | 
115 | try:
116 |     import tiktoken
117 |     ENCODING = tiktoken.encoding_for_model(MODEL)
118 | 
119 |     def estimate_number_of_tokens(s):
120 |         """Estimate tokens in a string, or recursively in a list or dict of strings."""
121 |         if isinstance(s, dict):
122 |             # Count only the values, plus a small per-entry overhead.
123 |             return sum(estimate_number_of_tokens(v) + 2 for v in s.values())
124 |         if isinstance(s, list):
125 |             return sum(estimate_number_of_tokens(item) for item in s)
126 |         return len(ENCODING.encode(s))
127 | 
128 | except (ImportError, KeyError):
129 |     # tiktoken is missing, or has no encoding for MODEL (e.g. non-OpenAI models).
130 |     # Fall back to a rough character-based estimate.
131 | 
132 |     def estimate_number_of_tokens(s):
133 |         return int(len(s)/2.3)
134 | 
135 | 
136 | SOURCE = "AI"
137 | TAG = SOURCE + " generated comment, take with a grain of salt:"
138 | FOOTER = "Model: {model}, Temperature: {temperature}".format(
139 |     model=MODEL, temperature=TEMPERATURE)
140 | 
141 | logging.getLogger().setLevel(LOGLEVEL)
142 | 
143 | STATE = getState()
144 | PROGRAM = STATE.getCurrentProgram()
145 | FLATAPI = FlatProgramAPI(PROGRAM)
146 | 
147 | 
148 | def get_api_key():
149 |     vendor = "ANTHROPIC" if MODEL.startswith("claude") else "OPENAI"
150 |     try:
151 |         return os.environ[vendor + "_API_KEY"]
152 |     except KeyError as ke:
153 |         try:
154 |             home = os.environ["HOME"]
155 |             keyfile = ".{v}_api_key".format(v=vendor.lower())
156 |             with open(os.path.join(home, keyfile)) as f:
157 |                 line = f.readline().strip()
158 |                 return line.split("=")[1].strip('"\'')
159 |         except Exception as e:
160 |             logging.error(
161 |                 "Could not find {v} API key. Please set the {v}_API_KEY environment variable. Errors: {ke}, {e}".format(ke=ke, e=e, v=vendor))
162 |             sys.exit(1)
163 | 
164 | 
165 | def flatten_list(l):
166 |     return [item for sublist in l for item in sublist]
167 | 
168 | 
169 | def wordwrap(s, width=COMMENTWIDTH, pad=True):
170 |     """Wrap a string to a given number of characters, but don't break words."""
171 |     # first replace single line breaks with double line breaks
172 |     lines = [textwrap.TextWrapper(width=width,
173 |                                   break_long_words=False,
174 |                                   break_on_hyphens=True,
175 |                                   replace_whitespace=False).wrap("    " + L)
176 |              for L in s.splitlines()]
177 |     # now flatten the lines list
178 |     lines = flatten_list(lines)
179 |     if pad:
180 |         lines = [line.ljust(width) for line in lines]
181 |     return "\n".join(lines)
182 | 
183 | 
184 | def boxedtext(text, width=COMMENTWIDTH, tag=TAG):
185 |     wrapped = wordwrap(text, width, pad=True)
186 |     wrapped = "\n".join([tag.ljust(width), " ".ljust(
187 |         width), wrapped, " ".ljust(width), FOOTER.ljust(width)])
188 |     side_bordered = "|" + wrapped.replace("\n", "|\n|") + "|"
189 |     top_border = "/" + "-" * (len(side_bordered.split("\n")[0]) - 2) + "\\"
190 |     bottom_border = top_border[::-1]
191 |     return top_border + "\n" + side_bordered + "\n" + bottom_border
192 | 
193 | 
194 | def g3posay(text, width=COMMENTWIDTH, character=G3POASCII, tag=TAG):
195 |     box = boxedtext(text, width, tag=tag)
196 |     headwidth = len(character.split("\n")[1]) + 2
197 |     return box + "\n" + " "*headwidth + "/" + character
198 | 
199 | 
200 | def escape_unescaped_single_quotes(s):
201 |     return re.sub(r"(?<!\\)'", r"\\'", s)
202 | 
203 | 
204 | def is_chat_model(model):
205 |     return 'turbo' in model or 'gpt-4' in model
206 | 
207 | 
208 | def query(prompt, temperature=TEMPERATURE, max_tokens=MAXTOKENS, model=MODEL):
209 |     vendor = "anthropic" if model.startswith("claude") else "openai"
210 |     if vendor == "anthropic":
211 |         return anthropic_request(prompt, temperature, max_tokens, model)
212 |     elif vendor == "openai":
213 |         return openai_request(prompt, temperature, max_tokens, model)
214 |     else:
215 |         raise ValueError("Unknown vendor: {v}".format(v=vendor))
216 | 
217 | 
218 | def openai_request(prompt, temperature=0.19, max_tokens=MAXTOKENS, model=MODEL):
219 |     chat = is_chat_model(model)
220 |     if not chat:
221 |         prompt = '\n'.join(m['content'] for m in prompt)
222 |     data = {
223 |         "model": model,
224 |         "messages" if chat else "prompt": prompt,
225 |         "max_tokens": max_tokens,
226 |         "temperature": temperature
227 |     }
228 |     # The URL is "https://api.openai.com/v1/completions"
229 |     host = "api.openai.com"
230 |     path = "/v1/chat/completions" if chat else "/v1/completions"
231 |     headers = {
232 |         "Content-Type": "application/json",
233 |         "Authorization": "Bearer {openai_api_key}".format(openai_api_key=get_api_key())
234 |     }
235 |     res = send_https_request(host, path, data, headers)
236 |     if res is None:
237 |         logging.error("OpenAI request failed!")
238 |         return None
239 |     logging.info("OpenAI responded: {res}".format(res=res))
240 |     if 'error' in res:
241 |         logging.error("OpenAI error: {error}".format(error=res['error']['message']))
242 |         return None
243 |     if is_chat_model(model):
244 |         response = res['choices'][0]['message']['content'].strip()
245 |     else:
246 |         response = res['choices'][0]['text'].strip()
247 |     return response
248 | 
249 | 
250 | def anthropic_request(prompt, temperature=0.19, max_tokens=MAXTOKENS, model=MODEL):
251 |     ## Format the prompt
252 |     formatted_prompt = []
253 |     for message in prompt:
254 |         role = 'Assistant' if message['role'] == 'assistant' else 'Human'
255 |         formatted_prompt.append(
256 |                 "\n\n{role}: {content}".format(role=role, content=message['content']))
257 |     formatted_prompt = ''.join(formatted_prompt) + "\n\nAssistant: "
258 |     logging.debug(formatted_prompt)
259 |     ## Send the request
260 |     host = "api.anthropic.com"
261 |     path = "/v1/complete"
262 |     headers = {
263 |             "Content-Type": "application/json",
264 |             "x-api-key": get_api_key()
265 |     }
266 |     data = {
267 |         "model": model,
268 |         "prompt": formatted_prompt,
269 |         "max_tokens_to_sample": max_tokens,
270 |         "temperature": temperature,
271 |         "stop_sequences": ["\n\nHuman:"]
272 |         }
273 |     res = send_https_request(host, path, data, headers)
274 |     if res is None:
275 |         logging.error("Anthropic request failed!")
276 |         return None
277 |     logging.info("Anthropic request succeeded!")
278 |     logging.info("Response: {data}".format(data=res))
279 |     return res['completion'].strip()
280 | 
281 | 
282 | def get_current_function():
283 |     logging.debug("currentAddress: {currentAddress}".format(currentAddress=currentAddress))
284 |     listing = currentProgram.getListing()
285 |     function = listing.getFunctionContaining(currentAddress)
286 |     return function
287 | 
288 | 
289 | def decompile_current_function(function=None):
290 |     if function is None:
291 |         function = get_current_function()
292 |     logging.info("Current address is at {currentAddress}".format(
293 |         currentAddress=currentAddress.__str__()))
294 |     logging.info("Decompiling function: {function_name} at {function_entrypoint}".format(
295 |         function_name=function.getName(), function_entrypoint=function.getEntryPoint().__str__()))
296 |     decomp = ghidra.app.decompiler.DecompInterface()
297 |     decomp.openProgram(currentProgram)
298 |     decomp_res = decomp.decompileFunction(function, TIMEOUT, monitor)
299 |     if decomp_res.isTimedOut():
300 |         logging.warning("Timed out while attempting to decompile '{function_name}'".format(
301 |             function_name=function.getName()))
302 |     elif not decomp_res.decompileCompleted():
303 |         logging.error("Failed to decompile {function_name}".format(
304 |             function_name=function.getName()))
305 |         logging.error("    Error: " + decomp_res.getErrorMessage())
306 |         return None
307 |     decomp_src = decomp_res.getDecompiledFunction().getC()
308 |     return decomp_src
309 | 
310 | 
311 | def get_assembly(function=None):
312 |     if function is None:
313 |         function = get_current_function()
314 |     listing = currentProgram.getListing()
315 |     code_units = listing.getCodeUnits(function.getBody(), True)
316 |     assembly = "\n".join([code_unit.toString() for code_unit in code_units])
317 |     return assembly
318 | 
319 | 
320 | def get_code(function=None):
321 |     if SEND_ASSEMBLY:
322 |         return get_assembly(function=function)
323 |     else:
324 |         return decompile_current_function(function=function)
325 | 
326 | 
327 | def get_architecture():
328 |     """Return the architecture, word size, and endianness of the current program."""
329 |     arch = currentProgram.getLanguage().getProcessor().toString()
330 |     word_size = currentProgram.getLanguage().getLanguageDescription().getSize()
331 |     endianness = currentProgram.getLanguage(
332 |     ).getLanguageDescription().getEndian().toString()
333 |     return {'arch': arch, 'word_size': word_size, 'endianness': endianness}
334 | 
335 | 
336 | def lang_description():
337 |     lang = "C"
338 |     if SEND_ASSEMBLY:
339 |         arch_details = get_architecture()
340 |         arch = arch_details['arch']
341 |         word_size = arch_details['word_size']
342 |         endianness = arch_details['endianness']
343 |         lang = "{arch} {word_size}-bit {endianness}".format(
344 |             arch=arch, word_size=word_size, endianness=endianness)
345 |     return lang
346 | 
347 | 
348 | def build_prompt_for_function(code, function_name):
349 |     lang = lang_description()
350 |     intro = """You are a reverse engineering assistant named G-3PO. I am going to show you some C code decompiled from a {lang} binary. You are to provide a high-level explanation of what that code does, in {style}, and try to infer its purpose. Explain your reasoning, step by step. Suggest informative variable names for any variable whose purpose is clear, and suggest an informative name for the function itself. Please print each suggested variable name on its own line using the format
351 |     
352 | $old -> $new
353 | 
354 | Then suggest a name for the function by printing it on its own line using the format
355 | 
356 | $old :: $new
357 | 
358 | If you observe any security vulnerabilities in the code, describe them in detail, and explain how they might be exploited. Do you understand?
359 | """.format(lang=lang, style=LANGUAGE)
360 |     system_msg = {"role": "system", "content": intro}
361 |     prompt = """Here is code from the function {function_name}:\n\n```
362 | {code}
363 | ```
364 | """.format(function_name=function_name, code=code)
365 |     ack_msg = {"role": "assistant", "content": "Yes, I understand. Please show me the code."}
366 |     prompt_msg = {"role": "user", "content": prompt}
367 |     return [system_msg, ack_msg, prompt_msg]
368 | 
369 | 
370 | 
371 | def generate_comment(code, function_name, temperature=0.19, program_info=None, model=MODEL, max_tokens=MAXTOKENS):
372 |     prompt = build_prompt_for_function(code, function_name)
373 |     logging.debug("Prompt:\n\n{prompt}".format(prompt=prompt))
374 |     response = query(
375 |         prompt=prompt, 
376 |         temperature=temperature,
377 |         max_tokens=max_tokens,
378 |         model=model)
379 |     return response
380 | 
381 | 
382 | def add_explanatory_comment_to_current_function(temperature=0.19, model=MODEL, max_tokens=MAXTOKENS):
383 |     function = get_current_function()
384 |     if function is None:
385 |         logging.error("Failed to get current function")
386 |         return None
387 |     function_name = function.getName()
388 |     old_comment = function.getComment()
389 |     if old_comment is not None:
390 |         if OVERRIDE_COMMENTS or SOURCE in old_comment:
391 |             logging.info("Removing old comment.")
392 |             function.setComment(None)
393 |         else:
394 |             logging.info("Function {function_name} already has a comment".format(
395 |                 function_name=function_name))
396 |             return None
397 |     code = get_code(function)
398 |     if code is None:
399 |         logging.error("Failed to {action} current function {function_name}".format(
400 |             function_name=function_name, action="disassemble" if SEND_ASSEMBLY else "decompile"))
401 |         return
402 |     approximate_tokens = estimate_number_of_tokens(code)
403 |     logging.info("Length of decompiled C code: {code_len} characters, guessing {approximate_tokens} tokens".format(
404 |         code_len=len(code), approximate_tokens=approximate_tokens))
405 |     comment = generate_comment(code, function_name=function_name,
406 |                                temperature=temperature, model=model, max_tokens=max_tokens)
407 |     if comment is None:
408 |         logging.error("Failed to generate comment")
409 |         sys.exit(1)
410 |     if G3POSAY:
411 |         comment = g3posay(comment)
412 |     else:
413 |         comment = TAG + "\n" + comment
414 |     listing = currentProgram.getListing()
415 |     function = listing.getFunctionContaining(currentAddress)
416 |     try:
417 |         function.setComment(comment)
418 |     except DuplicateNameException as e:
419 |         logging.error("Failed to set comment: {e}".format(e=e))
420 |         return
421 |     logging.info("Added comment to function: {function_name}".format(
422 |         function_name=function.getName()))
423 |     return comment, code
424 | 
425 | 
426 | def parse_response_for_vars(comment):
427 |     """takes block comment from AI, yields tuple of str old name & new name for each var"""
428 |     # The LLM will sometimes wrap variable names in backticks, and sometimes prepend a dollar sign.
429 |     # We want to ignore those artifacts.
430 |     regex = re.compile(r'[`$]?([A-Za-z_][A-Za-z_0-9]*)`? -> [`$]?([A-Za-z_][A-Za-z_0-9]*)`?')
431 |     for line in comment.split('\n'):
432 |         m = regex.search(line)
433 |         if m:
434 |             old, new = m.groups()
435 |             logging.debug("Found suggestion to rename {old} to {new}".format(old=old, new=new))
436 |             if old == new or new == 'new':
437 |                 continue
438 |             yield old, new
439 | 
440 | 
441 | def parse_response_for_function_name(comment):
442 |     """takes block comment from the LLM, returns the suggested function name (or None if absent)"""
443 |     regex = re.compile('[`$]?([A-Za-z_][A-Za-z_0-9]*)`? :: [$`]?([A-Za-z_][A-Za-z_0-9]*)`?')
444 |     for line in comment.split('\n'):
445 |         m = regex.search(line)
446 |         if m:
447 |             logging.debug("Renaming function to {new}".format(new=m.group(2)))
448 |             _, new = m.groups()
449 |             return new
450 | 
451 | 
452 | def rename_var(old_name, new_name, variables):
453 |     """takes an old and new variable name from listing and renames it
454 |         old_name: str, old variable name
455 |         new_name: str, new variable name
456 |         variables: {str, Variable}, vars in the func we're working in """
457 |     try:
458 |         var_to_rename = variables.get(old_name)
459 |         if var_to_rename:
460 |             var_to_rename.setName(new_name, SourceType.USER_DEFINED)
461 |             var_to_rename.setComment(
462 |                 'G3PO renamed this from {} to {}'.format(old_name, new_name))
463 |             logging.debug(
464 |                 'G3PO renamed variable {} to {}'.format(old_name, new_name))
465 |         else:
466 |             logging.debug('G3PO wanted to rename variable {} to {}, but no Variable found'.format(
467 |                 old_name, new_name))
468 | 
469 |     # only deals with listing vars, need to work with decomp to get the rest
470 |     except KeyError:
471 |         pass
472 | 
473 | 
474 | # https://github.com/NationalSecurityAgency/ghidra/issues/1561#issuecomment-590025081
475 | def rename_data(old_name, new_name):
476 |     """takes an old and new data name, finds the data and renames it
477 |         old_name: str, old variable name of the form DAT_{addr}
478 |         new_name: str, new variable name"""
479 |     new_name = new_name.upper()
480 |     address = int(old_name[len('DAT_'):], 16)  # slice off the prefix; str.strip() removes a character set, not a prefix
481 |     sym = FLATAPI.getSymbolAt(FLATAPI.toAddr(address))
482 |     sym.setName(new_name, SourceType.USER_DEFINED)
483 |     logging.debug('G3PO renamed Data {} to {}'.format(old_name, new_name))
484 | 
485 | 
486 | def rename_high_variable(symbols, old_name, new_name, data_type=None):
487 |     """takes a high variable object, a new name, and, optionally, a data type
488 |     and sets the name and data type of the high variable in the program database"""
489 | 
490 |     if old_name not in symbols:
491 |         logging.debug('G3PO wanted to rename variable {} to {}, but no variable found'.format(
492 |             old_name, new_name))
493 |         return
494 |     hv = symbols[old_name]
495 | 
496 |     if data_type is None:
497 |         data_type = hv.getDataType()
498 | 
499 |     # if running in Jython, we may need to ensure that the new name is in unicode
500 |     try:
501 |         new_name = unicode(new_name)
502 |     except NameError:
503 |         pass
504 |     try:
505 |         res = HighFunctionDBUtil.updateDBVariable(hv,
506 |                                                   new_name,
507 |                                                   data_type,
508 |                                                   SourceType.ANALYSIS)
509 |         logging.debug("Renamed {} to {}".format(old_name, new_name))
510 |         return res
511 |     except DuplicateNameException as e:
512 |         logging.error("Failed to rename {} to {}: {}".format(
513 |             old_name, new_name, e))
514 |         return None
515 | 
516 | 
517 | 
518 | def apply_renaming_suggestions(comment, code):
519 |     logging.info('Renaming variables...')
520 | 
521 |     func = get_current_function()
522 |     func_name = func.getName()
523 |     new_func_name = None
524 | 
525 |     if RENAME_VARIABLES:
526 |         raw_vars = [v for v in func.getAllVariables()]
527 |         variables = {var.getName(): var for var in raw_vars}
528 |         logging.debug("Variables: {}".format(variables))
529 | 
530 |         # John coming in clutch again
531 |         # https://github.com/NationalSecurityAgency/ghidra/issues/2143#issuecomment-665300865
532 |         options = DecompileOptions()
533 |         monitor = ConsoleTaskMonitor()
534 |         ifc = DecompInterface()
535 |         ifc.setOptions(options)
536 |         ifc.openProgram(func.getProgram())
537 |         res = ifc.decompileFunction(func, TIMEOUT, monitor)
538 |         high_func = res.getHighFunction()
539 |         lsm = high_func.getLocalSymbolMap()
540 |         symbols = lsm.getSymbols()
541 |         symbols = {var.getName(): var for var in symbols}
542 |         logging.debug("Symbols: {}".format(symbols))
543 | 
544 |         for old, new in parse_response_for_vars(comment):
545 |             if re.match(r"^DAT_[0-9a-f]+$", old):  # Globals with default names
546 |                 # Deliberately do not append the old address suffix to the new name:
547 |                 # stale address info does not belong in a non-dynamic variable name.
548 |                 try:
549 |                     # rename_data() parses the address out of the old name to locate the symbol
550 |                     rename_data(old, new)
551 |                 except Exception as e:
552 |                     logging.error('Failed to rename data: {}'.format(e))
553 |             elif old in symbols and symbols[old] is not None:
554 |                 try:
555 |                     rename_high_variable(symbols, old, new)
556 |                 except Exception as e:
557 |                     logging.error('Failed to rename variable: {}'.format(e))
558 |             else:
559 |                 # check for hallucination
560 |                 if old not in code:
561 |                     logging.error("G3PO wanted to rename variable {} to {}, but it may have been hallucinating.".format(old, new))
562 |                 elif old == func_name:
563 |                     new_func_name = new
564 |                 else:
565 |                     logging.error("G3PO wanted to rename variable {old} to {new}, but {old} was not found in the symbol table.".format(old=old, new=new))
566 | 
567 |     if func.getName().startswith('FUN_') or RENAME_FUNCTION:
568 |         fn = parse_response_for_function_name(comment)
569 |         new_func_name = fn or new_func_name # it may have been named with variable renaming syntax
570 |         if new_func_name:
571 |             func.setName(new_func_name, SourceType.USER_DEFINED)
572 |             logging.debug('G3PO renamed function to {}'.format(new_func_name))
573 | 
574 | 
575 | result = add_explanatory_comment_to_current_function(temperature=0.19, model=MODEL, max_tokens=MAXTOKENS)
576 | 
577 | if result is not None and (RENAME_FUNCTION or RENAME_VARIABLES):
578 |     apply_renaming_suggestions(*result)  # result is a (comment, code) tuple; None signals an earlier failure
579 | 
580 | 


--------------------------------------------------------------------------------