├── LICENSE ├── Makefile ├── README ├── TODO ├── src ├── Makefile ├── datastructs.h ├── elf_parser.c ├── elf_parser.h ├── function.c ├── function.h ├── jump_block.c ├── jump_block.h ├── lang_gen.c ├── lang_gen.h ├── main.c ├── var.c └── var.h └── tests ├── Makefile ├── arith_test.c ├── arith_test64_expected ├── arith_test_expected ├── control_flow_test.c ├── control_flow_test64_expected ├── control_flow_test_expected ├── do_tests.sh ├── do_tests64.sh ├── test.c ├── test64_expected └── test_expected /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2015 Justin Green 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy 4 | of this software and associated documentation files (the "Software"), to deal 5 | in the Software without restriction, including without limitation the rights 6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 7 | copies of the Software, and to permit persons to whom the Software is 8 | furnished to do so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in 11 | all copies or substantial portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 19 | THE SOFTWARE. 
20 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | mrproper: triad clean 2 | debug: 3 | make -C src debug 4 | triad: 5 | make -C src triad 6 | sys_tests: 7 | make -C tests sys_tests && tests/do_tests.sh 8 | sys_tests64: 9 | make -C tests sys_tests64 && tests/do_tests64.sh 10 | clean: 11 | make -C src clean 12 | clean_tests: 13 | make -C tests clean 14 | install: 15 | install src/triad /usr/bin/triad 16 | -------------------------------------------------------------------------------- /README: -------------------------------------------------------------------------------- 1 | Triad decompiler version 0.4 Alpha Test. 2 | 3 | Not intended to be used for copyright infringement or other illegal activities. 4 | 5 | What is triad: 6 | TRiad Is A Decompiler 7 | Triad is a tiny, free and open source, Capstone based x86 decompiler that will 8 | take in ELF files as input and spit out pseudo-C. 9 | 10 | Installation: 11 | Triad requires Capstone to be installed first. 12 | http://www.capstone-engine.org/ 13 | For 32 bit tests, gcc-multilib is also required. 14 | 15 | First, it will be necessary to build triad. "make triad" should suffice. 16 | After its components are built, the triad binary will be placed in the build 17 | directory. To copy the binary into /usr/bin, simply use "sudo make install." 18 | 19 | Usage: triad <(optional)start address> 20 | <(optional) cutoff address> 21 | 22 | Simply run the triad binary from the command line and specify an ELF to 23 | decompile as a parameter. By default, triad will try to find the main function 24 | of the given file and start decompiling from there. 25 | 26 | Sometimes ELFs have all symbols stripped, so triad will be unable to find main. 27 | In such a scenario, the user may simply specify a starting address as the second 28 | command line parameter. 
But, an incorrect starting address will likely result in 29 | incorrect decompilation or no decompilation. 30 | 31 | Occasionally it is ambiguous as to where a function actually ends. If a user 32 | thinks he/she knows better where a particular function ends than triad and has 33 | specified a start address, he/she can specify a cutoff address. The default 34 | cutoff address is the end of the segment containing the entry point. 35 | 36 | Triad has the ability to follow function calls and automatically decompile 37 | callees. This is especially helpful when dealing with stripped binaries or other 38 | binaries in which relevant code isn't clearly distinguishable from data. 39 | 40 | Flags: 41 | 42 | -f: Full decompilation. This is the default. 43 | 44 | -p: Partial decompilation. Recovered control flow is always going to be 45 | bad, so Triad has an option to only partially decompile code. This 46 | means Triad will identify variables and parameters, try to recover 47 | calling convention, and translate most instructions back into their C 48 | operator equivalents, but Triad will leave jumps and comparisons as is 49 | with the philosophy that the user knows best how to follow them. 50 | 51 | -d: Disassemble. Make no attempt to decompile code, simply print out a 52 | disassembly in AT&T syntax. 53 | 54 | -s: Disable call following, just decompile main/whatever code was at the 55 | specified address. 56 | 57 | -h: Print all constants in hexadecimal format. 58 | 59 | Limitations PLEASE READ BEFORE SUBMITTING A BUG REPORT: 60 | Triad really only works on x86 and x86_64 ELF executables. Other architectures 61 | may be possible in the future, but there are currently no plans to add them. 62 | 63 | The triad decompiler is still very much an alpha. The project is nowhere near 64 | completion and as such is missing some critical features, contains numerous 65 | bugs, has several odd quirks, and has a propensity for segfaulting. 
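Since triad only accepts x86 and x86_64 ELF input, it can be handy to check a file's identification bytes before handing it over. The following is a minimal sketch of such a check, in the spirit of the sanity check in init_elf_parser. The helper name elf_class_of and the enum constants are assumptions made for illustration and are not part of triad; the numeric values mirror EI_NIDENT, EI_CLASS, ELFCLASS32 and ELFCLASS64 from <elf.h>.

```c
#include <stddef.h>
#include <string.h>

/* Values mirror EI_NIDENT, EI_CLASS, ELFCLASS32 and ELFCLASS64 from <elf.h>.
   They are redefined here only so the sketch builds without that header. */
enum { IDENT_SIZE = 16, IDENT_CLASS = 4, CLASS_32 = 1, CLASS_64 = 2 };

/* Hypothetical helper, not part of triad.
   Returns 32 or 64 for a 32 bit or 64 bit ELF image, 0 for anything else. */
int elf_class_of (const unsigned char* buf, size_t len)
{
	static const unsigned char magic [4] = { 0x7f, 'E', 'L', 'F' };

	if (len < IDENT_SIZE) /* Too short to hold the identification bytes */
		return 0;
	if (memcmp (buf, magic, sizeof (magic)) != 0) /* Not an ELF file */
		return 0;
	if (buf [IDENT_CLASS] == CLASS_32)
		return 32;
	if (buf [IDENT_CLASS] == CLASS_64)
		return 64;
	return 0; /* Invalid or unknown ELF class */
}
```

A caller would read at least the first 16 bytes of the file into a buffer and pass them in. A result of 0 means the file is not something triad can decompile. This check says nothing about the machine type, so a non-x86 ELF would still pass it.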
66 | 67 | Missing features include support for switch decompilation and full support for 68 | strings and statically allocated arrays (dynamically allocated arrays will 69 | actually probably work to one degree or another, but the syntax will be 70 | most unusual, e.g. *(char*)(eax + (12)) = 97 instead of array[12] = 'a'). 71 | Struct analysis will be a long way away as well, and unions may never work 72 | properly. 73 | 74 | The only binary format currently supported is the Executable and 75 | Linkable Format (ELF), commonly used on UNIX-like systems such as Linux. 76 | 77 | Control flow decompilation should be mostly correct, but it may look funky. 78 | Continues and forward gotos inside conditional statements might wind up as 79 | if-else statements. This is actually semantically equivalent, just different 80 | from the original source. 81 | 82 | Optimization and computed jumps will probably cause a program to be decompiled 83 | completely incorrectly. 84 | 85 | Triad was designed and tested for programs compiled using gcc. 86 | 87 | It is important to understand that the generated source code will NEVER be 88 | exactly the original source (unless the program was compiled with debug 89 | symbols, of course). 90 | 91 | If triad segfaults on you, feel free to tell me. Include a stack trace and a 92 | description of the conditions that triggered the crash if at all possible. 93 | For obvious reasons, it is quite important that triad crash as little as 94 | possible. 95 | 96 | "Hacking"/Modding notes: 97 | I will be honest, the code is a bit of a mess. It is a short mess, 98 | probably less than 2 KLOC, but the amount of pointer arithmetic and number of 99 | globals used is not for the faint of heart. 100 | 101 | That said, feel free to "hack" in features! The license is just MIT, so do 102 | whatever. Feel free to contact me if you have any questions about how the code 103 | works or think you have a cool feature that should be merged into the codebase. 
104 | I tried to document the source, but I'm sure certain lines will leave many 105 | programmers confused and/or horrified. 106 | 107 | My email is just electrojustin@gmail.com 108 | -------------------------------------------------------------------------------- /TODO: -------------------------------------------------------------------------------- 1 | ++++CONTROL FLOW++++ 2 | { 3 | 4 | Fix control flow decompilation (too brittle as is). Perhaps use AI? - LONG TERM 5 | 6 | Add Switches - LONG TERM 7 | 8 | } 9 | 10 | ++++POINTERS AND ARRAYS++++ 11 | { 12 | 13 | Add support for datastructures and arrays - LONG TERM 14 | 15 | } 16 | 17 | MISC 18 | { 19 | 20 | MORE DOCUMENTATION 21 | 22 | CODE CLEANUPS 23 | 24 | } 25 | -------------------------------------------------------------------------------- /src/Makefile: -------------------------------------------------------------------------------- 1 | CFLAGS=-O2 2 | mrproper: triad clean 3 | debug: CFLAGS=-g 4 | debug: triad 5 | triad: main.o elf_parser.o jump_block.o function.o var.o lang_gen.o 6 | gcc $(CFLAGS) main.o elf_parser.o jump_block.o function.o var.o lang_gen.o -o triad -lcapstone 7 | 8 | main.o: main.c elf_parser.h function.h 9 | gcc $(CFLAGS) -c main.c 10 | elf_parser.o: elf_parser.h elf_parser.c 11 | gcc $(CFLAGS) -c elf_parser.c 12 | jump_block.o: jump_block.c jump_block.h datastructs.h elf_parser.h 13 | gcc $(CFLAGS) -c jump_block.c 14 | function.o: function.c function.h datastructs.h jump_block.h 15 | gcc $(CFLAGS) -c function.c 16 | var.o: var.c var.h datastructs.h 17 | gcc $(CFLAGS) -c var.c 18 | lang_gen.o: lang_gen.c lang_gen.h var.h function.h jump_block.h 19 | gcc $(CFLAGS) -c lang_gen.c 20 | 21 | clean: 22 | rm main.o var.o lang_gen.o jump_block.o elf_parser.o function.o 23 | -------------------------------------------------------------------------------- /src/datastructs.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | //For overloading 
list_loop 4 | #define GET_MACRO(_1, _2, _3, _4, NAME, ...) NAME 5 | 6 | //For overloading list_cleanup 7 | #define GET_MACRO2(_1, _2, _3, NAME, ...) NAME 8 | 9 | //Linked list macros 10 | //Adds to_link directly in front of current 11 | #define link(current, to_link) ({\ 12 | if (current == NULL)\ 13 | current = to_link;\ 14 | if (current->next)\ 15 | {\ 16 | to_link->next = current->next;\ 17 | current->next = to_link;\ 18 | }\ 19 | else\ 20 | {\ 21 | to_link->next = current;\ 22 | current->next = to_link;\ 23 | }\ 24 | }) 25 | //Removes next element from list 26 | #define unlink_next(current) ({\ 27 | void* next_cpy = current->next;\ 28 | current->next = current->next->next;\ 29 | free (next_cpy);\ 30 | }) 31 | 32 | //Cleans up a list's memory 33 | #define list_cleanup2(to_cleanup, callback) ({\ 34 | while (to_cleanup->next != to_cleanup && to_cleanup->next)\ 35 | {\ 36 | callback (to_cleanup->next);\ 37 | unlink_next (to_cleanup);\ 38 | }\ 39 | callback (to_cleanup);\ 40 | free (to_cleanup);\ 41 | }) 42 | 43 | #define list_cleanup1(to_cleanup, callback, param) ({\ 44 | while (to_cleanup->next != to_cleanup && to_cleanup->next)\ 45 | {\ 46 | callback (to_cleanup->next, param);\ 47 | unlink_next (to_cleanup);\ 48 | }\ 49 | callback (to_cleanup, param);\ 50 | free (to_cleanup);\ 51 | }) 52 | 53 | #define list_cleanup(args...) GET_MACRO2(args, list_cleanup1, list_cleanup2)(args) 54 | 55 | //Calls function callback with every element in the list between start and end. 
Will loop through entire list if end and start are the same 56 | #define list_loop1(callback, end, start, param) ({\ 57 | void* end_cpy = end;\ 58 | do\ 59 | {\ 60 | callback (start, param);\ 61 | start = start->next;\ 62 | } while (start != end_cpy && start);\ 63 | start = end_cpy;\ 64 | }) 65 | 66 | #define list_loop2(callback, end, start) ({\ 67 | void* end_cpy = end;\ 68 | do\ 69 | {\ 70 | callback (start);\ 71 | start = start->next;\ 72 | } while (start != end_cpy && start);\ 73 | start = end_cpy;\ 74 | }) 75 | 76 | #define list_loop(args...) GET_MACRO(args, list_loop1, list_loop2)(args) 77 | 78 | -------------------------------------------------------------------------------- /src/elf_parser.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include "elf_parser.h" 5 | 6 | //Load file into memory 7 | void init_file_buf (char* file_name) 8 | { 9 | FILE* executable; 10 | executable = fopen (file_name, "r"); 11 | if (executable == NULL) //Something has gone wrong with opening the file 12 | { 13 | printf ("CRITICAL ERROR: File not found or bad permissions.\n"); 14 | exit (1); 15 | } 16 | fseek (executable, 0, SEEK_END); 17 | file_size = ftell (executable); 18 | file_buf = malloc (file_size); 19 | fseek (executable, 0, SEEK_SET); 20 | fread (file_buf, 1, file_size, executable); 21 | fclose (executable); 22 | } 23 | 24 | //Use level hacking to get entry point from elf header 25 | void get_entry_point (void) 26 | { 27 | entry_point = ((Elf32_Ehdr*)&(file_buf [0]))->e_entry; 28 | } 29 | 30 | void get_entry_point64 (void) 31 | { 32 | entry_point = ((Elf64_Ehdr*)&(file_buf [0]))->e_entry; 33 | } 34 | 35 | //Gets the names of the sections from .shstrtab 36 | void get_section_names (void) 37 | { 38 | Elf32_Shdr* section_table; 39 | 40 | section_table = (Elf32_Shdr*)&(file_buf [((Elf32_Ehdr*)&(file_buf [0]))->e_shoff]); 41 | section_string_table = file_buf + section_table 
[((Elf32_Ehdr*)file_buf)->e_shstrndx].sh_offset; 42 | } 43 | 44 | void get_section_names64 (void) 45 | { 46 | Elf64_Shdr* section_table; 47 | 48 | section_table = (Elf64_Shdr*)&(file_buf [((Elf64_Ehdr*)&(file_buf [0]))->e_shoff]); 49 | section_string_table = file_buf + section_table [((Elf64_Ehdr*)file_buf)->e_shstrndx].sh_offset; 50 | } 51 | 52 | void get_num_sections (void) 53 | { 54 | num_sections = ((Elf32_Ehdr*)&(file_buf [0]))->e_shnum; 55 | } 56 | 57 | void get_num_sections64 (void) 58 | { 59 | num_sections = ((Elf64_Ehdr*)&(file_buf [0]))->e_shnum; 60 | } 61 | 62 | //Finds information about a number of sections of interest by looping through the section table and looking for specific names 63 | void parse_sections (void) 64 | { 65 | int loop = 0; 66 | unsigned int section_table_index; 67 | Elf32_Shdr* section_table; 68 | char* current_name; 69 | unsigned int current_offset; 70 | 71 | symbol_table.arch1 = 0; 72 | symbol_table_end.arch1 = 0; 73 | string_table = 0; 74 | 75 | section_table_index = ((Elf32_Ehdr*)&(file_buf [0]))->e_shoff; 76 | section_table = (Elf32_Shdr*)&(file_buf [section_table_index]); 77 | 78 | for (loop; loop < num_sections; loop ++) 79 | { 80 | current_name = section_string_table + section_table [loop].sh_name; 81 | current_offset = section_table [loop].sh_offset; 82 | if (current_name < file_buf || (unsigned long long)current_name > (unsigned long long)file_buf + file_size || current_offset < 0 || current_offset > file_size || (unsigned long long)current_offset > (unsigned long long)file_buf + file_size) 83 | { 84 | printf ("ERROR: Section number %d is malformed. Skipping...\n", loop); 85 | continue; 86 | } 87 | if (!strcmp (current_name, ".symtab")) //Contains "symbols" for the program. 
88 | { 89 | symbol_table.arch1 = (Elf32_Sym*)(file_buf + current_offset); 90 | symbol_table_end.arch1 = (Elf32_Sym*)((char*)symbol_table.arch1 + section_table [loop].sh_size); 91 | if ((char*)symbol_table.arch1 < file_buf || (unsigned long long)symbol_table.arch1 > (unsigned long long)file_buf + file_size || (char*)symbol_table_end.arch1 < file_buf || (unsigned long long)symbol_table_end.arch1 > (unsigned long long)file_buf + file_size || symbol_table.arch1 > symbol_table_end.arch1) 92 | { 93 | symbol_table.arch1 = NULL; 94 | symbol_table_end.arch1 = NULL; 95 | printf ("ERROR: Malformed symbol table.\n"); 96 | } 97 | } 98 | 99 | if (!strcmp (current_name, ".strtab")) //Strings for the regular symbols 100 | { 101 | string_table = file_buf + current_offset; 102 | if (string_table < file_buf || (unsigned long long)string_table > (unsigned long long)file_buf + file_size) 103 | { 104 | string_table = NULL; 105 | printf ("ERROR: Malformed string table.\n"); 106 | } 107 | } 108 | 109 | } 110 | } 111 | 112 | void parse_sections64 (void) 113 | { 114 | int loop = 0; 115 | unsigned int section_table_index; 116 | Elf64_Shdr* section_table; 117 | char* current_name; 118 | unsigned int current_offset; 119 | 120 | symbol_table.arch2 = 0; 121 | symbol_table_end.arch2 = 0; 122 | string_table = 0; 123 | 124 | section_table_index = ((Elf64_Ehdr*)&(file_buf [0]))->e_shoff; 125 | section_table = (Elf64_Shdr*)&(file_buf [section_table_index]); 126 | 127 | for (loop; loop < num_sections; loop ++) 128 | { 129 | current_name = section_string_table + section_table [loop].sh_name; 130 | current_offset = section_table [loop].sh_offset; 131 | if (current_name < file_buf || (unsigned long long)current_name > (unsigned long long)file_buf + file_size || current_offset < 0 || current_offset > file_size || (unsigned long long)current_offset > (unsigned long long)file_buf + file_size) 132 | { 133 | printf ("ERROR: Section number %d is malformed. 
Skipping...\n", loop); 134 | continue; 135 | } 136 | if (!strcmp (current_name, ".symtab")) //Contains "symbols" for the program. 137 | { 138 | symbol_table.arch2 = (Elf64_Sym*)(file_buf + current_offset); 139 | symbol_table_end.arch2 = (Elf64_Sym*)((char*)symbol_table.arch2 + section_table [loop].sh_size); 140 | if ((char*)symbol_table.arch2 < file_buf || (unsigned long long)symbol_table.arch2 > (unsigned long long)file_buf + file_size || (char*)symbol_table_end.arch2 < file_buf || (unsigned long long)symbol_table_end.arch2 > (unsigned long long)file_buf + file_size || symbol_table.arch2 > symbol_table_end.arch2) 141 | { 142 | symbol_table.arch2 = NULL; 143 | symbol_table_end.arch2 = NULL; 144 | printf ("ERROR: Malformed symbol table.\n"); 145 | } 146 | } 147 | 148 | if (!strcmp (current_name, ".strtab")) //Strings for the regular symbols 149 | { 150 | string_table = file_buf + current_offset; 151 | if (string_table < file_buf || (unsigned long long)string_table > (unsigned long long)file_buf + file_size) 152 | { 153 | string_table = NULL; 154 | printf ("ERROR: Malformed string table.\n"); 155 | } 156 | } 157 | 158 | } 159 | } 160 | 161 | Elf32_Sym* find_sym (Elf32_Sym* sym_tab, Elf32_Sym* end, unsigned int addr) 162 | { 163 | if (!sym_tab) 164 | return NULL; 165 | 166 | int loop = 0; 167 | 168 | while (&(sym_tab [loop]) < end && sym_tab [loop].st_value != addr) //Check bounds before reading the entry 169 | loop ++; 170 | 171 | if (&(sym_tab [loop]) >= end || sym_tab [loop].st_info == STT_NOTYPE) //Ran off the end of the table: no match 172 | return NULL; 173 | else 174 | return &(sym_tab [loop]); 175 | } 176 | 177 | Elf64_Sym* find_sym64 (Elf64_Sym* sym_tab, Elf64_Sym* end, unsigned int addr) 178 | { 179 | if (!sym_tab) 180 | return NULL; 181 | 182 | int loop = 0; 183 | 184 | while (&(sym_tab [loop]) < end && sym_tab [loop].st_value != addr) //Check bounds before reading the entry 185 | loop ++; 186 | 187 | if (&(sym_tab [loop]) >= end || sym_tab [loop].st_info == STT_NOTYPE) //Ran off the end of the table: no match 188 | return NULL; 189 | else 190 | return &(sym_tab [loop]); 191 | } 192 | 193 | Elf32_Sym* find_reloc_sym (unsigned int addr) 194 | { 195 | if 
(!relocation_table.arch1 || !dynamic_symbol_table.arch1) 196 | return NULL; 197 | 198 | int loop = 0; 199 | 200 | while (relocation_table.arch1 [loop].r_offset != addr && loop < num_dynamic_symbols) 201 | loop ++; 202 | 203 | if (loop >= num_dynamic_symbols) 204 | return NULL; 205 | else 206 | return &(dynamic_symbol_table.arch1 [relocation_table.arch1 [loop].r_info >> 8]); 207 | } 208 | 209 | Elf64_Sym* find_reloc_sym64 (unsigned int addr) 210 | { 211 | if (!relocation_table.arch2 || !dynamic_symbol_table.arch2) 212 | return NULL; 213 | 214 | int loop = 0; 215 | 216 | while (relocation_table.arch2 [loop].r_offset != addr && loop < num_dynamic_symbols) 217 | loop ++; 218 | 219 | if (loop >= num_dynamic_symbols) 220 | return NULL; 221 | else 222 | return &(dynamic_symbol_table.arch2 [relocation_table.arch2 [loop].r_info >> 32]); 223 | } 224 | 225 | void get_dyn_syms (void) 226 | { 227 | Elf32_Ehdr* header = (Elf32_Ehdr*)file_buf; 228 | Elf32_Phdr* segment_table = (Elf32_Phdr*)(file_buf + header->e_phoff); 229 | Elf32_Dyn* dynamic_table; 230 | int i = 0; 231 | int j = 0; 232 | 233 | dynamic_string_table = NULL; 234 | dynamic_symbol_table.arch1 = NULL; 235 | relocation_table.arch1 = NULL; 236 | num_dynamic_symbols = 0; 237 | 238 | for (i; i < header->e_phnum; i ++) 239 | { 240 | if (segment_table [i].p_type == PT_DYNAMIC) 241 | break; 242 | } 243 | 244 | if (i >= header->e_phnum) 245 | { 246 | printf ("Error: No dynamic linking information\n"); 247 | return; 248 | } 249 | 250 | dynamic_table = (Elf32_Dyn*)(file_buf + segment_table [i].p_offset); 251 | 252 | j = 0; 253 | while (dynamic_table [j].d_tag != DT_NULL) 254 | { 255 | if (dynamic_table [j].d_tag == DT_STRTAB) 256 | dynamic_string_table = (char*)(file_buf + addr_to_index (dynamic_table [j].d_un.d_ptr)); 257 | if (dynamic_table [j].d_tag == DT_SYMTAB) 258 | dynamic_symbol_table.arch1 = (Elf32_Sym*)(file_buf + addr_to_index (dynamic_table [j].d_un.d_ptr)); 259 | if (dynamic_table [j].d_tag == DT_RELSZ) 260 | 
num_dynamic_symbols += dynamic_table [j].d_un.d_val /sizeof (Elf32_Rel); 261 | if (dynamic_table [j].d_tag == DT_PLTRELSZ) 262 | num_dynamic_symbols = dynamic_table [j].d_un.d_val / sizeof (Elf32_Rel); 263 | if (dynamic_table [j].d_tag == DT_REL) 264 | relocation_table.arch1 = (Elf32_Rel*)(file_buf + addr_to_index (dynamic_table [j].d_un.d_ptr)); 265 | 266 | j ++; 267 | } 268 | } 269 | 270 | void get_dyn_syms64 (void) 271 | { 272 | Elf64_Ehdr* header = (Elf64_Ehdr*)file_buf; 273 | Elf64_Phdr* segment_table = (Elf64_Phdr*)(file_buf + header->e_phoff); 274 | Elf64_Dyn* dynamic_table; 275 | int i = 0; 276 | int j = 0; 277 | 278 | dynamic_string_table = NULL; 279 | dynamic_symbol_table.arch2 = NULL; 280 | relocation_table.arch2 = NULL; 281 | num_dynamic_symbols = 0; 282 | 283 | for (i; i < header->e_phnum; i ++) 284 | { 285 | if (segment_table [i].p_type == PT_DYNAMIC) 286 | break; 287 | } 288 | 289 | if (i >= header->e_phnum) 290 | { 291 | printf ("Error: No dynamic linking information\n"); 292 | return; 293 | } 294 | 295 | dynamic_table = (Elf64_Dyn*)(file_buf + segment_table [i].p_offset); 296 | 297 | j = 0; 298 | while (dynamic_table [j].d_tag != DT_NULL) 299 | { 300 | if (dynamic_table [j].d_tag == DT_STRTAB) 301 | dynamic_string_table = (char*)(file_buf + addr_to_index (dynamic_table [j].d_un.d_ptr)); 302 | if (dynamic_table [j].d_tag == DT_SYMTAB) 303 | dynamic_symbol_table.arch2 = (Elf64_Sym*)(file_buf + addr_to_index (dynamic_table [j].d_un.d_ptr)); 304 | if (dynamic_table [j].d_tag == DT_RELASZ) 305 | num_dynamic_symbols += dynamic_table [j].d_un.d_ptr /sizeof (Elf64_Rela); 306 | if (dynamic_table [j].d_tag == DT_PLTRELSZ) 307 | num_dynamic_symbols = dynamic_table [j].d_un.d_ptr / sizeof (Elf64_Rela); 308 | if (dynamic_table [j].d_tag == DT_RELA) 309 | relocation_table.arch2 = (Elf64_Rela*)(file_buf + addr_to_index (dynamic_table [j].d_un.d_ptr)); 310 | 311 | j ++; 312 | } 313 | } 314 | 315 | void find_main (void) 316 | { 317 | if (!symbol_table.arch1 || 
!string_table) 318 | return; 319 | 320 | int loop = 0; 321 | 322 | loop ++; 323 | while (&(symbol_table.arch1 [loop]) < symbol_table_end.arch1) 324 | { 325 | if (symbol_table.arch1 [loop].st_name && symbol_table.arch1 [loop].st_value) 326 | { 327 | if (!strcmp (string_table + symbol_table.arch1 [loop].st_name, "main")) 328 | { 329 | main_addr = symbol_table.arch1 [loop].st_value; 330 | return; 331 | } 332 | } 333 | loop ++; 334 | } 335 | } 336 | 337 | void find_main64 (void) 338 | { 339 | if (!symbol_table.arch2 || !string_table) 340 | return; 341 | 342 | int loop = 0; 343 | 344 | loop ++; 345 | while (&(symbol_table.arch2 [loop]) < symbol_table_end.arch2) 346 | { 347 | if (symbol_table.arch2 [loop].st_name && symbol_table.arch2 [loop].st_value) 348 | { 349 | if (!strcmp (string_table + symbol_table.arch2 [loop].st_name, "main")) 350 | { 351 | main_addr = symbol_table.arch2 [loop].st_value; 352 | return; 353 | } 354 | } 355 | loop ++; 356 | } 357 | } 358 | 359 | //Handy function for changing a virtual memory address to index for file_buf 360 | int addr_to_index (unsigned int addr) 361 | { 362 | return addr-base_addr; 363 | } 364 | 365 | //Handy function for changing an index for file_buf to a virtual memory address 366 | unsigned int index_to_addr (int index) 367 | { 368 | return index+base_addr; 369 | } 370 | 371 | void get_text (void) 372 | { 373 | text_addr = entry_point; 374 | text_offset = addr_to_index (text_addr); 375 | Elf32_Phdr* segment_table = (Elf32_Phdr*)(file_buf + ((Elf32_Ehdr*)file_buf)->e_phoff); 376 | int i; 377 | 378 | for (i = 0; i < ((Elf32_Ehdr*)file_buf)->e_phnum; i ++) 379 | { 380 | if (segment_table [i].p_vaddr <= text_addr && segment_table [i].p_vaddr + segment_table [i].p_memsz > text_addr) 381 | { 382 | end_of_text = segment_table [i].p_vaddr + segment_table [i].p_memsz; 383 | break; 384 | } 385 | } 386 | 387 | if (i >= ((Elf32_Ehdr*)file_buf)->e_phnum) 388 | { 389 | printf ("ERROR: entry point not in loadable segment\n"); 390 | exit 
(-1); 391 | } 392 | } 393 | 394 | void get_text64 (void) 395 | { 396 | text_addr = entry_point; 397 | text_offset = addr_to_index (text_addr); 398 | Elf64_Phdr* segment_table = (Elf64_Phdr*)(file_buf + ((Elf64_Ehdr*)file_buf)->e_phoff); 399 | int i; 400 | 401 | for (i = 0; i < ((Elf64_Ehdr*)file_buf)->e_phnum; i ++) 402 | { 403 | if (segment_table [i].p_vaddr <= text_addr && segment_table [i].p_vaddr + segment_table [i].p_memsz > text_addr) 404 | { 405 | end_of_text = segment_table [i].p_vaddr + segment_table [i].p_memsz; 406 | break; 407 | } 408 | } 409 | 410 | if (i >= ((Elf64_Ehdr*)file_buf)->e_phnum) 411 | { 412 | printf ("ERROR: entry point not in loadable segment\n"); 413 | exit (-1); 414 | } 415 | } 416 | 417 | 418 | //Initialize some globals that have to deal with the ELF we're reading 419 | //Note: this must be called whether or not you're actually using the parser 420 | void init_elf_parser (char* file_name) 421 | { 422 | init_file_buf (file_name); 423 | 424 | Elf32_Ehdr* header = (Elf32_Ehdr*)file_buf; 425 | Elf64_Ehdr* header64 = (Elf64_Ehdr*)file_buf; 426 | 427 | //Sanity check 428 | header = (Elf32_Ehdr*)file_buf; 429 | if (header->e_ident [EI_MAG0] != 0x7f || header->e_ident [EI_MAG1] != 'E' || header->e_ident [EI_MAG2] != 'L' || header->e_ident [EI_MAG3] != 'F') 430 | { 431 | elf_parser_cleanup (); 432 | printf ("CRITICAL ERROR: Not an ELF file.\n"); 433 | exit (-1); 434 | } 435 | if (header->e_shoff > file_size) 436 | { 437 | elf_parser_cleanup (); 438 | printf ("ERROR: ELF file is corrupt. Invalid section header offset. 
Sections have probably been stripped, please specify a starting address.\n"); 439 | exit (-1); 440 | } 441 | if (header->e_phoff > file_size) 442 | { 443 | elf_parser_cleanup (); 444 | printf ("CRITICAL ERROR: ELF file is corrupt Invalid program header offset.\n"); 445 | exit (-1); 446 | } 447 | 448 | architecture = header->e_ident [EI_CLASS]; 449 | 450 | if (architecture == ELFCLASSNONE) 451 | { 452 | printf ("CRITICAL ERROR: Invalid architecture"); 453 | exit (-1); 454 | } 455 | else if (architecture == ELFCLASS32) 456 | { 457 | Elf32_Phdr* program_headers = (Elf32_Phdr*)(file_buf + header->e_phoff); 458 | int i; 459 | 460 | for (i = 0; i < header->e_phnum; i ++) 461 | { 462 | if (program_headers [i].p_type == PT_LOAD) 463 | { 464 | base_addr = program_headers [i].p_vaddr; 465 | executable_segment_size = program_headers [i].p_filesz; 466 | break; 467 | } 468 | } 469 | if (i == header->e_phnum) 470 | { 471 | printf ("CRITICAL ERROR: No loadable segments\n"); 472 | exit (-1); 473 | } 474 | 475 | symbol_table.arch1 = NULL; 476 | symbol_table_end.arch1 = NULL; 477 | num_relocs = 0; 478 | string_table = NULL; 479 | 480 | get_dyn_syms (); 481 | get_entry_point (); 482 | get_text (); 483 | } 484 | else if (architecture == ELFCLASS64) 485 | { 486 | Elf64_Phdr* program_headers = (Elf64_Phdr*)(file_buf + header64->e_phoff); 487 | int i; 488 | 489 | for (i = 0; i < header64->e_phnum; i ++) 490 | { 491 | if (program_headers [i].p_type == PT_LOAD) 492 | { 493 | base_addr = program_headers [i].p_vaddr; 494 | executable_segment_size = program_headers [i].p_filesz; 495 | break; 496 | } 497 | } 498 | if (i == header64->e_phnum) 499 | { 500 | printf ("CRITICAL ERROR: No loadable segments\n"); 501 | exit (-1); 502 | } 503 | 504 | symbol_table.arch2 = NULL; 505 | symbol_table_end.arch2 = NULL; 506 | num_relocs = 0; 507 | string_table = NULL; 508 | 509 | get_dyn_syms64 (); 510 | get_entry_point64 (); 511 | get_text64 (); 512 | } 513 | else 514 | { 515 | printf ("CRITICAL ERROR: 
Invalid ELF class %d", architecture); 516 | exit (-1); 517 | } 518 | } 519 | 520 | void parse_elf (char* file_name) 521 | { 522 | init_elf_parser (file_name); 523 | 524 | if (architecture == ELFCLASS32) 525 | { 526 | get_num_sections (); 527 | get_section_names (); 528 | parse_sections (); 529 | find_main (); 530 | } 531 | else 532 | { 533 | get_num_sections64 (); 534 | get_section_names64 (); 535 | parse_sections64 (); 536 | find_main64 (); 537 | } 538 | } 539 | 540 | void elf_parser_cleanup (void) 541 | { 542 | if (file_buf) 543 | free (file_buf); 544 | } 545 | -------------------------------------------------------------------------------- /src/elf_parser.h: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #pragma once 5 | 6 | union ElfN_Sym_ptr 7 | { 8 | Elf32_Sym* arch1; 9 | Elf64_Sym* arch2; 10 | } ElfN_Sym_ptr; 11 | 12 | union ElfN_Rel_ptr 13 | { 14 | Elf32_Rel* arch1; 15 | Elf64_Rela* arch2; 16 | } ElfN_Rel_ptr; 17 | 18 | char* file_buf; //Buffer into which the file is read. Must be free'd 19 | size_t file_size; //Size of file in bytes 20 | unsigned int text_offset; //Number of bytes from file beginning where .text starts 21 | unsigned int end_of_text; 22 | unsigned int text_addr; //Virtual memory address .text is loaded into 23 | unsigned int entry_point; //Entry point of executable. 
NOTE: probably don't want to start disassembling here 24 | unsigned int base_addr; 25 | unsigned int executable_segment_size; 26 | int num_sections; 27 | int num_relocs; 28 | union ElfN_Sym_ptr symbol_table; 29 | union ElfN_Sym_ptr symbol_table_end; 30 | union ElfN_Sym_ptr dynamic_symbol_table; 31 | union ElfN_Rel_ptr relocation_table; 32 | int num_dynamic_symbols; 33 | char* string_table; 34 | char* dynamic_string_table; 35 | char* section_string_table; 36 | unsigned int main_addr; 37 | char architecture; 38 | 39 | void parse_elf (char* file_name); 40 | void init_file_buf (char* file_name); 41 | void get_num_sections (void); 42 | void get_entry_point (void); 43 | void get_section_names (void); 44 | void parse_sections (void); 45 | void get_num_sections64 (void); 46 | void get_entry_point64 (void); 47 | void get_section_names64 (void); 48 | void parse_sections64 (void); 49 | void find_main (void); 50 | void find_main64 (void); 51 | void elf_parser_cleanup (void); 52 | Elf32_Sym* find_sym (Elf32_Sym* sym_tab, Elf32_Sym* end, unsigned int addr); 53 | Elf32_Sym* find_reloc_sym (unsigned int addr); 54 | Elf64_Sym* find_sym64 (Elf64_Sym* sym_tab, Elf64_Sym* end, unsigned int addr); 55 | Elf64_Sym* find_reloc_sym64 (unsigned int addr); 56 | int addr_to_index (unsigned int addr); 57 | unsigned int index_to_addr (int index); 58 | void init_elf_parser (char* file_name); 59 | void get_dyn_syms (void); 60 | void get_dyn_syms64 (void); 61 | void get_text (void); 62 | void get_text64 (void); 63 | -------------------------------------------------------------------------------- /src/function.c: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | #include "function.h" 4 | #include "datastructs.h" 5 | 6 | function* init_function (function* to_init, unsigned int start_addr, unsigned int stop_addr) 7 | { 8 | jump_block* root; 9 | jump_block* current; 10 | jump_block* temp; 11 | 12 | to_init->start_addr = start_addr; 13 | 
to_init->stop_addr = stop_addr; 14 | next_flags = 0; 15 | 16 | root = init_jump_block (malloc (sizeof (jump_block)), start_addr, stop_addr); 17 | current = root; 18 | 19 | //Find all jump blocks 20 | while (num_push_ebp != 2) 21 | { 22 | temp = init_jump_block (malloc (sizeof (jump_block)), current->end, stop_addr); 23 | link (current, temp); 24 | current = current->next; 25 | } 26 | num_push_ebp = 0; 27 | 28 | //Get jump addresses 29 | to_init->num_jump_addrs = 0; 30 | to_init->jump_addrs_buf_size = 8 * sizeof (unsigned int); 31 | to_init->jump_addrs = malloc (to_init->jump_addrs_buf_size); 32 | to_init->orig_addrs = malloc (to_init->jump_addrs_buf_size); 33 | list_loop (resolve_jumps, root, root, to_init); 34 | 35 | //Memory management for "}" placement algorithms 36 | to_init->num_dups = 0; 37 | to_init->dup_targets_buf_size = 8 * sizeof (unsigned int); 38 | to_init->dup_targets = malloc (to_init->dup_targets_buf_size); 39 | to_init->else_starts = malloc (to_init->dup_targets_buf_size); 40 | to_init->pivots = malloc (to_init->dup_targets_buf_size); 41 | 42 | //Split jump blocks 43 | struct search_params params; 44 | int i; 45 | 46 | for (i = 0; i < to_init->num_jump_addrs; i ++) 47 | { 48 | current = NULL; 49 | params.ret = (void**)&current; 50 | params.key = to_init->jump_addrs [i]; 51 | list_loop (search_start_addrs, root, root, params); 52 | 53 | if (!current) 54 | { 55 | printf ("Error: invalid jump instruction at %p\n", to_init->orig_addrs [i]); 56 | continue; 57 | } 58 | 59 | split_jump_blocks (current, params.key, stop_addr); 60 | } 61 | 62 | to_init->jump_block_list = root; 63 | 64 | //Free some memory for the time being 65 | list_loop (cleanup_instruction_list, root, root, 1); 66 | 67 | return to_init; 68 | } 69 | 70 | void resolve_jumps (jump_block* to_resolve, function* benefactor) 71 | { 72 | int addr_temp; 73 | if (to_resolve->instructions [to_resolve->num_instructions-1].mnemonic [0] == 'j') 74 | { 75 | benefactor->num_jump_addrs ++; 76 | 77 | if 
(benefactor->num_jump_addrs * sizeof (unsigned int) > benefactor->jump_addrs_buf_size) 78 | { 79 | benefactor->jump_addrs_buf_size *= 2; 80 | benefactor->jump_addrs = realloc (benefactor->jump_addrs, benefactor->jump_addrs_buf_size); 81 | benefactor->orig_addrs = realloc (benefactor->orig_addrs, benefactor->jump_addrs_buf_size); 82 | 83 | } 84 | 85 | benefactor->jump_addrs [benefactor->num_jump_addrs-1] = relative_insn (&(to_resolve->instructions [to_resolve->num_instructions-1]), to_resolve->end); 86 | benefactor->orig_addrs [benefactor->num_jump_addrs-1] = to_resolve->instructions [to_resolve->num_instructions-1].address + to_resolve->start; 87 | } 88 | } 89 | 90 | //Cleanup dynamically allocated memory of a function 91 | void cleanup_function (function* to_cleanup, char scrub_insn) 92 | { 93 | free (to_cleanup->jump_addrs); 94 | free (to_cleanup->orig_addrs); 95 | jump_block_list_cleanup (to_cleanup->jump_block_list, scrub_insn); 96 | } 97 | 98 | //Properly free memory used in a function list 99 | void function_list_cleanup (function* to_cleanup, char scrub_insn) 100 | { 101 | list_cleanup (to_cleanup, cleanup_function, scrub_insn); 102 | } 103 | 104 | //Search function start addresses to look for repetition so we don't add the same function multiple times 105 | void search_func_start_addrs (function* to_test, struct search_params arg) 106 | { 107 | if (to_test->start_addr == arg.key) 108 | *(char*)(arg.ret) = 1; 109 | } 110 | 111 | //Helper function for resolve calls (callback for list_loop) 112 | void resolve_calls_help (jump_block* benefactor, function* parent) 113 | { 114 | struct search_params arg; 115 | int i = 0; 116 | char ret = 0; 117 | arg.ret = (void**)&ret; 118 | function* to_link; 119 | 120 | for (i; i < benefactor->num_calls; i ++) 121 | { 122 | arg.key = benefactor->calls [i]; 123 | list_loop (search_func_start_addrs, parent, parent, arg); 124 | if (!ret && addr_to_index (arg.key) < file_size && !find_reloc_sym (*(int*)&(file_buf [addr_to_index 
(arg.key)+2]))) 125 | { 126 | if (benefactor->calls [i] < text_addr) //Likely a reference to plt, data isn't in this file so don't bother 127 | continue; 128 | if (addr_to_index (benefactor->calls [i]) >= file_size) //Critical error: should not call outside of address space 129 | continue; 130 | to_link = init_function (malloc (sizeof (function)), benefactor->calls [i], parent->stop_addr); 131 | link (parent, to_link); 132 | } 133 | } 134 | } 135 | 136 | //Find every function call in every function and add that function to the list 137 | void resolve_calls (function* benefactor) 138 | { 139 | function* start = benefactor; 140 | 141 | //list_loop is already a macro, can't pass a macro to a macro like you could a function 142 | do 143 | { 144 | list_loop (resolve_calls_help, benefactor->jump_block_list, benefactor->jump_block_list, benefactor); 145 | benefactor = benefactor->next; 146 | } while (benefactor != start && benefactor); 147 | } 148 | 149 | void split_jump_blocks (jump_block* to_split, unsigned int addr, unsigned int stop_addr) 150 | { 151 | if (to_split->start == addr) 152 | return; 153 | jump_block* new_block; 154 | int i = 0; 155 | int j; 156 | unsigned int flags = to_split->flags; 157 | cs_insn* split_instruction; 158 | while (to_split->instructions [i].address + to_split->start != addr && i < to_split->num_instructions) 159 | i ++; 160 | 161 | if (i >= to_split->num_instructions) 162 | { 163 | printf ("Error: jump into instruction at %p\n", addr); 164 | return; 165 | } 166 | 167 | new_block = init_jump_block (malloc (sizeof (jump_block)), to_split->instructions [i].address + to_split->start, stop_addr); 168 | 169 | for (j = i; j < to_split->num_instructions; j ++) 170 | free (to_split->instructions [j].detail); 171 | 172 | to_split->num_instructions = i; 173 | to_split->end = new_block->start; 174 | 175 | new_block->flags = flags & (IS_LOOP | IS_CONTINUE | IS_BREAK | IS_GOTO); 176 | to_split->flags = flags & (IS_ELSE | IS_IF | IS_AFTER_ELSE | 
IS_IF_TARGET | IS_AFTER_LOOP); 177 | 178 | link (to_split, new_block); 179 | } 180 | -------------------------------------------------------------------------------- /src/function.h: -------------------------------------------------------------------------------- 1 | #include "jump_block.h" 2 | 3 | #pragma once 4 | 5 | struct function 6 | { 7 | struct function* next; 8 | jump_block* jump_block_list; 9 | unsigned int* jump_addrs; 10 | unsigned int* orig_addrs; 11 | unsigned int* dup_targets; 12 | unsigned int* else_starts; 13 | unsigned int* pivots; 14 | int num_dups; 15 | size_t dup_targets_buf_size; 16 | size_t jump_addrs_buf_size; 17 | int num_jump_addrs; 18 | unsigned int start_addr; 19 | unsigned int stop_addr; 20 | }; 21 | typedef struct function function; 22 | 23 | function* entry_func; 24 | 25 | struct splice_params //Throwaway parameter structure for splicing together various jump blocks into "to_form" 26 | { 27 | jump_block* to_form; 28 | int* instruction_index; 29 | int* cond_jump_index; 30 | int* calls_index; 31 | }; 32 | 33 | function* init_function (function* to_init, unsigned int start_addr, unsigned int stop_addr); 34 | void split_jump_blocks (jump_block* to_split, unsigned int addr, unsigned int stop_addr); 35 | void resolve_calls_help (jump_block* benefactor, function* parent); 36 | void resolve_calls (function* benefactor); 37 | void cleanup_function (function* to_cleanup, char scrub_insn); 38 | void function_list_cleanup (function* to_cleanup, char scrub_insn); 39 | void search_func_start_addrs (function* to_test, struct search_params arg); 40 | void resolve_jumps (jump_block* to_resolve, function* benefactor); 41 | -------------------------------------------------------------------------------- /src/jump_block.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | #include "jump_block.h" 6 | #include "datastructs.h" 7 | 8 | char num_push_ebp = 0; 9 | 10 | jump_block* 
init_jump_block (jump_block* to_init, unsigned int start_addr, unsigned int stop_addr) 11 | { 12 | to_init->instructions = NULL; 13 | to_init->calls = NULL; 14 | to_init->conditional_jumps = NULL; 15 | to_init->flags = next_flags; 16 | next_flags = 0; 17 | to_init->next = NULL; 18 | 19 | //Locals to cut down on dereference operators; this code was a disaster the first time around with no locals 20 | size_t size = file_size; 21 | int num_instructions = 0; 22 | int num_calls = 0; 23 | int num_conditional_jumps = 0; 24 | unsigned long long relative_address = 0; 25 | unsigned int current_addr = start_addr; 26 | uint8_t* current = file_buf + addr_to_index (current_addr); 27 | unsigned int next_addr; 28 | cs_insn* instruction = cs_malloc (handle); 29 | 30 | to_init->start = start_addr; 31 | 32 | do 33 | { 34 | num_instructions ++; 35 | 36 | //Dynamic memory allocation stuff here 37 | if (num_instructions - 1) 38 | { 39 | if (num_instructions * sizeof (cs_insn) > to_init->instructions_buf_size) 40 | { 41 | to_init->instructions_buf_size *= 2; //Just double the buffer; I'd rather allocate too much than reallocate memory every single iteration 42 | to_init->instructions = (cs_insn*)realloc (to_init->instructions, to_init->instructions_buf_size); 43 | } 44 | } 45 | else 46 | { 47 | to_init->instructions_buf_size = 256 * sizeof (cs_insn); //My memory allocator screams at me for numbers that aren't a multiple of 8 48 | to_init->instructions = (cs_insn*)malloc (to_init->instructions_buf_size); 49 | } 50 | 51 | //Partially disassemble the instruction into machine readable format 52 | cs_disasm_iter (handle, (const uint8_t **)&current, &file_size, (uint64_t*)&relative_address, instruction); 53 | to_init->instructions [num_instructions-1] = *instruction; 54 | to_init->instructions [num_instructions-1].detail = (cs_detail*)malloc (sizeof(cs_detail)); 55 | *(to_init->instructions [num_instructions-1].detail) = *(instruction->detail); 56 | current_addr = index_to_addr ((char*)current -
file_buf); 57 | 58 | //Identify references to conditional jump blocks and function calls for later disassembly. 59 | if (instruction->detail->x86.op_count && instruction->detail->x86.operands [0].type > X86_OP_REG) //Please don't go chasing rax... 60 | { 61 | //Keep track of calls 62 | if (instruction->id == X86_INS_CALL) 63 | { 64 | num_calls ++; 65 | 66 | //More dynamic memory allocation stuff here 67 | if (num_calls - 1) 68 | { 69 | if (num_calls * sizeof (unsigned int) > to_init->calls_buf_size) 70 | { 71 | to_init->calls_buf_size *= 2; 72 | to_init->calls = realloc (to_init->calls, to_init->calls_buf_size); 73 | } 74 | } 75 | else 76 | { 77 | to_init->calls_buf_size = 8 * sizeof (unsigned int); 78 | to_init->calls = malloc (to_init->calls_buf_size); 79 | } 80 | 81 | //Add operand address to call buffer 82 | to_init->calls [num_calls-1] = relative_insn (instruction, current_addr); 83 | } 84 | } 85 | //Keep track of how many times we've seen the instruction "push %ebp". One too many and we've started on the adjacent function. 86 | if ((instruction->id >= X86_INS_PUSH && instruction->id <= X86_INS_PUSHFQ) && (instruction->detail->x86.operands [0].reg == X86_REG_EBP || instruction->detail->x86.operands [0].reg == X86_REG_RBP)) 87 | num_push_ebp ++; 88 | if (current_addr > stop_addr) //If we're outside the text section, we should be done. 
89 | num_push_ebp = 2; 90 | //Stop disassembly of jump block at next unconditional jump or call 91 | } while (instruction->mnemonic [0] != 'j' && num_push_ebp != 2); //Jump block ends on jump or return 92 | 93 | //Synchronize the newly created jump block datastructure fields with locals 94 | to_init->end = current_addr; 95 | to_init->num_conditional_jumps = num_conditional_jumps; 96 | to_init->num_calls = num_calls; 97 | to_init->num_instructions = num_instructions; 98 | if (instruction->id >= X86_INS_JAE && instruction->id <= X86_INS_JS && instruction->id != X86_INS_JMP) 99 | { 100 | if (relative_insn (instruction, current_addr) < current_addr - instruction->size) 101 | { 102 | to_init->flags |= IS_LOOP; 103 | next_flags |= IS_AFTER_LOOP; 104 | } 105 | } 106 | 107 | cs_free (instruction, 1); 108 | 109 | //Print jump block start address; uncomment for debugging information 110 | //printf ("%p\n", to_init->start); 111 | 112 | return to_init; //Convenient to return the to_init param so we can chain function calls like "example (init_jump_block (malloc (sizeof (jump_block)), some_addr, block))" 113 | } 114 | 115 | //Function parsing needs all of the instructions, and translating into C needs all of the instructions, but storing all of the instructions between those two points in time 116 | //takes up an enormous amount of memory. So we need a separate function from the init function to disassemble all of the instructions in a jump block a second time. 
117 | void parse_instructions (jump_block* to_parse) 118 | { 119 | uint8_t* current = file_buf + addr_to_index (to_parse->start); 120 | size_t size = to_parse->end - to_parse->start; 121 | 122 | cs_disasm (handle, current, size, 0x0000, 0, &(to_parse->instructions)); 123 | } 124 | 125 | void cleanup_jump_block (jump_block* to_cleanup, char scrub_insn) 126 | { 127 | if (to_cleanup->num_conditional_jumps) 128 | free (to_cleanup->conditional_jumps); 129 | if (to_cleanup->num_calls) 130 | free (to_cleanup->calls); 131 | 132 | cleanup_instruction_list (to_cleanup, scrub_insn); 133 | } 134 | 135 | void cleanup_instruction_list (jump_block* to_cleanup, char scrub_insn) 136 | { 137 | //Additional cleanup needed for instructions because operands are a dynamically allocated linked list 138 | if (scrub_insn) 139 | cs_free (to_cleanup->instructions, to_cleanup->num_instructions); 140 | 141 | else if (to_cleanup->num_instructions) 142 | free (to_cleanup->instructions); 143 | } 144 | 145 | //Free a list of jump blocks properly 146 | void jump_block_list_cleanup (jump_block* to_cleanup, char scrub_insn) 147 | { 148 | list_cleanup (to_cleanup, cleanup_jump_block, scrub_insn); 149 | } 150 | 151 | //Callback function used for cross-checking a potential jump block start address against existing jump block address ranges 152 | void search_start_addrs (jump_block* to_test, struct search_params arg) 153 | { 154 | if (to_test->start <= arg.key && to_test->end > arg.key) 155 | *arg.ret = to_test; 156 | } 157 | 158 | cs_insn* get_insn_by_addr (jump_block* parent, unsigned int addr) 159 | { 160 | int i; 161 | 162 | for (i = 0; i < parent->num_instructions; i ++) 163 | { 164 | if (parent->instructions [i].address + parent->start == addr) 165 | return &(parent->instructions [i]); 166 | } 167 | return NULL; 168 | } 169 | 170 | //Processes relative instructions 171 | long long relative_insn (cs_insn* insn, unsigned long long address) 172 | { 173 | if (insn->id == X86_INS_LCALL || insn->id == 
X86_INS_LJMP) 174 | return insn->detail->x86.operands [0].imm; 175 | else 176 | return insn->detail->x86.operands [0].imm + address - insn->address - insn->size; 177 | } 178 | -------------------------------------------------------------------------------- /src/jump_block.h: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include "elf_parser.h" 5 | 6 | #pragma once 7 | 8 | #define IS_LOOP 1 9 | #define IS_ELSE 2 10 | #define IS_IF 4 11 | #define IS_IF_TARGET 8 12 | #define IS_AFTER_ELSE 16 13 | #define IS_AFTER_LOOP 32 14 | #define IS_CONTINUE 64 15 | #define IS_BREAK 128 16 | #define IS_GOTO 256 17 | #define IS_WHILE 512 18 | #define NO_TRANSLATE 1024 19 | 20 | csh handle; 21 | unsigned int next_flags; 22 | extern char num_push_ebp; 23 | 24 | struct jump_block 25 | { 26 | unsigned int flags; 27 | unsigned long long start; 28 | unsigned long long end; 29 | cs_insn* instructions; //Array of instructions contained in jump block in human readable format 30 | int num_instructions; 31 | size_t instructions_buf_size; 32 | unsigned int* conditional_jumps; //Target addresses of all conditional jumps in block 33 | int num_conditional_jumps; 34 | size_t conditional_jumps_buf_size; 35 | unsigned int* calls; //Target addresses of all additional calls in block 36 | int num_calls; 37 | size_t calls_buf_size; 38 | struct jump_block* next; 39 | }; 40 | typedef struct jump_block jump_block; 41 | 42 | struct search_params //Throwaway parameter structure for searching through start addresses for a start address "key" 43 | { 44 | void** ret; 45 | unsigned int key; 46 | }; 47 | 48 | jump_block* init_jump_block (jump_block* to_init, unsigned int start_addr, unsigned int stop_addr); 49 | void cleanup_jump_block (jump_block* to_cleanup, char scrub_insn); 50 | void jump_block_list_cleanup (jump_block* to_cleanup, char scrub_insn); 51 | void search_start_addrs (jump_block* to_test, struct search_params arg); 52 | void 
resolve_conditional_jumps (jump_block* benefactor); 53 | void print_jump_block (jump_block* to_print); 54 | void print_jump_block_list (jump_block* to_print); 55 | cs_insn* get_insn_by_addr (jump_block* parent, unsigned int addr); 56 | long long relative_insn (cs_insn* insn, unsigned long long address); 57 | void cleanup_instruction_list (jump_block* to_cleanup, char scrub_insn); 58 | void parse_instructions (jump_block* to_parse); 59 | -------------------------------------------------------------------------------- /src/lang_gen.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | 6 | #include "lang_gen.h" 7 | #include "datastructs.h" 8 | 9 | char test_conditions [14] [3] = {"<\0\0", ">=\0", "!=\0", "==\0", "<=\0", ">\0\0", "<\0\0", ">\0\0", "\0\0\0", "\0\0\0", "<\0\0", ">=\0", "<=\0", ">\0\0"}; 10 | 11 | void disassemble_insn (cs_insn instruction, jump_block* parent) 12 | { 13 | unsigned int target_addr; 14 | 15 | printf ("%p:", instruction.address + parent->start); 16 | 17 | if ((instruction.id >= X86_INS_JAE && instruction.id <= X86_INS_JS) || instruction.id == X86_INS_CALL) 18 | { 19 | target_addr = relative_insn (&instruction, instruction.address + instruction.size + parent->start); 20 | printf ("\t%s\t%p\n", instruction.mnemonic, target_addr); 21 | } 22 | else 23 | printf ("\t%s\t%s\n", instruction.mnemonic, instruction.op_str); 24 | } 25 | 26 | void decompile_insn (cs_insn instruction, cs_insn next_instruction, jump_block* parent) 27 | { 28 | char* line = malloc (128); 29 | Elf32_Sym* name_sym = NULL; 30 | Elf64_Sym* name_sym64 = NULL; 31 | char* name_string = NULL; 32 | int len; 33 | int line_length = 128; 34 | var* temp = NULL; 35 | var* temp2 = NULL; 36 | bzero (line, 128); 37 | int actual_translation_size = strlen (translation); 38 | int i; 39 | int target_addr; 40 | char is_recognized = 1; 41 | char* plt_entry; 42 | 43 | if (!instruction.detail->x86.op_count || 
instruction.detail->x86.operands [0].type != X86_OP_REG || (instruction.detail->x86.operands [0].reg != X86_REG_RBP && instruction.detail->x86.operands [0].reg != X86_REG_RSP && instruction.detail->x86.operands [0].reg != X86_REG_EBP && instruction.detail->x86.operands [0].reg != X86_REG_ESP)) 44 | { 45 | switch (instruction.id) 46 | { 47 | //We cut out anything involving EBP or ESP since these are not general purpose registers and would not be part of original program arithmetic 48 | //insn_mov through insn_xor are just basic arithmetic operators 49 | case X86_INS_MOVD: 50 | case X86_INS_MOVQ: 51 | case X86_INS_MOVDQ2Q: 52 | case X86_INS_MOVQ2DQ: 53 | case X86_INS_MOV: 54 | case X86_INS_LEA: 55 | temp = add_var (instruction.detail->x86.operands [0]); 56 | temp2 = add_var (instruction.detail->x86.operands [1]); 57 | if (instruction.id != X86_INS_LEA) 58 | { 59 | if (constant_format [1] == 'd') 60 | { 61 | if (temp->type != DEREF && temp2->type != DEREF) 62 | sprintf (line, "%s = %s;\n", temp->name, temp2->name); 63 | else if (temp->type == DEREF && temp2->type != DEREF) 64 | sprintf (line, "*(%s*)(%s+(%d)) = %s;\n", temp->c_type, temp->name, temp->loc.disp, temp2->name); 65 | else if (temp->type != DEREF && temp2->type == DEREF) 66 | sprintf (line, "%s = *(%s*)(%s+(%d));\n", temp->name, temp2->c_type, temp2->name, temp2->loc.disp); 67 | else 68 | sprintf (line, "*(%s*)(%s+(%d)) = *(%s*)(%s+(%d))", temp->c_type, temp->name, temp->loc.disp, temp2->c_type, temp2->name, temp2->loc.disp); 69 | } 70 | else 71 | { 72 | if (temp->type != DEREF && temp2->type != DEREF) 73 | sprintf (line, "%s = %s;\n", temp->name, temp2->name); 74 | else if (temp->type == DEREF && temp2->type != DEREF) 75 | sprintf (line, "*(%s*)(%s+(%p)) = %s;\n", temp->c_type, temp->name, temp->loc.disp, temp2->name); 76 | else if (temp->type != DEREF && temp2->type == DEREF) 77 | sprintf (line, "%s = *(%s*)(%s+(%p));\n", temp->name, temp2->c_type, temp2->name, temp2->loc.disp); 78 | else 79 | sprintf 
(line, "*(%s*)(%s+(%p)) = *(%s*)(%s+(%p))", temp->c_type, temp->name, temp->loc.disp, temp2->c_type, temp2->name, temp2->loc.disp); 80 | } 81 | } 82 | else 83 | sprintf (line, "%s = (%s)&%s;\n", temp->name, temp2->c_type, temp2->name); 84 | break; 85 | case X86_INS_SUB: 86 | case X86_INS_SUBPD: 87 | case X86_INS_SUBPS: 88 | case X86_INS_SUBSS: 89 | temp = add_var (instruction.detail->x86.operands [0]); 90 | temp2 = add_var (instruction.detail->x86.operands [1]); 91 | sprintf (line, "%s -= %s;\n", temp->name, temp2->name); 92 | break; 93 | case X86_INS_ADD: 94 | case X86_INS_ADDPD: 95 | case X86_INS_ADDPS: 96 | case X86_INS_ADDSD: 97 | case X86_INS_ADDSS: 98 | temp = add_var (instruction.detail->x86.operands [0]); 99 | temp2 = add_var (instruction.detail->x86.operands [1]); 100 | sprintf (line, "%s += %s;\n", temp->name, temp2->name); 101 | break; 102 | case X86_INS_IMUL: 103 | case X86_INS_MUL: 104 | temp = add_var (instruction.detail->x86.operands [0]); 105 | temp2 = add_var (instruction.detail->x86.operands [1]); 106 | sprintf (line, "%s *= %s;\n", temp->name, temp2->name); 107 | break; 108 | case X86_INS_DIV: 109 | case X86_INS_IDIV: 110 | temp = add_var (instruction.detail->x86.operands [0]); 111 | temp2 = add_var (instruction.detail->x86.operands [1]); 112 | sprintf (line, "%s /= %s;\n", temp->name, temp2->name); 113 | break; 114 | case X86_INS_AND: 115 | case X86_INS_ANDN: 116 | case X86_INS_ANDNPD: 117 | case X86_INS_ANDNPS: 118 | case X86_INS_ANDPD: 119 | case X86_INS_ANDPS: 120 | temp = add_var (instruction.detail->x86.operands [0]); 121 | temp2 = add_var (instruction.detail->x86.operands [1]); 122 | sprintf (line, "%s &= %s;\n", temp->name, temp2->name); 123 | break; 124 | case X86_INS_ORPD: 125 | case X86_INS_ORPS: 126 | temp = add_var (instruction.detail->x86.operands [0]); 127 | temp2 = add_var (instruction.detail->x86.operands [1]); 128 | sprintf (line, "%s |= %s;\n", temp->name, temp2->name); 129 | break; 130 | case X86_INS_SHR: 131 | case 
X86_INS_SHRD: 132 | case X86_INS_SHRX: 133 | temp = add_var (instruction.detail->x86.operands [0]); 134 | temp2 = add_var (instruction.detail->x86.operands [1]); 135 | sprintf (line, "%s = %s >> %s;\n", temp->name, temp->name, temp2->name); 136 | break; 137 | case X86_INS_SHL: 138 | case X86_INS_SHLD: 139 | case X86_INS_SHLX: 140 | temp = add_var (instruction.detail->x86.operands [0]); 141 | temp2 = add_var (instruction.detail->x86.operands [1]); 142 | sprintf (line, "%s = %s << %s;\n", temp->name, temp->name, temp2->name); 143 | break; 144 | case X86_INS_XORPD: 145 | case X86_INS_XORPS: 146 | temp = add_var (instruction.detail->x86.operands [0]); 147 | temp2 = add_var (instruction.detail->x86.operands [1]); 148 | sprintf (line, "%s ^= %s;\n", temp->name, temp2->name); 149 | break; 150 | case X86_INS_NOT: 151 | temp = add_var (instruction.detail->x86.operands [0]); 152 | sprintf (line, "%s = ~%s;\n", temp->name, temp->name); 153 | break; 154 | case X86_INS_DEC: 155 | temp = add_var (instruction.detail->x86.operands [0]); 156 | sprintf (line, "%s --;\n", temp->name); 157 | break; 158 | case X86_INS_INC: 159 | temp = add_var (instruction.detail->x86.operands [0]); 160 | sprintf (line, "%s ++;\n", temp->name); 161 | break; 162 | case X86_INS_POP: 163 | case X86_INS_POPAW: 164 | case X86_INS_POPAL: 165 | case X86_INS_POPCNT: 166 | case X86_INS_POPF: 167 | case X86_INS_POPFD: 168 | case X86_INS_POPFQ: 169 | break; 170 | case X86_INS_RET: 171 | sprintf (line, "return eax;\n"); //All functions return EAX 172 | break; 173 | case X86_INS_LEAVE: 174 | break; 175 | case X86_INS_PUSH: //pushing onto the stack is how the caller passes arguments to the callee 176 | case X86_INS_PUSHAW: 177 | case X86_INS_PUSHAL: 178 | case X86_INS_PUSHF: 179 | case X86_INS_PUSHFD: 180 | case X86_INS_PUSHFQ: 181 | temp = add_var (instruction.detail->x86.operands [0]); 182 | if (temp->type != REG || (strcmp (temp->name, "ebp") && strcmp (temp->name, "esp") && strcmp (temp->name, "ecx"))) 183 | 
{ 184 | //Add the variable to the argument array (caller_param). Cannot use argument linked list because the variables used are already linked into the local 185 | //variable linked list. 186 | if (!caller_param) 187 | { 188 | caller_params_size = 8*sizeof (var); 189 | caller_param = malloc (caller_params_size); 190 | } 191 | num_caller_params ++; 192 | if (num_caller_params * sizeof (var) > caller_params_size) 193 | { 194 | caller_params_size *= 2; 195 | caller_param = realloc (caller_param, caller_params_size); 196 | } 197 | caller_param [num_caller_params-1] = *temp; 198 | 199 | } 200 | break; 201 | case X86_INS_CALL: 202 | target_addr = relative_insn (&instruction, instruction.address + instruction.size + parent->start); 203 | if (addr_to_index (target_addr) < file_size) 204 | { 205 | if (architecture == ELFCLASS32) 206 | name_sym = find_reloc_sym (*(int*)&(file_buf [addr_to_index (target_addr)+2])); 207 | else 208 | { 209 | plt_entry = &(file_buf [addr_to_index (target_addr)+2]); 210 | name_sym64 = find_reloc_sym64 ((*(short*)(plt_entry) | ((int)*(short*)(plt_entry+2) << 16)) + target_addr + 6); 211 | } 212 | if (name_sym || name_sym64) 213 | { 214 | if (architecture == ELFCLASS32) 215 | name_string = dynamic_string_table + name_sym->st_name; 216 | else 217 | name_string = dynamic_string_table + name_sym64->st_name; 218 | } 219 | else 220 | { 221 | if (architecture == ELFCLASS32) 222 | name_sym = find_sym (symbol_table.arch1, symbol_table_end.arch1, target_addr); 223 | else 224 | name_sym64 = find_sym64 (symbol_table.arch2, symbol_table_end.arch2, target_addr); 225 | 226 | if (name_sym) 227 | name_string = string_table + name_sym->st_name; 228 | else if (name_sym64) 229 | name_string = string_table + name_sym64->st_name; 230 | } 231 | } 232 | if (name_string) 233 | { 234 | len = strlen (name_string); 235 | if (len + 12 + num_caller_params*22 > line_length) 236 | { 237 | line_length = len + 12 + num_caller_params*22; 238 | line = realloc (line, line_length); 
239 | } 240 | sprintf (line, "eax = %s (", name_string); 241 | } 242 | else 243 | sprintf (line, "eax = func_%p (", target_addr); 244 | 245 | //Print the argument list 246 | if (caller_param) 247 | { 248 | sprintf (&(line [strlen (line)]), "%s", caller_param [num_caller_params-1].name); 249 | for (i = num_caller_params-2; i >= 0; i --) 250 | { 251 | if (caller_param [i].type == DEREF) 252 | sprintf (&(line [strlen (line)]), ", *(%s*)(%s+(%d))", caller_param [i].c_type, caller_param [i].name, caller_param [i].loc.disp); 253 | else 254 | sprintf (&(line [strlen (line)]), ", %s", caller_param [i].name); 255 | } 256 | } 257 | 258 | sprintf (&(line [strlen (line)]), ");\n"); 259 | free (caller_param); 260 | caller_param = NULL; 261 | num_caller_params = 0; 262 | caller_params_size = 0; 263 | break; 264 | case X86_INS_TEST: //the test instruction is normally found in the context test %eax,%eax. This compares EAX to 0. 265 | //Instruction after a compare or a test is usually a conditional jump 266 | target_addr = relative_insn (&next_instruction, next_instruction.address + next_instruction.size + parent->start); 267 | temp = add_var (instruction.detail->x86.operands [0]); 268 | if (language_flag == 'f') 269 | { 270 | if (parent->flags & IS_WHILE) 271 | sprintf (next_line, "while (%s %s 0)\n", temp->name, test_conditions [next_instruction.bytes [0] - 0x72]); 272 | else if (target_addr > next_instruction.address + parent->start) 273 | sprintf (next_line, "if (%s %s 0)\n", temp->name, test_conditions [next_instruction.bytes [0] - 0x72]); //The conditional jumps start with "jump if below," which has an opcode of 0x72 274 | else 275 | { 276 | sprintf (next_line, "} while (%s %s 0);\n", temp->name, test_conditions [next_instruction.bytes [0] - 0x72]); 277 | num_tabs -= 2; 278 | } 279 | num_tabs ++; 280 | } 281 | else 282 | sprintf (line, "%s %s, %s\n", instruction.mnemonic, temp->name, temp->name); 283 | 284 | break; 285 | case X86_INS_CMP: //the compare instructions just 
"compares" its two operands 286 | case X86_INS_CMPPD: 287 | case X86_INS_CMPPS: 288 | case X86_INS_CMPSB: 289 | case X86_INS_CMPSD: 290 | case X86_INS_CMPSQ: 291 | case X86_INS_CMPSS: 292 | case X86_INS_CMPSW: 293 | temp = add_var (instruction.detail->x86.operands [0]); 294 | temp2 = add_var (instruction.detail->x86.operands [1]); 295 | if (language_flag == 'f') 296 | { 297 | target_addr = relative_insn (&next_instruction, next_instruction.address + next_instruction.size + parent->start); 298 | if (parent->flags & IS_WHILE) 299 | sprintf (next_line, "while (%s %s %s)\n", temp->name, test_conditions [next_instruction.bytes [0] - 0x72], temp2->name); 300 | else if (target_addr > next_instruction.address + parent->start) 301 | sprintf (next_line, "if (%s %s %s)\n", temp->name, test_conditions [next_instruction.bytes [0] - 0x72], temp2->name); 302 | else 303 | { 304 | sprintf (next_line, "} while (%s %s %s);\n", temp->name, test_conditions [next_instruction.bytes [0] - 0x72], temp2->name); 305 | num_tabs -= 2; 306 | } 307 | num_tabs ++; 308 | } 309 | else 310 | sprintf (line, "%s %s, %s\n", instruction.mnemonic, temp->name, temp2->name); 311 | 312 | break; 313 | case X86_INS_JAE: 314 | case X86_INS_JA: 315 | case X86_INS_JBE: 316 | case X86_INS_JB: 317 | case X86_INS_JCXZ: 318 | case X86_INS_JECXZ: 319 | case X86_INS_JE: 320 | case X86_INS_JGE: 321 | case X86_INS_JG: 322 | case X86_INS_JLE: 323 | case X86_INS_JL: 324 | case X86_INS_JNE: 325 | case X86_INS_JNO: 326 | case X86_INS_JNP: 327 | case X86_INS_JNS: 328 | case X86_INS_JO: 329 | case X86_INS_JP: 330 | case X86_INS_JRCXZ: 331 | case X86_INS_JS: 332 | if (language_flag == 'f') 333 | { 334 | sprintf (line, next_line); 335 | bzero (next_line, 128); 336 | } 337 | else 338 | { 339 | target_addr = relative_insn (&instruction, instruction.address + instruction.size + parent->start); 340 | sprintf (line, "%s %p\n", instruction.mnemonic, target_addr); 341 | } 342 | 343 | break; 344 | case X86_INS_JMP: 345 | if 
(language_flag != 'f') 346 | { 347 | target_addr = relative_insn (&instruction, instruction.address + instruction.size + parent->start); 348 | sprintf (line, "%s %p\n", instruction.mnemonic, target_addr); 349 | } 350 | break; 351 | case X86_INS_NOP: 352 | break; 353 | default: 354 | sprintf (line, "\t%s\t%s", instruction.mnemonic, instruction.op_str); 355 | line [strlen (line)] = '\n'; 356 | is_recognized = 0; 357 | break; 358 | } 359 | } 360 | else if (instruction.detail->x86.operands [0].type == X86_OP_REG && (instruction.detail->x86.operands [0].reg == X86_REG_EBP || instruction.detail->x86.operands [0].reg == X86_REG_ESP)) 361 | is_recognized = 0; 362 | 363 | //Add the translated line to the final translation 364 | if (actual_translation_size + strlen (line) + num_tabs > translation_size) 365 | { 366 | translation_size = 2*(translation_size + strlen (line) + num_tabs); 367 | translation = realloc (translation, translation_size); 368 | } 369 | if (language_flag == 'f' && is_recognized) 370 | { 371 | if (line [0] && ((instruction.id < X86_INS_JAE || instruction.id > X86_INS_JS || instruction.id == X86_INS_JMP) || line [0] == '}')) 372 | { 373 | for (i = 0; i < num_tabs; i ++) 374 | { 375 | sprintf (&(translation [actual_translation_size]), "\t"); 376 | actual_translation_size ++; 377 | } 378 | } 379 | else if ((instruction.id >= X86_INS_JAE && instruction.id <= X86_INS_JS && instruction.id != X86_INS_JMP) && line [0] != '}') 380 | { 381 | for (i = 0; i < num_tabs-1; i ++) 382 | { 383 | sprintf (&(translation [actual_translation_size]), "\t"); 384 | actual_translation_size ++; 385 | } 386 | } 387 | } 388 | else if (is_recognized && (!temp || (strcmp (temp->name, "ebp") && strcmp (temp->name, "esp"))) && strncmp (instruction.mnemonic, "push", 4) && strncmp (instruction.mnemonic, "pop", 3) && instruction.id != X86_INS_LEAVE && instruction.id != X86_INS_NOP) 389 | { 390 | sprintf (&(translation [actual_translation_size]), "\t"); 391 | actual_translation_size ++; 392 
| } 393 | strcpy (&(translation [actual_translation_size]), line); 394 | if (language_flag == 'f' && is_recognized) 395 | { 396 | if ((instruction.id >= X86_INS_JAE && instruction.id <= X86_INS_JS && instruction.id != X86_INS_JMP) && line [0] != '}') 397 | { 398 | actual_translation_size = strlen (translation); 399 | for (i = 0; i < num_tabs - 1; i ++) 400 | { 401 | sprintf (&(translation [actual_translation_size]), "\t"); 402 | actual_translation_size ++; 403 | } 404 | sprintf (&(translation [actual_translation_size]), "{\n"); 405 | } 406 | } 407 | 408 | actual_translation_size += strlen (line); 409 | translation [actual_translation_size] = '\0'; 410 | 411 | free (line); 412 | } 413 | 414 | void disassemble_jump_block (jump_block* to_translate) 415 | { 416 | int i; 417 | 418 | //Translate every instruction contained in jump block 419 | for (i = 0; i < to_translate->num_instructions; i ++) 420 | disassemble_insn (to_translate->instructions [i], to_translate); 421 | } 422 | 423 | void partial_decompile_jump_block (jump_block* to_translate, function* parent) 424 | { 425 | int i; 426 | int j; 427 | int len; 428 | 429 | //Translate every instruction contained in jump block 430 | for (i = 0; i < to_translate->num_instructions; i ++) 431 | decompile_insn (to_translate->instructions [i], to_translate->instructions [i+1], to_translate); 432 | 433 | for (i = 0; i < parent->num_jump_addrs; i ++) 434 | { 435 | if (to_translate->next->start == parent->jump_addrs [i]) 436 | { 437 | len = strlen (translation); 438 | if (len + 12 > translation_size) 439 | { 440 | translation_size = 2*(len + 12); 441 | translation = realloc (translation, translation_size); 442 | } 443 | sprintf (&(translation [len]), "%p:\n", to_translate->next->start); 444 | 445 | for (j = 0; j < parent->num_jump_addrs; j ++) 446 | { 447 | if (to_translate->next->start == parent->jump_addrs [j]) 448 | parent->jump_addrs [j] = 0; 449 | } 450 | } 451 | } 452 | } 453 | 454 | //Final translation of each jump block, 
among other things. 455 | //Also contains some routines that need to be performed per jump block. 456 | //These routines are second iteration routines (see jump_block_preprocessing header) 457 | void decompile_jump_block (jump_block* to_translate, function* parent) 458 | { 459 | int i; 460 | int j; 461 | int len; 462 | unsigned int target = 0; 463 | unsigned int target2 = 0; 464 | cs_insn* last_instruction = &(to_translate->instructions [to_translate->num_instructions-1]); 465 | 466 | if (last_instruction->id == X86_INS_JMP) //Get unconditional jump target address, if possible 467 | { 468 | target = last_instruction->size + last_instruction->address + to_translate->start; 469 | target = relative_insn (last_instruction, target); 470 | } 471 | if (last_instruction->id >= X86_INS_JAE && last_instruction->id <= X86_INS_JS && last_instruction->id != X86_INS_JMP) //Get conditional jump target address, if possible 472 | { 473 | target2 = last_instruction->size + last_instruction->address + to_translate->start; 474 | target2 = relative_insn (last_instruction, target2); 475 | } 476 | unsigned int orig = last_instruction->address + to_translate->start; //Get address of last instruction in block 477 | 478 | //Need to print out a "}" at the end of a non-nested IF/ELSE statement, so we need to override the "do not place } at an unconditional jump address" rule. 479 | if (to_translate->next && target && to_translate->next->flags & IS_ELSE) 480 | file_buf [last_instruction->address + addr_to_index (to_translate->start)] = 0xea; 481 | 482 | //Sets up pivot addresses for "}" placement algorithm. 
483 | //A pivot address is the address of the conditional jump of the IF statement associated with the ELSE right before the duplicated target 484 | if (to_translate->next && to_translate->next->flags & IS_IF) 485 | { 486 | for (i = 0; i < parent->num_dups; i ++) 487 | { 488 | if (target2 && target2 == parent->else_starts [i]) 489 | { 490 | parent->pivots [i] = orig; 491 | break; 492 | } 493 | } 494 | } 495 | 496 | //"}" placement algorithm 497 | if (target || target2) 498 | { 499 | for (i = 0; i < parent->num_dups; i ++) 500 | { 501 | if (target2 && target2 == parent->dup_targets [i]) 502 | { 503 | //Any jump that comes after the associated if statement, before the else statement, and isn't an unconditional jump immediately before the else 504 | //block should be redirected to the start of the else block in order to get the rest of decompile_jump_block to place the "}" correctly 505 | if (orig >= parent->pivots [i] && orig < parent->else_starts [i] && parent->pivots [i] && to_translate->end != parent->else_starts [i]) 506 | { 507 | if (to_translate->next->end != parent->dup_targets [i]) 508 | { 509 | for (j = 0; j < parent->num_jump_addrs; j++) 510 | { 511 | if (parent->jump_addrs [j] == parent->dup_targets [i]) 512 | { 513 | parent->jump_addrs [j] = parent->else_starts [i]; 514 | break; 515 | } 516 | } 517 | } 518 | } 519 | break; 520 | } 521 | else if (target && target == parent->dup_targets [i]) 522 | { 523 | if (orig >= parent->pivots [i] && orig < parent->else_starts [i] && parent->pivots [i] && to_translate->end != parent->else_starts [i]) 524 | { 525 | if (to_translate->next->end != parent->dup_targets [i]) 526 | { 527 | for (j = 0; j < parent->num_jump_addrs; j++) 528 | { 529 | if (parent->jump_addrs [j] == parent->dup_targets [i]) 530 | { 531 | parent->jump_addrs [j] = parent->else_starts [i]; 532 | break; 533 | } 534 | } 535 | } 536 | } 537 | break; 538 | } 539 | } 540 | } 541 | 542 | //From here on out, code in this function will ACTUALLY append the translation to
the end of the "translation" string. 543 | for (i = 0; i < parent->num_jump_addrs; i ++) 544 | { 545 | if (to_translate->start == parent->jump_addrs [i]) 546 | { 547 | if (parent->orig_addrs [i]) 548 | { 549 | if ((unsigned char)(file_buf [addr_to_index (parent->orig_addrs [i])]) != 0xeb) //0xeb is the unconditional jump opcode. Don't put a "}" or a "do" inside of a while loop TODO: this is dumb, find a different way to check jumps 550 | { 551 | len = strlen (translation); 552 | if (to_translate->start > parent->orig_addrs [i]) 553 | { 554 | if (len + 2 + num_tabs > translation_size) 555 | { 556 | translation_size = 2*(len + 2 + num_tabs); 557 | translation = realloc (translation, translation_size); 558 | } 559 | num_tabs --; 560 | for (j = 0; j < num_tabs; j ++) 561 | { 562 | sprintf (&(translation [len]), "\t"); 563 | len ++; 564 | } 565 | sprintf (&(translation [len]), "}\n"); 566 | } 567 | } 568 | } 569 | } 570 | } 571 | 572 | for (i = 0; i < parent->num_jump_addrs; i ++) 573 | { 574 | if (to_translate->start == parent->jump_addrs [i]) 575 | { 576 | if (parent->orig_addrs [i]) 577 | { 578 | if ((unsigned char)(file_buf [addr_to_index (parent->orig_addrs [i])]) != 0xeb) //0xeb is the unconditional jump opcode 579 | { 580 | len = strlen (translation); 581 | if (to_translate->start < parent->orig_addrs [i]) 582 | { 583 | if (len + 6 + 2*num_tabs > translation_size) 584 | { 585 | translation_size = 2*(len + 6 + 2*num_tabs); 586 | translation = realloc (translation, translation_size); 587 | } 588 | for (j = 0; j < num_tabs; j ++) 589 | { 590 | sprintf (&(translation [len]), "\t"); 591 | len ++; 592 | } 593 | sprintf (&(translation [len]), "do\n"); 594 | len += 3; 595 | for (j = 0; j < num_tabs; j ++) 596 | { 597 | sprintf (&(translation [len]), "\t"); 598 | len ++; 599 | } 600 | sprintf (&(translation [len]), "{\n"); 601 | num_tabs ++; 602 | } 603 | } 604 | } 605 | } 606 | } 607 | 608 | if (to_translate->flags & IS_ELSE) 609 | { 610 | len = strlen (translation); 611 
| if (len + 7 + 2*num_tabs > translation_size) 612 | { 613 | translation_size = 2*(len + 7 + 2*num_tabs); 614 | translation = realloc (translation, translation_size); 615 | } 616 | for (i = 0; i < num_tabs; i ++) 617 | { 618 | sprintf (&(translation [len]), "\t"); 619 | len ++; 620 | } 621 | sprintf (&(translation [len]), "else\n"); 622 | len += 5; 623 | for (i = 0; i < num_tabs; i ++) 624 | { 625 | sprintf (&(translation [len]), "\t"); 626 | len ++; 627 | } 628 | sprintf (&(translation [len]), "{\n"); 629 | num_tabs ++; 630 | } 631 | 632 | if (to_translate->flags & NO_TRANSLATE) 633 | return; 634 | 635 | //Translate every instruction contained in jump block 636 | for (i = 0; i < to_translate->num_instructions; i ++) 637 | decompile_insn (to_translate->instructions [i], to_translate->instructions [i+1], to_translate); 638 | 639 | if (to_translate->flags & IS_BREAK) 640 | { 641 | len = strlen (translation); 642 | if (len + 7 + num_tabs > translation_size) 643 | { 644 | translation_size = 2*(len + 7 + num_tabs); 645 | translation = realloc (translation, translation_size); 646 | } 647 | for (i = 0; i < num_tabs; i ++) 648 | { 649 | sprintf (&(translation [len]), "\t"); 650 | len ++; 651 | } 652 | sprintf (&(translation [len]), "break;\n"); 653 | } 654 | else if (to_translate->flags & IS_CONTINUE) 655 | { 656 | len = strlen (translation); 657 | if (len + 10 + num_tabs > translation_size) 658 | { 659 | translation_size = 2*(len + 10 + num_tabs); 660 | translation = realloc (translation, translation_size); 661 | } 662 | for (i = 0; i < num_tabs; i ++) 663 | { 664 | sprintf (&(translation [len]), "\t"); 665 | len ++; 666 | } 667 | sprintf (&(translation [len]), "continue;\n"); 668 | } 669 | else if (to_translate->flags & IS_GOTO) 670 | { 671 | len = strlen (translation); 672 | if (len + 18 + num_tabs > translation_size) //18 matches the growth below and leaves room for the terminating NUL 673 | { 674 | translation_size = 2*(len + 18 + num_tabs); 675 | translation = realloc (translation, translation_size); 676 | } 677 | for (i = 0; i <
num_tabs; i ++) 678 | { 679 | sprintf (&(translation [len]), "\t"); 680 | len ++; 681 | } 682 | sprintf (&(translation [len]), "goto %p;\n", target); 683 | } 684 | 685 | for (i = 0; i < parent->num_jump_addrs; i ++) 686 | { 687 | if (to_translate->next->start == parent->jump_addrs [i] && !(parent->orig_addrs [i])) 688 | { 689 | len = strlen (translation); 690 | if (len + 12 > translation_size) 691 | { 692 | translation_size = 2*(len + 12); 693 | translation = realloc (translation, translation_size); 694 | } 695 | sprintf (&(translation [len]), "%p:\n", to_translate->next->start); 696 | } 697 | } 698 | } 699 | 700 | //Jump block "preprocessing" 701 | //Some per jump block routines require other routines to have been performed on EVERY jump block before they can be called. 702 | //So we put first iteration routines in this function 703 | void jump_block_preprocessing (jump_block* to_process, function* parent) 704 | { 705 | int i = 0; 706 | int j = 0; 707 | unsigned int target = 0; 708 | unsigned int target2 = 0; 709 | cs_insn* last_instruction = &(to_process->instructions [to_process->num_instructions-1]); 710 | 711 | if (last_instruction->id == X86_INS_JMP) 712 | { 713 | target = last_instruction->size + last_instruction->address + to_process->start; 714 | target = relative_insn (last_instruction, target); 715 | } 716 | if (last_instruction->id >= X86_INS_JAE && last_instruction->id <= X86_INS_JS && last_instruction->id != X86_INS_JMP) 717 | { 718 | target2 = last_instruction->size + last_instruction->address + to_process->start; 719 | target2 = relative_insn (last_instruction, target2); 720 | } 721 | unsigned int orig = last_instruction->address + to_process->start; 722 | 723 | if (target2) 724 | to_process->next->flags |= IS_IF; 725 | 726 | if (to_process->flags & IS_LOOP) 727 | to_process->next->flags |= IS_AFTER_LOOP; 728 | 729 | if (target || (last_instruction->id >= X86_INS_JAE && last_instruction->id <= X86_INS_JS && last_instruction->id != X86_INS_JMP)) 730 
| { 731 | for (i = 0; i < parent->num_jump_addrs; i++) 732 | { 733 | if (parent->jump_addrs [i] == target && parent->orig_addrs [i] != orig) //This instruction has the same target address as another, therefore we have a duplicate jump target 734 | { 735 | for (j = 0; j < parent->num_dups; j ++) 736 | { 737 | if (parent->dup_targets [j] == target) 738 | break; 739 | } 740 | if (j == parent->num_dups) 741 | { 742 | parent->num_dups ++; 743 | 744 | //Memory management for "}" placement algorithms 745 | if (parent->num_dups * sizeof (unsigned int) >= parent->dup_targets_buf_size) 746 | { 747 | parent->dup_targets_buf_size *= 2; 748 | parent->dup_targets = realloc (parent->dup_targets, parent->dup_targets_buf_size); 749 | parent->else_starts = realloc (parent->else_starts, parent->dup_targets_buf_size); 750 | parent->pivots = realloc (parent->pivots, parent->dup_targets_buf_size); 751 | } 752 | 753 | //Document memory addresses that are the targets of multiple jump instructions 754 | parent->dup_targets [parent->num_dups-1] = target; 755 | parent->else_starts [parent->num_dups-1] = 0; 756 | parent->pivots [parent->num_dups-1] = 0; 757 | } 758 | } 759 | } 760 | } 761 | 762 | if (target2 && target2 > orig) 763 | { 764 | struct search_params params; 765 | jump_block* if_target; 766 | params.key = target2; 767 | params.ret = (void**)&if_target; 768 | list_loop (search_start_addrs, to_process, to_process, params); 769 | 770 | if_target->flags |= IS_IF_TARGET; 771 | } 772 | if (target) //Possibly a while loop (block ends in unconditional jump) 773 | { 774 | struct search_params params; 775 | jump_block* while_block; 776 | jump_block* new_block; 777 | params.key = target; 778 | params.ret = (void**)&while_block; 779 | list_loop (search_start_addrs, to_process, to_process, params); //Search for the jump block targeted by this jump instruction 780 | unsigned int target3; 781 | target3 = while_block->instructions [while_block->num_instructions-1].size +
while_block->instructions [while_block->num_instructions-1].address + while_block->start; 782 | target3 = relative_insn (&(while_block->instructions [while_block->num_instructions-1]), target3); 783 | 784 | if (while_block->flags & IS_AFTER_LOOP) 785 | { 786 | for (i = 0; i < parent->num_jump_addrs; i++) 787 | { 788 | if (parent->jump_addrs [i] == target && parent->orig_addrs [i] == orig) 789 | parent->jump_addrs [i] = 0; 790 | } 791 | to_process->flags |= IS_BREAK; 792 | } 793 | else if (while_block->flags & IS_LOOP && target3 <= orig) 794 | { 795 | for (i = 0; i < parent->num_jump_addrs; i++) 796 | { 797 | if (parent->jump_addrs [i] == target && parent->orig_addrs [i] == orig) 798 | parent->jump_addrs [i] = 0; 799 | } 800 | to_process->flags |= IS_CONTINUE; 801 | } 802 | else if (to_process->next->flags & IS_IF_TARGET && !(target && target < orig) && !(to_process->flags & IS_AFTER_ELSE)) 803 | { 804 | to_process->next->flags |= IS_ELSE; //End of IF statement, and ELSE statement exists, so the next block should be the start of a ELSE statement 805 | 806 | //Keep track of else statements that come directly before addresses pointed to by multiple jump instructions (see "}" placement algorithm in decompile_jump_block) 807 | for (i = 0; i < parent->num_dups; i++) 808 | { 809 | if (target == parent->dup_targets [i]) 810 | { 811 | parent->else_starts [i] = to_process->next->start; 812 | break; 813 | } 814 | } 815 | 816 | while_block->flags |= IS_AFTER_ELSE; 817 | } 818 | else if (while_block->flags & IS_LOOP) //If this series of jump instructions ends in a backwards conditional jump, then this unconditional jump is the beginning of a while loop 819 | { 820 | new_block = malloc (sizeof (jump_block)); 821 | *new_block = *while_block; 822 | link (to_process, new_block); 823 | new_block->instructions = (cs_insn*)malloc ((new_block->num_instructions+1) * sizeof (cs_insn)); 824 | for (i = 0; i < new_block->num_instructions; i++) //Copy all instructions from while_block 825 | 
{ 826 | new_block->instructions [i] = while_block->instructions [i]; 827 | new_block->instructions [i].detail = (cs_detail*)malloc (sizeof(cs_detail)); 828 | *(new_block->instructions [i].detail) = *(while_block->instructions [i].detail); 829 | } 830 | 831 | parent->num_jump_addrs ++; 832 | parent->jump_addrs [parent->num_jump_addrs-1] = while_block->end; 833 | parent->orig_addrs [parent->num_jump_addrs-1] = new_block->instructions [i-1].address + to_process->start; 834 | 835 | new_block->flags |= IS_WHILE; 836 | while_block->flags |= NO_TRANSLATE; 837 | for (i = 0; i < parent->num_jump_addrs; i++) 838 | { 839 | if (parent->orig_addrs [i] == while_block->instructions [while_block->num_instructions-1].address + while_block->start) 840 | parent->jump_addrs [i] = 0; 841 | } 842 | } 843 | else 844 | { 845 | for (i = 0; i < parent->num_jump_addrs; i++) 846 | { 847 | if (parent->jump_addrs [i] == target && parent->orig_addrs [i] == orig) 848 | parent->orig_addrs [i] = 0; 849 | } 850 | to_process->flags |= IS_GOTO; 851 | } 852 | } 853 | } 854 | 855 | void translate_func (function* to_translate) 856 | { 857 | Elf32_Sym* func_name = NULL; 858 | Elf64_Sym* func_name64 = NULL; 859 | 860 | //Disassemble the jump blocks again 861 | list_loop (parse_instructions, to_translate->jump_block_list, to_translate->jump_block_list); 862 | 863 | if (language_flag == 'd') 864 | { 865 | if (architecture == ELFCLASS32) 866 | func_name = find_sym (symbol_table.arch1, symbol_table_end.arch1, to_translate->start_addr); 867 | else 868 | func_name64 = find_sym64 (symbol_table.arch2, symbol_table_end.arch2, to_translate->start_addr); 869 | if (func_name) 870 | printf ("int func_%p (%s)\n{\n", to_translate->start_addr, string_table + func_name->st_name); 871 | else if (func_name64) 872 | printf ("int func_%p (%s)\n{\n", to_translate->start_addr, string_table + func_name64->st_name); 873 | else 874 | printf ("int func_%p\n{\n", to_translate->start_addr); 875 | list_loop (disassemble_jump_block, 
to_translate->jump_block_list, to_translate->jump_block_list); 876 | printf ("}\n\n"); 877 | } 878 | else 879 | { 880 | var* current_var; 881 | var* current_global = global_list; 882 | num_tabs = 1; 883 | 884 | while (current_global && current_global->next != global_list) 885 | current_global = current_global->next; 886 | 887 | //Reset variable finding state machine 888 | var_list = NULL; 889 | callee_param = NULL; 890 | caller_param = NULL; 891 | translation_size = 256; 892 | translation = malloc (translation_size); 893 | bzero (translation, translation_size); 894 | 895 | bzero (next_line, 128); 896 | 897 | //Translate all jump blocks in function 898 | if (language_flag == 'f') 899 | { 900 | list_loop (jump_block_preprocessing, to_translate->jump_block_list, to_translate->jump_block_list, to_translate); 901 | list_loop (decompile_jump_block, to_translate->jump_block_list, to_translate->jump_block_list, to_translate); 902 | } 903 | else 904 | list_loop (partial_decompile_jump_block, to_translate->jump_block_list, to_translate->jump_block_list, to_translate); 905 | 906 | if (current_global) 907 | list_loop (print_declarations, current_global, global_list, 0); 908 | else if (global_list) 909 | list_loop (print_declarations, global_list, global_list, 0); 910 | printf ("\n"); 911 | 912 | //Print function header 913 | //NOTE: EAX will always be returned, what EAX means is up to the caller. 
914 | //Since EAX is returned, a 32 bit int will always be returned 915 | if (architecture == ELFCLASS32) 916 | func_name = find_sym (symbol_table.arch1, symbol_table_end.arch1, to_translate->start_addr); 917 | else 918 | func_name64 = find_sym64 (symbol_table.arch2, symbol_table_end.arch2, to_translate->start_addr); 919 | if (func_name) 920 | printf ("int %s (", string_table + func_name->st_name); 921 | else if (func_name64) 922 | printf ("int %s (", string_table + func_name64->st_name); 923 | else 924 | printf ("int func_%p (", to_translate->start_addr); 925 | current_var = callee_param; 926 | if (!callee_param) 927 | printf ("void)\n{\n"); 928 | else 929 | { 930 | //Print parameter list 931 | printf ("%s %s", current_var->c_type, current_var->name); 932 | current_var = current_var->next; 933 | while (current_var != callee_param && current_var) 934 | { 935 | printf (", "); 936 | printf ("%s %s", current_var->c_type, current_var->name); 937 | current_var = current_var->next; 938 | } 939 | printf (")\n{\n"); 940 | 941 | } 942 | 943 | //Print all variable declarations 944 | if (var_list) 945 | { 946 | list_loop (print_declarations, var_list, var_list, 1); 947 | } 948 | printf ("\n"); 949 | 950 | //Print the string translation of the given instructions 951 | printf ("%s}\n\n", translation); 952 | } 953 | 954 | //Cleanup 955 | if (var_list) 956 | clean_var_list (var_list); 957 | if (callee_param) 958 | clean_var_list (callee_param); 959 | free (translation); 960 | free (to_translate->dup_targets); 961 | free (to_translate->else_starts); 962 | free (to_translate->pivots); 963 | } 964 | 965 | void translate_function_list (function* function_list) 966 | { 967 | global_list = NULL; 968 | list_loop (translate_func, function_list, function_list); 969 | if (global_list) 970 | clean_var_list (global_list); 971 | } 972 | 973 | void print_declarations (var* to_print, char should_tab) 974 | { 975 | if (to_print->type != DEREF && to_print->type != CONST && strcmp (to_print->name, 
"ebp") && strcmp (to_print->name, "esp")) //Dont need to declare constants. ESP and EBP are NOT general purpose 976 | { 977 | if (to_print->type == REG && should_tab) 978 | printf ("\tregister "); 979 | else if (to_print->type == REG) 980 | printf ("register "); 981 | if (should_tab && to_print->type != REG) 982 | printf ("\t%s %s", to_print->c_type, to_print->name); 983 | else 984 | printf ("%s %s", to_print->c_type, to_print->name); 985 | if (to_print->type != PARAM) 986 | printf (";\n"); 987 | } 988 | } 989 | -------------------------------------------------------------------------------- /src/lang_gen.h: -------------------------------------------------------------------------------- 1 | #include "function.h" 2 | #include "jump_block.h" 3 | #include "var.h" 4 | 5 | #pragma once 6 | 7 | char* translation; //String representation of equivalen C code 8 | size_t translation_size; 9 | extern char test_conditions [14] [3]; 10 | char next_line [128]; 11 | int num_tabs; 12 | char language_flag; //f for full decompilation, p for partial decompilation (don't try to interpret control structures), d for disassembly (no decompilation) 13 | 14 | void translate_func (function* to_translate); //Translate and print the C equivalent of the current function 15 | void decompile_jump_block (jump_block* to_translate, function* parent); //Translate all instructions in jump block 16 | void jump_block_preprocessing (jump_block* to_process, function* parent); 17 | void decompile_insn (cs_insn instruction, cs_insn next_instruction, jump_block* parent); 18 | void disassemble_insn (cs_insn instruction, jump_block* parent); 19 | void disassemble_jump_block (jump_block* to_translate); 20 | void partial_decompile_jump_block (jump_block* to_translate, function* parent); 21 | void print_declarations (var* to_print, char should_tab); //Helper function. 
We frequently need to print a variable's type followed by its name 22 | void translate_function_list (function* function_list); //Print all functions in given function list 23 | -------------------------------------------------------------------------------- /src/main.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include "elf_parser.h" 5 | #include "lang_gen.h" 6 | #include "var.h" 7 | 8 | int main (int argc, char** argv) 9 | { 10 | int i; 11 | int j; 12 | char* file_name = NULL; 13 | char* beginning_address_string = NULL; 14 | char* cutoff_address_string = NULL; 15 | unsigned int stop_addr; 16 | char follow_calls = 1; 17 | language_flag = 'f'; 18 | constant_format [0] = '%'; 19 | constant_format [1] = 'd'; 20 | constant_format [2] = '\0'; 21 | 22 | //Parse the command line 23 | for (i = 1; i < argc; i ++) 24 | { 25 | if (argv [i][0] == '-') 26 | { 27 | j = 1; 28 | while (argv [i][j] != '\0') 29 | { 30 | switch (argv [i][j]) 31 | { 32 | case 'f': 33 | break; 34 | case 'p': 35 | language_flag = 'p'; 36 | break; 37 | case 'd': 38 | language_flag = 'd'; 39 | break; 40 | case 's': 41 | follow_calls = 0; 42 | break; 43 | case 'h': 44 | constant_format [1] = 'p'; 45 | break; 46 | default: 47 | printf ("Unrecognized flag \"%c\"\n", argv [i][j]); 48 | exit (-1); 49 | } 50 | j ++; 51 | } 52 | } 53 | else if (file_name == NULL) 54 | file_name = argv [i]; 55 | else if (beginning_address_string == NULL) 56 | beginning_address_string = argv [i]; 57 | else if (cutoff_address_string == NULL) 58 | cutoff_address_string = argv [i]; 59 | else 60 | { 61 | printf ("Unrecognized option \"%s\"\n", argv [i]); 62 | exit (-1); 63 | } 64 | } 65 | 66 | 67 | unsigned int beginning_address; 68 | function* func; 69 | if (beginning_address_string == NULL) 70 | parse_elf (file_name); 71 | 72 | if (beginning_address_string) 73 | { 74 | init_elf_parser (file_name); 75 | if (architecture == ELFCLASS32) 76 | { 77 | if (cs_open 
(CS_ARCH_X86, CS_MODE_32, &handle) != CS_ERR_OK) 78 | { 79 | printf ("CRITICAL ERROR: Could not initialize Capstone\n"); 80 | exit (-1); 81 | } 82 | } 83 | else 84 | { 85 | if (cs_open (CS_ARCH_X86, CS_MODE_64, &handle) != CS_ERR_OK) 86 | { 87 | printf ("CRITICAL ERROR: Could not initialize Capstone\n"); 88 | exit (-1); 89 | } 90 | } 91 | cs_option (handle, CS_OPT_DETAIL, CS_OPT_ON); 92 | beginning_address = strtoul (beginning_address_string, NULL, 16); 93 | 94 | if (cutoff_address_string) 95 | stop_addr = strtoul (cutoff_address_string, NULL, 16); 96 | else 97 | stop_addr = end_of_text; 98 | 99 | if (beginning_address) 100 | func = init_function (malloc (sizeof (function)), beginning_address, stop_addr); 101 | else 102 | { printf ("Error: invalid start address\n"); exit (-1); } //Bail out: func would otherwise be used uninitialized below 103 | } 104 | else if (main_addr) 105 | { 106 | if (architecture == ELFCLASS32) 107 | { 108 | if (cs_open (CS_ARCH_X86, CS_MODE_32, &handle) != CS_ERR_OK) 109 | { 110 | printf ("CRITICAL ERROR: Could not initialize Capstone\n"); 111 | exit (-1); 112 | } 113 | } 114 | else 115 | { 116 | if (cs_open (CS_ARCH_X86, CS_MODE_64, &handle) != CS_ERR_OK) 117 | { 118 | printf ("CRITICAL ERROR: Could not initialize Capstone\n"); 119 | exit (-1); 120 | } 121 | } 122 | 123 | cs_option (handle, CS_OPT_DETAIL, CS_OPT_ON); 124 | 125 | if (cutoff_address_string) 126 | stop_addr = strtoul (cutoff_address_string, NULL, 16); 127 | else 128 | stop_addr = end_of_text; 129 | 130 | func = init_function (malloc (sizeof (function)), main_addr, stop_addr); 131 | } 132 | else 133 | { 134 | printf ("Error: could not find main and no start address specified\n"); 135 | exit (-1); 136 | } 137 | func->next = NULL; 138 | if (follow_calls) 139 | resolve_calls (func); 140 | translate_function_list (func); 141 | 142 | function_list_cleanup (func, 1); //Make sure those operands don't leak 143 | elf_parser_cleanup (); 144 | cs_close (&handle); 145 | } 146 | --------------------------------------------------------------------------------
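Note on address parsing in main.c above: the start and cutoff addresses are read with strtoul (..., NULL, 16), which returns 0 for non-numeric input and silently ignores trailing junk. The sketch below shows one way the parsing could be made stricter. It is only an illustration: parse_hex_addr is a hypothetical helper that does not exist in triad, and the 32-bit limit assumes addresses are stored in an unsigned int, as they are in main.c.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not part of triad): parse a hexadecimal address
 * string, with or without a leading "0x", into *out.
 * Returns 0 on success. Returns -1 if the string is NULL or empty,
 * contains no hex digits, has characters after the number, or does not
 * fit in an unsigned int. */
int parse_hex_addr (const char* str, unsigned int* out)
{
	char* end = NULL;
	unsigned long value;

	if (str == NULL || *str == '\0')
		return -1;

	errno = 0;
	value = strtoul (str, &end, 16);

	if (end == str || *end != '\0') //No digits consumed, or junk after the number
		return -1;
	if (errno == ERANGE || value > UINT_MAX) //Out of range for unsigned int
		return -1;

	*out = (unsigned int)value;
	return 0;
}
```

With a helper along these lines, main could reject a malformed start or cutoff address before calling init_function, instead of relying on strtoul returning 0.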
/src/var.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | 6 | #include "var.h" 7 | #include "jump_block.h" 8 | #include "datastructs.h" 9 | 10 | int name_ind = 0; 11 | char last_name [20] = {'a'-1, 0, 0, 0, 12 | 0, 0, 0, 0, 13 | 0, 0, 0, 0, 14 | 0, 0, 0, 0, 15 | 0, 0, 0, 0}; 16 | char c_types [4][10] = {"char\0", "short\0", "int\0", "long long\0"}; 17 | 18 | char* gen_var_name (void) 19 | { 20 | int i = name_ind; 21 | int len; 22 | char* name_buf; 23 | 24 | last_name [i] ++; 25 | 26 | while (last_name [i] > 'z') 27 | { 28 | if (i == 0) //Out of names 29 | { 30 | name_ind ++; //Add one letter 31 | if (name_ind == 19) //We've used far too many names (26^18 names) 32 | { 33 | printf ("Error: Too many variables\n"); 34 | exit (-1); 35 | } 36 | for (i=0; i <= name_ind; i++) 37 | last_name [i] = 'a'; 38 | break; 39 | } 40 | 41 | last_name [i] = 'a'; 42 | i --; 43 | last_name [i] ++; 44 | } 45 | 46 | len = strlen (last_name); 47 | 48 | if (len == 3 && last_name [0] == 'e') //Don't want to confuse with registers, which have the naming pattern exz, x and z being other letters 49 | last_name [0] ++; 50 | if (len == 2 && (last_name [1] == 'x' || last_name [1] == 'h' || last_name [1] == 'l')) //Avoid 16 and 8 bit register names also e.g. 
ax, cx 51 | last_name [1] ++; 52 | 53 | name_buf = malloc (len+1); 54 | strcpy (name_buf, last_name); 55 | name_buf [len] = 0; //Make sure we null terminate 56 | return name_buf; 57 | } 58 | 59 | var* init_var (var* to_init, cs_x86_op operand) 60 | { 61 | to_init->name = NULL; 62 | if (operand.type == X86_OP_IMM) //constant expression 63 | { 64 | to_init->type = CONST; 65 | to_init->name = malloc (20); //2^64-1 is 20 digits long 66 | bzero (to_init->name, 20); 67 | to_init->loc.disp = operand.imm; 68 | sprintf (to_init->name, constant_format, to_init->loc.disp); 69 | to_init->c_type = NULL; 70 | } 71 | else if (operand.type == X86_OP_REG) //Not a variable of any kind, but an x86 register 72 | { 73 | to_init->name = malloc (MAX_REGNAME); 74 | strcpy (to_init->name, cs_reg_name (handle, operand.reg)); 75 | to_init->type = REG; 76 | } 77 | else 78 | { 79 | if (operand.type == X86_OP_MEM && !operand.mem.base && !operand.mem.index) //Absolute address, i.e. global variable 80 | { 81 | to_init->type = GLOBAL; 82 | to_init->loc.addr = operand.mem.disp; 83 | } 84 | else 85 | { 86 | if (operand.mem.index || (operand.mem.base && operand.mem.base != X86_REG_EBP && operand.mem.base != X86_REG_RBP)) //We're dereferencing a general purpose register 87 | { 88 | to_init->name = malloc (MAX_REGNAME); 89 | if (operand.mem.index) 90 | strcpy (to_init->name, cs_reg_name (handle, operand.mem.index)); 91 | else 92 | strcpy (to_init->name, cs_reg_name (handle, operand.mem.base)); 93 | to_init->type = DEREF; 94 | } 95 | else if (operand.mem.disp < 0) 96 | to_init->type = LOCAL; 97 | else //Should be a parameter otherwise 98 | to_init->type = PARAM; 99 | to_init->loc.disp = operand.mem.disp; 100 | } 101 | } 102 | 103 | to_init->next = NULL; 104 | return to_init; 105 | } 106 | 107 | void search_vars (var* to_check, var* key) 108 | { 109 | if (to_check->type == key->type) 110 | { 111 | if (to_check->type == GLOBAL) 112 | { 113 | if (to_check->loc.addr == key->loc.addr) 114 | key->next = 
to_check; 115 | } 116 | else if (to_check->type == REG) 117 | { 118 | if (to_check->name [1] == key->name [1]) 119 | key->next = to_check; 120 | } 121 | else if (to_check->type == DEREF) 122 | { 123 | if (to_check->name [1] == key->name [1] && to_check->loc.disp == key->loc.disp) 124 | key->next = to_check; 125 | } 126 | else //Parameter, local, or constant 127 | { 128 | if (to_check->loc.disp == key->loc.disp) 129 | key->next = to_check; 130 | } 131 | } 132 | } 133 | 134 | var* add_var (cs_x86_op operand) 135 | { 136 | var* to_add = init_var (malloc (sizeof (var)), operand); //Generate what the variable would be if it were to be added 137 | 138 | //Check for duplicate variables 139 | if (var_list) 140 | list_loop (search_vars, var_list, var_list, to_add); //Search the variable lists 141 | if (callee_param) 142 | list_loop (search_vars, callee_param, callee_param, to_add); //Search the parameter list for variable 143 | if (global_list) 144 | list_loop (search_vars, global_list, global_list, to_add); 145 | 146 | if (!to_add->next) //Search function will return pointer to first instance of variable if found. Otherwise, it's not a dupe 147 | { 148 | if (to_add->type == REG) 149 | { 150 | if (architecture == ELFCLASS32) 151 | to_add->c_type = c_types [2]; //All registers are 1 word long 152 | else 153 | to_add->c_type = c_types [3]; 154 | to_add->loc.addr = 0; 155 | } 156 | else if (to_add->type == DEREF) 157 | { 158 | switch (operand.size) 159 | { 160 | case 1: 161 | to_add->c_type = c_types [0]; 162 | break; 163 | case 2: 164 | to_add->c_type = c_types [1]; 165 | break; 166 | case 4: 167 | to_add->c_type = c_types [2]; 168 | break; 169 | case 8: 170 | to_add->c_type = c_types [3]; 171 | break; 172 | } 173 | } 174 | else if (to_add->type != CONST) 175 | { 176 | to_add->name = gen_var_name (); //Generate a random variable name for non-constants. Constants' names are just a string representation of the constant. 
177 | switch (operand.size) 178 | { 179 | case 1: 180 | to_add->c_type = c_types [0]; 181 | break; 182 | case 2: 183 | to_add->c_type = c_types [1]; 184 | break; 185 | case 4: 186 | to_add->c_type = c_types [2]; 187 | break; 188 | case 8: 189 | to_add->c_type = c_types [3]; 190 | break; 191 | } 192 | } 193 | 194 | if (to_add->type == PARAM) 195 | { 196 | //Add parameter to the callee_param list instead of the variable list 197 | //Parameters appear on the stack in the same order as they appear in a function prototype 198 | //Therefore, we must sort the parameters by displacement to print a correct function prototype 199 | if (callee_param && callee_param->next) 200 | { 201 | var* current = callee_param; 202 | do 203 | { 204 | if (current->next->loc.disp > to_add->loc.disp && current->loc.disp < to_add->loc.disp) 205 | break; 206 | current = current->next; 207 | } while (current->next != callee_param); 208 | link (current, to_add); 209 | } 210 | else if (callee_param) 211 | { 212 | if (to_add->loc.disp < callee_param->loc.disp) 213 | { 214 | var* temp = callee_param; 215 | callee_param = to_add; 216 | link (callee_param, temp); 217 | } 218 | else 219 | link (callee_param, to_add); 220 | } 221 | else 222 | callee_param = to_add; 223 | } 224 | else if (to_add->type == GLOBAL) 225 | link (global_list, to_add); 226 | else 227 | link (var_list, to_add); //Add variable to the list 228 | 229 | return to_add; //Return newly created variable 230 | } 231 | else 232 | { 233 | //Variable is a dupe, clean up and return the first instance found 234 | var* next = to_add->next; 235 | cleanup_var (to_add); 236 | free (to_add); 237 | return next; //Return the other occurrence of the variable in the variable list 238 | } 239 | } 240 | 241 | void cleanup_var (var* to_cleanup) 242 | { 243 | if (to_cleanup->name) 244 | free (to_cleanup->name); 245 | } 246 | 247 | void clean_var_list (var* to_cleanup) 248 | { 249 | list_cleanup (to_cleanup, cleanup_var); 250 | } 251 
--------------------------------------------------------------------------------
/src/var.h:
--------------------------------------------------------------------------------
#include

#pragma once

#define MAX_REGNAME 3

enum var_type
{
	REG,
	GLOBAL,
	LOCAL,
	PARAM,
	CONST,
	DEREF
};

struct var //Later recycled as a search parameter struct for search_vars
{
	char* name;
	char* c_type;
	enum var_type type;
	union
	{
		unsigned int addr;
		int disp;
	} loc; //Location in memory of variable or value of constant
	struct var* next; //Also used as a return field in search_vars
};
typedef struct var var;

extern int name_ind; //Furthest "digit" of name modified
extern char last_name [20]; //Last name generated by gen_var_name
extern char c_types [4][10]; //String representations of data types in a normal C program, e.g. char or int
char constant_format [3];

var* var_list;
var* global_list;
var* callee_param; //Linked list of parameters passed to current function
var* caller_param; //Dynamically allocated array filled with copies of the variables being passed to a function called within this function
int num_caller_params;
size_t caller_params_size;

char* gen_var_name (void); //Generates a unique variable name
var* init_var (var* to_init, cs_x86_op operand); //Translates an operand into a C style variable
var* add_var (cs_x86_op operand); //Add variable to linked list if not a dupe
void search_vars (var* to_check, var* key);
void cleanup_var (var* to_cleanup);
void clean_var_list (var* to_cleanup);
--------------------------------------------------------------------------------
/tests/Makefile:
--------------------------------------------------------------------------------
sys_tests: test arith_test control_flow_test

sys_tests64: test64
arith_test64 control_flow_test64

test: test.c
	gcc -m32 test.c -o test
arith_test: arith_test.c
	gcc -m32 arith_test.c -o arith_test
control_flow_test: control_flow_test.c
	gcc -m32 control_flow_test.c -o control_flow_test
test64: test.c
	gcc test.c -o test64
arith_test64: arith_test.c
	gcc arith_test.c -o arith_test64
control_flow_test64: control_flow_test.c
	gcc control_flow_test.c -o control_flow_test64

clean:
	rm test test64 arith_test arith_test64 control_flow_test control_flow_test64 *trial
--------------------------------------------------------------------------------
/tests/arith_test.c:
--------------------------------------------------------------------------------
//Simple static test case for decompilation

int func1 (int a, int b);
int func2 (int a, int b);
int func3 (int a, int b);

int main (void)
{
	int a, b, c, d, e;
	a = 2;
	b = 3;
	c = func1 (a, b); //Make some random function calls
	d = func2 (a, b);
	e = func3 (a, b);
	return c+d+e;
	return 0;
}

int func1 (int a, int b) //Silly arithmetic
{
	return a+b;
}

int func2 (int a, int b)
{
	return b-a;
}

int func3 (int a, int b)
{
	return a*b;
}
--------------------------------------------------------------------------------
/tests/arith_test64_expected:
--------------------------------------------------------------------------------

int main (void)
{
	int a;
	int e;
	int d;
	int c;
	register long long esi;
	register long long eax;
	register long long edx;
	int b;

	a = 2;
	b = 3;
	edx = b;
	eax = a;
	esi = edx;
	edx = eax;
	eax = func1 ();
	c = eax;
	edx = b;
	eax = a;
	esi = edx;
	edx = eax;
	eax = func2 ();
	d = eax;
	edx = b;
	eax = a;
	esi = edx;
	edx = eax;
	eax =
func3 ();
	e = eax;
	edx = c;
	eax = d;
	edx += eax;
	eax = e;
	eax += edx;
	return eax;
}


int func3 (void)
{
	int f;
	register long long r15;
	register long long eax;
	register long long esi;
	int g;
	register long long edi;

	f = edi;
	g = esi;
	eax = f;
	eax *= g;
	return eax;
	r15 = edi;
	r15 = (long long)&rip;
}


int func2 (void)
{
	int h;
	register long long eax;
	register long long esi;
	int i;
	register long long edi;

	h = edi;
	i = esi;
	eax = i;
	eax -= h;
	return eax;
}


int func1 (void)
{
	int j;
	register long long eax;
	register long long esi;
	int k;
	register long long edi;

	j = edi;
	k = esi;
	edi = j;
	eax = k;
	eax += edi;
	return eax;
}

--------------------------------------------------------------------------------
/tests/arith_test_expected:
--------------------------------------------------------------------------------

int main (void)
{
	register int ecx;
	int f;
	register int edx;
	int e;
	int d;
	register int eax;
	int c;
	int b;
	int a;

	ecx = (int)&esp;
	a = 2;
	b = 3;
	eax = func1 (a, b, *(int*)(ecx+(-4)));
	c = eax;
	eax = func2 (a, b);
	d = eax;
	eax = func3 (a, b);
	e = eax;
	edx = c;
	eax = d;
	edx += eax;
	eax = e;
	eax += edx;
	ecx = f;
	return eax;
}


int func3 (int g, int h)
{
	register int eax;

	eax = g;
	eax *= h;
	return eax;
}


int func2 (int j, int i)
{
	register int eax;

	eax = i;
	eax -= j;
	return eax;
}


int func1 (int k, int l)
{
	register int edx;
	register int eax;

	edx = k;
	eax = l;
	eax += edx;
	return
eax;
}

--------------------------------------------------------------------------------
/tests/control_flow_test.c:
--------------------------------------------------------------------------------
//Simple static test case for decompilation

int main (void)
{
	int a, b, c, d, e;
	a = 1;
	b = 10;
label:
	c = 11;
	d = 0;

	while (a > 0)
	{
		a --;
		if (b)
			break;
		b ++;
	}

	do
	{
		b --;
		if (c)
			continue;
		c --;
	} while (b >= a);

	if (b)
	{
		if (a)
			c = 1;
		else
			c = 2;
	}
	else
	{
		if (!d)
			c = 6;
	}

	a = 0;
	if (a)
		a = 2;
	else
		a = 3;

	while (a < 10)
		a ++;
	a = 11;
	if (c == b)
		goto label;
	c = 10;
	b = c;

	return c;
}
--------------------------------------------------------------------------------
/tests/control_flow_test64_expected:
--------------------------------------------------------------------------------

int main (void)
{
	int a;
	register long long edi;
	register long long r15;
	register long long eax;
	int d;
	int c;
	int b;

	a = 1;
	b = 10;
0x4004c8:
	c = 11;
	d = 0;
	while (a > 0)
	{
		a -= 1;
		if (b != 0)
		{
			break;
		}
		b += 1;
	}
	do
	{
		b -= 1;
		if (c != 0)
		{
			continue;
		}
		c -= 1;
		eax = b;
	} while (eax >= a);
	if (b != 0)
	{
		if (a != 0)
		{
			c = 1;
		}
		else
		{
			c = 2;
		}
	}
	else
	{
		if (d == 0)
		{
			c = 6;
		}
	}
	a = 0;
	if (a != 0)
	{
		a = 2;
	}
	else
	{
		a = 3;
	}
	while (a <= 9)
	{
		a += 1;
	}
	a = 11;
	eax = c;
	if (eax == b)
	{
		goto 0x4004c8;
	}
	c = 10;
	eax = c;
	b = eax;
	eax = c;
	return eax;
	r15 = edi;
	r15 = (long long)&rip;
}

--------------------------------------------------------------------------------
/tests/control_flow_test_expected:
--------------------------------------------------------------------------------

int main (void)
{
	int a;
	register int eax;
	int d;
	int c;
	int b;

	a = 1;
	b = 10;
0x80483df:
	c = 11;
	d = 0;
	while (a > 0)
	{
		a -= 1;
		if (b != 0)
		{
			break;
		}
		b += 1;
	}
	do
	{
		b -= 1;
		if (c != 0)
		{
			continue;
		}
		c -= 1;
		eax = b;
	} while (eax >= a);
	if (b != 0)
	{
		if (a != 0)
		{
			c = 1;
		}
		else
		{
			c = 2;
		}
	}
	else
	{
		if (d == 0)
		{
			c = 6;
		}
	}
	a = 0;
	if (a != 0)
	{
		a = 2;
	}
	else
	{
		a = 3;
	}
	while (a <= 9)
	{
		a += 1;
	}
	a = 11;
	eax = c;
	if (eax == b)
	{
		goto 0x80483df;
	}
	c = 10;
	eax = c;
	b = eax;
	eax = c;
	return eax;
}

--------------------------------------------------------------------------------
/tests/do_tests.sh:
--------------------------------------------------------------------------------
#!/bin/bash

cd tests

triad arith_test > arith_test_trial
triad control_flow_test > control_flow_test_trial
triad ./test > test_trial

if [ -n "$(diff arith_test_trial arith_test_expected)" ]; then
	echo "ERROR: Failed arithmetic test"
	exit -1
fi

if [ -n "$(diff control_flow_test_trial control_flow_test_expected)" ]; then
	echo "ERROR: Failed control flow test"
	exit -1
fi

if [ -n "$(diff test_trial test_expected)" ]; then
	echo "ERROR: Failed test"
	exit -1
fi

echo "Round 1 of tests passed"

cd ..
--------------------------------------------------------------------------------
/tests/do_tests64.sh:
--------------------------------------------------------------------------------
#!/bin/bash

cd tests

triad arith_test64 > arith_test64_trial
triad control_flow_test64 > control_flow_test64_trial
triad ./test64 > test64_trial

if [ -n "$(diff arith_test64_trial arith_test64_expected)" ]; then
	echo "ERROR: Failed arithmetic test"
	exit -1
fi

if [ -n "$(diff control_flow_test64_trial control_flow_test64_expected)" ]; then
	echo "ERROR: Failed control flow test"
	exit -1
fi

if [ -n "$(diff test64_trial test64_expected)" ]; then
	echo "ERROR: Failed test"
	exit -1
fi

echo "Round 2 of tests passed"

cd ..
--------------------------------------------------------------------------------
/tests/test.c:
--------------------------------------------------------------------------------
//Simple static test case for decompilation

int main (void)
{
	int a = 1;
	int b;

	switch (a)
	{
		case 1:
			b = 2;
		case 2:
			b = 1;
		case 3:
			b = -6;
		case 4:
			b = -7;
		default:
			b = 1024;
	}
}
--------------------------------------------------------------------------------
/tests/test64_expected:
--------------------------------------------------------------------------------
Error: invalid jump instruction at 0x4004dd

int main (void)
{
	int a;
	register long long edi;
	register long long r15;
	int b;
	register long long eax;

	a = 1;
	eax = a;
	if (eax != 2)
	{
		if (eax > 2)
		{
			if (eax != 1)
			{
			}
			else
			{
				if (eax != 3)
				{
					if (eax != 4)
					{
					}
				}
				else
				{
					b = 2;
				}
				b = 1;
			}
			b = -6;
		}
		b = 1024;
		return eax;
		r15 = edi;
		r15 = (long
long)&rip;
}

--------------------------------------------------------------------------------
/tests/test_expected:
--------------------------------------------------------------------------------
Error: invalid jump instruction at 0x80483f4

int main (void)
{
	int a;
	int b;
	register int eax;

	a = 1;
	eax = a;
	if (eax != 2)
	{
		if (eax > 2)
		{
			if (eax != 1)
			{
			}
			else
			{
				if (eax != 3)
				{
					if (eax != 4)
					{
					}
				}
				else
				{
					b = 2;
				}
				b = 1;
			}
			b = -6;
		}
		b = 1024;
		return eax;
}


int __x86.get_pc_thunk.bx (void)
{
	register int ebx;

	ebx = *(int*)(esp+(0));
}

--------------------------------------------------------------------------------