├── README.md
├── asm
│   ├── exit.asm
│   ├── hello.asm
│   ├── smc.asm
│   └── sp.asm
├── constants.py
├── go.sh
├── main.py
└── util
    ├── asm_to_str
    └── gcc_asm_to_elf


/README.md:
--------------------------------------------------------------------------------
# elf-from-scratch

This was a challenge to myself: build an ELF64 file from scratch without using
a toolchain (other than an assembler to get raw `x86_64` instructions).

[main.py](main.py) will write to stdout an executable ELF64 file with a single program header
table entry, and no sections.

[main.py](main.py) currently hardcodes the program it writes: "hello world".
See [asm/hello.asm](asm/hello.asm) for a program listing.

[go.sh](go.sh) will run the main script, call `readelf -a` on the output, and run the
ELF64 file.

Here's what `readelf` has to say about the produced ELF:

    ELF Header:
      Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF64
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           Advanced Micro Devices X86-64
      Version:                           0x1
      Entry point address:               0x400078
      Start of program headers:          64 (bytes into file)
      Start of section headers:          0 (bytes into file)
      Flags:                             0x0
      Size of this header:               64 (bytes)
      Size of program headers:           56 (bytes)
      Number of program headers:         1
      Size of section headers:           0 (bytes)
      Number of section headers:         0
      Section header string table index: 0

    There are no sections in this file.

    There are no sections to group in this file.

    Program Headers:
      Type           Offset             VirtAddr           PhysAddr
                     FileSiz            MemSiz              Flags  Align
      LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                     0x000000000000002f 0x000000000000002f  RWE    0x4

    There is no dynamic section in this file.

    There are no relocations in this file.

    The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.

    Dynamic symbol information is not available for displaying symbols.

    No version information found in this file.

# TODO

* Why does my program header table entry have to have an offset of 0? I get segfaults otherwise :(
  - Might be due to [this][1] - seems like `p_offset` must be page aligned?

[1]: https://stackoverflow.com/questions/5104060/elf-program-header-offset

# references

Links I found useful while making this:

- [ELF-64 Object File Format](https://www.uclibc.org/docs/elf-64-gen.pdf)
- [Linux X86_64 syscalls table](http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/)

# useful commands

See the [util/](util/) folder for handy scripts that use the commands below.

nasm assembler

    # assemble to raw x86_64 instructions only
    nasm -f bin -o out.bin in.asm

    # assemble to a linkable ELF object
    nasm -f elf64 -o out.o in.asm

link `out.o` file without stdlib and gcc cruft:

    gcc out.o -o out.elf -nostartfiles -nostdinc -nostdlib

--------------------------------------------------------------------------------
/asm/exit.asm:
--------------------------------------------------------------------------------
BITS 64             ; 64-bit mode

SECTION .text
global main

main:
    mov rax, 60     ; exit
    mov rdi, 0      ; success
    syscall

--------------------------------------------------------------------------------
/asm/hello.asm:
--------------------------------------------------------------------------------
BITS 64             ; 64-bit mode

SECTION .text
global main

main:
    ; put "hello world\n" on the stack
    ; a .data section? what's that?
    mov rax, 0x0a646c72             ; "rld\n" (pushed first, so it ends up last)
    push rax
    mov rax, 0x6f77206f6c6c6568     ; "hello wo"
    push rax

    ; write(1, rsp, 12)
    mov rax, 1      ; write
    mov rdi, 1      ; fd = stdout
    mov rsi, rsp    ; buf = rsp
    mov rdx, 12     ; len("hello world\n")
    syscall

    ; exit(0)
    mov rax, 60     ; exit
    mov rdi, r10    ; status = r10, assumed to still be 0 (success)
    syscall

--------------------------------------------------------------------------------
/asm/smc.asm:
--------------------------------------------------------------------------------
BITS 64             ; 64-bit mode

SECTION .text
global main

; some silly self modifying code.
; prints "hur", then changes itself to print "dur", then exits.
;
; NOTE: extremely brittle, because it bakes in exact addresses when doing
;       self-modification. Nasm probably has a way to make this nicer?
;
; When using elf-from-scratch, main will be loaded into memory at 0x400078.
main:
    ; put "hur\n" on the stack
    mov rax, 0x0a727568
    push rax

    ; Print the string stored at RSP
    mov rax, 1      ; write
    mov rdi, 1      ; fd = stdout
    mov rsi, rsp    ; buf = rsp
    mov rdx, 4      ; len("hur\n")
    syscall

    ; Pop the stack
    ; NOTE: the two `mov byte` stores to 0x400092 and 0x400093 further down
    ;       overwrite this instruction and the first byte of the following one
    ;       with a 2-byte relative JMP to "end" (the exit syscall).
    pop rax

    ; modify code: x86-64 encodes the 4-byte constant of the first instruction
    ; at address main+1, so we modify its first byte to be "d" instead of "h".
    mov byte [0x400079], 0x64 ; WARNING: more baked in offsets

    ; modify "pop rax" to instead jump to end.
    ; here we directly write the instruction encoding:
    ; * 0xEB for JMP relative,
    ; * 0x19 for 8-bit relative offset
    ; see here for reference: https://www.felixcloutier.com/x86/jmp
    mov byte [0x400092], 0xEB ; opcode for relative jump
    mov byte [0x400093], 0x19 ; relative offset (calculated using gdb)

    ; now jump back to main and execute the new (modified) code:
    ; * print "dur"
    ; * jump directly to end
    jmp main

end:
    ; exit(0)
    mov rax, 60     ; exit
    mov rdi, r10    ; status = r10, assumed to still be 0 (success)
    syscall

--------------------------------------------------------------------------------
/asm/sp.asm:
--------------------------------------------------------------------------------
BITS 64             ; 64-bit mode

SECTION .text
global main

main:
    mov rax, 60     ; exit
    mov rdi, rsp    ; exit status = rsp value (only the low 8 bits reach the parent)
    syscall

--------------------------------------------------------------------------------
/constants.py:
--------------------------------------------------------------------------------
""" A definitely-not-complete list of some useful constants for generating an
ELF64 file
"""

ET_EXEC = 2

EV_CURRENT = 1

# see https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf
EM_X86_64 = 62

SHN_UNDEF = 0

PT_LOAD = 1

PF_X = 0x1
PF_W = 0x2
PF_R = 0x4

--------------------------------------------------------------------------------
/go.sh:
--------------------------------------------------------------------------------
#!/usr/bin/env bash

./main.py > out.elf
readelf -a out.elf
chmod +x out.elf
./out.elf

--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3

""" TODO
1. Why does the phdr have to include the whole ELF file??
"""

import numbers
from constants import *

# Base virtual address the segment is loaded at. Despite the name, the actual
# entry point is this address plus the size of the headers (see elf64_ehdr).
DEFAULT_ENTRY_POINT = 0x400000

# ELF HEADER
# ----------------
# Prog Hdr table
# ----------------
# Segment
# ----------------
# ...
# ----------------
# Section Hdr Tbl

def uint(n, x):
    return x.to_bytes(n, 'little')

# Size in bytes of each ELF64 data type.
# Trailing comments: alignment, then purpose (as listed in the ELF-64 spec).
ELF64_TYPE_SIZE = {
    'addr': 8,           # 8 unsigned program address
    'off': 8,            # 8 unsigned file offset
    'half': 2,           # 2 unsigned medium integer
    'word': 4,           # 4 unsigned integer
    'sword': 4,          # 4 signed integer
    'xword': 8,          # 8 unsigned long integer
    'sxword': 8,         # 8 signed long integer
    'unsigned_char': 1,  # 1 unsigned small integer
}

def to_size(elf64_type):
    result = ELF64_TYPE_SIZE.get(elf64_type, elf64_type)
    if not isinstance(result, numbers.Number):
        raise Exception("{} is not a size".format(elf64_type))
    return result

def serialize(x, elf64_type):
    """ elf64_type is either:
        str in ELF64_TYPE_SIZE.keys()
        number (length in bytes)
    """
    size = to_size(elf64_type)

    if isinstance(x, numbers.Number):
        return x.to_bytes(size, 'little')

    elif isinstance(x, bytes):
        if len(x) != size:
            raise Exception("data length {} != type {} length of {}".format(len(x), elf64_type, size))
        return x

    raise Exception("can't serialize {} of type {}".format(x, elf64_type))

ELF64_EHDR = [
    ('e_ident', 16),          # ELF identification (actually 16-length array of unsigned char)
    ('e_type', 'half'),       # Object file type
    ('e_machine', 'half'),    # Machine type
    ('e_version', 'word'),    # Object file version
    ('e_entry', 'addr'),      # Entry point address
    ('e_phoff', 'off'),       # Program header offset
    ('e_shoff', 'off'),       # Section header offset
    ('e_flags', 'word'),      # Processor-specific flags
    ('e_ehsize', 'half'),     # ELF header size
    ('e_phentsize', 'half'),  # Size of program header entry
    ('e_phnum', 'half'),      # Number of program header entries
    ('e_shentsize', 'half'),  # Size of section header entry
    ('e_shnum', 'half'),      # Number of section header entries
    ('e_shstrndx', 'half'),   # Section name string table index
]

def e_ident():
    return b''.join([
        b'\x7FELF',     # EI_MAG0..EI_MAG3: magic
        b'\x02',        # EI_CLASS: ELF64
        b'\x01',        # EI_DATA: little-endian
        b'\x01',        # EI_VERSION: EV_CURRENT
        b'\x00',        # EI_OSABI: ELFOSABI_SYSV (TODO \x03 instead? that is ELFOSABI_LINUX)
        b'\x00',        # EI_ABIVERSION (meaning depends on EI_OSABI)
        b'\x00' * 7,    # padding up to EI_NIDENT (16), the size of e_ident
    ])

def serialize_struct(layout, values):
    result = b''
    for key, field_type in layout:
        result += serialize(values[key], field_type)
    return result

def struct_len(layout):
    return sum(to_size(t) for _, t in layout)

def elf64_ehdr():
    # The segment is loaded from file offset 0 (see program_header), so the code
    # starts prefix_size bytes past the load address.
    # TODO: why does this have to happen :[
    prefix_size = struct_len(ELF64_EHDR) + struct_len(ELF64_PHDR)
    entry_point = DEFAULT_ENTRY_POINT + prefix_size  # virtual addr

    phlen = len(program_header())       # quick check
    e_ehsize = struct_len(ELF64_EHDR)   # size of ELF64 header
    e_phoff = struct_len(ELF64_EHDR)    # prog header offset
    e_shoff = 0                         # no sections

    return serialize_struct(ELF64_EHDR, dict(
        e_ident=e_ident(),
        e_type=ET_EXEC,
        e_machine=EM_X86_64,
        e_version=EV_CURRENT,
        e_entry=entry_point,        # virtual addr entry point
        e_phoff=e_phoff,            # file offset for program header
        e_shoff=e_shoff,            # File offset for section header
        e_flags=0,                  # processor specific flags
        e_ehsize=e_ehsize,          # size in bytes of ELF header. (constant!)
        e_phentsize=phlen,          # size in bytes of prog header table entry
        e_phnum=1,                  # num entries in prog header
        e_shentsize=0,              # size in bytes of section header table entry
        e_shnum=0,                  # num entries in section header
        e_shstrndx=SHN_UNDEF        # no section name strings table
    ))

ELF64_PHDR = [
    ('p_type', 'word'),     # Type of segment
    ('p_flags', 'word'),    # Segment attributes
    ('p_offset', 'off'),    # Offset in file
    ('p_vaddr', 'addr'),    # Virtual address in memory
    ('p_paddr', 'addr'),    # Reserved
    ('p_filesz', 'xword'),  # Size of segment in file
    ('p_memsz', 'xword'),   # Size of segment in memory
    ('p_align', 'xword'),   # Alignment of segment
]

def program_header(code=b''):
    prefix_size = struct_len(ELF64_EHDR) + struct_len(ELF64_PHDR)
    p_vaddr = DEFAULT_ENTRY_POINT
    p_paddr = p_vaddr
    p_offset = 0  # load whole ELF file. TODO: why can't I just load the code? :(
    # NOTE: sizes are measured from p_offset, so with p_offset = 0 these nominally
    # cover the first len(code) bytes of the file (the headers), not the code.
    # prefix_size + len(code) would span headers and code; loading presumably
    # still works because the kernel maps whole pages.
    p_filesz = len(code)
    p_memsz = len(code)
    return serialize_struct(ELF64_PHDR, dict(
        p_type=PT_LOAD,
        p_flags=PF_R | PF_W | PF_X,
        p_offset=p_offset,          # Offset in bytes from start of file
        p_vaddr=p_vaddr,            # virtual address in memory
        p_paddr=p_paddr,            # unused? reserved for physical addressing
        p_filesz=p_filesz,          # size in bytes of *file image* of segment
        p_memsz=p_memsz,            # size in bytes of *memory image* of segment
        p_align=0x4                 # alignment (136/4)
    ))

# awesome syscalls table:
# http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/

# Hello world program
# see asm_to_str script
code = b'\xb8rld\nPH\xb8hello woP\xb8\x01\x00\x00\x00\xbf\x01\x00\x00\x00H\x89\xe6\xba\x0c\x00\x00\x00\x0f\x05\xb8<\x00\x00\x00L\x89\xd7\x0f\x05'

if __name__ == "__main__":
    import sys
    data = b''.join([
        elf64_ehdr(),
        program_header(code),
        code
    ])
    # print(len(data))
    sys.stdout.buffer.write(data)

--------------------------------------------------------------------------------
/util/asm_to_str:
--------------------------------------------------------------------------------
#!/usr/bin/env bash

# assemble $1 to raw machine code, then print it as a Python bytes literal
nasm -f bin -o "$1.bin" "$1"
cat "$1.bin" | python3 -c 'import sys; print(sys.stdin.buffer.read())'

--------------------------------------------------------------------------------
/util/gcc_asm_to_elf:
--------------------------------------------------------------------------------
#!/usr/bin/env bash

# assemble $1 to an ELF object, then link it without the C runtime
nasm -f elf64 -o "$1.o" "$1"
gcc "$1.o" -o "$1.elf" -nostartfiles -nostdinc -nostdlib

--------------------------------------------------------------------------------
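
A note on the README TODO about `p_offset`: as far as I can tell, the offset does not have to be 0 or page aligned by itself. The ELF spec asks that `p_offset` and `p_vaddr` be congruent modulo `p_align` for loadable segments, and in practice the loader needs them congruent modulo the page size. Keeping `p_vaddr = 0x400000` while raising `p_offset` breaks that, which would explain the segfaults. The sketch below is a minimal illustration of a code-only segment that keeps the two congruent. `code_only_phdr`, `PAGE_SIZE` and `LOAD_ADDR` are hypothetical names that are not part of main.py, a 4 KiB page size is assumed, and I have not run a binary built this way.

```python
import struct

PAGE_SIZE = 0x1000      # assumption: 4 KiB pages
LOAD_ADDR = 0x400000    # same value as DEFAULT_ENTRY_POINT in main.py
EHDR_SIZE = 64          # size of the ELF64 header
PHDR_SIZE = 56          # size of one ELF64 program header entry
PT_LOAD = 1
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def code_only_phdr(code_len, n_phdrs=1):
    """Pack a PT_LOAD entry describing only the code bytes (hypothetical helper)."""
    # The code follows the ELF header and the program header table in the file.
    p_offset = EHDR_SIZE + n_phdrs * PHDR_SIZE
    # Shift the virtual address by the same amount so that
    # p_vaddr % PAGE_SIZE == p_offset % PAGE_SIZE.
    p_vaddr = LOAD_ADDR + p_offset
    assert p_offset % PAGE_SIZE == p_vaddr % PAGE_SIZE
    # Field order: p_type, p_flags, p_offset, p_vaddr, p_paddr,
    #              p_filesz, p_memsz, p_align
    return struct.pack('<IIQQQQQQ', PT_LOAD, PF_R | PF_W | PF_X,
                       p_offset, p_vaddr, p_vaddr,
                       code_len, code_len, PAGE_SIZE)


phdr = code_only_phdr(47)   # 47 == len(code) for the hello-world bytes in main.py
fields = struct.unpack('<IIQQQQQQ', phdr)
print(len(phdr), hex(fields[2]), hex(fields[3]))   # 56 0x78 0x400078
```

With this layout the entry point would stay at 0x400078, which is now also the segment's `p_vaddr`, and `p_filesz` would describe exactly the code bytes.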