├── .gitignore ├── Makefile ├── README.md ├── add-eax-ebx.md ├── bibliography.md ├── global-structure.md ├── intel-manual-format.md ├── introduction.md ├── mov-al-1.asm ├── mov-al-1.md ├── mov-ax-1.asm ├── mov-ax-1.md ├── mov-eax-1.asm ├── mov-eax-1.md ├── mov-eax-address.asm ├── mov-eax-address.md ├── mov-eax-ebx.asm ├── mov-eax-ebx.md ├── mov-eax-memory.asm ├── mov-eax-memory.md ├── mov-ebx-1.asm ├── mov-ebx-1.md ├── nop.asm ├── nop.md ├── push-eax.asm └── push-eax.md /.gitignore: -------------------------------------------------------------------------------- 1 | *.bin 2 | *.hd 3 | *.o 4 | test.asm 5 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | .POSIX: 2 | 3 | BIN_EXT ?= .bin 4 | IN_EXT ?= .asm 5 | OBJ_EXT ?= .o 6 | OUT_EXT ?= .hd 7 | 8 | INS := $(wildcard *$(IN_EXT)) 9 | OUTS := $(patsubst %$(IN_EXT),%$(OUT_EXT),$(INS)) 10 | 11 | .PHONY: all clean run 12 | .PRECIOUS: %$(BIN_EXT) %$(OBJ_EXT) 13 | 14 | all: $(OUTS) 15 | 16 | %$(OUT_EXT): %$(BIN_EXT) 17 | od -An -tx1 '$<' | tail -c+2 > '$@' 18 | 19 | %$(BIN_EXT): %$(OBJ_EXT) 20 | objcopy -O binary --only-section=.text '$<' '$@' 21 | 22 | %$(OBJ_EXT): %$(IN_EXT) 23 | nasm -f elf32 -o '$@' '$<' 24 | @# For raw 16 bit. Would need to remove the objcopy step. 25 | @#nasm -f bin -o '$@' '$<' 26 | 27 | clean: 28 | rm -f *$(BIN_EXT) *$(OBJ_EXT) *$(OUT_EXT) 29 | 30 | run: all 31 | tail -n+1 *$(OUT_EXT) 32 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # x86 Instruction Encoding Tutorial 2 | 3 | 1. [Introduction](introduction.md) 4 | 1. [Global structure](global-structure.md) 5 | 1. [Intel manual format](intel-manual-format.md) 6 | 1. Examples 7 | 1. [nop](nop.md) 8 | 1. [push eax](push-eax.md) 9 | 1. [mov eax, 1](mov-eax-1.md) 10 | 1. 
[mov ebx, 1](mov-ebx-1.md) 11 | 1. [mov ax, 1](mov-ax-1.md) 12 | 1. [mov al, 1](mov-al-1.md) 13 | 1. [mov eax, ebx](mov-eax-ebx.md) 14 | 1. [mov eax, address](mov-eax-address.md) 15 | 1. [mov eax, [memory]](mov-eax-memory.md) 16 | 1. [Bibliography](bibliography.md) 17 | -------------------------------------------------------------------------------- /add-eax-ebx.md: -------------------------------------------------------------------------------- 1 | # add eax, ebx 2 | 3 | Output: 4 | 5 | 01 d8 6 | ^^ ^^ 7 | 8 | 1. Opcode 9 | 1. ModR/M 10 | 11 | Opcode bits: 12 | 13 | 0 0 0 0 0 0 0 1 14 | ^^^^^^^^^^^ ^ ^ 15 | 1 2 3 16 | 17 | 1. This is an add. 18 | 2. Direction bit. 0: add REG to R/M, as encoded in the ModR/M byte. 1: the other way around. 19 | 3. Size bit. 1: 32-bit operands. 0: 8-bit operands. 20 | 21 | ModR/M bits: 22 | 23 | 1 1 0 1 1 0 0 0 24 | ^^^ ^^^^^ ^^^^^ 25 | 1 2 3 26 | 27 | 1. MOD = 3: REG and R/M both name registers. 28 | 2. REG = 3: EBX 29 | 3. R/M = 0: EAX 30 | 31 | So from the opcode, we add REG (EBX) to R/M (EAX). 32 | 33 | Note that two encodings are possible on reg / reg operations: we could set the second-to-last opcode bit to 1 and swap the two registers. 34 | 35 | Both possible encodings are documented on the instruction table: 36 | 37 | 01 /r ADD r/m32, r32 38 | 03 /r ADD r32, r/m32 39 | 40 | `/r` says that a ModR/M byte follows the opcode, and that it holds both a register operand (REG) and an r/m operand (R/M). 
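The two encodings listed above can be reproduced with a small sketch. It is only an illustration: the helper names `modrm` and `add_reg_reg` are made up here, and only the register-to-register case (MOD = 3) with 32-bit operands is handled.

```python
# Register numbers as used in the REG and R/M fields (32-bit operand size).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm(mod, reg, rm):
    """Pack the three ModR/M fields into one byte: MOD (2 bits), REG (3), R/M (3)."""
    return (mod << 6) | (reg << 3) | rm


def add_reg_reg(dst, src, rm_is_destination=True):
    """Encode `add dst, src` for two 32-bit registers (hypothetical helper).

    rm_is_destination=True  -> 01 /r (ADD r/m32, r32): dst in R/M, src in REG
    rm_is_destination=False -> 03 /r (ADD r32, r/m32): dst in REG, src in R/M
    """
    if rm_is_destination:
        return bytes([0x01, modrm(3, REG32[src], REG32[dst])])
    return bytes([0x03, modrm(3, REG32[dst], REG32[src])])


print(add_reg_reg("eax", "ebx").hex(" "))         # 01 d8
print(add_reg_reg("eax", "ebx", False).hex(" "))  # 03 c3
```

Both byte sequences perform the same addition; NASM happens to emit the `01 /r` form for this input.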
41 | -------------------------------------------------------------------------------- /bibliography.md: -------------------------------------------------------------------------------- 1 | # Bibliography 2 | 3 | - Intel® 64 and IA-32 Architectures Software Developer’s Manual 4 | 5 | - section 2.1: binary serialization 6 | - section 3.1: documentation format 7 | 8 | - 9 | 10 | - 11 | 12 | - 13 | 14 | - 15 | -------------------------------------------------------------------------------- /global-structure.md: -------------------------------------------------------------------------------- 1 | # Global structure 2 | 3 | Legend: `X-Y: description`, where `X` is the minimum, and `Y` is the maximum number of bytes. 4 | 5 | - 0-4: instruction prefixes 6 | - 1-3: opcode 7 | - 0-1: ModR/M 8 | - 0-1: SIB 9 | - 0-4: displacement 10 | - 0-4: immediate 11 | 12 | The most useful bytes to learn first are the opcode and ModR/M. 13 | 14 | ## Opcode 15 | 16 | Says which instruction is being run. 17 | 18 | Sometimes the opcode can be decomposed further into fields that say which register holds the data. E.g. [push eax](push-eax.asm), documented in the manual as `+rd`. 19 | 20 | ## ModR/M 21 | 22 | Says which operands the instruction works on. Bit positions, most significant first: 23 | 24 | 7 6 5 4 3 2 1 0 25 | ^^^ ^^^^^ ^^^^^ 26 | 1 2 3 27 | 28 | 1. MOD 29 | 30 | Determines how the next fields are interpreted. 31 | 32 | - 00: Indirect addressing mode. 33 | - 01: Same as 00 but an 8-bit displacement is added to the address before dereferencing. 34 | - 10: Same as 01 but with a 32-bit displacement. 35 | - 11: REG and R/M each refer to a register. 36 | 37 | 2. 
REG 38 | 39 | - 000 (0): EAX (AX if data size is 16 bits, AL if data size is 8 bits) 40 | - 001 (1): ECX/CX/CL 41 | - 010 (2): EDX/DX/DL 42 | - 011 (3): EBX/BX/BL 43 | - 100 (4): ESP/SP (AH if data size is defined as 8 bits) 44 | - 101 (5): EBP/BP (CH if data size is defined as 8 bits) 45 | - 110 (6): ESI/SI (DH if data size is defined as 8 bits) 46 | - 111 (7): EDI/DI (BH if data size is defined as 8 bits) 47 | 48 | 3. R/M 49 | 50 | ## Prefixes 51 | 52 | ### 66 53 | 54 | Operand-size override. If given while in 16-bit mode, use 32-bit operands. 55 | 56 | If given while in 32-bit mode, use 16-bit operands. 57 | -------------------------------------------------------------------------------- /intel-manual-format.md: -------------------------------------------------------------------------------- 1 | # Intel manual format 2 | 3 | How the Intel manual documents the instruction encodings. Each instruction table has the following columns: 4 | 5 | - Opcode 6 | - Instruction 7 | - Op / En 8 | - 64-Bit Mode 9 | - Compat / Leg Mode 10 | - `CPUID` feature flag 11 | - Description 12 | 13 | They are explained in section 3.1. 14 | 15 | ### Instruction 16 | 17 | E.g.: 18 | 19 | XCHG EAX, r32 20 | 21 | Means: takes 2 arguments: 22 | 23 | - `EAX`: TODO 24 | - `r32`: a 32-bit register 25 | 26 | Other important values: 27 | 28 | - `r/m32`: either a 32-bit register or a 32-bit memory operand 29 | - `imm32`: a 32-bit immediate value encoded directly in the instruction 30 | 31 | ### Op/En 32 | 33 | ### Operand Encoding 34 | 35 | Refers to an entry on the "Instruction Operand Encoding" table. 36 | 37 | Every instruction has its own "Instruction Operand Encoding" table. 38 | 39 | TODO understand an operand encoding table, e.g. for `mov`. 40 | 41 | ### CPUID feature flag 42 | 43 | Which feature flag, as reported by CPUID, must be present for the CPU to support the instruction. 44 | 45 | ### Compat / Leg Mode 46 | 47 | - valid 48 | - invalid: can be encoded, but generates an exception 49 | - N.E.: not encodable 50 | 51 | ### 64-bit mode 52 | 53 | - V: Supported. 54 | - I: Not supported. 
55 | - N.E.: instruction syntax is not encodable in 64-bit mode (it may represent part of a sequence of valid instructions in other modes). 56 | - N.P.: REX prefix does not affect the legacy instruction in 64-bit mode. 57 | - N.I.: opcode is treated as a new instruction in 64-bit mode. 58 | - N.S.: requires an address override prefix in 64-bit mode and is not supported. Using an address override prefix in 64-bit mode may result in model-specific execution behavior. 59 | -------------------------------------------------------------------------------- /introduction.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | Assemble all `.asm` inputs and dump the resulting machine code as hex into `.hd` files: 4 | 5 | sudo apt-get install nasm 6 | make run 7 | 8 | Prerequisites: basics of how x86 assembly works, Intel syntax. 9 | 10 | More assembly info at: 11 | 12 | To learn, rotate quickly between: 13 | 14 | - the examples 15 | - the general instruction organization 16 | - the Intel manual 17 | 18 | until your brain starts to absorb them. 19 | -------------------------------------------------------------------------------- /mov-al-1.asm: -------------------------------------------------------------------------------- 1 | mov al, 1 2 | -------------------------------------------------------------------------------- /mov-al-1.md: -------------------------------------------------------------------------------- 1 | # mov al, 1 2 | 3 | Output: 4 | 5 | b0 01 6 | 7 | Intel manual says: 8 | 9 | B0 +rb ib 10 | 11 | This follows the same pattern as the previous examples: the last 3 bits of the opcode select the 8-bit register, and `ib` says that a one-byte immediate follows. 
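The output byte `b0` falls in the `B0`-`B7` opcode range, which the manual lists as `B0 +rb ib` (MOV r8, imm8). As a rough sketch of how such a two-byte instruction could be decoded by hand, the hypothetical helper `decode_mov_r8_imm8` below masks off the low 3 bits of the opcode and looks them up in the 8-bit register column of the REG table:

```python
# 8-bit register names indexed by the 3-bit register number.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def decode_mov_r8_imm8(code):
    """Decode a two-byte `B0+rb ib` instruction (illustrative helper only)."""
    opcode, imm = code[0], code[1]
    # The upper 5 bits identify the instruction, the lower 3 the register.
    if opcode & 0xF8 != 0xB0:
        raise ValueError("not a B0+rb opcode")
    return f"mov {REG8[opcode & 0x07]}, {imm}"


print(decode_mov_r8_imm8(bytes.fromhex("b001")))  # mov al, 1
```

Note that no `66` prefix and no size bit in ModR/M is involved: the operand size is implied by the opcode range itself.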
12 | -------------------------------------------------------------------------------- /mov-ax-1.asm: -------------------------------------------------------------------------------- 1 | mov ax, 1 2 | -------------------------------------------------------------------------------- /mov-ax-1.md: -------------------------------------------------------------------------------- 1 | # mov ax, 1 2 | 3 | Output: 4 | 5 | 66 b8 01 00 6 | ^^ ^^ ^^^^^ 7 | 1 2 3 8 | 9 | 1. 66 prefix: operand-size override. Instead of 32-bit operands, this instruction uses 16-bit operands. 10 | 2. Opcode: same as `mov eax, 1`. 11 | 3. Immediate: same as `mov eax, 1`, but only 16 bits wide. 12 | 13 | Intel manual says: `B8 +rw iw`, which is analogous to the `B8 +rd id` of `mov eax, 1`. 14 | -------------------------------------------------------------------------------- /mov-eax-1.asm: -------------------------------------------------------------------------------- 1 | mov eax, 1 2 | -------------------------------------------------------------------------------- /mov-eax-1.md: -------------------------------------------------------------------------------- 1 | # mov eax, 1 2 | 3 | Basic mov immediate instruction. 4 | 5 | Output: 6 | 7 | b8 01 00 00 00 8 | ^^ ^^^^^^^^^^^ 9 | 1 2 10 | 11 | 1. Opcode 12 | 2. Immediate value: `1` in little endian 13 | 14 | Opcode bits: 15 | 16 | 1 0 1 1 1 0 0 0 17 | ^^^^^^^^^ ^^^^^ 18 | 1 2 19 | 20 | 1. What to do: move an immediate into a register. 21 | 2. Where to move to. `000` is `eax`. 22 | 23 | Intel documentation says: 24 | 25 | - Opcode: `B8 + rd id`. 26 | 27 | `+rd` says that the last 3 bits of the opcode encode the destination register. 28 | 29 | `id` says that a doubleword (4-byte) immediate follows. 30 | 31 | - Op/En: `OI`. 
32 | 33 | The "Instruction Operand Encoding" table for `mov` and `OI` says: 34 | 35 | Operand 1: `opcode + rd (w)` 36 | Operand 2: `imm8/16/32/64` 37 | -------------------------------------------------------------------------------- /mov-eax-address.asm: -------------------------------------------------------------------------------- 1 | mov eax, address 2 | address: 3 | db 0xFF 4 | -------------------------------------------------------------------------------- /mov-eax-address.md: -------------------------------------------------------------------------------- 1 | # mov eax, address 2 | 3 | Output: 4 | 5 | b8 05 00 00 00 ff 6 | ^^^^^^^^^^^^^^ ^^ 7 | 8 | 1. Moves the immediate 5, the offset of `address` within `.text`, to `eax`. 9 | 2. The hardcoded `ff` byte we point to. 10 | 11 | If we do `objdump -Sr` on the object file, we see: 12 | 13 | 14 | 00000000 : 15 | 0: b8 05 00 00 00 mov $0x5,%eax 16 | 1: R_386_32 .text 17 | 18 | The relocation applied is `R_386_32`: at link time, the final address of `.text` is added to the 4 bytes at offset 1. 19 | -------------------------------------------------------------------------------- /mov-eax-ebx.asm: -------------------------------------------------------------------------------- 1 | mov eax, ebx 2 | -------------------------------------------------------------------------------- /mov-eax-ebx.md: -------------------------------------------------------------------------------- 1 | # mov eax, ebx 2 | 3 | Output: 4 | 5 | 89 d8 6 | ^^ ^^ 7 | 1 2 8 | 9 | 1. Opcode 10 | 1. ModR/M 11 | 12 | Opcode bits: 13 | 14 | 1 0 0 0 1 0 0 1 15 | ^^^^^^^^^^^ ^ ^ 16 | 1 2 3 17 | 18 | 1. This is a `mov`. 19 | 2. Direction bit. 0: move REG to R/M, as encoded in the ModR/M byte. 1: the other way around. 20 | 3. Size bit. 1: 32-bit operands. 0: 8-bit operands. 21 | 22 | ModR/M bits: 23 | 24 | 1 1 0 1 1 0 0 0 25 | ^^^ ^^^^^ ^^^^^ 26 | 1 2 3 27 | 28 | 1. MOD = 3: REG and R/M both name registers. 29 | 2. REG = 3: EBX 30 | 3. R/M = 0: EAX 31 | 32 | So from the opcode, we move REG (EBX) into R/M (EAX). 
33 | 34 | Note that two encodings are possible on reg / reg operations: we could set the second-to-last opcode bit to 1 and swap the two registers. 35 | 36 | Both possible encodings are documented on the instruction table: 37 | 38 | 89 /r MOV r/m32, r32 39 | 8B /r MOV r32, r/m32 40 | 41 | `/r` says that a ModR/M byte follows the opcode, and that it holds both a register operand (REG) and an r/m operand (R/M). 42 | -------------------------------------------------------------------------------- /mov-eax-memory.asm: -------------------------------------------------------------------------------- 1 | mov eax, [memory] 2 | memory: 3 | db 0xFF 4 | -------------------------------------------------------------------------------- /mov-eax-memory.md: -------------------------------------------------------------------------------- 1 | # mov eax, [memory] 2 | 3 | TODO 4 | -------------------------------------------------------------------------------- /mov-ebx-1.asm: -------------------------------------------------------------------------------- 1 | mov ebx, 1 2 | -------------------------------------------------------------------------------- /mov-ebx-1.md: -------------------------------------------------------------------------------- 1 | # mov ebx, 1 2 | 3 | Compare with [mov eax, 1](mov-eax-1.md) to see how `ebx` is encoded. 4 | 5 | Output: 6 | 7 | bb 01 00 00 00 8 | 9 | Bits of opcode: 10 | 11 | 1 0 1 1 1 0 1 1 12 | ^^^^^^^^^ ^^^^^ 13 | 1 2 14 | 15 | Field 2 contains `3` which corresponds to `ebx` as expected. 16 | -------------------------------------------------------------------------------- /nop.asm: -------------------------------------------------------------------------------- 1 | nop 2 | -------------------------------------------------------------------------------- /nop.md: -------------------------------------------------------------------------------- 1 | # nop 2 | 3 | `0x90` is the single-byte form. 4 | 5 | `nop` also has multi-byte forms that can be used for alignment. 
6 | -------------------------------------------------------------------------------- /push-eax.asm: -------------------------------------------------------------------------------- 1 | push eax 2 | -------------------------------------------------------------------------------- /push-eax.md: -------------------------------------------------------------------------------- 1 | # push eax 2 | 3 | Output: 4 | 5 | 50 6 | 7 | Which is a single opcode. 8 | 9 | The opcode can be further decomposed into the following bits: 10 | 11 | 0 1 0 1 0 0 0 0 12 | ^^^^^^^^^ ^^^^^ 13 | 1 2 14 | 15 | 1. This is a `push` instruction. 16 | 2. Which register to push. `000` is `eax`. 17 | 18 | This is documented as: opcode == `50+rd` in the Intel manual. The `+rd` part says that the last 3 bits select the register to push. 19 | --------------------------------------------------------------------------------
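The `+rd` forms used in this tutorial (`50+rd` for `push`, `B8+rd id` for `mov` with an immediate) can be sketched as a tiny encoder. The function names `push_r32` and `mov_r32_imm32` are invented for this illustration, and only 32-bit registers in 32-bit mode are assumed:

```python
import struct

# Register numbers that get added to the base opcode (32-bit operand size).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def push_r32(reg):
    """`50+rd`: the register number is added to the base opcode 0x50."""
    return bytes([0x50 + REG32[reg]])


def mov_r32_imm32(reg, value):
    """`B8+rd id`: base opcode plus register number, then a little-endian doubleword."""
    return bytes([0xB8 + REG32[reg]]) + struct.pack("<I", value)


print(push_r32("eax").hex(" "))          # 50
print(mov_r32_imm32("eax", 1).hex(" "))  # b8 01 00 00 00
print(mov_r32_imm32("ebx", 1).hex(" "))  # bb 01 00 00 00
```

The printed bytes match the `.hd` outputs shown for `push eax`, `mov eax, 1` and `mov ebx, 1`.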