├── LICENSE
├── README.md
└── x64.h


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2020 Martin Cohen
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # x64
  2 | 
  3 | x64 assembler library in C.
  4 | 
  5 | # About
  6 | 
  7 | The goal is to be able to comfortably generate x64 native code to memory and then run it. Useful for runtime code generation, compilers, code optimizations.
  8 | 
  9 | Made as part of learning how x64 encoding works. Library is set out to produce same code as `ML64` currently, since it's the assembler used for generating tests. The `GCC` one does prefer slightly different variants of some byte encodings. I'm trying to keep track of these so I will be able to generate tests from `GCC` too.
 10 | 
 11 | # x64 instructions
 12 | 
 13 | - `mov`
 14 | - `add`
 15 | - `sub`
 16 | - `and`
 17 | - `or`
 18 | - `xor`
 19 | - `push`
 20 | - `pop`
 21 | - `ret`
 22 | 
 23 | Features:
 24 | 
 25 | - All `reg/mem`, `mem/reg`, `mem/i` and `reg/i` supported.
 26 | - `REX` prefix generated only when needed.
 27 | - Absolute addressing mode supported.
 28 | - Relative addressing mode supported.
 29 | - `RBP` indexing supported.
 30 | - GCC-style `RSP` indexing supported (as long as scale is 1).
 31 | - Always choosing the shortest byte sequence as long as the result is the same. (This will need some more testing, though.)
 32 | 
 33 | Missing:
 34 | 
 35 | - Opcodes working just with `RAX` with `reg/*` not supported yet.
 36 | 
 37 | # Usage
 38 | 
 39 | **Still under development with some major changes pending.** Released out for people to use it early should they need it. Library is developed in my private repository and each major change is pushed here. I'll eventually move to this repo for further development.
 40 | 
 41 | ```c
 42 | // Binary
 43 | X64Inst x64_mov(X64Size size, X64Operand D, X64Operand S);
 44 | X64Inst x64_add(X64Size size, X64Operand D, X64Operand S);
 45 | X64Inst x64_sub(X64Size size, X64Operand D, X64Operand S);
 46 | X64Inst x64_and(X64Size size, X64Operand D, X64Operand S);
 47 | X64Inst x64_or (X64Size size, X64Operand D, X64Operand S);
 48 | X64Inst x64_xor(X64Size size, X64Operand D, X64Operand S);
 49 | 
 50 | // Unary
 51 | X64Inst x64_pop(X64Operand S);
 52 | X64Inst x64_push(X64Operand S);
 53 | 
 54 | // Nullary
 55 | X64Inst x64_ret();
 56 | ```
 57 | 
 58 | Functions return `X64Inst` structure, which is just a static buffer with `.bytes` and `.count`. The structure also contains `.error` string which is set in case there was an error processing. This behavior will eventually change to a proper buffered writer and custom error handler.
 59 | 
 60 | Where `X64Size size` denotes intent of size on the operation. In case it is not possible to satisfy the size and operands for given instruction, error is returned via `XInst.error`.
 61 | 
 62 | ```c
 63 | enum X64Size {
 64 |     X64_S8
 65 |     X64_S16
 66 |     X64_S32
 67 |     X64_S64
 68 | };
 69 | ```
 70 | 
 71 | `X64Operand` can be constructed either directly or via three helper functions:
 72 | 
 73 | - Register operand:
 74 |   - `X64Operand x64r(X64Reg reg)`
 75 | - Memory expression operand:
 76 |   - `X64Operand x64m(X64Reg base, X64Reg index, X64Scale scale, uint64_t displacement)`
 77 | - Immediate (constant) operand:
 78 |   - `X64Operand x64i(uint64_t imm)`
 79 | 
 80 | # TODO
 81 | 
 82 | - Obviously more instructions.
 83 |   - Priority on instructions that map to C-like language expressions (arithmetics, calls)
 84 |   - Floating point (via SSE+).
 85 |   - Vector operations.
 86 | - Some API changes regarding how `size` argument is used.
 87 | - Use proper buffer writer with user callback.
 88 | - Use user callback for error handling.
 89 | 
 90 | # Testing
 91 | 
 92 | There's a test generator for all of the instruction with all happy-path possibilities. Additionally I'm adding test for error cases. The tests are not yet part of the release, but will be pushed out soon.
 93 | 
 94 | Tests are working as follows:
 95 | 
 96 | 1. First we generate all permutations of all valid arguments.
 97 | 2. Then we generate and assembly file with corresponding `ML64` notation.
 98 | 3. The assembly file is passed through `ML64` assembler that produces `OBJ` file.
 99 | 4. `OBJ` file is disassembled through `DUMPBIN` and we extract bytes for each instruction.
100 | 5. We generate a C source file with each permutation mapped to it's expected result bytes from `DUMPBIN`.
101 | 6. We compare result from the library to result from `ML64`.
102 | 
103 | Tests coverage per instruction:
104 | 
105 | ```
106 |  mov 7510 tests
107 |  sub 7508 tests
108 |  add 7508 tests
109 |  and 7508 tests
110 |   or 7508 tests
111 |  xor 7508 tests
112 |  pop  442 tests
113 | push  445 tests
114 | ```
115 | 
116 | # Links
117 | 
118 | - [x64 encoding writeup](https://github.com/martincohen/Wiki/wiki/x64)
119 |   - Not comprehensive yet, but can help with additional instructions should one need them.
120 | - [Development streams](https://twitch.tv/martincohen)
121 |   - Occasional streams.
122 | - [Development streams archive on YouTube](https://www.youtube.com/playlist?list=PLPdqby1EYYdUJw27y0LpIffko8EhP6ICs)
123 |   - Kept in sync with archive on Twitch.
124 | 


--------------------------------------------------------------------------------
/x64.h:
--------------------------------------------------------------------------------
  1 | #pragma once
  2 | 
  3 | // ---
  4 | // Author:  Martin 'Halt' Cohen, @martin_cohen
  5 | // License: MIT (see LICENSE)
  6 | // ---
  7 | 
  8 | #include <stdbool.h>
  9 | #include <stdint.h>
 10 | 
 11 | // NOTE: To use these, you'll have to include appropriate headers yourself.
 12 | 
 13 | #ifndef X64_ERROR
 14 |     #define X64_ERROR(Message) { printf(Message); abort(); }
 15 | #endif
 16 | 
 17 | #ifndef X64_ASSERT
 18 |     #define X64_ASSERT assert
 19 | #endif
 20 | 
 21 | #ifndef X64_ASSERT_DEBUG
 22 |     #define X64_ASSERT_DEBUG assert
 23 | #endif
 24 | 
 25 | #define X64_TODO X64_ERROR("todo")
 26 | 
 27 | typedef enum X64Reg {
 28 |     X64_None = 0,
 29 | 
 30 |     X64_RAX, // 000
 31 |     X64_RCX, // 001
 32 |     X64_RDX, // 010
 33 |     X64_RBX, // 011
 34 |     X64_RSP, // 100
 35 |     X64_RBP, // 101
 36 |     X64_RSI, // 110
 37 |     X64_RDI, // 111
 38 | 
 39 |     X64_R8,  // 1.000
 40 |     X64_R9,  // 1.001
 41 |     X64_R10, // 1.010
 42 |     X64_R11, // 1.011
 43 |     X64_R12, // 1.100
 44 |     X64_R13, // 1.101
 45 |     X64_R14, // 1.110
 46 |     X64_R15, // 1.111
 47 | 
 48 |     X64Reg_LAST__,
 49 | 
 50 |     // Special-case register, used for RIP-based addressing mode with Mod R/M.
 51 |     X64_RIP,
 52 | } X64Reg;
 53 | 
 54 | #define x64reg_check(Reg) \
 55 |     X64_ASSERT_DEBUG((Reg) > X64_None && (Reg) < X64Reg_LAST__)
 56 | 
 57 | #define x64reg_is_int_ext(Reg) \
 58 |     ((Reg) >= X64_R8 && (Reg) <= X64_R15)
 59 | //
 60 | //
 61 | //
 62 | 
 63 | typedef enum X64Size {
 64 |     X64_SDefault,
 65 |     X64_S8,
 66 |     X64_S16,
 67 |     X64_S32,
 68 |     X64_S64,
 69 | } X64Size;
 70 | 
 71 | typedef enum X64Scale {
 72 |     X64_X1 = 0b00,
 73 |     X64_X2 = 0b01,
 74 |     X64_X4 = 0b10,
 75 |     X64_X8 = 0b11,
 76 | } X64Scale;
 77 | 
 78 | typedef enum X64OperandKind {
 79 |     X64O_Reg,
 80 |     X64O_Mem,
 81 |     X64O_Imm
 82 | } X64OperandKind;
 83 | 
 84 | typedef struct X64Operand {
 85 |     X64OperandKind kind;
 86 |     union {
 87 |         X64Reg reg;
 88 |         struct {
 89 |             X64Reg base;
 90 |             X64Reg index;
 91 |             X64Scale scale;
 92 |             // TODO: Is this supposed to be signed int?
 93 |             int32_t displacement;
 94 |         } mem;
 95 |         uint64_t imm;
 96 |     };
 97 | } X64Operand;
 98 | 
 99 | #define x64o_pair(A, B) ((A << 4) | B)
100 | 
101 | #define x64r(Reg) \
102 |     (X64Operand) { .kind = X64O_Reg, .reg = Reg }
103 | 
104 | #define x64m(Base, Index, Scale, Displacement) \
105 |     (X64Operand) { \
106 |         .kind = X64O_Mem, \
107 |         .mem.base = Base, \
108 |         .mem.index = Index, \
109 |         .mem.scale = Scale, \
110 |         .mem.displacement = Displacement \
111 |     }
112 | 
113 | #define x64i(Immediate) \
114 |     (X64Operand) { .kind = X64O_Imm, .imm = (Immediate) }
115 | 
116 | #define x64o_swap(A, B) { \
117 |         X64Operand t = A; \
118 |         A = B; \
119 |         B = t; \
120 |     }
121 | 
122 | //
123 | //
124 | //
125 | 
126 | typedef enum X64ModRMMode {
127 |     X64ModRM_Indirect       = 0b00,
128 |     X64ModRM_IndirectDisp8  = 0b01,
129 |     X64ModRM_IndirectDisp32 = 0b10,
130 |     X64ModRM_Direct         = 0b11,
131 | } X64ModRMMode;
132 | 
133 | //
134 | //
135 | //
136 | 
137 | // Whenever we have `rm` we encode other register in modrm.reg.
138 | // Otherwise (reg/imm) we encode register in opcode.
139 | 
140 | // Using int16_t to be able to denote -1 as "not available" in
141 | // case the opset doesn't support it. This is because `0` is
142 | // used with ADD r/m8,r8.
143 | typedef struct X64OpBinary
144 | {
145 |     // Register in modrm.reg
146 |     // Opcode must support 16, 32 and 64 operand sizes.
147 |     int16_t reg_rm;
148 |     // Register in modrm.reg
149 |     // Opcode must support 16, 32 and 64 operand sizes.
150 |     int16_t rm_reg;
151 | 
152 |     // Register in modrm.reg.
153 |     int16_t rm8_reg8;
154 |     // Register in modrm.reg.
155 |     // In case of (reg8, reg8) GCC seems to refer rm8_reg8,
156 |     // while ML64 preferes reg8_rm8.
157 |     // TODO: Check if there's any difference.
158 |     int16_t reg8_rm8;
159 | 
160 |     // rmX_immX:
161 |     //  - In case when used for writing to a register:
162 |     //  - modrm.mode = 11
163 |     //  - modrm.reg = 0
164 |     //  - modrm.rm = destination register
165 | 
166 |     // rmX_immX
167 |     // Opcode must support 16, 32 and 64 operand sizes.
168 |     int16_t rm_imm8;
169 |     // Extends rm_imm8 opcode with modrm.reg field.
170 |     // Opcode must support 16, 32 and 64 operand sizes.
171 |     int16_t rm_imm8_op;
172 | 
173 |     // rmX_immX
174 |     // Opcode must support 16, 32 and 64 operand sizes.
175 |     int16_t rm_imm32;
176 |     // Extends rm_imm32 opcode with modrm.reg field.
177 |     int16_t rm_imm32_op;
178 | 
179 |     // rmX_immX
180 |     // So far all opcodes have this note: In 64-bit no AH, BH, CH, DH.
181 |     int16_t rm8_imm8;
182 |     // Extends rm8_imm8 opcode with modrm.reg field.
183 |     int16_t rm8_imm8_op;
184 | 
185 |     // Regsiter in opcode.
186 |     int16_t reg8_imm8;
187 |     // Register in opcode.
188 |     // In 64-bit the imm32 is sign-extended to 64-bit.
189 |     // So far all opcodes support 16, 32 and sign-extended 64.
190 |     int16_t reg32_imm32;
191 |     // Register in opcode.
192 |     // So far reg32_imm32 with reg.
193 |     // So far only mov has this variant.
194 |     int16_t reg64_imm64;
195 | } X64OpBinary;
196 | 
197 | const X64OpBinary X64Op_Mov =
198 | {
199 |     .reg_rm         = 0x8B,
200 |     .rm_reg         = 0x89,
201 |     .rm8_reg8       = 0x88,
202 |     .reg8_rm8       = 0x8A,
203 |     .rm_imm8        = -1,
204 |     .rm_imm32       = 0xC7, .rm_imm32_op = 0,
205 |     .rm8_imm8       = 0xC6, .rm8_imm8_op = 0,
206 |     .reg8_imm8      = 0xB0,
207 |     .reg32_imm32    = 0xB8,
208 |     .reg64_imm64    = 0xB8,
209 | };
210 | 
211 | const X64OpBinary X64Op_Sub =
212 | {
213 |     .reg_rm         = 0x2B,
214 |     .rm_reg         = 0x29,
215 |     .rm8_reg8       = 0x28,
216 |     .reg8_rm8       = 0x2A,
217 |     .rm_imm8        = 0x83, .rm_imm8_op = 5,
218 |     .rm_imm32       = 0x81, .rm_imm32_op = 5,
219 |     .rm8_imm8       = 0x80, .rm8_imm8_op = 5,
220 |     .reg8_imm8      = -1, // Not available.
221 |     .reg32_imm32    = -1, // Not available.
222 |     .reg64_imm64    = -1, // Not available.
223 |     // TODO: reg8_imm8   only for AL
224 |     // TODO: reg32_imm32 only for AX, EAX, RAX.
225 | };
226 | 
227 | const X64OpBinary X64Op_Add =
228 | {
229 |     .reg_rm         = 0x03,
230 |     .rm_reg         = 0x01,
231 |     .rm8_reg8       = 0x00,
232 |     .reg8_rm8       = 0x02,
233 |     .rm_imm8        = 0x83, .rm_imm8_op  = 0,
234 |     .rm_imm32       = 0x81, .rm_imm32_op = 0,
235 |     .rm8_imm8       = 0x80, .rm8_imm8_op = 0,
236 |     .reg8_imm8      = -1,
237 |     .reg32_imm32    = -1,
238 |     .reg64_imm64    = -1,
239 |     // TODO: reg8_imm8   only for AL
240 |     // TODO: reg32_imm32 only for AX, EAX, RAX.
241 | };
242 | 
243 | const X64OpBinary X64Op_And =
244 | {
245 |     .reg_rm         = 0x23,
246 |     .rm_reg         = 0x21,
247 |     .rm8_reg8       = 0x20,
248 |     .reg8_rm8       = 0x22,
249 |     .rm_imm8        = 0x83, .rm_imm8_op  = 4,
250 |     .rm_imm32       = 0x81, .rm_imm32_op = 4,
251 |     .rm8_imm8       = 0x80, .rm8_imm8_op = 4,
252 |     .reg8_imm8      = -1,
253 |     .reg32_imm32    = -1,
254 |     .reg64_imm64    = -1,
255 | };
256 | 
257 | const X64OpBinary X64Op_Or =
258 | {
259 |     .reg_rm         = 0x0B,
260 |     .rm_reg         = 0x09,
261 |     .rm8_reg8       = 0x08,
262 |     .reg8_rm8       = 0x0A,
263 |     .rm_imm8        = 0x83, .rm_imm8_op  = 1,
264 |     .rm_imm32       = 0x81, .rm_imm32_op = 1,
265 |     .rm8_imm8       = 0x80, .rm8_imm8_op = 1,
266 |     .reg8_imm8      = -1,
267 |     .reg32_imm32    = -1,
268 |     .reg64_imm64    = -1,
269 | };
270 | 
271 | const X64OpBinary X64Op_Xor =
272 | {
273 |     .reg_rm         = 0x33,
274 |     .rm_reg         = 0x31,
275 |     .rm8_reg8       = 0x30,
276 |     .reg8_rm8       = 0x32,
277 |     .rm_imm8        = 0x83, .rm_imm8_op  = 6,
278 |     .rm_imm32       = 0x81, .rm_imm32_op = 6,
279 |     .rm8_imm8       = 0x80, .rm8_imm8_op = 6,
280 |     .reg8_imm8      = -1,
281 |     .reg32_imm32    = -1,
282 |     .reg64_imm64    = -1,
283 | };
284 | 
285 | typedef struct X64OpUnary {
286 |     uint8_t rm;
287 |     // Number to use on 'modrm.reg' when we're encoding memory expression.
288 |     // This is extension of the op code.
289 |     // SDM notes this after the opcode as /<register number>, for example
290 |     // 8F /0 -> pop,  opcode = 8F, modrm.reg set to 0 (RAX)
291 |     // FF /6 -> push, opcode = FF, modrm.reg set to 6 (RSI)
292 |     uint8_t rm_op;
293 |     uint8_t reg;
294 |     uint8_t imm8;
295 |     uint8_t imm32;
296 | 
297 | } X64OpUnary;
298 | 
299 | static inline x64opunary_has_imm(const X64OpUnary op) {
300 |     return op.imm8 || op.imm32;
301 | }
302 | 
303 | const X64OpUnary X64Op_Pop = {
304 |     .reg = 0x58,
305 |     // TODO: Cannot encode 32-bit operand size.
306 |     // NOTE: Notated as 8F /0 in SDM.
307 |     .rm = 0x8F, .rm_op = 0,
308 | };
309 | 
310 | const X64OpUnary X64Op_Push = {
311 |     .reg = 0x50,
312 |     // NOTE: Notated as FF /6 in SDM.
313 |     .rm = 0xFF, .rm_op = 6,
314 |     .imm8 = 0x6A,
315 |     .imm32 = 0x68,
316 | };
317 | 
318 | //
319 | //
320 | //
321 | 
322 | typedef struct X64Inst {
323 |     uint8_t bytes[
324 |         3 + // prefixes
325 |         3 + // opcode
326 |         1 + // mod r/m
327 |         1 + // sib
328 |         8 + // displacement (some rare instructions take 8B displacement)
329 |         8 + // immediate    (some rare instructions take 8B immediate)
330 |         0
331 |     ];
332 |     uint8_t count;
333 |     const char* error;
334 | } X64Inst;
335 | 
336 | //
337 | //
338 | //
339 | 
340 | static inline int8_t
341 | x64imm_get_size(uint64_t imm)
342 | {
343 |     if (imm <= 0xff) {
344 |         return 1;
345 |     } else if (imm <= 0xffffffffull) {
346 |         return 4;
347 |     } else {
348 |         return 8;
349 |     }
350 | }
351 | 
352 | // 'w' operand size is 64-bit
353 | // 'r' extension of modrm.reg
354 | // 'x' extension of sib.index
355 | // 'b' extension of modrm.rm, sib.base or opcode reg field
356 | static inline uint8_t
357 | x64rex(int8_t w, int8_t r, uint8_t x, uint8_t b) {
358 |     // bits: 0100 W R X B
359 |     uint8_t wrxb = (w & 1) << 3 | (r & 1) << 2 | (x & 1) << 1 | (b & 1) << 0;
360 |     if (wrxb) {
361 |         return 0b01000000 | wrxb;
362 |     }
363 |     return 0;
364 | }
365 | 
366 | // 'mode'
367 | //   - 00 - memory expression with no displacement
368 | //   - 01 - memory expression with 8-bit displacement
369 | //   - 10 - memory expression with 32-bit displacement
370 | //   - 11 - register
371 | // 'reg' is reg/opcode field
372 | //   - specifies either a register number or three more bits of opcode information.
373 | //     - for exampel PUSH is FF opcode, with value 6 in this field.
374 | // 'rm'
375 | //   - can specify a register as an operand or it can be combined with
376 | //     the mod field to encode an addressing mode. Sometimes, certain
377 | //     combinations of the mod field and the rm field are used to express
378 | //     opcode information for some instructions.
379 | static inline uint8_t
380 | x64modrm(X64ModRMMode mode, X64Reg reg, X64Reg rm) {
381 |     X64_ASSERT_DEBUG(mode >= 0 && mode < 4);
382 |     x64reg_check(reg);
383 |     x64reg_check(rm);
384 |     return
385 |         (mode << 6) |
386 |         (((reg - 1) & 7) << 3) |
387 |         ((rm - 1) & 7);
388 | }
389 | 
390 | static inline uint8_t
391 | x64sib(X64Scale scale, X64Reg index, X64Reg base) {
392 |     X64_ASSERT_DEBUG(scale >= 0 && scale < 4);
393 |     x64reg_check(index);
394 |     x64reg_check(base);
395 |     return
396 |         (scale << 6) |
397 |         (((index - 1) & 7) << 3) |
398 |         ((base - 1) & 7);
399 | }
400 | 
401 | static inline uint8_t
402 | x64op_reg(int16_t op, X64Reg reg) {
403 |     X64_ASSERT_DEBUG(op != -1);
404 |     x64reg_check(reg);
405 |     return op | ((reg - 1) & 7);
406 | }
407 | 
408 | //
409 | // Full instruction encoders
410 | //
411 | 
412 | static inline uint8_t*
413 | x64e_bytes_(uint8_t* it, uint8_t* bytes, int bytes_count) {
414 |     while (bytes_count) {
415 |         *it++ = *bytes++;
416 |         --bytes_count;
417 |     }
418 |     return it;
419 | }
420 | 
421 | // Mem/Reg (rm_reg)
422 | // Reg/Mem (reg_rm)
423 | // Mem/Imm (some of rm_imm, others are encoded with x64e_modrm_)
424 | // Instruction with memory expression operand.
425 | static inline uint8_t*
426 | x64e_modrm_sib_disp_(uint8_t* it, X64Size size, int opcode, X64Reg reg, X64Reg base, X64Reg index, X64Scale scale, uint64_t displacement, char** error)
427 | {
428 |     X64_ASSERT_DEBUG(opcode != -1);
429 | 
430 |     int modrm_mode = -1;
431 |     int modrm_reg = reg;
432 |     int modrm_rm = base;
433 | 
434 |     bool sib = false;
435 |     int sib_scale = 0;
436 |     int sib_index = 0;
437 |     int sib_base = 0;
438 | 
439 |     uint8_t rex = 0;
440 | 
441 |     int8_t displacement_size = 0;
442 |     if (displacement == 0) {
443 |         displacement_size = 0;
444 |         modrm_mode = X64ModRM_Indirect;
445 |     } else if (displacement < 0x100) {
446 |         displacement_size = 1;
447 |         modrm_mode = X64ModRM_IndirectDisp8;
448 |     } else {
449 |         displacement_size = 4;
450 |         modrm_mode = X64ModRM_IndirectDisp32;
451 |     }
452 | 
453 |     if (base == X64_RIP)
454 |     {
455 |         if (index != 0 || scale != 0) {
456 |             *error = "index and scale must be 0 in when base is RIP";
457 |             return NULL;
458 |         }
459 |         // Special case.
460 |         // Forcing mode to Indirect, and displacement size to 4.
461 |         modrm_mode = X64ModRM_Indirect;
462 |         modrm_rm = X64_RBP;
463 |         displacement_size = 4;
464 |     }
465 |     else if (index == 0)
466 |     {
467 |         if (scale != 0) {
468 |             X64_ERROR("scale must be set to X1");
469 |         }
470 | 
471 |         // Signal no index.
472 |         sib_index = X64_RSP;
473 | 
474 |         if (base == 0) {
475 |             // No base, no index, assuming absolute addressing.
476 |             sib = true;
477 |             modrm_rm = X64_RSP;
478 |             sib_base = X64_RBP;
479 |             modrm_mode = X64ModRM_Indirect;
480 |             displacement_size = 4;
481 |         } else if (base == X64_RBP || base == X64_R13) {
482 |             // RBS-base, no index.
483 |             if (modrm_mode == X64ModRM_Indirect) {
484 |                 // Special case.
485 |                 X64_ASSERT_DEBUG(displacement == 0);
486 |                 displacement_size = 1;
487 |                 modrm_mode = X64ModRM_IndirectDisp8;
488 |             }
489 |         } else if (base == X64_RSP || base == X64_R12) {
490 |             // RSP-base, no index.
491 |             // Because RSP has special meaning in modrm.rm,
492 |             // we need to force SIB here and do it through sib.base.
493 |             sib = true;
494 |             modrm_rm = X64_RSP;
495 |             sib_base = base;
496 |         } else {
497 |             // Base only, no index.
498 |             // Other-than RBP base, no SIB.
499 |             X64_ASSERT_DEBUG(sib == false);
500 |             X64_ASSERT_DEBUG(modrm_rm);
501 |         }
502 |     }
503 |     else if (index == X64_RSP)
504 |     {
505 |         // This is a special feature provided on assembler level.
506 |         // If we want to index by RSP, we can only do so by setting it as sib.base.
507 |         // That means that sib.scale has to be set to X1.
508 |         if (scale != X64_X1) {
509 |             *error = "cannot index by RSP with scale other than 1";
510 |             return NULL;
511 |         }
512 | 
513 |         // Same as branch index == 0 && base == X64_RSP.
514 |         sib = true;
515 |         modrm_rm = X64_RSP;
516 |         sib_base = X64_RSP;
517 |         if (base == 0) {
518 |             sib_index = X64_RSP;
519 |         } else {
520 |             sib_index = base;
521 |         }
522 |     }
523 |     else
524 |     {
525 |         sib = true;
526 |         modrm_rm = X64_RSP;
527 | 
528 |         sib_index = index;
529 |         sib_scale = scale;
530 |         sib_base = base;
531 |         if (base == 0) {
532 |             // Flag we're using no base.
533 |             sib_base = X64_RBP;
534 |             // We have to switch mode to 0b00 and force displacement_size to 4.
535 |             modrm_mode = X64ModRM_Indirect;
536 |             displacement_size = 4;
537 |         } else if (base == X64_RBP || base == X64_R13) {
538 |             // RBS-base, no index.
539 |             if (modrm_mode == X64ModRM_Indirect) {
540 |                 // Special case.
541 |                 X64_ASSERT_DEBUG(displacement == 0);
542 |                 displacement_size = 1;
543 |                 modrm_mode = X64ModRM_IndirectDisp8;
544 |             }
545 |         }
546 |     }
547 | 
548 |     rex = x64rex(
549 |         size == X64_S64,
550 |         x64reg_is_int_ext(reg),
551 |         sib && x64reg_is_int_ext(sib_index),
552 |         sib
553 |             ? x64reg_is_int_ext(sib_base)
554 |             : x64reg_is_int_ext(modrm_rm));
555 |     if (rex) *it++ = rex;
556 | 
557 |     *it++ = opcode;
558 |     *it++ = x64modrm(modrm_mode, modrm_reg, modrm_rm);
559 |     if (sib) *it++ = x64sib(sib_scale, sib_index, sib_base);
560 |     it = x64e_bytes_(it, (uint8_t*)&displacement, displacement_size);
561 | 
562 |     return it;
563 | }
564 | 
565 | // Reg/Imm (rmX_immX)
566 | // Reg and OpCode extension in ModRm
567 | static inline uint8_t*
568 | x64e_modrm_(uint8_t* it, X64Size size, int opcode, int opcode_ext, X64Reg reg)
569 | {
570 |     X64_ASSERT_DEBUG(opcode != -1);
571 |     X64_ASSERT_DEBUG(reg != 0);
572 | 
573 |     // This encodes:
574 |     // 1. OpCode
575 |     // 2. ModRm with Mode=11, Reg=opcode_ext, RM=reg
576 |     // 3. No SIB it seems.
577 |     // TODO: Test variants of this with ModRm/SIB registers.
578 | 
579 |     uint8_t rex = x64rex(size == X64_S64, 0, 0, x64reg_is_int_ext(reg));
580 |     if (rex) *it++ = rex;
581 |     *it++ = opcode;
582 |     *it++ = x64modrm(X64ModRM_Direct, opcode_ext + 1, reg);
583 |     return it;
584 | }
585 | 
586 | // Reg/Imm (regX_immX)
587 | // Reg in OpCode (hence rex extends it via `.b`)
588 | static inline uint8_t*
589 | x64e_op_reg_(uint8_t* it, X64Size size, int opcode, X64Reg reg)
590 | {
591 |     X64_ASSERT_DEBUG(opcode != -1);
592 | 
593 |     uint8_t rex = x64rex(size == X64_S64, 0, 0, x64reg_is_int_ext(reg));
594 |     if (rex) *it++ = rex;
595 |     // Encode reg in opcode (bottom 3 bits).
596 |     *it++ = x64op_reg(opcode, reg);
597 | 
598 |     return it;
599 | }
600 | 
601 | //
602 | //
603 | //
604 | 
605 | X64Inst
606 | x64_emit_error(const char* error)
607 | {
608 |     X64_ASSERT_DEBUG(error);
609 |     return (X64Inst){ .error = error };
610 | }
611 | 
612 | //
613 | //
614 | //
615 | 
616 | 
617 | uint8_t*
618 | x64_emit_binary_reg_reg_(uint8_t* it, X64Size size, const X64OpBinary op, X64Operand D, X64Operand S, char** error)
619 | {
620 |     // reg_rm
621 |     // rm8_reg8 -- in case size == X64_S8
622 | 
623 |     if (D.reg == 0) {
624 |         *error = "destination register cannot be none";
625 |         return 0;
626 |     }
627 | 
628 |     int16_t opcode = op.reg_rm;
629 |     uint8_t rex = 0;
630 |     if (size == X64_S8) {
631 |         // ML64 seems to prefer rm8_reg8, while
632 |         // GCC prefers reg8_rm8.
633 | #if 0
634 |         if (op.rm8_reg8) {
635 |             opcode = op.rm8_reg8;
636 |             x64o_swap(D, S);
637 |         }
638 | #else
639 |         if (op.reg8_rm8) {
640 |             opcode = op.reg8_rm8;
641 |         }
642 | #endif
643 |     }
644 |     X64_ASSERT_DEBUG(opcode != -1);
645 | 
646 |     rex = x64rex(size == X64_S64, x64reg_is_int_ext(D.reg), 0, x64reg_is_int_ext(S.reg));
647 |     if (rex) *it++ = rex;
648 |     *it++ = opcode;
649 |     *it++ = x64modrm(X64ModRM_Direct, D.reg, S.reg);
650 |     return it;
651 | }
652 | 
653 | uint8_t*
654 | x64_emit_binary_reg_imm_(uint8_t* it, X64Size size, const X64OpBinary op, X64Operand D, X64Operand S, char **error)
655 | {
656 |     // This encodes one of two op codes:
657 |     // - reg8_imm8 (or rm8_imm8)
658 |     // - reg32_imm32 (or rm_imm32)
659 |     // In case of rmX_immX variant, we encode:
660 |     // - modrm.mode == 0b11
661 |     // - modrm.reg = op.reg8_imm8_op
662 |     // - modrm.rm = D.reg
663 |     // In case of regX_immX:
664 |     // - we encode D.reg into opcode directly
665 | 
666 |     // In case this is specified and required, we'll use:
667 |     // reg64_imm64
668 |     // If not defined, but required, we'll raise and error.
669 | 
670 |     X64_ASSERT_DEBUG(D.kind == X64O_Reg);
671 |     X64_ASSERT_DEBUG(S.kind == X64O_Imm);
672 |     if (D.reg == 0) {
673 |         *error = "destination register cannot be none";
674 |         return 0;
675 |     }
676 | 
677 |     int imm_size = x64imm_get_size(S.imm);
678 | 
679 |     // (uint8_t)a -= 400
680 | 
681 |     switch (size)
682 |     {
683 |         case X64_S8:
684 |             if (imm_size > 1) {
685 |                 *error = "immediate value truncated to 8 bits because of size argument";
686 |                 return NULL;
687 |             }
688 |             imm_size = 1;
689 |             if (op.reg8_imm8 == -1) {
690 |                 it = x64e_modrm_(it, X64_S32, op.rm8_imm8, op.rm8_imm8_op, D.reg);
691 |             } else {
692 |                 it = x64e_op_reg_(it, X64_S8, op.reg8_imm8, D.reg);
693 |             }
694 |             break;
695 | 
696 |         case X64_S64:
697 |             if (imm_size > 4) {
698 |                 // 64-bit
699 |                 if (op.reg64_imm64 == -1) {
700 |                     *error = "64-bit immediate value not supported with this instruction";
701 |                     return NULL;
702 |                 }
703 |                 it = x64e_op_reg_(it, X64_S64, op.reg64_imm64, D.reg);
704 |                 break;
705 |             }
706 |             // Fallthrough.
707 |         case X64_S32:
708 |             // WARNING: X64_S64 falls through here as well.
709 |             if (imm_size == 1 && op.rm_imm8 != -1) {
710 |                 // We're using `size` here because we want REX in case we fallthrough
711 |                 // from size == X64_S64.
712 |                 it = x64e_modrm_(it, size, op.rm_imm8, op.rm_imm8_op, D.reg);
713 |             } else {
714 |                 if (imm_size > 4) {
715 |                     *error = "immediate value truncated to 32 bits because of size argument";
716 |                     return NULL;
717 |                 }
718 |                 imm_size = 4;
719 |                 if (size == X64_S32 && op.reg32_imm32 != -1) {
720 |                     // We're not taking this branch in case the size 64 fallsthrough here.
721 |                     // Otherwise we'd have to encode imm64, even if it's way smaller.
722 |                     it = x64e_op_reg_(it, X64_S32, op.reg32_imm32, D.reg);
723 |                 } else {
724 |                     // We're using `size` here because we want REX in case we fallthrough
725 |                     // from size == X64_S64.
726 |                     it = x64e_modrm_(it, size, op.rm_imm32, op.rm_imm32_op, D.reg);
727 |                 }
728 |             }
729 |             break;
730 | 
731 |     }
732 | 
733 |     return x64e_bytes_(it, (uint8_t*)&S.imm, imm_size);
734 | }
735 | 
736 | uint8_t*
737 | x64_emit_binary_mem_imm_(uint8_t* it, X64Size size, const X64OpBinary op, X64Operand D, X64Operand S, char** error)
738 | {
739 |     X64_ASSERT(D.kind == X64O_Mem && S.kind == X64O_Imm);
740 | 
741 |     int8_t imm_size = x64imm_get_size(S.imm);
742 | 
743 |     int opcode = -1;
744 |     int opcode_ext = 0;
745 | 
746 |     switch (size)
747 |     {
748 |         case X64_S8:
749 |             if (imm_size > 1) {
750 |                 *error = "immediate value truncated to 8 bits because of size argument";
751 |                 return NULL;
752 |             }
753 |             imm_size = 1;
754 |             opcode = op.rm8_imm8;
755 |             opcode_ext = op.rm8_imm8_op;
756 |             break;
757 | 
758 |         case X64_S64:
759 |             if (imm_size > 4) {
760 |                 *error = "operation mem64, imm64 is not supported";
761 |                 return NULL;
762 |             }
763 |             // Fallthrough as we'll use rm_imm8 or rm_imm32 with rex (produced by x64e used).
764 |         case X64_S32:
765 |             // TODO: Check whether SDefault works here as it's supposed to.
766 |             if (imm_size == 1 && op.rm_imm8 != -1) {
767 |                 opcode = op.rm_imm8;
768 |                 opcode_ext = op.rm_imm8_op;
769 |             } else {
770 |                 if (imm_size > 4) {
771 |                     *error = "immediate value truncated to 32 bits because of size argument";
772 |                     return NULL;
773 |                 }
774 |                 opcode = op.rm_imm32;
775 |                 opcode_ext = op.rm_imm32_op;
776 |                 imm_size = 4;
777 |             }
778 |             break;
779 |     }
780 | 
781 |     it = x64e_modrm_sib_disp_(it,
782 |         size,
783 |         opcode,
784 |         opcode_ext + 1,
785 |         D.mem.base,
786 |         D.mem.index,
787 |         D.mem.scale,
788 |         D.mem.displacement,
789 |         error);
790 |     return x64e_bytes_(it, (uint8_t*)&S.imm, imm_size);
791 | }
792 | 
793 | uint8_t*
794 | x64_emit_binary_reg_mem_(uint8_t* it, X64Size size, const X64OpBinary op, X64Operand D, X64Operand S, char** error)
795 | {
796 |     int opcode = 0;
797 |     switch (x64o_pair(D.kind, S.kind))
798 |     {
799 |         case x64o_pair(X64O_Reg, X64O_Mem):
800 |             // All good.
801 |             opcode = size == X64_S8
802 |                 ? op.reg8_rm8
803 |                 : op.reg_rm;
804 |             break;
805 |         case x64o_pair(X64O_Mem, X64O_Reg):
806 |             x64o_swap(D, S);
807 |             opcode = size == X64_S8
808 |                 ? op.rm8_reg8
809 |                 : op.rm_reg;
810 |             break;
811 |         default:
812 |             X64_ERROR("invalid operands");
813 |             return it;
814 |     }
815 | 
816 |     if (opcode == -1) {
817 |         X64_ERROR("opcode is not defined");
818 |         return it;
819 |     }
820 | 
821 |     return x64e_modrm_sib_disp_(it, size, opcode, D.reg, S.mem.base, S.mem.index, S.mem.scale, S.mem.displacement, error);
822 | }
823 | 
824 | X64Inst
825 | x64_emit_binary(X64Size size, const X64OpBinary op, X64Operand D, X64Operand S)
826 | {
827 |     X64Inst inst = {0};
828 |     uint8_t* it = inst.bytes;
829 | 
830 |     char* error = "uknown error";
831 |     switch (x64o_pair(D.kind, S.kind))
832 |     {
833 |         case x64o_pair(X64O_Reg, X64O_Reg):
834 |             it = x64_emit_binary_reg_reg_(it, size, op, D, S, &error);
835 |             break;
836 | 
837 |         case x64o_pair(X64O_Reg, X64O_Imm):
838 |             it = x64_emit_binary_reg_imm_(it, size, op, D, S, &error);
839 |             break;
840 | 
841 |         case x64o_pair(X64O_Mem, X64O_Reg):
842 |         case x64o_pair(X64O_Reg, X64O_Mem):
843 |             it = x64_emit_binary_reg_mem_(it, size, op, D, S, &error);
844 |             break;
845 | 
846 |         case x64o_pair(X64O_Mem, X64O_Imm):
847 |             it = x64_emit_binary_mem_imm_(it, size, op, D, S, &error);
848 |             break;
849 | 
850 | 
851 |         default:
852 |             X64_ERROR("unexpected arguments");
853 |     }
854 | 
855 |     if (it == 0) {
856 |         return x64_emit_error(error);
857 |     }
858 | 
859 |     inst.count = it - inst.bytes;
860 |     return inst;
861 | }
862 | 
863 | X64Inst x64_emit_unary(X64OpUnary op, X64Operand D)
864 | {
865 |     X64Inst inst = {0};
866 |     uint8_t* it = inst.bytes;
867 | 
868 |     char* error = "unknown error";
869 |     uint8_t rex = 0;
870 |     switch (D.kind)
871 |     {
872 |         case X64O_Reg:
873 |             if (D.reg == 0) {
874 |                 return x64_emit_error("invalid register");
875 |             }
876 |             rex = x64rex(0, 0, 0, x64reg_is_int_ext(D.reg));
877 |             if (rex) *it++ = rex;
878 |             *it++ = op.reg | (D.reg - 1) & 7;
879 |             break;
880 | 
881 |         case X64O_Imm: {
882 |             if (!x64opunary_has_imm(op)) {
883 |                 return x64_emit_error("immediate values are not supported with this operation");
884 |             }
885 |             if (op.imm8 && D.imm <= 0xFF) {
886 |                 *it++ = op.imm8;
887 |                 it = x64e_bytes_(it, (uint8_t*)&D.imm, 1);
888 |             } else if (op.imm32 && D.imm <= 0xFFFFffff) {
889 |                 *it++ = op.imm32;
890 |                 it = x64e_bytes_(it, (uint8_t*)&D.imm, 4);
891 |             } else {
892 |                 return x64_emit_error("32-bit immediate is maximum");
893 |             }
894 |             break;
895 |         }
896 | 
897 |         case X64O_Mem: {
898 |             it = x64e_modrm_sib_disp_(it, X64_SDefault, op.rm, op.rm_op + 1, D.mem.base, D.mem.index, D.mem.scale, D.mem.displacement, &error);
899 |             if (it == 0) {
900 |                 return x64_emit_error(error);
901 |             }
902 |             break;
903 |         }
904 | 
905 |         default:
906 |             return x64_emit_error("invalid operand");
907 |     }
908 | 
909 |     inst.count = it - inst.bytes;
910 |     return inst;
911 | }
912 | 
913 | //
914 | //
915 | //
916 | 
917 | static inline X64Inst x64_mov(X64Size size, X64Operand D, X64Operand S) { return x64_emit_binary(size, X64Op_Mov, D, S); }
918 | static inline X64Inst x64_sub(X64Size size, X64Operand D, X64Operand S) { return x64_emit_binary(size, X64Op_Sub, D, S); }
919 | static inline X64Inst x64_add(X64Size size, X64Operand D, X64Operand S) { return x64_emit_binary(size, X64Op_Add, D, S); }
920 | static inline X64Inst x64_and(X64Size size, X64Operand D, X64Operand S) { return x64_emit_binary(size, X64Op_And, D, S); }
921 | static inline X64Inst  x64_or(X64Size size, X64Operand D, X64Operand S) { return x64_emit_binary(size, X64Op_Or,  D, S); }
922 | static inline X64Inst x64_xor(X64Size size, X64Operand D, X64Operand S) { return x64_emit_binary(size, X64Op_Xor, D, S); }
923 | 
924 | static inline X64Inst x64_pop(X64Operand D)  { return x64_emit_unary(X64Op_Pop, D); }
925 | static inline X64Inst x64_push(X64Operand S) { return x64_emit_unary(X64Op_Push, S); }
926 | 
927 | // TODO: What about '0xCB RET Far return to calling procedure.'?
928 | static inline X64Inst x64_ret() { return (X64Inst){ .bytes = { 0xC3 }, .count = 1 }; }
929 | 
930 | // TODO: `lea`
931 | // TODO: `not`
932 | // TODO: `shl`
933 | // TODO: `shr`
934 | // TODO: `sal`
935 | // TODO: `sar`
936 | // TODO: `int3`
937 | // TODO: `call reg`
938 | // TODO: `call imm64`
939 | // TODO: `imul` (result in rdx and rax)
940 | //  - https://youtu.be/ieuUHIWaIqM?list=PL0C5C980A28FEE68D&t=340
941 | //  - there's single operand, double operand and tripple operand versions
942 | // TODO: `idiv`


--------------------------------------------------------------------------------