├── .gitignore ├── README.md ├── call.s ├── defs.i ├── dropper.s ├── ebpf_asm.py ├── net_hdrs.i ├── paren.py ├── regression.py ├── release.sh └── test.s /.gitignore: -------------------------------------------------------------------------------- 1 | *.o 2 | a.out 3 | *.pyc 4 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ebpf_asm 2 | 3 | An assembler for eBPF programs written in an Intel-like assembly syntax. 4 | 5 | ## Synopsis 6 | 7 | `ebpf_asm.py [...] -o ` 8 | 9 | ## Rationale 10 | 11 | It's great that you can write eBPF programs in C and then compile them with 12 | clang/LLVM. But clang's really rather big, and sometimes you don't have room 13 | for a gigantic toolchain — and your program is really small and simple. For 14 | such a case, writing the program directly in assembly is a feasible alternative. 15 | 16 | I chose the Intel syntax because my ability to read assembly code is directly 17 | proportional to how much it resembles Z80. If you dislike that as much as I 18 | dislike AT&T-syntax x86 assembly, this may not be the tool for you. 19 | 20 | ## License 21 | 22 | `ebpf_asm` itself and the supplied header files (`defs.i` and `net_hdrs.i`) are 23 | provided under the MIT license (see comment block at the top of `ebpf_asm.py`). 24 | The included example programs (`test.s`, `dropper.s` and `call.s`) are dual 25 | MIT/GPL. 26 | 27 | ## Syntax 28 | 29 | Comments are introduced with a semicolon `;` and continue to end of line. 30 | 31 | A backslash (`\`) at end of line indicates line continuation. This remains the 32 | case even within comments, for example: 33 | `; This is all \` 34 | ` one comment` 35 | 36 | ### Directives 37 | 38 | #### .text 39 | 40 | Following sections will consist of program text (i.e. executable instructions). 
41 | 42 | #### .data 43 | 44 | Following sections will consist of program data (currently just asciz strings). 45 | 46 | #### .section 47 | 48 | `.section maps` starts (or continues) the maps section, containing 49 | [map definitions](#map-definitions). 50 | 51 | `.section .BTF` starts (or continues) the BTF section, containing 52 | [type definitions](#type-definitions). 53 | 54 | Otherwise, `.section name` starts (or continues) a section with the given name. 55 | This section will contain either text or [data](#data-definitions) (depending on 56 | the last `.text` or `.data` directive) and will run until the next `.section`, 57 | `.text` or `.data` directive. 58 | 59 | #### .include 60 | 61 | `.include name` includes the specified file (relative to the cwd) textually. 62 | 63 | #### .equ 64 | 65 | `.equ name, immediate` defines the given name to equal the `immediate` (which 66 | could be a literal, or the name of another equate). The `immediate` does not 67 | accept a size suffix. 68 | 69 | `name` is any string which does not start with a digit and does not contain a 70 | comma (`,`). It _may_ contain internal whitespace. 71 | 72 | Register names are legal as equate names, but where an operand could be either 73 | it will be treated as a register name. However, operands which are required to 74 | be immediates, not registers, will treat it as an equate. This is potentially 75 | very confusing, so don't do this! 76 | 77 | An equate can be defined with a `name` that ends in a size suffix, but accessing 78 | the equate in a context where a size suffix would be allowed will require using 79 | **two** size suffixes. This is also confusing, so don't do this either! 80 | 81 | An equate can be redefined; the new value takes effect from the following line. 82 | This could also be confusing, so maybe you shouldn't do it. 
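
As a rough illustration of how a literal or equate name becomes a number, the sketch below mirrors the order of checks in `parse_immediate` from `ebpf_asm.py` (leading-zero octal, then decimal, then hex, then the equate table). The function name `resolve_immediate` and the plain `dict` used as the equate table are assumptions made for this example; they are not part of the assembler's interface.

```python
import re

# Literal patterns, tried in the same order as parse_immediate in ebpf_asm.py.
_OCTAL_RE = re.compile(r'0\d+$')
_DECIMAL_RE = re.compile(r'\d+$')
_HEX_RE = re.compile(r'0x[0-9a-fA-F]+$')


def resolve_immediate(text, equates):
    """Return the integer value of a literal or equate name (hypothetical helper)."""
    negative = text.startswith('-')
    if negative:
        text = text[1:]
    if _OCTAL_RE.match(text):
        value = int(text, 8)        # a leading zero means octal
    elif _DECIMAL_RE.match(text):
        value = int(text)
    elif _HEX_RE.match(text):
        value = int(text, 16)
    elif text in equates:
        value = equates[text]       # equate names may contain internal whitespace
    else:
        raise ValueError("Bad immediate: %r" % (text,))
    return -value if negative else value


# Roughly what `.equ XDP_DROP, 1` followed by `.equ verdict, XDP_DROP` amounts to.
equates = {}
equates['XDP_DROP'] = resolve_immediate('1', equates)
equates['verdict'] = resolve_immediate('XDP_DROP', equates)
```

Because the octal pattern is tried first, a literal such as `010` evaluates to 8 rather than 10, which is worth keeping in mind when writing padded constants.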
83 | 84 | #### .globl 85 | 86 | `.globl name` specifies that the label `name:` should create a global, rather 87 | than local, symbol; it is only needed for labels in text, not data, sections. 88 | 89 | The scope of the directive is the containing section; a `.globl` in one section 90 | does not affect similar labels in other sections. 91 | 92 | A warning will be written to stderr for any `.globl` whose referenced label does 93 | not exist, or for any `.globl` appearing in a non-text section. 94 | 95 | ### Labels 96 | 97 | A label consists of a sequence of alphanumeric characters followed by a colon 98 | `:`, which is omitted when referring to the label. The label points at the 99 | following instruction (in .text) or datum (in .data). A label may not begin 100 | with a digit, since that could cause confusion if references to the label look 101 | like numeric literals. (Strictly speaking we could allow this, because jumps 102 | always prefix their literals with `+` or `-`, but we forbid it so that when you 103 | forget the `+` you get a meaningful error.) 104 | 105 | Note that code cannot appear on the same line as the label! This is something 106 | we probably ought to support, but currently don't. 107 | 108 | Labels appear as symbols in the output binary; by default labels in .text are 109 | local whereas those in .data or maps sections are global. Global labels in 110 | .text sections can be created with the [.globl directive](#globl). 111 | 112 | ### Instructions 113 | 114 | Text sections consist of instructions generally in the form `op dst, src`, 115 | though a few instructions take more (or fewer) operands. 116 | 117 | Operands typically may be either register names (`r0` to `r10`, or `fp` as a 118 | synonym for `r10`) or literals (decimal, 0octal, 0xhex, or an equate name). 119 | Literals normally must fit in a 32-bit signed integer, except for 120 | [`ld reg.q, imm`](#register-to-register). 
Some instructions can also take 121 | memory references `[reg+disp]` for some operands. 122 | 123 | Operands in many cases can also include a _size suffix_, a dot `.` followed by a 124 | letter: 125 | 126 | * `.b` byte 127 | * `.w` word (16 bits) 128 | * `.l` long (32 bits) 129 | * `.q` quad (64 bits) 130 | 131 | #### ld 132 | 133 | The load instruction `ld dst, src` is used for register-to-register, register- 134 | to-memory and memory-to-register moves. 135 | 136 | If both operands have size suffixes, they must match; if neither has, then quad 137 | (`.q`) is assumed. 138 | 139 | ##### Register-to-register 140 | 141 | `ld dst_reg, src_reg` 142 | 143 | `ld dst_reg, src_imm` 144 | 145 | Size must be quad (`.q`) or long (`.l`). For size quad, `src_imm` may be a map 146 | name (defined in the maps section); otherwise, it is an _unsigned_ 64-bit 147 | integer. 148 | 149 | ##### Register-to-memory 150 | 151 | `ld [ptr_reg+disp], src_reg` 152 | 153 | `ld [ptr_reg+disp], src_imm` 154 | 155 | The displacement `disp` may be omitted (as `ld [ptr_reg], src`) or negative (as 156 | `ld [ptr_reg-disp], src`). It is a signed 16-bit quantity (i.e. word) and does 157 | not accept a size suffix. 158 | 159 | A size suffix goes outside the brackets (as `ld [ptr_reg].sz, src`), not inside 160 | (since the pointer must always be full-sized). 161 | 162 | Regardless of size suffix, `src_imm` must fit in a signed 32-bit integer. 163 | 164 | ##### Memory-to-register 165 | 166 | `ld dst_reg, [ptr_reg+disp]` 167 | 168 | The same notes apply to the `[ptr_reg+disp]` as for Register-to-memory, above. 169 | 170 | #### ldpkt 171 | 172 | The packet-load instruction `ldpkt r0, src` is used for reading packet data into 173 | registers, in a complicated way for historical reasons. It represents the 174 | BPF\_ABS and BPF\_IND modes of the BPF\_LD opcode, which can only be used in 175 | socket filter, sched\_cls and sched\_act programs. 
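
The opcode byte for these two modes can be sketched from the class, size and mode constants listed in the docstring of `ebpf_asm.py`. This is only an illustration of how the bits combine; the helper name `ldpkt_opcode` and its arguments are hypothetical and do not correspond to a function in the assembler.

```python
# Constants as listed in the ebpf_asm.py docstring.
BPF_LD = 0x00    # instruction class
BPF_ABS = 0x20   # mode: absolute packet offset, `ldpkt r0, [disp]`
BPF_IND = 0x40   # mode: register-indexed offset, `ldpkt r0, [off_reg+disp]`

# Size suffix letter -> size bits (BPF_W, BPF_H, BPF_B). There is no quad form.
SIZE_BITS = {'l': 0x00, 'w': 0x08, 'b': 0x10}


def ldpkt_opcode(size='l', indirect=False):
    """Combine class, size and mode into one opcode byte (hypothetical helper)."""
    if size not in SIZE_BITS:
        raise ValueError("ldpkt has no '.%s' form" % size)
    mode = BPF_IND if indirect else BPF_ABS
    return BPF_LD | SIZE_BITS[size] | mode
```

With these constants, a long-sized absolute load encodes as `0x20` and a word-sized indexed load as `0x48`.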
176 | 177 | `ldpkt r0, [disp]` 178 | 179 | `ldpkt r0, [off_reg+disp]` 180 | 181 | If both operands have size suffixes, they must match; if neither has, then, 182 | **unlike most other instructions**, long (`.l`) is assumed. This is because 183 | these instructions, being holdovers from classic BPF, _do not have_ quad-sized 184 | forms (which would be rejected by the verifier). The displacement `disp` may be 185 | omitted from the latter form, and in either case does not accept a size suffix. 186 | 187 | There are other restrictions on its use: the destination register must be `r0`, 188 | `r6` must contain a pointer to the sk\_buff, and registers `r1`-`r5` are 189 | clobbered. The value read will be converted to host-endianness. 190 | 191 | Unless you know you want this, you probably want an ordinary 192 | [memory-to-register load](#memory-to-register) using a packet-pointer, instead. 193 | 194 | See the kernel's [BPF documentation][1] for further enlightenment. 195 | 196 | [1]: https://www.kernel.org/doc/Documentation/networking/filter.txt 197 | 198 | #### xadd 199 | 200 | `xadd [ptr_reg+disp], src_reg` 201 | 202 | Atomic memory add (BPF\_STX | BPF\_XADD). The same notes apply to the 203 | `[ptr_reg+disp]` as for `ld` instructions, above. 204 | 205 | #### jr 206 | 207 | The relative jump instruction, `jr offset` or `jr cc, dst, src, offset`, is used 208 | to jump elsewhere in the program. `offset` may be either a signed literal (the 209 | `+` must be included for positive values) or a label name; it does not accept a 210 | size suffix. 211 | 212 | ##### Unconditional jump 213 | 214 | `jr offset` 215 | 216 | ##### Conditional jump 217 | 218 | `jr cc, dst, src, offset` 219 | 220 | Jump if condition `cc` holds on `dst` (a register) and `src` (a register or 221 | immediate). There are multiple synonyms for each condition. 222 | 223 | * `eq`, `e`, `=`, `z`: Jump if `dst` is equal to `src`. 224 | * `ne`, `!=`, `nz`: Jump if `dst` is not equal to `src`. 
225 | * `gt`, `>`: Jump if `dst` is strictly greater than `src` (unsigned). 226 | * `ge`, `>=`: Jump if `dst` is greater than or equal to `src` (unsigned). 227 | * `lt`, `<`: Jump if `dst` is strictly less than `src` (unsigned). 228 | * `le`, `<=`: Jump if `dst` is less than or equal to `src` (unsigned). 229 | * `sgt`, `s>`: Signed greater-than. 230 | * `sge`, `s>=`, `p`: Signed greater-than-or-equal. 231 | * `slt`, `s<`, `n`: Signed less-than. 232 | * `sle`, `s<=`: Signed less-than-or-equal. 233 | * `set`, `&`, `and`: Jump if the bitwise AND of `dst` and `src` is nonzero. 234 | 235 | Both `dst` and `src` registers are considered as quads (`.q`); a `src` immediate 236 | is considered a long (`.l`). Explicit size suffixes are not accepted; the 237 | instruction encoding for jumps only supports these sizes (note in particular 238 | that although the comparison is performed on 64-bit values, the immediate is 239 | still limited to (signed) 32 bits). 240 | 241 | #### call 242 | 243 | ##### Helper function 244 | 245 | `call helper_function_id` 246 | 247 | In eBPF, the original call instruction calls a helper function identified by an 248 | integer (see defs.i), taking arguments `r1` to `r5` and returning in `r0`; these 249 | registers are clobbered, while the remaining registers (`r6` to `r9` and `fp`) 250 | are preserved across the call. Consult the kernel's eBPF documentation for 251 | details. The `helper_function_id` does not accept a size suffix. 252 | 253 | ##### BPF-to-BPF 254 | 255 | `call offset` 256 | 257 | Since Linux 4.16, eBPF programs can make calls to other functions within the 258 | same program. Currently these must be statically linked; the kernel is unable 259 | to resolve the relocation entries at program load time. Thus the `offset` to 260 | such a `call` instruction is similar to that on a `jr`. However, since negative 261 | numbers are accepted as `helper_function_id`s, a call with a negative literal 262 | offset has to be written like `call +-1` to mark it as an `offset`. 
263 | Thus, the possible forms of BPF-to-BPF call are as follows: 264 | 265 | `call label` 266 | 267 | `call +1` 268 | 269 | `call +-1` 270 | 271 | `call +equate` 272 | 273 | `call +-equate` 274 | 275 | In most circumstances, however, only the first (`label`) form is likely to be 276 | useful. A simple example of usage can be found in the `call.s` sample program. 277 | 278 | #### exit 279 | 280 | `exit` 281 | 282 | Exit the program, returning the current value of `r0`. 283 | 284 | #### add, sub, mul, div, mod, and, or, xor, lsh, rsh, arsh 285 | 286 | `alu_op dst_reg, src_reg` 287 | 288 | `alu_op dst_reg, src_imm` 289 | 290 | Size must be either quad (`.q`) or long (`.l`). If both operands have size 291 | suffixes, they must match; if neither has, then quad (`.q`) is assumed. 292 | `src_imm` is a signed 32-bit quantity, even when size is quad (`.q`). 293 | 294 | Note the slight oddity that even for `lsh`, `rsh`, `arsh` instructions (where 295 | the size of the source operand should be irrelevant), the size suffix rules 296 | still apply - e.g. `lsh r1, 2.l` is a 32-bit shift. 297 | 298 | #### neg 299 | 300 | `neg dst_reg` 301 | 302 | Negate the specified register. Size must be either quad (`.q`) or long (`.l`); 303 | if omitted, quad (`.q`) is assumed. 304 | 305 | #### end 306 | 307 | `end le, dst_reg.sz` 308 | 309 | `end be, dst_reg.sz` 310 | 311 | Converts the specified register between Little or Big Endian and CPU endianness. 312 | Size `.sz` must be one of quad (`.q`), long (`.l`) or word (`.w`); if omitted, 313 | quad (`.q`) is assumed. 314 | The same operation is used for conversions both from and to CPU endianness. 315 | 316 | ### Map definitions 317 | 318 | May only appear in `.section maps`. 319 | 320 | `name: type, key_size, value_size, max_entries` 321 | 322 | `name: type, key_size, value_size, max_entries, flags` 323 | 324 | Defines a map with the given `name`, which can subsequently be used as a quad 325 | immediate. 
`type` is an integer ID (see defs.i). `key_size` and `value_size` 326 | are the sizes, in bytes, of the map key and map value. `max_entries` is the 327 | maximum number of entries this map can hold. 328 | 329 | `flags` is one or more of the following letters: 330 | 331 | * `P`: `BPF_F_NO_PREALLOC` 332 | * `L`: `BPF_F_NO_COMMON_LRU` 333 | 334 | Consult the kernel documentation for details of these flags and of the various 335 | map types. 336 | 337 | Normally, maps will be auto-pinned when the program is loaded. But unlike 338 | `iproute2`, `bpftool` doesn't support auto-pinning and will reject object files 339 | which request this in the map metadata. So, the ebpf_asm command-line option 340 | `--no-pin-maps` can be used to suppress this. 341 | 342 | ### Data definitions 343 | 344 | As it is not possible to reference .data sections from eBPF code, they have 345 | rather limited uses; hence the assembler has rather limited support for them. 346 | 347 | #### asciz 348 | 349 | `asciz "String text"` 350 | 351 | NUL-terminated ASCII string. Typically this is only used for the following 352 | snippet: 353 | ``` 354 | .data 355 | .section license 356 | _license: 357 | asciz "GPL" 358 | ``` 359 | 360 | ### Type definitions 361 | 362 | May only appear in `.section .BTF`. 363 | 364 | `name: definition` 365 | 366 | Defines a type with the given `name`. As well as being associated with the 367 | type's entry in the BTF section of the binary, the name can also be used in 368 | subsequent definitions. Note, however, that a definition must precede all uses 369 | of the name; use [forward declarations](#forward-declarations) to get around 370 | this when defining e.g. self- or mutually-referential types. 371 | 372 | The type `void` is pre-defined (as `BTF_KIND_UNKN`). 373 | 374 | A `definition` consists of a `kind` followed by arguments (whose number and 375 | semantics depend on the `kind`). 
A group of arguments enclosed by parentheses 376 | acts as a single argument, allowing the recursive construction of complex types. 377 | 378 | A type is _sizeable_ if its size in bytes can be calculated. Some derived types 379 | require their underlying types to be _sizeable_; see below for details. 380 | 381 | #### Integers 382 | 383 | `int encoding nbits` 384 | 385 | `int (encoding encoding ...) nbits` 386 | 387 | Defines an integer type. 388 | 389 | `encoding` is one of the following flags: `signed`, `unsigned`, `char`, `bool`. 390 | Since `unsigned` is the default, the following definitions are equivalent: 391 | 392 | `int unsigned 32` 393 | 394 | `int () 32` 395 | 396 | As of Linux 4.19, the kernel does not accept any combination of flags (there is 397 | no flag bit associated with `unsigned`), but the field in the BTF structures is 398 | clearly intended as a bitmask. 399 | 400 | `nbits` is the number of bits in the integer. At present the assembler only 401 | properly supports power-of-two sizes, as it doesn't support struct bitfields. 402 | 403 | An integer type is _sizeable_. 404 | 405 | #### Pointers 406 | 407 | `* type` 408 | 409 | Defines a type of pointer to `type`, which is either a `definition` or the 410 | `name` of another type defined previously. `type` does not need to be enclosed 411 | in parentheses, even if it consists of multiple tokens. 412 | 413 | A pointer type is _sizeable_ even if `type` is not. 414 | 415 | #### Arrays 416 | 417 | `array type nelems` 418 | 419 | Defines a type of array of `type` with `nelems` elements. `type` is either a 420 | `definition` or the `name` of another type defined previously; if it consists of 421 | multiple tokens, it must be parenthesised. `nelems` is an immediate literal, 422 | and may be an [equate name](#equ). 423 | 424 | `type` must be _sizeable_, as is the resulting array type. 
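
The _sizeable_ rules for the kinds described so far can be illustrated with a small model. This is a sketch under stated assumptions, not the assembler's own data structures: the dict layout, the helper names, and the use of `None` for "size unknown" are all invented for the example, and the 8-byte pointer size assumes the 64-bit eBPF target.

```python
# Hypothetical model: a type is a dict whose 'size' is a byte count, or None
# when the size cannot be calculated (i.e. the type is not sizeable).
UNSIZED = {'kind': 'incomplete', 'size': None}


def int_type(nbits):
    """`int encoding nbits`: always sizeable (power-of-two widths assumed)."""
    return {'kind': 'int', 'size': nbits // 8}


def pointer_to(target):
    """`* type`: sizeable even when the target is not (64-bit pointers assumed)."""
    return {'kind': 'ptr', 'size': 8}


def array_of(elem, nelems):
    """`array type nelems`: requires a sizeable element type."""
    if elem['size'] is None:
        raise ValueError("array element type must be sizeable")
    return {'kind': 'array', 'size': elem['size'] * nelems}


char = int_type(8)           # char: int (char) 8
name = array_of(char, 4)     # name: array (char) 4
ptr = pointer_to(UNSIZED)    # a pointer to an incomplete type is still fine
```

In this model `name` occupies 4 bytes, while `array_of(UNSIZED, 4)` is rejected because the element size is unknown.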
425 | 426 | #### Structures 427 | 428 | `struct (type name) [(type name) ...]` 429 | 430 | Defines a structure with members of the given types and names. `type` is either 431 | a `definition` or the `name` of another type defined previously; if it consists 432 | of multiple tokens, it must be parenthesised. `name` is unquoted and thus may 433 | contain any characters other than parens and whitespace. 434 | 435 | Each `type` must be _sizeable_, as is the resulting structure type. 436 | 437 | #### Unions 438 | 439 | `union (type name) [(type name) ...]` 440 | 441 | Defines a union with members of the given types and names. `type` is either a 442 | `definition` or the `name` of another type defined previously; if it consists of 443 | multiple tokens, it must be parenthesised. `name` is unquoted and thus may 444 | contain any characters other than parens and whitespace. 445 | 446 | Each `type` must be _sizeable_, as is the resulting union type. 447 | 448 | #### Enumerations 449 | 450 | `enum size (name value) [(name value) ...]` 451 | 452 | Defines an enumeration of `size` bytes, with defined values of the given names 453 | and values. `name` is unquoted and thus may contain any characters other than 454 | parens and whitespace. `value` is an immediate literal, and may be an 455 | [equate name](#equ). 456 | 457 | An enumeration type is _sizeable_. 458 | 459 | #### Forward declarations 460 | 461 | `...` 462 | 463 | Defines an incomplete type. If this is a named type, it may be overridden by a 464 | later redefinition of the same name; thus for instance a singly-linked list 465 | could be defined as: 466 | ``` 467 | list: ... 468 | list: struct ((* list) next) 469 | ``` 470 | 471 | Alternatively, the type may be left incomplete, in which case a `BTF_KIND_FWD` 472 | definition will be emitted. Such a type is not _sizeable_. 473 | 474 | #### Typedefs 475 | 476 | `typedef type` 477 | 478 | Defines a type identical to `type` but with a different name. 
`type` is either 479 | a `definition` or the `name` of another type defined previously. `type` does 480 | not need to be enclosed in parentheses, even if it consists of multiple tokens. 481 | 482 | A typedef is _sizeable_ if and only if its underlying `type` is. 483 | 484 | #### Qualifiers 485 | 486 | `qualifier type` 487 | 488 | Defines a type derived from `type` but qualified according to `qualifier`. 489 | `type` is either a `definition` or the `name` of another type defined 490 | previously. `type` does not need to be enclosed in parentheses, even if it 491 | consists of multiple tokens. `qualifier` is one of `const`, `volatile` or 492 | `restrict`. 493 | 494 | A qualified type is _sizeable_ if and only if its underlying `type` is. 495 | 496 | ## Output format 497 | 498 | The assembler generates ELF object files, suitable for passing to standard tools 499 | like iproute2's `ip link set dev ethX xdp obj object-file.o verb`. Currently 500 | only little-endian output (aka 'bpfel') is supported. 501 | If using the `bpftool` utility from the kernel's `tools/bpf/bpftool`, as in 502 | `bpftool prog load object-file.o /sys/fs/bpf/xdp/name type xdp`, note that you 503 | will need to assemble with `--no-pin-maps` (see [maps](#map-definitions)). 504 | 505 | ## Testing 506 | 507 | `ebpf_asm` has a suite of regression tests: run `./regression.py`. If all is 508 | well, there should be no output, and the return code will be zero. For verbose 509 | mode, use the switch `-v`. 510 | 511 | ## To Do 512 | 513 | Ideas for the future. 514 | 515 | * Test behaviour around trying to use labels as immediates/displacements. 516 | * Tests for map definitions. 517 | * "Loose mode" that allows bad things like registers `r11`-`r15`, a `raw` 518 | instruction that takes a 5-tuple, invalid sizes to various ops, etc.; in order 519 | to construct bad binaries to test the kernel's verifier. 520 | * Support `label: instruction`. 
521 | * Support big-endian output ('bpfeb') and _maybe_ default to host endianness. 522 | * Constant expressions. Wherever a literal is expected, we should be able to 523 | have an expression instead. We can even use `(parentheses)` for grouping, 524 | since indirection uses `[brackets]`. 525 | -------------------------------------------------------------------------------- /call.s: -------------------------------------------------------------------------------- 1 | ; call.s 2 | ; XDP program to test intra-program CALL instruction. 3 | ; Copyright (c) 2018 Solarflare Communications Ltd 4 | 5 | .include defs.i 6 | .include net_hdrs.i 7 | 8 | .text 9 | .section prog 10 | call pass_fn 11 | exit 12 | 13 | pass_fn: 14 | ld r0.l, XDP_PASS 15 | exit 16 | 17 | .data 18 | .section license 19 | _license: 20 | asciz "Dual MIT/GPL" 21 | -------------------------------------------------------------------------------- /defs.i: -------------------------------------------------------------------------------- 1 | ; defs.i 2 | ; Platform constant definitions header 3 | ; 4 | ; Copyright (c) 2017 Solarflare Communications Ltd 5 | ; Provided under the MIT license; see top of ebpf_asm.py for details 6 | 7 | ; Map types 8 | .equ hash, 1 9 | .equ array, 2 10 | .equ prog_array, 3 11 | .equ perf_event_array, 4 12 | .equ percpu_hash, 5 13 | .equ percpu_array, 6 14 | .equ stack_trace, 7 15 | .equ cgroup_array, 8 16 | .equ lru_hash, 9 17 | .equ lru_percpu_hash, 10 18 | .equ lpm_trie, 11 19 | .equ array_of_maps, 12 20 | .equ hash_of_maps, 13 21 | .equ devmap, 14 22 | .equ sockmap, 15 23 | 24 | ; Helper function IDs 25 | .equ bpf_map_lookup_elem, 1 26 | .equ bpf_map_update_elem, 2 27 | .equ bpf_map_delete_elem, 3 28 | .equ bpf_probe_read, 4 29 | .equ bpf_ktime_get_ns, 5 30 | .equ bpf_trace_printk, 6 31 | .equ bpf_get_prandom_u32, 7 32 | .equ bpf_get_smp_processor_id, 8 33 | .equ bpf_skb_store_bytes, 9 34 | .equ bpf_l3_csum_replace, 10 35 | .equ bpf_l4_csum_replace, 11 36 | .equ bpf_tail_call, 
12 37 | .equ bpf_clone_redirect, 13 38 | .equ bpf_get_current_pid_tgid, 14 39 | .equ bpf_get_current_uid_gid, 15 40 | .equ bpf_get_current_comm, 16 41 | .equ bpf_get_cgroup_classid, 17 42 | .equ bpf_skb_vlan_push, 18 43 | .equ bpf_skb_vlan_pop, 19 44 | .equ bpf_skb_get_tunnel_key, 20 45 | .equ bpf_skb_set_tunnel_key, 21 46 | .equ bpf_perf_event_read, 22 47 | .equ bpf_redirect, 23 48 | .equ bpf_get_route_realm, 24 49 | .equ bpf_perf_event_output, 25 50 | .equ bpf_skb_load_bytes, 26 51 | .equ bpf_get_stackid, 27 52 | .equ bpf_csum_diff, 28 53 | .equ bpf_skb_get_tunnel_opt, 29 54 | .equ bpf_skb_set_tunnel_opt, 30 55 | .equ bpf_skb_change_proto, 31 56 | .equ bpf_skb_change_type, 32 57 | .equ bpf_skb_under_cgroup, 33 58 | .equ bpf_get_hash_recalc, 34 59 | .equ bpf_get_current_task, 35 60 | .equ bpf_probe_write_user, 36 61 | .equ bpf_current_task_under_cgroup, 37 62 | .equ bpf_skb_change_tail, 38 63 | .equ bpf_skb_pull_data, 39 64 | .equ bpf_csum_update, 40 65 | .equ bpf_set_hash_invalid, 41 66 | .equ bpf_get_numa_node_id, 42 67 | .equ bpf_skb_change_head, 43 68 | .equ bpf_xdp_adjust_head, 44 69 | .equ bpf_probe_read_str, 45 70 | .equ bpf_get_socket_cookie, 46 71 | .equ bpf_get_socket_uid, 47 72 | .equ bpf_set_hash, 48 73 | .equ bpf_setsockopt, 49 74 | .equ bpf_skb_adjust_room, 50 75 | .equ bpf_redirect_map, 51 76 | .equ bpf_sk_redirect_map, 52 77 | .equ bpf_sock_map_update, 53 78 | 79 | ; XDP return codes 80 | .equ XDP_ABORTED, 0 81 | .equ XDP_DROP, 1 82 | .equ XDP_PASS, 2 83 | .equ XDP_TX, 3 84 | .equ XDP_REDIRECT, 4 85 | 86 | ; struct xdp_md 87 | .equ XDP_MD_DATA, 0 88 | .equ XDP_MD_DATA_END, 4 89 | -------------------------------------------------------------------------------- /dropper.s: -------------------------------------------------------------------------------- 1 | ; dropper.s 2 | ; simple IP-based XDP drop program 3 | ; Copyright (c) 2017 Solarflare Communications Ltd 4 | 5 | .include defs.i 6 | .include net_hdrs.i 7 | 8 | .text 9 | .section prog 10 | ld 
r0.l, XDP_PASS ; On errors we return pass 11 | ld r2.l, [r1+XDP_MD_DATA] 12 | ld r3.l, [r1+XDP_MD_DATA_END] 13 | ld r1, r2 14 | add r1, ETHER_HDR__LEN 15 | jr ge, r3, r1, +1 ; Do we have the entire ether-hdr? 16 | exit 17 | ld r4.w, [r2+ETHER_HDR_PROTO] 18 | end be, r4.w 19 | jr z, r4, ETHERTYPE_IPV4, +1 ; Is it IPv4? 20 | exit 21 | ld r2, r1 ; done with the ether-hdr 22 | add r1, IP_HDR__LEN 23 | jr ge, r3, r1, +1 ; Do we have the entire IP hdr? 24 | exit 25 | add r2, IP_HDR_SADDR 26 | ld r1, dropcnt 27 | call bpf_map_lookup_elem 28 | jr nz, r0, 0, drop 29 | ; Not in the map, pass it 30 | ld r0.l, XDP_PASS 31 | exit 32 | drop: 33 | ; Increment the counter 34 | ld r1.l, [r0] 35 | add r1.l, 1 36 | ld [r0], r1.l 37 | ; return drop verdict 38 | ld r0.l, XDP_DROP 39 | exit 40 | 41 | .section .BTF 42 | u32: int unsigned 32 ; int encoding nbits 43 | __be32: typedef u32 ; No support yet for endianness / __bitwise in BTF :-( 44 | ____btf_map_dropcnt: struct (__be32 key) (u32 value) ; define map type 45 | ; A bunch more types, just to test BTF support 46 | __le32: typedef typedef int signed 32 ; gratuitous example of anonymous types 47 | bool: int (bool) 8 48 | char: int (char) 8 ; can't use (signed char) as kernel rejects combination 49 | ppi: * (* int () 32) ; pointer-to-pointer-to-int 50 | name: array (char) 4 ; the brackets are unnecessary but permitted 51 | names: struct ((name) first) (name last) ; mumble sizes 52 | ipv4: union (__be32 addr) ((array char 4) octets) 53 | xdprc: enum 4 (XDP_DROP XDP_DROP) (XDP_PASS XDP_PASS) (XDP_ABORTED 0) ; size (name value) 54 | crpvi: const restrict (* volatile (u32)); const restrict pointer to volatile int 55 | list: ... ; forward-declaration 56 | list: struct ((* list) next) 57 | forward: ... 
; uncompleted fwd-declaration 58 | memptr: * void ; pointer to void 59 | be32_to_cpu: func u32 (__be32) ; u32 be32_to_cpu(__be32 x) 60 | apply: func u32 (* proto u32 (u32)) u32 ; u32 apply(u32 (*)(u32), u32) 61 | ops: struct ((* proto u32 (* char)) strlen) \ 62 | ((* proto bool (* void) (* void) u32) memcmp) 63 | 64 | .section maps 65 | ; __be32 ip.src => u32 counter 66 | dropcnt: percpu_hash, 4, 4, 256, P 67 | .data 68 | .section license 69 | _license: 70 | asciz "Dual MIT/GPL" 71 | -------------------------------------------------------------------------------- /ebpf_asm.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python 2 | # Copyright (c) 2017-2018 Solarflare Communications Ltd 3 | # Permission is hereby granted, free of charge, to any person obtaining a copy 4 | # of this software and associated documentation files (the "Software"), to deal 5 | # in the Software without restriction, including without limitation the rights 6 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 7 | # copies of the Software, and to permit persons to whom the Software is 8 | # furnished to do so, subject to the following conditions: 9 | # 10 | # The above copyright notice and this permission notice shall be included in all 11 | # copies or substantial portions of the Software. 12 | # 13 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 19 | # SOFTWARE. 
20 | 21 | 22 | from __future__ import division 23 | from builtins import map 24 | from builtins import range 25 | from builtins import object 26 | from builtins import bytes 27 | import re 28 | import struct 29 | import ast 30 | import optparse 31 | import math 32 | import sys 33 | 34 | import paren 35 | 36 | VERSION = None 37 | 38 | """Input language for .section prog: 39 | 40 | Based on Intel syntax: 41 | * operands go dest, source 42 | * memory dereferences in [square brackets] 43 | 44 | We have to cover the following insn classes: 45 | * BPF_LD 0x00 46 | * BPF_LDX 0x01 47 | * BPF_ST 0x02 48 | * BPF_STX 0x03 49 | * BPF_ALU 0x04 50 | * BPF_JMP 0x05 51 | * BPF_ALU64 0x07 52 | Let's first consider LD[X]/ST[X]. These have a two-bit 'size' field: 53 | * BPF_W 0x00 l 54 | * BPF_H 0x08 w 55 | * BPF_B 0x10 b 56 | * BPF_DW 0x18 q 57 | and a three-bit 'mode' field: 58 | * BPF_IMM 0x00 59 | * BPF_ABS 0x20 60 | * BPF_IND 0x40 61 | * BPF_MEM 0x60 62 | * BPF_XADD 0xc0 63 | For BPF_IMM we can write 64 | ld reg, imm ; implicitly 64-bit 65 | ld reg.q, imm ; can be explicit instead 66 | For BPF_MEM we can write 67 | ld reg.sz, [reg+disp] ; LDX. Omitting .sz always implies 'q', i.e. 64-bit 68 | ld [reg+disp], reg.sz ; STX 69 | ld [reg+disp], imm.sz ; ST 70 | For BPF_ABS, we can write 71 | ldpkt r0.sz, [disp] 72 | For BPF_IND, we can write 73 | ldpkt r0.sz, [reg+disp] 74 | For BPF_XADD we can write 75 | xadd [reg+disp], reg.sz ; sz must be 'q' or 'l' 76 | Next let's consider ALU[64]. 
This has a one-bit 'source' (K|X) field, and a 77 | four-bit 'operation code' field: 78 | * BPF_ADD 0x00 79 | * BPF_SUB 0x10 80 | * BPF_MUL 0x20 81 | * BPF_DIV 0x30 82 | * BPF_OR 0x40 83 | * BPF_AND 0x50 84 | * BPF_LSH 0x60 85 | * BPF_RSH 0x70 86 | * BPF_NEG 0x80 87 | * BPF_MOD 0x90 88 | * BPF_XOR 0xa0 89 | * BPF_MOV 0xb0 90 | * BPF_ARSH 0xc0 91 | * BPF_END 0xd0 92 | For BPF_MOV we can write: 93 | ld reg, reg.sz ; sz must be 'q' or 'l' 94 | ld reg.sz, reg ; alternate syntax for the above 95 | ld reg.l, imm ; only 32-bit (for 64-bit see LD|IMM) 96 | For BPF_NEG we can write: 97 | neg reg.sz ; sz must be 'q' or 'l' 98 | For BPF_END we can write: 99 | end be, reg.sz ; sz must be 'q', 'l', or 'w'; 'q' assumed 100 | end le, reg.sz ; cpu to/from LE 101 | For the others, we use the lower-case name of the op and write e.g. 102 | add reg, reg.sz ; sz must be 'q' or 'l' 103 | add reg.sz, reg ; alternate syntax for the above 104 | add reg.l, imm ; '.l' may be omitted 105 | Finally the BPF_JMP class: 106 | * BPF_JA 0x00 107 | * BPF_JEQ 0x10 108 | * BPF_JGT 0x20 109 | * BPF_JGE 0x30 110 | * BPF_JSET 0x40 111 | * BPF_JNE 0x50 112 | * BPF_JSGT 0x60 113 | * BPF_JSGE 0x70 114 | * BPF_CALL 0x80 115 | * BPF_EXIT 0x90 116 | * BPF_JLT 0xa0 117 | * BPF_JLE 0xb0 118 | * BPF_JSLT 0xc0 119 | * BPF_JSLE 0xd0 120 | For BPF_CALL, we can write: 121 | call function ; calls helper function, r0=function(r1, r2, r3, r4, r5) 122 | For BPF_EXIT, we can write: 123 | exit ; return r0 124 | For BPF_JA, we can write: 125 | jr offset_or_label 126 | For the others, we use the lower-case name of the op and write e.g. 
127 | jr z, reg, reg, offset_or_label ; if (reg1.q == reg2.q) jr offset_or_label 128 | jr z, reg, imm, offset_or_label ; if (reg.q == imm.l) jr offset_or_label 129 | 130 | Register names: r0-r10, fp (== r10) 131 | """ 132 | 133 | class BaseAssembler(object): 134 | def __init__(self, equates): 135 | self.equates = equates 136 | self.globls = set() 137 | _size_re = re.compile(r'\.([bwlq])$') 138 | _octal_re = re.compile(r'0\d+$') 139 | _decimal_re = re.compile(r'\d+$') 140 | _hex_re = re.compile(r'0x[0-9a-fA-F]+$') 141 | def parse_immediate(self, imm): 142 | d = {} 143 | neg = False 144 | if imm.startswith('-'): 145 | neg = True 146 | imm = imm[1:] 147 | if self._octal_re.match(imm): 148 | d['imm'] = int(imm, 8) 149 | elif self._decimal_re.match(imm): 150 | d['imm'] = int(imm) 151 | elif self._hex_re.match(imm): 152 | d['imm'] = int(imm, 16) 153 | elif imm in self.equates: 154 | d['imm'] = self.equates[imm] 155 | else: 156 | raise Exception("Bad immediate", imm) 157 | if neg: 158 | d['imm'] = -d['imm'] 159 | return d 160 | _op_args_re = re.compile('(\S+)\s+(\S.*)$') 161 | def parse_op_args(self, line): 162 | m = self._op_args_re.match(line) 163 | if not m: 164 | return (line, '') 165 | return m.groups() 166 | _label_re = re.compile('\s*(?=\D)(\w+):$') 167 | 168 | class ProgAssembler(BaseAssembler): 169 | class PseudoCall(str): pass 170 | elf_flags = 'AX' 171 | def __init__(self, equates): 172 | super(ProgAssembler, self).__init__(equates) 173 | self.symbols = {} 174 | self.symrefs = {} 175 | self.relocs = {} 176 | self.section = [] 177 | @property 178 | def current_index(self): 179 | return len(self.section) 180 | def append_insn(self, r): 181 | f,so,si = r 182 | if so is not None: 183 | self.symrefs[self.current_index] = so 184 | if si is not None: 185 | self.relocs[self.current_index] = si 186 | self.section.append(f) 187 | def feed_line(self, line): 188 | d = self.parse_line(line.strip()) 189 | if 'label' in d: 190 | self.symbols[d['label']] = self.current_index 
191 | return 192 | e = self.generate_insn(d) 193 | r = self.assemble_insn(e) # little-endian for now 194 | if isinstance(r, list): # BPF_LD_IMM64 is two insns 195 | self.append_insn(r[0]) 196 | self.append_insn(r[1]) 197 | else: 198 | self.append_insn(r) 199 | def resolve_symbols(self): 200 | for index in self.symrefs: 201 | symbol = self.symrefs[index] 202 | if symbol not in self.symbols: 203 | raise Exception("Undefined symbol", symbol) 204 | offset = self.symbols[symbol] - index 205 | op, regs, off, imm = struct.unpack('': 'jgt', 370 | 'ge': 'jge', '>=': 'jge', 371 | 'set': 'jset', '&': 'jset', 'and': 'jset', 372 | 'ne': 'jne', '!=': 'jne', 'nz': 'jne', # difference Not Zero 373 | 'sgt': 'jsgt', 's>': 'jsgt', 374 | 'sge': 'jsge', 's>=': 'jsge', 'p': 'jsge', # difference Positive 375 | 'lt': 'jlt', '<': 'jlt', 376 | 'le': 'jle', '<=': 'jle', 377 | 'slt': 'jslt', 's<': 'jslt', 'n': 'jslt', # difference Negative 378 | 'sle': 'jsle', 's<=': 'jsle',} 379 | 380 | def parse_jrcc(self, op, args): 381 | cc = args[0] 382 | if cc not in self.jr_conds: 383 | raise Exception("Bad jump op", cc) 384 | dst, src = list(map(self.parse_direct_operand, args[1:3])) 385 | if 'size' in dst: # compares always use full quad size 386 | raise Exception("Bad size in jump dst", args[1]) 387 | if 'size' in src: # compares always use full quad size 388 | raise Exception("Bad size in jump src", args[2]) 389 | off = self.parse_offset(args[3]) 390 | return {'op': 'jr', 'cc': self.jr_conds[cc], 'dst': dst, 'src': src, 391 | 'off': off} 392 | 393 | def parse_jmp(self, op, args): 394 | """jr forms: 395 | 396 | jr label_or_offset 397 | jr cc, dst, src, label_or_offset 398 | """ 399 | if len(args) == 1: 400 | return self.parse_ja(op, args) 401 | if len(args) == 4: 402 | return self.parse_jrcc(op, args) 403 | raise Exception("Bad jr, expected 1 or 4 args, got", args) 404 | 405 | def parse_call(self, op, args): 406 | """call forms: 407 | 408 | call function ; implicit args 409 | call label_or_offset ; 
implicit args"""
410 |         if len(args) != 1:
411 |             raise Exception("Bad call, expected 1 arg, got", args)
412 |         try:
413 |             imm = self.parse_immediate(args[0])['imm']
414 |             return {'op': 'call', 'function': imm}
415 |         except Exception:
416 |             try:
417 |                 off = self.parse_offset(args[0])
418 |                 return {'op': 'call', 'off': off}
419 |             except Exception:
420 |                 raise Exception("Bad call, expected function identifier, label or offset, but got", args[0])
421 |
422 |     def parse_exit(self, op, args):
423 |         if args:
424 |             raise Exception("Bad exit, expected no args, got", args)
425 |         return {'op': 'exit'}
426 |
427 |     def parse_xadd(self, op, args):
428 |         if len(args) != 2:
429 |             raise Exception("Bad xadd, expected 2 args, got", args)
430 |         dst = self.parse_operand(args[0])
431 |         src = self.parse_direct_operand(args[1])
432 |         return {'op': 'xadd', 'dst': dst, 'src': src}
433 |
434 |     op_parsers = {'ld': parse_ld, 'add': parse_alu,
435 |                   'sub': parse_alu, 'mul': parse_alu,
436 |                   'div': parse_alu, 'or': parse_alu,
437 |                   'and': parse_alu, 'lsh': parse_alu,
438 |                   'rsh': parse_alu, 'neg': parse_neg,
439 |                   'mod': parse_alu, 'xor': parse_alu,
440 |                   'arsh': parse_alu, 'end': parse_end,
441 |                   'jr': parse_jmp, 'call': parse_call,
442 |                   'exit': parse_exit, 'xadd': parse_xadd,
443 |                   'ldpkt': parse_ldpkt,}
444 |
445 |     def parse_line(self, line):
446 |         m = self._label_re.match(line)
447 |         if m:
448 |             return {'label': m.group(1)}
449 |         op, args = self.parse_op_args(line)
450 |         if op not in self.op_parsers:
451 |             raise Exception("Unrecognised instruction", line)
452 |         args = list(map(str.strip, args.split(',')))
453 |         if args == ['']:
454 |             args = []
455 |         d = self.op_parsers[op](self, op, args)
456 |         d['line'] = line # save source line for error messages
457 |         return d
458 |
459 |     def generate_ld(self, insn):
460 |         """ld forms:
461 |
462 |         ld reg.q, imm ; BPF_LD|BPF_IMM
463 |         ld reg.q, reg ; BPF_MOV64, src=X
464 |         ld reg.l, imm ; BPF_MOV, src=K
465 |         ld reg.l, reg ; BPF_MOV, src=X
466 |         ld [reg+disp], src ;
BPF_ST[X]_MEM 467 | ld reg.sz, [reg+disp] ; BPF_LDX_MEM 468 | (note: LD IND/ABS are 'ldpkt' insn) 469 | """ 470 | src = insn['src'] 471 | dst = insn['dst'] 472 | size = src.get('size', dst.get('size')) 473 | if 'size' in src and 'size' in dst: 474 | # Normally we don't specify both. But if we do, they must match 475 | if size != dst['size']: 476 | raise Exception("Mismatched sizes", insn['line']) 477 | if dst.get('imm') is not None: 478 | raise Exception("ld imm,... illegal", insn['line']) 479 | if dst.get('ind'): # BPF_ST[X]_MEM 480 | assert dst.get('reg') is not None, dst 481 | if src.get('ind'): 482 | raise Exception("ld mem,mem illegal", insn['line']) 483 | if src.get('off') is not None: 484 | raise Exception("ld mem,reg+disp illegal", insn['line']) 485 | if src.get('reg') is not None: # BPF_STX_MEM (size, dst, src, off) 486 | return {'class': 'stx', 'mode': 'mem', 'size': size or 'q', 487 | 'dst': dst['reg'], 'src': src['reg'], 488 | 'off': dst.get('off', 0)} 489 | # BPF_ST_MEM (size, dst, off) 490 | return {'class': 'st', 'mode': 'mem', 'size': size or 'q', 491 | 'dst': dst['reg'], 'off': dst.get('off', 0), 'imm': src['imm']} 492 | if dst.get('off') is not None: 493 | raise Exception("ld reg+disp,... 
illegal (missing []?)", insn['line']) 494 | if src.get('ind'): # BPF_LDX_MEM 495 | assert dst.get('reg') is not None, dst 496 | if src.get('reg') is None: 497 | raise Exception("ld ...,[imm] illegal", insn['line']) 498 | # BPF_LDX_MEM (size, dst, src, off) 499 | return {'class': 'ldx', 'mode': 'mem', 'size': size or 'q', 500 | 'dst': dst['reg'], 'src': src['reg'], 'off': src.get('off', 0)} 501 | if src.get('off') is not None: 502 | raise Exception("ld ...,reg+disp illegal (missing []?)", insn['line']) 503 | assert dst.get('reg') is not None, dst 504 | if size is None: 505 | size = 'q' 506 | if size not in ['q', 'l']: 507 | raise Exception("Bad size", size, "for register load", insn['line']) 508 | if src.get('reg') is not None: # ld reg, reg 509 | if size == 'q': # BPF_MOV64 510 | return {'class': 'alu64', 'op': 'mov', 511 | 'src': src['reg'], 'dst': dst['reg']} 512 | # BPF_MOV 513 | return {'class': 'alu', 'op': 'mov', 514 | 'src': src['reg'], 'dst': dst['reg']} 515 | # ld reg, imm 516 | if size == 'q': # BPF_LD_IMM64 517 | return {'class': 'ld', 'mode': 'imm', 'size': 'q', 'dst': dst['reg'], 518 | 'imm': src['imm']} 519 | # BPF_MOV 520 | return {'class': 'alu', 'op': 'mov', 'dst': dst['reg'], 'imm': src['imm']} 521 | 522 | def generate_alu(self, insn): 523 | """alu forms: 524 | 525 | alu dst, src ; BPF_ALU[64], BPF_X 526 | alu dst, imm ; BPF_ALU[64], BPF_K 527 | """ 528 | op = insn['op'] 529 | src = insn['src'] 530 | dst = insn['dst'] 531 | size = src.get('size', dst.get('size')) 532 | if 'size' in src and 'size' in dst: 533 | # Normally we don't specify both. But if we do, they must match 534 | if size != dst['size']: 535 | raise Exception("Mismatched sizes", insn['line']) 536 | if size in ['q', None]: 537 | klass = 'alu64' 538 | elif size == 'l': 539 | klass = 'alu' 540 | else: 541 | raise Exception("Bad size", size, "for ALU op", insn['line']) 542 | if dst.get('imm') is not None: 543 | raise Exception(op+" imm,... 
illegal", insn['line']) 544 | if src.get('reg') is not None: # BPF_X 545 | return {'class': klass, 'op': op, 'dst': dst['reg'], 'src': src['reg']} 546 | # BPF_K 547 | return {'class': klass, 'op': op, 'dst': dst['reg'], 'imm': src['imm']} 548 | 549 | def generate_neg(self, insn): 550 | dst = insn['dst'] 551 | size = dst.get('size') 552 | if size in ['q', None]: 553 | klass = 'alu64' 554 | elif size == 'l': 555 | klass = 'alu' 556 | else: 557 | raise Exception("Bad size", size, "for ALU op", insn['line']) 558 | if dst.get('imm') is not None: 559 | raise Exception("neg imm illegal", insn['line']) 560 | return {'class': klass, 'op': 'neg', 'dst': dst['reg']} 561 | 562 | def generate_end(self, insn): 563 | dst = insn['dst'] 564 | size = dst.get('size', 'q') 565 | imm = {'q': 64, 'l': 32, 'w': 16}.get(size) 566 | if imm is None: 567 | raise Exception("Bad size", size, "for endian op", insn['line']) 568 | if dst.get('imm') is not None: 569 | raise Exception("end ..., imm illegal", insn['line']) 570 | dr = insn['dir'] 571 | # All BPF_END (even 64-bit) use (32-bit) BPF_ALU class 572 | if dr == 'le': # BPF_TO_LE == BPF_K 573 | return {'class': 'alu', 'op': 'end', 'dst': dst['reg'], 574 | 'imm': imm} 575 | if dr == 'be': # BPF_TO_BE == BPF_X, so use 'fake' src reg 576 | return {'class': 'alu', 'op': 'end', 'dst': dst['reg'], 'src': 0, 577 | 'imm': imm} 578 | # can't happen, already checked in parse_end 579 | raise Exception("Bad direction", dr, "for endian op", insn['line']) 580 | 581 | def generate_jr(self, insn): 582 | off = insn['off'] 583 | cc = insn.get('cc') 584 | if cc is not None: 585 | dst = insn['dst'] 586 | src = insn['src'] 587 | if dst.get('reg') is None: 588 | raise Exception("jr cc,imm,... 
illegal", insn['line']) 589 | dst = dst['reg'] 590 | if src.get('reg') is None: # JMP_IMM 591 | return {'class': 'jmp', 'op': cc, 'dst': dst, 'imm': src['imm'], 592 | 'off': off} 593 | # JMP_REG 594 | src = src['reg'] 595 | return {'class': 'jmp', 'op': cc, 'dst': dst, 'src': src, 'off': off} 596 | return {'class': 'jmp', 'op': 'ja', 'off': off} 597 | 598 | def generate_call(self, insn): 599 | if 'off' in insn: 600 | off = insn['off'] 601 | # BPF_PSEUDO_CALL = 1 602 | return {'class': 'jmp', 'op': 'call', 'imm': off, 'src': 1} 603 | func = insn['function'] 604 | return {'class': 'jmp', 'op': 'call', 'imm': func} 605 | 606 | def generate_exit(self, insn): 607 | return {'class': 'jmp', 'op': 'exit'} 608 | 609 | def generate_xadd(self, insn): 610 | dst = insn['dst'] 611 | src = insn['src'] 612 | size = src.get('size', dst.get('size')) 613 | if 'size' in src and 'size' in dst: 614 | # Normally we don't specify both. But if we do, they must match 615 | if size != dst['size']: 616 | raise Exception("Mismatched sizes", insn['line']) 617 | if not dst.get('ind'): 618 | raise Exception("xadd direct_operand,... illegal", insn['line']) 619 | if dst.get('reg') is None: 620 | raise Exception("xadd [imm],... illegal", insn['line']) 621 | if src.get('reg') is None: 622 | raise Exception("xadd ...,imm illegal", insn['line']) 623 | if size is None: 624 | size = 'q' 625 | if size not in ['q', 'l']: 626 | raise Exception("Bad size", size, "for xadd", insn['line']) 627 | return {'class': 'stx', 'mode': 'xadd', 'size': size, 628 | 'dst': dst['reg'], 'src': src['reg'], 'off': dst.get('off', 0)} 629 | 630 | def generate_ldpkt(self, insn): 631 | dst = insn['dst'] 632 | src = insn['src'] 633 | size = src.get('size', dst.get('size')) 634 | if 'size' in src and 'size' in dst: 635 | # Normally we don't specify both. 
But if we do, they must match 636 | if size != dst['size']: 637 | raise Exception("Mismatched sizes", insn['line']) 638 | if size is None: 639 | size = 'l' 640 | if size == 'q': 641 | raise Exception("ldpkt .q illegal", insn['line']) 642 | if dst['reg'] != 0: 643 | raise Exception("ldpkt dst must be r0, not r%(reg)d" % dst) 644 | if src.get('reg') is None: 645 | # LD_ABS 646 | return {'class': 'ld', 'mode': 'abs', 'size': size, 647 | 'imm': src['imm'], 'dst': 0, 'src': 0, 'off': 0} 648 | # LD_IND 649 | return {'class': 'ld', 'mode': 'ind', 'size': size, 'src': src['reg'], 650 | 'imm': src.get('off', 0), 'dst': 0, 'off': 0} 651 | 652 | op_generators = {'ld': generate_ld, 'add': generate_alu, 653 | 'sub': generate_alu, 'mul': generate_alu, 654 | 'div': generate_alu, 'or': generate_alu, 655 | 'and': generate_alu, 'lsh': generate_alu, 656 | 'rsh': generate_alu, 'neg': generate_neg, 657 | 'mod': generate_alu, 'xor': generate_alu, 658 | 'arsh': generate_alu, 'end': generate_end, 659 | 'jr': generate_jr, 'call': generate_call, 660 | 'exit': generate_exit, 'xadd': generate_xadd, 661 | 'ldpkt': generate_ldpkt,} 662 | 663 | def generate_insn(self, insn): 664 | if insn['op'] not in self.op_generators: 665 | raise Exception("Unhandled op", insn) 666 | d = self.op_generators[insn['op']](self, insn) 667 | d['line'] = insn['line'] # saved source line for error messages 668 | return d 669 | 670 | # Output format: op:8, dst_reg:4, src_reg:4, off:16, imm:32 671 | classes = {'ld': 0, 'ldx': 1, 'st': 2, 'stx': 3, 'alu': 4, 'jmp': 5, 'alu64': 7} 672 | ld_modes = {'imm': 0x00, 'abs': 0x20, 'ind': 0x40, 'mem': 0x60, 'xadd': 0xc0} 673 | alu_ops = {'add': 0x00, 'sub': 0x10, 'mul': 0x20, 'div': 0x30, 'or': 0x40, 674 | 'and': 0x50, 'lsh': 0x60, 'rsh': 0x70, 'neg': 0x80, 'mod': 0x90, 675 | 'xor': 0xa0, 'mov': 0xb0, 'arsh': 0xc0, 'end': 0xd0} 676 | jmp_ops = {'ja': 0x00, 'jeq': 0x10, 'jgt': 0x20, 'jge': 0x30, 'jset': 0x40, 677 | 'jne': 0x50, 'jsgt': 0x60, 'jsge': 0x70, 'call': 0x80, 'exit': 
0x90, 678 | 'jlt': 0xa0, 'jle': 0xb0, 'jslt': 0xc0, 'jsle': 0xd0} 679 | sizes = {'l': 0x00, 'w': 0x08, 'b': 0x10, 'q': 0x18} 680 | BPF_K = 0 681 | BPF_X = 8 682 | 683 | def check_s16(self, imm): 684 | if not isinstance(imm, int) or imm > 0x7fff or imm < -0x8000: 685 | raise Exception("Value out of range for s16", imm) 686 | return imm 687 | 688 | def check_s32(self, imm): 689 | if not isinstance(imm, int) or imm > 0x7fffffff or imm < -0x80000000: 690 | raise Exception("Value out of range for s32", imm) 691 | return imm 692 | 693 | def check_u64(self, imm): 694 | if not isinstance(imm, int) or imm >= (1 << 64) or imm < 0: 695 | raise Exception("Value out of range for u64", imm) 696 | return imm 697 | 698 | def assemble_ld(self, insn): 699 | # LD_IMM64: class, dst, src, imm + second insn 700 | # LD_ABS: class, mode, size, imm32 701 | # LD_IND: class, mode, size, src, imm32 702 | op = self.classes[insn['class']] | self.ld_modes[insn['mode']] | self.sizes[insn['size']] 703 | if insn['mode'] == 'imm': 704 | regs = insn['dst'] 705 | if isinstance(insn['imm'], str): 706 | return [(op, regs, 0, insn['imm']), (0, 0, 0, 0)] 707 | lo, hi = struct.unpack('> 24) 1006 | #define BTF_INT_OFFSET(VAL) (((VAL & 0x00ff0000)) >> 16) 1007 | #define BTF_INT_BITS(VAL) ((VAL) & 0x000000ff) 1008 | # no support for offset field 1009 | intdata = (self.encoding << 24) | (self.nbits & 0xff) 1010 | return hdr + struct.pack(' %d+%d:' % (group[0][1], 34 | group[-1][2] - group[0][1], 35 | group[0][3], 36 | group[-1][4] - group[0][3])) 37 | for o, ai, aj, bi, bj in group: 38 | if o == 'equal': 39 | for a in range(ai, aj): 40 | delta.append(' ' + str(self.out[a])) 41 | else: 42 | for a in range(ai, aj): 43 | delta.append('- ' + str(self.out[a])) 44 | for b in range(bi, bj): 45 | delta.append('+ ' + str(self.prog_dis[b])) 46 | raise TestFailure("%s failed:\n%s" % (self.name, '\n'.join(delta))) 47 | 48 | class RejectTestMixin(object): 49 | """Test an invalid program snippet, check that correct error is 
thrown""" 50 | def __init__(self, name, src, err): 51 | self.name = name 52 | self.src = src 53 | self.err = err 54 | def __str__(self): 55 | return 'REJ ' + self.name 56 | def run(self): 57 | try: 58 | self.assemble() 59 | except Exception as e: 60 | msg = ' '.join(map(str, e.args)) 61 | if not msg.startswith(self.err): 62 | raise TestFailure("%s failed:\n expected: %s\n got: %s" % 63 | (self.name, self.err, msg)) 64 | else: 65 | raise TestFailure("%s failed\n expected: %s\n but prog was accepted" % 66 | (self.name, self.err)) 67 | 68 | class BaseAsmTest(object): 69 | @property 70 | def prog_header(self): 71 | return """ 72 | .include defs.i 73 | 74 | .text 75 | .section prog 76 | """ 77 | @property 78 | def prog(self): 79 | return self.prog_header + self.src 80 | def assemble(self): 81 | self.asm = asm.Assembler() 82 | for line in self.prog.splitlines(): 83 | self.asm.feed_line(line) 84 | self.asm.resolve_symbols() 85 | self.prog_bin = self.asm.sections['prog'].binary 86 | def dump(self): 87 | self.prog_dis = [] 88 | for i in range(0, len(self.prog_bin), 8): 89 | op, regs, off, imm = struct.unpack('> 4 92 | self.prog_dis.append((op, dst, src, off, imm)) 93 | 94 | class AsmTest(BaseAsmTest, AcceptTestMixin): pass 95 | class BadAsmTest(BaseAsmTest, RejectTestMixin): pass 96 | 97 | class BaseDataTest(object): 98 | @property 99 | def prog_header(self): 100 | return """ 101 | .include defs.i 102 | 103 | .data 104 | .section data 105 | """ 106 | @property 107 | def prog(self): 108 | return self.prog_header + self.src 109 | def assemble(self): 110 | self.asm = asm.Assembler() 111 | for line in self.prog.splitlines(): 112 | self.asm.feed_line(line) 113 | self.asm.resolve_symbols() 114 | self.prog_bin = self.asm.sections['data'].binary 115 | def dump(self): 116 | # TODO teach this to return offsets & symbols 117 | self.prog_dis = self.prog_bin.split(b'\0') 118 | 119 | class DataTest(BaseDataTest, AcceptTestMixin): pass 120 | class BadDataTest(BaseDataTest, 
RejectTestMixin): pass 121 | 122 | AllTests = [ 123 | 124 | ## PROGRAM TEXT 125 | 126 | # Preprocessing 127 | 128 | AsmTest('Comments', """ 129 | ; Text1 130 | ; Text2 131 | """, []), 132 | AsmTest('Continuation in comment', """ 133 | ; Text1 \ 134 | Text2 135 | """, []), 136 | 137 | # Bogus format 138 | 139 | BadAsmTest('Nonexistent insn', 'frob r0', 'Unrecognised instruction frob'), 140 | BadAsmTest('Label and insn on same line', 'label: exit', 'Unrecognised instruction label:'), 141 | BadAsmTest('Comma before insn', ', exit', 'Unrecognised instruction ,'), 142 | BadAsmTest('Comma after insn', 'ld, r0', 'Unrecognised instruction ld,'), 143 | BadAsmTest('Bad character in label', 'a,:', 'Unrecognised instruction a,:'), 144 | BadAsmTest('Whitespace in label', 'a :', 'Unrecognised instruction a'), 145 | BadAsmTest('Numeric label', '1:', 'Unrecognised instruction 1:'), 146 | BadAsmTest('Invalid label', '-1:', 'Unrecognised instruction -1:'), 147 | BadAsmTest('Data in program section', 'asciz "foo"', 'Unrecognised instruction asciz'), 148 | 149 | # Register-to-register loads 150 | 151 | BadAsmTest('Invalid operand', 'ld r0, :', 'Bad direct operand :'), 152 | BadAsmTest('Empty operand', 'ld , 1', 'Bad direct operand '), 153 | BadAsmTest('Too few args to ld', 'ld r0', 'Bad ld, expected 2 args'), 154 | BadAsmTest('Too many args to ld', 'ld r0, r1, r2', 'Bad ld, expected 2 args'), 155 | 156 | AsmTest('ld reg, imm', """ 157 | ld r1, 2 158 | ld r2, 0x7fffffff.l 159 | ld r3.q, 0x7fffffff00000001 160 | """, [ 161 | (0x18, 1, 0, 0, 2), 162 | (0, 0, 0, 0, 0), 163 | (0xb4, 2, 0, 0, (1<<31) - 1), 164 | (0x18, 3, 0, 0, 1), 165 | (0, 0, 0, 0, (1<<31) - 1), 166 | ]), 167 | 168 | BadAsmTest('Size mismatch in ld reg, imm', 'ld r0.l, 1.q', 'Mismatched sizes'), 169 | BadAsmTest('Word-sized ld reg, imm', 'ld r0.w, 1', 'Bad size w for register load'), 170 | BadAsmTest('Byte-sized ld reg, imm', 'ld r0, 1.b', 'Bad size b for register load'), 171 | BadAsmTest('Offset where imm expected', 'ld 
r0, +1', 'Bad direct operand +1'),
172 |     BadAsmTest('Immediate too big', 'ld r0.l, 0x80000000', 'Value out of range for s32'),
173 |     BadAsmTest('Immediate too big', 'ld r0.l, -0x80000001', 'Value out of range for s32'),
174 |     BadAsmTest('Immediate too big', 'ld r0, 0x10000000000000000', 'Value out of range for u64'),
175 |     BadAsmTest('Negative imm64', 'ld r0, -1', 'Value out of range for u64'),
176 |     BadAsmTest('ld imm, reg', 'ld 0, r0', 'ld imm,... illegal'),
177 |     BadAsmTest('Non-existent register', 'ld r11, 0', 'Bad register r11'),
178 |     BadAsmTest('Double size suffix on ld dst', 'ld r0.l.q, 1', 'Bad direct operand r0.l'),
179 |     BadAsmTest('Double size suffix on ld src_imm', 'ld r0, 1.l.q', 'Bad direct operand 1.l'),
180 |
181 |     AsmTest('ld reg, reg', """
182 |         ld r1, r2
183 |         ld r3.l, r4
184 |         ld r5, r6.l
185 |         ld r7.l, r8.l
186 |         ld r10.q, fp
187 |     """, [
188 |         (0xbf, 1, 2, 0, 0),
189 |         (0xbc, 3, 4, 0, 0),
190 |         (0xbc, 5, 6, 0, 0),
191 |         (0xbc, 7, 8, 0, 0),
192 |         (0xbf, 10, 10, 0, 0),
193 |     ]),
194 |
195 |     BadAsmTest('Size mismatch in ld reg, reg', 'ld r0.l, r1.q', 'Mismatched sizes'),
196 |     BadAsmTest('Word-sized ld reg, reg', 'ld r0.w, r1', 'Bad size w for register load'),
197 |     BadAsmTest('Byte-sized ld reg, reg', 'ld r0, r1.b', 'Bad size b for register load'),
198 |     BadAsmTest('Offset operand without indirection', 'ld r0, r1+1', 'Bad direct operand r1+1'),
199 |     BadAsmTest('Double size suffix on ld src_reg', 'ld r0, r1.l.q', 'Bad direct operand r1.l'),
200 |
201 |     # Register-to-memory loads
202 |
203 |     BadAsmTest('Empty indirection', 'ld [], r0', 'Bad direct operand'),
204 |     BadAsmTest('Missing ]', 'ld [r0, r1', 'Bad indirect operand'),
205 |     BadAsmTest('Missing ] before size', 'ld [r0.l, r1', 'Bad indirect operand'),
206 |     BadAsmTest('Size inside indirection', 'ld [r0.l], r1', 'Bad size in indirect operand'),
207 |     BadAsmTest('Size inside indirection', 'ld [r0.q+0], r1', 'Bad size in offset operand'),
208 |
209 |     AsmTest('ld [ptr], imm', """
210 |         ld
[r1], 2 211 | ld [r1+0x7fff].l, 2 212 | ld [r0+1], 2.w 213 | ld [r0-0x8000].b, -2.b 214 | ld [r0], 0x7fffffff 215 | ld [r0], -0x80000000 216 | """, [ 217 | (0x7a, 1, 0, 0, 2), 218 | (0x62, 1, 0, 32767, 2), 219 | (0x6a, 0, 0, 1, 2), 220 | (0x72, 0, 0, -32768, -2), 221 | (0x7a, 0, 0, 0, (1<<31) - 1), 222 | (0x7a, 0, 0, 0, -(1<<31)), 223 | ]), 224 | 225 | BadAsmTest('Size mismatch in ld [ptr], imm', 'ld [r0].l, 1.q', 'Mismatched sizes'), 226 | BadAsmTest('Offset where imm expected', 'ld [r0], +1', 'Bad direct operand +1'), 227 | BadAsmTest('Immediate too big', 'ld [r0], 0x80000000', 'Value out of range for s32'), 228 | BadAsmTest('Offset too big', 'ld [r0+0x8000], 1', 'Value out of range for s16'), 229 | BadAsmTest('Offset too big', 'ld [r0-0x8001], 1', 'Value out of range for s16'), 230 | BadAsmTest('Size suffix on displacement', 'ld [r0+1.b], 1', 'Bad immediate 1.b'), 231 | BadAsmTest('Double size suffix on ld src_imm', 'ld [r0], 4.w.b', 'Bad direct operand 4.w'), 232 | 233 | AsmTest('ld [ptr], reg', """ 234 | ld [r1], r2 235 | ld [r1+2].l, r3 236 | ld [r1-2], r3.w 237 | ld [fp-1].b, r3.b 238 | """, [ 239 | (0x7b, 1, 2, 0, 0), 240 | (0x63, 1, 3, 2, 0), 241 | (0x6b, 1, 3, -2, 0), 242 | (0x73, 10, 3, -1, 0), 243 | ]), 244 | 245 | BadAsmTest('Size mismatch in ld [ptr], reg', 'ld [r0].l, r1.q', 'Mismatched sizes'), 246 | BadAsmTest('Offset operand without indirection', 'ld [r0], r1+1', 'Bad direct operand r1+1'), 247 | BadAsmTest('Double size suffix on ld src_reg', 'ld [r0], r1.w.b', 'Bad direct operand r1.w'), 248 | 249 | # Memory-to-register loads 250 | 251 | BadAsmTest('Empty indirection', 'ld r0, []', 'Bad direct operand'), 252 | BadAsmTest('Missing ]', 'ld r0, [r1', 'Bad indirect operand'), 253 | BadAsmTest('Missing ] before size', 'ld r0, [r1.l', 'Bad indirect operand'), 254 | BadAsmTest('Size inside indirection', 'ld r0, [r1.l]', 'Bad size in indirect operand'), 255 | BadAsmTest('Size inside indirection', 'ld r0, [r1.q+0]', 'Bad size in offset operand'), 256 | 
257 | AsmTest('ld reg, [ptr]', """ 258 | ld r2, [r1] 259 | ld r3, [r1+2].l 260 | ld r3.w, [r1-2] 261 | ld r3.b, [fp-1].b 262 | """, [ 263 | (0x79, 2, 1, 0, 0), 264 | (0x61, 3, 1, 2, 0), 265 | (0x69, 3, 1, -2, 0), 266 | (0x71, 3, 10, -1, 0), 267 | ]), 268 | 269 | BadAsmTest('ld imm, reg', 'ld 0, [r0]', 'ld imm,... illegal'), 270 | BadAsmTest('Offset too big', 'ld r1, [r0+0x8000]', 'Value out of range for s16'), 271 | BadAsmTest('Offset too big', 'ld r1, [r0-0x8001]', 'Value out of range for s16'), 272 | BadAsmTest('Size mismatch in ld reg, [ptr]', 'ld r1.q, [r0].l', 'Mismatched sizes'), 273 | BadAsmTest('Offset operand without indirection', 'ld r1+1, [r0]', 'Bad direct operand r1+1'), 274 | BadAsmTest('Double size suffix on ld dst', 'ld r0.w.b, [r1]', 'Bad direct operand r0.w'), 275 | BadAsmTest('Size suffix on displacement', 'ld r0, [fp-1.b]', 'Bad immediate 1.b'), 276 | 277 | # ldpkt 278 | 279 | BadAsmTest('Too few args to ldpkt', 'ldpkt r0', 'Bad ldpkt, expected 2 args'), 280 | BadAsmTest('Too many args to ldpkt', 'ldpkt r0, [r1], +2', 'Bad ldpkt, expected 2 args'), 281 | BadAsmTest('ldpkt missing indirection', 'ldpkt r0, r1', 'Bad ldpkt, src must be indirect'), 282 | BadAsmTest('ldpkt missing indirection', 'ldpkt r0, r1+2', 'Bad direct operand r1+2'), 283 | 284 | # LD_ABS 285 | AsmTest('LD_ABS (ldpkt)', """ 286 | ldpkt r0, [1] 287 | ldpkt r0.w, [2] 288 | ldpkt r0, [-0x80000000].b 289 | ldpkt r0.l, [0x7fffffff].l 290 | """, [ 291 | (0x20, 0, 0, 0, 1), 292 | (0x28, 0, 0, 0, 2), 293 | (0x30, 0, 0, 0, -(1 << 31)), 294 | (0x20, 0, 0, 0, (1 << 31) - 1), 295 | ]), 296 | 297 | BadAsmTest('ldpkt bad dst_reg', 'ldpkt r1, [0]', 'ldpkt dst must be r0, not r1'), 298 | BadAsmTest('Extraneous + before disp', 'ldpkt r0, [+2]', 'Bad direct operand'), 299 | BadAsmTest('Displacement too big', 'ldpkt r0, [0x80000000]', 'Value out of range for s32'), 300 | BadAsmTest('Displacement too big', 'ldpkt r0, [-0x80000001]', 'Value out of range for s32'), 301 | BadAsmTest('64-bit ldpkt', 
'ldpkt r0.q, [0]', 'ldpkt .q illegal'), 302 | BadAsmTest('Size mismatch in LD_ABS', 'ldpkt r0.q, [0].l', 'Mismatched sizes'), 303 | BadAsmTest('Size inside LD_ABS indirection', 'ldpkt r0, [0.l]', 'Bad size in indirect operand'), 304 | BadAsmTest('Size suffix on displacement', 'ldpkt r0, [-1.b]', 'Bad size in indirect operand'), 305 | 306 | # LD_IND 307 | AsmTest('LD_IND (ldpkt)', """ 308 | ldpkt r0, [r1] 309 | ldpkt r0.w, [r1+-2] ; that +- isn't pretty but we allow it 310 | ldpkt r0, [r2-0x80000000].b 311 | ldpkt r0.l, [r2+0x7fffffff].l 312 | """, [ 313 | (0x40, 0, 1, 0, 0), 314 | (0x48, 0, 1, 0, -2), 315 | (0x50, 0, 2, 0, -(1 << 31)), 316 | (0x40, 0, 2, 0, (1 << 31) - 1), 317 | ]), 318 | 319 | BadAsmTest('ldpkt bad dst_reg', 'ldpkt r1, [r0]', 'ldpkt dst must be r0, not r1'), 320 | BadAsmTest('Extraneous + before disp', 'ldpkt r0, [r1++2]', 'Bad immediate +2'), 321 | BadAsmTest('Displacement too big', 'ldpkt r0, [r1+0x80000000]', 'Value out of range for s32'), 322 | BadAsmTest('Displacement too big', 'ldpkt r0, [r1-0x80000001]', 'Value out of range for s32'), 323 | BadAsmTest('64-bit ldpkt', 'ldpkt r0.q, [r1]', 'ldpkt .q illegal'), 324 | BadAsmTest('Size mismatch in LD_IND', 'ldpkt r0.q, [r1].l', 'Mismatched sizes'), 325 | BadAsmTest('Size inside LD_IND indirection', 'ldpkt r0, [r1.l]', 'Bad size in indirect operand'), 326 | BadAsmTest('Size suffix on displacement', 'ldpkt r0, [r1-1.b]', 'Bad immediate 1.b'), 327 | 328 | # xadd 329 | 330 | BadAsmTest('Too few args to xadd', 'xadd [r0+0]', 'Bad xadd, expected 2 args'), 331 | BadAsmTest('Too many args to xadd', 'xadd [r0+0], r0, 0', 'Bad xadd, expected 2 args'), 332 | BadAsmTest('xadd missing indirection', 'xadd r0, r1', 'xadd direct_operand,... 
illegal'), 333 | BadAsmTest('xadd missing indirection', 'xadd r0+0, r1', 'Bad direct operand r0+0'), 334 | BadAsmTest('xadd indirect src', 'xadd [r0], [r1]', 'Bad direct operand [r1]'), 335 | 336 | AsmTest('xadd', """ 337 | xadd [r0], r1 338 | xadd [r1+0x7fff].l, r3 339 | xadd [r1-2], r3.l 340 | xadd [r1+-0x8000].l, r3.l 341 | """, [ 342 | (0xdb, 0, 1, 0, 0), 343 | (0xc3, 1, 3, 32767, 0), 344 | (0xc3, 1, 3, -2, 0), 345 | (0xc3, 1, 3, -32768, 0), 346 | ]), 347 | 348 | BadAsmTest('Extraneous + before disp', 'xadd [r0++2], r1', 'Bad immediate +2'), 349 | BadAsmTest('Displacement too big', 'xadd [r1+0x8000], r0', 'Value out of range for s16'), 350 | BadAsmTest('Displacement too big', 'xadd [r1-0x8001], r0', 'Value out of range for s16'), 351 | BadAsmTest('Size mismatch in xadd', 'xadd [r1].q, r0.l', 'Mismatched sizes'), 352 | BadAsmTest('Word-sized xadd', 'xadd [r0].w, r1', 'Bad size w for xadd'), 353 | BadAsmTest('Byte-sized xadd', 'xadd [r0], r1.b', 'Bad size b for xadd'), 354 | 355 | # Jumps 356 | 357 | BadAsmTest('Jump with no args', 'jr', 'Bad jr, expected 1 or 4 args'), 358 | BadAsmTest('Jump with two args', 'jr cc, +1', 'Bad jr, expected 1 or 4 args'), 359 | BadAsmTest('Jump with five args', 'jr nz, r0, r1, 2, +1', 'Bad jr, expected 1 or 4 args'), 360 | 361 | # Unconditional 362 | AsmTest('Unconditional jump', """ 363 | jr +0x7fff 364 | label: 365 | jr -0x8000 366 | jr label 367 | """, [ 368 | (0x05, 0, 0, 32767, 0), 369 | (0x05, 0, 0, -32768, 0), 370 | (0x05, 0, 0, -2, 0), 371 | ]), 372 | 373 | BadAsmTest('Jump offset too big', 'jr +0x8000', 'Value out of range for s16'), 374 | BadAsmTest('Jump offset too big', 'jr -0x8001', 'Value out of range for s16'), 375 | BadAsmTest('Jump to undefined label', 'jr undefined', 'Undefined symbol undefined'), 376 | BadAsmTest('Jump offset missing +', 'jr 1', 'Bad jump offset (missing + sign?)'), 377 | BadAsmTest('Jump offset with two + signs', 'jr ++1', 'Bad immediate +1'), 378 | BadAsmTest('Size suffix on jump offset', 'jr 
+1.b', 'Bad immediate 1.b'), 379 | 380 | # Conditional, BPF_K 381 | AsmTest('Compare immediate and jump', """ 382 | jr z, r1, 0, +1 383 | label: 384 | jr gt, r1, 0x7fffffff, +2 385 | jr eq, r1, -0x80000000, label 386 | jr &, r1, 1, -1 387 | jr sle, fp, 0, +-1 ; that +- isn't pretty but we allow it 388 | """, [ 389 | (0x15, 1, 0, 1, 0), 390 | (0x25, 1, 0, 2, (1 << 31) - 1), 391 | (0x15, 1, 0, -2, -(1 << 31)), 392 | (0x45, 1, 0, -1, 1), 393 | (0xd5, 10, 0, -1, 0), 394 | ]), 395 | 396 | BadAsmTest('Immediate too big', 'jr ge, r1, 0x80000000, +1', 'Value out of range for s32'), 397 | BadAsmTest('Immediate too big', 'jr sgt, r1, -0x80000001, +1', 'Value out of range for s32'), 398 | BadAsmTest('Jump with bogus cc', 'jr foo, r1, 0, +1', 'Bad jump op foo'), 399 | BadAsmTest('jr cc, imm, imm', 'jr nz, 0, 0, +0', 'jr cc,imm,... illegal'), 400 | BadAsmTest('Offset where imm expected', 'jr nz, r0, +0, +0', 'Bad direct operand +0'), 401 | BadAsmTest('Size suffix on compare dst_reg', 'jr z, r0.l, 1, +1', 'Bad size in jump dst'), 402 | BadAsmTest('Size suffix on compare immediate', 'jr z, r0, 1.l, +1', 'Bad size in jump src'), 403 | BadAsmTest('Size suffix on jump offset', 'jr nz, r0, 1, +1.b', 'Bad immediate 1.b'), 404 | 405 | # Conditional, BPF_X 406 | AsmTest('Compare register and jump', """ 407 | jr ne, r1, r2, +1 408 | jr <, r3, r4, -1 409 | jr sge, r0, fp, +0 410 | """, [ 411 | (0x5d, 1, 2, 1, 0), 412 | (0xad, 3, 4, -1, 0), 413 | (0x7d, 0, 10, 0, 0), 414 | ]), 415 | 416 | BadAsmTest('jr cc, imm, reg', 'jr nz, 0, r0, +0', 'jr cc,imm,... 
illegal'),
    BadAsmTest('Jump dst [ptr]', 'jr nz, [r0], r1, +0', 'Bad direct operand [r0]'),
    BadAsmTest('Jump src [ptr]', 'jr nz, r0, [r1], +0', 'Bad direct operand [r1]'),
    BadAsmTest('Size suffix on compare src_reg', 'jr z, r0, r1.l, +1', 'Bad size in jump src'),
    BadAsmTest('Size suffix on jump offset', 'jr nz, r0, r1, +1.b', 'Bad immediate 1.b'),

    # Function calls

    BadAsmTest('Missing arg to call', 'call', 'Bad call, expected 1 arg'),
    BadAsmTest('Too many args to call', 'call 1, 2', 'Bad call, expected 1 arg'),

    AsmTest('Function calls', """
        call 011 ; let's test octal while we're here
        call bpf_map_update_elem
        label:
        call 0x7fffffff
        call -0x80000000
        call label ; BPF-to-BPF call with symbol resolution
        .equ joff, 2
        call joff ; helper call with literal operand
        call -joff ; helper call with literal operand
        call +joff ; BPF-to-BPF call with literal operand
        call +-joff ; BPF-to-BPF call with literal operand
        call +-0x80000000
    """, [
        (0x85, 0, 0, 0, 9),
        (0x85, 0, 0, 0, 2),
        (0x85, 0, 0, 0, (1 << 31) - 1),
        (0x85, 0, 0, 0, -(1 << 31)),
        (0x85, 0, 1, 0, -3),
        (0x85, 0, 0, 0, 2),
        (0x85, 0, 0, 0, -2),
        (0x85, 0, 1, 0, 2),
        (0x85, 0, 1, 0, -2),
        (0x85, 0, 1, 0, -(1 << 31)),
    ]),

    BadAsmTest('Immediate too big', 'call 0x80000000', 'Value out of range for s32'),
    BadAsmTest('Immediate too big', 'call -0x80000001', 'Value out of range for s32'),
    BadAsmTest('Call undefined function', 'call undefined', 'Undefined symbol undefined'),
    BadAsmTest('Call register', 'call r0', 'Undefined symbol r0'), # just parsed as a label or equate
    BadAsmTest('Call undefined offset', 'call +undefined', 'Bad call, expected function identifier, label or offset'),
    BadAsmTest('Call label as offset', 'label:\ncall +label', 'Bad call, expected function identifier, label or offset'),
    BadAsmTest('Size suffix on call number', 'call 1.b', 'Bad call, expected function identifier, label or offset'),

    # Program exit

    BadAsmTest('Too many args to exit', 'exit 1', 'Bad exit, expected no args'),
    # let's take the opportunity to test continuation lines in code, too
    AsmTest('exit', 'ex\\\nit', [(0x95, 0, 0, 0, 0)]),

    # ALU

    # binary ops
    BadAsmTest('Too few args to ALU binary op', 'add r1', 'Bad add, expected 2 args'),
    BadAsmTest('Too many args to ALU binary op', 'sub r1, r2, r3', 'Bad sub, expected 2 args'),

    AsmTest('ALU binary ops, BPF_K', """
        add r1, 2
        sub r2, 0x7fffffff
        and r3, -0x80000000
        xor r4.l, 1
        lsh r5, 03.l ; this treats r5 as a .l, which is slightly odd
        mod r6.l, 0x10.l
        arsh fp, 1.q
    """, [
        (0x07, 1, 0, 0, 2),
        (0x17, 2, 0, 0, (1 << 31) - 1),
        (0x57, 3, 0, 0, -(1 << 31)),
        (0xa4, 4, 0, 0, 1),
        (0x64, 5, 0, 0, 3),
        (0x94, 6, 0, 0, 16),
        (0xc7, 10, 0, 0, 1),
    ]),

    BadAsmTest('ALU indirect dst', 'add [r1], 0', 'Bad direct operand [r1]'),
    BadAsmTest('ALU immediate dst', 'or 1, 0', 'or imm,... illegal'),
    BadAsmTest('Immediate too big', 'mul r1, 0x80000000', 'Value out of range for s32'),
    BadAsmTest('Immediate too big', 'div r1, -0x80000001', 'Value out of range for s32'),
    BadAsmTest('Word-sized ALU', 'add r0.w, 1', 'Bad size w for ALU op'),
    BadAsmTest('Byte-sized ALU', 'add r0.b, 1', 'Bad size b for ALU op'),
    BadAsmTest('Size mismatch in ALU reg, imm', 'add r0.q, 0.l', 'Mismatched sizes'),
    BadAsmTest('Offset where imm expected', 'add r1, +0', 'Bad direct operand +0'),

    AsmTest('ALU binary ops, BPF_X', """
        or r1, r2
        mul r3.l, r4
        rsh r5, r6.l
        div r7.l, r8.l
        add r9, fp.q
    """, [
        (0x4f, 1, 2, 0, 0),
        (0x2c, 3, 4, 0, 0),
        (0x7c, 5, 6, 0, 0),
        (0x3c, 7, 8, 0, 0),
        (0x0f, 9, 10, 0, 0),
    ]),

    BadAsmTest('ALU indirect dst', 'add [r1], r0', 'Bad direct operand [r1]'),
    BadAsmTest('ALU immediate dst', 'xor 1, r0', 'xor imm,... illegal'),
    BadAsmTest('Word-sized ALU', 'add r0.w, r1', 'Bad size w for ALU op'),
    BadAsmTest('Byte-sized ALU', 'add r0.b, r1', 'Bad size b for ALU op'),
    BadAsmTest('Size mismatch in ALU reg, reg', 'add r0.q, r1.l', 'Mismatched sizes'),

    # unary op (neg)
    BadAsmTest('Too few args to neg', 'neg', 'Bad neg, expected 1 arg'),
    BadAsmTest('Too many args to neg', 'neg r1, 0', 'Bad neg, expected 1 arg'),

    AsmTest('ALU unary neg', """
        neg r1
        neg r2.l
        neg fp.q
    """, [
        (0x87, 1, 0, 0, 0),
        (0x84, 2, 0, 0, 0),
        (0x87, 10, 0, 0, 0),
    ]),

    BadAsmTest('neg immediate dst', 'neg 1', 'neg imm illegal'),
    BadAsmTest('neg indirect dst', 'neg [r1]', 'Bad direct operand [r1]'),
    BadAsmTest('Word-sized neg', 'neg r0.w', 'Bad size w for ALU op'),
    BadAsmTest('Byte-sized neg', 'neg r0.b', 'Bad size b for ALU op'),

    # endianness op
    BadAsmTest('Too few args to end', 'end le', 'Bad end, expected 2 args'),
    BadAsmTest('Too many args to end', 'end le, r1.w, r2', 'Bad end, expected 2 args'),

    AsmTest('Endianness op', """
        end le, r1
        end be, r2.w
        end le, fp.q
        end le, r3.l
    """, [
        (0xd4, 1, 0, 0, 64),
        (0xdc, 2, 0, 0, 16),
        (0xd4, 10, 0, 0, 64),
        (0xd4, 3, 0, 0, 32),
    ]),

    BadAsmTest('end immediate dst', 'end le, 1', 'end ..., imm illegal'),
    BadAsmTest('end indirect dst', 'end le, [r1]', 'Bad direct operand [r1]'),
    BadAsmTest('Byte-sized end', 'end le, r0.b', 'Bad size b for endian op'),
    BadAsmTest('Bad endian direction', 'end r1, r2', 'Bad end, expected le or be'),
    BadAsmTest('Size on endian direction', 'end le.l, r0', 'Bad end, expected le or be'),

    ## STATIC DATA

    BadDataTest('Instruction in data section', 'ld r0, 0', 'No such .data insn'),
    BadDataTest('asciz bad type', 'asciz 1', 'asciz takes a string'),
    BadDataTest('asciz malformed', 'asciz "', 'EOL while scanning string literal'),
    BadDataTest('Too few args to asciz', 'asciz', 'unexpected EOF while parsing'),
    BadDataTest('Too many args to asciz', 'asciz "a", "b"', 'asciz takes a string'),

    DataTest('Static strings', """
        strings:
        asciz "foo"
        asciz 'ba"r'
        asciz '''quu'x'''
    """, [b'foo', b'ba"r', b"quu'x", b'']),

    ## ASSEMBLER DIRECTIVES

    # Equates

    BadAsmTest('Too few args to .equ', '.equ name', 'Bad .equ, expected 2 args'),
    BadAsmTest('Too many args to .equ', '.equ name, 1, 2', 'Bad .equ, expected 2 args'),
    BadAsmTest('Malformed equate value', '.equ name, :', 'Bad immediate :'),
    BadAsmTest('Comma after .equ', '.equ, name, 1', 'No such directive .equ,'),
    BadAsmTest('Empty equate name', '.equ , 1', 'Bad .equ name '),
    BadAsmTest('Equate name starts with digit', '.equ 1, 2', 'Bad .equ name 1'),
    BadAsmTest('Equate value undefined', '.equ name, value', 'Bad immediate value'),
    BadAsmTest('Offset where imm expected', '.equ name, +0', 'Bad immediate +0'),

    AsmTest('Equates', """
        .equ foo, 1
        .equ a b, foo
        .equ :, -1
        .equ r1, :
        .equ foo.b, 2
        ld r1, a b
        ld r2, :.l
        ld r3, r1.l ; register name takes priority over equate name
        ld [r4+r1], 1 ; can't be a register, so must be an equate
        ld [r5], foo.b ; resolves to foo
        ld [r6], foo.b.b ; resolves to foo.b
        .equ foo, 6 ; redefining equates is allowed
        ld r1, foo
    """, [
        (0x18, 1, 0, 0, 1),
        (0, 0, 0, 0, 0),
        (0xb4, 2, 0, 0, -1),
        (0xbc, 3, 1, 0, 0),
        (0x7a, 4, 0, -1, 1),
        (0x72, 5, 0, 0, 1),
        (0x72, 6, 0, 0, 2),
        (0x18, 1, 0, 0, 6),
        (0, 0, 0, 0, 0),
    ]),

    BadAsmTest('Size suffix stripping from equate', """
        .equ foo.b, 1
        ld [r1], foo.b ; 'foo' matches the _label_ref_re
    """, 'Value out of range for s32 foo'),
    BadAsmTest('Size suffix stripping from equate', """
        .equ foo.b, 1
        ld [r1], foo.b.b.b
    """, 'Bad direct operand foo.b.b'),
    BadAsmTest('Size suffix on equate value', '.equ foo, 1.b', 'Bad immediate 1.b'),

    # Sections

    AsmTest('Sections', """
        .section foo
        end le, r0
        .section prog
        ld r0, 0
        .data
        .section data
        asciz 'bar'
        .text
        .section prog
        exit
    """, [
        (0x18, 0, 0, 0, 0),
        (0, 0, 0, 0, 0),
        (0x95, 0, 0, 0, 0),
    ]),

    BadAsmTest('Section type mismatch', """
        .data
        .section prog
    """, 'Section prog redefined as different type'),
    AsmTest('Section after maps', """
        .section maps
        .section prog
        exit
    """, [(0x95, 0, 0, 0, 0)]),

]

def run_testset(tests, verbose=False):
    passes = 0
    fails = 0
    for i,test in enumerate(tests):
        try:
            test.run()
            if verbose:
                print("%03d %s PASS" % (i, test))
            passes += 1
        except Exception as e:
            print("%03d %s FAIL %s" % (i, test, e))
            fails += 1
    return passes, fails

if __name__ == '__main__':
    import sys
    verbose = '-v' in sys.argv[1:]
    passes, fails = run_testset(AllTests, verbose=verbose)
    if verbose:
        print("DONE; %d PASS, %d FAIL" % (passes, fails))
    if fails or not passes:
        sys.exit(1)

--------------------------------------------------------------------------------
/release.sh:
--------------------------------------------------------------------------------
#!/bin/sh

set -e

# Run the regression tests to ensure the release is good
./regression.py

vers=`git describe --tags`

reld="ebpf_asm-$vers"
mkdir "$reld"

sed -e "/VERSION =/cVERSION = '$vers'" < ebpf_asm.py > "$reld/ebpf_asm.py"
chmod +x "$reld/ebpf_asm.py"
cp README.md *.i regression.py paren.py "$reld/"

--------------------------------------------------------------------------------
/test.s:
--------------------------------------------------------------------------------
; test.s
; A semantically-nonsense test file for the eBPF assembler

.include defs.i

.text
.section prog
    ld r0.l, [r1+013]
    ld r2, fp
    ld r4.l, 0xfe
    add r0, r4
    ; Let's test writing \
    continuation lines
    mul r1,\
    4
    neg r2
    ld [r0].b, 12
    ld r1, bar
    jr +6
    jr nz, r0, r1, +6
    jr >=, r0, 14, +6
    xadd [r0], r1
foo:
    call bpf_map_update_elem
    exit
    jr foo
.section maps
; ip.src => counter
bar: percpu_hash, 4, 4, 1024, P
.data
.section license
_license:
    asciz "Dual MIT/GPL"

--------------------------------------------------------------------------------