├── .gitignore
├── book.toml
└── src
    ├── SUMMARY.md
    └── process-thread-coroutine.md

/.gitignore:
--------------------------------------------------------------------------------
book

--------------------------------------------------------------------------------
/book.toml:
--------------------------------------------------------------------------------
[book]
authors = ["Rust you don't know authors"]
language = "en"
multilingual = false
src = "src"
title = "rust you don't know"

--------------------------------------------------------------------------------
/src/SUMMARY.md:
--------------------------------------------------------------------------------
# Summary

- [Part 1. From unsafe to safe]()
- [Memory Layout]()
- [Pointer & Reference]()
- [Function Safety]()
- [FFI & ABI]()
- [Memory Allocation]()
- [Case Study]()
- [Part 2. Sync to async]()
- [Process, Thread, and Coroutine](./process-thread-coroutine.md)

--------------------------------------------------------------------------------
/src/process-thread-coroutine.md:
--------------------------------------------------------------------------------
Process, Thread, and Coroutine
---

Before we start discussing asynchronous programming in Rust, we should first
talk about how the operating system organizes and schedules tasks, which will
help us understand the motivation behind the language-level asynchronous
mechanisms.

# Process and thread

People always want to run multiple tasks simultaneously on an OS, even when
there is only one CPU core, because a single task usually cannot keep the whole
core busy. Following that idea, we have to answer two questions to reach the
final design: how to abstract a task, and how to schedule tasks onto the
hardware CPU cores.

Usually, we don't want tasks to affect each other, which means they should run
separately and manage their own states. Since state is stored in memory, each
task must hold its own memory space to achieve this goal. For instance, the
execution flow is a kind of in-memory state, recording the current instruction
position and the on-stack values. In one word, **processes** are tasks with
separate memory spaces on Linux.

Though memory space separation is one of the key features of processes, they
sometimes have to share some memory. First, the kernel code is the same across
all processes, so sharing the kernel part of the memory space removes
unnecessary redundancy. Second, processes need to cooperate, which makes
inter-process communication ([IPC][1]) unavoidable, and most high-performance
IPC mechanisms are some form of memory sharing or transfer. Given these
requirements, sharing the whole memory space across tasks is more convenient in
some scenarios, and that is where threads help.

A process can contain one thread (a single-threaded process) or more. Threads
in a process share the same memory space, which means most state changes are
observable by all of those threads, except for changes on the per-thread
execution stacks. Each thread has its own execution flow and can run on any CPU
core concurrently. The sketch below makes this sharing concrete.
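As an illustration (this snippet is ours, not part of the book's code), four
threads update one counter that lives in their shared memory space, while each
thread keeps its own stack:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // One value on the heap, visible to every thread in this process.
    let counter = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            // Each thread gets its own stack and execution flow,
            // but sees and mutates the same heap memory.
            thread::spawn(move || *counter.lock().unwrap() += 1)
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // All four increments happened in the shared memory space.
    assert_eq!(*counter.lock().unwrap(), 4);
}
```

A separate process, by contrast, would need an explicit IPC channel to observe
the counter at all.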
Now that we know processes and threads are the basic execution units on most
OSes, let's try to run them on the real hardware: the CPU cores.

## Schedule

The first challenge we meet when trying to run processes and threads is limited
hardware resources: the number of CPU cores is small. As of this writing, a
single x86 CPU can run at most 128 tasks at the same time (the [AMD Ryzen™
Threadripper™ PRO 5995WX Processor][2], with 64 cores and 128 hardware
threads). Yet it's easy to create thousands of processes or threads on Linux,
so we have to decide how to place them on the cores and when to stop a task;
that is where the [OS task scheduler][3] helps.

A scheduler may interrupt an executing task regardless of its state and
schedule a new one. This is called preemptive scheduling and is used by most
OSes, including Linux. The advantage is that CPU time slices can be shared
fairly among tasks no matter what they're running, while the tasks themselves
don't need to know about the scheduler at all. To interrupt a running task, a
hardware interrupt, such as a timer interrupt, is necessary.

The other kind is the non-preemptive (cooperative) scheduler, which has to
cooperate with the tasks it schedules. Here tasks are not interrupted; instead,
they decide when to release the computing resource. Tasks usually schedule
themselves out when doing I/O operations, which take a while to complete.
Fairness is hard to guarantee, because a task may run forever without stopping,
in which case other tasks have no opportunity to be scheduled on that core.

No matter which kind of scheduler is used, switching tasks always involves the
following steps:

* Save the current process/thread's execution flow information.
* Change the page table mapping (memory space) and flush the TLB if necessary.
* Restore the new process/thread's execution flow from its previously stored
  state.

With a scheduler in place, an operating system can run tens of thousands of
processes/threads on the limited hardware resources.

# Coroutine

We now have basic knowledge of OS scheduling, and it seems to work fine in most
cases. Next, let's see how it performs in extreme scenarios. Free software
developer [Jim Blandy][4] did an [interesting test][5] to show how much time a
context switch takes on Linux. In the test, the app creates 500 threads,
connects them with pipes into a chain, and then passes a one-byte message from
one end to the other. The whole test runs 10,000 iterations to get a stable
result. The result shows that a thread context switch takes around 1.7µs,
compared to 0.2µs for a Rust async task switch.

This is the first time we mention the "Rust async task", which is a concrete
implementation of the [coroutine][6] in Rust. Coroutines are lightweight tasks
for non-preemptive multitasking, whose execution can be suspended and resumed.
Usually, the task itself decides when to suspend, and then it waits for a
notification to resume. To suspend and resume a task's execution flow, its
execution state has to be saved, just like what the OS does. Saving the CPU
register values is easy for the OS, but not for an application. Rust instead
saves the state into a state machine, and the machine can only be suspended and
resumed at the valid states defined in it. For convenience, we name that state
machine the "Future".
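Before looking at Rust's real API, it may help to see what such a state machine
could look like if written by hand. The following sketch is our own
illustration, not compiler output; the names (`Task`, `Step`, `resume`) are
hypothetical. It models a task that suspends once and then completes with a
value:

```rust
/// A hypothetical hand-written "coroutine": a task encoded as a
/// state machine that suspends once before producing a value.
enum Task {
    Start,
    Suspended, // the task yielded here and waits to be resumed
    Done,
}

/// What one resumption step reports back to the caller.
enum Step {
    Yielded,       // the task suspended itself
    Complete(u32), // the task reached its final state
}

impl Task {
    /// Drive the state machine one step forward. It can only be
    /// suspended and resumed at the valid states listed above.
    fn resume(&mut self) -> Step {
        match self {
            Task::Start => {
                *self = Task::Suspended;
                Step::Yielded // release the CPU back to the caller
            }
            Task::Suspended => {
                *self = Task::Done;
                Step::Complete(0)
            }
            Task::Done => panic!("resumed a finished task"),
        }
    }
}

fn main() {
    let mut task = Task::Start;
    // The caller decides when to resume: cooperative scheduling.
    assert!(matches!(task.resume(), Step::Yielded));
    assert!(matches!(task.resume(), Step::Complete(0)));
}
```

`Step` plays the role that `Poll` plays in the real API below: `Yielded`
corresponds to `Pending`, and `Complete` to `Ready`.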
## Future

We all know that a `Future` is the data structure returned by an async
function; an async block is also a future. When we get it, it does nothing yet:
it's just a plan, a blueprint telling us what it's going to do. Let's see the
example below:

```rust
async fn async_fn() -> u32 {
    0
}
```

We can't see any "Future" structure in the function definition, but the
compiler will translate the signature into one that returns a future:

```rust
fn async_fn() -> impl Future<Output = u32> {
    // ...
}
```

The Rust compiler does us a great favor by generating the state machine for us.
Here's the `Future` API from the [standard library][7]:

```rust
pub trait Future {
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

pub enum Poll<T> {
    Ready(T),
    Pending,
}
```

The `poll` function tries to drive the state machine forward until the final
result `Output` is returned. The state machine is a black box for the caller of
`poll`: `Poll::Pending` means it's not in the final state yet, and
`Poll::Ready(T)` means it is. Whenever `Poll::Pending` is returned, the
coroutine is suspended, and every call to `poll` tries to resume it.

## Runtime

Since `Future`s are state machines, there should be a driver that pushes the
machine's state forward. Though we could write that driver manually, `poll`ing
the `Future`s one by one until we get the final results, such work should be
done once and reused everywhere; as a result, the `runtime` comes in. A Rust
async runtime handles the following tasks:

1. Drive the received `Future`s forward.
2. Park (store) the blocked `Future`s.
3. Receive notifications to restore and resume the blocked `Future`s.

# Summary

In this chapter, we learned that "Rust async" is a way to schedule tasks, and
that the execution state is stored in a state machine named `Future`. In the
next chapters, we'll discuss how the compiler automatically generates `Future`s
and the optimizations applied to them.

[1]: https://en.wikipedia.org/wiki/Inter-process_communication
[2]: https://www.amd.com/en/products/cpu/amd-ryzen-threadripper-pro-5995wx
[3]: https://en.wikipedia.org/wiki/Scheduling_(computing)
[4]: https://www.red-bean.com/jimb/
[5]: https://github.com/jimblandy/context-switch
[6]: https://en.wikipedia.org/wiki/Coroutine
[7]: https://doc.rust-lang.org/std/future/trait.Future.html
--------------------------------------------------------------------------------