├── .gitignore
├── book.toml
└── src
    ├── SUMMARY.md
    └── process-thread-coroutine.md

/.gitignore:
--------------------------------------------------------------------------------
book

--------------------------------------------------------------------------------
/book.toml:
--------------------------------------------------------------------------------
[book]
authors = ["Rust you don't know authors"]
language = "en"
multilingual = false
src = "src"
title = "rust you don't know"

--------------------------------------------------------------------------------
/src/SUMMARY.md:
--------------------------------------------------------------------------------
# Summary

- [Part 1. From unsafe to safe]()
- [Memory Layout]()
- [Pointer & Reference]()
- [Function Safety]()
- [FFI & ABI]()
- [Memory Allocation]()
- [Case Study]()
- [Part 2. Sync to async]()
- [Process, Thread, and Coroutine](./process-thread-coroutine.md)

--------------------------------------------------------------------------------
/src/process-thread-coroutine.md:
--------------------------------------------------------------------------------
Process, Thread, and Coroutine
---

Before we start discussing asynchronous programming in Rust, we should first
talk about how the operating system organizes and schedules tasks, which will
help us understand the motivation behind the language-level asynchronous
mechanisms.

# Process and thread

People always want to run multiple tasks simultaneously on an OS, even when
there is only one CPU core, because a single task usually cannot keep the whole
core busy. Following that idea, we have to answer two questions to reach the
final design: how to abstract a task, and how to schedule tasks onto the
hardware CPU cores.

Usually, we don't want tasks to affect each other, which means they should run
separately and manage their own states. Since state is stored in memory, each
task must hold its own memory space to achieve this goal. For instance, the
execution flow is a kind of in-memory state, recording the current instruction
position and the on-stack values. In one word, **processes** are tasks with
separate memory spaces on Linux.

Though memory space separation is one of the key features of processes, they
sometimes have to share some memory. First, the kernel code is the same across
all processes, so sharing the kernel part of the memory space removes
unnecessary redundancy. Second, processes need to cooperate, which makes
inter-process communication ([IPC][1]) unavoidable, and most high-performance
IPC mechanisms are some form of memory sharing or transfer. Given these
requirements, sharing the whole memory space across tasks is more convenient in
some scenarios, and that is where threads help.

A process can contain one thread (a single-threaded process) or more. Threads
in a process share the same memory space, which means most state changes are
observable by all of those threads, except for changes on the per-thread
execution stacks. Each thread has its own execution flow and can run on any CPU
core concurrently. The sketch below makes this sharing concrete.
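As an illustration (this snippet is ours, not part of the book's code), four
threads update one counter that lives in their shared memory space, while each
thread keeps its own stack:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // One value on the heap, visible to every thread in this process.
    let counter = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            // Each thread gets its own stack and execution flow,
            // but sees and mutates the same heap memory.
            thread::spawn(move || *counter.lock().unwrap() += 1)
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // All four increments happened in the shared memory space.
    assert_eq!(*counter.lock().unwrap(), 4);
}
```

A separate process, by contrast, would need an explicit IPC channel to observe
the counter at all.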
Now that we know processes and threads are the basic execution units on most
OSes, let's try to run them on the real hardware: the CPU cores.

## Schedule

The first challenge we meet when trying to run processes and threads is limited
hardware resources: the number of CPU cores is small. As of this writing, a
single x86 CPU can run at most 128 tasks at the same time (the [AMD Ryzen™
Threadripper™ PRO 5995WX Processor][2], with 64 cores and 128 hardware
threads). Yet it's easy to create thousands of processes or threads on Linux,
so we have to decide how to place them on the cores and when to stop a task;
that is where the [OS task scheduler][3] helps.

A scheduler may interrupt an executing task regardless of its state and
schedule a new one. This is called preemptive scheduling and is used by most
OSes, including Linux. The advantage is that CPU time slices can be shared
fairly among tasks no matter what they're running, while the tasks themselves
don't need to know about the scheduler at all. To interrupt a running task, a
hardware interrupt, such as a timer interrupt, is necessary.

The other kind is the non-preemptive (cooperative) scheduler, which has to
cooperate with the tasks it schedules. Here tasks are not interrupted; instead,
they decide when to release the computing resource. Tasks usually schedule
themselves out when doing I/O operations, which take a while to complete.
Fairness is hard to guarantee, because a task may run forever without stopping,
in which case other tasks have no opportunity to be scheduled on that core.

No matter which kind of scheduler is used, switching tasks always involves the
following steps:

* Save the current process/thread's execution flow information.
* Change the page table mapping (memory space) and flush the TLB if necessary.
* Restore the new process/thread's execution flow from its previously stored
  state.

With a scheduler in place, an operating system can run tens of thousands of
processes/threads on the limited hardware resources.

# Coroutine

We now have basic knowledge of OS scheduling, and it seems to work fine in most
cases. Next, let's see how it performs in extreme scenarios. Free software
developer [Jim Blandy][4] did an [interesting test][5] to show how much time a
context switch takes on Linux. In the test, the app creates 500 threads,
connects them with pipes into a chain, and then passes a one-byte message from
one end to the other. The whole test runs 10,000 iterations to get a stable
result. The result shows that a thread context switch takes around 1.7µs,
compared to 0.2µs for a Rust async task switch.

This is the first time we mention the "Rust async task", which is a concrete
implementation of the [coroutine][6] in Rust. Coroutines are lightweight tasks
for non-preemptive multitasking, whose execution can be suspended and resumed.
Usually, the task itself decides when to suspend, and then it waits for a
notification to resume. To suspend and resume a task's execution flow, its
execution state has to be saved, just like what the OS does. Saving the CPU
register values is easy for the OS, but not for an application. Rust instead
saves the state into a state machine, and the machine can only be suspended and
resumed at the valid states defined in it. For convenience, we name that state
machine the "Future".
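Before looking at Rust's real API, it may help to see what such a state machine
could look like if written by hand. The following sketch is our own
illustration, not compiler output; the names (`Task`, `Step`, `resume`) are
hypothetical. It models a task that suspends once and then completes with a
value:

```rust
/// A hypothetical hand-written "coroutine": a task encoded as a
/// state machine that suspends once before producing a value.
enum Task {
    Start,
    Suspended, // the task yielded here and waits to be resumed
    Done,
}

/// What one resumption step reports back to the caller.
enum Step {
    Yielded,       // the task suspended itself
    Complete(u32), // the task reached its final state
}

impl Task {
    /// Drive the state machine one step forward. It can only be
    /// suspended and resumed at the valid states listed above.
    fn resume(&mut self) -> Step {
        match self {
            Task::Start => {
                *self = Task::Suspended;
                Step::Yielded // release the CPU back to the caller
            }
            Task::Suspended => {
                *self = Task::Done;
                Step::Complete(0)
            }
            Task::Done => panic!("resumed a finished task"),
        }
    }
}

fn main() {
    let mut task = Task::Start;
    // The caller decides when to resume: cooperative scheduling.
    assert!(matches!(task.resume(), Step::Yielded));
    assert!(matches!(task.resume(), Step::Complete(0)));
}
```

`Step` plays the role that `Poll` plays in the real API below: `Yielded`
corresponds to `Pending`, and `Complete` to `Ready`.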
## Future

We all know that a `Future` is the data structure returned by an async
function; an async block is also a future. When we get it, it does nothing yet:
it's just a plan, a blueprint telling us what it's going to do. Let's see the
example below:

```rust
async fn async_fn() -> u32 {
    0
}
```

We can't see any "Future" structure in the function definition, but the
compiler will translate the signature into one that returns a future:

```rust
fn async_fn() -> impl Future<Output = u32> {
    // ...
}
```

The Rust compiler does us a great favor by generating the state machine for us.
Here's the `Future` API from the [standard library][7]:

```rust
pub trait Future {
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

pub enum Poll<T> {
    Ready(T),
    Pending,
}
```

The `poll` function tries to drive the state machine forward until the final
result `Output` is returned. The state machine is a black box for the caller of
`poll`: `Poll::Pending` means it's not in the final state yet, and
`Poll::Ready(T)` means it is. Whenever `Poll::Pending` is returned, the
coroutine is suspended, and every call to `poll` tries to resume it.

## Runtime

Since `Future`s are state machines, there should be a driver that pushes the
machine's state forward. Though we could write that driver manually, `poll`ing
the `Future`s one by one until we get the final results, such work should be
done once and reused everywhere; as a result, the `runtime` comes in. A Rust
async runtime handles the following tasks:

1. Drive the received `Future`s forward.
2. Park (store) the blocked `Future`s.
3. Receive notifications to restore and resume the blocked `Future`s.

# Summary

In this chapter, we learned that "Rust async" is a way to schedule tasks, and
that the execution state is stored in a state machine named `Future`. In the
next chapters, we'll discuss how the compiler automatically generates `Future`s
and the optimizations applied to them.

[1]: https://en.wikipedia.org/wiki/Inter-process_communication
[2]: https://www.amd.com/en/products/cpu/amd-ryzen-threadripper-pro-5995wx
[3]: https://en.wikipedia.org/wiki/Scheduling_(computing)
[4]: https://www.red-bean.com/jimb/
[5]: https://github.com/jimblandy/context-switch
[6]: https://en.wikipedia.org/wiki/Coroutine
[7]: https://doc.rust-lang.org/std/future/trait.Future.html
--------------------------------------------------------------------------------