├── LICENSE
├── Makefile
├── README.md
├── REPORT.pdf
├── bin
    └── .gitignore
├── inc
    ├── common.h
    ├── context.h
    └── coroutine_pool.h
├── lib
    └── context.S
└── src
    ├── binary_search.cpp
    ├── sample.cpp
    └── sleep_sort.cpp


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2023 KujoStar
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
 1 | HEADERS=$(wildcard inc/*.h)
 2 | TARGETS=bin/sample bin/sleep_sort bin/binary_search
 3 | 
 4 | FLAGS=-O2 -g -Iinc -pthread
 5 | 
 6 | all: ${TARGETS}
 7 | 
 8 | bin/%: src/%.cpp lib/context.S ${HEADERS}
 9 | 	g++ --std=c++17 -fno-omit-frame-pointer $< lib/context.S ${FLAGS} -o $@
10 | 
11 | clean:
12 | 	rm -rf ${TARGETS}


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
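  1 | # Introduction to Computer Systems, Fall 2022: Coroutine Lab
  2 | 
  3 | In this lab you will use assembly and inline assembly to build a simple user-space stackful coroutine library on the x86_64 architecture, then use that library to implement sleep sort (sleep_sort) and to optimize binary search.
  4 | 
  5 | The lab consists of three subtasks. Complete them in order and write a lab report.
  6 | 
  7 | The three subtasks are:
  8 | 
  9 | 1. Implement the coroutine library and pass the basic test.
 10 | 2. Add a sleep function to the coroutine library and pass the sleep_sort test.
 11 | 3. Use the coroutine library to optimize binary search and report the results.
 12 | 
 13 | Write your lab report in Markdown in the file REPORT.md, then compress the whole directory into a zip archive and submit it to 网络学堂 (Web Learning).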
 14 | 
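 15 | ## Environment Setup
 16 | 
 17 | The lab runs on x86_64 Linux. You can complete it on the remote Linux server provided by the course.
 18 | 
 19 | Building requires GCC and Make. Run `make` in the root directory; the resulting binaries are placed in the `bin` directory.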
 20 | 
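 21 | ## Task 1: Writing the Coroutine Library
 22 | 
 23 | Read the framework code and its comments carefully, then implement the functions marked `TODO: Task 1`.
 24 | 
 25 | We suggest implementing this part in two steps:
 26 | 
 27 | 1. Implement the part of `coroutine_pool::serial_execute_all` that starts coroutines, and the assembly code in `context.S` that switches between coroutines. With these two pieces you have a "coroutine library" without yield. At this point you can run a simple test to check that coroutine stacks are allocated on the heap and that coroutines return correctly.
 28 | 2. Implement `yield` and finish `coroutine_pool::serial_execute_all`, so that coroutines can yield, resume, and be scheduled again.
 29 | 
 30 | The code comments describe the implementation details of this part at some length; if you get stuck, refer to the hints there.
 31 | 
 32 | Additional requirements for the lab report:
 33 | 
 34 | 1. Draw how the stack changes during a coroutine switch;
 35 | 2. With reference to the source code, explain how a coroutine starts executing, including the `coroutine_entry` and `coroutine_main` functions and the initial coroutine state.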
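
As a starting point for the analysis in item 2, the initial stack pointer computed in the `basic_context` constructor can be reproduced in isolation. This is a minimal sketch that assumes the same arithmetic as `inc/context.h` (`rsp - (rsp & 0xF)`); `align_down_16` and `initial_rsp` are hypothetical helper names, not part of the framework.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round an address down to a 16-byte boundary,
// mirroring `rsp = rsp - (rsp & 0xF)` in the basic_context constructor.
inline uint64_t align_down_16(uint64_t addr) { return addr - (addr & 0xF); }

// Hypothetical helper: compute the initial stack pointer for a stack made of
// `stack_size` 8-byte slots, starting from the address of the last slot.
inline uint64_t initial_rsp(const uint64_t *stack, size_t stack_size) {
  assert(stack_size > 0);
  uint64_t top = reinterpret_cast<uint64_t>(&stack[stack_size - 1]);
  return align_down_16(top);
}
```

Because `coroutine_entry` reaches `coroutine_main` through `callq`, which pushes an 8-byte return address, `coroutine_main` begins with `rsp` equal to 8 modulo 16. That is the state the System V x86_64 calling convention expects at function entry.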
 36 | 
 37 | After completing Task 1, `bin/sample` should run correctly and produce the following output:
 38 | 
 39 | ```
 40 | in show(): 0
 41 | in show(): 0
 42 | in show(): 1
 43 | in show(): 1
 44 | in show(): 2
 45 | in show(): 2
 46 | in show(): 3
 47 | in show(): 3
 48 | in show(): 4
 49 | in show(): 4
 50 | in main(): 0
 51 | in main(): 0
 52 | in main(): 1
 53 | in main(): 1
 54 | in main(): 2
 55 | in main(): 2
 56 | in main(): 3
 57 | in main(): 3
 58 | in main(): 4
 59 | in main(): 4
 60 | ```
 61 | 
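 62 | ## Task 2: Implementing the sleep Function
 63 | 
 64 | Inside a coroutine you cannot use the sleep function provided by the operating system, because it blocks the whole thread. The desired behavior is to switch to other coroutines and resume this one once the sleep time has elapsed.
 65 | 
 66 | The coroutine library therefore provides its own sleep function. It marks the current coroutine with `ready = false`, registers a `ready_func` that checks whether the current time has passed the time at which execution should resume, and then calls `yield`. `coroutine_pool::serial_execute_all` must then check each coroutine's state: if `ready == true`, the coroutine can continue; if `ready == false`, it calls `ready_func`, and if that returns `true`, it sets `ready = true` and switches to the coroutine.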
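
The `ready_func` described above is essentially a closure over a deadline. The following is a minimal sketch of that idea, not the framework's code: `make_ready_func` is a hypothetical helper, and it uses `std::chrono::steady_clock` (an assumption, chosen because it is monotonic) whereas the framework's `get_time()` uses `system_clock`.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>

// Hypothetical helper: build a predicate that becomes true once at least
// `ms` milliseconds have elapsed since the predicate was created.
inline std::function<bool()> make_ready_func(uint64_t ms) {
  using clock = std::chrono::steady_clock;
  auto delay = std::chrono::milliseconds(
      static_cast<std::chrono::milliseconds::rep>(ms));
  // Capture the absolute deadline so each later check is a single comparison.
  auto deadline = clock::now() + delay;
  return [deadline]() { return clock::now() >= deadline; };
}
```

A scheduler would call the returned predicate each time it visits a coroutine whose `ready` flag is `false`, and resume the coroutine only once the predicate returns `true`.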
 67 | 
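 68 | Implement the `sleep` function (see the code comments for its exact definition) and modify `coroutine_pool::serial_execute_all` to check `ready`. Then pass the sleep_sort test.
 69 | 
 70 | **Hint**: you can compare `parallel_execute_all` with `serial_execute_all`.
 71 | 
 72 | Additional requirements for the lab report:
 73 | 
 74 | 1. Draw a timeline showing how the different coroutines in `sleep_sort` run;
 75 | 2. The current coroutine library polls `ready_func` until it returns `true`. Consider whether a more efficient design is possible.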
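
One possible direction for item 2, offered only as a sketch, is to keep sleeping coroutines in a min-heap ordered by wake-up time, so the scheduler inspects only the earliest deadline instead of polling every `ready_func`. `sleep_queue` and its members are hypothetical names and are not part of the framework.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using wake_time = std::chrono::steady_clock::time_point;

// Hypothetical structure: a min-heap of (wake-up time, coroutine index).
struct sleep_queue {
  using entry = std::pair<wake_time, size_t>;
  std::priority_queue<entry, std::vector<entry>, std::greater<entry>> heap;

  void add(wake_time wake_at, size_t id) { heap.emplace(wake_at, id); }

  bool empty() const { return heap.empty(); }

  // Earliest wake-up time. When nothing else is runnable, a scheduler could
  // block until this point instead of spinning. Requires a non-empty queue.
  wake_time next_wake() const { return heap.top().first; }

  // Pop and return the ids of all coroutines whose wake-up time has passed,
  // in order of increasing wake-up time.
  std::vector<size_t> pop_ready(wake_time now) {
    std::vector<size_t> ready;
    while (!heap.empty() && heap.top().first <= now) {
      ready.push_back(heap.top().second);
      heap.pop();
    }
    return ready;
  }
};
```

With this layout each scheduling round costs time proportional to the number of coroutines that actually wake up (plus the heap operations), rather than to the total number of sleeping coroutines.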
 76 | 
 77 | After completing Task 2, `bin/sleep_sort` should run correctly: you enter a set of numbers and the program prints them in ascending order. Here is an example with the input 1, 3, 4, 5, 2:
 78 | 
 79 | ```
 80 | $ ./bin/sleep_sort
 81 | 5
 82 | 1 3 4 5 2
 83 | 1
 84 | 2
 85 | 3
 86 | 4
 87 | 5
 88 | ```
 89 | 
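 90 | ## Task 3: Optimizing Binary Search with Coroutines
 91 | 
 92 | When the data set is large, binary search incurs many cache misses, and loading data from memory into the CPU takes hundreds of CPU cycles. Coroutines can be used to hide this latency. The idea is to change the code that is likely to miss the cache so that it first issues a prefetch instruction, letting the CPU fetch the data asynchronously, and then immediately calls `yield` to switch to another coroutine. By the time control returns, after several switches, to the coroutine that issued the prefetch, the CPU has already loaded the data into the cache, which saves a lot of time.
 93 | 
 94 | Modify the `lookup_coroutine` function by inserting code that implements the optimization described above, and report the performance improvement in your lab report. You can set different parameters on the command line and observe the performance under each.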
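
To illustrate the prefetch-then-switch idea without depending on the coroutine library, the sketch below interleaves a batch of lookups by hand. It is not the lab's `lookup_coroutine`: `lookup_state` and `lookup_interleaved` are hypothetical names, and a plain loop over per-lookup state stands in for `yield`. It assumes GCC or Clang (for `__builtin_prefetch`) and that every lookup in the batch searches the same sorted table.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-lookup state, standing in for a coroutine's stack frame.
struct lookup_state {
  size_t low;
  size_t size;
  uint32_t value;
};

// Run a batch of binary searches in lockstep over one sorted table.
void lookup_interleaved(const uint32_t *table, size_t size,
                        const std::vector<uint32_t> &values,
                        std::vector<uint32_t> &results) {
  std::vector<lookup_state> states;
  for (uint32_t v : values) states.push_back({0, size, v});

  // Every lookup starts with the same size, so all finish in the same round.
  while (size / 2 > 0) {
    // Phase 1: issue one prefetch per lookup (the point where a coroutine
    // version would call yield).
    for (const auto &s : states)
      __builtin_prefetch(table + s.low + s.size / 2);
    // Phase 2: consume the probes, which by now may already be in cache.
    for (auto &s : states) {
      size_t half = s.size / 2;
      size_t probe = s.low + half;
      if (table[probe] <= s.value) s.low = probe;
      s.size -= half;
    }
    size -= size / 2;
  }

  results.clear();
  for (const auto &s : states)
    results.push_back(static_cast<uint32_t>(s.low));
}
```

The coroutine version achieves the same overlap while keeping each lookup written as ordinary sequential code, which is the main appeal of the approach in the paper cited below.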
 95 | 
 96 | Additional requirements for the lab report:
 97 | 
 98 | 1. Report the performance improvement.
 99 | 
100 | You may refer to the following paper:
101 | 
102 | Georgios Psaropoulos, Thomas Legler, Norman May, and Anastasia Ailamaki. 2017. Interleaving with coroutines: a practical approach for robust index joins. Proc. VLDB Endow. 11, 2 (October 2017), 230–242. https://doi.org/10.14778/3149193.3149202
103 | 
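104 | ## Lab Report
105 | 
106 | Write your lab report in Markdown in the file REPORT.md. It should include:
107 | 
108 | 1. Your name, student ID, and class;
109 | 2. Three sections, one per subtask, recording the code you added with detailed comments; each subtask also has additional report requirements, described above;
110 | 3. A record of which classmates you discussed the lab with and which websites or code you consulted;
111 | 4. A summary and your reflections.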


--------------------------------------------------------------------------------
/REPORT.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KujoStar/ICS_coroutinelab/fda6f9afe8504e26db034036d3747a98e47eae92/REPORT.pdf


--------------------------------------------------------------------------------
/bin/.gitignore:
--------------------------------------------------------------------------------
1 | *
2 | !.gitignore
3 | 


--------------------------------------------------------------------------------
/inc/common.h:
--------------------------------------------------------------------------------
 1 | #pragma once
 2 | #include "context.h"
 3 | #include "coroutine_pool.h"
 4 | #include <chrono>
 5 | #include <cstdlib>
 6 | // Get the current time
 7 | auto get_time() { return std::chrono::system_clock::now(); }
 8 | 
 9 | /**
10 |  * @brief The yield function
11 |  *
12 |  * TODO: Task 1
13 |  * The coroutine voluntarily suspends itself, saving its registers and stack frame.
14 |  * Control switches to the context of coroutine_pool.serial_execute_all(), which
15 |  * schedules the next coroutine.
16 |  */
17 | void yield() {
18 |   if (!g_pool->is_parallel) {
19 |     // Get the current coroutine's context from g_pool
20 |     auto context = g_pool->coroutines[g_pool->context_id];
21 |     // Call coroutine_switch to switch back to the coroutine_pool context
22 |     coroutine_switch(context->callee_registers, context->caller_registers);
23 |   }
24 | }
25 | 
26 | /**
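27 |  * @brief The sleep function
28 |  *
29 |  * TODO: Task 2
30 |  * Implement the sleep function.
31 |  * It does the following:
32 |  *  1. Marks the coroutine as not ready.
33 |  *  2. Yields the coroutine.
34 |  *  3. Marks the coroutine as ready again after at least @param ms milliseconds.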
35 |  */
36 | void sleep(uint64_t ms) {
37 |   if (g_pool->is_parallel) {
38 |     auto cur = get_time();
39 |     while (std::chrono::duration_cast<std::chrono::milliseconds>(get_time() - cur).count() < ms)
40 |       ;
41 |   } else {
42 |     // Get the current coroutine's context from g_pool
43 |     auto context = g_pool->coroutines[g_pool->context_id];
44 | 
45 |     // Record the current time and install ready_func, which returns true
46 |     // once at least ms milliseconds have elapsed
47 |     auto current_time = get_time();
48 |     context->ready = false;
49 |     context->ready_func = [ms, current_time]() {return std::chrono::duration_cast<std::chrono::milliseconds>(get_time() - current_time).count() >= ms;};
50 | 
51 |     // yield() calls coroutine_switch to return to the coroutine_pool context
52 |     yield();
53 |   }
54 | }
55 | 


--------------------------------------------------------------------------------
/inc/context.h:
--------------------------------------------------------------------------------
  1 | #pragma once
  2 | #include <assert.h>
  3 | #include <cstdint>
  4 | #include <functional>
  5 | #include <tuple>
  6 | #include <type_traits>
  7 | 
  8 | enum class Registers : int {
  9 |   RAX = 0,
 10 |   RDI,
 11 |   RSI,
 12 |   RDX,
 13 |   R8,
 14 |   R9,
 15 |   R10,
 16 |   R11,
 17 |   RSP,
 18 |   RBX,
 19 |   RBP,
 20 |   R12,
 21 |   R13,
 22 |   R14,
 23 |   R15,
 24 |   RIP, // address of the next instruction to execute
 25 |   RegisterCount
 26 | };
 27 | 
 28 | extern "C" {
 29 | void coroutine_entry(); // enter a coroutine
 30 | void coroutine_switch(uint64_t *save, uint64_t *restore); // switch coroutines
 31 | }
 32 | 
 33 | struct basic_context {
 34 |   uint64_t *stack;
 35 |   uint64_t stack_size;
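 36 |   uint64_t caller_registers[(int)Registers::RegisterCount]; // register state before the call
 37 |   uint64_t callee_registers[(int)Registers::RegisterCount]; // register state while the coroutine is running
 38 |   bool finished; // whether the coroutine has finished
 39 |   bool ready; // whether the coroutine can run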
 40 |   std::function<bool()> ready_func;
 41 | 
 42 |   basic_context(uint64_t stack_size)
 43 |       : finished(false), ready(true), stack_size(stack_size) {
 44 |     stack = new uint64_t[stack_size];
 45 | 
 46 |     // TODO: Task 1
 47 |     // Analyze the following code in your lab report
 48 |     // Align to a 16-byte boundary
 49 |     uint64_t rsp = (uint64_t)&stack[stack_size - 1];
 50 |     rsp = rsp - (rsp & 0xF);
 51 | 
 52 |     void coroutine_main(struct basic_context * context);
 53 | 
 54 |     callee_registers[(int)Registers::RSP] = rsp;
 55 |     // The coroutine entry point is coroutine_entry
 56 |     callee_registers[(int)Registers::RIP] = (uint64_t)coroutine_entry;
 57 |     // Store the address of coroutine_main in r12
 58 |     callee_registers[(int)Registers::R12] = (uint64_t)coroutine_main;
 59 |     // Store the argument for coroutine_main in r13
 60 |     callee_registers[(int)Registers::R13] = (uint64_t)this;
 61 |   }
 62 | 
 63 |   virtual ~basic_context() { delete[] stack; } // virtual: contexts are deleted through basic_context*
 64 | 
 65 |   virtual void run() = 0;
 66 |   virtual void resume() = 0;
 67 | };
 68 | 
 69 | // TODO: Task 1
 70 | // Analyze the following code in your lab report
 71 | void coroutine_main(struct basic_context *context) {
 72 |   context->run();
 73 |   context->finished = true;
 74 |   coroutine_switch(context->callee_registers, context->caller_registers);
 75 | 
 76 |   // unreachable
 77 |   assert(false);
 78 | }
 79 | 
 80 | extern __thread basic_context *g_current_context;
 81 | 
 82 | // boilerplate code to handle variadic function arguments
 83 | #define EXPAND_CALL_0(args)
 84 | #define EXPAND_CALL_1(args) (std::get<0>(args))
 85 | #define EXPAND_CALL_2(args) EXPAND_CALL_1(args), (std::get<1>(args))
 86 | #define EXPAND_CALL_3(args) EXPAND_CALL_2(args), (std::get<2>(args))
 87 | #define EXPAND_CALL_4(args) EXPAND_CALL_3(args), (std::get<3>(args))
 88 | #define EXPAND_CALL_5(args) EXPAND_CALL_4(args), (std::get<4>(args))
 89 | #define EXPAND_CALL_6(args) EXPAND_CALL_5(args), (std::get<5>(args))
 90 | #define EXPAND_CALL_7(args) EXPAND_CALL_6(args), (std::get<6>(args))
 91 | 
 92 | #define CALLER_IMPL(func, x, args)                                             \
 93 |   if constexpr (std::tuple_size_v<std::decay_t<decltype(args)>> == x)          \
 94 |   func(EXPAND_CALL_##x(args))
 95 | 
 96 | #define CALL(func, args)                                                       \
 97 |   CALLER_IMPL(func, 0, args);                                                  \
 98 |   CALLER_IMPL(func, 1, args);                                                  \
 99 |   CALLER_IMPL(func, 2, args);                                                  \
100 |   CALLER_IMPL(func, 3, args);                                                  \
101 |   CALLER_IMPL(func, 4, args);                                                  \
102 |   CALLER_IMPL(func, 5, args);                                                  \
103 |   CALLER_IMPL(func, 6, args);                                                  \
104 |   CALLER_IMPL(func, 7, args);
105 | 
106 | /**
107 |  * @brief
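108 |  * Runtime resource management for a coroutine: stores the coroutine function, its runtime stack, its register contents, and so on.
109 |  *
110 |  * @tparam F type of the coroutine function
111 |  * @tparam Args parameter list required by the coroutine function
112 |  * Currently, at most 7 arguments can be expanded for the coroutine function.
113 |  * To support more arguments:
114 |  *   1. Extend the CALL macro and add the corresponding EXPAND_CALL_X macros.
115 |  *   2. Update the static_assert in the constructors.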
116 |  */
117 | template <typename F, typename... Args>
118 | struct coroutine_context : public basic_context {
119 |   F f;
120 |   std::tuple<Args...> args;
121 | 
122 |   // construct a stackful coroutine with a 16 KB stack
123 |   coroutine_context(F f, Args... args)
124 |       : f(f), args(std::tuple<Args...>(args...)),
125 |         basic_context(16 * 1024 / sizeof(uint64_t)) {
126 |     static_assert(sizeof...(args) <= 7);
127 |   }
128 | 
129 |   // construct a stackful coroutine with a stack of stack_size KB
130 |   coroutine_context(uint64_t stack_size, F f, Args... args)
131 |       : f(f), args(std::tuple<Args...>(args...)),
132 |         basic_context(stack_size * 1024 / sizeof(uint64_t)) {
133 |     static_assert(sizeof...(args) <= 7);
134 |   }
135 | 
136 |   /**
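137 |    * @brief Resume execution of the coroutine function.
138 |    * TODO: Task 1
139 |    * Save the callee-saved registers, set up the coroutine function's stack frame, and
140 |    * restore rip to the address of the instruction to execute after the coroutine's yield.
141 |    */
142 |   virtual void resume() {
143 |     // Call coroutine_switch: the assembly saves the callee-saved registers, sets up the
144 |     // coroutine's stack frame, and restores rip to the instruction after the coroutine's yield.
145 |     coroutine_switch(caller_registers, callee_registers);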
146 |   }
147 | 
148 |   virtual void run() { CALL(f, args); }
149 | };


--------------------------------------------------------------------------------
/inc/coroutine_pool.h:
--------------------------------------------------------------------------------
 1 | #pragma once
 2 | #include "context.h"
 3 | #include <memory>
 4 | #include <thread>
 5 | #include <vector>
 6 | 
 7 | struct coroutine_pool;
 8 | extern coroutine_pool *g_pool;
 9 | 
10 | /**
11 |  * @brief Coroutine pool
12 |  * Holds all coroutine functions that need to be executed together, and can run them in parallel or serially.
13 |  */
14 | struct coroutine_pool {
15 |   std::vector<basic_context *> coroutines;
16 |   int context_id;
17 | 
18 |   // whether run in threads or coroutines
19 |   bool is_parallel;
20 | 
21 |   ~coroutine_pool() {
22 |     for (auto context : coroutines) {
23 |       delete context;
24 |     }
25 |   }
26 | 
27 |   // add coroutine to pool
28 |   template <typename F, typename... Args>
29 |   void new_coroutine(F f, Args... args) {
30 |     coroutines.push_back(new coroutine_context(f, args...));
31 |   }
32 | 
33 |   /**
34 |    * @brief Execute all coroutine functions in parallel using multiple threads
35 |    */
36 |   void parallel_execute_all() {
37 |     g_pool = this;
38 |     is_parallel = true;
39 |     std::vector<std::thread> threads;
40 |     for (auto p : coroutines) {
41 |       threads.emplace_back([p]() { p->run(); });
42 |     }
43 | 
44 |     for (auto &thread : threads) {
45 |       thread.join();
46 |     }
47 |   }
48 | 
49 |   /**
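50 |    * @brief Execute all coroutine functions serially yet concurrently, as coroutines
51 |    * TODO: Task 1, Task 2
52 |    * In Task 1 the ready attribute can be ignored: simply poll for a coroutine
53 |    * function that has not finished and continue executing it.
54 |    * In Task 2 the ready attribute introduced by sleep must be taken into account:
55 |    * filter the coroutine functions and execute only those that are ready.
56 |    *
57 |    * Return from this function once all coroutine functions have finished.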
58 |    */
59 |   void serial_execute_all() {
60 |     is_parallel = false;
61 |     g_pool = this;
62 |     while(true) {
63 |       int cnt = 0;
64 |       for (int i = 0; i < coroutines.size(); i ++) {
65 |         if (coroutines[i]->finished == false) {
66 |           context_id = i;
67 |           cnt += 1;
68 |           if (coroutines[i]->ready == true) {
69 |             coroutines[i]->resume();
70 |           }
71 |           else {
72 |             if(coroutines[i]->ready_func() == true) {
73 |               coroutines[i]->ready = true;
74 |               coroutines[i]->resume();
75 |             }
76 |           }
77 |         }
78 |       }
79 |       if (cnt == 0) {
80 |         break;
81 |       }
82 |     }
83 |     for (auto context : coroutines) {
84 |       delete context;
85 |     }
86 |     coroutines.clear();
87 |   }
88 | };
89 | 


--------------------------------------------------------------------------------
/lib/context.S:
--------------------------------------------------------------------------------
 1 | .global coroutine_entry
 2 | coroutine_entry:
 3 |     movq %r13, %rdi
 4 |     callq *%r12
 5 | 
 6 | .global coroutine_switch
 7 | coroutine_switch:
 8 |     # TODO: Task 1
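 9 |     # Save the callee-saved registers into the context pointed to by %rdi
10 |     # The saved rip points to the address of the ret instruction (.coroutine_ret)
11 | 
12 |     # Restore the callee-saved registers from the context pointed to by %rsi
13 |     # Finally, jmpq to the rip saved in that context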
14 |     movq %rax, (%rdi)
15 |     movq %rdi, 8(%rdi)
16 |     movq %rsi, 16(%rdi)
17 |     movq %rdx, 24(%rdi)
18 |     movq %r8, 32(%rdi)
19 |     movq %r9, 40(%rdi)
20 |     movq %r10, 48(%rdi)
21 |     movq %r11, 56(%rdi)
22 |     movq %rsp, 64(%rdi)
23 |     movq %rbx, 72(%rdi)
24 |     movq %rbp, 80(%rdi)
25 |     movq %r12, 88(%rdi)
26 |     movq %r13, 96(%rdi)
27 |     movq %r14, 104(%rdi)
28 |     movq %r15, 112(%rdi)
29 |     leaq .coroutine_ret(%rip), %r8
30 |     movq %r8, 120(%rdi)
31 | 
32 |     movq (%rsi), %rax
33 |     movq 24(%rsi), %rdx
34 |     movq 32(%rsi), %r8
35 |     movq 40(%rsi), %r9
36 |     movq 48(%rsi), %r10
37 |     movq 56(%rsi), %r11
38 |     movq 64(%rsi), %rsp
39 |     movq 72(%rsi), %rbx
40 |     movq 80(%rsi), %rbp
41 |     movq 88(%rsi), %r12
42 |     movq 96(%rsi), %r13
43 |     movq 104(%rsi), %r14
44 |     movq 112(%rsi), %r15
45 |     jmpq *120(%rsi)
46 | .coroutine_ret:
47 |     ret
48 | 


--------------------------------------------------------------------------------
/src/binary_search.cpp:
--------------------------------------------------------------------------------
  1 | #include "common.h"
  2 | #include <assert.h>
  3 | #include <random>
  4 | #include <stdint.h>
  5 | #include <stdio.h>
  6 | #include <stdlib.h>
  7 | #include <sys/time.h>
  8 | #include <unistd.h>
  9 | 
 10 | coroutine_pool *g_pool;
 11 | 
 12 | void lookup_coroutine(const uint32_t *table, size_t size, uint32_t value,
 13 |                       uint32_t *result) {
 14 |   size_t low = 0;
 15 |   while ((size / 2) > 0) {
 16 |     size_t half = size / 2;
 17 |     size_t probe = low + half;
 18 | 
 19 |     // TODO: Task 3
 20 |     // Use __builtin_prefetch to prefetch memory that is likely to miss the cache,
 21 |     // then call yield
 22 |     __builtin_prefetch(table + probe);
 23 |     yield();
 24 | 
 25 |     uint32_t v = table[probe];
 26 |     if (v <= value) {
 27 |       low = probe;
 28 |     }
 29 |     size -= half;
 30 |   }
 31 |   *result = low;
 32 | }
 33 | 
 34 | void lookup(const uint32_t *table, size_t size, uint32_t value,
 35 |             uint32_t *result) {
 36 |   size_t low = 0;
 37 |   while ((size / 2) > 0) {
 38 |     size_t half = size / 2;
 39 |     size_t probe = low + half;
 40 |     uint32_t v = table[probe];
 41 |     if (v <= value) {
 42 |       low = probe;
 43 |     }
 44 |     size -= half;
 45 |   }
 46 |   *result = low;
 47 | }
 48 | 
 49 | uint32_t *naive(int m, int n, int batch, size_t log2_bytes, uint32_t *data) {
 50 |   std::uniform_int_distribution<uint32_t> distr;
 51 |   std::minstd_rand eng(0);
 52 | 
 53 |   uint32_t *res = new uint32_t[m];
 54 |   uint32_t *keys = new uint32_t[m];
 55 |   for (int i = 0; i < m; i++) {
 56 |     keys[i] = distr(eng) % n;
 57 |   }
 58 | 
 59 |   auto time_begin = get_time();
 60 | 
 61 |   for (int i = 0; i < m; i++) {
 62 |     uint32_t key = keys[i];
 63 |     lookup(data, n, key, &res[i]);
 64 |   }
 65 | 
 66 |   uint64_t time_elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
 67 |                               get_time() - time_begin)
 68 |                               .count();
 69 | 
 70 |   printf("naive: %.2f ns per search, %.2f ns per access\n",
 71 |          (double)time_elapsed / m, (double)time_elapsed / m / log2_bytes);
 72 |   return res;
 73 | }
 74 | 
 75 | uint32_t *coroutine_batched(int m, int n, int batch, size_t log2_bytes,
 76 |                        uint32_t *data) {
 77 |   std::uniform_int_distribution<uint32_t> distr;
 78 |   // https://stackoverflow.com/questions/22883840/c-get-random-number-from-0-to-max-long-long-integer
 79 |   std::minstd_rand eng(0);
 80 | 
 81 |   uint32_t *res = new uint32_t[m];
 82 |   uint32_t *keys = new uint32_t[m];
 83 |   for (int i = 0; i < m; i++) {
 84 |     keys[i] = distr(eng) % n;
 85 |   }
 86 | 
 87 |   auto time_begin = get_time();
 88 | 
 89 |   assert(m % batch == 0);
 90 | 
 91 |   coroutine_pool pool;
 92 |   for (int i = 0; i < m; i += batch) {
 93 |     for (int j = 0; j < batch; j++) {
 94 |       uint32_t key = keys[i + j];
 95 |       pool.new_coroutine(lookup_coroutine, data, n, key, &res[i + j]);
 96 |     }
 97 |     pool.serial_execute_all();
 98 |   }
 99 | 
100 |   uint64_t time_elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
101 |                               get_time() - time_begin)
102 |                               .count();
103 | 
104 |   printf("coroutine batched: %.2f ns per search, %.2f ns per access\n",
105 |          (double)time_elapsed / m, (double)time_elapsed / m / log2_bytes);
106 |   return res;
107 | }
108 | 
109 | int main(int argc, char *argv[]) {
110 |   // 4 GiB
111 |   size_t log2_bytes = 32;
112 |   int m = 1000000;
113 |   int batch = 16;
114 | 
115 |   int opt;
116 |   while ((opt = getopt(argc, argv, "l:m:b:")) != -1) {
117 |     switch (opt) {
118 |     case 'l':
119 |       sscanf(optarg, "%zu", &log2_bytes);
120 |       break;
121 |     case 'm':
122 |       sscanf(optarg, "%d", &m);
123 |       break;
124 |     case 'b':
125 |       sscanf(optarg, "%d", &batch);
126 |       break;
127 |     default:
128 |       fprintf(stderr, "Usage: %s [-l log2_size] [-m loop] [-b batch]\n",
129 |               argv[0]);
130 |       exit(EXIT_FAILURE);
131 |     }
132 |   }
133 | 
134 |   size_t bytes = 1LL << log2_bytes;
135 | 
136 |   printf("Size: %zu\n", bytes);
137 |   printf("Loops: %d\n", m);
138 |   printf("Batch size: %d\n", batch);
139 |   fflush(stdout);
140 | 
141 |   size_t n = bytes / sizeof(uint32_t);
142 |   uint32_t *data = new uint32_t[n];
143 | 
144 |   for (size_t i = 0; i < n; i++) {
145 |     data[i] = i;
146 |   }
147 | 
148 |   printf("Initialization done\n");
149 |   fflush(stdout);
150 | 
151 |   uint32_t *naive_res = naive(m, n, batch, log2_bytes, data);
152 |   uint32_t *coroutine_res = coroutine_batched(m, n, batch, log2_bytes, data);
153 |   for (int i = 0; i < m; i++) {
154 |     assert(naive_res[i] == coroutine_res[i]);
155 |   }
156 |   return 0;
157 | }
158 | 


--------------------------------------------------------------------------------
/src/sample.cpp:
--------------------------------------------------------------------------------
 1 | #include "common.h"
 2 | #include "context.h"
 3 | #include "coroutine_pool.h"
 4 | #include <cstdio>
 5 | #include <iostream>
 6 | 
 7 | coroutine_pool *g_pool;
 8 | 
 9 | // example code to run in coroutine
10 | std::vector<int> p;
11 | 
12 | void show(int x) {
13 |   for (int i = 0; i < x; i++) {
14 |     p.push_back(i);
15 |     printf("in show(): %d\n", i);
16 |     yield();
17 |   }
18 | }
19 | 
20 | int main() {
21 |   coroutine_pool pool;
22 |   // spawn two coroutines
23 |   for (int i = 0; i < 2; i++)
24 |     pool.new_coroutine(show, 5);
25 | 
26 |   // execute and print result
27 |   // pool.parallel_execute_all();
28 |   pool.serial_execute_all();
29 |   for (auto i : p) {
30 |     printf("in main(): %d\n", i);
31 |   }
32 | 
33 |   return 0;
34 | }


--------------------------------------------------------------------------------
/src/sleep_sort.cpp:
--------------------------------------------------------------------------------
 1 | #include "common.h"
 2 | #include <cstdio>
 3 | #include <cstdlib>
 4 | 
 5 | using namespace std;
 6 | 
 7 | coroutine_pool *g_pool;
 8 | int main() {
 9 |   int n, a[100];
10 |   coroutine_pool coroutines;
11 | 
12 |   // input N
13 |   scanf("%d", &n);
14 |   for (int i = 0; i < n; i++) {
15 |     // input N numbers
16 |     scanf("%d", &a[i]);
17 | 
18 |     // create a coroutine for each number
19 |     coroutines.new_coroutine(
20 |         [](int x) {
21 |           sleep(x);
22 |           printf("%d\n", x);
23 |         },
24 |         a[i]);
25 |   }
26 | 
27 |   // execute and print
28 |   coroutines.serial_execute_all();
29 |   return 0;
30 | }


--------------------------------------------------------------------------------