├── .clang-format
├── .gitignore
├── .idea
    └── .gitignore
├── CodingStyle.md
├── How To Select an Array.xlsx
├── LICENSE
├── README.md
├── articles
    ├── Execution-1.md
    ├── Execution-2.md
    ├── Execution-3.md
    ├── Execution-4.md
    ├── Execution-5.md
    ├── Execution-6.md
    ├── Execution-7.md
    ├── TaskGraph.drawio
    └── media
    │   ├── Reform-2.png
    │   ├── Reform.png
    │   ├── Senders-LinkedList.png
    │   ├── Senders-List.png
    │   ├── Senders.png
    │   ├── SndRcvWrong.png
    │   └── TaskGraph.png
├── brainstorms
    ├── 1. Text Processing.md
    ├── AsyncExecutionModel.md
    ├── CustomizationPoints.md
    └── PMC++ Topics.md
├── examples
    ├── CMakeLists.txt
    ├── CMakePresets.json
    ├── CppVersion.h
    ├── Ex_3_TextProcessing_String_Manip_Cpp11.cpp
    ├── Ex_3_TextProcessing_String_Manip_Cpp14.cpp
    ├── Ex_3_TextProcessing_String_Manip_Cpp23.cpp
    ├── Ex_X_Executor_Cpp20.cpp
    ├── Pex.py
    ├── main.cpp
    ├── unifex
    │   └── Findunifex.cmake
    └── vcpkg.json
└── slides
    ├── PMC++.0_Intro.md
    ├── PMC++.1_SmartPointers.1.md
    ├── PMC++.1_SmartPointers.2.md
    ├── PMC++.1_SmartPointers.3.md
    ├── PMC++.1_SmartPointers.4.md
    ├── PMC++.3_TextProcessing.1.md
    ├── PMC++.4_TextProcessing.2.md
    ├── PMC++.4_TextProcessing.3.ipynb
    ├── PMC++.4_TextProcessing.3.md
    ├── PMC++.drawio
    └── media
        ├── Euler_diag_for_jp_charsets.svg
        ├── Lightmatter_panda.jpg
        ├── SPLayout.svg
        ├── SharedPtrLifetime.png
        ├── UTF-16 Sample.png
        ├── Venn_diagram_gr_la_ru.svg.png
        ├── fck.gif
        ├── hanzi_standard_fonts.png
        ├── hanzi_standard_fonts_marked.png
        ├── notsimple.jpg
        ├── python-cyclic-gc-5-new-page.png
        ├── tutorial.png
        ├── zhen.png
        └── 小问号.jpeg


/.clang-format:
--------------------------------------------------------------------------------
1 | BasedOnStyle: LLVM
2 | 
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | out/
2 | .obsidian/
3 | .cache/
4 | compile_commands.json
5 | 
--------------------------------------------------------------------------------
/.idea/.gitignore:
--------------------------------------------------------------------------------
1 | # Default ignored files
2 | /shelf/
3 | /workspace.xml
4 | # Editor-based HTTP Client requests
5 | /httpRequests/
6 | # Datasource local storage ignored files
7 | /dataSources/
8 | /dataSources.local.xml
9 | 
--------------------------------------------------------------------------------
/CodingStyle.md:
--------------------------------------------------------------------------------
1 | # `#include `
2 | 
3 | # Naming the entities
4 | 
5 | ## General naming rules
6 | * **This rule does not apply to names that are shared with other systems.**
7 |   * E.g., FIFO/HW/Reg module names
8 | * **Don't** abuse abbreviations in variable/function/class names
9 |   * **DON'T**: abbreviate the head word
10 |     * E.g. barycentric**Point**
11 |   * **DON'T**: abbreviate short words (in general, <= 8 characters)
12 |     * E.g. `Mode` -> `md`, `Module` -> `Mdu`, `Core` -> `Cor`, `Port` -> `Prt`, etc.
13 |   * **DON'T**: use ambiguous abbreviations
14 |   * **DO**: Abbreviations CAN be used where the context is clear and leaves *NO* room for misunderstanding
15 |     * e.g.
`Context ctx;`
`std::function func;`
16 |   * Common abbreviations: https://github.com/kisvegabor/abbreviations-in-code
17 | * **PLEASE FIX TYPOS!**
18 | 
19 | ## Naming convention
20 | 
21 | * Macro
22 |   * All capitals, underscore as separator
23 |   * `HAS_DEFAULT_CONSTRUCTORS(MyClass);`
24 | * Type name
25 |   * Starts with a capital letter, with a capital letter for each new word.
26 |   * `ThisIsAType`
27 | * Variable name
28 |   * Local variable as `localVariable`
29 |   * Class member variable as `mFieldName`
30 |   * Struct member variable as `FieldName`
31 |   * Static member variable as `sFieldName`
32 |   * `const static` or `const` local variable as `kValue`
33 | * Function name
34 |   * Regular functions have mixed case; accessors and mutators may be named like variables.
Ordinarily, functions should start with a capital letter and have a capital letter for each new word.
35 |   * Prefer verb-object phrases
36 |     * E.g. `AddTriangle`
37 | * Enumeration
38 |   * `enum`
39 |     * The name of an `enum` follows the type naming rule.
40 |     * Enumerators of an `enum` are named like macros and should carry a prefix that identifies the `enum`.
41 |       * e.g. `enum PrimitiveTopology{ PRIMTOPO_TRIANGLE_LIST = 0 };`
42 |   * `enum class`
43 |     * The name of an `enum class` and its enumerators follow the naming convention for structures.
44 | 
45 | ``` C++
46 | 
47 | struct MyStruct {
48 |   int Value;
49 | };
50 | 
51 | template <typename T>
52 | class MyClass {
53 | public:
54 |   void DoSomething() {
55 |     int localVariable;
56 |     // ...
57 |   }
58 | private:
59 |   T mSomeField;
60 |   static int sField;
61 |   static int const kMaxValue;
62 | };
63 | ```
64 | 
65 | # Statements
66 | 
67 | ## `if` - `else`
68 | ``` C++
69 | // Rule: `cond` should be a bool
70 | if (cond) { return false; }
71 | 
72 | if (cond) {
73 |   return false;
74 | }
75 | 
76 | if (cond)
77 | {
78 |   return false;
79 | }
80 | 
81 | if (cond)
82 | {
83 |   // ... do something ...
84 | }
85 | 
86 | // Rule: Braces are required even if the branch has only one statement.
87 | if (cond)
88 | {
89 |   return localVariable;
90 | }
91 | 
92 | // Rule: Convert to bool explicitly, except for pointers.
93 | if (iValue != 0)
94 | {
95 |   // ... do something ...
96 | }
97 | ```
98 | 
99 | ## `switch`
100 | 
101 | ``` C++
102 | // Rule: Fall-through is only allowed for shared logic
103 | switch (v) {
104 | case A:
105 |   // ... do something ...
106 |   break; // don't fall through
107 | 
108 | // Fall-through is only allowed in the following case
109 | case B:
110 | case C:
111 | case D:
112 |   // ... do something for B, C and D ...
113 |   break;
114 | 
115 | // Rule: Default is required
116 | default:
117 |   // ... do something or validation ...
118 |   break;
119 | }
120 | ```
121 | 
122 | # Function
123 | 
124 | ``` C++
125 | class SmallStruct {
126 |   void* p;
127 |   int v;
128 | };
129 | 
130 | class BigClass {
131 |   int arr[20];
132 | };
133 | 
134 | // Rule: For a small structure (<= 2 * sizeof(void*)), pass by value
135 | // Rule: For a big class, pass by const reference
136 | void Foo(SmallStruct smallValue, BigClass const& bigValue)
137 | {
138 |   // Rule: var
139 |   int apple = 5;
140 |   // ... do something ...
141 | }
142 | ```
143 | 
144 | # `struct`/`class` with template
145 | 
146 | ## Naming convention
147 | 
148 | 
149 | 
150 | ## Guides
151 | ### Don't use macros in `class/struct`
152 | Don't
153 | ``` C++
154 | struct Value {
155 |   int v;
156 | #if defined(DEBUG)
157 |   DebugInfo dbg; // !!! Danger !!!
158 | #endif
159 | };
160 | ```
161 | Do
162 | ``` C++
163 | struct Value {
164 |   int v;
165 |   std::unique_ptr<DebugInfo> dbg;
166 | };
167 | ```
168 | 
169 | # `assert` and `expect`
170 | # Standard library
171 | 
--------------------------------------------------------------------------------
/How To Select an Array.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/How To Select an Array.xlsx
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 | 
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 | 
7 | 1. Definitions.
8 | 
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 | 
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 | 
189 | Copyright [2023] [Ye Wu]
190 | 
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 | 
195 | http://www.apache.org/licenses/LICENSE-2.0
196 | 
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Practical Modern C++
2 | 
3 | Practical Modern C++: slides and part of the code.
4 | 
5 | ## Preview Slides
6 | * VSCode with Extensions:
7 |   * `Marp for VS Code`
8 | 
9 | ## Build Testing Code
10 | ### Environment
11 | * *Linux* or *Windows + WSL + Linux*
12 | * Our dev environment:
13 |   * Windows 11 + WSL + Ubuntu 22.04 LTS
14 | 
15 | ### Installation steps
16 | * Install the following build toolchain with your Linux package manager
17 |   * **cmake** >= 3.22
18 |   * **clang** >= 14
19 |   * **ninja-build**
20 | * Install the following package by following its official documentation
21 |   * **Vcpkg**: [Vcpkg: Overview / Quick Start: Unix](https://github.com/microsoft/vcpkg/blob/master/README.md#quick-start-unix)
22 | * Install an IDE for developing and debugging
23 |   * **GDB** (up to date)
24 |   * VSCode on your host machine
25 |     * Change the "toolchainFile" attribute in CMakePresets.json to the path of your vcpkg.cmake file.
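26 |     * Suggested extensions:
27 |       * `clangd`
28 |       * `CMake`
29 |       * `CMake Tools`
30 |       * `C/C++ Extension Pack`
31 | * Build by CMake or VSCode.
32 | 
33 | 
--------------------------------------------------------------------------------
/articles/Execution-1.md:
--------------------------------------------------------------------------------
1 | # 1 “Our eyes are yet to open, fear the C plus plus.”
2 | 
3 | In ancient times, C++11 was officially released. In the ten years since, the language has gained many new features and become more expressive, and of course the code written in it can be all the more maddening, much like the Great Ones, who hold infinite knowledge yet drive people insane.
4 | 
5 | This series is a frenzied journey exploring one of those Great Ones: C++ Execution.
6 | 
7 | Before probing the secrets of a Great One, we need an example to offer as a sacrifice:
8 | 
9 | > Design a functional unit that takes a number `arg` and returns `arg + 42`, for example:
10 | > ``` C++
11 | > #include <cstdlib>
12 | > #include <iostream>
13 | >
14 | > int main(int argc, char* argv[]) {
15 | >   int in_value = atoi(argv[1]);
16 | >   /*******************
17 | >     ... your code ...
18 | >   ********************/
19 | >   std::cout << "Result: " << out_value << std::endl;
20 | > }
21 | > ```
22 | 
23 | The most basic approach to this problem is, naturally, to write it out directly:
24 | 
25 | ``` C++
26 | // ...
27 | int out_value = in_value + 42;
28 | // ...
29 | ```
30 | 
31 | If we want to encapsulate the implementation details and make the code reusable, we can write:
32 | 
33 | ``` C++
34 | int add42(int a) {
35 |   return a + 42;
36 | }
37 | 
38 | // ...
39 | int out_value = add42(in_value);
40 | // ...
41 | ```
42 | 
43 | Generalize the type of `arg` a little so that it can be used in more situations:
44 | 
45 | ``` C++
46 | template <typename T>
47 | auto add42(T arg) {
48 |   return arg + 42;
49 | }
50 | ```
51 | 
52 | Or go a little broader still; 42 is not some magical, one-of-a-kind number after all:
53 | 
54 | ``` C++
55 | template <int I, typename T>
56 | auto addK(T arg) {
57 |   return arg + I;
58 | }
59 | 
60 | // ...
61 | int out_value = addK<42>(in_value);
62 | // ...
63 | ```
64 | 
65 | As far as the problem itself goes, every one of these answers meets the requirement. The solutions differ only in their extra properties: some *encapsulate the implementation details*, others offer *genericity*. When those properties are needed along with the original requirement, these solutions stop looking ugly and really are the better answers for their particular combination of constraints.
66 | 
67 | Next, let us see how this humble sacrifice is warped in the eyes of the Great One, C++ Execution [https://godbolt.org/z/rv41cqPeq]:
68 | 
69 | ``` C++
70 | // ...
71 | sender auto s = just(in_value) | then([](int i) {return i + 42;});
72 | sync_wait(s).value();
73 | // ...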
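74 | ```
75 | 
76 | Hmm, welcome, outsider.
77 | 
78 | > Our eyes are not yet open enough to perceive how terrifying C++ is. (Master Willem)
79 | 
--------------------------------------------------------------------------------
/articles/Execution-2.md:
--------------------------------------------------------------------------------
1 | # 2 "But where's an outsider like yourself to begin?"
2 | 
3 | Before we can understand how C++ Execution fulfills the `arg + 42` request, we first need to learn in what posture one must pray to it to win its favor.
4 | 
5 | ``` C++
6 | // ...
7 | sender auto s = just(in_value) | then([](int i) {return i + 42;});
8 | int out_value = sync_wait(s).value();
9 | // ...
10 | ```
11 | 
12 | The example consists of two statements. The first builds a "pipeline" through which data flows: `just` takes a value from outside (`in_value`) and sends it to `then`; `then` receives the value, calls the anonymous function passed as its argument to obtain the result of adding 42, and then tries to pass that result on to the next stage. The data flow is written with the pipe symbol (`|`), just as on a Linux command line.
13 | 
14 | When the first statement finishes, the pipeline has been built but has not yet been started. The return value `s` is an object whose type satisfies the `sender` concept; it only represents the pipeline, rather than executing it and returning the result of `arg + 42`. We will dissect the `sender` concept in a later chapter.
15 | 
16 | Only when the second statement calls `sync_wait` on `s` is `s` actually driven to execute, returning a `std::optional` object that holds the computed result.
17 | 
18 | This two-phase split between _expression (declaration)_ and _execution_ is not rare in C++; _expression templates_ are one example. Other languages may arrive at similar mechanisms for other reasons (such as the transitivity of immutability), for example `Monad` in Haskell.
19 | 
20 | Because a pipeline that has been _declared_ is not necessarily executed right away, even though the code reads as if it had been, this is also called _lazy evaluation_. That does not mean it has to be lazy. With a suitable implementation it could just as well use _eager evaluation_. For example:
21 | 
22 | ``` C++
23 | int out_value = then(just(in_value), [](int i){ return i + 42; });
24 | ```
25 | 
26 | All it takes is
27 | 
28 | ``` C++
29 | auto just(auto v) {
30 |   return v;
31 | }
32 | auto then(auto v, auto f) {
33 |   return f(v);
34 | }
35 | ```
36 | 
37 | to make that work [https://godbolt.org/z/s5aEKxhfM]. Of course, you have probably noticed that the expression computing `out_value` here differs slightly from the one in the example: instead of passing data through the pipe symbol, we used nested function calls. The two are equivalent as long as `operator |` is overloaded correctly. For example:
38 | 
39 | ``` C++
40 | auto operator | (auto v, auto callable) {
41 |   return callable(v);
42 | }
43 | ```
44 | 
45 | In the end we get our "Doll":
46 | 
47 | ``` C++
48 | #include <iostream>
49 | 
50 | auto just(auto v) { return v; }
51 | 
52 | auto then(auto v, auto f) { return f(v); }
53 | 
54 | auto then(auto f) { // capture f by value: a reference to this parameter would dangle once then() returns
55 |   return [f](auto v) {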
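return then(v, f); };
56 | }
57 | 
58 | // Overloading operator | this crudely, with no constraints, causes errors when compiled together with many third-party libraries, such as {fmt}.
59 | // This is not the right way to write it; it is only the most direct way to solve the problem at hand.
60 | auto operator | (auto v, auto c) { return c(v); }
61 | 
62 | int main()
63 | {
64 |   int in_value = 1;
65 |   auto ret1 = then(just(in_value), [](int i) {return i + 42;});
66 |   auto ret2 = just(in_value) | then([](int i) {return i + 42;});
67 |   std::cout << ret1 << " " << ret2 << std::endl; // Output: 43 43
68 |   return 0;
69 | }
70 | ```
71 | 
72 | To support the full pipe syntax, besides adding `operator |` we also gave `then` an additional overload.
73 | Apart from the different return type that results from using eager evaluation, it is practically identical to our lovely sacrifice.
74 | 
75 | At this point someone might say: what if I write it like this:
76 | 
77 | ``` C++
78 | auto foo = [in_value]() {
79 |   return just(in_value) | then([](int i ){return i + 42;});
80 | };
81 | ```
82 | 
83 | Isn't that _lazy evaluation_ too? Nothing is evaluated until `foo` is called. Yes, in a broad sense it does count as _lazy evaluation_. But unlike _lazy evaluation_ in the usual sense, an expression wrapped in a function is no longer itself, so the natural composition of expressions is cut off by this deliberately inserted function.
84 | 
85 | This kind of "_lazy evaluation_" does not let you naturally assemble larger _lazily evaluated_ structures out of smaller ones. Everywhere laziness is needed, you have to hand-design the code to avoid _eager evaluation_.
86 | 
87 | So what is the drawback of _eager evaluation_ here? Why would we give up its directness and simplicity for the roundabout route of _lazy evaluation_?
88 | 
89 | > But where's an outsider like yourself to begin learning about C++ Execution? Easy: all you need is a little `auto` and `template` of your own…
90 | 
--------------------------------------------------------------------------------
/articles/Execution-3.md:
--------------------------------------------------------------------------------
1 | # 3 "We are born of the async, made men by the async, undone by the async."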
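2 | 
3 | The **fundamental theorem of software engineering** (FTSE) says:
4 | 
5 | > "We can solve any problem by introducing an extra level of indirection."
6 | 
7 | Or, put another way: a layer of indirection is usually added in order to solve some problem.
8 | 
9 | Here, compared with the _eager_ version, the _lazy_ version of Execution provides one extra layer of indirection, the "declaration (expression)", which lets us separate *expressing* a problem from *solving (executing)* it.
10 | 
11 | A framework that holds **complete** information about the problem, an all-knowing, prescient framework, can then choose and adjust the concrete *execution* process as circumstances require, for example:
12 | * choose a specific time at which to run a task;
13 | * deploy a task to a particular thread or core;
14 | * add, remove, reorder, or modify execution steps;
15 | * deploy a task to a different device, such as a GPU.
16 | 
17 | All of these *execution*-level changes can be customized separately, with essentially no changes to the *expression*. This achieves flexibility and control at the same time.
18 | 
19 | As a Great One, `std::execution` goes by the alias "the *executor* (executioner) of the *asynchronous* and the *heterogeneous*". Clearly, a _lazy evaluation_ design caters precisely to this Great One's picky taste.
20 | 
21 | There are many ways to implement *lazy evaluation*, such as the *interpreter* or *builder* design patterns. C++ itself, however, offers fairly complete metaprogramming facilities, such as templates, generics, and operator overloading. When these features are used to the full, the compiler gets enough information to optimize the code thoroughly. So the more popular way to achieve *lazy evaluation* in C++ is "blood ministration": **expression templates**.
22 | 
23 | Here we use a simple sacrifice, **expression evaluation**, to review "blood ministration" (this ministration manual draws on [https://en.wikipedia.org/wiki/Expression_templates]).
24 | 
25 | Expression evaluation is an introductory lesson in nearly every language. In C++, evaluating an expression can be regarded as happening immediately. We need some design work to turn this immediate evaluation into *lazy evaluation*.
26 | 
27 | An *expression* is a recursive tree structure: a part of an *expression* is itself an *expression*. `a + 2 + b` is an addition expression that joins the two subexpressions `a + 2` and `b` with a plus sign. We can design an empty `struct Expr` to serve as the root of the expression concept.
28 | 
29 | Whether a class or object is an expression can then be decided by checking whether it derives from `Expr`.
30 | 
31 | The leaves of an expression tree are usually literals or variables. For simplicity, we assume that leaves are always variables and that the only operator is plus. That gives us the following structure:
32 | 
33 | ``` C++
34 | struct Expr {};
35 | 
36 | template <typename T>
37 | struct Value: Expr {
38 |   T const& get() const { return v; }
39 |   T const& v;
40 |   explicit Value(T const& a): v(a) {}
41 | };
42 | ```
43 | 
44 | So what is the result of adding two `Expr`s?
45 | The answer: *the sum of two Exprs*. Yes, you read that right; it really is that tautological.
46 | 
47 | ``` C++
48 | template <typename E0, typename E1>
49 | struct Add_: Expr {
50 |   E0 a;
51 |   E1 b;
52 | };
53 | 
54 | template <typename E0, typename E1>
55 | Add_<E0, E1> operator + (E0 e0, E1 e1) {
56 |   return {{}, e0, e1}; // The first initializer in the list, {}, constructs the base class Expr.
57 | }
58 | ```
59 | 
60 | This code is still two steps away from something that actually runs:
61 | - The *expression* has been *expressed*, but what happens when it is finally time to *evaluate* it?
62 | - How do we restrict our `operator +` so that it applies only when both arguments are subclasses of `Expr`?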
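Before those two gaps are filled, it may help to see that the snippet above already records the whole tree in the *type* of the result and performs no arithmetic. The following is only a minimal sketch under assumptions: the template parameter lists are spelled out as `typename` parameters, the code is wrapped in a hypothetical namespace `sketch` so that the still unconstrained `operator+` stays contained, and `refers_to` is a hypothetical helper added purely for checking. It assumes C++17 or later.

```cpp
#include <type_traits>

namespace sketch {  // hypothetical namespace; keeps the unconstrained operator+ contained

struct Expr {};

// Leaf node: refers to a variable, which must outlive the expression.
template <typename T>
struct Value : Expr {
  T const& v;
  explicit Value(T const& a) : v(a) {}
  T const& get() const { return v; }
};

// Interior node: stores its operands and performs no arithmetic.
template <typename E0, typename E1>
struct Add_ : Expr {
  E0 a;
  E1 b;
};

template <typename E0, typename E1>
Add_<E0, E1> operator+(E0 e0, E1 e1) {
  return {{}, e0, e1};  // the leading {} initializes the Expr base class
}

// Hypothetical helper: true if `sum` still refers to x and y themselves,
// i.e. the operands were recorded rather than added.
inline bool refers_to(Add_<Value<int>, Value<int>> const& sum,
                      int const& x, int const& y) {
  return &sum.a.get() == &x && &sum.b.get() == &y;
}

}  // namespace sketch
```

With `int x = 1, y = 2;` and `sketch::Value<int> vx{x}, vy{y};`, the expression `vx + vy` finds `sketch::operator+` through argument-dependent lookup and has the type `sketch::Add_<sketch::Value<int>, sketch::Value<int>>`; no sum has been computed at that point.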
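63 | 
64 | The first question is easy: give it a function `get` and do the actual *evaluation* inside that function.
65 | For the second question, we use `concepts` here to state that this operator overload applies only under specific conditions.
66 | 
67 | 
68 | So the complete `Add_` and `operator +` look like this:
69 | 
70 | ``` C++
71 | template <typename E0, typename E1>
72 | struct Add_: Expr {
73 |   E0 a;
74 |   E1 b;
75 |   auto get() const {
76 |     return a.get() + b.get();
77 |   }
78 | };
79 | 
80 | auto operator + (std::derived_from<Expr> auto e1, std::derived_from<Expr> auto e2) {
81 |   return Add_{{}, e1, e2};
82 | }
83 | 
84 | // Or, written like this:
85 | 
86 | template<
87 |   std::derived_from<Expr> E1, // The concept requires E1 to derive from Expr.
88 |   std::derived_from<Expr> E2>
89 | auto operator + (E1 e1, E2 e2) {
90 |   return Add_{{}, e1, e2};
91 | }
92 | ```
93 | 
94 | The implementation of `Add_::get()` calls `get()` on its underlings, which means that when `Add_` is evaluated it does not suffer alone; it drags its underlings down with it. In any case, once the necessary code has been added, the following snippet runs:
95 | 
96 | ``` C++
97 | int a{1}, b{2}, c{3};
98 | Value va{a}, vb{b}, vc{c};
99 | auto r = va + vb + vc; // The type of r is: Add_<Add_<Value<int>, Value<int>>, Value<int>>
100 | fmt::print("{}", r.get()); // Output: 6
101 | ```
102 | 
103 | The complete code is at [https://godbolt.org/z/Ev7n7WeYr].
104 | 
105 | See, doesn't this already look a lot like our sacrifice?
106 | 
107 | Of course, this syntax-demonstrating example is too simple to really show what *expression templates* are good for; the "blood ministration" reference linked earlier in this section illustrates that better.
108 | The earliest and most widespread use of *expression templates* is in linear algebra libraries. Their *lazy evaluation* makes it possible to adjust the actual computation path and perform optimizations that greatly help performance, such as loop fusion and regrouping of matrix multiplications. To this day they remain one of the main optimization techniques used by C++ numerical libraries such as *Eigen*.
109 | 
110 | Beyond that, *expression templates* are often used in C++ to build dialects. These dialects are usually declarative languages in the style of LINQ or SQL, whose concrete execution steps often do not match the dialect's syntax and semantics exactly. Such dialects can be used to build things like interpreters, which are simple in concept but complex to implement. Typical "dialect" libraries include Boost.Spirit and Boost.Proto.
111 | 
112 | Compared with the *expression templates* demonstrated here, though, Execution uses them somewhat differently, because everything is for the sake of async.
113 | 
114 | > We are born of the async, made men by the async, undone by the async. O ignorant ones, fear the async!
115 | 
--------------------------------------------------------------------------------
/articles/Execution-4.md:
--------------------------------------------------------------------------------
1 | # 4 “the push and the pull are one.”
2 | 
3 | Looking at the *expression templates* from the previous section, we can spot two characteristics:
4 | 
5 | 1. In the expression evaluation example, the evaluation function `get()` is called first on the `Expr` closest to the result, then recursively downward, and finally on the `Value` leaves. We can call this result-to-source call order "pull mode": as long as an expression does not pull the result of a subexpression, the subexpression does nothing at all.
6 | 2. 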
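The whole evaluation runs synchronously to completion on the current thread. The effect is no different from plain recursion.
7 | 
8 | Execution, the Great One we want to imitate, differs slightly from our expression templates on both counts. First, its execution can be asynchronous, and different expressions can execute in parallel. Second, evaluation in Execution is driven in *push mode*: the earlier stage is computed first, and its result is then `push`ed to the next stage of the pipeline, driving that stage to perform its own operation.
9 | 
10 | Fundamentally, *push* and *pull* are not opposites; in both, the actual execution order is determined by the dependencies between the data. In the expression example, `Add_::get()` appears to be *called* first and `Value::get()` last, but because the sum can only be computed after both subtrees have produced their results, `Value::get()` is actually the first to *complete*. Optimized code also tends to run from the leaves toward the root. The code actually generated for the expression example looks like this:
11 | 
12 | ``` nasm
13 | mov DWORD PTR [rsp+4], 1
14 | mov DWORD PTR [rsp+8], 2
15 | mov DWORD PTR [rsp+12], 3
16 | mov eax, DWORD PTR [rsp+4] ;; o = a
17 | mov ecx, DWORD PTR [rsp+8]
18 | mov edx, DWORD PTR [rsp+12]
19 | add eax, ecx ;; o += b
20 | lea rcx, [rsp+16]
21 | add eax, edx ;; o += c
22 | mov edx, 1
23 | mov DWORD PTR [rsp+16], eax
24 | ```
25 | 
26 | In terms of expression, however, a task shaped like a tree that runs from one starting point to one or more possible end points suits a *push* structure better; conversely, a task in which different starting points converge on a single end point (such as expression evaluation) suits *pull* better.
27 | 
28 | ![](TaskGraph.png)
29 | 
30 | Especially when only some of the paths in the tree or graph will be executed, an ill-suited execution order can cause needless computation, and extra structure is then needed to avoid that kind of performance degradation.
31 | 
32 | We can also add asynchronous computation to the expression templates ([example](https://godbolt.org/z/5qf7WT6c5)).
33 | 
34 | Suppose our device is extremely slow: fetching a number takes `2ms`, an addition takes `3ms`, and a multiplication takes `7ms`:
35 | 
36 | ``` C++
37 | auto slow_fetch(auto const &v) {
38 |   std::this_thread::sleep_for(2ms);
39 |   return v;
40 | }
41 | 
42 | struct _slow_add {
43 |   auto operator() (auto a, auto b) const {
44 |     std::this_thread::sleep_for(3ms);
45 |     return a + b;
46 |   }
47 | };
48 | inline constexpr auto slow_add = _slow_add{};
49 | 
50 | // slow_add is a function object here because we need to pass this generic function as an argument to other functions, to be used as a callback:
51 | //   async_eval(slow_add, a, b);
52 | // If it were written instead as
53 | //   auto slow_add(auto a, auto b) { ...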
}
54 | // it would fail to compile, because slow_add itself would not be a variable.
55 | 
56 | struct _slow_mul {
57 |   auto operator() (auto a, auto b) const {
58 |     std::this_thread::sleep_for(7ms);
59 |     return a * b;
60 |   }
61 | };
62 | inline constexpr auto slow_mul = _slow_mul{};
63 | ```
64 | 
65 | We can then turn the originally synchronous program into an asynchronous one using `std::future` and `std::async`:
66 | 
67 | ``` C++
68 | template <typename T> struct Value : Expr {
69 |   future<T> eval() {
70 |     return async(
71 |       [this]() { return slow_fetch(v); }
72 |     );
73 |   }
74 |   T const &v;
75 |   explicit Value(T const &a) : v(a) {}
76 | };
77 | 
78 | template <typename OpT1, typename OpT2> struct Add_ : Expr {
79 |   OpT1 a;
80 |   OpT2 b;
81 |   auto eval() {
82 |     return async(
83 |       [this]() mutable {
84 |         auto a_future = a.eval();
85 |         auto b_future = b.eval();
86 |         return slow_add(a_future.get(), b_future.get());
87 |       });
88 |   }
89 | };
90 | 
91 | template <typename OpT1, typename OpT2> struct Mul_ : Expr {
92 |   OpT1 a;
93 |   OpT2 b;
94 |   auto eval() {
95 |     return async(
96 |       [this]() mutable {
97 |         auto a_future = a.eval();
98 |         auto b_future = b.eval();
99 |         return slow_mul(a_future.get(), b_future.get());
100 |       });
101 |   }
102 | };
103 | 
104 | auto operator+(std::derived_from<Expr> auto e1, std::derived_from<Expr> auto e2) {
105 |   return Add_{{}, e1, e2};
106 | }
107 | 
108 | auto operator*(std::derived_from<Expr> auto e1, std::derived_from<Expr> auto e2) {
109 |   return Mul_{{}, e1, e2};
110 | }
111 | 
112 | int main(int argc, char* argv[]) {
113 |   // current time: 0ms
114 |   int a{1}, b{2}, c{3}, d{4};
115 |   Value va{a}, vb{b}, vc{c}, vd{d}; // bind to d, not to the literal 4: Value stores a reference
116 |   auto r = (va + vb) * (vc + vd);
117 |   auto r_future = r.eval();
118 |   fmt::print("{}", r_future.get()); // start at ~0ms, end at ~12ms = 2ms(4T) + 3ms(2T) + 7ms(1T)
119 |   return 0;
120 | }
121 | ```
122 | 
123 | Of course, on inspection, `Add_::eval` and `Mul_::eval` share a similar structure and logic:
124 | 1. Call `eval` on the subexpressions, which triggers their execution and yields a `std::future` for waiting on each value;
125 | 2. 
Launch an asynchronously executed function that, when it runs, waits for the subexpressions to finish and then performs its own evaluation, while returning a `std::future` so that others can wait for its result. 126 | 127 | The only difference is whether the evaluation itself calls `slow_add` or `slow_mul`. 128 | 129 | So we can factor the two functions into a common function, `async_eval`; we can even abstract `Add_` and `Mul_` into a single `BinaryOpExpr_` ([example](https://godbolt.org/z/hdnr1Gbh1)): 130 | 131 | ``` C++ 132 | // Just a "few" little tricks 133 | template <typename ImmFn, typename... SubExprsT> 134 | auto async_eval(ImmFn&& fn, SubExprsT&&... subExprs) { 135 | auto future_tuple = make_tuple(subExprs.get()...); 136 | return async([&fn, future_tuple = std::move(future_tuple)]() mutable{ 137 | auto invoke_with_future_eval = [&fn](auto&&... future_args) { 138 | return fn(future_args.get()...); 139 | }; 140 | return apply(invoke_with_future_eval, std::move(future_tuple)); 141 | }); 142 | } 143 | 144 | // The merged binary operator expression 145 | template <typename OpT1, typename OpT2, typename OpFunc> 146 | struct BinaryOpExpr_ : Expr { 147 | OpT1 a; 148 | OpT2 b; 149 | OpFunc op; 150 | auto get() { 151 | return async_eval(op, a, b); 152 | } 153 | }; 154 | 155 | auto operator+(std::derived_from<Expr> auto e1, std::derived_from<Expr> auto e2) { 156 | return BinaryOpExpr_<decltype(e1), decltype(e2), _slow_add>{ 157 | {}, e1, e2, slow_add}; 158 | } 159 | 160 | auto operator*(std::derived_from<Expr> auto e1, std::derived_from<Expr> auto e2) { 161 | return BinaryOpExpr_<decltype(e1), decltype(e2), _slow_mul>{ 162 | {}, e1, e2, slow_mul}; 163 | } 164 | ``` 165 | 166 | And so, modeling ourselves on the Great One Execution, we have built "eyes" out of our own knowledge, although they still look rather deformed. 167 | 168 | Just as we believe we have seen through the mystery, the Great One is murmuring. 169 | 170 | > THE PUSH AND PULL ARE ONE. 171 | 172 | # Backlog 173 | 174 | * Expression Templates 175 | * Monadic 176 | * Continuation-passing style 177 | 178 | We are born of the blood, made men by the blood, undone by the blood. Our eyes are yet to open. Fear the old blood. 179 | 180 | ## used techniques 181 | 182 | Features: 183 | 184 | * Expression structure building and evaluation are separated. 
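185 | * Similar concepts: 186 | * *expression templates* 187 | * *monadic* 188 | * *continuation-passing style* 189 | * Support lazy evaluation 190 | * Structured (asynchronous) data flow 191 | * Embedded structure with clear boundary 192 | * Strong-typed data flow 193 | * RAII is well preserved 194 | * Evaluation and scheduling are separated 195 | * Rich and flexible customization (i.e. hookable or injectable) points 196 | * Related design patterns: 197 | * Decorator 198 | * Visitor 199 | 200 | https://www.gcores.com/articles/95998 201 | 202 | -------------------------------------------------------------------------------- /articles/Execution-5.md: -------------------------------------------------------------------------------- 1 | # 5 "Is that you, Responsibility Chain? No, you're someone else." 2 | 3 | In section 3 we briefly summarized the powers that *Expression Templates*, ancestor of the Great Ones, may possess: 4 | * choosing a specific time at which a task executes; 5 | * deploying a task onto a particular thread or core; 6 | * adding, removing, reordering, or modifying execution steps; 7 | * deploying a task onto a different device, such as a GPU. 8 | 9 | *Expression evaluation*, our clumsy and foolish creation from sections 3 and 4, has some of these powers: it can skip unnecessary execution steps, run asynchronously, and so on. 10 | Other powers, such as reordering execution steps or deploying tasks, have not been implemented yet. The reason is that our *execution* function `get()` (or `eval()`) lives directly inside the *expression* types (namely `Add_` and `Mul_`). As a result, the structure of the *execution* part closely mirrors the structure of the *expression*, as in the figure below. 11 | 12 | ![](media/Reform.png) 13 | 14 | The advantage of this similarity is that it is easy to understand and simple to implement; the drawback is that it is not flexible enough. Let us use an example to explain the need for "flexibility". 15 | 16 | In general, an arithmetic expression is a binary tree. For example, the three-operand addition `a + b + c` is usually represented as `(a + b) + c`. If we keep the implementation from section 4, execution must likewise evaluate the subtree `a + b` first and then ` + c`. If such a chained addition is very long, evaluating the whole addition tree recursively, level by level, obviously runs into efficiency problems (setting aside overly clever compilers). We can then add a new kind of node, `Sum_`, which sums its operands directly with a *for-loop*. `get()` no longer returns the result directly; instead it transforms the `Add_` tree into a single `Sum_` node and finally calls `Sum_`'s `get()` to complete the evaluation. As a diagram, it looks roughly like this: 17 | 18 | ![](media/Reform-2.png) 19 | 20 | We will not go further into the details of this blood ministration here; interested outsiders may take a sip for themselves. 21 | 22 | With the transformation introduced, the original two-stage evaluation of *Expression Templates*, *expression* then *execution*, has evolved into a three-stage structure: *expression*, *evolution*, *execution*. The newly added *evolution* step opens up boundless possibilities. 23 | 24 | And at last we begin to creep toward the maddening truth of the Great One `std::execution`. 25 | 26 | In current doctrine, the Great One `std::execution` is also interpreted as the `Senders / Receivers Idiom`. A believer who does not dig deeper might read this pair of terms as something like the following: 27 | 28 | ![](media/SndRcvWrong.png) 29 | 30 | This looks a bit like the design pattern responsibility 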
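chain. Each task accepts a signal from upstream, processes it, and sends it downstream. The role that receives the signal is called the *Receiver*, the role that sends it is called the *Sender*, and a *Sender*'s signal must be sent to a *Receiver*. 31 | 32 | This sincere and innocent vision, which derives the design from the literal meaning of the words, causes no great trouble for systems in most contexts; those systems may really be that pure and kind. 33 | 34 | Alas, this is C++, an otherworld full of ugliness, weirdness, dread, and derangement. 35 | 36 | > "Is that you, Responsibility Chain? Oh no, you're not. … I need brain fluid. Murky, viscous brain fluid. -- Adeline" -------------------------------------------------------------------------------- /articles/Execution-6.md: -------------------------------------------------------------------------------- 1 | # 6 “Here, to welcome the new hunter.” 2 | 3 | Let us look back at the example from section 1: 4 | 5 | ``` C++ 6 | // ... 7 | sender auto s = just(in_value) | then([](int i) {return i + 42;}); 8 | sync_wait(s).value(); 9 | // ... 10 | ``` 11 | 12 | Taken literally, the line `s = ...` constructs the *expression* of a *data pipeline*. When this *data pipeline* is *executed*, the data is emitted by `just()` and passed to `then()`, which calls the anonymous function given as its argument to process the data and pushes the result onward. 13 | 14 | For the Great One `std::execution`, every segment of the data pipeline, whether `just(...)` or `then(...)`, is called a `Sender`. So from the user's point of view, data simply flows from one `Sender` to another, and no `Receiver` is involved. We will therefore discuss `Senders` first and `Receiver` later. 15 | 16 | Suppose this is all the information you are given and you are asked to implement the idea. Following the usual routine of ordinary C++ programs, the implementation might be: 17 | 18 | ``` C++ 19 | Sender just(value) { ... } 20 | Sender then(Fn) { ... 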
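} 21 | 22 | SenderList operator | (SenderList snds, Sender& s) { 23 | snds.append(s); 24 | return std::move(snds); 25 | } 26 | ``` 27 | 28 | Its structure resembles the figure below: 29 | 30 | ![](media/Senders-List.png) 31 | 32 | Or, if a `sender` is itself a linked-list node, it could look like this: 33 | 34 | ``` C++ 35 | Sender& operator | (Sender& currentSender, Sender& nextSender) { 36 | currentSender.connect(nextSender); 37 | return currentSender; 38 | } 39 | ``` 40 | 41 | The corresponding structure then looks roughly like this: 42 | 43 | ![](media/Senders-LinkedList.png) 44 | 45 | `currentSender` needs to pass data to `nextSender`, and that data may be of any type. If `Sender` is one fixed type with no generic support, the preceding `Sender` cannot pass strongly typed data to the following `Sender`. The data must undergo **type erasure** in transit, for example through the sinful `void*` or the less sinful `std::any`. 46 | 47 | This does not match the faith that the upper ranks of the C++ Church place in the Great Ones. Type erasure has two problems. First, type correctness cannot be guaranteed at compile time, even though modern type-erasure implementations such as `std::any` can check at run time. Second, two pieces of code that are strongly related become separated, so the compiler cannot fuse them and optimize them together, as we saw in the *expression evaluation* example. 48 | 49 | To let types propagate between `Sender`s, `libunifex` and `stdexec`, the implementations of `std::execution`, both choose to connect multiple `Senders` by nesting: 50 | 51 | ``` C++ 52 | 53 | template <typename T> 54 | just_sender<T> just(T value) { ... } 55 | 56 | template <typename Sender, typename Fn> 57 | then_sender<Sender, Fn> then(Sender snd, Fn func) { 58 | return then_sender<Sender, Fn>(snd, func); 59 | } 60 | 61 | then( 62 | just(in_value), 63 | [](int i) {return i + 42;} 64 | ); 65 | 66 | ``` 67 | 68 | So in `std::execution` the chain is expressed like this: 69 | 70 | ![](media/Senders.png) 71 | 72 | Of course, the outermost type turns into a complicated nested structure like this: 73 | 74 | ``` C++ 75 | then_sender< 76 | then_sender< 77 | just_sender<T> 78 | , Fn> 79 | , Fn> s; 80 | ``` 81 | 82 | Although it looks somewhat counterintuitive, it really is a chain. The head of the chain sits in the innermost layer and the tail in the outermost. When using it, we can traverse the list forward or backward by recursion, or treat it as a whole and do something else with it. This way every successor node holds the complete type information of the nodes before it, which solves the type propagation problem for good. 83 | 84 | With this we have obtained the outward form of the Great One, that is, the *representation* part. What remains is the process of turning the *representation* into a structure that can be *executed*. 85 | 86 | The initial example yields its result once the second line, `sync_wait(s).value()`, has run. Clearly, the secret that makes the prayer take effect lies inside `sync_wait`. 87 | 88 | Let us see what `sync_wait` actually does. Simplifying the corresponding code in `libunifex` or `stdexec` gives something like this: 89 | 90 | ``` C++ 91 | template <typename Sender> 92 | auto sync_wait(Sender&& sender) { 93 | auto ctx = event_loop(); 94 | auto rcv = sync_wait_receiver{ctx}; 95 | auto op = sender.connect(rcv); 96 | op.start(); 97 | ctx.run(); 98 | } 99 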
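| ``` 100 | 101 | Setting `event_loop` aside for now, the function has three umbilical cords connecting the Great One to the Hunter's Dream: 102 | 1. It constructs a `receiver` object. Its concrete type here is `sync_wait_receiver`; at last a `receiver` appears. We will show its role and structure later. 103 | 2. A `Sender` must have a member function named `connect` that accepts a `receiver`. Calling it yields a return value, `op`. Note: unlike some of the schemes described earlier, the `receiver` is not attached to the `senders`, directly or indirectly. The meaning and usage of `receiver` will be revealed in the next chapter. 104 | 3. The object returned by `connect` has a `void start()` member function. Note that the type of the return value `op` is generic. In `std::execution`, types that have a `void start()` member function are said to satisfy the `operation_state` *concept*. 105 | 106 | The `operation_state` can be thought of as holding our execution graph after transformation. When `start()` is called, the prayer takes effect and the Great One truly descends. 107 | 108 | Three key questions now present themselves: 109 | 1. What exactly is the `receiver`, the Great One's proxy that heard our prayer? 110 | 2. What does `sender.connect(receiver)` actually do? 111 | 3. What kind of being is an `operation_state`? 112 | 113 | These will be revealed in the next chapter. 114 | 115 | > Here, to welcome the new hunter. -------------------------------------------------------------------------------- /articles/Execution-7.md: -------------------------------------------------------------------------------- 1 | # 7 "Ah-hah! There's something I want to tell you." 2 | 3 | 4 | 1. the concept of receiver 5 | 2. what happens when connect(sender, receiver) is called 6 | 3. 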
operation_state 7 | 8 | > Ah-hah! There's something I want to tell you. A little piece of information from the illustrious sender. -------------------------------------------------------------------------------- /articles/media/Reform-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/articles/media/Reform-2.png -------------------------------------------------------------------------------- /articles/media/Reform.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/articles/media/Reform.png -------------------------------------------------------------------------------- /articles/media/Senders-LinkedList.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/articles/media/Senders-LinkedList.png -------------------------------------------------------------------------------- /articles/media/Senders-List.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/articles/media/Senders-List.png -------------------------------------------------------------------------------- /articles/media/Senders.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/articles/media/Senders.png -------------------------------------------------------------------------------- /articles/media/SndRcvWrong.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/articles/media/SndRcvWrong.png -------------------------------------------------------------------------------- /articles/media/TaskGraph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/articles/media/TaskGraph.png -------------------------------------------------------------------------------- /brainstorms/1. Text Processing.md: -------------------------------------------------------------------------------- 1 | # Strings in C/C++ 2 | ## C-string (null-terminated string) 3 | ### Looking back 4 | ### Problems 5 | A part of a C-string is not itself a C-string 6 | Getting the length takes O(n) time 7 | 8 | ## String representation 9 | * C String 10 | * `std::basic_string`, `std::string` & `std::wstring` 11 | * `c_str` and non-null-terminated strings 12 | * `string_view` (C++17) 13 | 14 | ## Environments 15 | * C/C++ (<= C++14) **w/o** 3rd party libraries 16 | * C/C++ with standard libs 17 | * C++ 17 `string_view` 18 | * C++ 20 *ranges*, *format*, `starts_with`, `ends_with` 19 | * C++ 2b `contains`, `join_with` 20 | * C/C++ with **quasi-standard libs** (11 or 14 required) 21 | * *{fmt}*, *range-v3*, *abseil*, *boost*, *folly* 22 | * C/C++ with 3rd Party Libs 23 | 24 | # Operation 25 | ## Manipulation 26 | * Construct 27 | * Copy 28 | * Concatenate 29 | * Join 30 | * Split (by *Position* or by *Separator*) 31 | * Trim (a.k.a Strip) 32 | ## Examination 33 | * Starts with 34 | * Ends with 35 | * Count 36 | * Index/Find 37 | * Count 38 | * Compare 39 | ## Partitioning 40 | * By separator 41 | * By pos 42 | ## Generating 43 | 44 | # Character sets and string 45 | String is a first-class type in most managed languages because of 46 | 1. Best performance 47 | 2. Good user experience 48 | 3. 
**Character set** 49 | 50 | ## Character sets 51 | * Legacy 52 | * ASCII 53 | * ISO codepages 54 | * 8859-1: Latin-1 55 | * ... 56 | * Windows code pages 57 | * cp936 (GBK) 58 | * cp932 (Shift_JIS) 59 | * Unicode 60 | * Code point 61 | * ISO/IEC 10646 62 | * Lower 64k code points called BMP or UCS-2 63 | * Most frequently used characters are located in BMP 64 | * UTFs 65 | * Variable length coding 66 | * UTF-8 67 | * UTF-16 (BE/LE) 68 | * Some samples of UTF-16 ![[UTF-16 Sample.png]] 69 | * UTF-32 70 | 71 | ## Character sets and C++ `*char*` types 72 | | Char Type | `string` type | Std | Char system | Notes | 73 | | ---------- | ------------- | -------- | -------------------------- | ----- | 74 | | `char` | `string` | < C++11 | UTF-8/SBCS/MBCS | | 75 | | `wchar_t` | `wstring` | < C++11 | UTF-16(Win)/UTF-32 (Linux) | | 76 | | `char8_t` | `u8string` | >= C++20 | UTF-8 | | 77 | | `char16_t` | `u16string` | >= C++11 | UTF-16 | | 78 | | `char32_t` | `u32string` | >= C++11 | UTF-32 | | 79 | 80 | ## Prefer to use UTF-8 81 | ### Why 82 | * Advantages of UTF-8 string 83 | * Compact storage 84 | * Best compatibility 85 | * Disadvantages of UTF-8 string 86 | * Variable length 87 | * Hard to implement k-th character access, position-based splitting, etc. 88 | * Storage size of output string cannot be predicted precisely. 89 | ### Example 90 | String in Python 91 | * [PEP 393 – Flexible String Representation | peps.python.org](https://peps.python.org/pep-0393/) 92 | 93 | ## Locale & Localization 94 | * Character classification 95 | * Conversion between character sets 96 | * Digits/Currency Format 97 | * Identical glyphs on different code points 98 | * NOTE: Unicode doesn't encode glyphs 99 | * Security issues such as IDN Homograph Attack 100 | * Input and Output 101 | * Keyboard, IME, output locale, etc. 102 | 103 | # Formatting 104 | ## C-style format 105 | `printf`, `sprintf`, `vsprintf`, `snprintf` 106 | * `printf` functions use the global locale, which implies a mutex lock. 
107 | * Float/double-to-string conversion in `printf` is slow compared with modern C++ libraries, which use `Ryu`. 108 | ## C++ standard library 109 | by `sstream` 110 | ## New style formatting 111 | * `std::format` (C++20) 112 | * `std::print` (C++23) 113 | * 20% to 400% faster than `printf` (depending on platform) 114 | * [fmtlib/fmt: A modern formatting library (github.com)](https://github.com/fmtlib/fmt) 115 | 116 | # Matching and Parsing 117 | ## Matching and parsing a small amount of text 118 | Regular expression 119 | ## Pre-defined structural text processing libraries 120 | * YAML 121 | * JSON 122 | * XML 123 | * Proto 124 | * xSV (CSV, TSV, etc.) 125 | # Large scale text processing 126 | ### < 20GB 127 | * Just bigger than usual 128 | * Versioning 129 | * Validating 130 | * Allowed character set and coding may affect the storage/processing policy 131 | * English only 132 | * UCS-2 only 133 | * UTF-8 134 | * UTF-16 135 | ### 20GB to 1TB 136 | * Large but not-so-large data processing 137 | * Plain text processing 138 | * Querying by few keys 139 | * Querying by relation 140 | * Mapping to strongly typed objects in software (e.g., Serialization and ORM) 141 | * Strategies 142 | * Process by our own tool chain 143 | * Dividing or sparsing 144 | * Using a file set rather than one file 145 | * Indexing the file 146 | * By physical position 147 | * By (ordered or hashed) keys 148 | * Pre-computed values 149 | * Validating tools 150 | * Rely on the DB 151 | * RDBMS 152 | * NoSQL databases 153 | * In-process or out-of-process 154 | * Concurrent processing 155 | * Multi-threading or multi-processing 156 | * Asynchronous processing 157 | * C++ may not be the best choice 158 | * IO might be the bottleneck 159 | * Consider Python or C# 160 | ### > 1TB 161 | * Distributed processing 162 | * e.g. 
Apache data processing stack 163 | * Hive/Spark 164 | -------------------------------------------------------------------------------- /brainstorms/AsyncExecutionModel.md: -------------------------------------------------------------------------------- 1 | # C++ Async Execution Models 2 | 3 | ## Synchronization Primitives 4 | 5 | * Mutex 6 | * Atomic 7 | * Conditional Variables 8 | * Semaphores 9 | 10 | ## Message Passing Style (MPI) 11 | 12 | ## Completion 13 | 14 | ## Promise/Future 15 | 16 | ## Coroutine 17 | 18 | ## `std::execution` / `libunifex` 19 | 20 | ### Concepts 21 | 22 | #### Schedulers 23 | 24 | #### Senders 25 | 26 | #### Receivers 27 | 28 | #### Operation State 29 | 30 | #### Execution Context 31 | 32 | ### Features 33 | 34 | #### Continuation-Passing Style 35 | 36 | #### Unix pipe 37 | 38 | #### Type inference and constraint 39 | 40 | #### Lifetime management and copy-free design 41 | 42 | #### Lazy evaluation 43 | * `let_*` 44 | * Lazy senders could be optimized before submitted 45 | 46 | #### Error handling and cancellation 47 | * `set_error` and `set_done` 48 | 49 | #### CPO 50 | 51 | * Order of resolution of CPO in `std::execution` 52 | * `tag_invocable` > others 53 | 54 | ### Implementation analysis 55 | 56 | 57 | #### `libunifex` 58 | 59 | #### Python version execution `pex` 60 | 61 | ### Related techniques 62 | 63 | #### Monadic 64 | 65 | ### Design Patterns 66 | 67 | #### Command 68 | 69 | #### Decorator 70 | 71 | #### Composition 72 | 73 | ### References 74 | 75 | https://github.com/NVIDIA/stdexec/blob/main/include/stdexec/execution.hpp 76 | https://github.com/facebookexperimental/libunifex -------------------------------------------------------------------------------- /brainstorms/CustomizationPoints.md: -------------------------------------------------------------------------------- 1 | # Code injection in C++ 2 | 3 | * Inheritance 4 | * IoC 5 | * CRTP / Policy 6 | * ADL 7 | * CPO 8 | * tag_invoke 9 | 
-------------------------------------------------------------------------------- /brainstorms/PMC++ Topics.md: -------------------------------------------------------------------------------- 1 | * `[2]` Pointers, smart pointers and ownership (C++11/14/17) 2 | * `[1]` Contiguous data structures, iterators, views(C++17/20) and concepts(C++20), adaptors 3 | * `deque` 4 | * `stack`, `queue`, `priority_queue` 5 | * `[4]` Text processing 6 | * `[2]` Associative containers 7 | * `set`, `unordered_set` 8 | * `map`, `unordered_map` 9 | * `flat_*` 10 | * interval data structure 11 | * sum types 12 | * union 13 | * `tuple`(C++11), `variant`, `any`, `optional` (C++17 or Boost with C++11) 14 | * `[1]` Algorithms 15 | * `` 16 | * `std::execution` and parallel STL (C++17) 17 | * `[1]` Memory management (C++17) 18 | * `[1]` Some daily-use utility libraries 19 | * ``, ``, `` (C++11), `filesystem` (C++17), `` (C++20) 20 | * `[1]` Engineering of C++ 21 | * Use case study: Writing unit test for standard libraries. 22 | * Project organization and dependencies maintenance with modern CMake 23 | * ABI compatibility 24 | * C++ core language features 25 | * `[1]` Out-of-the-box features 26 | * `override`, `final`, `noexcept`, `namespace A::B {}`, etc. 
27 | * Literals: `100_km`, `0b0100`, `100'000ul` 28 | * Attributes (`[[*]]`): Common attributes in GCC, Clang and MSVC 29 | * `[1]` Enumerations 30 | * `[1]` Value categories (gl/pr/x/**l/r**) 31 | * Universal references 32 | * Perfect forwarding 33 | * Parameter pack(variadic arguments) 34 | * Ref-qualifier 35 | * Deducing `this` 36 | * https://www.zhihu.com/question/533946012/answer/2509921643 37 | * `[1]` Understand constancy: `const`, `constexpr` and `consteval` 38 | * `[1]` Constructors, destructors, assignments and implicit type conversion 39 | * `[1]` Initialization Hell 40 | * `[1]` Compile-time and runtime diagnostics 41 | * `assert` and `static_assert` 42 | * `source_location` and `basic_stacktrace` 43 | * `type_info` and `type_index` 44 | * `__FUNCTION__`, `__PRETTY_FUNCTION__` and `__func__` 45 | * Exceptions and system errors 46 | * `[2]` Template and automatic type deduction (`decltype`, `auto`) 47 | * `[1]` Polymorphic, CRTP, type erasure and polymorphic object 48 | * `folly.poly` 49 | * `dyno` 50 | * `Boost.TypeErasure` 51 | * `[2]` "Generalized" functions and call-back in advance 52 | * Traditional functor, lambda and `std::function` 53 | * Closure and high-order function: capture and `std::bind` 54 | * Overloading resolving and CPO 55 | * Concurrency and asynchronization 56 | * `[2]` Concurrency utilities 57 | * `thread` and `jthread` 58 | * Synchronization primitives - I 59 | * `mutex`s and `lock`s 60 | * Synchronization primitives - II 61 | * (C++11) `condition_variable` and `condition_variable_any` 62 | * (C++20) `counting_semaphore` and `binary_semaphore` 63 | * (C++20) `latch` and `barrier` 64 | * `atomic` and memory model 65 | * `[1]` Asynchronization in C++11 66 | * `promise` and `future` (`shared_future`) 67 | * `packaged_task` and `async` 68 | * `[1]` Coroutine (Language, C++20) 69 | * “dialects” in C++ 70 | * `[1]` _"SQL"_ in C++: `ranges` (Lib, C++20) 71 | * `[2]` From CPU to GPU: Executors (TS, C++26, Lib) 72 | 
-------------------------------------------------------------------------------- /examples/CMakeLists.txt: -------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 3.22) 2 | 3 | project(PMCppSamples) 4 | 5 | set(CMAKE_EXPORT_COMPILE_COMMANDS ON) 6 | 7 | find_package(Boost REQUIRED) 8 | find_package(fmt CONFIG REQUIRED) 9 | find_package(range-v3 CONFIG REQUIRED) 10 | find_package(GTest CONFIG REQUIRED) 11 | find_package(absl CONFIG REQUIRED) 12 | 13 | include(cmake/unifex/Findunifex.cmake) 14 | 15 | function( sample_link_libraries target ) 16 | target_link_libraries(${target} PRIVATE fmt::fmt-header-only) 17 | target_link_libraries(${target} PRIVATE range-v3 range-v3-meta range-v3::meta range-v3-concepts) 18 | target_link_libraries(${target} PRIVATE Boost::boost) 19 | target_link_libraries(${target} PRIVATE GTest::gtest GTest::gtest_main) 20 | target_link_libraries(${target} PRIVATE absl::strings) 21 | endfunction() 22 | 23 | file(GLOB sources11 CONFIGURE_DEPENDS samples/*_Cpp11.cpp) 24 | add_library(samples11 SHARED ${sources11}) 25 | # target_compile_features(samples11 PUBLIC cxx_std_11) doesn't work 26 | target_compile_options(samples11 PRIVATE "-std=c++11") 27 | sample_link_libraries(samples11) 28 | 29 | file(GLOB sources14 CONFIGURE_DEPENDS *_Cpp14.cpp) 30 | add_library(samples14 SHARED ${sources14}) 31 | target_compile_features(samples14 PUBLIC cxx_std_14) 32 | sample_link_libraries(samples14) 33 | 34 | # file(GLOB sources17 CONFIGURE_DEPENDS samples/*_Cpp17.cpp) 35 | # add_library(samples17 SHARED ${sources17}) 36 | # target_compile_features(samples17 PUBLIC cxx_std_17) 37 | # target_link_libraries(samples17 PRIVATE unifex::unifex) 38 | # sample_link_libraries(samples17) 39 | 40 | file(GLOB sources20 CONFIGURE_DEPENDS *_Cpp20.cpp) 41 | add_library(samples20 SHARED ${sources20}) 42 | target_compile_features(samples20 PUBLIC cxx_std_20) 43 | target_link_libraries(samples20 PRIVATE unifex::unifex) 44 | 
sample_link_libraries(samples20) 45 | 46 | file(GLOB sources23 CONFIGURE_DEPENDS *_Cpp23.cpp) 47 | add_library(samples23 SHARED ${sources23}) 48 | target_compile_features(samples23 PUBLIC cxx_std_23) 49 | sample_link_libraries(samples23) 50 | 51 | file(GLOB sources CONFIGURE_DEPENDS main.cpp) 52 | add_executable(samples ${sources}) 53 | target_compile_features(samples PUBLIC cxx_std_20) 54 | target_link_libraries(samples PRIVATE samples11 samples14 samples20 samples23) 55 | target_link_libraries(samples PRIVATE GTest::gtest GTest::gtest_main) -------------------------------------------------------------------------------- /examples/CMakePresets.json: -------------------------------------------------------------------------------- 1 | { 2 | "version": 5, 3 | "cmakeMinimumRequired": { 4 | "major": 3, 5 | "minor": 22 6 | }, 7 | "configurePresets": [ 8 | { 9 | "name": "clang_rel_linux_x64", 10 | "displayName": "Clang Release Linux/WSL x64", 11 | "description": "Clang Release Linux/WSL x64", 12 | "toolchainFile": "${sourceDir}/../../Code/vcpkg/scripts/buildsystems/vcpkg.cmake", 13 | "binaryDir": "${sourceDir}/out/build/${presetName}", 14 | "cacheVariables": { 15 | "CMAKE_INSTALL_PREFIX": "${sourceDir}/out/install/${presetName}", 16 | "CMAKE_C_COMPILER": "clang", 17 | "CMAKE_CXX_COMPILER": "clang++", 18 | "CMAKE_BUILD_TYPE": "RelWithDebInfo", 19 | "CMAKE_MODULE_PATH": "${sourceDir}/cmake/unifex" 20 | } 21 | }, 22 | { 23 | "name": "clang_dbg_linux_x64", 24 | "displayName": "Clang Debug Linux/WSL x64", 25 | "description": "Clang Debug Linux/WSL x64", 26 | "toolchainFile": "${sourceDir}/../../Code/vcpkg/scripts/buildsystems/vcpkg.cmake", 27 | "binaryDir": "${sourceDir}/out/build/${presetName}", 28 | "cacheVariables": { 29 | "CMAKE_INSTALL_PREFIX": "${sourceDir}/out/install/${presetName}", 30 | "CMAKE_C_COMPILER": "clang", 31 | "CMAKE_CXX_COMPILER": "clang++", 32 | "CMAKE_BUILD_TYPE": "Debug", 33 | "CMAKE_MODULE_PATH": "${sourceDir}/cmake/unifex" 34 | } 35 | } 36 | ], 37 | 
"testPresets": [ 38 | { 39 | "name": "Debug Poly", 40 | "description": "", 41 | "displayName": "", 42 | "configurePreset": "clang_dbg_linux_x64" 43 | } 44 | ] 45 | } -------------------------------------------------------------------------------- /examples/CppVersion.h: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #if __cplusplus == 201103L 4 | #define CPP_VER 2011 5 | #endif 6 | 7 | #if __cplusplus == 201402L 8 | #define CPP_VER 2014 9 | #endif 10 | 11 | #if __cplusplus == 201703L 12 | #define CPP_VER 2017 13 | #endif 14 | 15 | #if __cplusplus == 202002L 16 | #define CPP_VER 2020 17 | #endif 18 | 19 | #if __cplusplus > 202002L 20 | #define CPP_VER 2023 21 | #endif -------------------------------------------------------------------------------- /examples/Ex_3_TextProcessing_String_Manip_Cpp11.cpp: -------------------------------------------------------------------------------- 1 | 2 | // !!!!!!!!!!!!!!!!!!!!!!!!!!! 3 | // Check version number of C++ standard. 4 | #include 5 | static_assert(__cplusplus == 201103L, "Not C++11"); 6 | // !!!!!!!!!!!!!!!!!!!!!!!!!!! 7 | 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | 15 | TEST(PMCPP_TextProcessing_Cpp11, String_Manip_Concatenate_Strings) 16 | { 17 | // Homo-type instances concatenate 18 | std::string a1{"Alpha"}, b1{"Beta"}, c1{"Gammar"}; 19 | 20 | // ====== BEST PRACTICE ============== 21 | auto o1 = a1 + b1 + c1; 22 | 23 | // ====== OTHER OPTS ============== 24 | { 25 | // ...... 1 ...... 
via library {fmt} 26 | auto o2 = fmt::to_string(fmt::join({a1, b1, c1}, "")); 27 | EXPECT_EQ(o1, o2); 28 | } 29 | } 30 | 31 | TEST(PMCPP_TextProcessing_Cpp11, String_Manip_Concatenate_StringViews) 32 | { 33 | // Homo-type instances concatenate 34 | boost::string_view a{"Alpha"}, b{"Beta"}, c{"Gammar"}; 35 | 36 | // ====== BEST PRACTICE ============== 37 | std::string o1 = (std::stringstream{} << a << b << c).str(); 38 | } 39 | 40 | TEST(PMCPP_TextProcessing_Cpp11, String_Manip_Concatenate_Hetero){ 41 | 42 | // Hetero-type instances concatenate 43 | const char* a{"Alpha"}; 44 | boost::string_view b1{"Beta"}; 45 | fmt::string_view b2{"Beta"}; 46 | std::string c{"Gammar"}; 47 | 48 | // ====== BEST PRACTICE ============== 49 | std::string o1 = (std::stringstream{} << a << b1 << c).str(); 50 | // Illegal: std::string o1 = (std::stringstream{} << a << b2 << c).str(); 51 | // Because fmt::string_view doesn't support stream. 52 | 53 | // ====== OTHER OPTS ============== 54 | { 55 | // ...... 1 ...... Native C++11 string code 56 | // No "string + string_view" because it may be reserved for "cheap" concat 57 | // "+=", string::append(string_view) are available since C++20. 58 | std::string o2{a}; 59 | o2.append(b1.cbegin(), b1.cend()); 60 | o2 += c; 61 | 62 | // ...... 2 ...... 
{fmt} 63 | fmt::string_view sv[] = {a, b2, c}; 64 | auto o3 = fmt::to_string(fmt::join(sv, "")); 65 | 66 | EXPECT_EQ(o1, o2); 67 | EXPECT_EQ(o1, o3); 68 | } 69 | } 70 | -------------------------------------------------------------------------------- /examples/Ex_3_TextProcessing_String_Manip_Cpp14.cpp: -------------------------------------------------------------------------------- 1 | #include "CppVersion.h" 2 | #include 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | 9 | TEST(PMCPP_TextProcessing_Cpp14, String_Manip_Concatenate_Strings) 10 | { 11 | // Homo-type instances concatenate 12 | std::string a1{"Alpha"}, b1{"Beta"}, c1{"Gammar"}; 13 | 14 | // ====== BEST PRACTICE ============== 15 | auto o1 = a1 + b1 + c1; 16 | } 17 | 18 | TEST(PMCPP_TextProcessing_Cpp14, String_Manip_Concatenate_StringViews) 19 | { 20 | // Homo-type instances concatenate 21 | boost::string_view a{"Alpha"}, b{"Beta"}, c{"Gammar"}; 22 | 23 | // ====== BEST PRACTICE ============== 24 | std::string o1 = (std::stringstream{} << a << b << c).str(); 25 | } 26 | 27 | TEST(PMCPP_TextProcessing_Cpp14, String_Manip_Concatenate_Hetero){ 28 | 29 | // Hetero-type instances concatenate 30 | const char* a{"Alpha"}; 31 | boost::string_view b1{"Beta"}; 32 | fmt::string_view b2{"Beta"}; 33 | std::string c{"Gammar"}; 34 | 35 | // ====== BEST PRACTICE ============== 36 | // boost::string_view 37 | std::string o1 = (std::stringstream{} << a << b1 << c).str(); 38 | // fmt::string_view 39 | fmt::string_view sv[] = {a, b2, c}; 40 | auto o2 = fmt::to_string(fmt::join(sv, "")); 41 | 42 | 43 | EXPECT_EQ(o1, o2); 44 | } -------------------------------------------------------------------------------- /examples/Ex_3_TextProcessing_String_Manip_Cpp23.cpp: -------------------------------------------------------------------------------- 1 | #include "CppVersion.h" 2 | 3 | #include 4 | 5 | #include 6 | #include 7 | #include 8 | #include 9 | 10 | #include 11 | #include 12 | 13 | #include 14 | #include 15 | 16 | 
#include 17 | #include 18 | #include 19 | 20 | #define OUT_VAR(v) std::cout << #v << " = " << v << std::endl; 21 | #define ILLEGAL(v) std::cout << #v << " is ILLEGAL" << std::endl; 22 | 23 | TEST(PMCPP_TextProcessing_Cpp23, String_Manip_Concatenate_Strings) 24 | { 25 | // Homo-type instances concatenate 26 | std::string a1{"Alpha"}, b1{"Beta"}, c1{"Gammar"}; 27 | 28 | // ====== BEST PRACTICE ============== 29 | auto o1 = a1 + b1 + c1; 30 | 31 | // ====== OTHER OPTS ============== 32 | { 33 | // ...... 1 ...... via library {fmt} 34 | auto o2 = fmt::to_string(fmt::join({a1, b1, c1}, "")); 35 | EXPECT_EQ(o1, o2); 36 | 37 | // ...... 2 ...... via library {fmt} 38 | } 39 | } 40 | 41 | TEST(PMCPP_TextProcessing_Cpp23, String_Manip_Concatenate_StringViews) 42 | { 43 | using ranges::views::join; 44 | using ranges::to; 45 | 46 | // Homo-type instances concatenate 47 | std::string_view a1{"Alpha"}, b1{"Beta"}, c1{"Gammar"}; 48 | 49 | // ====== BEST PRACTICE ============== 50 | // The range-v3 lib is better than std::ranges on clang 14.0.x with C++23. 51 | std::string_view sv[] = {a1, b1, c1}; 52 | auto o1 = sv | join | to(); 53 | } 54 | 55 | TEST(PMCPP_TextProcessing_Cpp23, String_Manip_Concatenate_Hetero){ 56 | // Hetero-type instances concatenate 57 | const char* a{"Alpha"}; 58 | std::string_view b{"Beta"}; 59 | std::string c{"Gammar"}; 60 | 61 | // ====== BEST PRACTICE ============== 62 | std::string o1 = (std::stringstream{} << a << b << c).str(); 63 | 64 | // ====== OTHER OPTS ============== 65 | { 66 | // ...... 1 ...... 
Native C++11 string code 67 | // No "string + string_view", they may be reserved for "cheap" concat 68 | std::string o2{a}; 69 | o2 += b; 70 | o2 += c; 71 | } 72 | } 73 | 74 | TEST(PMCPP_TextProcessing_Cpp23, String_Manip_AbslStrCatJoin){ 75 | using std::vector; 76 | using std::string; 77 | // #if __cplusplus < 201703L 78 | using absl::string_view; // std::string_view in C++17 79 | // #else 80 | // using std::string_view; 81 | // #endif 82 | 83 | vector svec{"tx", "12", "bl ah"}; 84 | string a{"tx"}, b{"12"}, c{"bl ah"}; 85 | const char* a1{"tx"}; string_view b1{"12"}; string c1{"bl ah"}; 86 | 87 | // ... 1 ... Use absl::StrCat 88 | ILLEGAL( absl::StrCat(svec) ); 89 | OUT_VAR( (absl::StrCat(a, b, c)) ); 90 | OUT_VAR( (absl::StrCat(a1, b1, c1)) ); 91 | 92 | // ... 2 ... Use absl::Join 93 | OUT_VAR( (absl::StrJoin(svec, "")) ); 94 | OUT_VAR( (absl::StrJoin({a, b, c}, "")) ); 95 | ILLEGAL( (absl::StrJoin({a1, b1, c1}, "")) ); 96 | } 97 | -------------------------------------------------------------------------------- /examples/Ex_X_Executor_Cpp20.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | #include 4 | #include 5 | 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | 14 | #include 15 | #include 16 | #include 17 | #include 18 | 19 | #include 20 | 21 | using namespace unifex; 22 | 23 | TEST(Executor, ExecutorTest) 24 | { 25 | unifex::trampoline_scheduler sched(2); 26 | // single_thread_context ctx; 27 | // auto sched = ctx.get_scheduler(); 28 | 29 | sender auto s = range_stream{0, 10} 30 | | via_stream(sched) 31 | | transform_stream([](auto v) { fmt::print("v1:{:d}\n", v); return v * 2; }) 32 | | transform_stream([](auto v) { fmt::print("v2:{:d}\n", v); return v + 1; }) 33 | | reduce_stream(0, [](int st, int v) { fmt::print("st:{:d}\n", st); return st + v;}) 34 | | then([](int v){return v;}) 35 | ; 36 | int v = sync_wait(s).value(); 37 | 38 | GTEST_ASSERT_EQ(v, 
100); 39 | 40 | /* 41 | cache vertex_cache; 42 | triangle_list 43 | | tri_fifo | monitor 44 | | dispatch(round_robin, [](triangle const& tri) {return tri.index();}, 45 | tri_fifo | monitor 46 | | mcg(vertex_cache) 47 | | converge_then( 48 | transform([](){ get.partA(); } ) | processA, 49 | transform([](){ get.partA(); } ) | processB, 50 | [](auto resultA, auto resultB) { 51 | return resultA.id == resultB.id; 52 | }) 53 | | rasterizer 54 | | warp_generator 55 | | warp_executor 56 | | backend_executor 57 | | result_writer 58 | ) 59 | */ 60 | // s. 61 | } -------------------------------------------------------------------------------- /examples/Pex.py: -------------------------------------------------------------------------------- 1 | import threading 2 | import typing 3 | import logging 4 | 5 | class va_pack: 6 | def __init__(self, *args, **kwargs) -> None: 7 | self._args = args 8 | self._kwargs = kwargs 9 | 10 | def invoke(self, fn, *prior_args): 11 | return fn(*prior_args, *self._args, **self._kwargs) 12 | 13 | class task_base: 14 | def __init__(self, executeFn): 15 | self._next = None 16 | self._execute = executeFn 17 | 18 | def execute(self): 19 | self._execute(self) 20 | 21 | class loop_sched_operation(task_base): 22 | def __init__(self, receiver, loop): 23 | self._receiver = receiver 24 | self._loop = loop 25 | super().__init__(loop_sched_operation._execute_impl) 26 | 27 | def start(self): 28 | self._loop.enqueue(self) 29 | 30 | @staticmethod 31 | def _execute_impl(t): 32 | assert isinstance(t, loop_sched_operation) 33 | _self = t 34 | _self._receiver.set_value() 35 | 36 | class loop_scheduler_task: 37 | def __init__(self, loop): 38 | self._loop = loop 39 | 40 | def connect(self, receiver): 41 | return loop_sched_operation(receiver, self._loop) 42 | 43 | class loop_scheduler: 44 | def __init__(self, loop) -> None: 45 | self._loop = loop 46 | 47 | def schedule(self): 48 | return loop_scheduler_task(self._loop) 49 | 50 | class manual_event_loop: # 
manual_event_loop in libunifex 51 | def __init__(self) -> None: 52 | self._head: typing.Optional[task_base] = None 53 | self._tail: typing.Optional[task_base] = None 54 | self._stop = False 55 | # Mutex and condition variable are ignored in this demo. 56 | 57 | def get_scheduler(self): 58 | return loop_scheduler(self) 59 | 60 | def run(self): 61 | while True: 62 | while self._head is None: 63 | if self._stop: 64 | return 65 | task = self._head 66 | self._head = task._next 67 | if self._head is None: 68 | self._tail = None 69 | task.execute() 70 | 71 | def stop(self): 72 | self._stop = True 73 | 74 | def enqueue(self, t): 75 | if self._head is None: 76 | self._head = t 77 | else: 78 | self._tail._next = t 79 | 80 | self._tail = t 81 | self._tail._next = None 82 | 83 | class single_thread_context: 84 | def __init__(self) -> None: 85 | self._loop = manual_event_loop() 86 | self._thread = threading.Thread(target=lambda: self._loop.run()) 87 | self._thread.start() 88 | print("ST context launched.") 89 | 90 | def get_scheduler(self): 91 | return self._loop.get_scheduler() 92 | 93 | def __enter__(self): 94 | return self 95 | 96 | def __exit__(self, *args): 97 | self._loop.stop() 98 | self._thread.join() 99 | 100 | @staticmethod 101 | def create(): 102 | return single_thread_context() 103 | 104 | class then_sender: 105 | def __init__(self, predecessor, fn): 106 | self._predcessor = predecessor 107 | self._fn = fn 108 | 109 | def connect(self, receiver): 110 | return self._predcessor.connect( 111 | then_receiver(self._fn, receiver) 112 | ) 113 | 114 | class then_receiver: 115 | def __init__(self, fn, receiver): 116 | self._fn = fn 117 | self._receiver = receiver 118 | 119 | def set_value(self, *args, **kwargs): 120 | result = self._fn(*args, **kwargs) 121 | assert isinstance(result, va_pack) 122 | result.invoke(self._receiver.set_value) 123 | 124 | def then(sender, fn): 125 | return then_sender(sender, fn) 126 | 127 | class let_value_op: 128 | def __init__(self, pred, 
succFact, receiver) -> None: 129 | self._succFact = succFact 130 | self._receiver = receiver 131 | self._predOp = pred.connect(let_value_pred_receiver(self)) 132 | self._values = va_pack 133 | 134 | def start(self): 135 | self._predOp.start() 136 | 137 | class let_value_pred_sender: 138 | def __init__(self, pred, succFact): 139 | self._pred = pred 140 | self._succFact = succFact 141 | 142 | def connect(self, receiver): 143 | return let_value_op(self._pred, self._succFact, receiver) 144 | 145 | class let_value_succ_receiver: 146 | def __init__(self, op): 147 | self._op = op 148 | 149 | def set_value(self, *args, **kwargs): 150 | self._op._receiver.set_value(*args, **kwargs) 151 | 152 | class let_value_pred_receiver: 153 | def __init__(self, op): 154 | self._op = op 155 | 156 | def set_value(self, *args, **kw_args): 157 | succOp = va_pack(*args, **kw_args).invoke(self._op._succFact).connect(let_value_succ_receiver(self._op)) 158 | succOp.start() 159 | 160 | def let_value(pred, succFact): 161 | return let_value_pred_sender(pred, succFact) 162 | 163 | 164 | class sync_wait_receiver: 165 | def __init__(self, ctx): 166 | self._ctx = ctx 167 | 168 | def set_value(self, *args, **kwargs): 169 | print(*args, **kwargs) 170 | self.signal_complete() 171 | 172 | def signal_complete(self): 173 | self._ctx.stop() 174 | 175 | def sync_wait(sender): 176 | ctx = manual_event_loop() 177 | op = sender.connect(sync_wait_receiver(ctx)) 178 | print(f"Invoke op.start() where op is <{op.__class__.__name__}>") 179 | op.start() 180 | print("Executing sync_wait.ctx.run()") 181 | ctx.run() 182 | print("sync_wait completed") 183 | 184 | 185 | # ... TO REMAKE ... 186 | # Algorithms: 187 | # let_value 188 | # just 189 | # transfer 190 | # bulk 191 | # repeat_effect_until 192 | # Contexts: 193 | # timed_single_thread_context 194 | # ... ... ... ... ... 
195 | 196 | def testThen(): 197 | with single_thread_context.create() as context: 198 | scheduler = context.get_scheduler() 199 | 200 | count = [0] 201 | 202 | def inc_count(): 203 | count[0] += 1 204 | return va_pack() 205 | 206 | sync_wait( 207 | then( 208 | then( 209 | scheduler.schedule(), inc_count 210 | ), 211 | inc_count 212 | ) 213 | ) 214 | 215 | assert count[0] == 2 216 | 217 | def testLetOp(): 218 | pass 219 | 220 | def _main(): 221 | testThen() 222 | 223 | if __name__ == "__main__": 224 | _main() -------------------------------------------------------------------------------- /examples/main.cpp: -------------------------------------------------------------------------------- 1 | #include "gtest/gtest.h" 2 | 3 | int main(int argc, char* argv[]) 4 | { 5 | ::testing::InitGoogleTest(&argc, argv); 6 | return RUN_ALL_TESTS(); 7 | } -------------------------------------------------------------------------------- /examples/unifex/Findunifex.cmake: -------------------------------------------------------------------------------- 1 | find_package(unifex NO_MODULE) 2 | 3 | if(unifex_FOUND) 4 | get_target_property(link_libs unifex::unifex INTERFACE_LINK_LIBRARIES) 5 | list(REMOVE_ITEM link_libs std::coroutines) 6 | set_property(TARGET unifex::unifex PROPERTY INTERFACE_LINK_LIBRARIES "${link_libs}") 7 | endif() 8 | 9 | include(FindPackageHandleStandardArgs) 10 | find_package_handle_standard_args(unifex CONFIG_MODE) -------------------------------------------------------------------------------- /examples/vcpkg.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "quickbench", 3 | "version-string": "1.0.0", 4 | "dependencies": [ 5 | "fmt", 6 | "range-v3", 7 | "abseil", 8 | "boost-utility", 9 | "gtest", 10 | "libunifex" 11 | ] 12 | } -------------------------------------------------------------------------------- /slides/PMC++.0_Intro.md: -------------------------------------------------------------------------------- 
1 | --- 2 | marp: true 3 | paginate: true 4 | style: | 5 | section { 6 | background-color: #fffff2; 7 | } 8 | --- 9 | 10 | 11 | 18 | 19 | # Practical Modern C++ 20 | 21 | --- 22 | 23 | # Introduction 1/2 24 | 25 | * It is a series; Prefer talking about features that 26 | * _The most frequently used_ and 27 | * _Easiest to use_ OR _powerful_ 28 | * “性价比” 29 | * Our departure and goal 30 | * For ~~(精通C++)~~ intermediate experienced C++ users 31 | * Not to be a flawless “language lawyer” 32 | * For more proficient reading, writing and debugging modern C++ 33 | 34 | --- 35 | 36 | # Introduction 2/2 37 | 38 | * Refers features from C++11 to C++2x (TS & proposals) 39 | * C++ 11/14/17 are our major narrative threads 40 | * C++ 17/20/2x are references to help us 41 | * Understand the motivation & limitation 42 | * Know the future 43 | * Hope it could be the pleasant journey 44 | 45 | --- 46 | # Agenda 1/5 47 | 48 | 53 | 54 | * Pointers, smart pointers and ownership (C++11/14/17) 55 | * Data structures 56 | * Contiguous data structures, iterators, views(C++17/20) and concepts(C++20) 57 | * `tuple`(C++11), `variant`, `any`, `optional` (C++17 or Boost with C++11) 58 | * Algorithms 59 | * `` 60 | * `std::execution` and parallel STL (C++17) 61 | * Memory management (C++17) 62 | 63 | --- 64 | 65 | # Agenda 2/5 66 | 67 | 72 | 73 | * Most commonly used utility libraries 74 | * ``, ``, `` (C++11) and `` (C++20) 75 | * C++ core language features 76 | * Out-of-the-box features 77 | * `override`, `final`, `noexcept`, `namespace A::B {}`, etc. 
78 | * Literals: `100_km`, `0b0100`, `100'000ul` 79 | * Attributes (`[[*]]`): Common attributes in GCC, Clang and MSVC 80 | * Enumerations 81 | 82 | --- 83 | 84 | # Agenda 3/5 85 | 86 | 91 | 92 | * C++ core language features 93 | * Value categories (gl/pr/x/**l/r**), universal references, perfect forwarding and parameter pack(variadic arguments) 94 | * Understand constancy: `const`, `constexpr` and `consteval` 95 | * Constructors, destructors, assignments and implicit type conversion 96 | * Initialization 97 | * Compile-time and runtime diagnostics 98 | * Template and automatic type deduction (`decltype`, `auto`) 99 | * Functor, lambda, `std::function` and `std::bind` 100 | 101 | --- 102 | 103 | # Agenda 4/5 104 | 105 | 113 | * Concurrency, asynchronization 114 | * `thread`, `mutex/lock` and `condition_variable` 115 | * `atomic` and memory model 116 | * Asynchronization in C++11: `future`, `promise` and `packaged_task` 117 | * Coroutine (Language, C++20) 118 | ``` C++ 119 | auto const values = {0,1,2,3,4,5}; 120 | auto even = [](int i) { return 0 == i % 2; }; 121 | auto square = [](int i) { return i * i; }; 122 | 123 | for (int v : values | std::views::filter(even) | std::views::transform(square)) { 124 | std::cout << v << ' '; 125 | } // Output: 0 4 16 126 | ``` 127 | 128 | --- 129 | 130 | # Agenda 5/5 131 | 132 | 140 | 141 | * “dialects” in C++ 142 | * _"SQL"_ in C++: `ranges` (Lib, C++20) 143 | * From CPU to GPU: Executors (TS, C++26, Lib) 144 | * Engineering 145 | * Project organization and dependencies maintenance with modern CMake 146 | * ABI compatibility 147 | * **Won't talk** 148 | * Advanced template meta-programming 149 | * Design a "dialect/DSL" in C++ (for e.g. 
write something like `boost.proto`) 150 | 151 | --- 152 | 153 | # Resources 154 | 155 | 160 | 161 | * Language features 162 | * https://en.cppreference.com/w/cpp (not recommended: https://www.cplusplus.com) 163 | * Standard drafts: **N3337(C++11)**/N4140(C++14)/N4659(C++17)/**N4868(C++20)** 164 | * Other proposals: https://github.com/cplusplus/draft 165 | * Standard library 166 | * https://en.cppreference.com/w/cpp (not recommended: https://www.cplusplus.com) 167 | * https://github.com/llvm/llvm-project/tree/main/libcxx 168 | * C++ tutorials and industry use cases 169 | * https://www.youtube.com/user/CppCon / https://github.com/CppCon 170 | * If you get a 404, re-uploads are available on Bilibili 171 | * Best practices 172 | * C++ Core Guidelines 173 | * Effective Modern C++ 174 | * Others 175 | * https://en.cppreference.com/w/cpp/links 176 | 177 | --- 178 | 187 | 188 | # Enjoy! -------------------------------------------------------------------------------- /slides/PMC++.1_SmartPointers.1.md: -------------------------------------------------------------------------------- 1 | --- 2 | marp: true 3 | paginate: true 4 | style: | 5 | section { 6 | background-color: #fffff2; 7 | font-family: 'Palatino', 'Charter', 'STHeiti', 'Segoe UI Emoji'; 8 | } 9 | section pre { 10 | font-size: 0.9em; 11 | } 12 | --- 13 | 14 | 15 | # Pointers, smart pointers and ownership, I 16 | 17 | 25 | 26 | --- 27 | 28 | * Basic use cases 29 | * Pitfalls and solutions 30 | * Examples and analysis 31 | * Summary 32 | 33 | --- 34 | 35 | Smart pointers in C++ 36 | ``` C++ 37 | // C++98 (deprecated in C++11, removed in C++17) 38 | std::auto_ptr 39 | 40 | // C++11 to now 41 | std::unique_ptr // Ownership is unique and movable (move-only) 42 | std::shared_ptr // Ownership is shared and managed by reference counting 43 | 44 | std::weak_ptr // Works with shared_ptr to break circular references 45 | ``` 46 | 47 | --- 48 | 49 | Why do we need smart pointers? 
50 | 51 | * To resolve the mismatch between a pointer's _static accessibility_ and the _dynamic lifetime of the object it refers to_ 52 | * “Whereof one cannot speak, thereof one must be silent.” 53 | * If a pointer is visible, the object it points to is guaranteed to be alive; 54 | * If no pointer can see the object any more, it can be released automatically; 55 | * Otherwise, the following common problems arise 56 | * Accessing an object that has already been released 57 | * Accessing an uninitialized object 58 | * Releasing the same object more than once 59 | * Forgetting to release, which leaks resources 60 | 61 | --- 62 | 63 | The following code will be used in our examples 64 | ``` C++ 65 | class Dog 66 | { 67 | public: 68 | Dog(std::string species, std::string name, int age) { 69 | // ... 70 | } 71 | std::string Species() const { 72 | return m_species; 73 | } 74 | void Bark() { /* ... */ } 75 | private: 76 | // ... 77 | }; 78 | ``` 79 | --- 80 | 81 | Use cases of smart pointers 82 | 83 | ``` C++ 84 | std::unique_ptr<Dog> upDog1 85 | = std::make_unique<Dog>("Boxer", "Meow", 2); // make_unique 86 | auto upDog2 = std::move(upDog1); // move constructor 87 | upDog1 = nullptr; // operator = (nullptr_t) 88 | if (upDog2) { // operator bool() 89 | std::cout << upDog2->Species() << std::endl; // operator -> 90 | } 91 | ``` 92 | 93 | ``` C++ 94 | std::shared_ptr<Dog> spDog1 95 | = std::make_shared<Dog>("Boxer", "Meow", 2); // make_shared 96 | auto spDog2 = spDog1; // copy constructor 97 | spDog1 = nullptr; // operator = (nullptr_t) 98 | if (spDog2) { // operator bool() 99 | std::cout << spDog2->Species() << std::endl; // operator -> 100 | } 101 | ``` 102 | --- 103 | 104 | Use cases of smart pointers: `release`, `get`, `reset` 105 | ``` C++ 106 | Dog* pDog = upDog1.release(); // Returns a pointer to the managed object 107 | // and releases the ownership 108 | Dog* pDog = upDog1.get(); // Returns a pointer to the managed object 109 | upDog1.reset(newDog); // Replaces the managed object 110 | upDog1.get_deleter(); // Get the deleter 111 | ``` 112 | 113 | ``` C++ 114 | Dog* pDog = spDog1.get(); // Returns a pointer to the managed object 115 | spDog1.reset(newDog); // Replaces the managed object 116 | std::get_deleter<D>(spDog1); // Non-member; pointer to the deleter of type D, or nullptr 117 | spDog1.use_count(); // How many shared_ptrs own the object 118 | spDog1.unique(); // Removed from 
C++20; equiv. use_count() == 1 119 | ``` 120 | 121 | --- 122 | 123 | `unique_ptr` 124 | 125 | ``` C++ 126 | // Copy will be prevented 127 | std::unique_ptr<Dog> spDog1(new Dog("Boxer", "Meow", 2)); 128 | std::unique_ptr<Dog> spDog2; 129 | ``` 130 | ``` C++ 131 | // Compilation error. Copy between unique_ptrs is not allowed. 132 | spDog2 = spDog1; 133 | ``` 134 | ``` C++ 135 | // Move (transferring the ownership) is allowed 136 | spDog2 = std::move(spDog1); 137 | // Then: spDog1 is null (moved from), and spDog2 holds the pointer that was in spDog1. 138 | ``` 139 | 140 | * **Exclusive** ownership 141 | * Movable and move-only 142 | 143 | --- 144 | 145 | `unique_ptr` 146 | 147 | ``` C++ 148 | { 149 | // ... 150 | std::unique_ptr<Dog> spDog(new Dog("Boxer", "Meow", 2)); 151 | // ... 152 | } // spDog's destructor runs when the scope exits and deletes the Dog. 153 | ``` 154 | 155 | * RAII (**R**esource **A**cquisition **I**s **I**nitialization) 156 | * "A scoped life" 157 | 158 | --- 159 | 160 | Rationale of `unique_ptr` (Pseudo code) 161 | ``` C++ 162 | template <typename T> class unique_ptr { 163 | T* m_ptr; 164 | public: 165 | explicit unique_ptr(T* p); 166 | unique_ptr(unique_ptr&& rhs); 167 | unique_ptr& operator = (unique_ptr&& rhs); 168 | unique_ptr(unique_ptr const& rhs) = delete; 169 | unique_ptr& operator = (unique_ptr const& rhs) = delete; 170 | ~unique_ptr() {delete m_ptr;} 171 | }; 172 | ``` 173 | * Has a move constructor and move assignment 174 | * Copy construction and copy assignment are deleted 175 | * Deletes the object when the pointer's lifetime ends 176 | --- 177 | 178 | Quiz 1: When will the `dog` be deleted? 179 | 180 | ``` C++ 181 | { 182 | // ... 183 | std::unique_ptr<Dog> spDog; 184 | { 185 | // ... 186 | std::unique_ptr<Dog> spDog2(new Dog("Boxer", "Meow", 2)); 187 | // ... 188 | spDog = std::move(spDog2); 189 | // ... 190 | } 191 | // ... 192 | } 193 | ``` 194 | * At the last `}` 195 | --- 196 | 197 | Quiz 2: Is the following code correct? 198 | 199 | ``` C++ 200 | { 201 | // ... 
202 | Dog* pDog = new Dog("Boxer", "Meow", 2); 203 | // ... 204 | std::unique_ptr<Dog> spDog1(pDog); 205 | // ... 206 | std::unique_ptr<Dog> spDog2(pDog); 207 | // ... 208 | } 209 | ``` 210 | * Incorrect. Both `spDog1` and `spDog2` will try to delete "Meow", so it is deleted twice. 211 | --- 212 | 213 | Quiz 3: Is the following code correct? 214 | 215 | ``` C++ 216 | { 217 | // ... 218 | Dog* pDog = new Dog("Boxer", "Meow", 2); 219 | // ... 220 | std::unique_ptr<Dog> spDog(pDog); 221 | // ... 222 | spDog.reset(new Dog("Boxer", "Wow", 2)); 223 | // ... 224 | } 225 | ``` 226 | * Correct. On `reset`, `spDog` deletes "Meow" and then holds "Wow" 227 | * At the last `}`, "Wow" will be deleted 228 | 229 | --- 230 | 231 | 245 | ||| 246 | |--|--| 247 | | `shared_ptr`:
The ownership will be "shared" by reference counting
> **Top-down arrow** means lifetime of objects
> **Horizontal arrow** means _copy_ action | ![](media/SharedPtrLifetime.png) | 248 | 249 | --- 250 | 251 | Quiz 4: When will "Meow" be deleted? 252 | 253 | ``` C++ 254 | { 255 | // ... 256 | std::shared_ptr<Dog> spDog1; 257 | { 258 | // ... 259 | std::shared_ptr<Dog> spDog2(new Dog("Boxer", "Meow", 2)); 260 | // ... 261 | spDog1 = spDog2; 262 | // ... 263 | } 264 | // ... 265 | } 266 | ``` 267 | 268 | * At the 2nd (last) `}` 269 | 270 | --- 271 | Quiz 5: Is the following code correct? 272 | 273 | ``` C++ 274 | { 275 | // ... 276 | Dog* pDog = new Dog("Boxer", "Meow", 2); 277 | // ... 278 | std::shared_ptr<Dog> spDog1(pDog); 279 | // ... 280 | std::shared_ptr<Dog> spDog2(pDog); 281 | // ... 282 | } 283 | ``` 284 | * Incorrect. Two independent control blocks are created, so `pDog` is deleted twice. 285 | 286 | --- 287 | 288 | Quiz 6: Is the following code correct? 289 | 290 | ``` C++ 291 | { 292 | // ... 293 | Dog* pDog = new Dog("Boxer", "Meow", 2); 294 | // ... 295 | std::shared_ptr<Dog> spDog(pDog); 296 | // ... 297 | spDog.reset(new Dog("Boxer", "Wow", 2)); 298 | // ... 299 | } 300 | ``` 301 | * Correct 302 | -------------------------------------------------------------------------------- /slides/PMC++.1_SmartPointers.2.md: -------------------------------------------------------------------------------- 1 | --- 2 | marp: true 3 | paginate: true 4 | style: | 5 | section { 6 | background-color: #fffff2; 7 | font-family: 'Palatino', 'Charter', 'STHeiti', 'Segoe UI Emoji'; 8 | } 9 | section pre { 10 | font-size: 0.9em; 11 | } 12 | --- 13 | 14 | 15 | # Pointers, smart pointers and ownership, II 16 | 17 | 25 | 26 | --- 27 | 42 | 43 | # Take a breath, and let's dive deeper ... 44 | 45 | * ~~How many ways are there to write the character 回 (回囘囬徊𢌞廻廽迴逥佪)?~~ 46 | * How many ways are there to create a smart pointer? 47 | * How do smart pointers differ from garbage collection in the usual sense? 48 | * Is `shared_ptr` thread-safe? 49 | * How do smart pointers perform compared with _raw pointers_? 50 | * How to use smart pointers _properly_ 51 | * ... 52 | 53 | --- 54 | 55 | 3 ways to create a `unique_ptr` 56 | 57 | ``` C++ 58 | // 1. Create by constructor 59 | std::unique_ptr<Dog> upDog1{ new Dog("Boxer", "Meow", 3) }; 60 | // OR ... ( ... 
); 61 | 62 | // 2. Create by 'reset' 63 | std::unique_ptr<Dog> upDog3; 64 | upDog3.reset(new Dog("Boxer", "Meow", 3)); 65 | // Illegal: upDog3 = new Dog("Boxer", "Meow", 3); 66 | 67 | // 3. Create by make_unique 68 | std::unique_ptr<Dog> upDog4 = std::make_unique<Dog>("Boxer", "Meow", 3); 69 | // OR shorter 70 | auto upDog5 = std::make_unique<Dog>("Boxer", "Meow", 3); 71 | ``` 72 | 73 | --- 74 | 75 | 3 ways to create a `shared_ptr` 76 | ``` C++ 77 | // 1. Create by constructor 78 | std::shared_ptr<Dog> spDog1{ new Dog("Boxer", "Meow", 3) }; 79 | // OR ... ( ... ); 80 | 81 | // 2. Create by `reset` 82 | std::shared_ptr<Dog> spDog2; 83 | spDog2.reset(new Dog("Boxer", "Meow", 3)); 84 | // Illegal: spDog2 = new Dog("Boxer", "Meow", 3); 85 | 86 | // 3. Create by make_shared 87 | std::shared_ptr<Dog> spDog3 = std::make_shared<Dog>("Boxer", "Meow", 3); 88 | // OR shorter 89 | auto spDog4 = std::make_shared<Dog>("Boxer", "Meow", 3); 90 | ``` 91 | 92 | --- 93 | 94 | Any difference between `make_shared` and `shared_ptr<T>(new T())`? 95 | 96 | * Don't hurt OCD! 97 | * No `delete`, no `new` 98 | * `make_shared` is more efficient 99 | * The memory layout generated by `make_shared` is more compact 100 | * Cache-friendly, and one allocation fewer than `shared_ptr<T>(new T())` 101 | 102 | --- 103 | 111 | 112 | Any difference between `make_shared` and `shared_ptr<T>(new T())`? 113 | 114 | * `make_shared` is more efficient 115 | * CB: Control block, contains shared/weak counters and `deleter` 116 | 117 | * ![](media/SPLayout.svg) 118 | 119 | --- 120 | 121 | Quiz 7: Which line(s) is(are) correct? 122 | 123 | ``` C++ 124 | auto u{std::make_unique<int>(42)}; 125 | auto s{std::make_shared<int>(42)}; 126 | 127 | u = s; // 1 128 | s = u; // 2 129 | u = std::move(s); // 3 130 | s = std::move(u); // 4 131 | ``` 132 | 133 | * Only line 4: an rvalue `unique_ptr` can be converted to a `shared_ptr`, but not the other way round 134 | 135 | --- 136 | 137 | Quiz 8: Which line(s) is(are) correct? 
138 | 139 | ``` C++ 140 | std::shared_ptr<int> p(new int[42]); 141 | std::shared_ptr<int[]> p(new int[42]); 142 | 143 | std::unique_ptr<int> p(new int[42]); 144 | std::unique_ptr<int[]> p(new int[42]); 145 | ``` 146 | 147 | --- 148 | 149 | Quiz 8: Which line(s) is(are) correct? 150 | 151 | ``` C++ 152 | std::shared_ptr<int> p(new int[42]); // Incorrect: calls delete instead of delete[] 153 | std::shared_ptr<int[]> p(new int[42]); // Correct from C++17 154 | 155 | std::unique_ptr<int> p(new int[42]); // Incorrect: calls delete instead of delete[] 156 | std::unique_ptr<int[]> p(new int[42]); // Correct from C++11 157 | ``` 158 | 159 | ![](media/小问号.jpeg) 160 | 161 | ``` C++ 162 | // Workaround in C++11 by customized deleter. 163 | std::shared_ptr<int> sp(new int[42], [](int* p) {delete[] p;}); 164 | ``` 165 | 166 | --- 167 | 168 | Now you need an array. Pick one of them. 169 | 170 | * `T[N]` 171 | * or `array<T, N>`, `vector<T>` 172 | * or `shared_ptr<T[]>`, `unique_ptr<T[]>` 173 | * or `shared_ptr<array<T, N>>`, `unique_ptr<array<T, N>>`, `shared_ptr<vector<T>>`, `unique_ptr<vector<T>>` 174 | * or `array<shared_ptr<T>, N>`, `array<unique_ptr<T>, N>`, `vector<shared_ptr<T>>`, `vector<unique_ptr<T>>` 175 |
176 | * ![](media/fck.gif) 177 | 178 | 192 | 193 | * Will be discussed in next session 194 | 195 | --- 196 | 197 | Type casting between `shared_ptr` and `shared_ptr` 198 | 199 | ``` C++ 200 | class Base { /* ... */ }; 201 | class Derived: public Base { /* ... */ }; 202 | 203 | // Correct 204 | std::shared_ptr pBase = make_shared(); 205 | 206 | // Compilation Failed 207 | std::shared_ptr pDerived = pBase; 208 | 209 | // Runtime error 210 | std::shared_ptr pDerived(static_cast(pBase.get())); 211 | 212 | // Correct 213 | std::shared_ptr pDerived = std::static_pointer_cast(pBase); 214 | std::shared_ptr pDerived = std::shared_ptr( 215 | pBase, dynamic_cast(pBase.get())); // Aliasing constructor, C++11 216 | ``` 217 | 218 | --- 219 | 220 | Convert smart pointers by following functions. 221 | 222 | ``` C++ 223 | // from C++11 224 | template 225 | std::shared_ptr static_pointer_cast( const std::shared_ptr& r ) noexcept; 226 | 227 | template 228 | std::shared_ptr dynamic_pointer_cast( const std::shared_ptr& r ) noexcept; 229 | 230 | template 231 | std::shared_ptr const_pointer_cast( const std::shared_ptr& r ) noexcept; 232 | 233 | // from C++17 234 | template 235 | std::shared_ptr reinterpret_pointer_cast( const std::shared_ptr& r ) noexcept; 236 | ``` 237 | 238 | They may be implemented by _aliasing constructor_ 239 | 240 | --- 241 | 242 | How about cast between `unique_ptr`? 243 | 244 | ``` C++ 245 | class Base { /* ... */ }; 246 | class Derived: public Base { /* ... */ }; 247 | 248 | // For convertible types 249 | std::unique_ptr pBase = make_unique(); 250 | 251 | // For non-convertible types 252 | // Forget it. 253 | ``` 254 | 255 | --- 256 | 257 | 264 | Question: Is `shared_ptr` equivalent to GC object? 
265 | 266 | --- 267 | 268 | We are designing the class for tree nodes by Python 269 | 270 | ``` Python 271 | class Node: 272 | def __init__(self, parent): 273 | self._parent = parent 274 | if self._parent: 275 | self._parent.addChild(self) 276 | self._children = set() 277 | 278 | def addChild(self, n): 279 | self._children.add(n) 280 | 281 | def removeChild(self, n): 282 | self._children.remove(n) 283 | ``` 284 | 285 | * Everything looks good 286 | --- 287 | 288 | Port it to C++ 289 | 290 | ``` C++ 291 | class Node { 292 | private: 293 | std::shared_ptr m_parent; 294 | std::vector> m_children; 295 | // ... 296 | public: 297 | void addChild(std::shared_ptr const& n) { /* ... */ } 298 | void removeChild(std::shared_ptr const& n) { /* ... */ } 299 | 300 | static std::shared_ptr CreateNode(std::shared_ptr const& parent) { /* ... */ } 301 | // ... 302 | }; 303 | ``` 304 | 305 | * Why do we need `CreateNode`? 306 | * Getting `shared_ptr` of itself is not a trivial problem 307 | 308 | --- 309 | 310 | Then do some boring thing 311 | 312 | ``` Python 313 | def doBoringStuff(): 314 | for i in range(100): 315 | root = Node(None) 316 | c = Node(root) 317 | gc.collect() # We assume that gc.collect do its duty. 318 | ``` 319 | 320 | ``` C++ 321 | void doBoringStuff() { 322 | for (int i = 0; i < 100; ++i) { 323 | auto root = CreateNode( std::shared_ptr() ); 324 | auto c = CreateNode( root ); 325 | } 326 | } 327 | ``` 328 | 329 | * How many `Node` objects are living while `doBoringStuff` returned? 330 | 331 | --- 332 | 333 | * Python Version: `0` :smiley: 334 | 335 |
336 |
337 |
338 | 339 | * C++ Version: `200` :hankey: 340 | 341 | --- 342 | * Why? 343 | * Circular referencing 344 | 345 | ![](media/python-cyclic-gc-5-new-page.png) 346 | 347 | * Solution - change the declaration of `m_parent` to: 348 | * `Node* m_parent;` OR 349 | * `std::weak_ptr m_parent;` 350 | 351 | --- 352 | 353 | `weak_ptr`: Construction, `expired`, `use_count` and `lock` 354 | 355 | ``` C++ 356 | { 357 | weak_ptr wpDog; 358 | { 359 | shared_ptr spDog = make_shared("Boxer", "Meow", 3); 360 | wpDog = spDog; 361 | std::cout << wpDog.expired(); 362 | std::cout << wpDog.use_count(); 363 | shared_ptr spDog2 = wpDog.lock(); 364 | std::cout << spDog2.get(); 365 | std::cout << wpDog.use_count(); 366 | } 367 | std::cout << wpDog.use_count(); 368 | shared_ptr spDog3 = wpDog.lock(); 369 | std::cout << wpDog.expired(); 370 | std::cout << spDog3.get(); 371 | } 372 | ``` 373 | 374 | --- 375 | 376 | `weak_ptr` 377 | 378 | ``` C++ 379 | { 380 | weak_ptr wpDog; 381 | { 382 | shared_ptr spDog = make_shared("Boxer", "Meow", 3); 383 | wpDog = spDog; 384 | std::cout << wpDog.expired(); // False 385 | std::cout << wpDog.use_count(); // 1; use_count() returns shared count 386 | auto spDog2 = wpDog.lock(); 387 | std::cout << spDog2.get(); // spDog2.get() == spDog.get(); 388 | std::cout << wpDog.use_count(); // 2; 389 | } 390 | std::cout << wpDog.use_count(); // 0; spDog and spDog2 were released. 391 | auto spDog3 = wpDog.lock(); // spDog3 is null. 392 | std::cout << wpDog.expired(); // True 393 | std::cout << spDog3.get(); // null. 394 | } 395 | ``` 396 | 397 | --- 398 | 399 | Quiz: Is following code correct? 400 | 401 | ``` C++ 402 | class Plugin { 403 | std::weak_ptr m_doc; 404 | void ProcessDoc() { 405 | auto pDoc = m_doc.lock().get(); 406 | CreateDocHotWordView(pDoc); 407 | } 408 | }; 409 | ``` 410 | 411 | * No. The locked shared pointer is released after statement `auto pDoc = ...`. If the owner releases the object after `pDoc` was assigned, then `pDoc` becomes a dangling pointer. 
412 | * It is fault of C++ standard. `get()` should be `ref`-qualified. 413 | 414 | --- 415 | 416 | The correct one: 417 | ``` C++ 418 | class Plugin { 419 | std::weak_ptr m_doc; 420 | void ProcessDoc() { 421 | auto spDoc = m_doc.lock(); 422 | CreateDocHotWordView(spDoc.get()); 423 | } 424 | }; 425 | ``` 426 | Or (but not suggested): 427 | ``` C++ 428 | class Plugin { 429 | std::weak_ptr m_doc; 430 | void ProcessDoc() { CreateDocHotWordView(m_doc.lock().get()); } 431 | }; 432 | ``` 433 | 434 | --- 435 | 440 | Quiz: Is following code thread safety? 441 | 442 | ``` C++ 443 | class Page { 444 | // ... 445 | void beginEdit(); // lock and unlock for exclusive operations such as editing 446 | void endEdit(); 447 | }; 448 | class Notebook{ 449 | std::vector> m_pages; 450 | public: 451 | shared_ptr getPage(int i) { return m_pages[i]; } 452 | void beautify() { 453 | for (auto& page: m_pages) { 454 | if (page.use_count() > 1) { page->beginEdit(); } 455 | // ... do beautify ... 456 | if (page.use_count() > 1) { page->endEdit(); } 457 | } 458 | } 459 | }; 460 | ``` 461 | --- 462 | 463 | Quiz: Is following code thread safety? 464 | 465 | ``` C++ 466 | class Page { /* ... */ }; 467 | class Notebook{ 468 | // ... 469 | void beautify() { 470 | for (auto& page: m_pages) { 471 | if (page.use_count() > 1) { page->beginEdit(); } 472 | // ... do beautify ... 473 | if (page.use_count() > 1) { page->endEdit(); } 474 | } 475 | } 476 | }; 477 | ``` 478 | * No. Consider: Another thread holds a `weak_ptr` 479 | --- 480 | 481 | It is also an use case of `unique_ptr` 482 | 483 | ``` C++ 484 | class Page { /* ... */ }; 485 | class Notebook{ 486 | // ... 487 | static void endEdit(Page* page) { page->endEdit(); } 488 | void beautify() { 489 | for (auto& page: m_pages) { 490 | std::unique_ptr pageEditLock; 491 | if (page.use_count() > 1) { 492 | pageEditLock = std::unique_ptr( 493 | &page, Notebook::endEdit 494 | ); 495 | page->beginEdit(); 496 | } 497 | // ... do beautify ... 
498 | } 499 | } 500 | }; 501 | ``` 502 | -------------------------------------------------------------------------------- /slides/PMC++.1_SmartPointers.3.md: -------------------------------------------------------------------------------- 1 | --- 2 | marp: true 3 | paginate: true 4 | style: | 5 | section { 6 | background-color: #fffff2; 7 | font-family: 'Palatino', 'Charter', 'STHeiti', 'Segoe UI Emoji'; 8 | } 9 | section pre { 10 | font-size: 0.9em; 11 | } 12 | --- 13 | 14 | 15 | # Pointers, smart pointers and ownership, III 16 | 17 | 25 | 26 | --- 27 | * Session 1 28 | * Why smart pointers exist, and why to use them 29 | * Basic usage of smart pointers 30 | * Some quizzes to help understand smart pointers 31 | * Session 2 32 | * Availability of smart pointers to arrays across C++ standards 33 | * Three ways to create smart pointers and how they differ 34 | * `shared_ptr` and garbage collection 35 | * Characteristics and pitfalls of the reference-counted `shared_ptr` 36 | * Purpose, usage and pitfalls of `weak_ptr` 37 | 38 | --- 39 | * Session 3 40 | * Choosing pointers and system design 41 | * Performance 42 | * Thread safety of `shared_ptr` 43 | 44 | --- 45 | Choosing between `unique_ptr`, `shared_ptr`, `weak_ptr` and _raw pointers_ 46 | 47 | * It is not a simple question. 48 | 49 | * The following rules may help. 50 | 51 | --- 52 | 53 | Q1: How many reference points does the object have?
54 | 55 | If the answer is "1", use `unique_ptr`.
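A minimal, self-contained sketch of this single-reference case, under stated assumptions: `Buffer`, `Module` and `liveBuffersAfterMove` are hypothetical names introduced only for illustration and are not part of the slides' code base. It shows that the sole owner can hand the object over by moving, and that the object dies with its owner.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical resource type (an assumption for this sketch).
struct Buffer {
    static int alive;          // number of Buffer objects currently alive
    Buffer() { ++alive; }
    ~Buffer() { --alive; }
};
int Buffer::alive = 0;

// Hypothetical owner: exactly one reference point, so unique_ptr fits.
class Module {
public:
    Module() : m_buffer(std::make_unique<Buffer>()) {}
    bool hasBuffer() const { return static_cast<bool>(m_buffer); }
private:
    std::unique_ptr<Buffer> m_buffer;   // sole owner of the Buffer
};

// Returns the number of live Buffers while the moved-to owner is in scope.
int liveBuffersAfterMove() {
    Module a;                    // a owns one Buffer
    Module b = std::move(a);     // ownership is transferred, nothing is copied
    assert(!a.hasBuffer() && b.hasBuffer());
    return Buffer::alive;        // still exactly one Buffer
}                                // b is destroyed here, and so is its Buffer
```

Because `Module` declares no copy operations and holds a `unique_ptr`, it is move-only by construction; copying a `Module` would not compile.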

56 | 57 | ``` C++ 58 | class UniqueHwModule 59 | { 60 | public: 61 | std::vector Read(address_t start, size_t length) { 62 | // ... 63 | } 64 | private: 65 | std::unique_ptr m_privateMem; 66 | }; 67 | ``` 68 | 69 | --- 70 | 71 | Q2: How many pointers **indeed own** the object at the same time? 72 | 73 | If the answer is "more than 1", 74 | AND 75 | equals to the number of ref points, `shared_ptr` 76 | 77 | ``` C++ 78 | class Processor { 79 | std::shared_ptr m_sharedCache; 80 | }; 81 | 82 | class Chip { 83 | std::vector m_processors; 84 | }; 85 | ``` 86 | --- 87 | 88 | Q2: How many pointers **indeed own** the object at the same time? 89 | 90 | If the answer is "1" please consider `unique_ptr` OR _value_ + _raw pointers_ 91 | 92 | ``` C++ 93 | class Processor { 94 | Cache* m_sharedCache; 95 | }; 96 | 97 | class Chip { 98 | std::unique_ptr m_sharedCache; 99 | // OR 100 | // Cache m_sharedCache; 101 | // if polymorphism or replaceablity is not required. 102 | std::vector m_processors; 103 | }; 104 | ``` 105 | 106 | --- 107 | Q2: How many pointers **indeed own** the object at the same time? 108 | 109 | If the answer is "more than 1", 110 | AND 111 | Not all reference points are owners. 112 | 113 | For e.g., 114 | 115 | * Messages are passing in a complex, asynchronized system 116 | * The object will be shared out of our system 117 | * **CAUTION**: Circular references may be overlooked in complicated scenarios. 118 | 119 | --- 120 | 121 | Q3: Could the lifetime of (not owned) ref points be longer than the lifetime of the object? If **NO**, 122 | `shared_ptr` for owners, and _raw pointers_ for the others 123 | 124 | ``` C++ 125 | class InstInterpreter{ 126 | void ProcessInstPacket(InstPacket const* inst); 127 | }; 128 | 129 | class ShaderExecutor{ 130 | void RunOneInstruction() { 131 | // ... 
132 | auto instPacket = make_shared(/* args ...*/); 133 | m_ifcToOtherBlocks->send(instPacket); 134 | m_interpreters[instPacket->Category()]->ProcessInstPacket(instPacket.get()); 135 | } 136 | std::unordered_map> m_interpreters; 137 | }; 138 | ``` 139 | 140 | --- 141 | 142 | Could the lifetime of (not owned) ref points be longer than the lifetime of the object? If **YES** 143 | Q4: Do you **try** to use it even the object has been deleted? 144 | * NO, indicates the BAD design. Potential pointer dangling issue here. 145 | * YES, Q5: Are you **eager** to resurrect it? 146 | * YES That means the lifetime is too short. Please consider: 147 | * Own it by `shared_ptr` OR 148 | * Own a copy. 149 | * NO, how to get its living/available state? 150 | 151 | --- 152 | ``` C++ 153 | // (Continued) NO, how to get its living/available state? Solution 1 154 | // Main App 155 | class DomNode { 156 | Node* m_parent; 157 | std::unordered_map> m_children; 158 | void remove(NodeId nodeId) { m_children.erase(nodeId); } 159 | }; 160 | // Plug-in 161 | class Plugin { 162 | std::weak_ptr m_node; 163 | void Process() { 164 | if (!m_node.expired()) { ProcessLivingNode(m_node.lock().get()); } 165 | } 166 | }; 167 | ``` 168 | * Any issue in `Process()`? 169 | 170 | --- 171 | 172 | ``` C++ 173 | // (Continued) NO, how to get its living/available state? Solution 1 174 | 175 | // Main App 176 | class DomNode { 177 | Node* m_parent; 178 | std::unordered_map> m_children; 179 | void remove(NodeId nodeId) { m_children.erase(nodeId); } 180 | }; 181 | 182 | // Plug-in 183 | class Plugin { 184 | std::weak_ptr m_node; 185 | void Process() { 186 | if (auto node = m_node.lock()) { ProcessLivingNode(node.get()); } 187 | } 188 | }; 189 | ``` 190 | 191 | --- 192 | 193 | ``` C++ 194 | // (Continued) NO, how to check whether it is living/available? Solution 2 195 | 196 | // Main App 197 | class DomNode { 198 | Node* m_parent; 199 | std::unordered_map> m_children; 200 | void invalidate() { /* ... 
*/ } 201 | bool isValid() const noexcept { /* ... */ } 202 | void remove(NodeId nodeId) { 203 | m_children[nodeId]->invalidate(); 204 | m_children.erase(nodeId); 205 | } 206 | }; 207 | 208 | // Plug-in 209 | class Plugin { 210 | std::shared_ptr m_node; 211 | void Process() { 212 | if (m_node && m_node->isValid()) { ProcessNode(m_node.get()); } 213 | else { m_node.reset(); } 214 | } 215 | }; 216 | ``` 217 | 218 | --- 219 | 220 | Example: A design of the message passing system 221 | 222 | ``` C++ 223 | struct TypeDefinition { 224 | using MessageType = /* ??? */; 225 | template using QueueType = /* ??? */; 226 | }; 227 | 228 | class ModuleA { 229 | QueueType m_messageQueue; 230 | }; 231 | 232 | class ModuleB { 233 | QueueType m_messageQueue; 234 | }; 235 | ``` 236 | 237 | --- 238 | 1 Source, 1 Sink *OR* N Sinks but 1 Consumer 239 | 240 | ``` C++ 241 | // OPT 1 242 | using MessageType = std::unique_ptr; // OR 243 | using MessageType = Message; 244 | 245 | template using QueueType = std::shared_ptr>; 246 | 247 | // OPT 2 248 | class Subsystem { 249 | std::unique_ptr m_A; 250 | std::unique_ptr m_B; 251 | std::unique_ptr> m_MsgQueAB; 252 | // Optional 253 | ObjectPool m_messagePool; 254 | }; 255 | 256 | template using QueueType = std::shared_ptr>; 257 | ``` 258 | 259 | --- 260 | 261 | 1 Source, N Sinks, K Consumers (1 < K <= N) 262 | ``` C++ 263 | // OPT 1 264 | using MessageType = std::shared_ptr; // shared object 265 | // OR 266 | using MessageType = Message; // value semantic 267 | 268 | // OPT 2 269 | // Allocate multiple times, free once 270 | class Subsystem { 271 | // ... 272 | ObjectManager m_messageManager; 273 | }; 274 | 275 | using MessageType = Message*; 276 | ``` 277 | 278 | --- 279 | 280 | 295 | # Anything else? 296 | 297 | * YES 298 | 299 | --- 300 | 301 | Performance 302 | 303 | * The ops with same cost as _raw pointers_, a.k.a, zero-cost abstraction 304 | * All ops in `unique_ptr`. 
305 | * `make_unique` 306 | * `unique_ptr(new T())` 307 | * Movement of `unique_ptr` 308 | * `unique_ptr::get()` / `unique_ptr::operator ->()` 309 | * Deleting `unique_ptr` by static typed `deleter` 310 | * `shared_ptr::get()` / `shared_ptr::operator ->()` 311 | 312 | --- 313 | 314 | 322 | Performance 323 | 324 | * A little reasonable cost 325 | * `make_shared` 326 | * Time cost: 1.2x of `new T()` 327 | * Space cost: 1 CB per shared object and 2x size per pointer 328 | * `weak_ptr::lock()` 329 | * Creating a copy of `shared_ptr` 330 | 331 | --- 332 | 333 | 341 | Performance 342 | 343 | * More expensive than you think 344 | * `shared_ptr(new T);` 2x time cost compare to `new T()` 345 | * Copy operation of `shared_ptr` 346 | * The time cost of `copy` would be up to ~30% in sub-system 347 | * Shared counters are `std::atomic<>` for thread safety copy 348 | 349 | --- 350 | 351 | Performance 352 | 353 | Optimize the `shared_ptr` performance issue 354 | * If not a bottleneck, just accept it 355 | * Review your design and reconsider the lifetime management of objects 356 | * If all pointers of the object are used in the same thread, try `boost::intrusive_ptr` or `boost::local_shared_ptr` 357 | 358 | --- 359 | 360 | Thread safety of `shared_ptr` 361 | Quiz: In `threaded_func`, which statement(s) is(are) thread-safety? 
362 | 363 | ``` C++ 364 | void threaded_func(shared_ptr& spDog) { 365 | shared_ptr spDog2 = spDog; 366 | spDog2.use_count(); 367 | spDog.use_count(); 368 | spDog2.reset(); 369 | spDog->Bark(); 370 | spDog.reset(); 371 | } 372 | 373 | void main(){ 374 | auto spDog = std::make_shared("Boxer", "Meow", 3); 375 | std::jthread t1([&spDog](){ threaded_func(spDog); }); 376 | std::jthread t2([&spDog](){ threaded_func(spDog); }); 377 | } 378 | ``` 379 | 380 | --- 381 | 382 | 390 | 391 | > All member functions (including copy constructor and copy assignment) can be called by multiple threads on different instances of `shared_ptr` without additional synchronization even if these instances are copies and share ownership of the same object. 392 | 393 | > If multiple threads of execution access the **same** `shared_ptr` without synchronization and any of those accesses uses a non-const member function of shared_ptr then a data race will occur; ... 394 | 395 | --- 396 | 397 | Thread safety of `shared_ptr` 398 | Quiz: In `threaded_func`, which statement(s) is(are) thread-safety? 
399 | 400 | ``` C++ 401 | void threaded_func(shared_ptr& spDog) { 402 | shared_ptr spDog2 = spDog; // Yes 403 | spDog2.use_count(); // Yes 404 | spDog.use_count(); // Yes 405 | spDog2.reset(); // Yes 406 | spDog->Bark(); // Depends on Dog's thread safety 407 | spDog.reset(); // No 408 | } 409 | 410 | void main(){ 411 | auto spDog = std::make_shared("Boxer", "Meow", 3); 412 | std::jthread t1([&spDog](){ threaded_func(spDog); }); 413 | std::jthread t2([&spDog](){ threaded_func(spDog); }); 414 | } 415 | // For thread safety read/write shared_ptr, use `atomic_*(shared_ptr)` 416 | // Or std::atomic> from C++20 417 | ``` 418 | -------------------------------------------------------------------------------- /slides/PMC++.1_SmartPointers.4.md: -------------------------------------------------------------------------------- 1 | --- 2 | marp: true 3 | paginate: true 4 | style: | 5 | section { 6 | background-color: #fffff2; 7 | font-family: 'Palatino', 'Charter', 'STHeiti', 'Segoe UI Emoji'; 8 | } 9 | section pre { 10 | font-size: 0.9em; 11 | } 12 | --- 13 | 14 | 15 | # Pointers, smart pointers and ownership, IV 16 | 17 | 25 | 26 | --- 27 | * 第一期 28 | * 为什么要有智能指针,为什么要用智能指针 29 | * 智能指针的基本使用 30 | * 一些帮助大家理解智能指针的题目 31 | --- 32 | 33 | * 第二期 34 | * 指向数组的智能指针在不同C++标准中的可用性 35 | * 智能指针的三种创建方式及他们的不同 36 | * `shared_ptr`与垃圾回收机制 37 | * 理解基于引用计数的 `shared_ptr` 的特点与陷阱 38 | * `weak_ptr` 的作用、使用与陷阱 39 | 40 | --- 41 | * 第三期 42 | * 指针的选择与系统设计 43 | * 性能问题 44 | * `shared_ptr` 的线程安全性 45 | 46 | --- 47 | 48 | * 第四期 49 | * 如何从对象的 *raw pointer* 获取它的 `shared_ptr` 50 | * 理解自定义指针销毁器(`deleter`) 51 | * 其它非标准的智能指针 52 | 53 | --- 54 | 55 | How to get `shared_ptr` by _raw pointer_ or _reference_? 56 | Is following code correct? 57 | 58 | ``` C++ 59 | class Node { 60 | public: 61 | std::shared_ptr getShared() { return std::shared_ptr(this); } 62 | }; 63 | 64 | void foo() { 65 | // Consider following case 66 | Node n; 67 | auto sp1 = n.getShared(); // What happened? 
68 | 69 | Node* pn = new Node(); 70 | auto sp2 = pn->getShared(); // What happened? 71 | } 72 | ``` 73 | 74 | --- 75 | 76 | How to get `shared_ptr` by _raw pointer_ or _reference_? 77 | Is following code correct? 78 | 79 | ``` C++ 80 | class Node { 81 | public: 82 | std::shared_ptr getShared() { return std::shared_ptr(this); } 83 | }; 84 | 85 | void foo() { 86 | // Consider following case 87 | Node n; 88 | auto sp1 = n.getShared(); // !!! WRONG!!! Attempting to delete object twice. 89 | 90 | Node* pn = new Node(); 91 | auto sp2 = pn->getShared(); // !!!DANGER!!! Invisible errors. 92 | } 93 | ``` 94 | 95 | --- 96 | How to get `shared_ptr` by _raw pointer_ or _reference_? 97 | Is following code correct? 98 | 99 | ``` C++ 100 | class Node { 101 | private: 102 | std::shared_ptr self; 103 | public: 104 | void setMyself(std::shared_ptr const& selfPtr) { 105 | if (myself.get() != this) { 106 | throw std::invalid_argument(" ... "); 107 | } 108 | self = selfPtr; 109 | } 110 | std::shared_ptr getShared() { 111 | return self; 112 | } 113 | }; 114 | ``` 115 | 116 | --- 117 | 118 | How to get `shared_ptr` by _raw pointer_ or _reference_? 119 | Use `weak_ptr` intrusively. 120 | ``` C++ 121 | class Node { 122 | private: 123 | std::weak_ptr self; 124 | public: 125 | void setMyself(std::shared_ptr const& selfPtr) { 126 | if (myself.get() != this) {throw std::invalid_argument(" ... ");} 127 | self = selfPtr; 128 | } 129 | std::shared_ptr getShared() { 130 | return self.lock(); 131 | } 132 | }; 133 | ``` 134 | 135 | --- 136 | 137 | How to get `shared_ptr` by _raw pointer_ or _reference_? 138 | Use `std::enable_shared_from_this`. 139 | 140 | ``` C++ 141 | class Node: public std::enable_shared_from_this { 142 | // ... 143 | std::shared_ptr getShared() { 144 | return shared_from_this(); 145 | } 146 | // ... 
147 | }; 148 | 149 | void foo() { 150 | Node n; 151 | n.getShared(); 152 | // Throws the exception "bad_weak_ptr" 153 | std::shared_ptr pNode{ new Node() }; 154 | pNode->getShared(); 155 | // Yes, the shared_ptr's constructor 156 | // will initialize the __weak_this embedded in Node 157 | } 158 | ``` 159 | 160 | --- 161 | 162 | How to get `shared_ptr` by _raw pointer_ or _reference_? 163 | Or prevent users creating objects by `new` or _from the stack_ 164 | 165 | ```C++ 166 | class Node { 167 | private: 168 | Node( /* ... */ ) { /* ... */ } 169 | 170 | public: 171 | // The only way to create an object is by specified factory method. 172 | template shared_ptr Create(Args&&... args); 173 | }; 174 | ``` 175 | 176 | --- 177 | 178 | User-defined `deleter`. 179 | 180 | ```C++ 181 | struct FnCloseFile { void operator() (FILE* pf) const { fclose(pf); } }; 182 | 183 | // type of unique_ptr: std::unique_ptr, 184 | // Deleter should be instantiated and callable. 185 | // type of shared_ptr: std::shared_ptr 186 | 187 | // deleter is a functor 188 | std::unique_ptr uf1(fopen("file.txt", "r")); 189 | std::shared_ptr sf1(fopen("file.txt", "r"), FnCloseFile{}); 190 | ``` 191 | 192 | --- 193 | 194 | ```C++ 195 | // deleter is a C function 196 | std::unique_ptr uf2a(fopen("file.txt", "r"), fclose); 197 | // COMPILATION FAILED: 198 | // std::unique_ptr uf2b(fopen("file.txt", "r"), fclose); 199 | std::unique_ptr uf2b(fopen("file.txt", "r"), fclose); 200 | std::shared_ptr sf2(fopen("file.txt", "r"), fclose); 201 | 202 | // deleter is a lambda function 203 | std::unique_ptr uf3_cpp20( 204 | fopen("file.txt", "r")); 205 | std::unique_ptr> uf3_cpp11( 206 | fopen("file.txt", "r"), [](FILE* p){fclose(p);}); 207 | std::shared_ptr sf3(fopen("file.txt", "r"), [](FILE* p){fclose(p);}); 208 | ``` 209 | 210 | * Why? 
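One way to explore the question above is to compare what each kind of deleter costs a `unique_ptr`. The sketch below is only an illustration under stated assumptions: `FileCloser`, `close_file`, `open_with_functor`, and `open_with_fn_ptr` are hypothetical names that do not appear in the slides. It contrasts a stateless functor, which lives entirely in the pointer's type, with a function pointer, which has to be passed in and stored at run time.

``` C++
#include <cassert>
#include <cstdio>
#include <memory>

// Hypothetical stateless functor: the deleter is encoded in the type,
// so unique_ptr can default-construct it and needs no extra argument.
struct FileCloser {
    void operator()(std::FILE* f) const noexcept {
        if (f != nullptr) { std::fclose(f); }
    }
};

// Hypothetical plain function: the type "pointer to function" does not say
// which function to call, so the address must be supplied and stored.
inline void close_file(std::FILE* f) {
    if (f != nullptr) { std::fclose(f); }
}

using FunctorFile = std::unique_ptr<std::FILE, FileCloser>;
using FnPtrFile   = std::unique_ptr<std::FILE, void (*)(std::FILE*)>;

// Only the raw FILE* is needed; FileCloser is default-constructed.
inline FunctorFile open_with_functor(const char* path, const char* mode) {
    return FunctorFile(std::fopen(path, mode));
}

// The function pointer must be passed explicitly; leaving it out is rejected
// by the compiler because a null deleter would make no sense.
inline FnPtrFile open_with_fn_ptr(const char* path, const char* mode) {
    return FnPtrFile(std::fopen(path, mode), &close_file);
}
```

On mainstream standard libraries `sizeof(FunctorFile)` typically equals `sizeof(FILE*)`, while `FnPtrFile` has to carry the extra function pointer. That difference is why the functor and captureless-lambda forms can match a raw pointer in cost.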
211 | 212 | --- 213 | 214 | For `unique_ptr`, we want the following two snippets to perform *exactly* the same: 215 | 216 | ``` C++ 217 | { // 1a: 218 | std::unique_ptr<FILE, FnCloseFile> uf1(fopen("file.txt", "r")); 219 | writeSomething(uf1); 220 | } 221 | 222 | { // 1b: 223 | FILE* f1 = fopen("file.txt", "r"); 224 | writeSomething(f1); 225 | fclose(f1); 226 | } 227 | ``` 228 | 229 | Any type erasure would hurt performance, so the `Deleter` is part of the pointer's type and is default-constructed when no deleter object is passed. 230 | 231 | --- 232 | 233 | * For `shared_ptr`, the `deleter` is stored in a type-erased function object, similar to `std::function`. 234 | * The cost of calling a generic function object is close to a virtual function call plus one or more indirect jumps. 235 | * As the saying goes: 236 | * "With enough lice you stop itching; with enough debt you stop worrying" (one more cost on top of many hardly matters) 237 | 238 | --- 239 | 240 | Stateful `deleter` of `unique_ptr` 241 | 242 | ``` C++ 243 | template <typename ObjT> class Pool 244 | { 245 | private: 246 | ObjT* alloc() { return nullptr; } 247 | void free(ObjT*) { /* ... */ } 248 | public: 249 | struct PoolDeleter { Pool* pool; void operator()(ObjT* obj) {pool->free(obj);} }; 250 | std::unique_ptr<ObjT, PoolDeleter> AllocateObject() { 251 | return std::unique_ptr<ObjT, PoolDeleter>{alloc(), PoolDeleter{.pool = this}}; 252 | } 253 | }; 254 | 255 | ``` 256 | 257 | * `Deleter` of `unique_ptr` should be move-assignable. 258 | 259 | --- 260 | 261 | Quiz: Which line(s) in `foo` is (are) correct? 262 | 263 | ```C++ 264 | class B1 { ~B1() { /* ... */ } }; 265 | class D1: public B1 { // ... 266 | ~D1() { /* ... */} 267 | }; 268 | 269 | class B2 { virtual ~B2() { /* ... */ } }; 270 | class D2 final: public B2 { // ... 271 | virtual ~D2() override { /* ... 
*/} 272 | }; 273 | 274 | void foo() { 275 | std::unique_ptr b1 = std::make_unique(); 276 | std::shared_ptr b2 = std::make_shared(); 277 | std::unique_ptr b3 = std::make_unique(); 278 | std::shared_ptr b4 = std::make_shared(); 279 | } 280 | ``` 281 | 282 | --- 283 | 284 | > If `T` is a derived class of some base `B`, then `std::unique_ptr` is implicitly convertible to `std::unique_ptr`. The default deleter of the resulting `std::unique_ptr` will use `operator delete` for `B`, leading to undefined behavior unless the destructor of `B` is `virtual`. 285 | 286 | > Note that `std::shared_ptr` behaves differently: `std::shared_ptr` will use the `operator delete` for the type `T` and the owned object will be deleted correctly even if the destructor of `B` is not `virtual`. 287 | 288 | --- 289 | 290 | Handle with intrusive reference counting object 291 | 292 | ``` C++ 293 | template 294 | class SharedArray { 295 | public: 296 | SharedArray* Create(size_t sz) { 297 | auto pArray = new SharedArray(sz); 298 | pArray->AddRef(); 299 | return pArray; 300 | } 301 | void AddRef() { ++m_referenceCount; } 302 | void Release() { if(--m_referenceCount <= 0) { delete this; } } 303 | private: 304 | explicit SharedArray(size_t sz) { 305 | m_data = new T[sz]; m_size = sz; m_refCnt = 0; 306 | } 307 | T* m_data; 308 | size_t m_size; 309 | size_t m_refCnt; 310 | }; 311 | ``` 312 | 313 | --- 314 | 315 | ``` C++ 316 | // OPT 1: Use boost::intrusive_ptr by add 2 utility functions 317 | template 318 | void intrusive_ptr_add_ref(SharedArray* p) { p->AddRef(); } 319 | void intrusive_ptr_release(SharedArray* p) { p->Release(); } 320 | void Foo() { 321 | // ... 322 | intrusive_ptr pIntArray{new SharedArray()}; 323 | // CTAD (class template argument deduction) from C++17 used 324 | // ... 
325 | } 326 | 327 | // OPT 2: Use shared_ptr instead intrusive ref counting 328 | shared_ptr make_shared_from_COM(IWhatever * p){ 329 | p->AddRef(); 330 | shared_ptr pw(p, mem_fn(&IWhatever::Release)); 331 | return pw; 332 | } 333 | 334 | // OPT 3: unique_ptr, similar as above. 335 | // ... 336 | ``` 337 | 338 | --- 339 | 340 | Misc 341 | 342 | * `allocate_shared` 343 | * Create object by user defined allocator 344 | * `shared_ptr` != `const shared_ptr` 345 | * `const T*` and `T* const` 346 | * Why is this form rarely seen? 347 | * Too long 348 | * `std::shared_ptr>>` v.s. `Queue*` 349 | 350 | --- 351 | 352 | 357 | 358 | # 总结 359 | 360 | * 使用*智能指针*管理对象的*生存期*与*所有权* 361 | * 尽管它**不是**一个“开箱即用”的特性 362 | * 需要谨慎思考并合理运用 363 | * 但是和 _裸指针_ 相比仍然好上许多 364 | * 使用`make_*`创建智能指针;并考虑使用 _工厂方法_ 限制对象的创建途径 365 | * 指向*数组*的*智能指针*是有用的,但使用时要注意细节 366 | * 多线程时,要注意 `shared_ptr` 的线程安全性 367 | * 如果要进行指针类型的转换,请使用 `std::*_pointer_cast` 368 | * **不要忘了** 在特定场合 _裸指针_ 和 `weak_ptr` 才是最佳选择 369 | 370 | --- 371 | 380 | 381 | # Q & A 382 | 383 | --- 384 | 385 | 394 | 395 | # Thank you! 
396 | -------------------------------------------------------------------------------- /slides/PMC++.4_TextProcessing.2.md: -------------------------------------------------------------------------------- 1 | --- 2 | marp: true 3 | paginate: true 4 | style: | 5 | section { 6 | background-color: #fffff2; 7 | font-family: 'Palatino', 'Charter', 'STHeiti', 'Segoe UI Emoji'; 8 | } 9 | section pre { 10 | font-size: 0.9em; 11 | } 12 | section ul { 13 | font-size: 0.95em; 14 | } 15 | 16 | section iframe { 17 | margin-top: 25px; 18 | width: 100% !important; 19 | height: 90%; 20 | } 21 | 22 | section iframe.h8 { 23 | width: 100% !important; 24 | height: 80%; 25 | } 26 | 27 | section iframe.h10 { 28 | width: 100% !important; 29 | height: 100%; 30 | } 31 | --- 32 | 33 | 34 | # Text Processing II 35 | 36 | 44 | 45 | --- 46 | 55 | 56 | ## 回顾 57 | 58 | String representation 59 | 60 | Manipulation 61 | * Create 62 | * Join and Concatenate 63 | * Split 64 | * Substring 65 | * Trim 66 | * Examples based on C++11/14/17/20/23 with or without third party libraries 67 | * (C++11) *boost string_view/string algorithm/tokenizer*, *fmt* 68 | * (C++14) *abseil*, *range-v3* 69 | 70 | --- 71 | 77 | 78 | ## 回顾 79 | 80 | Don't use C string 81 | 82 | Use standard or third-party `string_view` 83 | 84 | Use the latest C++ standard if possible 85 | 86 | Use 3rd party string libs (boost/folly/abseil/range-v3) to simplify your code 87 | 88 | * C++11: stream + fmt (join, concate, string_view) + boost (split, trim, string_view) 89 | * C++14/17/20: *abseil* or *range-v3* (for join, concate, split, string_view) 90 | 91 | 92 | --- 93 | 102 | 103 | Why is the "manipulation" so complicated? 
104 | 105 | “既要… 又要…” 106 | * Intuitive 直观 107 | * Concise 简洁 108 | * Performant 高性能 109 | 110 | --- 111 | 112 | 什么是:直观和简洁 113 | 114 | 1 *Unique-solution* for simple problem 115 | 2 *Flexible/extensible-form* for complex problem 116 | 3 *Completeness* for broad domains 117 | 4 *Significant differences* between usages 118 | 119 | ``` C++ 120 | // Traverse vector's elements. 121 | for(auto v: vec) { doSomething(v); } 122 | for(auto const& v: vec) { doSomething(v); } 123 | for(size_t i = 0; i < vec.size(); ++i) { doSomething(i, vec[i]); } 124 | ``` 125 | 126 | --- 127 | 137 | ``` C++ 138 | // Endswith 139 | EXPECT_EQ( 140 | /* C++11 */ 141 | s.size() >= ending.size() && std::equal(ending.rbegin(), ending.rend(), s.rbegin()), 142 | /* C++20 */ 143 | s.ends_with(ending) 144 | ); 145 | ``` 146 | 147 | All cases in previous charper formed as: 148 | 149 | > In C++11, we should blah blah blah, but in C++14, with library such_a_lib we blah blah blah. 150 | 151 | Have problems 3 and 4. 152 | * *Completeness* for broad domains 153 | * *Clear and definite border* for cases 154 | 155 | --- 156 | 157 | * Manipulation -> How to handle visibility spreading? 
158 | * Lifetime management 159 | * Manual / Auto (RAII) 160 | * Visibility spreading 161 | * Duplicate (Value) 162 | * Reference semantics 163 | * Lifetime No-change / Extended / Transferred (Moved) 164 | * Visibility (maybe with Lifetime): separation of the *Entity* and *Parts of Entity* 165 | 166 | (Unwanted) "*Domain Specific Language*" for everything 167 | 168 | --- 169 | 170 | For our own library design and coding rules 171 | * Layered libs, clear responsibility 172 | * Explicit lifetime management 173 | * *value* / `*` / `shared_ptr` / `unique_ptr` / `weak_ptr` 174 | * Return by value or *generic* pointer if possible 175 | * Pass by value or reference/pointer 176 | * **NO** pointer arithmetic 177 | * Performance should be the first and last thing you consider 178 | * In the system design stage and the pre-release/optimization stage 179 | * **DON'T** focus on perf in sub-component level design and impl 180 | 181 | --- 182 | 183 | ## Examination 184 | 185 | None of that fuss here 186 | 187 | - get *length* 188 | - *find* (also called *index*) 189 | - *count* 190 | 191 | --- 192 | 193 | 199 | 200 | ### `string` is a *string* 201 | 202 | Operations from sequence types / (contiguous) iterators 203 | * *length* 204 | * *find* (*index*) / *rfind* 205 | * *startswith* 206 | * *endswith* 207 | 208 | Operations that are meaningful only within a writing system 209 | * *isalpha*, *isalnum*, *isascii*, *isdigit* 210 | * *isupper*, *islower*, ... 211 | 212 | --- 213 | 222 | 223 | ### Length 224 | 225 | ``` C++ 226 | char const* cs = "..."; 227 | size_t len = strlen(cs); // cs is const char * 228 | 229 | std::string s; 230 | size_t len = s.length(); // size() is the same 231 | 232 | std::string_view sv; 233 | size_t len = sv.length(); // size() is the same 234 | ``` 235 | 236 | `length()` counts bytes (code units), not characters. 
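The gap between bytes and characters can be made concrete with a small helper. This is a minimal sketch, assuming the input is valid UTF-8. `count_code_points` and `sample_text` are hypothetical names introduced only for illustration. The helper counts every byte that is not a UTF-8 continuation byte (`10xxxxxx`), which gives the number of code points.

``` C++
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a string assumed to hold valid UTF-8.
// Continuation bytes have the form 0b10xxxxxx; any other byte starts
// a new code point.
inline std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) { ++count; }
    }
    return count;
}

// Hypothetical sample written as explicit UTF-8 bytes:
// 'z' (1 byte), U+00DF (2 bytes), U+6C34 (3 bytes), U+1F34C (4 bytes).
inline std::string sample_text() {
    return "z\xC3\x9F\xE6\xB0\xB4\xF0\x9F\x8D\x8C";
}
```

For the sample, `size()` reports 10 while `count_code_points` reports 4. Code points are still not the same as user-perceived characters (grapheme clusters), so this is only a first approximation.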
237 | 238 | ``` C++ 239 | // UTF-8 narrow multibyte encoding 240 | const std::string_view str = "z\u00df\u6c34\U0001f34c"; // or u8"zß水🍌" 241 | std::cout << std::quoted(str) << " is " << str.size() << " bytes."; 242 | 243 | // "zß水🍌" is 10 bytes. 244 | ``` 245 | 246 | --- 247 | 248 | ### *find* 249 | 250 | ``` Python 251 | s = "The quick brown fox jumps over the lazy dog" 252 | k = "fox" 253 | 254 | assert k in s 255 | assert s.index(k) == 16 256 | assert s.find(k) != -1 257 | assert s.find(k, 5, 20) != -1 # find(substr, spos, epos) 258 | ``` 259 | 260 | For C++ 261 | ``` C++ 262 | std::string s = "The quick brown fox jumps over the lazy dog"; 263 | std::string k = "fox"; // OR C++14: auto k{"fox"s}; 264 | ``` 265 | 266 | --- 267 | 268 | ### *find* subsequence 269 | 270 | ``` C++ 271 | // C++11, naive implementation 272 | assert(s.find(k) != string::npos); 273 | assert(s.substr(5, 15).find(k) != string::npos); // search within s[5, 5 + 15) 274 | 275 | // <algorithm>; C++17 adds searcher overloads 276 | assert(std::search(s.begin(), s.end(), k.begin(), k.end()) != s.end()); 277 | // boyer_moore_searcher / boyer_moore_horspool_searcher (C++17) 278 | 279 | // range style: range-v3 + C++14 or C++20 280 | assert(!ranges::search(s, k).empty()); 281 | assert(!ranges::search(string_view(s).substr(5, 15), k).empty()); 282 | 283 | // C++23 contains 284 | assert(s.contains(k)); 285 | ``` 286 | 287 | --- 288 | 293 | ### *find* subsequence 294 | 295 | **Telling similar functions apart** 296 | 297 | `std::find` in `<algorithm>` 298 | 299 | * Finds a single element in a sequence 300 | 301 | `std::includes` in `<algorithm>` 302 | 303 | * Tests whether every element of one sorted range appears in another sorted range; the matches need not be adjacent 304 | * `a = {1, 2, 3, 4, 7, 11}`, `b = {3, 7}`: `std::includes(a.begin(), a.end(), b.begin(), b.end())` <= True 305 | * `a = {1, 2, 3, 4, 7, 11}`, `b = {3, 6}`: `std::includes(a.begin(), a.end(), b.begin(), b.end())` <= False 306 | --- 307 | 308 | ### *find* subsequence 309 | 310 | `rfind` 311 | 312 | ``` C++ 313 | // C++11 314 | assert(s.rfind('o') != string::npos); 315 | assert(s.rfind("o") != string::npos); 316 | 317 | // std::find_end in <algorithm> 318 | assert( find_end(s.begin(), 
s.end(), k.begin(), k.end()) != s.end() ); 319 | // ranges ver 320 | assert( !ranges::find_end(s, k).empty() ); 321 | ``` 322 | 323 | `std::string::find` <-> `std::string::rfind` 324 | `std::search` <-> `std::find_end` 325 | 326 | --- 327 | ### *find* element 328 | 329 | `find_if` / `find_first_of` / `find_last_of` 330 | `find_first_not_of` / `find_last_not_of` 331 | 332 | ``` C++ 333 | std::string haystack = "the beatles"; 334 | std::string needles = "abba"; 335 | 336 | find_first_of(haystack.begin(), haystack.end(), needles.begin(), needles.end()); 337 | ranges::find_first_of(haystack, needles); // range style 338 | 339 | // Quiz: how about find_last_of? 340 | ``` 341 | 342 | More features, such as predicates ("Pred"), will be covered in the "algorithm" chapter. 343 | 344 | --- 345 | 351 | ### *count* 352 | 353 | Counting elements 354 | 355 | ``` C++ 356 | cout << count(s.begin(), s.end(), 'o') << endl; 357 | cout << ranges::count(s, 'o') << endl; 358 | ``` 359 | 360 | Counting sub-sequences 361 | ``` C++ 362 | std::size_t pos = 0, occurrences = 0; // From stackoverflow; counts non-overlapping matches 363 | while ((pos = s.find(k, pos )) != std::string::npos) { 364 | ++ occurrences; 365 | pos += k.length(); 366 | } 367 | ``` 368 | 369 | * Time complexity? 370 | * How can it be done better? 371 | 372 | --- 373 | 383 | ### *find_all* 384 | 385 | Implemented with `find` 386 | 387 | Boost.find_all/ifind_all 388 | 389 | * C++11/14/17 390 | ``` C++ 391 | std::vector> output; 392 | boost::algorithm::find_all(output, s, k); 393 | ``` 394 | * C++20 395 | ``` C++ 396 | std::vector output; 397 | boost::algorithm::find_all(output, s, k); 398 | ``` 399 | 400 | --- 401 | 402 | `starts_with` 403 | 404 | ``` C++ 405 | // C++20 406 | s.starts_with(k); 407 | 408 | // Boost 409 | boost::algorithm::starts_with(s, k); 410 | 411 | // C++11 412 | s.find(k) == 0; 413 | // Quiz: any issue? 
414 | ``` 415 | 416 | --- 417 | 418 | `ends_with` 419 | 420 | ``` C++ 421 | // Before C++20 (std::string_view needs C++17; use boost::string_view in C++11) 422 | bool ends_with(std::string_view s, std::string_view ending) { 423 | return s.size() >= ending.size() && std::equal(ending.rbegin(), ending.rend(), s.rbegin()); 424 | } 425 | 426 | // Boost 427 | boost::algorithm::ends_with(s, k); 428 | 429 | // C++20 430 | s.ends_with(k); 431 | ``` 432 | 433 | --- 434 | 435 | ## Take Away 436 | 437 | * Use the member functions of `std::string` / `std::string_view` 438 | * Use `<algorithm>`, but stay alert to misuse. 439 | * Don't forget *Boost.StringAlgorithm*. Look for examples on the web. 440 | * Its documentation is not friendly. 441 | 442 | --- 443 | # To be continued 444 | 445 | Character systems 446 | 447 | * String comparison will be covered in that chapter -------------------------------------------------------------------------------- /slides/PMC++.4_TextProcessing.3.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Encoding a string into a byte stream" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 2, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "name": "stdout", 17 | "output_type": "stream", 18 | "text": [ 19 | "String : test 测试 テスト\n", 20 | "Encoded by utf-8 : 7465737420e6b58be8af9520e38386e382b9e38388\n", 21 | "Encoded by gbk : 7465737420b2e2cad420a5c6a5b9a5c8\n", 22 | "Encoded by cp936 : 7465737420b2e2cad420a5c6a5b9a5c8\n", 23 | "'cp932' codec can't encode character '\\u6d4b' in position 5: illegal multibyte sequence\n", 24 | "'charmap' codec can't encode characters in position 5-6: character maps to \n" 25 | ] 26 | } 27 | ], 28 | "source": [ 29 | "import typing\n", 30 | "\n", 31 | "def PrintLine(label: str, content: str, labelWidth: int = 16, alignSymbol: str = \"<\"):\n", 32 | " formatString = f\"{{label:{alignSymbol}{labelWidth}}} : {{content}}\"\n", 33 | " ret = formatString.format(label=label, 
content=content)\n", 34 | " print(ret)\n", 35 | "\n", 36 | "def PrintTextCode(s: str, codePages: typing.List[str]):\n", 37 | " PrintLine(\"String\", s)\n", 38 | " for cp in codePages:\n", 39 | " try:\n", 40 | " PrintLine(f\"Encoded by {cp}\", s.encode(cp).hex())\n", 41 | " except UnicodeEncodeError as e:\n", 42 | " print(str(e))\n", 43 | "\n", 44 | "TEST_CODE_PAGES = ['utf-8', 'gbk', 'cp936', 'cp932', 'cp1252']\n", 45 | "s = \"test 测试 テスト\"\n", 46 | "\n", 47 | "PrintTextCode(s, TEST_CODE_PAGES)" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": {}, 53 | "source": [ 54 | "### '\\u6d4b' 是什么字?" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 3, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "name": "stdout", 64 | "output_type": "stream", 65 | "text": [ 66 | "String : 测\n", 67 | "Encoded by utf-8 : e6b58b\n" 68 | ] 69 | } 70 | ], 71 | "source": [ 72 | "PrintTextCode('\\u6d4b', ['utf-8'])" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "### “测”不在日文汉字中吗?" 
80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 4, 85 | "metadata": {}, 86 | "outputs": [ 87 | { 88 | "name": "stdout", 89 | "output_type": "stream", 90 | "text": [ 91 | "String : 測\n", 92 | "Encoded by utf-8 : e6b8ac\n", 93 | "Encoded by gbk : 9c79\n", 94 | "Encoded by cp936 : 9c79\n", 95 | "Encoded by cp932 : 91aa\n", 96 | "'charmap' codec can't encode character '\\u6e2c' in position 0: character maps to \n" 97 | ] 98 | } 99 | ], 100 | "source": [ 101 | "PrintTextCode('測', TEST_CODE_PAGES)" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": {}, 107 | "source": [ 108 | "### 应用一下修改" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": 5, 114 | "metadata": {}, 115 | "outputs": [ 116 | { 117 | "name": "stdout", 118 | "output_type": "stream", 119 | "text": [ 120 | "String : test 測試 テスト\n", 121 | "Encoded by utf-8 : 7465737420e6b8ace8a9a620e38386e382b9e38388\n", 122 | "Encoded by gbk : 74657374209c79d48720a5c6a5b9a5c8\n", 123 | "Encoded by cp936 : 74657374209c79d48720a5c6a5b9a5c8\n", 124 | "Encoded by cp932 : 746573742091aa8e8e20836583588367\n", 125 | "'charmap' codec can't encode characters in position 5-6: character maps to \n" 126 | ] 127 | } 128 | ], 129 | "source": [ 130 | "s2 = \"test 測試 テスト\"\n", 131 | "PrintTextCode(s2, TEST_CODE_PAGES)" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": {}, 137 | "source": [ 138 | "### 乱码是怎么来的?" 
139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 10, 144 | "metadata": {}, 145 | "outputs": [ 146 | { 147 | "name": "stdout", 148 | "output_type": "stream", 149 | "text": [ 150 | "----------------------------------------------------------\n", 151 | "String : test 測試 テスト 시험하는것\n", 152 | "cp932 Enc2Bytes : 74:65:73:74:20:91:aa:8e:8e:20:83:65:83:58:83:67:20:3f:3f:3f:3f:3f\n", 153 | "cp936 Dec2Str : test 應帋 僥僗僩 ?????\n", 154 | "----------------------------------------------------------\n", 155 | "String : test 測試 テスト 시험하는것\n", 156 | "cp936 Enc2Bytes : 74:65:73:74:20:9c:79:d4:87:20:a5:c6:a5:b9:a5:c8:20:3f:3f:3f:3f:3f\n", 157 | "cp932 Dec2Str : test 忱ヤ� ・ニ・ケ・ネ ?????\n", 158 | "----------------------------------------------------------\n", 159 | "String : test 應帋 僥僗僩 ?????\n", 160 | "cp936 Enc2Bytes : 74:65:73:74:20:91:aa:8e:8e:20:83:65:83:58:83:67:20:3f:3f:3f:3f:3f\n", 161 | "cp932 Dec2Str : test 測試 テスト ?????\n", 162 | "----------------------------------------------------------\n", 163 | "String : test 忱ヤ� ・ニ・ケ・ネ ?????\n", 164 | "cp932 Enc2Bytes : 74:65:73:74:20:9c:79:d4:3f:20:a5:c6:a5:b9:a5:c8:20:3f:3f:3f:3f:3f\n", 165 | "cp936 Dec2Str : test 測�? 
テスト ?????\n", 166 | "----------------------------------------------------------\n", 167 | "String : test 測試 テスト 시험하는것\n", 168 | "utf-8 Enc2Bytes : 74:65:73:74:20:e6:b8:ac:e8:a9:a6:20:e3:83:86:e3:82:b9:e3:83:88:20:ec:8b:9c:ed:97:98:ed:95:98:eb:8a:94:ea:b2:83\n", 169 | "cp1252 Dec2Str : test 測試 テスト 시험하는것\n", 170 | "----------------------------------------------------------\n", 171 | "String : test 測試 テスト 시험하는것\n", 172 | "cp1252 Enc2Bytes : 74:65:73:74:20:e6:b8:ac:e8:a9:a6:20:e3:83:86:e3:82:b9:e3:83:88:20:ec:8b:9c:ed:97:98:ed:95:98:eb:8a:94:ea:b2:83\n", 173 | "utf-8 Dec2Str : test 測試 テスト 시험하는것\n", 174 | "----------------------------------------------------------\n", 175 | "String : 一隻憂鬱的台灣烏龜\n", 176 | "cp950 Enc2Bytes : a4:40:b0:a6:bc:7e:c6:7b:aa:ba:a5:78:c6:57:af:51:c0:74\n", 177 | "cp936 Dec2Str : �@唉紐苳�亥x芖疩纓\n" 178 | ] 179 | } 180 | ], 181 | "source": [ 182 | "s3 = \"test 測試 テスト 시험하는것\"\n", 183 | "def Mojibake(s: str, encCp: str, decCp: str):\n", 184 | " print(\"----------------------------------------------------------\")\n", 185 | " PrintLine(\"String\", s)\n", 186 | " bytes = s.encode(encCp, errors='replace')\n", 187 | " PrintLine(f\"{encCp} Enc2Bytes\", bytes.hex(\":\"))\n", 188 | " newS = bytes.decode(decCp, errors='replace')\n", 189 | " PrintLine(f\"{decCp} Dec2Str\", newS)\n", 190 | " return newS\n", 191 | " \n", 192 | "# 乱码 == Mojibake == 文字化け (Character Transformation)\n", 193 | "moji1 = Mojibake(s3, 'cp932', 'cp936')\n", 194 | "moji2 = Mojibake(s3, 'cp936', 'cp932')\n", 195 | "\n", 196 | "moji3 = Mojibake(moji1, 'cp936', 'cp932')\n", 197 | "moji4 = Mojibake(moji2, 'cp932', 'cp936')\n", 198 | "\n", 199 | "moji5 = Mojibake(s3, 'utf-8', 'cp1252')\n", 200 | "moji6 = Mojibake(moji5, 'cp1252', 'utf-8')\n", 201 | "\n", 202 | "moji7 = Mojibake(\"一隻憂鬱的台灣烏龜\", 'cp950', 'cp936')" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "### “烫烫烫”, “屯屯屯” 与 \"锟斤拷\"" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | 
"execution_count": 9, 215 | "metadata": {}, 216 | "outputs": [ 217 | { 218 | "name": "stdout", 219 | "output_type": "stream", 220 | "text": [ 221 | "String : 烫烫烫烫\n", 222 | "Encoded by utf-8 : e783abe783abe783abe783ab\n", 223 | "Encoded by gbk : cccccccccccccccc\n", 224 | "Encoded by cp936 : cccccccccccccccc\n", 225 | "'cp932' codec can't encode character '\\u70eb' in position 0: illegal multibyte sequence\n", 226 | "'charmap' codec can't encode characters in position 0-3: character maps to \n", 227 | "String : 屯屯屯屯\n", 228 | "Encoded by utf-8 : e5b1afe5b1afe5b1afe5b1af\n", 229 | "Encoded by gbk : cdcdcdcdcdcdcdcd\n", 230 | "Encoded by cp936 : cdcdcdcdcdcdcdcd\n", 231 | "Encoded by cp932 : 93d493d493d493d4\n", 232 | "'charmap' codec can't encode characters in position 0-3: character maps to \n", 233 | "String : 锟斤拷\n", 234 | "Encoded by utf-8 : e9949fe696a4e68bb7\n", 235 | "Encoded by gbk : efbfbdefbfbd\n", 236 | "Encoded by cp936 : efbfbdefbfbd\n", 237 | "'cp932' codec can't encode character '\\u951f' in position 0: illegal multibyte sequence\n", 238 | "'charmap' codec can't encode characters in position 0-2: character maps to \n", 239 | "String : 锘\n", 240 | "Encoded by utf-8 : e99498\n", 241 | "Encoded by gbk : efbb\n", 242 | "Encoded by cp936 : efbb\n", 243 | "'cp932' codec can't encode character '\\u9518' in position 0: illegal multibyte sequence\n", 244 | "'charmap' codec can't encode character '\\u9518' in position 0: character maps to \n" 245 | ] 246 | } 247 | ], 248 | "source": [ 249 | "PrintTextCode(\"烫烫烫烫\", TEST_CODE_PAGES)\n", 250 | "PrintTextCode(\"屯屯屯屯\", TEST_CODE_PAGES)\n", 251 | "PrintTextCode(\"锟斤拷\", TEST_CODE_PAGES)\n", 252 | "PrintTextCode(\"锘\", TEST_CODE_PAGES)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "metadata": {}, 258 | "source": [ 259 | "* `0xCC`: x86/64 asm `int 3` interruption, 未初始化的栈内存会被填充;\n", 260 | "* `0xCD`: MS CRT debug下 `delete`/`free` 之后对内存的标记;\n", 261 | "* `EF:BB` 是 UTF-8 BOM" 262 | ] 263 | }, 264 | { 265 
| "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "### What is \"锟斤拷\"?" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": 8, 274 | "metadata": {}, 275 | "outputs": [ 276 | { 277 | "name": "stdout", 278 | "output_type": "stream", 279 | "text": [ 280 | "----------------------------------------------------------\n", 281 | "String : 锟斤拷\n", 282 | "cp936 Enc2Bytes : ef:bf:bd:ef:bf:bd\n", 283 | "utf-8 Dec2Str : ��\n" 284 | ] 285 | } 286 | ], 287 | "source": [ 288 | "moji6 = Mojibake(\"锟斤拷\", 'cp936', 'utf-8')" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "### Other fill patterns written into memory to make diagnosis easier\n", 296 | "* `0xFD`: MSVC debug CRT guard bytes placed around heap allocations\n", 297 | "* `0xDD`: MSVC debug CRT fill for freed heap memory\n", 298 | "* `0xBAADF00D`: Windows debug heap marker for allocated but uninitialized memory\n", 299 | "* `0xDEADBEEF`: a widely used magic value for freed or uninitialized memory" 300 | ] 301 | } 302 | ], 303 | "metadata": { 304 | "kernelspec": { 305 | "display_name": "Python 3.10.8 ('sandbox')", 306 | "language": "python", 307 | "name": "python3" 308 | }, 309 | "language_info": { 310 | "codemirror_mode": { 311 | "name": "ipython", 312 | "version": 3 313 | }, 314 | "file_extension": ".py", 315 | "mimetype": "text/x-python", 316 | "name": "python", 317 | "nbconvert_exporter": "python", 318 | "pygments_lexer": "ipython3", 319 | "version": "3.10.8 (main, Nov 4 2022, 13:48:29) [GCC 11.2.0]" 320 | }, 321 | "orig_nbformat": 4, 322 | "vscode": { 323 | "interpreter": { 324 | "hash": "fc2932ef53fae6f3da2fcc0bee53f5f2eca7274226af6a8a9e1e6099346e9fe5" 325 | } 326 | } 327 | }, 328 | "nbformat": 4, 329 | "nbformat_minor": 2 330 | } 331 | -------------------------------------------------------------------------------- /slides/PMC++.4_TextProcessing.3.md: -------------------------------------------------------------------------------- 1 | --- 2 | marp: true 3 | paginate: true 4 | style: | 5 | section { 6 | background-color: #fffff2; 7 | font-family: 'Palatino', 'Charter', 'STHeiti', 'Segoe UI Emoji'; 8 | } 9 | section pre { 10 | font-size: 0.9em; 11 | } 12 | section ul { 13 | font-size: 0.95em;
14 | } 15 | 16 | section iframe { 17 | margin-top: 25px; 18 | width: 100% !important; 19 | height: 90%; 20 | } 21 | 22 | section iframe.h8 { 23 | width: 100% !important; 24 | height: 80%; 25 | } 26 | 27 | section iframe.h10 { 28 | width: 100% !important; 29 | height: 100%; 30 | } 31 | --- 32 | 33 | 34 | 35 | 43 | 44 | # Text Processing III 45 | 46 | --- 47 | 48 | ### Looking back 49 | 50 | Part I (Lectures 8-10) 51 | * Agenda of Text Processing 52 | * String Representations in C and C++ 53 | * C-style strings, `std::string`, `std::string_view` 54 | * String Manipulation 55 | * Create/Concatenate/Join/Split/... 56 | * STL: `string`, `string_view` (17), `ranges` for `string` (20) 57 | * Libraries: *Boost*, *{fmt}*, (14) *Abseil*, (14) *range-v3* 58 | 59 | --- 60 | 61 | ### Looking back 62 | 63 | Part II (Lecture 11) 64 | * String examination 65 | * Getting string length in bytes 66 | * *find* 67 | * `string`: `find / rfind / find_*_of` 68 | * `string`: `(23) contains / (20) starts_with / (20) ends_with` 69 | * `/`: `find / search / includes` 70 | * *Boost.StringAlgorithm* 71 | 72 | --- 73 | 74 | 82 | ### Discuss some topics other than C++ ...
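83 | 84 | --- 85 | 86 | ### Strings, languages, and writing systems (scripts) 87 | 88 | A string is a "linear store" (a sequence) of *characters* 89 | 90 | It is mainly designed to record written text 91 | 92 | but not only written text 93 | 94 | A *character* is an **encoded** *unit of information* 95 | 96 | --- 97 | 98 | ### Language and writing system 99 | 100 | Two related systems that do not fully coincide 101 | 102 | Languages without a writing system exist 103 | * Sign languages 104 | * Almost every ancient language appeared before its script 105 | * Some scripts were invented on purpose for a language, e.g. Tangut and the Pollard Miao script 106 | * Most languages still in active use have a corresponding script 107 | 108 | A script with no underlying language, however, essentially does not exist 109 | 110 | --- 111 | 112 | There are close to 7,000 writing systems in the world, including 113 | 114 | * those handed down through history and still in use (English, Chinese); 115 | * those used historically but no longer in use (oracle bone script [standardization in progress], Egyptian hieroglyphs [U+13000 - U+1342F]); 116 | * symbols borrowed from other systems (Japanese kanji, Korean hanja); 117 | * writing systems designed to fit an existing language (Vietnamese, Hanyu Pinyin, Tangut); 118 | * writing systems designed for spoken languages (phonetic Miao script); 119 | * mnemonic systems (Naxi Dongba script [U+1AAC0 - U+1AFFF]) 120 | * QR codes could perhaps also be counted as mnemonic systems 121 | 122 | --- 123 | 124 | * Logographic scripts - Chinese characters, Egyptian hieroglyphs, Maya script, cuneiform 125 | * Phonographic scripts 126 | * Syllabaries - Japanese kana, Yi script (Elden Ring once used Yi script as phonetic annotation in officially leaked material) 127 | * Segmental scripts 128 | * Alphabets - Latin, Greek, Cyrillic, Korean (Hangul) 129 | * Abugidas - Thai, Burmese, Sanskrit 130 | * Abjads - Arabic, Hebrew 131 | * Semi-syllabaries - Bopomofo (Zhuyin) 132 | 133 | --- 134 | 135 | ### The five major script systems 136 | 137 | Latin-family scripts - Latin - Latīnum 138 | 139 | Arabic-family scripts - Arabic 140 | أَبْجَدِيَّة عَرَبِيَّة 141 | 142 | Cyrillic-family scripts - Kirillica - Кири́ллица 143 | 144 | Sanskrit - Sanskrit - संस्कृता वाक् 145 | 146 | Chinese characters - Chinese Characters - 漢字 147 | 148 | --- 149 | 150 | ### Other highly influential writing systems 151 | 152 | Phoenician - Phoenician - 𐤃𐤁𐤓𐤉𐤌 𐤊𐤍𐤏𐤍𐤉𐤌 153 | 154 | Brahmic-family scripts - Brahmic scripts - 𑀩𑁆𑀭𑀸𑀳𑁆𑀫𑀻 𑀮𑀺𑀧𑀺 155 | * The ancestor of the scripts used by many modern Southeast Asian languages, such as Thai. 156 | 157 | --- 158 | 159 | ### What needs a code point 160 | 161 | * *Graphemes* and their variants 162 | * A grapheme is the smallest distinguishable unit of writing 163 | * Variants of a grapheme (upper/lower case, variant Chinese characters) 164 | * Upper/lower case 165 | * A / a 166 | * Variant Chinese Characters 167 | * 回[U+56DE], 囘[U+56D8], 囬[U+56EC] 168 | 169 | --- 170 | 171 | ### What needs a code point 172 | 173 | * Do identical (or near-identical) characters in different languages need separate code points?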
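174 | * No 175 | * Hanyu Pinyin tone letters: a (English) / ā (Latvian) / à (Romance languages such as French, Italian and Portuguese) 176 | * [Unified](https://zh.m.wikipedia.org/zh-hans/%E4%B8%AD%E6%97%A5%E9%9F%93%E7%B5%B1%E4%B8%80%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97) 177 | * CJK Unified Ideographs 178 | * Ideographs from Chinese, Japanese, Korean, Vietnamese, Zhuang and Ryukyuan that share the same origin and original meaning and have identical or only slightly different shapes are given the same code point in ISO 10646 and Unicode 179 | * Rules: ideograph unification, source separation, and separation of characters with different origins 180 | 181 | --- 182 | 183 | ### What needs a code point 184 | 185 | Pros and cons of CJK unification 186 | 187 | There is no need to work out which language a character at a given code point belongs to, which helps searching 188 | 189 | The same code point has slightly different glyph shapes across languages; in mixed-language text these differences are hard to preserve without special markup or a layout algorithm. 190 | 191 | --- 192 | ![](media/hanzi_standard_fonts.png) 193 | 194 | --- 195 | ![](media/hanzi_standard_fonts_marked.png) 196 | 197 | --- 198 | ![](media/zhen.png) 199 | 200 | --- 201 | ### What needs a code point 202 | 203 | * 隹 204 | * 鵻 == (Erya, "Shi Niao") 䳕鳩 (foujiu) == 鳺 (fu) 鴀 (fou) == 鵓鴣 205 | * Old Chinese pronunciation: bu ku 206 | * 布谷 (cuckoo) 207 | * 杜鹃 (cuckoo) 208 | 209 | --- 210 | 215 | ### What needs a code point 216 | 217 | * *Phonetic symbols* (a *symbol* in a *syllabary*, *phoneme*) 218 | * *Combining marks* (e.g. *tone marks*) 219 | * 热҈的҈字҈都҈出҈汗҈了҈ 220 | * U+0488, a Cyrillic combining mark ` ҈` 221 | * *Symbols* 222 | * Emoji 223 | * 😂 [U+1F602] 224 | * The 64 hexagrams 225 | * ䷰ (U+4DF0, 革) ䷱ (U+4DF1, 鼎) 226 | * Box-drawing characters 227 | * ├┴┬─┼ 228 | 229 | --- 230 | ### What needs a code point 231 | 232 | * *Whitespace* or *punctuation* 233 | * *Control characters* 234 | * e.g. CR [U+000D] 235 | * Others 236 | * Covered in detail when we discuss the Unicode planes 237 | --- 238 | 239 | ### What does not need a code point 240 | 241 | * Typefaces / glyph shapes 242 | * The boundary with variant characters is not clear-cut 243 | * For ancient scripts that are no longer in use and hard to trace, such as oracle bone script, it is relatively hard to tell a glyph style from a distinct character 244 | * Clerical/regular script (variant characters) -> semi-cursive/cursive script (typeface) -> simplified characters (variant characters) 245 | * Layout / writing direction 246 | * LTR 247 | * Vertical layout - Chinese, Japanese 248 | * RTL (Right to Left) - Arabic 249 | 250 | --- 251 | 252 | ### What does not need a code point 253 | 254 | * Specific combinations of symbols 255 | * e.g. Cyrillic and Thai combining marks form only a limited set of combinations with base letters 256 | * Unicode does not assign one code point per combination 257 | * It is designed as an orthogonal encoding system to save code points 258 | * Layout in complex cases is decided by CTL (Complex Text Layout) 259 | * Devanagari conjunct (ligature): द + ् + ध + ् + र + ् + य = द्ध्र्य 260 | * Ligatures can also be decided by the font - [JetBrains Mono](https://www.jetbrains.com/lp/mono/) 261 | * This allows text to be rendered that goes beyond the original usage, e.g. Thai combining marks stacked without limit 262 | * e.g. ส็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็ 263 | * 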
but no such character actually exists in the language 264 | 265 | --- 266 | 267 | ### Quiz: is braille the same all over the world? 268 | 269 | --- 270 | 271 | ### Braille 272 | 273 | A writing system invented by the Frenchman Braille. 274 | An encoding system comparable to ASCII: both storage and decoding have only limited resolution. 275 | 276 | For Latin-script languages it maps to the 26 letters. Each language has its own variants for accented letters and contractions. 277 | 278 | For Chinese it is a phonetic script (similar to pinyin); Arabic numerals follow English braille, i.e. the Braille style. 279 | 280 | Japanese kanji braille is designed around character components. 281 | 282 | Unicode: U+2800 to U+28FF 283 | 284 | --- 285 | 286 | ### From writing system to rendering 287 | 288 | Writing system 289 | -> grapheme 290 | -> character set (grapheme -> ordinal) 291 | -> encoding scheme / encoding (integer -> bytes) 292 | -> font lookup -> Fallback 293 | -> glyph lookup -> Fallback 294 | -> glyph rendering (Glyph processing) 295 | 296 | 1 [Microsoft Typography documentation](https://learn.microsoft.com/en-us/typography/) 297 | 2 [IDWriteFontFallback::MapCharacters method (dwrite_2.h)](https://learn.microsoft.com/en-us/windows/win32/api/dwrite_2/nf-dwrite_2-idwritefontfallback-mapcharacters) 298 | 299 | --- 300 | 301 | Character set 302 | 303 | * A collection of characters that gives each character a unique representation (not necessarily a single integer) 304 | * In JIS X 0208-1990, the character "工" sits at row (区) 25, cell (点) 09 305 | 306 | Encoding scheme 307 | 308 | * EUC-JP: double-byte encoding - `{row+160, cell+160}` 309 | * Shift_JIS: double-byte encoding - `{...complicated..., ...also complicated...}` 310 | * When the row is an odd number from 01 to 61: the first byte is (row + 257) / 2. The second byte is cell + 63 if the cell is 01 to 63, otherwise cell + 64. 311 | * Compatible with the JIS X 0208-1990, JIS X 0208-1983 and JIS C 6226-1978 character sets 312 | 313 | --- 314 | 315 | 325 | ### Before Unicode 326 | 327 | Different systems, languages and regions could each have their own independent encoding standard 328 | 329 | (US-)ASCII (American Standard Code for Information Interchange) 330 | * Control codes 331 | * `00` NULL / `07` BELL 332 | * `09` HT/SK (Horizontal Tab) / `0B` VTAB (Vertical Tab) 333 | * `0A` LF (Line Feed) / `0D` CR (Carriage Return) 334 | * Printable Characters 335 | * Letters, Digits, Punctuation, and other symbols 336 | 337 | --- 338 | ### Before Unicode 339 | 340 | ISCII (Indian -) 341 | 342 | ArmSCII (Armenian -) 343 | 344 | PASCII (Perso-Arabic -) 345 | 346 | ATASCII (Atari), ...
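
The row/cell (区/点) arithmetic from the encoding-scheme slide can be checked with a short Python sketch. The function names `kuten_to_euc_jp` and `kuten_to_shift_jis` are made up for this illustration, and the even-row and row 63+ branches follow the usual Shift_JIS layout rather than anything stated on the slide; only JIS X 0208 rows 1-94 are assumed.

```python
def kuten_to_euc_jp(row: int, cell: int) -> bytes:
    """EUC-JP: each byte is the row/cell number plus 160 (0xA0)."""
    return bytes([row + 160, cell + 160])


def kuten_to_shift_jis(row: int, cell: int) -> bytes:
    """Shift_JIS for JIS X 0208 rows 1-94; two rows share one lead byte."""
    # Lead byte: rows 1-62 map to 0x81-0x9F, rows 63-94 map to 0xE0-0xEF.
    lead = (row + 1) // 2 + (0x80 if row <= 62 else 0xC0)
    if row % 2 == 1:
        # Odd row: trail bytes 0x40-0x7E, then 0x80-0x9E (0x7F is skipped).
        trail = cell + 63 if cell <= 63 else cell + 64
    else:
        # Even row: trail bytes 0x9F-0xFC.
        trail = cell + 158
    return bytes([lead, trail])


# "工" is at row 25, cell 09 in JIS X 0208.
euc = kuten_to_euc_jp(25, 9)
sjis = kuten_to_shift_jis(25, 9)
print(euc.hex(":"), sjis.hex(":"))

# Cross-check against Python's built-in codecs.
print(euc == "工".encode("euc_jp"), sjis == "工".encode("shift_jis"))
```

The same character-set position yields different bytes under each scheme, which is why a character set and an encoding scheme are kept as separate steps.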
347 | 348 | --- 349 | 350 | ### Before Unicode 351 | GB/T 2312-1980 (GB2312) national standard / recommended (not mandatory) 352 | * Hanzi 353 | * Non-Hanzi 354 | * Punctuation/Symbols, List Markers, ISO 646-CN, Hiragana, Katakana, Greek and vertical, Cyrillic, zhuyin and non-ASCII pinyin, box drawing 355 | * Extensions and implementations 356 | * Encoding: EUC-CN, ISO-2022-CN, HZ 357 | * Successors/Extensions: GB/T 12345, GBK, GB18030 358 | 359 | --- 360 | 361 | ### Before Unicode 362 | 363 | GBK 364 | * Extensions and implementations 365 | * (Windows) CP936 366 | * The most widely used encoding of GBK 367 | * (IBM) CP1386/IBM-1386 368 | * IBM's CP936 is also a Simplified Chinese code page but unused 369 | 370 | --- 371 | 372 | ### Before Unicode 373 | 374 | BIG5 375 | * Big5 (大五码), the encoding standard for the Traditional Chinese characters used in Hong Kong, Macau and Taiwan 376 | * Extensions and implementations 377 | * (Windows) CP950 378 | 379 | --- 380 | 388 | ### Before Unicode 389 | 390 | Shift-JIS 391 | 392 | ![](media/Euler_diag_for_jp_charsets.svg) 393 | 394 | --- 395 | 396 | ### Before Unicode 397 | 398 | ISO/IEC 8859 Western European character sets 399 | * 1: 29 Languages, includes English, Spanish, Italian, Portuguese, Norwegian, Irish, ... 400 | * 15: Revision of 1 that swaps eight rarely used symbols for €, Š, š, Ž, ž, Œ, œ, Ÿ; more languages/more rare characters supported, e.g. Š in Finnish 401 | 402 | Most Popular Implementation/Extension: Windows-1252 (CP1252) 403 | 404 | --- 405 | 406 | ### Mojibake (garbled text) 407 | 408 | ![](media/Lightmatter_panda.jpg) 409 | 410 | --- 411 | 416 | ### Mojibake (garbled text) 417 | 418 | Writing system -> grapheme/symbol 419 | 420 | -> coded character set (grapheme -> ordinal) 421 | 422 | -> encoding scheme / encoding (ordinal -> bytes) 423 | 424 | -> font lookup -> glyph lookup -> Fallback (e.g. IDWriteFontFallback) 425 | 426 | -> glyph rendering (Glyph processing) 427 | 428 | --- 429 | 432 | 433 | Glyph rendering is a big topic: 434 | 435 | It has to deal with position, size, baseline, shoulder height (肩高), superscripts/subscripts, alignment, hinting, anti-aliasing, ligatures (连字), conjunct forms (合字, partially supported) and many other features. 436 | 437 | ![](https://github.com/microsoft/cascadia-code/blob/main/images/ligatures.png?raw=true) 438 | 439 | 440 | --- 441 | 442 | ### Mojibake (garbled text) 443 | 444 | 1 The application and the data use different encodings / character sets 445 | 446 | 2 Missing font files 447 | 448 | 3 Bugs in the application itself 449 | 450 | 4 Arbitrary binary data is wrongly interpreted as text 451 | 452 |
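
Causes 1 and 4 can be reproduced in a few lines of Python. This is a minimal sketch using only the standard codecs; the sample string and variable names are chosen for illustration and are not part of the slides.

```python
# Cause 1: bytes written in one encoding are read back with another.
text = "test 測試 テスト"
raw = text.encode("utf-8")
garbled = raw.decode("latin-1")  # wrong decoder; latin-1 maps every byte, so nothing is lost
restored = garbled.encode("latin-1").decode("utf-8")
print(garbled)
print(restored == text)

# Cause 1, lossy variant: undecodable bytes were already replaced by U+FFFD
# (ef bf bd in UTF-8), and those bytes are later read as GBK.
replacement = "\ufffd\ufffd".encode("utf-8")
print(replacement.decode("gbk"))  # 锟斤拷

# Cause 4: non-text bytes read as GBK text.
stack_fill = bytes([0xCC] * 4)  # MSVC debug fill pattern for uninitialized stack memory
print(stack_fill.decode("gbk"))  # 烫烫
```

The first case is reversible because no byte was dropped; once a decoder has substituted U+FFFD, the original bytes are gone and the text cannot be recovered.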
453 | 454 | * **Sample code** 455 | 456 | --- 457 | 458 | ### How do automatic detection tools recognize mojibake? 459 | 460 | * Based on statistics: the text "looks like" a given encoding 461 | 462 | * Works when the whole passage belongs to a single character set 463 | 464 | --- 465 | 469 | 470 | ### Unicode 471 | 472 | Solves the problem that mixed-script text cannot be represented trivially 473 | * e.g. how do you represent one array that holds Chinese, Korean and Vietnamese text at the same time? 474 | * No single code page like that existed before Unicode. 475 | 476 | Cross-language search over CJK text 477 | 478 | Gives rarely used characters a code point 479 | * Otherwise they would need a code page that nobody would bother to support 480 | * We would not even have cross-platform emoji 481 | 482 | Flexible yet standardized encoding forms: variable-length with 1-byte units (UTF-8), variable-length with 2-byte units (UTF-16), fixed-length 4 bytes (UTF-32), all available 483 | 484 | --- 485 | 486 | ### Unicode code point overview 487 | 488 | UCS-2/4 489 | 490 | 491 | 492 | --- 493 | 494 | ### UTF-16 495 | 496 | Within the BMP it is equivalent to UCS-2. Put differently, UCS-2 is UTF-16 without Surrogate Pair support 497 | 498 | --- 499 | 508 | 509 | ### Next time 510 | 511 | UTF: encoding Unicode code points 512 | * UTF-8/16/32 513 | 514 | Character representation in C++ 515 | * `char`; `wchar_t`; 516 | * `char8_t`; `char16_t`; `char32_t`; 517 | * `u8"测试"`, `u"测试"`, `U"测试"`, `L"测试"`, `_T("测试")` 518 | 519 | Converting between coded character sets 520 | 521 | Handling multi-byte characters -------------------------------------------------------------------------------- /slides/PMC++.drawio: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 159 | 160 | 161 | 162 | 163 | 164 | 165 | 166 | 167 | 168 | 169 | 170 | 171 |
172 | 173 | 174 | 175 | 176 | 177 | 178 | 179 | -------------------------------------------------------------------------------- /slides/media/Euler_diag_for_jp_charsets.svg: -------------------------------------------------------------------------------- 1 | 2 | 6 | Relationships between Japanese-Related Character Sets 7 | 8 | Euler diagram explaining the relationship between character sets mainly related to Japanese. 9 | 10 | 11 | 15 | 16 | 日本語関連の文字集合の関係 17 | Relationships between Japanese-Related Character Sets 18 | 19 | 20 | 21 | text/x-wiki 22 | 23 | 主に[[w:ja:日本語|日本語]]に関連した[[w:ja:文字集合|文字集合]]同士の関係性を 24 | [[w:ja:オイラー図|オイラー図]]を用いて説明する。 25 | 26 | 各集合の色と文字集合との対応は次の通り。 27 | * {{color|w|'''紅'''|bg=magenta}}: [[w:ja:JIS X 0208|JISX0208]]-1990 28 | * {{color|b|'''黄'''|bg=yellow }}: [[w:ja:Microsoftコードページ932|Windows-31J]] 29 | * {{color|w|'''青'''|bg=blue }}: [[w:ja:JIS X 0213|JISX0213]]-2004 30 | * {{color|b|'''藍'''|bg=aqua }}: [[w:ja:マイクロソフト標準キャラクタセット|Microsoft標準日本語文字セット]]Ver.3 31 | * {{color|w|'''緑'''|bg=green }}: [[w:ja:JIS X 0212|JISX0212]]-1990 32 | * {{color|w|'''赤'''|bg=red }}: [[w:ja:Unicode|Unicode]] 5.0 33 | 34 | 35 | 36 | text/x-wiki 37 | 38 | [[w:Euler diagram|]] explaining the relationship between 39 | [[w:character sets|]] mainly related to [[w:Japanese|]]. 40 | 41 | The correspondence between the color of each set and the character set 42 | is described below. 
43 | * {{color|w|'''Magenta'''|bg=magenta}}: [[w:ja:JIS X 0208|JISX0208]]-1990 44 | * {{color|b|'''Yellow'''|bg=yellow }}: [[w:ja:Microsoftコードページ932|Windows-31J]] 45 | * {{color|w|'''Blue'''|bg=blue }}: [[w:ja:JIS X 0213|JISX0213]]-2004 46 | * {{color|b|'''Aqua'''|bg=aqua }}:<!-- 47 | -->[[w:ja:マイクロソフト標準キャラクタセット|Microsoft Standard Japanese Character Set]] Ver.3 48 | * {{color|w|'''Green'''|bg=green }}: [[w:ja:JIS X 0212|JISX0212]]-1990 49 | * {{color|w|'''Red'''|bg=red }}: [[w:ja:Unicode|Unicode]] 5.0 50 | 51 | 52 | 53 | 2019-09-11 54 | StillImage 55 | image/svg+xml 56 | 58 | 60 | mul 61 | 62 | 63 | 64 | 65 | 66 | 72 | 73 | 77 | 78 | Unicode 5.0 79 | 83 | 87 | 91 | 92 | 赤:Unicode 5.0 95 | Red: Unicode 5.0 98 | 99 | 100 | 101 | 102 | Microsoft標準日本語文字セットVer.3 103 | MS Std JPN Charset, Ver.3 104 | 108 | 112 | 116 | 117 | 藍:Microsoft標準日本語文字セットVer.3 120 | Aqua: MS Std JPN Charset, Ver.3 123 | 124 | 125 | 126 | 127 | JISX0213-2004 128 | 132 | 136 | 140 | 141 | 青:JISX0213-2004 144 | Blue: JISX0213-2004 147 | 148 | 149 | 150 | 151 | JISX0212-1990 152 | 156 | 160 | 164 | 165 | 緑:JISX0212-1990 168 | Green: JISX0212-1990 171 | 172 | 173 | 174 | 175 | Windows-31J 176 | 180 | 184 | 188 | 189 | 黄:Windows-31J 192 | Yellow: Windows-31J 195 | 196 | 197 | 198 | 199 | JISX0208-1990 200 | 204 | 208 | 212 | 213 | 紅:JISX0208-1990 216 | Magenta: JISX0208-1990 219 | 220 | 221 | 222 | 230 | 231 | 232 | -------------------------------------------------------------------------------- /slides/media/Lightmatter_panda.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/slides/media/Lightmatter_panda.jpg -------------------------------------------------------------------------------- /slides/media/SPLayout.svg: -------------------------------------------------------------------------------- 1 |
[SVG diagram text labels: Stack, Heap, CB, OBJ (x2), pCB (x2), pObj (x2), spDog1/spDog2, spDog3]
-------------------------------------------------------------------------------- /slides/media/SharedPtrLifetime.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/slides/media/SharedPtrLifetime.png -------------------------------------------------------------------------------- /slides/media/UTF-16 Sample.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/slides/media/UTF-16 Sample.png -------------------------------------------------------------------------------- /slides/media/Venn_diagram_gr_la_ru.svg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/slides/media/Venn_diagram_gr_la_ru.svg.png -------------------------------------------------------------------------------- /slides/media/fck.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/slides/media/fck.gif -------------------------------------------------------------------------------- /slides/media/hanzi_standard_fonts.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/slides/media/hanzi_standard_fonts.png -------------------------------------------------------------------------------- /slides/media/hanzi_standard_fonts_marked.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/slides/media/hanzi_standard_fonts_marked.png -------------------------------------------------------------------------------- /slides/media/notsimple.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/slides/media/notsimple.jpg -------------------------------------------------------------------------------- /slides/media/python-cyclic-gc-5-new-page.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/slides/media/python-cyclic-gc-5-new-page.png -------------------------------------------------------------------------------- /slides/media/tutorial.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/slides/media/tutorial.png -------------------------------------------------------------------------------- /slides/media/zhen.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/slides/media/zhen.png -------------------------------------------------------------------------------- /slides/media/小问号.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wuye9036/PracticalModernCpp/38ee4a09f34bd260f6783db03b3fdef39090396f/slides/media/小问号.jpeg --------------------------------------------------------------------------------