├── .editorconfig ├── .github └── workflows │ └── ci.yml ├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE-APACHE ├── LICENSE-MIT ├── README.md ├── book.toml ├── history.txt └── src ├── SUMMARY.md ├── benchmarking.md ├── benchmarking_zh.md ├── bounds-checks.md ├── bounds-checks_zh.md ├── build-configuration.md ├── build-configuration_zh.md ├── compile-times.md ├── compile-times_zh.md ├── general-tips.md ├── general-tips_zh.md ├── hashing.md ├── hashing_zh.md ├── heap-allocations.md ├── heap-allocations_zh.md ├── inlining.md ├── inlining_zh.md ├── introduction.md ├── introduction_zh.md ├── io.md ├── io_zh.md ├── iterators.md ├── iterators_zh.md ├── linting.md ├── linting_zh.md ├── logging-and-debugging.md ├── logging-and-debugging_zh.md ├── machine-code.md ├── machine-code_zh.md ├── parallelism.md ├── parallelism_zh.md ├── profiling.md ├── profiling_zh.md ├── standard-library-types.md ├── standard-library-types_zh.md ├── title-page.md ├── type-sizes.md ├── type-sizes_zh.md ├── wrapper-types.md └── wrapper-types_zh.md /.editorconfig: -------------------------------------------------------------------------------- 1 | [*.md] 2 | max_line_length = 79 3 | -------------------------------------------------------------------------------- /.github/workflows/ci.yml: -------------------------------------------------------------------------------- 1 | name: CI 2 | 3 | on: 4 | pull_request: 5 | push: 6 | branches: 7 | - master 8 | 9 | jobs: 10 | test_and_maybe_deploy: 11 | runs-on: ubuntu-latest 12 | steps: 13 | - name: Clone repository 14 | uses: actions/checkout@v3 15 | 16 | - name: Setup mdbook 17 | uses: peaceiris/actions-mdbook@v1 18 | with: 19 | mdbook-version: 'latest' 20 | 21 | # EPUB 22 | # Currently disabled due to 23 | # https://github.com/nnethercote/perf-book/actions/runs/6358429874/job/17270643057 24 | #- name: Setup mdbook-epub 25 | # run: cargo install mdbook-epub 26 | 27 | - name: Build 28 | run: mdbook build 29 | 30 | - name: Test 31 | run: 
mdbook test 32 | 33 | # EPUB 34 | #- name: Copy ePub 35 | # run: cp book/epub/The\ Rust\ Performance\ Book.epub book/html 36 | 37 | - name: Deploy 38 | uses: peaceiris/actions-gh-pages@v3 39 | with: 40 | github_token: ${{ secrets.GITHUB_TOKEN }} 41 | #publish_dir: ./book/html # use if EPUB is enabled 42 | publish_dir: ./book # use if EPUB is disabled 43 | # Only deploy on a push to master, not on a pull request. 44 | if: github.event_name == 'push' && github.ref == 'refs/heads/master' && github.repository == 'Blues-star/perf-book-zh' 45 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | book 2 | 3 | # Prevent Vim swap files from making `mdbook serve` regenerate HTML frequently. 4 | *.sw* 5 | 6 | # Also `diff` files, which I generate a lot. 7 | diff 8 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # The Rust Performance Book Code of Conduct 2 | 3 | This repository uses the [Rust Code of Conduct]. 4 | 5 | [Rust Code of Conduct]: https://www.rust-lang.org/conduct.html 6 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to The Rust Performance Book 2 | 3 | Please follow these style guidelines when contributing to the book. 4 | 5 | ## Line Lengths 6 | 7 | Lines of text are limited to 79 characters. (There is a `.editorconfig` file 8 | that specifies this.) Lines containing non-text elements, such as links, can be 9 | longer. 10 | 11 | ## Examples 12 | 13 | Links to examples that demonstrate performance techniques on real-world 14 | programs are encouraged. These examples might be pull requests, blog posts, 15 | etc. 
16 | 17 | Single examples are written like this: 18 | ```markdown 19 | [**Example**](https://github.com/rust-lang/rust/pull/37373/commits/c440a7ae654fb641e68a9ee53b03bf3f7133c2fe). 20 | ``` 21 | 22 | Multiple examples are written like this: 23 | ```markdown 24 | [**Example 1**](https://github.com/rust-lang/rust/pull/77990/commits/45faeb43aecdc98c9e3f2b24edf2ecc71f39d323), 25 | [**Example 2**](https://github.com/rust-lang/rust/pull/51870/commits/b0c78120e3ecae5f4043781f7a3f79e2277293e7). 26 | ``` 27 | 28 | ## Title Style 29 | 30 | Section titles are capitalized, which means that all words within the title are 31 | capitalized, other than "small" words such as conjunctions. For example, "Using 32 | an Alternative Allocator", rather than "Using an alternative allocator". 33 | 34 | ## External Link Style 35 | 36 | For external links—those that point outside the book—reference links are 37 | preferred to inline links. For example, this: 38 | ```markdown 39 | The book's title is [The Rust Performance Book]. 40 | 41 | [The Rust Performance Book]: https://nnethercote.github.io/perf-book/ 42 | ``` 43 | is preferred to this: 44 | ```markdown 45 | The book's title is [The Rust Performance Book](https://nnethercote.github.io/perf-book/). 46 | ``` 47 | The reason for this preference is that external links are usually relatively 48 | long, and long inline links often break awkwardly across lines. 49 | 50 | One exception to this rule is that **Example** links are inline, with each one 51 | put on its own line, as seen above. 52 | -------------------------------------------------------------------------------- /LICENSE-APACHE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 
8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. 
For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 
175 | 176 | END OF TERMS AND CONDITIONS 177 | -------------------------------------------------------------------------------- /LICENSE-MIT: -------------------------------------------------------------------------------- 1 | Permission is hereby granted, free of charge, to any 2 | person obtaining a copy of this software and associated 3 | documentation files (the "Software"), to deal in the 4 | Software without restriction, including without 5 | limitation the rights to use, copy, modify, merge, 6 | publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software 8 | is furnished to do so, subject to the following 9 | conditions: 10 | 11 | The above copyright notice and this permission notice 12 | shall be included in all copies or substantial portions 13 | of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF 16 | ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED 17 | TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 18 | PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT 19 | SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY 20 | CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR 22 | IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 23 | DEALINGS IN THE SOFTWARE. 24 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # perf-book-zh 2 | 3 | Rust 性能手册中文版 4 | 5 | ## 查看 6 | 7 | 英文版在线查看 [here](https://nnethercote.github.io/perf-book/)。 8 | 中文版在线查看 [here](https://blues-star.github.io/perf-book-zh/)。 9 | 10 | ## 构建 11 | 12 | 本书使用 [`mdbook`](https://github.com/rust-lang/mdBook) 构建,mdbook 可以用以下命令安装: 13 | ``` 14 | cargo install mdbook 15 | ``` 16 | 运行以下命令以编译本书: 17 | ``` 18 | mdbook build 19 | ``` 20 | 生成的文件将被保存在 `book/` 目录。
21 | 22 | ## 开发 23 | 24 | To view the built book, run this command: 25 | ``` 26 | mdbook serve 27 | ``` 28 | This will launch a local web server to serve the book. View the built book by 29 | navigating to `localhost:3000` in a web browser. While the web server is 30 | running, the rendered book will automatically update if the book's files 31 | change. 32 | 33 | To test the code within the book, run this command: 34 | ``` 35 | mdbook test 36 | ``` 37 | 38 | ## Improvements 39 | 40 | Suggestions for improvements are welcome, but I prefer them to be filed as 41 | issues rather than pull requests. This is because I am very particular about 42 | the wording used in the book. When pull requests are made, I typically take the 43 | underlying idea of a pull request and rewrite it into my own words anyway. 44 | 45 | ## License 46 | 47 | Licensed under either of 48 | * Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or 49 | http://www.apache.org/licenses/LICENSE-2.0) 50 | * MIT license ([LICENSE-MIT](LICENSE-MIT) or 51 | http://opensource.org/licenses/MIT) 52 | 53 | at your option. 54 | 55 | ## Contribution 56 | 57 | Unless you explicitly state otherwise, any contribution intentionally submitted 58 | for inclusion in the work by you, as defined in the Apache-2.0 license, shall 59 | be dual licensed as above, without any additional terms or conditions. 
60 | -------------------------------------------------------------------------------- /book.toml: -------------------------------------------------------------------------------- 1 | [book] 2 | title = "The Rust Performance Book" 3 | authors = ["Nicholas Nethercote"] 4 | src = "src" 5 | language = "zh" 6 | multilingual = false 7 | 8 | [build] 9 | create-missing = false 10 | 11 | [rust] 12 | edition = "2018" 13 | 14 | [output.html] 15 | curly-quotes = true 16 | default-theme = "rust" 17 | git-repository-url = "https://github.com/Blues-star/perf-book-zh/" 18 | site-url = "https://blues-star.github.io/perf-book-zh/" 19 | -------------------------------------------------------------------------------- /history.txt: -------------------------------------------------------------------------------- 1 | modified: .github/workflows/ci.yml 2 | modified: .gitignore 3 | modified: src/benchmarking.md 4 | new file: src/bounds-checks.md 5 | modified: src/build-configuration.md 6 | modified: src/compile-times.md 7 | modified: src/general-tips.md 8 | modified: src/hashing.md 9 | modified: src/heap-allocations.md 10 | modified: src/inlining.md 11 | modified: src/introduction.md 12 | modified: src/io.md 13 | modified: src/iterators.md 14 | modified: src/linting.md 15 | modified: src/logging-and-debugging.md 16 | modified: src/machine-code.md 17 | modified: src/profiling.md 18 | modified: src/standard-library-types.md 19 | modified: src/type-sizes.md -------------------------------------------------------------------------------- /src/SUMMARY.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | 3 | [Title Page](title-page.md) 4 | 5 | - [简介](introduction_zh.md) 6 | - [基准测试](benchmarking_zh.md) 7 | - [构建配置](build-configuration_zh.md) 8 | - [Linting](linting_zh.md) 9 | - [Profiling](profiling_zh.md) 10 | - [内联](inlining_zh.md) 11 | - [哈希](hashing_zh.md) 12 | - [堆分配](heap-allocations_zh.md) 13 | - [类型大小](type-sizes_zh.md) 14 | -
[标准库类型](standard-library-types_zh.md) 15 | - [迭代器](iterators_zh.md) 16 | - [边界检查](bounds-checks_zh.md) 17 | - [I/O](io_zh.md) 18 | - [日志和调试](logging-and-debugging_zh.md) 19 | - [封装类型](wrapper-types_zh.md) 20 | - [机器码](machine-code_zh.md) 21 | - [并行](parallelism_zh.md) 22 | - [一般提示](general-tips_zh.md) 23 | - [编译时间](compile-times_zh.md) 24 | 25 | -------------------------------------------------------------------------------- /src/benchmarking.md: -------------------------------------------------------------------------------- 1 | # Benchmarking 2 | 3 | Benchmarking typically involves comparing the performance of two or more 4 | programs that do the same thing. Sometimes this might involve comparing two or 5 | more different programs, e.g. Firefox vs Safari vs Chrome. Sometimes it 6 | involves comparing two different versions of the same program. This latter case 7 | lets us reliably answer the question "did this change speed things up?" 8 | 9 | Benchmarking is a complex topic and a thorough coverage is beyond the scope of 10 | this book, but here are the basics. 11 | 12 | First, you need workloads to measure. Ideally, you would have a variety of 13 | workloads that represent realistic usage of your program. Workloads using 14 | real-world inputs are best, but [microbenchmarks] and [stress tests] can be 15 | useful in moderation. 16 | 17 | [microbenchmarks]: https://stackoverflow.com/questions/2842695/what-is-microbenchmarking 18 | [stress tests]: https://en.wikipedia.org/wiki/Stress_testing_(software) 19 | 20 | Second, you need a way to run the workloads, which will also dictate the 21 | metrics used. 22 | - Rust's built-in [benchmark tests] are a simple starting point, but they use 23 | unstable features and therefore only work on nightly Rust. 24 | - [Criterion] and [Divan] are more sophisticated alternatives. 25 | - [Hyperfine] is an excellent general-purpose benchmarking tool. 26 | - Custom benchmarking harnesses are also possible. 
For example, [rustc-perf] is 27 | the harness used to benchmark the Rust compiler. 28 | 29 | [benchmark tests]: https://doc.rust-lang.org/nightly/unstable-book/library-features/test.html 30 | [Criterion]: https://github.com/bheisler/criterion.rs 31 | [Divan]: https://github.com/nvzqz/divan 32 | [Hyperfine]: https://github.com/sharkdp/hyperfine 33 | [rustc-perf]: https://github.com/rust-lang/rustc-perf/ 34 | 35 | When it comes to metrics, there are many choices, and the right one(s) will 36 | depend on the nature of the program being benchmarked. For example, metrics 37 | that make sense for a batch program might not make sense for an interactive 38 | program. Wall-time is an obvious choice in many cases because it corresponds to 39 | what users perceive. However, it can suffer from high variance. In particular, 40 | tiny changes in memory layout can cause significant but ephemeral performance 41 | fluctuations. Therefore, other metrics with lower variance (such as cycles or 42 | instruction counts) may be a reasonable alternative. 43 | 44 | Summarizing measurements from multiple workloads is also a challenge, and there 45 | are a variety of ways to do it, with no single method being obviously best. 46 | 47 | Good benchmarking is hard. Having said that, do not stress too much about 48 | having a perfect benchmarking setup, particularly when you start optimizing a 49 | program. Mediocre benchmarking is far better than no benchmarking. Keep an open 50 | mind about what you are measuring, and over time you can make benchmarking 51 | improvements as you learn about the performance characteristics of your 52 | program. 
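
As an illustration of the "custom harness" option and of wall-time variance, here is a minimal sketch using only the standard library. The names `measure` and `sum_of_squares` are hypothetical and exist only for this example; a real project would more likely use one of the tools listed above.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `workload` `runs` times and returns the wall-time of every run,
/// sorted in ascending order so that min, median and max are easy to read.
fn measure<F: FnMut()>(runs: usize, mut workload: F) -> Vec<Duration> {
    let mut times = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        workload();
        times.push(start.elapsed());
    }
    times.sort();
    times
}

/// Hypothetical stand-in workload: a wrapping sum of squares below `n`.
fn sum_of_squares(n: u64) -> u64 {
    (0..n).map(|x| x.wrapping_mul(x)).fold(0, u64::wrapping_add)
}

fn main() {
    // `black_box` discourages the optimizer from deleting the workload.
    let times = measure(15, || {
        black_box(sum_of_squares(black_box(1_000_000)));
    });
    let min = times[0];
    let median = times[times.len() / 2];
    let max = times[times.len() - 1];
    // A large gap between min and max is the variance discussed above.
    println!("min {min:?}, median {median:?}, max {max:?}");
}
```

Reporting the minimum and median rather than a single run is one simple way to reduce the effect of noise; it is a design choice for this sketch, not a recommendation over the dedicated tools.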
53 | -------------------------------------------------------------------------------- /src/benchmarking_zh.md: -------------------------------------------------------------------------------- 1 | # Benchmarking 2 | 3 | 基准测试通常涉及比较执行相同任务的两个或多个程序的性能。有时可能涉及比较两个或多个不同的程序,例如 `Firefox` vs `Safari` vs `Chrome`。有时涉及比较同一程序的两个不同版本。后一种情况让我们能够可靠地回答问题“这个变化是否加快了速度?” 4 | 5 | 基准测试是一个复杂的主题,全面覆盖超出了本书的范围,但以下是基础知识。 6 | 7 | 首先,您需要工作负载来进行测量。理想情况下,您会有各种代表程序实际使用情况的工作负载。使用真实世界输入的工作负载最好,但[microbenchmarks]和[压力测试]在适度的情况下也是有用的。 8 | 9 | [microbenchmarks]: https://stackoverflow.com/questions/2842695/what-is-microbenchmarking 10 | [压力测试]: https://en.wikipedia.org/wiki/Stress_testing_(software) 11 | 12 | 其次,您需要一种运行工作负载的方式,这也将决定所使用的度量标准。 13 | 14 | - Rust 内置的[benchmark tests]是一个简单的起点,但它们使用不稳定的功能,因此仅适用于 nightly 版的 Rust。 15 | - [Criterion] 和 [Divan] 是更复杂的替代方案。 16 | - [Hyperfine] 是一个出色的通用基准测试工具。 17 | - 也可以使用自定义基准测试工具。例如,[rustc-perf] 是用于对 Rust 编译器进行基准测试的工具。 18 | 19 | [benchmark tests]: https://doc.rust-lang.org/nightly/unstable-book/library-features/test.html 20 | [Criterion]: https://github.com/bheisler/criterion.rs 21 | [Divan]: https://github.com/nvzqz/divan 22 | [Hyperfine]: https://github.com/sharkdp/hyperfine 23 | [rustc-perf]: https://github.com/rust-lang/rustc-perf/ 24 | 25 | 在度量标准方面,有许多选择,选择合适的度量标准取决于正在进行基准测试的程序的性质。例如,对于`批处理程序`(batch program)有意义的度量标准可能对`交互式程序`(interactive program)没有意义。在许多情况下,`Wall-time`是一个显而易见的选择,因为它对应于用户的感知。然而,它可能受到高方差的影响。特别是,内存布局中微小的变化可能导致显著但短暂的性能波动。因此,具有较低方差的其他度量标准(如周期数 cycles 或指令计数)可能是一个合理的替代方案。 26 | 27 | 总结来自多个工作负载的测量结果也是一个挑战,有许多方法可以做到这一点,没有一种方法显然是最好的。 28 | 29 | 良好的基准测试很困难。话虽如此,不必过分纠结于拥有完美的基准测试设置,尤其是在刚开始优化程序时。平庸的基准测试也远胜于没有基准测试。对您正在测量的内容保持开放的态度,随着时间的推移,您可以根据了解到的程序性能特征逐步改进基准测试。 30 | 31 | -------------------------------------------------------------------------------- /src/bounds-checks.md: -------------------------------------------------------------------------------- 1 | # Bounds Checks 2 | 3 | By default, accesses to container types such as slices and vectors involve 4 | bounds
checks in Rust. These can affect performance, e.g. within hot loops, 5 | though less often than you might expect. 6 | 7 | There are several safe ways to change code so that the compiler knows about 8 | container lengths and can optimize away bounds checks. 9 | 10 | - Replace direct element accesses in a loop by using iteration. 11 | - Instead of indexing into a `Vec` within a loop, make a slice of the `Vec` 12 | before the loop and then index into the slice within the loop. 13 | - Add assertions on the ranges of index variables. 14 | [**Example 1**](https://github.com/rust-random/rand/pull/960/commits/de9dfdd86851032d942eb583d8d438e06085867b), 15 | [**Example 2**](https://github.com/image-rs/jpeg-decoder/pull/167/files). 16 | 17 | Getting these to work can be tricky. The [Bounds Check Cookbook] goes into more 18 | detail on this topic. 19 | 20 | [Bounds Check Cookbook]: https://github.com/Shnatsel/bounds-check-cookbook/ 21 | 22 | As a last resort, there are the unsafe methods [`get_unchecked`] and 23 | [`get_unchecked_mut`]. 24 | 25 | [`get_unchecked`]: https://doc.rust-lang.org/std/primitive.slice.html#method.get_unchecked 26 | [`get_unchecked_mut`]: https://doc.rust-lang.org/std/primitive.slice.html#method.get_unchecked_mut 27 | 28 | -------------------------------------------------------------------------------- /src/bounds-checks_zh.md: -------------------------------------------------------------------------------- 1 | # Bounds Checks 2 | 3 | 默认情况下,在 Rust 中,对切片和向量等容器类型的访问涉及边界检查。这可能会影响性能,例如在热循环中,尽管发生的频率可能不如您所预期的那样频繁。 4 | 5 | 有几种安全的方法可以更改代码,以便编译器了解容器的长度并优化掉边界检查。 6 | 7 | - 在循环中,通过迭代替换直接元素访问。 8 | - 在循环中,不要对 Vec 进行索引,而是在循环之前创建 Vec 的切片,然后在循环中对切片进行索引。 9 | - 在索引变量的范围上添加断言。 10 | [**Example 1**](https://github.com/rust-random/rand/pull/960/commits/de9dfdd86851032d942eb583d8d438e06085867b), 11 | [**Example 2**](https://github.com/image-rs/jpeg-decoder/pull/167/files). 
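
The three techniques in the list above can be sketched as follows. This is an illustrative example only: the function names are hypothetical, and whether the compiler actually removes a given check depends on the compiler version and optimization level, so it is worth confirming with a profiler or by inspecting the generated machine code.

```rust
/// Baseline: indexing with a separately supplied bound. The compiler may
/// keep a bounds check on `v[i]` because it cannot prove `n <= v.len()`.
fn sum_indexed(v: &[u32], n: usize) -> u32 {
    let mut total = 0u32;
    for i in 0..n {
        total = total.wrapping_add(v[i]); // panics if n > v.len()
    }
    total
}

/// Technique 1: iterate instead of indexing, so no index exists to check.
fn sum_iter(v: &[u32]) -> u32 {
    v.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Technique 2: slice the `Vec` once before the loop. The range is checked
/// once when the slice is made, and the loop bound is the slice's length.
fn sum_sliced(v: &Vec<u32>, n: usize) -> u32 {
    let s = &v[..n]; // single check here; panics if n > v.len()
    let mut total = 0u32;
    for i in 0..s.len() {
        total = total.wrapping_add(s[i]);
    }
    total
}

/// Technique 3: assert the relationship between lengths up front, which
/// gives the optimizer the fact it needs about `b[i]`.
fn dot(a: &[u32], b: &[u32]) -> u32 {
    assert_eq!(a.len(), b.len());
    let mut total = 0u32;
    for i in 0..a.len() {
        total = total.wrapping_add(a[i].wrapping_mul(b[i]));
    }
    total
}

fn main() {
    let v = vec![1, 2, 3, 4];
    println!("{}", sum_indexed(&v, 3));
    println!("{}", sum_iter(&v));
    println!("{}", sum_sliced(&v, 2));
    println!("{}", dot(&[1, 2, 3], &[4, 5, 6]));
}
```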
12 | 13 | 让这些方法起作用可能有些棘手。[Bounds Check Cookbook]对这个主题进行了更详细的介绍。 14 | 15 | [Bounds Check Cookbook]: https://github.com/Shnatsel/bounds-check-cookbook/ 16 | 17 | 作为最后的手段,还有不安全的方法 [`get_unchecked`] 和 [`get_unchecked_mut`]。 18 | 19 | [`get_unchecked`]: https://doc.rust-lang.org/std/primitive.slice.html#method.get_unchecked 20 | [`get_unchecked_mut`]: https://doc.rust-lang.org/std/primitive.slice.html#method.get_unchecked_mut 21 | 22 | -------------------------------------------------------------------------------- /src/build-configuration.md: -------------------------------------------------------------------------------- 1 | # Build Configuration 2 | 3 | You can drastically change the performance of a Rust program without changing 4 | its code, just by changing its build configuration. There are many possible 5 | build configurations for each Rust program. The one chosen will affect several 6 | characteristics of the compiled code, such as compile times, runtime speed, 7 | memory use, binary size, debuggability, profilability, and which architectures 8 | your compiled program will run on. 9 | 10 | Most configuration choices will improve one or more characteristics while 11 | worsening one or more others. For example, a common trade-off is to accept 12 | worse compile times in exchange for higher runtime speeds. The right choice 13 | for your program depends on your needs and the specifics of your program, and 14 | performance-related choices (which are most of them) should be validated with 15 | benchmarking. 16 | 17 | Note that Cargo only looks at the profile settings in the `Cargo.toml` file at 18 | the root of the workspace. Profile settings defined in dependencies are 19 | ignored. Therefore, these options are mostly relevant for binary crates, not 20 | library crates.
21 | 22 | ## Release Builds 23 | 24 | The single most important build configuration choice is simple but [easy to 25 | overlook]: make sure you are using a [release build] rather than a [dev build] 26 | when you want high performance. This is usually done by specifying the 27 | `--release` flag to Cargo. 28 | 29 | [easy to overlook]: https://users.rust-lang.org/t/why-my-rust-program-is-so-slow/47764/5 30 | [release build]: https://doc.rust-lang.org/cargo/reference/profiles.html#release 31 | [dev build]: https://doc.rust-lang.org/cargo/reference/profiles.html#dev 32 | 33 | Dev builds are the default. They are good for debugging, but are not optimized. 34 | They are produced if you run `cargo build` or `cargo run`. (Alternatively, 35 | running `rustc` without additional options also produces an unoptimized build.) 36 | 37 | Consider the following final line of output from a `cargo build` run. 38 | ```text 39 | Finished dev [unoptimized + debuginfo] target(s) in 29.80s 40 | ``` 41 | This output indicates that a dev build has been produced. The compiled code 42 | will be placed in the `target/debug/` directory. `cargo run` will run the dev 43 | build. 44 | 45 | In comparison, release builds are much more optimized, omit debug assertions 46 | and integer overflow checks, and omit debug info. 10-100x speedups over dev 47 | builds are common! They are produced if you run `cargo build --release` or 48 | `cargo run --release`. (Alternatively, `rustc` has multiple options for 49 | optimized builds, such as `-O` and `-C opt-level`.) This will typically take 50 | longer than a dev build because of the additional optimizations. 51 | 52 | Consider the following final line of output from a `cargo build --release` run. 53 | ```text 54 | Finished release [optimized] target(s) in 1m 01s 55 | ``` 56 | This output indicates that a release build has been produced. The compiled code 57 | will be placed in the `target/release/` directory. 
`cargo run --release` will 58 | run the release build. 59 | 60 | See the [Cargo profile documentation] for more details about the differences 61 | between dev builds (which use the `dev` profile) and release builds (which use 62 | the `release` profile). 63 | 64 | [Cargo profile documentation]: https://doc.rust-lang.org/cargo/reference/profiles.html 65 | 66 | The default build configuration choices used in release builds provide a good 67 | balance between the abovementioned characteristics such as compile times, runtime 68 | speed, and binary size. But there are many possible adjustments, as the 69 | following sections explain. 70 | 71 | ## Maximizing Runtime Speed 72 | 73 | The following build configuration options are designed primarily to maximize 74 | runtime speed. Some of them may also reduce binary size. 75 | 76 | ### Codegen Units 77 | 78 | The Rust compiler splits crates into multiple [codegen units] to parallelize 79 | (and thus speed up) compilation. However, this might cause it to miss some 80 | potential optimizations. You may be able to improve runtime speed and reduce 81 | binary size, at the cost of increased compile times, by setting the number of 82 | units to one. Add these lines to the `Cargo.toml` file: 83 | ```toml 84 | [profile.release] 85 | codegen-units = 1 86 | ``` 87 | 89 | [**Example 1**](http://likebike.com/posts/How_To_Write_Fast_Rust_Code.html#emit-asm), 90 | [**Example 2**](https://github.com/rust-lang/rust/pull/115554#issuecomment-1742192440). 91 | 92 | [codegen units]: https://doc.rust-lang.org/cargo/reference/profiles.html#codegen-units 93 | 94 | ### Link-time Optimization 95 | 96 | [Link-time optimization] (LTO) is a whole-program optimization technique that 97 | can improve runtime speed by 10-20% or more, and also reduce binary size, at 98 | the cost of worse compile times. It comes in several forms. 
99 | 100 | [Link-time optimization]: https://doc.rust-lang.org/cargo/reference/profiles.html#lto 101 | 102 | The first form of LTO is *thin local LTO*, a lightweight form of LTO. By 103 | default the compiler uses this for any build that involves a non-zero level of 104 | optimization. This includes release builds. To explicitly request this level of 105 | LTO, put these lines in the `Cargo.toml` file: 106 | ```toml 107 | [profile.release] 108 | lto = false 109 | ``` 110 | 111 | The second form of LTO is *thin LTO*, which is a little more aggressive, and 112 | likely to improve runtime speed and reduce binary size while also increasing 113 | compile times. Use `lto = "thin"` in `Cargo.toml` to enable it. 114 | 115 | The third form of LTO is *fat LTO*, which is even more aggressive, and may 116 | improve performance and reduce binary size further while increasing build 117 | times again. Use `lto = "fat"` in `Cargo.toml` to enable it. 118 | 119 | Finally, it is possible to fully disable LTO, which will likely worsen runtime 120 | speed and increase binary size but reduce compile times. Use `lto = "off"` in 121 | `Cargo.toml` for this. Note that this is different to the `lto = false` option, 122 | which, as mentioned above, leaves thin local LTO enabled. 123 | 124 | ### Alternative Allocators 125 | 126 | It is possible to replace the default (system) heap allocator used by a Rust 127 | program with an alternative allocator. The exact effect will depend on the 128 | individual program and the alternative allocator chosen, but large improvements 129 | in runtime speed and large reductions in memory usage have been seen in 130 | practice. The effect will also vary across platforms, because each platform's 131 | system allocator has its own strengths and weaknesses. The use of an 132 | alternative allocator is also likely to increase binary size and compile times. 
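
Swapping the allocator is done with the `#[global_allocator]` attribute. As a minimal sketch of that mechanism only (it is not one of the alternative allocators, and it will not make a program faster), the following wraps the standard library's `System` allocator and counts allocations. The names `CountingAlloc`, `ALLOCATIONS`, and `allocations_during` are hypothetical and chosen for illustration.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper that forwards every request to the system
/// allocator and counts how many allocations were made.
pub struct CountingAlloc;

/// Number of calls to `alloc` since the program started.
pub static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the caller's obligations are passed on unchanged.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` came from `System.alloc` with the same layout.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// This attribute is what replaces the allocator for the whole program.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Returns how many allocations happened (on any thread) while `f` ran.
pub fn allocations_during<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    f();
    ALLOCATIONS.load(Ordering::Relaxed) - before
}
```

Wrapping a piece of code in `allocations_during(|| { ... })` gives a rough allocation count, which may help when deciding whether an alternative allocator is worth benchmarking for a given program.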
133 | 134 | #### jemalloc 135 | 136 | One popular alternative allocator for Linux and Mac is [jemalloc], usable via 137 | the [`tikv-jemallocator`] crate. To use it, add a dependency to your 138 | `Cargo.toml` file: 139 | ```toml 140 | [dependencies] 141 | tikv-jemallocator = "0.5" 142 | ``` 143 | Then add the following to your Rust code, e.g. at the top of `src/main.rs`: 144 | ```rust,ignore 145 | #[global_allocator] 146 | static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc; 147 | ``` 148 | 149 | Furthermore, on Linux, jemalloc can be configured to use [transparent huge 150 | pages][THP] (THP). This can further speed up programs, possibly at the cost of 151 | higher memory usage. 152 | 153 | [THP]: https://www.kernel.org/doc/html/next/admin-guide/mm/transhuge.html 154 | 155 | Do this by setting the `MALLOC_CONF` environment variable appropriately before 156 | building your program, for example: 157 | ```bash 158 | MALLOC_CONF="thp:always,metadata_thp:always" cargo build --release 159 | ``` 160 | The system running the compiled program also has to be configured to support 161 | THP. See [this blog post] for more details. 162 | 163 | [this blog post]: https://kobzol.github.io/rust/rustc/2023/10/21/make-rust-compiler-5percent-faster.html 164 | 165 | #### mimalloc 166 | 167 | Another alternative allocator that works on many platforms is [mimalloc], 168 | usable via the [`mimalloc`] crate. To use it, add a dependency to your 169 | `Cargo.toml` file: 170 | ```toml 171 | [dependencies] 172 | mimalloc = "0.1" 173 | ``` 174 | Then add the following to your Rust code, e.g. 
at the top of `src/main.rs`: 175 | ```rust,ignore 176 | #[global_allocator] 177 | static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc; 178 | ``` 179 | 180 | [jemalloc]: https://github.com/jemalloc/jemalloc 181 | [`tikv-jemallocator`]: https://crates.io/crates/tikv-jemallocator 182 | [better performance]: https://github.com/rust-lang/rust/pull/83152 183 | [mimalloc]: https://github.com/microsoft/mimalloc 184 | [`mimalloc`]: https://crates.io/crates/mimalloc 185 | 186 | ### CPU Specific Instructions 187 | 188 | If you do not care about the compatibility of your binary on older (or other 189 | types of) processors, you can tell the compiler to generate the newest (and 190 | potentially fastest) instructions specific to a [certain CPU architecture], 191 | such as AVX SIMD instructions for x86-64 CPUs. 192 | 193 | [certain CPU architecture]: https://doc.rust-lang.org/rustc/codegen-options/index.html#target-cpu 194 | 195 | To request these instructions from the command line, use the `-C 196 | target-cpu=native` flag. For example: 197 | ```bash 198 | RUSTFLAGS="-C target-cpu=native" cargo build --release 199 | ``` 200 | 201 | Alternatively, to request these instructions from a [`config.toml`] file (for 202 | one or more projects), add these lines: 203 | ```toml 204 | [build] 205 | rustflags = ["-C", "target-cpu=native"] 206 | ``` 207 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html 208 | 209 | This can improve runtime speed, especially if the compiler finds vectorization 210 | opportunities in your code. 211 | 212 | If you are unsure whether `-C target-cpu=native` is working optimally, compare 213 | the output of `rustc --print cfg` and `rustc --print cfg -C target-cpu=native` 214 | to see if the CPU features are being detected correctly in the latter case. If 215 | not, you can use `-C target-feature` to target specific features. 
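
The enabled features can also be inspected from inside a program. The sketch below is an illustration under stated assumptions: the three feature names are arbitrary x86-64 examples, and the function names `compile_time_features` and `cpu_has_avx2` are hypothetical. `cfg!(target_feature = "...")` reflects what the compiler was told to target (the same values `rustc --print cfg` reports), while `is_x86_feature_detected!` asks the CPU the program is actually running on.

```rust
/// Lists which of a few example x86-64 features were enabled at compile
/// time, e.g. via `-C target-cpu=native` or `-C target-feature`.
pub fn compile_time_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(target_feature = "sse2") {
        enabled.push("sse2");
    }
    if cfg!(target_feature = "avx") {
        enabled.push("avx");
    }
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    enabled
}

/// Checks at run time whether the current CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
pub fn cpu_has_avx2() -> bool {
    is_x86_feature_detected!("avx2")
}

/// On other architectures this sketch simply reports `false`.
#[cfg(not(target_arch = "x86_64"))]
pub fn cpu_has_avx2() -> bool {
    false
}
```

If `compile_time_features()` lacks a feature that `cpu_has_avx2()` reports as available, the build is probably not targeting the host CPU as intended.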
216 | 217 | ### Profile-guided Optimization 218 | 219 | Profile-guided optimization (PGO) is a compilation model where you compile 220 | your program, run it on sample data while collecting profiling data, and then 221 | use that profiling data to guide a second compilation of the program. This can 222 | improve runtime speed by 10% or more. 223 | [**Example 1**](https://blog.rust-lang.org/inside-rust/2020/11/11/exploring-pgo-for-the-rust-compiler.html), 224 | [**Example 2**](https://github.com/rust-lang/rust/pull/96978). 225 | 226 | It is an advanced technique that takes some effort to set up, but is worthwhile 227 | in some cases. See the [rustc PGO documentation] for details. Also, the 228 | [`cargo-pgo`] command makes it easier to use PGO (and [BOLT], which is similar) 229 | to optimize Rust binaries. 230 | 231 | Unfortunately, PGO is not supported for binaries hosted on crates.io and 232 | distributed via `cargo install`, which limits its usability. 233 | 234 | [rustc PGO documentation]: https://doc.rust-lang.org/rustc/profile-guided-optimization.html 235 | [`cargo-pgo`]: https://github.com/Kobzol/cargo-pgo 236 | [BOLT]: https://github.com/llvm/llvm-project/tree/main/bolt 237 | 238 | ## Minimizing Binary Size 239 | 240 | The following build configuration options are designed primarily to minimize 241 | binary size. Their effects on runtime speed vary. 242 | 243 | ### Optimization Level 244 | 245 | You can request an [optimization level] that aims to minimize binary size by 246 | adding these lines to the `Cargo.toml` file: 247 | ```toml 248 | [profile.release] 249 | opt-level = "z" 250 | ``` 251 | [optimization level]: https://doc.rust-lang.org/cargo/reference/profiles.html#opt-level 252 | 253 | This may also reduce runtime speed. 254 | 255 | An alternative is `opt-level = "s"`, which targets minimal binary size a little 256 | less aggressively. Compared to `opt-level = "z"`, it allows [slightly more 257 | inlining] and also the vectorization of loops. 
258 | 259 | [slightly more inlining]: https://doc.rust-lang.org/rustc/codegen-options/index.html#inline-threshold 260 | 261 | ### Abort on `panic!` 262 | 263 | If you do not need to unwind on panic, e.g. because your program doesn't use 264 | [`catch_unwind`], you can tell the compiler to simply [abort on panic]. On 265 | panic, your program will still produce a backtrace. 266 | 267 | [`catch_unwind`]: https://doc.rust-lang.org/std/panic/fn.catch_unwind.html 268 | [abort on panic]: https://doc.rust-lang.org/cargo/reference/profiles.html#panic 269 | 270 | This might reduce binary size and increase runtime speed slightly, and may even 271 | reduce compile times slightly. Add these lines to the `Cargo.toml` file: 272 | ```toml 273 | [profile.release] 274 | panic = "abort" 275 | ``` 276 | 277 | 278 | ### Strip Debug Info and Symbols 279 | 280 | You can tell the compiler to [strip] debug info and symbols from the compiled 281 | binary. Add these lines to `Cargo.toml` to strip just debug info: 282 | ```toml 283 | [profile.release] 284 | strip = "debuginfo" 285 | ``` 286 | Alternatively, use `strip = "symbols"` to strip both debug info and symbols. 287 | 288 | [strip]: https://doc.rust-lang.org/cargo/reference/profiles.html#strip 289 | 290 | Stripping debug info can greatly reduce binary size. On Linux, the binary size 291 | of a small Rust program might shrink by 4x when debug info is stripped. 292 | Stripping symbols can also reduce binary size, though generally not by as much. 293 | [**Example**](https://github.com/nnethercote/counts/commit/53cab44cd09ff1aa80de70a6dbe1893ff8a41142). 294 | The exact effects are platform-dependent. 295 | 296 | However, stripping makes your compiled program more difficult to debug and 297 | profile. For example, if a stripped program panics, the backtrace produced may 298 | contain less useful information than normal. The exact effects for the two 299 | levels of stripping depend on the platform. 
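
One way to see the effect on backtraces is to capture one explicitly and compare the output across builds. This is only a sketch: the function names `render_backtrace` and `backtrace_has_symbol_names` are hypothetical, and what a stripped build prints is platform-dependent, so no particular result is promised.

```rust
use std::backtrace::Backtrace;

/// Captures a backtrace at this point and renders it as text.
/// `force_capture` ignores the `RUST_BACKTRACE` environment variable.
#[inline(never)]
pub fn render_backtrace() -> String {
    Backtrace::force_capture().to_string()
}

/// Reports whether the rendered backtrace mentions `render_backtrace`
/// by name. With symbols stripped this may be `false`, because frames
/// can no longer be resolved to function names.
pub fn backtrace_has_symbol_names() -> bool {
    render_backtrace().contains("render_backtrace")
}
```

Printing `render_backtrace()` from an unstripped build and from builds using `strip = "debuginfo"` and `strip = "symbols"` shows how much detail each level keeps on your platform.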
300 | 301 | ### Other Ideas 302 | 303 | For more advanced binary size minimization techniques, consult the 304 | comprehensive documentation in the excellent [`min-sized-rust`] repository. 305 | 306 | [`min-sized-rust`]: https://github.com/johnthagen/min-sized-rust 307 | 308 | ## Minimizing Compile Times 309 | 310 | The following build configuration options are designed primarily to minimize 311 | compile times. 312 | 313 | ### Linking 314 | 315 | A big part of compile time is actually linking time, particularly when 316 | rebuilding a program after a small change. It is possible to select a faster 317 | linker than the default one. 318 | 319 | One option is [lld], which is available on Linux and Windows. To specify lld 320 | from the command line, use the `-C link-arg=-fuse-ld=lld` flag. For example: 321 | ```bash 322 | RUSTFLAGS="-C link-arg=-fuse-ld=lld" cargo build --release 323 | ``` 324 | 325 | [lld]: https://lld.llvm.org/ 326 | 327 | Alternatively, to specify lld from a [`config.toml`] file (for one or more 328 | projects), add these lines: 329 | ```toml 330 | [build] 331 | rustflags = ["-C", "link-arg=-fuse-ld=lld"] 332 | ``` 333 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html 334 | 335 | lld is not fully supported for use with Rust, but it should work for most use 336 | cases on Linux and Windows. There is a [GitHub Issue] tracking full support for 337 | lld. 338 | 339 | Another option is [mold], which is currently available on Linux and macOS. 340 | Simply substitute `mold` for `lld` in the instructions above. mold is often 341 | faster than lld. It is also much newer and may not work in all cases. 342 | 343 | [mold]: https://github.com/rui314/mold 344 | 345 | Unlike the other options in this chapter, there are no trade-offs here! 346 | Alternative linkers can be dramatically faster, without any downsides. 
347 | 348 | [GitHub Issue]: https://github.com/rust-lang/rust/issues/39915#issuecomment-618726211 349 | 350 | ### Experimental Parallel Front-end 351 | 352 | If you use nightly Rust, you can enable the experimental [parallel front-end]. 353 | It may reduce compile times at the cost of higher compile-time memory usage. It 354 | won't affect the quality of the generated code. 355 | 356 | [parallel front-end]: https://blog.rust-lang.org/2023/11/09/parallel-rustc.html 357 | 358 | You can do that by adding `-Zthreads=N` to RUSTFLAGS, for example: 359 | ```bash 360 | RUSTFLAGS="-Zthreads=8" cargo build --release 361 | ``` 362 | 363 | Alternatively, to enable the parallel front-end from a [`config.toml`] file (for 364 | one or more projects), add these lines: 365 | ```toml 366 | [build] 367 | rustflags = ["-Z", "threads=8"] 368 | ``` 369 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html 370 | 371 | Values other than `8` are possible, but that is the number that tends to give 372 | the best results. 373 | 374 | In the best cases, the experimental parallel front-end reduces compile times by 375 | up to 50%. But the effects vary widely and depend on the characteristics of the 376 | code and its build configuration, and for some programs there is no compile 377 | time improvement. 378 | 379 | ### Cranelift Codegen Back-end 380 | 381 | If you use nightly Rust on x86-64/Linux or ARM/Linux, you can enable the 382 | Cranelift codegen back-end. It may reduce compile times at the cost of lower 383 | quality generated code, and therefore is recommended for dev builds rather than 384 | release builds. 385 | 386 | First, install the back-end with this `rustup` command: 387 | ```bash 388 | rustup component add rustc-codegen-cranelift-preview --toolchain nightly 389 | ``` 390 | 391 | To select Cranelift from the command line, use the 392 | `-Zcodegen-backend=cranelift` flag. 
For example: 393 | ```bash 394 | RUSTFLAGS="-Zcodegen-backend=cranelift" cargo +nightly build 395 | ``` 396 | 397 | Alternatively, to specify Cranelift from a [`config.toml`] file (for one or 398 | more projects), add these lines: 399 | ```toml 400 | [unstable] 401 | codegen-backend = true 402 | 403 | [profile.dev] 404 | codegen-backend = "cranelift" 405 | ``` 406 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html 407 | 408 | For more information, see the [Cranelift documentation]. 409 | 410 | [Cranelift documentation]: https://github.com/rust-lang/rustc_codegen_cranelift 411 | 412 | ## Custom profiles 413 | 414 | In addition to the `dev` and `release` profiles, Cargo supports [custom 415 | profiles]. It might be useful, for example, to create a custom profile halfway 416 | between `dev` and `release` if you find the runtime speed of dev builds 417 | insufficient and the compile times of release builds too slow for everyday 418 | development. 419 | 420 | [custom profiles]: https://doc.rust-lang.org/cargo/reference/profiles.html#custom-profiles 421 | 422 | ## Summary 423 | 424 | There are many choices to be made when it comes to build configurations. The 425 | following points summarize the above information into some recommendations. 426 | 427 | - If you want to maximize runtime speed, consider all of the following: 428 | `codegen-units = 1`, `lto = "fat"`, an alternative allocator, and `panic = 429 | "abort"`. 430 | - If you want to minimize binary size, consider `opt-level = "z"`, 431 | `codegen-units = 1`, `lto = "fat"`, `panic = "abort"`, and `strip = 432 | "symbols"`. 433 | - In either case, consider `-C target-cpu=native` if broad architecture support 434 | is not needed, and `cargo-pgo` if it works with your distribution mechanism. 435 | - Always use a faster linker if you are on a platform that supports it, because 436 | there are no downsides to doing so. 
437 | - Benchmark all changes, one at a time, to ensure they have the expected 438 | effects. 439 | 440 | Finally, [this issue] tracks the evolution of the Rust compiler's own build 441 | configuration. The Rust compiler's build system is stranger and more complex 442 | than that of most Rust programs. Nonetheless, this issue may be instructive in 443 | showing how build configuration choices can be applied to a large program. 444 | 445 | [this issue]: https://github.com/rust-lang/rust/issues/103595 446 | -------------------------------------------------------------------------------- /src/build-configuration_zh.md: -------------------------------------------------------------------------------- 1 | # 构建配置 2 | 3 | 您可以通过更改构建配置而不更改代码,从而显著改变 Rust 程序的性能。对于每个 Rust 程序,都有许多可能的构建配置。所选择的配置将影响编译代码的几个特征,如编译时间、运行时速度、内存使用、二进制大小、调试性、性能分析性以及编译程序将在哪些架构上运行。 4 | 5 | 大多数配置选择会改善一个或多个特征,同时恶化一个或多个其他特征。例如,一个常见的权衡是为了获得更高的运行时速度而接受更差的编译时间。对于您的程序来说,正确的选择取决于您的需求和程序的具体情况,与性能相关的选择(其中大部分都是)应该通过基准测试来验证。 6 | 7 | 请注意,Cargo 只查看工作区根目录下 Cargo.toml 文件中的配置设置。在依赖项中定义的配置设置将被忽略。因此,这些选项主要与二进制 crate 相关,而不是库 crate。 8 | 9 | ## 发布构建 10 | 11 | 最重要的一个Rust性能提示很简单,但[很容易被忽视]:当你想要高性能时,确保你使用的是release构建而不是debug构建。这通常是通过在Cargo中指定`--release`标志来实现的。 12 | 13 | [很容易被忽视]: https://users.rust-lang.org/t/why-my-rust-program-is-so-slow/47764/5 14 | 15 | 开发构建是默认设置。它们适用于调试,但没有经过优化。如果运行 cargo build 或 cargo run,则会生成这些构建。(另外,运行 `rustc` 而不添加额外选项也会生成未经优化的构建。) 16 | 17 | 考虑以下来自 cargo build 运行的输出的最后一行。 18 | ```text 19 | Finished dev [unoptimized + debuginfo] target(s) in 29.80s 20 | ``` 21 | 这个输出表明已生成了一个开发构建。编译后的代码将放在 `target/debug/` 目录中。`cargo run` 将运行开发构建。 22 | 23 | 相比之下,发布构建经过了更多优化,省略了调试断言和整数溢出检查,也省略了调试信息。相对于开发构建,通常可以实现 10-100 倍的速度提升!如果运行 `cargo build --release` 或 `cargo run --release`,则会生成这些构建。(另外,`rustc` 有多个选项用于优化构建,如 `-O` 和 `-C opt-level`。)由于额外的优化,这通常会比开发构建花费更长的时间。 24 | 25 | 请看下面的`"cargo build --release"`运行的最后一行输出。 26 | ```text 27 | Finished release [optimized] target(s) in 1m 01s 28 | ``` 29 | 这个输出表明已生成了一个发布构建。编译后的代码将放在 
`target/release/` 目录中。`cargo run --release` 将运行发布构建。 30 | 31 | 查看 [Cargo 配置文件文档] 以获取有关开发构建(使用 `dev` 配置文件)和发布构建(使用 `release` 配置文件)之间差异的更多详细信息。 32 | 33 | [Cargo 配置文件文档]: https://doc.rust-lang.org/cargo/reference/profiles.html 34 | 35 | 发布构建中使用的默认构建配置选择在编译时间、运行时速度和二进制文件大小等方面提供了良好的平衡。但正如下文所述,还有许多可能的调整。 36 | 37 | ## 最大化运行时速度 38 | 39 | 以下构建配置选项主要旨在最大化运行时速度。其中一些选项也可能会减小二进制文件大小。 40 | 41 | ### codegen units 42 | 43 | Rust编译器将crate分割为多个[codegen units]以并行化编译(从而加快速度)。然而,这可能导致它错过一些潜在的优化。您可以通过将单元数设置为1来提高运行时速度并减小二进制文件大小,但这会增加编译时间。请将以下行添加到`Cargo.toml`文件中: 44 | ```toml 45 | [profile.release] 46 | codegen-units = 1 47 | ``` 48 | 50 | [**Example 1**](http://likebike.com/posts/How_To_Write_Fast_Rust_Code.html#emit-asm), 51 | [**Example 2**](https://github.com/rust-lang/rust/pull/115554#issuecomment-1742192440). 52 | 53 | [codegen units]: https://doc.rust-lang.org/cargo/reference/profiles.html#codegen-units 54 | 55 | ### 链接时优化 56 | 57 | [链接时优化](LTO)是一种整体程序优化技术,可以提高运行时速度10-20%或更多,并减小二进制文件大小,但会导致较差的编译时间。它有几种形式。 58 | 59 | [链接时优化]: https://doc.rust-lang.org/cargo/reference/profiles.html#lto 60 | 61 | LTO的第一种形式是thin local LTO,这是一种轻量级的LTO形式。默认情况下,编译器会在涉及非零优化级别的任何构建中使用此形式。这包括发布构建。要显式请求此级别的LTO,请将以下行放入Cargo.toml文件中: 62 | ```toml 63 | [profile.release] 64 | lto = false 65 | ``` 66 | LTO的第二种形式是thin LTO,它稍微更具侵略性,可能会提高运行时速度并减小二进制文件大小,同时也会增加编译时间。在Cargo.toml中使用lto = "thin"来启用它。 67 | 68 | LTO的第三种形式是fat LTO,它更具侵略性,可能会进一步提高性能并减小二进制文件大小,同时再次增加构建时间。在Cargo.toml中使用lto = "fat"来启用它。 69 | 70 | 最后,可以完全禁用LTO,这可能会降低运行时速度并增加二进制文件大小,但会减少编译时间。在Cargo.toml中使用lto = "off"来实现此目的。请注意,这与lto = false选项不同,如上所述,后者会保留thin local LTO。 71 | 72 | ### 替代分配器 73 | 可以使用替代分配器替换Rust程序使用的默认(系统)堆分配器。具体效果取决于个别程序和所选择的替代分配器,但在实践中已经看到了运行时速度大幅提升和内存使用大幅减少。效果还会因平台而异,因为每个平台的系统分配器都有其优势和劣势。使用替代分配器还可能增加二进制文件大小和编译时间。 74 | 75 | #### jemalloc 76 | 77 | 一种流行的适用于Linux和Mac的替代分配器是[jemalloc],可通过[`tikv-jemallocator`] crate使用。要使用它,请在您的Cargo.toml文件中添加一个依赖项: 78 | ```toml 79 | [dependencies] 80 | tikv-jemallocator = "0.5" 81 | ``` 82 | 
然后在您的Rust代码中添加以下内容,例如在`src/main.rs`的顶部: 83 | ```rust,ignore 84 | #[global_allocator] 85 | static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc; 86 | ``` 87 | 88 | 此外,在Linux上,jemalloc可以配置为使用[透明大页][THP]。这可以进一步加快程序的运行速度,可能会以更高的内存使用为代价。 89 | 90 | [THP]: https://www.kernel.org/doc/html/next/admin-guide/mm/transhuge.html 91 | 92 | 在构建程序之前,通过适当设置 `MALLOC_CONF` 环境变量来执行此操作,例如: 93 | ```bash 94 | MALLOC_CONF="thp:always,metadata_thp:always" cargo build --release 95 | ``` 96 | 运行编译程序的系统还必须配置为支持THP。有关更多详细信息,请参阅[此博客]。 97 | 98 | [此博客]: https://kobzol.github.io/rust/rustc/2023/10/21/make-rust-compiler-5percent-faster.html 99 | 100 | #### mimalloc 101 | 102 | 另一个适用于许多平台的替代分配器是[mimalloc],可通过[`mimalloc`] crate使用。要使用它,请在您的`Cargo.toml`文件中添加一个依赖项: 103 | ```toml 104 | [dependencies] 105 | mimalloc = "0.1" 106 | ``` 107 | 然后在您的Rust代码中添加以下内容,例如在`src/main.rs`的顶部: 108 | ```rust,ignore 109 | #[global_allocator] 110 | static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc; 111 | ``` 112 | 113 | [jemalloc]: https://github.com/jemalloc/jemalloc 114 | [`tikv-jemallocator`]: https://crates.io/crates/tikv-jemallocator 115 | [better performance]: https://github.com/rust-lang/rust/pull/83152 116 | [mimalloc]: https://github.com/microsoft/mimalloc 117 | [`mimalloc`]: https://crates.io/crates/mimalloc 118 | 119 | ### 使用CPU专用指令 120 | 121 | 如果您不关心二进制文件在旧版(或其他类型的)处理器上的兼容性,您可以告诉编译器生成针对[特定CPU架构]的最新(可能是最快的)指令,比如针对x86-64 CPU的AVX SIMD指令。 122 | 123 | [特定CPU架构]: https://doc.rust-lang.org/1.41.1/rustc/codegen-options/index.html#target-cpu 124 | 125 | 例如,如果你把`-C target-cpu=native`传给rustc,它将使用当前CPU的最佳指令。 126 | ```bash 127 | RUSTFLAGS="-C target-cpu=native" cargo build --release 128 | ``` 129 | 或者,要从一个[`config.toml`]文件(用于一个或多个项目)中请求这些指令,请添加以下行: 130 | ```toml 131 | [build] 132 | rustflags = ["-C", "target-cpu=native"] 133 | ``` 134 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html 135 | 136 | 这可能会有助于提升运行时的性能,特别是当编译器在你的代码中发现矢量化的机会时。 137 | 138 | 如果您不确定`-C 
target-cpu=native`是否达到了最佳效果,请比较`rustc --print cfg`和`rustc --print cfg -C target-cpu=native`的输出,看看在后一种情况下是否正确检测到了CPU特性。如果没有,您可以使用`-C target-feature`来针对特定特性。 139 | 140 | ### Profile-guided Optimization 141 | 142 | Profile-guided optimization(PGO)是一种编译模式:先编译程序,在样本数据上运行它并同时收集性能分析数据,然后用这些分析数据引导程序的第二次编译。这可以提升10%或更多的运行时性能。 143 | [**Example 1**](https://blog.rust-lang.org/inside-rust/2020/11/11/exploring-pgo-for-the-rust-compiler.html), 144 | [**Example 2**](https://github.com/rust-lang/rust/pull/96978). 145 | 146 | 这是一种高级技术,需要一些设置工作,但在某些情况下是值得的。详细信息请参阅[rustc PGO文档]。此外,[`cargo-pgo`]命令使使用PGO(以及类似的[BOLT])来优化Rust二进制文件变得更加容易。 147 | 148 | 不幸的是,对于托管在crates.io上并通过`cargo install`分发的二进制文件,不支持PGO,这限制了其可用性。 149 | 150 | [rustc PGO文档]: https://doc.rust-lang.org/rustc/profile-guided-optimization.html 151 | [`cargo-pgo`]: https://github.com/Kobzol/cargo-pgo 152 | [BOLT]: https://github.com/llvm/llvm-project/tree/main/bolt 153 | 154 | ## 最小化二进制文件大小 155 | 156 | 以下构建配置选项主要旨在最小化二进制文件大小。它们对运行时速度的影响各不相同。 157 | 158 | ### 优化级别 159 | 160 | 您可以通过向`Cargo.toml`文件添加以下行来请求一个旨在最小化二进制文件大小的[优化级别][optimization level]: 161 | ```toml 162 | [profile.release] 163 | opt-level = "z" 164 | ``` 165 | [optimization level]: https://doc.rust-lang.org/cargo/reference/profiles.html#opt-level 166 | 167 | 这可能会降低运行时速度。 168 | 169 | 另一种选择是`opt-level = "s"`,它针对最小化二进制文件大小的目标略微不那么激进。与`opt-level = "z"`相比,它允许[稍微更多的内联]和循环的矢量化。 170 | 171 | [稍微更多的内联]: https://doc.rust-lang.org/rustc/codegen-options/index.html#inline-threshold 172 | 173 | ### 在`panic!`时中止 174 | 175 | 如果您不需要在发生恐慌时展开,例如因为您的程序不使用[`catch_unwind`],您可以告诉编译器在恐慌时直接[中止][abort on panic]。在发生恐慌时,您的程序仍将生成回溯信息。 176 | 177 | [`catch_unwind`]: https://doc.rust-lang.org/std/panic/fn.catch_unwind.html 178 | [abort on panic]: https://doc.rust-lang.org/cargo/reference/profiles.html#panic 179 | 180 | 这可能会减小二进制文件大小并略微增加运行时速度,甚至可能会略微减少编译时间。将以下内容添加到`Cargo.toml`文件中: 181 | ```toml 182 | [profile.release] 183 | panic = "abort" 184 | ``` 185 | 186 | ### 剥离调试信息和符号 187 | 188 | 
您可以告诉编译器从编译后的二进制文件中[剥离]调试信息和符号。将以下内容添加到`Cargo.toml`中以仅剥离调试信息: 189 | 190 | ```toml 191 | [profile.release] 192 | strip = "debuginfo" 193 | ``` 194 | 195 | 或者,使用`strip = "symbols"`来同时剥离调试信息和符号。 196 | 197 | 剥离调试信息可以极大地减小二进制文件大小。在Linux上,当剥离调试信息时,一个小型Rust程序的二进制文件大小可能会缩小4倍。剥离符号也可以减小二进制文件大小,尽管通常不会减少那么多。[**示例**](https://github.com/nnethercote/counts/commit/53cab44cd09ff1aa80de70a6dbe1893ff8a41142)。具体效果取决于平台。 198 | 199 | 然而,剥离会使您编译的程序更难以调试和分析性能。例如,如果一个被剥离的程序发生恐慌,生成的回溯信息可能会比正常情况下包含的信息更少。两种剥离级别的具体效果取决于平台。 200 | 201 | ### 其他想法 202 | 203 | 要了解更多高级的二进制文件大小最小化技术,请参考优秀的[`min-sized-rust`]存储库中的全面文档。 204 | 205 | [`min-sized-rust`]: https://github.com/johnthagen/min-sized-rust 206 | 207 | ## 最小化编译时间 208 | 209 | 以下构建配置选项主要旨在最小化编译时间。 210 | 211 | ### 链接 212 | 213 | 编译时间的一个重要部分实际上是链接时间,特别是在对程序进行小改动后重新构建时。可以选择比默认链接器更快的链接器。 214 | 215 | 一个选择是[lld],它在Linux和Windows上都可用。要从命令行指定lld,请使用`-C link-arg=-fuse-ld=lld`标志。例如: 216 | ```bash 217 | RUSTFLAGS="-C link-arg=-fuse-ld=lld" cargo build --release 218 | ``` 219 | 220 | [lld]: https://lld.llvm.org/ 221 | 222 | 另一种方法是从[`config.toml`]文件(针对一个或多个项目)中指定lld,添加以下内容: 223 | ```toml 224 | [build] 225 | rustflags = ["-C", "link-arg=-fuse-ld=lld"] 226 | ``` 227 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html 228 | 229 | lld目前并不完全支持与Rust一起使用,但在Linux和Windows上的大多数用例中应该可以工作。有一个[GitHub Issue]跟踪lld的完全支持。 230 | 231 | 另一个选择是[mold],目前在Linux和macOS上可用。只需在上述说明中用`mold`替换`lld`。mold通常比lld更快。它也要新得多,可能不适用于所有情况。 232 | 233 | [mold]: https://github.com/rui314/mold 234 | 235 | 与本章中的其他选项不同,这里没有任何权衡!替代链接器可以显著提高速度,而没有任何不利影响。 236 | 237 | [GitHub Issue]: https://github.com/rust-lang/rust/issues/39915#issuecomment-618726211 238 | 239 | ### 实验性并行前端 240 | 241 | 如果您使用nightly版的Rust,可以启用实验性的[并行前端]。这可能会减少编译时间,但会增加编译时内存的使用。它不会影响生成的代码质量。 242 | 243 | [并行前端]: https://blog.rust-lang.org/2023/11/09/parallel-rustc.html 244 | 245 | 您可以通过将`-Zthreads=N`添加到RUSTFLAGS来实现,例如: 246 | ```bash 247 | RUSTFLAGS="-Zthreads=8" cargo build --release 248 | ``` 249 | 250 | 
或者,要从[`config.toml`]文件(针对一个或多个项目)启用并行前端,添加以下内容: 251 | ```toml 252 | [build] 253 | rustflags = ["-Z", "threads=8"] 254 | ``` 255 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html 256 | 257 | 除了`8`之外,还可以使用其他值,但这个数字通常会产生最佳结果。 258 | 259 | 在最佳情况下,实验性并行前端可以将编译时间缩短高达50%。但效果因代码特性和构建配置的不同而异,对于某些程序,编译时间可能不会有所改善。 260 | 261 | ### Cranelift代码生成后端 262 | 263 | 如果您在x86-64/Linux或ARM/Linux上使用nightly版的Rust,可以启用Cranelift代码生成后端。它可能会减少编译时间,但会以生成的代码质量降低为代价,因此建议用于开发构建而不是发布构建。 264 | 265 | 首先,使用以下`rustup`命令安装后端: 266 | ```bash 267 | rustup component add rustc-codegen-cranelift-preview --toolchain nightly 268 | ``` 269 | 270 | 要从命令行选择Cranelift,请使用`-Zcodegen-backend=cranelift`标志。例如: 271 | ```bash 272 | RUSTFLAGS="-Zcodegen-backend=cranelift" cargo +nightly build 273 | ``` 274 | 275 | 或者,要从[`config.toml`]文件(针对一个或多个项目)指定Cranelift,添加以下内容: 276 | ```toml 277 | [unstable] 278 | codegen-backend = true 279 | 280 | [profile.dev] 281 | codegen-backend = "cranelift" 282 | ``` 283 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html 284 | 285 | 有关更多信息,请参阅[Cranelift文档]。 286 | 287 | [Cranelift文档]: https://github.com/rust-lang/rustc_codegen_cranelift 288 | 289 | ## 自定义配置文件 290 | 291 | 除了`dev`和`release`配置文件外,Cargo还支持[自定义配置文件]。例如,如果您发现开发构建的运行时速度不够,发布构建的编译时间对日常开发来说太慢,那么创建一个介于`dev`和`release`之间的自定义配置文件可能会很有用。 292 | 293 | [自定义配置文件]: https://doc.rust-lang.org/cargo/reference/profiles.html#custom-profiles 294 | 295 | ## 总结 296 | 297 | 在构建配置方面有许多选择需要考虑。以下总结了上述信息并提出了一些建议。 298 | 299 | - 如果您想最大化运行时速度,请考虑以下所有内容:`codegen-units = 1`、`lto = "fat"`、替代分配器和`panic = "abort"`。 300 | - 如果您想最小化二进制文件大小,请考虑`opt-level = "z"`、`codegen-units = 1`、`lto = "fat"`、`panic = "abort"`和`strip = "symbols"`。 301 | - 在任何情况下,如果不需要广泛的架构支持,请考虑使用`-C target-cpu=native`,如果与您的分发机制兼容,请考虑使用`cargo-pgo`。 302 | - 如果您所在的平台支持更快的链接器,请始终使用它,因为这样做没有任何不利之处。 303 | - 逐个对所有更改进行基准测试,以确保它们产生预期效果。 304 | 305 | 最后,[此问题]跟踪了Rust编译器自身构建配置的演变。Rust编译器的构建系统比大多数Rust程序更奇特和复杂。尽管如此,这个问题可能有助于展示如何将构建配置选择应用于大型程序。 306 | 307 | [此问题]: 
https://github.com/rust-lang/rust/issues/103595 -------------------------------------------------------------------------------- /src/compile-times.md: -------------------------------------------------------------------------------- 1 | # Compile Times 2 | 3 | Although this book is primarily about improving the performance of Rust 4 | programs, this section is about reducing the compile times of Rust programs, 5 | because that is a related topic of interest to many people. 6 | 7 | The [Minimizing Compile Times] section discussed ways to reduce compile times 8 | via build configuration choices. The rest of this section discusses ways to 9 | reduce compile times that require modifying your program's code. 10 | 11 | [Minimizing Compile Times]: build-configuration.md#minimizing-compile-times 12 | 13 | ## Visualization 14 | 15 | Cargo has a feature that lets you visualize compilation of your 16 | program. Build with this command: 17 | ```text 18 | cargo build --timings 19 | ``` 20 | On completion it will print the name of an HTML file. Open that file in a web 21 | browser. It contains a [Gantt chart] that shows the dependencies between the 22 | various crates in your program. This shows how much parallelism there is in 23 | your crate graph, which can indicate if any large crates that serialize 24 | compilation should be broken up. See [the documentation][timings] for more 25 | details on how to read the graphs. 26 | 27 | [Gantt chart]: https://en.wikipedia.org/wiki/Gantt_chart 28 | [timings]: https://doc.rust-lang.org/nightly/cargo/reference/timings.html 29 | 30 | ## LLVM IR 31 | 32 | The Rust compiler uses [LLVM] for its back-end. LLVM's execution can be a large 33 | part of compile times, especially when the Rust compiler's front end generates 34 | a lot of [IR] which takes LLVM a long time to optimize. 
35 | 36 | [LLVM]: https://llvm.org/ 37 | [IR]: https://en.wikipedia.org/wiki/Intermediate_representation 38 | 39 | These problems can be diagnosed with [`cargo llvm-lines`], which shows which 40 | Rust functions cause the most LLVM IR to be generated. Generic functions are 41 | often the most important ones, because they can be instantiated dozens or even 42 | hundreds of times in large programs. 43 | 44 | [`cargo llvm-lines`]: https://github.com/dtolnay/cargo-llvm-lines/ 45 | 46 | If a generic function causes IR bloat, there are several ways to fix it. The 47 | simplest is to just make the function smaller. 48 | [**Example 1**](https://github.com/rust-lang/rust/pull/72166/commits/5a0ac0552e05c079f252482cfcdaab3c4b39d614), 49 | [**Example 2**](https://github.com/rust-lang/rust/pull/91246/commits/f3bda74d363a060ade5e5caeb654ba59bfed51a4). 50 | 51 | Another way is to move the non-generic parts of the function into a separate, 52 | non-generic function, which will only be instantiated once. Whether this is 53 | possible will depend on the details of the generic function. When it is 54 | possible, the non-generic function can often be written neatly as an inner 55 | function within the generic function, as shown by the code for 56 | [`std::fs::read`]: 57 | ```rust,ignore 58 | pub fn read<P: AsRef<Path>>(path: P) -> io::Result<Vec<u8>> { 59 | fn inner(path: &Path) -> io::Result<Vec<u8>> { 60 | let mut file = File::open(path)?; 61 | let size = file.metadata().map(|m| m.len()).unwrap_or(0); 62 | let mut bytes = Vec::with_capacity(size as usize); 63 | io::default_read_to_end(&mut file, &mut bytes)?; 64 | Ok(bytes) 65 | } 66 | inner(path.as_ref()) 67 | } 68 | ``` 69 | [`std::fs::read`]: https://doc.rust-lang.org/std/fs/fn.read.html 70 | 71 | [**Example**](https://github.com/rust-lang/rust/pull/72013/commits/68b75033ad78d88872450a81745cacfc11e58178). 72 | 73 | Sometimes common utility functions like [`Option::map`] and [`Result::map_err`] 74 | are instantiated many times. 
Replacing them with equivalent `match` expressions 75 | can help compile times. 76 | 77 | [`Option::map`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.map 78 | [`Result::map_err`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.map_err 79 | 80 | The effects of these sorts of changes on compile times will usually be small, 81 | though occasionally they can be large. 82 | [**Example**](https://github.com/servo/servo/issues/26585). 83 | 84 | Such changes can also reduce binary size. 85 | -------------------------------------------------------------------------------- /src/compile-times_zh.md: -------------------------------------------------------------------------------- 1 | # 编译时间 2 | 3 | 虽然本书的主要内容是提高Rust程序的性能,但本节的内容是关于减少Rust程序的编译时间,因为这是很多人感兴趣的相关话题。 4 | 5 | [减少编译时间]部分讨论了通过构建配置选择来减少编译时间的方法。本节的其余部分将讨论需要修改程序代码来减少编译时间的方法。 6 | 7 | [减少编译时间]: build-configuration.md#minimizing-compile-times 8 | 9 | ## 可视化 10 | 11 | Rust编译器有一个功能,可以让你可视化编译你的程序。用这个命令进行编译。 12 | ```text 13 | cargo build --timings 14 | ``` 15 | 完成后,它将打印一个HTML文件的名称。在Web浏览器中打开该文件。其中包含一个[Gantt chart],显示程序中各个crate之间的依赖关系。这显示了您的crate图中有多少并行性,这可以表明是否应该拆分任何序列化编译的大型crate。有关如何阅读这些图表的更多详细信息,请参阅[timings]。 16 | 17 | [Gantt chart]: https://en.wikipedia.org/wiki/Gantt_chart 18 | [timings]: https://doc.rust-lang.org/nightly/cargo/reference/timings.html 19 | 20 | ## LLVM IR 21 | 22 | Rust编译器的后端使用[LLVM]。LLVM的执行会占到编译时间的很大一部分,尤其是当Rust编译器的前端会产生大量的[IR],这需要LLVM花很长的时间去优化。 23 | 24 | [LLVM]: https://llvm.org/ 25 | [IR]: https://en.wikipedia.org/wiki/Intermediate_representation 26 | 27 | 这些问题可以用[`cargo llvm-lines`]来诊断,它显示了哪些Rust函数导致了最多的LLVM IR生成。通用函数通常是最重要的函数,因为它们在大型程序中可以被实例化几十次甚至几百次。 28 | 29 | [`cargo llvm-lines`]: https://github.com/dtolnay/cargo-llvm-lines/ 30 | 31 | 如果一个通用函数导致IR膨胀,有几种方法可以解决。最简单的方法就是把函数变小。 32 | [**Example 1**](https://github.com/rust-lang/rust/pull/72166/commits/5a0ac0552e05c079f252482cfcdaab3c4b39d614), 33 | [**Example 
2**](https://github.com/rust-lang/rust/pull/91246/commits/f3bda74d363a060ade5e5caeb654ba59bfed51a4). 34 | 35 | 另一种方法是将函数的非泛型部分移动到一个单独的非泛型函数中,该函数只会被实例化一次。是否可能取决于泛型函数的细节。当可能时,非泛型函数通常可以被整洁地编写为泛型函数内部的内部函数,就像[`std::fs::read`]的代码所示: 36 | ```rust,ignore 37 | pub fn read<P: AsRef<Path>>(path: P) -> io::Result<Vec<u8>> { 38 | fn inner(path: &Path) -> io::Result<Vec<u8>> { 39 | let mut file = File::open(path)?; 40 | let size = file.metadata().map(|m| m.len()).unwrap_or(0); 41 | let mut bytes = Vec::with_capacity(size as usize); 42 | io::default_read_to_end(&mut file, &mut bytes)?; 43 | Ok(bytes) 44 | } 45 | inner(path.as_ref()) 46 | } 47 | ``` 48 | [`std::fs::read`]: https://doc.rust-lang.org/std/fs/fn.read.html 49 | 50 | [**Example**](https://github.com/rust-lang/rust/pull/72013/commits/68b75033ad78d88872450a81745cacfc11e58178). 51 | 52 | 有时,像[`Option::map`]和[`Result::map_err`]这样的常用实用函数会被实例化多次。 用等价的`match`表达式替换它们可以帮助编译时间。 53 | 54 | [`Option::map`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.map 55 | [`Result::map_err`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.map_err 56 | 57 | 这些变化对编译时间的影响通常是很小的,但偶尔也会很大。 58 | [**Example**](https://github.com/servo/servo/issues/26585). 59 | 60 | 这些更改还可以减少二进制文件的大小。 -------------------------------------------------------------------------------- /src/general-tips.md: -------------------------------------------------------------------------------- 1 | # General Tips 2 | 3 | The previous sections of this book have discussed Rust-specific techniques. 4 | This section gives a brief overview of some general performance principles. 5 | 6 | As long as the obvious pitfalls are avoided (e.g. [using non-release builds]), 7 | Rust code generally is fast and uses little memory. This is especially true if you are used 8 | to dynamically-typed languages such as Python and Ruby, or statically-typed 9 | languages with a garbage collector such as Java and C#. 
10 | 11 | [using non-release builds]: build-configuration.md 12 | 13 | Optimized code is often more complex and takes more effort to write than 14 | unoptimized code. For this reason, it is only worth optimizing hot code. 15 | 16 | The biggest performance improvements often come from changes to algorithms or 17 | data structures, rather than low-level optimizations. 18 | [**Example 1**](https://github.com/rust-lang/rust/pull/53383/commits/5745597e6195fe0591737f242d02350001b6c590), 19 | [**Example 2**](https://github.com/rust-lang/rust/pull/54318/commits/154be2c98cf348de080ce951df3f73649e8bb1a6). 20 | 21 | Writing code that works well with modern hardware is not always easy, but worth 22 | striving for. For example, try to minimize cache misses and branch 23 | mispredictions, where possible. 24 | 25 | Most optimizations result in small speedups. Although no single small speedup 26 | is noticeable, they really add up if you can do enough of them. 27 | 28 | Different profilers have different strengths. It is good to use more than one. 29 | 30 | When profiling indicates that a function is hot, there are two common ways to 31 | speed things up: (a) make the function faster, and/or (b) avoid calling it as 32 | much. 33 | 34 | It is often easier to eliminate silly slowdowns than it is to introduce clever 35 | speedups. 36 | 37 | Avoid computing things unless necessary. Lazy/on-demand computations are 38 | often a win. 39 | [**Example 1**](https://github.com/rust-lang/rust/pull/36592/commits/80a44779f7a211e075da9ed0ff2763afa00f43dc), 40 | [**Example 2**](https://github.com/rust-lang/rust/pull/50339/commits/989815d5670826078d9984a3515eeb68235a4687). 41 | 42 | Complex general cases can often be avoided by optimistically checking for 43 | common special cases that are simpler. 
44 | [**Example 1**](https://github.com/rust-lang/rust/pull/68790/commits/d62b6f204733d255a3e943388ba99f14b053bf4a), 45 | [**Example 2**](https://github.com/rust-lang/rust/pull/53733/commits/130e55665f8c9f078dec67a3e92467853f400250), 46 | [**Example 3**](https://github.com/rust-lang/rust/pull/65260/commits/59e41edcc15ed07de604c61876ea091900f73649). 47 | In particular, specially handling collections with 0, 1, or 2 elements is often 48 | a win when small sizes dominate. 49 | [**Example 1**](https://github.com/rust-lang/rust/pull/50932/commits/2ff632484cd8c2e3b123fbf52d9dd39b54a94505), 50 | [**Example 2**](https://github.com/rust-lang/rust/pull/64627/commits/acf7d4dcdba4046917c61aab141c1dec25669ce9), 51 | [**Example 3**](https://github.com/rust-lang/rust/pull/64949/commits/14192607d38f5501c75abea7a4a0e46349df5b5f), 52 | [**Example 4**](https://github.com/rust-lang/rust/pull/64949/commits/d1a7bb36ad0a5932384eac03d3fb834efc0317e5). 53 | 54 | Similarly, when dealing with repetitive data, it is often possible to use a 55 | simple form of data compression, by using a compact representation for common 56 | values and then having a fallback to a secondary table for unusual values. 57 | [**Example 1**](https://github.com/rust-lang/rust/pull/54420/commits/b2f25e3c38ff29eebe6c8ce69b8c69243faa440d), 58 | [**Example 2**](https://github.com/rust-lang/rust/pull/59693/commits/fd7f605365b27bfdd3cd6763124e81bddd61dd28), 59 | [**Example 3**](https://github.com/rust-lang/rust/pull/65750/commits/eea6f23a0ed67fd8c6b8e1b02cda3628fee56b2f). 60 | 61 | When code deals with multiple cases, measure case frequencies and handle the 62 | most common ones first. 63 | 64 | When dealing with lookups that involve high locality, it can be a win to put a 65 | small cache in front of a data structure. 66 | 67 | Optimized code often has a non-obvious structure, which means that explanatory 68 | comments are valuable, particularly those that reference profiling 69 | measurements. 
A comment like "99% of the time this vector has 0 or 1 elements, 70 | so handle those cases first" can be illuminating. 71 | -------------------------------------------------------------------------------- /src/general-tips_zh.md: -------------------------------------------------------------------------------- 1 | # 一般建议 2 | 3 | 本书前几节讨论了 Rust 特定的技术。本节简要概述了一些一般性能原则。 4 | 5 | 只要避免明显的陷阱(例如[使用非发布版本构建]),Rust 代码通常运行速度快且占用内存少。特别是如果你习惯于动态类型语言如 Python 和 Ruby,或者带有垃圾回收器的静态类型语言如 Java 和 C#。 6 | 7 | [使用非发布版本构建]: build-configuration.md 8 | 9 | 优化的代码通常比未优化的代码更复杂,编写起来需要更多的工作。因此,只有值得优化热点代码时才值得进行优化。 10 | 11 | 最大的性能改进通常来自于算法或数据结构的更改,而不是低级优化。 12 | [**Example 1**](https://github.com/rust-lang/rust/pull/53383/commits/5745597e6195fe0591737f242d02350001b6c590), 13 | [**Example 2**](https://github.com/rust-lang/rust/pull/54318/commits/154be2c98cf348de080ce951df3f73649e8bb1a6). 14 | 15 | 编写能够与现代硬件良好配合的代码并不总是容易的,但值得努力。例如,尽量减少缓存未命中和分支预测错误。 16 | 17 | 大多数优化只会带来轻微的加速。虽然单个小的加速可能不明显,但如果你能做足够多的优化,它们的效果会累积起来。 18 | 19 | 不同的性能分析工具各有优势。最好使用多个工具。 20 | 21 | 当性能分析表明某个函数运行热点时,有两种常见的加速方法:(a)加快函数运行速度,和/或者(b)尽量减少调用次数。 22 | 23 | 消除愚蠢的减速往往比引入巧妙的加速更容易。 24 | 25 | 除非必要,避免计算。延迟/按需计算通常是明智的选择。 26 | [**Example 1**](https://github.com/rust-lang/rust/pull/36592/commits/80a44779f7a211e075da9ed0ff2763afa00f43dc), 27 | [**Example 2**](https://github.com/rust-lang/rust/pull/50339/commits/989815d5670826078d9984a3515eeb68235a4687). 28 | 29 | 一般复杂的情况往往可以通过乐观地检查比较简单的常见特殊情况来避免。 30 | [**Example 1**](https://github.com/rust-lang/rust/pull/68790/commits/d62b6f204733d255a3e943388ba99f14b053bf4a), 31 | [**Example 2**](https://github.com/rust-lang/rust/pull/53733/commits/130e55665f8c9f078dec67a3e92467853f400250), 32 | [**Example 3**](https://github.com/rust-lang/rust/pull/65260/commits/59e41edcc15ed07de604c61876ea091900f73649). 
33 | 尤其是在小尺寸占主导地位的情况下,特别处理0、1或2个元素的集合往往是一种好办法。 34 | [**Example 1**](https://github.com/rust-lang/rust/pull/50932/commits/2ff632484cd8c2e3b123fbf52d9dd39b54a94505), 35 | [**Example 2**](https://github.com/rust-lang/rust/pull/64627/commits/acf7d4dcdba4046917c61aab141c1dec25669ce9), 36 | [**Example 3**](https://github.com/rust-lang/rust/pull/64949/commits/14192607d38f5501c75abea7a4a0e46349df5b5f), 37 | [**Example 4**](https://github.com/rust-lang/rust/pull/64949/commits/d1a7bb36ad0a5932384eac03d3fb834efc0317e5). 38 | 39 | 同样,在处理重复性数据时,通常可以使用一种简单的数据压缩形式,对常见的值使用紧凑的表示方式,然后对不常见的值进行回退到二级表。 40 | [**Example 1**](https://github.com/rust-lang/rust/pull/54420/commits/b2f25e3c38ff29eebe6c8ce69b8c69243faa440d), 41 | [**Example 2**](https://github.com/rust-lang/rust/pull/59693/commits/fd7f605365b27bfdd3cd6763124e81bddd61dd28), 42 | [**Example 3**](https://github.com/rust-lang/rust/pull/65750/commits/eea6f23a0ed67fd8c6b8e1b02cda3628fee56b2f). 43 | 44 | 当代码涉及多种情况时,测量各种情况的频率,并首先处理最常见的情况。 45 | 46 | 在涉及高局部性的查找时,将一个小缓存放在数据结构前面可能会带来好处。 47 | 48 | 优化的代码通常具有非显而易见的结构,这意味着解释性注释非常有价值,特别是那些参考了性能分析数据的注释。例如,“99% 的情况下,这个向量有 0 或 1 个元素,因此首先处理这些情况”这样的注释可以很有启发性。 49 | -------------------------------------------------------------------------------- /src/hashing.md: -------------------------------------------------------------------------------- 1 | # Hashing 2 | 3 | `HashSet` and `HashMap` are two widely-used types. The default hashing 4 | algorithm is not specified, but at the time of writing the default is an 5 | algorithm called [SipHash 1-3]. This algorithm is high quality—it provides high 6 | protection against collisions—but is relatively slow, particularly for short keys 7 | such as integers. 8 | 9 | [SipHash 1-3]: https://en.wikipedia.org/wiki/SipHash 10 | 11 | If profiling shows that hashing is hot, and [HashDoS attacks] are not a concern 12 | for your application, the use of hash tables with faster hash algorithms can 13 | provide large speed wins. 
14 | - [`rustc-hash`] provides `FxHashSet` and `FxHashMap` types that are drop-in 15 | replacements for `HashSet` and `HashMap`. Its hashing algorithm is 16 | low-quality but very fast, especially for integer keys, and has been found to 17 | out-perform all other hash algorithms within rustc. ([`fxhash`] is an older, 18 | less well maintained implementation of the same algorithm and types.) 19 | - [`fnv`] provides `FnvHashSet` and `FnvHashMap` types. Its hashing algorithm 20 | is higher quality than `rustc-hash`'s but a little slower. 21 | - [`ahash`] provides `AHashSet` and `AHashMap`. Its hashing algorithm can take 22 | advantage of AES instruction support that is available on some processors. 23 | 24 | [HashDoS attacks]: https://en.wikipedia.org/wiki/Collision_attack 25 | [`rustc-hash`]: https://crates.io/crates/rustc-hash 26 | [`fxhash`]: https://crates.io/crates/fxhash 27 | [`fnv`]: https://crates.io/crates/fnv 28 | [`ahash`]: https://crates.io/crates/ahash 29 | 30 | If hashing performance is important in your program, it is worth trying more 31 | than one of these alternatives. For example, the following results were seen in 32 | rustc. 33 | - The switch from `fnv` to `fxhash` gave [speedups of up to 6%][fnv2fx]. 34 | - An attempt to switch from `fxhash` to `ahash` resulted in [slowdowns of 35 | 1-4%][fx2a]. 36 | - An attempt to switch from `fxhash` back to the default hasher resulted in 37 | [slowdowns ranging from 4-84%][fx2default]! 38 | 39 | [fnv2fx]: https://github.com/rust-lang/rust/pull/37229/commits/00e48affde2d349e3b3bfbd3d0f6afb5d76282a7 40 | [fx2a]: https://github.com/rust-lang/rust/issues/69153#issuecomment-589504301 41 | [fx2default]: https://github.com/rust-lang/rust/issues/69153#issuecomment-589338446 42 | 43 | If you decide to universally use one of the alternatives, such as 44 | `FxHashSet`/`FxHashMap`, it is easy to accidentally use `HashSet`/`HashMap` in 45 | some places. You can [use Clippy] to avoid this problem. 
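
The crates listed above all plug into the standard collections in the same way: `HashMap` and `HashSet` take a third type parameter that supplies the hasher. The following is a minimal sketch of that mechanism using only the standard library. The `Fnv1a` hasher, the `FastMap` alias, and `count_words` are hypothetical names written for this illustration (a hand-rolled FNV-1a, not the `fnv` crate's code); in real code you would normally use one of the crates instead.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hand-rolled 64-bit FNV-1a hasher (illustrative only, not HashDoS-resistant).
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        // FNV-1a 64-bit offset basis.
        Fnv1a(0xcbf29ce484222325)
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            // FNV 64-bit prime.
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Defining the alias in one place makes it easy to swap hashers later.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

/// Counts whitespace-separated words using the custom-hasher map.
fn count_words(text: &str) -> FastMap<&str, u32> {
    // `default()` is used because `HashMap::new` only exists for the
    // default hasher.
    let mut counts = FastMap::default();
    for word in text.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = count_words("a b a c a b");
    assert_eq!(counts["a"], 3);
    assert_eq!(counts["b"], 2);
    assert_eq!(counts.len(), 3);
}
```

Because the hasher is part of the map's type, a single alias like this gives you one place to try different hashers when benchmarking.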
46 | 47 | [use Clippy]: linting.md#disallowing-types 48 | 49 | Some types don't need hashing. For example, you might have a newtype that wraps 50 | an integer and the integer values are random, or close to random. For such a 51 | type, the distribution of the hashed values won't be that different to the 52 | distribution of the values themselves. In this case the [`nohash_hasher`] crate 53 | can be useful. 54 | 55 | [`nohash_hasher`]: https://crates.io/crates/nohash-hasher 56 | 57 | Hash function design is a complex topic and is beyond the scope of this book. 58 | The [`ahash` documentation] has a good discussion. 59 | 60 | [`ahash` documentation]: https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md 61 | -------------------------------------------------------------------------------- /src/hashing_zh.md: -------------------------------------------------------------------------------- 1 | # 哈希 2 | 3 | `HashSet` 和 `HashMap` 是两种广泛使用的类型。默认的哈希算法没有指定,但在撰写本文时,默认算法是一种称为 [SipHash 1-3] 的算法。这个算法质量很高——它提供了很高的碰撞保护,但对于短键(如整数)来说相对较慢。 4 | 5 | [SipHash 1-3]: https://en.wikipedia.org/wiki/SipHash 6 | 7 | 如果测试显示hash是关键部分,而[HashDoS attacks]并不是你的应用所关心的问题,那么使用具有更快的散列算法的散列表可以提供很大的速度优势。 8 | - [`rustc-hash`]提供了 "FxHashSet "和 "FxHashMap "类型,它们是 "HashSet "和 "HashMap "的替代物。它的散列算法质量不高,但速度非常快,特别是对整数键而言,并且发现它的性能优于rustc内的所有其他散列算法。 9 | - [`fnv`]提供了`FnvHashSet`和`FnvHashMap`类型。其散列算法比`fxhash`的质量高,但速度稍慢。 10 | - [`ahash`]提供`AHashSet`和`AHashMap`。它的哈希算法可以采取一些处理器上的AES指令支持的优势。 11 | 12 | [HashDoS attacks]: https://en.wikipedia.org/wiki/Collision_attack 13 | [`rustc-hash`]: https://crates.io/crates/rustc-hash 14 | [`fnv`]: https://crates.io/crates/fnv 15 | [`ahash`]: https://crates.io/crates/ahash 16 | 17 | 如果散列性能在你的程序中很重要,那么值得尝试以上几种选择。例如,在rustc中看到以下结果。 18 | - 从 `fnv`切换到 `rustc-hash` 的结果是[速度提高了6%][fnv2fx]。 19 | - 试图从`rustc-hash`切换到`ahash`的结果是[减速1-4%][fx2a]。 20 | - 试图从`rustc-hash`切换回默认的哈希,结果是[速度减慢了4-84%][fx2default]! 
21 | 22 | [fnv2fx]: https://github.com/rust-lang/rust/pull/37229/commits/00e48affde2d349e3b3bfbd3d0f6afb5d76282a7 23 | [fx2a]: https://github.com/rust-lang/rust/issues/69153#issuecomment-589504301 24 | [fx2default]: https://github.com/rust-lang/rust/issues/69153#issuecomment-589338446 25 | 26 | 如果您决定普遍使用替代方案之一,比如 `FxHashSet`/`FxHashMap`,很容易在某些地方意外地使用 `HashSet`/`HashMap`。您可以使用 [Clippy] 来避免这个问题。 27 | 28 | 有些类型不需要哈希。例如,您可能有一个包装整数的新类型,而整数值是随机的,或者接近随机的。对于这种类型,哈希值的分布与值本身的分布并没有太大不同。在这种情况下,[`nohash_hasher`] crate 可能会有用。 29 | 30 | 哈希函数设计是一个复杂的主题,超出了本书的范围。[`ahash` 文档] 中有很好的讨论。 31 | 32 | [Clippy]: linting.md#disallowing-types 33 | [`nohash_hasher`]: https://crates.io/crates/nohash-hasher 34 | [`ahash` 文档]: https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md 35 | -------------------------------------------------------------------------------- /src/heap-allocations.md: -------------------------------------------------------------------------------- 1 | # Heap Allocations 2 | 3 | Heap allocations are moderately expensive. The exact details depend on which 4 | allocator is in use, but each allocation (and deallocation) typically involves 5 | acquiring a global lock, doing some non-trivial data structure manipulation, 6 | and possibly executing a system call. Small allocations are not necessarily 7 | cheaper than large allocations. It is worth understanding which Rust data 8 | structures and operations cause allocations, because avoiding them can greatly 9 | improve performance. 10 | 11 | The [Rust Container Cheat Sheet] has visualizations of common Rust types, and 12 | is an excellent companion to the following sections. 13 | 14 | [Rust Container Cheat Sheet]: https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/ 15 | 16 | ## Profiling 17 | 18 | If a general-purpose profiler shows `malloc`, `free`, and related functions as 19 | hot, then it is likely worth trying to reduce the allocation rate and/or using 20 | an alternative allocator. 
21 | 22 | [DHAT] is an excellent profiler to use when reducing allocation rates. It works 23 | on Linux and some other Unixes. It precisely identifies hot allocation 24 | sites and their allocation rates. Exact results will vary, but experience with 25 | rustc has shown that reducing allocation rates by 10 allocations per million 26 | instructions executed can have measurable performance improvements (e.g. ~1%). 27 | 28 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html 29 | 30 | Here is some example output from DHAT. 31 | ```text 32 | AP 1.1/25 (2 children) { 33 | Total: 54,533,440 bytes (4.02%, 2,714.28/Minstr) in 458,839 blocks (7.72%, 22.84/Minstr), avg size 118.85 bytes, avg lifetime 1,127,259,403.64 instrs (5.61% of program duration) 34 | At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes 35 | At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes 36 | Reads: 15,993,012 bytes (0.29%, 796.02/Minstr), 0.29/byte 37 | Writes: 20,974,752 bytes (1.03%, 1,043.97/Minstr), 0.38/byte 38 | Allocated at { 39 | #1: 0x95CACC9: alloc (alloc.rs:72) 40 | #2: 0x95CACC9: alloc (alloc.rs:148) 41 | #3: 0x95CACC9: reserve_internal (raw_vec.rs:669) 42 | #4: 0x95CACC9: reserve (raw_vec.rs:492) 43 | #5: 0x95CACC9: reserve (vec.rs:460) 44 | #6: 0x95CACC9: push (vec.rs:989) 45 | #7: 0x95CACC9: parse_token_trees_until_close_delim (tokentrees.rs:27) 46 | #8: 0x95CACC9: syntax::parse::lexer::tokentrees::>::parse_token_tree (tokentrees.rs:81) 47 | } 48 | } 49 | ``` 50 | It is beyond the scope of this book to describe everything in this example, but 51 | it should be clear that DHAT gives a wealth of information about allocations, 52 | such as where and how often they happen, how big they are, how long they live 53 | for, and how often they are accessed. 54 | 55 | ## `Box` 56 | 57 | [`Box`] is the simplest heap-allocated type. A `Box<T>` value is a `T` value 58 | that is allocated on the heap. 
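
As a small illustration of what "a `T` value that is allocated on the heap" means in practice, the sketch below compares the size of the `Box` handle with the size of the value it points to. The `box_sizes` helper and the 256-byte array payload are hypothetical choices made for this example.

```rust
use std::mem::{size_of, size_of_val};

/// Returns (size of the `Box` handle itself, size of the heap payload).
fn box_sizes() -> (usize, usize) {
    // The 256-byte array lives on the heap; `boxed` holds only a pointer to it.
    let boxed: Box<[u64; 32]> = Box::new([0u64; 32]);
    (size_of_val(&boxed), size_of_val(&*boxed))
}

fn main() {
    let (handle, payload) = box_sizes();
    // For a sized `T`, `Box<T>` is exactly one pointer wide.
    assert_eq!(handle, size_of::<usize>());
    // The payload is 32 * 8 bytes.
    assert_eq!(payload, 256);
}
```

This is why moving a large value behind a `Box` shrinks whatever contains it, at the cost of one heap allocation.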
59 | 60 | [`Box`]: https://doc.rust-lang.org/std/boxed/struct.Box.html 61 | 62 | It is sometimes worth boxing one or more fields in a struct or enum to 63 | make a type smaller. (See the [Type Sizes](type-sizes.md) chapter for more 64 | about this.) 65 | 66 | Other than that, `Box` is straightforward and does not offer much scope for 67 | optimizations. 68 | 69 | ## `Rc`/`Arc` 70 | 71 | [`Rc`]/[`Arc`] are similar to `Box`, but the value on the heap is accompanied by 72 | two reference counts (strong and weak). They allow value sharing, which can be an effective way 73 | to reduce memory usage. 74 | 75 | [`Rc`]: https://doc.rust-lang.org/std/rc/struct.Rc.html 76 | [`Arc`]: https://doc.rust-lang.org/std/sync/struct.Arc.html 77 | 78 | However, if used for values that are rarely shared, they can increase allocation 79 | rates by heap-allocating values that might otherwise not be heap-allocated. 80 | [**Example**](https://github.com/rust-lang/rust/pull/37373/commits/c440a7ae654fb641e68a9ee53b03bf3f7133c2fe). 81 | 82 | Unlike `Box`, calling `clone` on an `Rc`/`Arc` value does not involve an 83 | allocation. Instead, it merely increments a reference count. 84 | 85 | ## `Vec` 86 | 87 | [`Vec`] is a heap-allocated type with a great deal of scope for optimizing the 88 | number of allocations, and/or minimizing the amount of wasted space. To do this 89 | requires understanding how its elements are stored. 90 | 91 | [`Vec`]: https://doc.rust-lang.org/std/vec/struct.Vec.html 92 | 93 | A `Vec` contains three words: a length, a capacity, and a pointer. The pointer 94 | will point to heap-allocated memory if the capacity is nonzero and the element 95 | size is nonzero; otherwise, it will not point to allocated memory. 96 | 97 | Even if the `Vec` itself is not heap-allocated, the elements (if present and 98 | nonzero-sized) always will be. 
If nonzero-sized elements are present, the 99 | memory holding those elements may be larger than necessary, providing space for 100 | additional future elements. The number of elements present is the length, and 101 | the number of elements that could be held without reallocating is the capacity. 102 | 103 | When the vector needs to grow beyond its current capacity, the elements will be 104 | copied into a larger heap allocation, and the old heap allocation will be 105 | freed. 106 | 107 | ### `Vec` Growth 108 | 109 | A new, empty `Vec` created by the common means 110 | ([`vec![]`](https://doc.rust-lang.org/std/macro.vec.html) 111 | or [`Vec::new`] or [`Vec::default`]) has a length and capacity of zero, and no 112 | heap allocation is required. If you repeatedly push individual elements onto 113 | the end of the `Vec`, it will periodically reallocate. The growth strategy is 114 | not specified, but at the time of writing it uses a quasi-doubling strategy 115 | resulting in the following capacities: 0, 4, 8, 16, 32, 64, and so on. (It 116 | skips directly from 0 to 4, instead of going via 1 and 2, because this [avoids 117 | many allocations] in practice.) As a vector grows, the frequency of 118 | reallocations will decrease exponentially, but the amount of possibly-wasted 119 | excess capacity will increase exponentially. 120 | 121 | [`Vec::new`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.new 122 | [`Vec::default`]: https://doc.rust-lang.org/std/default/trait.Default.html#tymethod.default 123 | [avoids many allocations]: https://github.com/rust-lang/rust/pull/72227 124 | 125 | This growth strategy is typical for growable data structures and reasonable in 126 | the general case, but if you know in advance the likely length of a vector you 127 | can often do better. If you have a hot vector allocation site (e.g. 
a hot 128 | [`Vec::push`] call), it is worth using [`eprintln!`] to print the vector length 129 | at that site and then doing some post-processing (e.g. with [`counts`]) to 130 | determine the length distribution. For example, you might have many short 131 | vectors, or you might have a smaller number of very long vectors, and the best 132 | way to optimize the allocation site will vary accordingly. 133 | 134 | [`Vec::push`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.push 135 | [`eprintln!`]: https://doc.rust-lang.org/std/macro.eprintln.html 136 | [`counts`]: https://github.com/nnethercote/counts/ 137 | 138 | ### Short `Vec`s 139 | 140 | If you have many short vectors, you can use the `SmallVec` type from the 141 | [`smallvec`] crate. `SmallVec<[T; N]>` is a drop-in replacement for `Vec<T>` that 142 | can store `N` elements within the `SmallVec` itself, and then switches to a 143 | heap allocation if the number of elements exceeds that. (Note also that 144 | `vec![]` literals must be replaced with `smallvec![]` literals.) 145 | [**Example 1**](https://github.com/rust-lang/rust/pull/50565/commits/78262e700dc6a7b57e376742f344e80115d2d3f2), 146 | [**Example 2**](https://github.com/rust-lang/rust/pull/55383/commits/526dc1421b48e3ee8357d58d997e7a0f4bb26915). 147 | 148 | [`smallvec`]: https://crates.io/crates/smallvec 149 | 150 | `SmallVec` reliably reduces the allocation rate when used appropriately, but 151 | its use does not guarantee improved performance. It is slightly slower than 152 | `Vec` for normal operations because it must always check if the elements are 153 | heap-allocated or not. Also, if `N` is high or `T` is large, then the 154 | `SmallVec<[T; N]>` itself can be larger than `Vec<T>`, and copying of 155 | `SmallVec` values will be slower. As always, benchmarking is required to 156 | confirm that an optimization is effective. 
157 | 158 | If you have many short vectors *and* you precisely know their maximum length, 159 | `ArrayVec` from the [`arrayvec`] crate is a better choice than `SmallVec`. It 160 | does not require the fallback to heap allocation, which makes it a little 161 | faster. 162 | [**Example**](https://github.com/rust-lang/rust/pull/74310/commits/c492ca40a288d8a85353ba112c4d38fe87ef453e). 163 | 164 | [`arrayvec`]: https://crates.io/crates/arrayvec 165 | 166 | ### Longer `Vec`s 167 | 168 | If you know the minimum or exact size of a vector, you can reserve a specific 169 | capacity with [`Vec::with_capacity`], [`Vec::reserve`], or 170 | [`Vec::reserve_exact`]. For example, if you know a vector will grow to have at 171 | least 20 elements, these functions can immediately provide a vector with a 172 | capacity of at least 20 using a single allocation, whereas pushing the items 173 | one at a time would result in four allocations (for capacities of 4, 8, 16, and 174 | 32). 175 | [**Example**](https://github.com/rust-lang/rust/pull/77990/commits/a7f2bb634308a5f05f2af716482b67ba43701681). 176 | 177 | [`Vec::with_capacity`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.with_capacity 178 | [`Vec::reserve`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve 179 | [`Vec::reserve_exact`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve_exact 180 | 181 | If you know the maximum length of a vector, the above functions also let you 182 | avoid allocating excess space. Similarly, [`Vec::shrink_to_fit`] can be 183 | used to minimize wasted space, but note that it may cause a reallocation. 184 | 185 | [`Vec::shrink_to_fit`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit 186 | 187 | ## `String` 188 | 189 | A [`String`] contains heap-allocated bytes. The representation and operation of 190 | `String` are very similar to those of `Vec<u8>`. 
Many `Vec` methods relating to 191 | growth and capacity have equivalents for `String`, such as 192 | [`String::with_capacity`]. 193 | 194 | [`String`]: https://doc.rust-lang.org/std/string/struct.String.html 195 | [`String::with_capacity`]: https://doc.rust-lang.org/std/string/struct.String.html#method.with_capacity 196 | 197 | The `SmallString` type from the [`smallstr`] crate is similar to the `SmallVec` 198 | type. 199 | 200 | [`smallstr`]: https://crates.io/crates/smallstr 201 | 202 | The `String` type from the [`smartstring`] crate is a drop-in replacement for 203 | `String` that avoids heap allocations for strings with less than three words' 204 | worth of characters. On 64-bit platforms, this is any string that is less than 205 | 24 bytes, which includes all strings containing 23 or fewer ASCII characters. 206 | [**Example**](https://github.com/djc/topfew-rs/commit/803fd566e9b889b7ba452a2a294a3e4df76e6c4c). 207 | 208 | [`smartstring`]: https://crates.io/crates/smartstring 209 | 210 | Note that the `format!` macro produces a `String`, which means it performs an 211 | allocation. If you can avoid a `format!` call by using a string literal, that 212 | will avoid this allocation. 213 | [**Example**](https://github.com/rust-lang/rust/pull/55905/commits/c6862992d947331cd6556f765f6efbde0a709cf9). 214 | [`std::format_args`] and/or the [`lazy_format`] crate may help with this. 215 | 216 | [`std::format_args`]: https://doc.rust-lang.org/std/macro.format_args.html 217 | [`lazy_format`]: https://crates.io/crates/lazy_format 218 | 219 | ## Hash Tables 220 | 221 | [`HashSet`] and [`HashMap`] are hash tables. Their representation and 222 | operations are similar to those of `Vec`, in terms of allocations: they have 223 | a single contiguous heap allocation, holding keys and values, which is 224 | reallocated as necessary as the table grows. 
Many `Vec` methods relating to 225 | growth and capacity have equivalents for `HashSet`/`HashMap`, such as 226 | [`HashSet::with_capacity`]. 227 | 228 | [`HashSet`]: https://doc.rust-lang.org/std/collections/struct.HashSet.html 229 | [`HashMap`]: https://doc.rust-lang.org/std/collections/struct.HashMap.html 230 | [`HashSet::with_capacity`]: https://doc.rust-lang.org/std/collections/struct.HashSet.html#method.with_capacity 231 | 232 | ## `clone` 233 | 234 | Calling [`clone`] on a value that contains heap-allocated memory typically 235 | involves additional allocations. For example, calling `clone` on a non-empty 236 | `Vec` requires a new allocation for the elements (but note that the capacity of 237 | the new `Vec` might not be the same as the capacity of the original `Vec`). The 238 | exception is `Rc`/`Arc`, where a `clone` call just increments the reference 239 | count. 240 | 241 | [`clone`]: https://doc.rust-lang.org/std/clone/trait.Clone.html#tymethod.clone 242 | 243 | [`clone_from`] is an alternative to `clone`. `a.clone_from(&b)` is equivalent 244 | to `a = b.clone()` but may avoid unnecessary allocations. For example, if you 245 | want to clone one `Vec` over the top of an existing `Vec`, the existing `Vec`'s 246 | heap allocation will be reused if possible, as the following example shows. 247 | ```rust 248 | let mut v1: Vec<u32> = Vec::with_capacity(99); 249 | let v2: Vec<u32> = vec![1, 2, 3]; 250 | v1.clone_from(&v2); // v1's allocation is reused 251 | assert_eq!(v1.capacity(), 99); 252 | ``` 253 | Although `clone` usually causes allocations, it is a reasonable thing to use in 254 | many circumstances and can often make code simpler. Use profiling data to see 255 | which `clone` calls are hot and worth taking the effort to avoid. 
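
The difference between cloning a `Vec` and cloning an `Rc` can be observed directly by comparing pointers. The sketch below is only an illustration; the helper names `vec_clone_shares_buffer` and `rc_clone_shares_payload` are hypothetical.

```rust
use std::rc::Rc;

/// Clones a non-empty `Vec` and reports whether the clone points at the same
/// heap buffer as the original (it should not: the elements are copied).
fn vec_clone_shares_buffer(v: &Vec<u32>) -> bool {
    let copy = v.clone();
    copy.as_ptr() == v.as_ptr()
}

/// Clones an `Rc` and reports whether both handles point at the same heap
/// value (they should: only the reference count changes).
fn rc_clone_shares_payload(r: &Rc<Vec<u32>>) -> bool {
    let copy = Rc::clone(r);
    Rc::ptr_eq(&copy, r) && Rc::strong_count(r) >= 2
}

fn main() {
    let v = vec![1u32, 2, 3];
    assert!(!vec_clone_shares_buffer(&v));

    let r = Rc::new(vec![1u32, 2, 3]);
    assert!(rc_clone_shares_payload(&r));
    // The temporary clone has been dropped, so the count is back to 1.
    assert_eq!(Rc::strong_count(&r), 1);
}
```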
256 | 257 | [`clone_from`]: https://doc.rust-lang.org/std/clone/trait.Clone.html#method.clone_from 258 | 259 | Sometimes Rust code ends up containing unnecessary `clone` calls, due to (a) 260 | programmer error, or (b) changes in the code that render previously-necessary 261 | `clone` calls unnecessary. If you see a hot `clone` call that does not seem 262 | necessary, sometimes it can simply be removed. 263 | [**Example 1**](https://github.com/rust-lang/rust/pull/37318/commits/e382267cfb9133ef12d59b66a2935ee45b546a61), 264 | [**Example 2**](https://github.com/rust-lang/rust/pull/37705/commits/11c1126688bab32f76dbe1a973906c7586da143f), 265 | [**Example 3**](https://github.com/rust-lang/rust/pull/64302/commits/36b37e22de92b584b9cf4464ed1d4ad317b798be). 266 | 267 | ## `to_owned` 268 | 269 | [`ToOwned::to_owned`] is implemented for many common types. It creates owned 270 | data from borrowed data, usually by cloning, and therefore often causes heap 271 | allocations. For example, it can be used to create a `String` from a `&str`. 272 | 273 | [`ToOwned::to_owned`]: https://doc.rust-lang.org/std/borrow/trait.ToOwned.html#tymethod.to_owned 274 | 275 | Sometimes `to_owned` calls (and related calls such as `clone` and `to_string`) 276 | can be avoided by storing a reference to borrowed data in a struct rather than 277 | an owned copy. This requires lifetime annotations on the struct, complicating 278 | the code, and should only be done when profiling and benchmarking shows that it 279 | is worthwhile. 280 | [**Example**](https://github.com/rust-lang/rust/pull/50855/commits/6872377357dbbf373cfd2aae352cb74cfcc66f34). 281 | 282 | ## `Cow` 283 | 284 | Sometimes code deals with a mixture of borrowed and owned data. Imagine a 285 | vector of error messages, some of which are static string literals and some of 286 | which are constructed with `format!`. The obvious representation is 287 | `Vec<String>`, as the following example shows. 
288 | ```rust 289 | let mut errors: Vec<String> = vec![]; 290 | errors.push("something went wrong".to_string()); 291 | errors.push(format!("something went wrong on line {}", 100)); 292 | ``` 293 | That requires a `to_string` call to promote the static string literal to a 294 | `String`, which incurs an allocation. 295 | 296 | Instead you can use the [`Cow`] type, which can hold either borrowed or owned 297 | data. A borrowed value `x` is wrapped with `Cow::Borrowed(x)`, and an owned 298 | value `y` is wrapped with `Cow::Owned(y)`. `Cow` also implements the `From` 299 | trait for various string, slice, and path types, so you can usually use `into` 300 | as well. (Or `Cow::from`, which is longer but results in more readable code, 301 | because it makes the type clearer.) The following example puts all this together. 302 | 303 | [`Cow`]: https://doc.rust-lang.org/std/borrow/enum.Cow.html 304 | 305 | ```rust 306 | use std::borrow::Cow; 307 | let mut errors: Vec<Cow<'static, str>> = vec![]; 308 | errors.push(Cow::Borrowed("something went wrong")); 309 | errors.push(Cow::Owned(format!("something went wrong on line {}", 100))); 310 | errors.push(Cow::from("something else went wrong")); 311 | errors.push(format!("something else went wrong on line {}", 101).into()); 312 | ``` 313 | `errors` now holds a mixture of borrowed and owned data without requiring any 314 | extra allocations. This example involves `&str`/`String`, but other pairings 315 | such as `&[T]`/`Vec<T>` and `&Path`/`PathBuf` are also possible. 316 | 317 | [**Example 1**](https://github.com/rust-lang/rust/pull/37064/commits/b043e11de2eb2c60f7bfec5e15960f537b229e20), 318 | [**Example 2**](https://github.com/rust-lang/rust/pull/56336/commits/787959c20d062d396b97a5566e0a766d963af022). 319 | 320 | All of the above applies if the data is immutable. But `Cow` also allows 321 | borrowed data to be promoted to owned data if it needs to be mutated.
322 | [`Cow::to_mut`] will obtain a mutable reference to an owned value, cloning if 323 | necessary. This is called "clone-on-write", which is where the name `Cow` comes 324 | from. 325 | 326 | [`Deref`]: https://doc.rust-lang.org/std/ops/trait.Deref.html 327 | [`Cow::to_mut`]: https://doc.rust-lang.org/std/borrow/enum.Cow.html#method.to_mut 328 | 329 | This clone-on-write behaviour is useful when you have some borrowed data, such 330 | as a `&str`, that is mostly read-only but occasionally needs to be modified. 331 | 332 | [**Example 1**](https://github.com/rust-lang/rust/pull/50855/commits/ad471452ba6fbbf91ad566dc4bdf1033a7281811), 333 | [**Example 2**](https://github.com/rust-lang/rust/pull/68848/commits/67da45f5084f98eeb20cc6022d68788510dc832a). 334 | 335 | Finally, because `Cow` implements [`Deref`], you can call methods directly on 336 | the data it encloses. 337 | 338 | `Cow` can be fiddly to get working, but it is often worth the effort. 339 | 340 | ## Reusing Collections 341 | 342 | Sometimes you need to build up a collection such as a `Vec` in stages. It is 343 | usually better to do this by modifying a single `Vec` than by building multiple 344 | `Vec`s and then combining them. 345 | 346 | For example, if you have a function `do_stuff` that produces a `Vec` that might 347 | be called multiple times: 348 | ```rust 349 | fn do_stuff(x: u32, y: u32) -> Vec<u32> { 350 | vec![x, y] 351 | } 352 | ``` 353 | It might be better to instead modify a passed-in `Vec`: 354 | ```rust 355 | fn do_stuff(x: u32, y: u32, vec: &mut Vec<u32>) { 356 | vec.push(x); 357 | vec.push(y); 358 | } 359 | ``` 360 | Sometimes it is worth keeping around a "workhorse" collection that can be 361 | reused. For example, if a `Vec` is needed for each iteration of a loop, you 362 | could declare the `Vec` outside the loop, use it within the loop body, and then 363 | call [`clear`] at the end of the loop body (to empty the `Vec` without affecting 364 | its capacity).
This avoids allocations at the cost of obscuring the fact that 365 | each iteration's usage of the `Vec` is unrelated to the others. 366 | [**Example 1**](https://github.com/rust-lang/rust/pull/77990/commits/45faeb43aecdc98c9e3f2b24edf2ecc71f39d323), 367 | [**Example 2**](https://github.com/rust-lang/rust/pull/51870/commits/b0c78120e3ecae5f4043781f7a3f79e2277293e7). 368 | 369 | [`clear`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.clear 370 | 371 | Similarly, it is sometimes worth keeping a workhorse collection within a 372 | struct, to be reused in one or more methods that are called repeatedly. 373 | 374 | ## Reading Lines from a File 375 | 376 | [`BufRead::lines`] makes it easy to read a file one line at a time: 377 | ```rust 378 | # fn blah() -> Result<(), std::io::Error> { 379 | # fn process(_: &str) {} 380 | use std::io::{self, BufRead}; 381 | let mut lock = io::stdin().lock(); 382 | for line in lock.lines() { 383 | process(&line?); 384 | } 385 | # Ok(()) 386 | # } 387 | ``` 388 | But the iterator it produces returns `io::Result<String>`, which means it 389 | allocates for every line in the file. 390 | 391 | [`BufRead::lines`]: https://doc.rust-lang.org/stable/std/io/trait.BufRead.html#method.lines 392 | 393 | An alternative is to use a workhorse `String` in a loop over 394 | [`BufRead::read_line`]: 395 | ```rust 396 | # fn blah() -> Result<(), std::io::Error> { 397 | # fn process(_: &str) {} 398 | use std::io::{self, BufRead}; 399 | let mut lock = io::stdin().lock(); 400 | let mut line = String::new(); 401 | while lock.read_line(&mut line)? != 0 { 402 | process(&line); 403 | line.clear(); 404 | } 405 | # Ok(()) 406 | # } 407 | ``` 408 | This reduces the number of allocations to at most a handful, and possibly just 409 | one. (The exact number depends on how many times `line` needs to be 410 | reallocated, which depends on the distribution of line lengths in the file.)
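
One practical difference when switching between the two approaches is that `read_line` keeps the line terminator in the buffer, whereas `lines` strips it. The following is a minimal sketch of the workhorse-`String` loop that accounts for this. It is generic over any `BufRead` so it can be exercised with an in-memory byte slice, and the function name `longest_line_len` is hypothetical.

```rust
use std::io::{self, BufRead};

/// Returns the byte length of the longest line, excluding line terminators.
/// (`longest_line_len` is a hypothetical name used for illustration.)
fn longest_line_len<R: BufRead>(mut reader: R) -> io::Result<usize> {
    // Workhorse buffer, reused for every line.
    let mut line = String::new();
    let mut longest = 0;
    while reader.read_line(&mut line)? != 0 {
        // `read_line` leaves any trailing "\n" or "\r\n" in the buffer.
        let trimmed = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
        longest = longest.max(trimmed.len());
        // Empties the buffer but keeps its capacity for the next line.
        line.clear();
    }
    Ok(longest)
}
```

Forgetting the `clear` call would make each `read_line` append to the previous contents, so the buffer would grow to hold the whole input.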
411 | 412 | This will only work if the loop body can operate on a `&str`, rather than a 413 | `String`. 414 | 415 | [`BufRead::read_line`]: https://doc.rust-lang.org/stable/std/io/trait.BufRead.html#method.read_line 416 | 417 | [**Example**](https://github.com/nnethercote/counts/commit/7d39bbb1867720ef3b9799fee739cd717ad1539a). 418 | 419 | ## Using an Alternative Allocator 420 | 421 | It is also possible to improve heap allocation performance without changing 422 | your code, simply by using a different allocator. See the [Alternative 423 | Allocators] section for details. 424 | 425 | [Alternative Allocators]: build-configuration.md#alternative-allocators 426 | 427 | ## Avoiding Regressions 428 | 429 | To ensure the number and/or size of allocations done by your code doesn't 430 | increase unintentionally, you can use the *heap usage testing* feature of 431 | [dhat-rs] to write tests that check particular code snippets allocate the 432 | expected amount of heap memory. 433 | 434 | [dhat-rs]: https://crates.io/crates/dhat 435 | -------------------------------------------------------------------------------- /src/heap-allocations_zh.md: -------------------------------------------------------------------------------- 1 | # 堆分配 2 | 3 | 堆分配的代价不高。具体细节取决于使用的分配器,但每次分配和deallocation通常都需要获取一个全局锁,做一些非平凡的数据结构操作。并可能执行一个系统调用。小分配不一定比大分配便宜。值得了解哪些Rust数据结构和操作会导致分配,因为避免它们可以大大提高性能。 4 | 5 | [Rust Container Cheat Sheet]有常见的Rust类型的可视化,是下面章节的绝佳配套。 6 | 7 | [Rust Container Cheat Sheet]: https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/ 8 | 9 | ## Profiling 10 | 11 | 如果通用profile解析器显示 "malloc"、"free"和相关函数为热函数,那么很可能值得尝试降低分配率或使用其他分配器。 12 | 13 | [DHAT]是降低分配率时可以使用的一个优秀的剖析器。它能精确地识别热分配点及其分配率。确切的结果会有所不同,但使用rustc的经验表明,每执行一百万条指令减少10条分配率可以有可衡量的性能改进(例如~1%)。 14 | 15 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html 16 | 17 | 下面是DHAT的一些输出示例。 18 | ```text 19 | AP 1.1/25 (2 children) { 20 | Total: 54,533,440 bytes (4.02%, 2,714.28/Minstr) in 458,839 blocks 
(7.72%, 22.84/Minstr), avg size 118.85 bytes, avg lifetime 1,127,259,403.64 instrs (5.61% of program duration) 21 | At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes 22 | At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes 23 | Reads: 15,993,012 bytes (0.29%, 796.02/Minstr), 0.29/byte 24 | Writes: 20,974,752 bytes (1.03%, 1,043.97/Minstr), 0.38/byte 25 | Allocated at { 26 | #1: 0x95CACC9: alloc (alloc.rs:72) 27 | #2: 0x95CACC9: alloc (alloc.rs:148) 28 | #3: 0x95CACC9: reserve_internal (raw_vec.rs:669) 29 | #4: 0x95CACC9: reserve (raw_vec.rs:492) 30 | #5: 0x95CACC9: reserve (vec.rs:460) 31 | #6: 0x95CACC9: push (vec.rs:989) 32 | #7: 0x95CACC9: parse_token_trees_until_close_delim (tokentrees.rs:27) 33 | #8: 0x95CACC9: syntax::parse::lexer::tokentrees::>::parse_token_tree (tokentrees.rs:81) 34 | } 35 | } 36 | ``` 37 | 在这个例子中,描述所有的内容已经超出了本书的范围,但应该清楚的是,DHAT给出了大量关于分配的信息,比如它们发生的地点和频率,它们的规模有多大,它们的寿命有多长,以及它们被访问的频率。 38 | 39 | ## `Box` 40 | 41 | [`Box`]是最简单的堆分配类型。`Box`值是一个在堆上分配的`T`值。 42 | 43 | [`Box`]: https://doc.rust-lang.org/std/boxed/struct.Box.html 44 | 45 | 有时值得将结构体或枚举字段中的一个或多个字段装箱,以使类型更小。(更多信息请参见 [Type Sizes](type-sizes.md) 一章)。 46 | 47 | 除此之外,`Box`是直接的,并没有提供太多优化的空间。 48 | 49 | ## `Rc`/`Arc` 50 | 51 | [`Rc`]/[`Arc`]类似于`Box`,但堆上的值有两个引用计数。它们允许值共享,这可以是减少内存使用的有效方法。 52 | 53 | [`Rc`]:https://doc.rust-lang.org/std/rc/struct.Rc.html 54 | [Arc]:https://doc.rust-lang.org/std/sync/struct.Arc.html 55 | 56 | 但是,如果用于很少共享的值,它们可以通过堆分配本来可能不会被堆分配的值来提高分配率。 57 | [**Example**](https://github.com/rust-lang/rust/pull/37373/commits/c440a7ae654fb641e68a9ee53b03bf3f7133c2fe). 
58 | 59 | 与`Box`不同的是,在`Rc`/`Arc`值上调用`clone`并不涉及分配。相反,它只是增加一个引用计数。 60 | 61 | ## `Vec` 62 | 63 | [`Vec`]是一种堆分配类型,在优化分配数量和/或尽量减少浪费的空间方面有很大的空间。要做到这一点,需要了解其元素的存储方式。 64 | 65 | [`Vec`]:https://doc.rust-lang.org/std/vec/struct.Vec.html 66 | 67 | 一个 "Vec "包含三个词:一个长度、一个容量和一个指针。如果容量是非零,元素大小是非零,指针将指向堆分配的内存;否则,它将不指向分配的内存。 68 | 69 | 即使 "Vec "本身不是堆分配的,元素(如果存在且大小非零)也会是堆分配的。如果存在非零大小的元素,那么存放这些元素的内存可能会比必要的大,为未来的元素提供空间。存在的元素数就是长度,不需要重新分配就可以容纳的元素数就是容量。 70 | 71 | 当向量需要增长到超过其当前容量时,元素将被复制到一个更大的堆分配中,旧的堆分配将被释放。 72 | 73 | ### `Vec` growth 74 | 75 | 用普通方法创建一个新的、空的`Vec`。 76 | ([`vec![]`](https://doc.rust-lang.org/std/macro.vec.html) 或 [`Vec::new`] 或 [`Vec::default`])的长度和容量为零,不需要进行堆分配。如果你反复将单个元素推到`Vec`的末端,它将周期性地重新分配。增长策略没有被指定,但在写这篇文章的时候,它使用了一个准双倍策略,结果是以下容量。0, 4, 8, 16, 32, 64, 等等. (它直接从0跳到4,而不是通过1和2,因为这在实践中[避免了许多分配]。) 随着向量的增长,重新分配的频率将以指数形式减少,但可能浪费的多余容量将以指数形式增加。 77 | 78 | [`Vec::new`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.new 79 | [`Vec::default`]: https://doc.rust-lang.org/std/default/trait.Default.html#tymethod.default 80 | [避免了许多分配]: https://github.com/rust-lang/rust/pull/72227 81 | 82 | 这种增长策略对于可增长的数据结构来说是典型的,在一般情况下是合理的,但如果你事先知道一个向量的可能长度,你可以做的往往更好。如果你有一个热向量分配站点(例如一个热的 [`Vec::push`]调用),值得使用[`eprintln!`]来打印该站点的向量长度,然后做一些后处理(例如使用[`counts`])来确定长度分布。例如,你可能有很多短向量,也可能有较少的超长向量,优化分配站点的最佳方式也会相应变化。 83 | 84 | [`Vec::push`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.push 85 | [`eprintln!`]: https://doc.rust-lang.org/std/macro.eprintln.html 86 | [`counts`]: https://github.com/nnethercote/counts/ 87 | 88 | ### Short `Vec`s 89 | 90 | 如果你有很多短向量,你可以使用[`smallvec`]crate中的`SmallVec`类型。`SmallVec<[T;N]>`是`Vec`的替代物,它可以在`SmallVec`本身中存储`N`个元素,如果元素数量超过这个数量,就会切换到堆分配。(还需要注意的是,`vec![]`字元必须用`smallvec![]`字元代替。) 91 | [**Example 1**](https://github.com/rust-lang/rust/pull/50565/commits/78262e700dc6a7b57e376742f344e80115d2d3f2), 92 | [**Example 2**](https://github.com/rust-lang/rust/pull/55383/commits/526dc1421b48e3ee8357d58d997e7a0f4bb26915). 
93 | 94 | [`smallvec`]: https://crates.io/crates/smallvec 95 | 96 | `SmallVec`如果使用得当,可以可靠地降低分配率,但使用它并不能保证提高性能。对于正常的操作,它比`Vec`稍慢,因为它必须总是检查元素是否被堆分配。另外,如果`N`很高或者`T`很大,那么`SmallVec<[T; N]>`本身就会比`Vec`大,复制`SmallVec`值的速度会比较慢。和以往一样,需要通过基准测试来确认优化是否有效。 97 | 98 | 如果你有很多短向量,并且你精确地知道它们的最大长度,[arrayvec]箱子中的ArrayVec比SmallVec更好。它不需要回落到堆分配,这使得它更快一些。 99 | [**Example**](https://github.com/rust-lang/rust/pull/74310/commits/c492ca40a288d8a85353ba112c4d38fe87ef453e). 100 | 101 | [`arrayvec`]: https://crates.io/crates/arrayvec 102 | 103 | ### Longer `Vec`s 104 | 105 | 如果你知道一个向量的最小或精确大小,你可以用[`Vec::with_capacity`]、[`Vec::reserve`]或[`Vec::reserve_exact`]来保留一个特定的容量。例如,如果你知道一个向量将成长为至少有20个元素,这些函数可以使用一次分配立即提供一个至少有20个容量的向量,而一次推送一个项目将导致四次分配(对于4、8、16和32的容量)。 106 | [**Example**](https://github.com/rust-lang/rust/pull/77990/commits/a7f2bb634308a5f05f2af716482b67ba43701681). 107 | 108 | [`Vec::with_capacity`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.with_capacity 109 | [`Vec::reserve`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve 110 | [`Vec::reserve_exact`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve_exact 111 | 112 | 如果你知道一个向量的最大长度,上述函数也可以让你不分配多余的不必要的空间。同样,[`Vec::shrink_to_fit`]也可以用来尽量减少浪费的空间,但要注意它可能会引起重新分配。 113 | 114 | [`Vec::shrink_to_fit`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit 115 | 116 | ## `String` 117 | 118 | 一个[`String`]包含堆分配的字节。`String`的表示和操作与`Vec`非常相似。许多与增长和容量有关的`Vec`方法与`String`有对应关系,如[`String::with_capacity`]。 119 | 120 | [`String`]: https://doc.rust-lang.org/std/string/struct.String.html 121 | [`String::with_capacity`]: https://doc.rust-lang.org/std/string/struct.String.html#method.with_capacity 122 | 123 | 来自[`smallstr`]箱子的`SmallString`类型与`SmallVec`类型类似。 124 | 125 | [`smallstr`]: https://crates.io/crates/smallstr 126 | 127 | 请注意,`format!`宏产生一个`String`,这意味着它进行了分配。如果你能通过使用字符串文字来避免`format!`的调用,就能避免这种分配。 128 | 
[**Example**](https://github.com/rust-lang/rust/pull/55905/commits/c6862992d947331cd6556f765f6efbde0a709cf9). 129 | 130 | ## Hash tables 131 | 132 | [`HashSet`]和[`HashMap`]是哈希表。在分配方面,它们的表示和操作与`Vec`的表示和操作相似:它们有一个单一的连续的堆分配,存放键和值,随着表的增长,必要时重新分配。许多与增长和容量有关的`Vec`方法都有与`HashSet`/`HashMap`对应的方法,如[`HashSet::with_capacity`]。 133 | 134 | [`HashSet`]: https://doc.rust-lang.org/std/collections/struct.HashSet.html 135 | [`HashMap`]: https://doc.rust-lang.org/std/collections/struct.HashMap.html 136 | [`HashSet::with_capacity`]: https://doc.rust-lang.org/std/collections/struct.HashSet.html#method.with_capacity 137 | 138 | ## `Cow` 139 | 140 | 有时候你有一些借来的数据,比如`&str`,大部分是只读的,但偶尔需要修改。每次都克隆数据会很浪费。相反,你可以通过[`Cow`]类型使用 "write-on-clone "语义,它既可以表示借来的数据,也可以表示拥有的数据。 141 | 142 | [`Cow`]: https://doc.rust-lang.org/std/borrow/enum.Cow.html 143 | 144 | 通常情况下,当从一个借来的值`x`开始时,你用`Cow::Borrowed(x)`把它包在一个`Cow`中。因为`Cow`实现了[`Deref`],所以你可以直接在它所包含的数据上调用非修改的方法。如果需要修改,[`Cow::to_mut`]将获得一个对所拥有的值的可修改引用,必要时进行克隆。 145 | 146 | [`Deref`]: https://doc.rust-lang.org/std/ops/trait.Deref.html 147 | [`Cow::to_mut`]: https://doc.rust-lang.org/std/borrow/enum.Cow.html#method.to_mut 148 | 149 | `Cow`的工作可能会很麻烦,但它往往是值得的。 150 | [**Example 1**](https://github.com/rust-lang/rust/pull/37064/commits/b043e11de2eb2c60f7bfec5e15960f537b229e20), 151 | [**Example 2**](https://github.com/rust-lang/rust/pull/50855/commits/ad471452ba6fbbf91ad566dc4bdf1033a7281811), 152 | [**Example 3**](https://github.com/rust-lang/rust/pull/56336/commits/787959c20d062d396b97a5566e0a766d963af022), 153 | [**Example 4**](https://github.com/rust-lang/rust/pull/68848/commits/67da45f5084f98eeb20cc6022d68788510dc832a). 
154 | 155 | ## `clone` 156 | 157 | 在一个包含堆分配内存的值上调用[clone]通常会涉及额外的分配。例如,在一个非空的`Vec`上调用`clone`,需要对元素进行新的分配(但请注意,新`Vec`的容量可能与原`Vec`的容量不同)。例外的情况是`Rc`/`Arc`,`clone`的调用只是增加引用数。 158 | 159 | [`clone`]: https://doc.rust-lang.org/std/clone/trait.Clone.html#tymethod.clone 160 | 161 | [`clone_from`]是`clone`的替代方法。`a.clone_from(&b)`相当于`a = b.clone()`,但可以避免不必要的分配。例如,如果你想在一个现有的`Vec`之上克隆一个`Vec`,现有`Vec`的堆分配将尽可能地被重复使用,如下例所示。 162 | ```rust 163 | let mut v1: Vec = Vec::with_capacity(99); 164 | let v2: Vec = vec![1, 2, 3]; 165 | v1.clone_from(&v2); // v1's allocation is reused 166 | assert_eq!(v1.capacity(), 99); 167 | ``` 168 | 虽然`clone`通常会造成分配,但在很多情况下使用它是一件很合理的事情,往往可以使代码更简单。使用剖析数据来查看哪些`clone`调用是热门的,值得花力气去避免。 169 | 170 | [`clone_from`]: https://doc.rust-lang.org/std/clone/trait.Clone.html#method.clone_from 171 | 172 | 有时,由于(a)程序员的错误,或(b)代码中的变化,使以前必要的`clone`调用变得不必要,Rust代码最终会包含不必要的`clone`调用。如果你看到一个似乎没有必要的热`clone`调用,有时可以简单地删除它。 173 | [**Example 1**](https://github.com/rust-lang/rust/pull/37318/commits/e382267cfb9133ef12d59b66a2935ee45b546a61), 174 | [**Example 2**](https://github.com/rust-lang/rust/pull/37705/commits/11c1126688bab32f76dbe1a973906c7586da143f), 175 | [**Example 3**](https://github.com/rust-lang/rust/pull/64302/commits/36b37e22de92b584b9cf4464ed1d4ad317b798be). 176 | 177 | ## `to_owned` 178 | 179 | `ToOwned::to_owned`]是为许多常见类型实现的。它从借来的数据中创建拥有的数据,通常是通过克隆的方式,因此经常会引起堆分配。例如,它可以用来从一个`&str`创建一个`String`。 180 | 181 | [`ToOwned::to_owned`]: https://doc.rust-lang.org/std/borrow/trait.ToOwned.html#tymethod.to_owned 182 | 183 | 有时,可以通过在结构中存储对借入数据的引用而不是自有副本来避免`to_owned`调用。这需要在结构体上做终身注解,使代码复杂化,只有在分析和基准测试表明值得时才可以这样做。 184 | [**Example**](https://github.com/rust-lang/rust/pull/50855/commits/6872377357dbbf373cfd2aae352cb74cfcc66f34). 
185 | 186 | ## `Cow` 187 | 188 | 有时代码处理的是借用和拥有数据的混合。想象一下一个包含错误消息的向量,其中一些是静态字符串字面量,另一些是用 `format!` 构造的。显而易见的表示是 `Vec`,如下例所示。 189 | ```rust 190 | let mut errors: Vec = vec![]; 191 | errors.push("something went wrong".to_string()); 192 | errors.push(format!("something went wrong on line {}", 100)); 193 | ``` 194 | 这需要一个 `to_string` 调用来将静态字符串字面量提升为 `String`,这会产生一次分配。 195 | 196 | 相反,您可以使用 [`Cow`] 类型,它可以保存借用或拥有的数据。借用值 `x` 被包装在 `Cow::Borrowed(x)` 中,拥有值 `y` 被包装在 `Cow::Owned(y)` 中。`Cow` 还为各种字符串、切片和路径类型实现了 `From` trait,因此通常也可以使用 `into`。 (或者 `Cow::from`,这更长一些,但会产生更易读的代码,因为它使类型更清晰。)以下示例将所有这些内容整合在一起。 197 | 198 | [`Cow`]: https://doc.rust-lang.org/std/borrow/enum.Cow.html 199 | 200 | ```rust 201 | use std::borrow::Cow; 202 | let mut errors: Vec> = vec![]; 203 | errors.push(Cow::Borrowed("something went wrong")); 204 | errors.push(Cow::Owned(format!("something went wrong on line {}", 100))); 205 | errors.push(Cow::from("something else went wrong")); 206 | errors.push(format!("something else went wrong on line {}", 101).into()); 207 | ``` 208 | 209 | 现在,`errors` 包含了借用和拥有数据的混合,而无需进行任何额外的分配。这个示例涉及 `&str`/`String`,但其他配对,如 `&[T]`/`Vec` 和 `&Path`/`PathBuf` 也是可能的。 210 | 211 | 212 | [**Example 1**](https://github.com/rust-lang/rust/pull/37064/commits/b043e11de2eb2c60f7bfec5e15960f537b229e20), 213 | [**Example 2**](https://github.com/rust-lang/rust/pull/56336/commits/787959c20d062d396b97a5566e0a766d963af022). 
214 | 215 | 以上所有内容适用于数据是不可变的情况。但是,`Cow` 也允许将借用数据提升为拥有数据,如果需要对其进行修改。[`Cow::to_mut`] 将获得一个可变引用到一个拥有值,必要时进行克隆。这就是所谓的“写时复制”,也是 `Cow` 名称的由来。 216 | 217 | ## Reusing Collections 218 | 219 | 有时你需要分阶段建立一个集合,如`Vec`。通常情况下,通过修改一个`Vec`比建立多个`Vec`然后将它们组合起来更好。 220 | 221 | 例如,如果你有一个函数 "do_stuff",产生一个 "Vec",可能会被多次调用。 222 | ```rust 223 | fn do_stuff(x: u32, y: u32) -> Vec { 224 | vec![x, y] 225 | } 226 | ``` 227 | It might be better to instead modify a passed-in `Vec`: 228 | ```rust 229 | fn do_stuff(x: u32, y: u32, vec: &mut Vec) { 230 | vec.push(x); 231 | vec.push(y); 232 | } 233 | ``` 234 | 有时,值得保留一个可以重复使用的 "主力"集合。例如,如果一个循环的每次迭代都需要一个`Vec`,你可以在循环外声明`Vec`,在循环体中使用它,然后在循环体结束时调用[`clear`](清空`Vec`而不影响它的容量)。这避免了分配,但代价是掩盖了一个事实,即每次迭代对`Vec`的使用与其他迭代无关。 235 | [**Example 1**](https://github.com/rust-lang/rust/pull/77990/commits/45faeb43aecdc98c9e3f2b24edf2ecc71f39d323), 236 | [**Example 2**](https://github.com/rust-lang/rust/pull/51870/commits/b0c78120e3ecae5f4043781f7a3f79e2277293e7). 237 | 238 | [`clear`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.clear 239 | 240 | 同样,有时值得在一个结构中保留一个 "主力 "集合,以便在一个或多个被重复调用的方法中重用。 241 | 242 | ## 从文件中逐行读取 243 | 244 | [`BufRead::lines`] 使得逐行读取文件变得很容易: 245 | ```rust 246 | # fn blah() -> Result<(), std::io::Error> { 247 | # fn process(_: &str) {} 248 | use std::io::{self, BufRead}; 249 | let mut lock = io::stdin().lock(); 250 | for line in lock.lines() { 251 | process(&line?); 252 | } 253 | # Ok(()) 254 | # } 255 | ``` 256 | 但它生成的迭代器返回的是 `io::Result`,这意味着它为文件中的每一行分配内存。 257 | 258 | [`BufRead::lines`]: https://doc.rust-lang.org/stable/std/io/trait.BufRead.html#method.lines 259 | 260 | 另一种方法是在循环中使用一个工作用的 `String`,通过 [`BufRead::read_line`]: 261 | ```rust 262 | # fn blah() -> Result<(), std::io::Error> { 263 | # fn process(_: &str) {} 264 | use std::io::{self, BufRead}; 265 | let mut lock = io::stdin().lock(); 266 | let mut line = String::new(); 267 | while lock.read_line(&mut line)? 
!= 0 { 268 | process(&line); 269 | line.clear(); 270 | } 271 | # Ok(()) 272 | # } 273 | ``` 274 | 这样可以将分配的次数降至最多几次,甚至可能只有一次。(确切的次数取决于需要多少次重新分配 `line`,这取决于文件中行长度的分布情况。) 275 | 276 | 这种方法只适用于循环体能够操作 `&str` 而不是 `String` 的情况。 277 | 278 | [`BufRead::read_line`]: https://doc.rust-lang.org/stable/std/io/trait.BufRead.html#method.read_line 279 | 280 | [**Example**](https://github.com/nnethercote/counts/commit/7d39bbb1867720ef3b9799fee739cd717ad1539a). 281 | 282 | ## 使用不同的分配器 283 | 284 | 除了更改代码外,还可以通过使用不同的分配器来改善堆分配性能。请参阅 [Alternative Allocators] 部分获取详细信息。 285 | 286 | [Alternative Allocators]: build-configuration.md#alternative-allocators_zh 287 | 288 | ## 避免回归 289 | 290 | 为了确保代码执行时的分配次数和/或大小不会意外增加,您可以使用 [dhat-rs] 的 *堆使用测试* 功能编写测试,检查特定代码段分配了预期量的堆内存。 291 | 292 | [dhat-rs]: https://crates.io/crates/dhat -------------------------------------------------------------------------------- /src/inlining.md: -------------------------------------------------------------------------------- 1 | # Inlining 2 | 3 | Entry to and exit from hot, uninlined functions often accounts for a 4 | non-trivial fraction of execution time. Inlining these functions can provide 5 | small but easy speed wins. 6 | 7 | There are four inline attributes that can be used on Rust functions. 8 | - **None**. The compiler will decide itself if the function should be inlined. 9 | This will depend on factors such as the optimization level and the size of 10 | the function. Non-generic functions will never be inlined across crate 11 | boundaries unless link-time optimization is used; generic functions might be. 12 | - **`#[inline]`**. This suggests that the function should be inlined, including 13 | across crate boundaries. 14 | - **`#[inline(always)]`**. This strongly suggests that the function should be 15 | inlined, including across crate boundaries. 16 | - **`#[inline(never)]`**. This strongly suggests that the function should not 17 | be inlined. 
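
The four cases in the list above can be sketched as follows. The function names and bodies are hypothetical and only illustrate where each attribute is written.

```rust
// No attribute: the compiler decides whether to inline.
fn square(x: u32) -> u32 {
    x * x
}

// A suggestion to inline.
#[inline]
fn add(x: u32, y: u32) -> u32 {
    x + y
}

// A strong suggestion to inline.
#[inline(always)]
fn is_even(x: u32) -> bool {
    x % 2 == 0
}

// A strong suggestion not to inline, e.g. for a rarely taken path.
#[inline(never)]
fn describe_failure(code: u32) -> String {
    format!("failed with code {}", code)
}
```

None of these attributes changes what a function computes. They only influence how the compiler generates code at the call sites.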
18 | 19 | Inline attributes do not guarantee that a function is inlined or not inlined, 20 | but in practice `#[inline(always)]` will cause inlining in all but the most 21 | exceptional cases. 22 | 23 | Inlining is non-transitive. If a function `f` calls a function `g` and you want 24 | both functions to be inlined together at a callsite to `f`, both functions 25 | should be marked with an inline attribute. 26 | 27 | ## Simple Cases 28 | 29 | The best candidates for inlining are (a) functions that are very small, or (b) 30 | functions that have a single call site. The compiler will often inline these 31 | functions itself even without an inline attribute. But the compiler cannot 32 | always make the best choices, so attributes are sometimes needed. 33 | [**Example 1**](https://github.com/rust-lang/rust/pull/37083/commits/6a4bb35b70862f33ac2491ffe6c55fb210c8490d), 34 | [**Example 2**](https://github.com/rust-lang/rust/pull/50407/commits/e740b97be699c9445b8a1a7af6348ca2d4c460ce), 35 | [**Example 3**](https://github.com/rust-lang/rust/pull/50564/commits/77c40f8c6f8cc472f6438f7724d60bf3b7718a0c), 36 | [**Example 4**](https://github.com/rust-lang/rust/pull/57719/commits/92fd6f9d30d0b6b4ecbcf01534809fb66393f139), 37 | [**Example 5**](https://github.com/rust-lang/rust/pull/69256/commits/e761f3af904b3c275bdebc73bb29ffc45384945d). 38 | 39 | Cachegrind is a good profiler for determining if a function is inlined. When 40 | looking at Cachegrind's output, you can tell that a function has been inlined 41 | if (and only if) its first and last lines are *not* marked with event counts. 42 | For example: 43 | ```text 44 | . #[inline(always)] 45 | . fn inlined(x: u32, y: u32) -> u32 { 46 | 700,000 eprintln!("inlined: {} + {}", x, y); 47 | 200,000 x + y 48 | . } 49 | . 50 | . 
#[inline(never)] 51 | 400,000 fn not_inlined(x: u32, y: u32) -> u32 { 52 | 700,000 eprintln!("not_inlined: {} + {}", x, y); 53 | 200,000 x + y 54 | 200,000 } 55 | ``` 56 | You should measure again after adding inline attributes, because the effects 57 | can be unpredictable. Sometimes it has no effect because a nearby function that 58 | was previously inlined no longer is. Sometimes it slows the code down. Inlining 59 | can also affect compile times, especially cross-crate inlining which involves 60 | duplicating internal representations of the functions. 61 | 62 | ## Harder Cases 63 | 64 | Sometimes you have a function that is large and has multiple call sites, but 65 | only one call site is hot. You would like to inline the hot call site for 66 | speed, but not inline the cold call sites to avoid unnecessary code bloat. The 67 | way to handle this is to split the function into always-inlined and never-inlined 68 | variants, with the latter calling the former. 69 | 70 | For example, this function: 71 | ```rust 72 | # fn one() {}; 73 | # fn two() {}; 74 | # fn three() {}; 75 | fn my_function() { 76 | one(); 77 | two(); 78 | three(); 79 | } 80 | ``` 81 | Would become these two functions: 82 | ```rust 83 | # fn one() {}; 84 | # fn two() {}; 85 | # fn three() {}; 86 | // Use this at the hot call site. 87 | #[inline(always)] 88 | fn inlined_my_function() { 89 | one(); 90 | two(); 91 | three(); 92 | } 93 | 94 | // Use this at the cold call sites. 95 | #[inline(never)] 96 | fn uninlined_my_function() { 97 | inlined_my_function(); 98 | } 99 | ``` 100 | [**Example 1**](https://github.com/rust-lang/rust/pull/53513/commits/b73843f9422fb487b2d26ac2d65f79f73a4c9ae3), 101 | [**Example 2**](https://github.com/rust-lang/rust/pull/64420/commits/a2261ad66400c3145f96ebff0d9b75e910fa89dd).
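
To make the hot and cold call sites concrete, here is a small sketch of the same split applied to a function with a return value. The checksum logic and all names are hypothetical, and whether the split pays off in a real program should be confirmed by measurement.

```rust
// Always-inlined variant, intended for the hot call site.
#[inline(always)]
fn inlined_checksum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}

// Never-inlined variant, intended for the cold call sites.
#[inline(never)]
fn uninlined_checksum(data: &[u8]) -> u32 {
    inlined_checksum(data)
}

// Hot call site: called once per chunk in a tight loop.
fn checksum_all(chunks: &[&[u8]]) -> u32 {
    chunks.iter().fold(0u32, |acc, chunk| acc ^ inlined_checksum(chunk))
}

// Cold call site: only reached on a rare error path.
fn checksum_on_error(data: &[u8]) -> u32 {
    uninlined_checksum(data)
}
```

Both variants compute the same result, so call sites can be moved from one to the other freely as profiling data changes.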
102 | 103 | -------------------------------------------------------------------------------- /src/inlining_zh.md: -------------------------------------------------------------------------------- 1 | # Inlining 2 | 3 | 调用和退出热函数、未内联的函数往往占执行时间的一小部分。内联这些函数可以提供小而简单的速度优势。 4 | 5 | 有四个内联属性可以用于Rust函数。 6 | - **None**. 编译器会自己决定是否应该内联函数。这将取决于优化级别、函数的大小等。如果你没有使用链接时间优化,函数永远不会跨箱子内联。 7 | - **`#[inline]`**. 这表明该函数应该内嵌,包括跨越crate边界。 8 | - **`#[inline(always)]`**. 这强烈建议该函数应该内嵌,包括跨越crate边界。 9 | - **`#[inline(never)]`**. 这强烈表示该函数不应被内联。 10 | 11 | 内联属性并不保证函数是否会被内联,但实际上 `#[inline(always)]` 会导致内联,除非在极端情况下。 12 | 13 | 内联不具有传递性。如果函数 `f` 调用函数 `g`,并且您希望这两个函数在调用 `f` 的地方一起内联,那么这两个函数都应该标记为内联属性。 14 | 15 | ## Simple Cases 16 | 17 | 最适合内联的是(a)非常小的函数,或者(b)只有一个调用点的函数。编译器通常会自己内联这些函数,即使没有内联属性。但是编译器不可能总是做出最好的选择,所以有时需要属性。 18 | [**Example 1**](https://github.com/rust-lang/rust/pull/37083/commits/6a4bb35b70862f33ac2491ffe6c55fb210c8490d), 19 | [**Example 2**](https://github.com/rust-lang/rust/pull/50407/commits/e740b97be699c9445b8a1a7af6348ca2d4c460ce), 20 | [**Example 3**](https://github.com/rust-lang/rust/pull/50564/commits/77c40f8c6f8cc472f6438f7724d60bf3b7718a0c), 21 | [**Example 4**](https://github.com/rust-lang/rust/pull/57719/commits/92fd6f9d30d0b6b4ecbcf01534809fb66393f139), 22 | [**Example 5**](https://github.com/rust-lang/rust/pull/69256/commits/e761f3af904b3c275bdebc73bb29ffc45384945d). 23 | 24 | Cachegrind是一个很好的判断函数是否被内联的剖析器。当查看Cachegrind的输出时,如果(也只有当)函数的第一行和最后一行没有*标记事件数,你就可以判断该函数被内联了。 25 | 例如 26 | ```text 27 | . #[inline(always)] 28 | . fn inlined(x: u32, y: u32) -> u32 { 29 | 700,000 eprintln!("inlined: {} + {}", x, y); 30 | 200,000 x + y 31 | . } 32 | . 33 | . 
#[inline(never)] 34 | 400,000 fn not_inlined(x: u32, y: u32) -> u32 { 35 | 700,000 eprintln!("not_inlined: {} + {}", x, y); 36 | 200,000 x + y 37 | 200,000 } 38 | ``` 39 | 添加内联属性后你应该再测一次,因为效果可能是不可预知的。有时它没有效果,因为附近一个之前内联的函数不再内联了。有时会拖慢代码的速度。内联也会影响编译时间,特别是交叉速率内联,它涉及到重复函数的内部表示。 40 | 41 | ## Harder Cases 42 | 43 | 有时候,你有一个函数很大,有多个调用站点,但只有一个调用点是热调用点。你希望内联热调用点以提高速度,但不内联冷调用点以避免不必要的代码膨胀。处理的方法是将函数分成总是内联和从不内联的部分,后者调用前者。 44 | 45 | 例如,这个函数。 46 | ```rust 47 | # fn one() {}; 48 | # fn two() {}; 49 | # fn three() {}; 50 | fn my_function() { 51 | one(); 52 | two(); 53 | three(); 54 | } 55 | ``` 56 | 应该修改为如下函数 57 | ```rust 58 | # fn one() {}; 59 | # fn two() {}; 60 | # fn three() {}; 61 | // Use this at the hot call site. 62 | #[inline(always)] 63 | fn inlined_my_function() { 64 | one(); 65 | two(); 66 | three(); 67 | } 68 | 69 | // Use this at the cold call sites. 70 | #[inline(never)] 71 | fn uninlined_my_function() { 72 | inlined_my_function(); 73 | } 74 | ``` 75 | [**Example 1**](https://github.com/rust-lang/rust/pull/53513/commits/b73843f9422fb487b2d26ac2d65f79f73a4c9ae3), 76 | [**Example 2**](https://github.com/rust-lang/rust/pull/64420/commits/a2261ad66400c3145f96ebff0d9b75e910fa89dd). 77 | 78 | -------------------------------------------------------------------------------- /src/introduction.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | Performance is important for many Rust programs. 4 | 5 | This book contains techniques that can improve the performance-related 6 | characteristics of Rust programs, such as runtime speed, memory usage, and 7 | binary size. The [Compile Times] section also contains techniques that will 8 | improve the compile times of Rust programs. Some techniques only require 9 | changing build configurations, but many require changing code. 
10 | 11 | [Compile Times]: compile-times.md 12 | 13 | Some techniques are entirely Rust-specific, and some involve ideas that can be 14 | applied (often with modifications) to programs written in other languages. The 15 | [General Tips] section also includes some general principles that apply to any 16 | programming language. Nonetheless, this book is mostly about the performance of 17 | Rust programs and is no substitute for a general purpose guide to profiling and 18 | optimization. 19 | 20 | [General Tips]: general-tips.md 21 | 22 | This book also focuses on techniques that are practical and proven: many are 23 | accompanied by links to pull requests or other resources that show how the 24 | technique was used on a real-world Rust program. It reflects the primary 25 | author's background, being somewhat biased towards compiler development and 26 | away from other areas such as scientific computing. 27 | 28 | This book is deliberately terse, favouring breadth over depth, so that it is 29 | quick to read. It links to external sources that provide more depth when 30 | appropriate. 31 | 32 | This book is aimed at intermediate and advanced Rust users. Beginner Rust users 33 | have more than enough to learn and these techniques are likely to be an 34 | unhelpful distraction to them. 
35 | -------------------------------------------------------------------------------- /src/introduction_zh.md: -------------------------------------------------------------------------------- 1 | # 简介 2 | 3 | 性能对许多Rust程序来说都很重要。 4 | 5 | 本书包含了许多可以提高Rust程序的性能-速度和内存使用率的技术,其中[编译时间]部分也包含了一些可以提高Rust程序编译时间的技术。编译时间]部分也包含了一些可以改善Rust程序编译时间的技术。本书的一些技术只需要改变构建配置,但许多技术需要改变代码。 6 | 7 | [编译时间]: compile-times_zh.md 8 | 9 | 一些技术完全是 Rust 特有的,而一些涉及的思想可以应用于其他编程语言编写的程序(通常需要进行修改)。[General Tips] 部分还包括适用于任何编程语言的一些一般原则。尽管如此,这本书主要关注 Rust 程序的性能,不能替代一本关于分析和优化的通用指南。 10 | 11 | [General Tips]: general-tips_zh.md 12 | 13 | 本书侧重于实用且经过验证的技术:许多技术都附有指向拉取请求或其他资源的链接,展示了这些技术在真实的 Rust 程序中的应用。这反映了主要作者的背景,有点偏向编译器开发,而不太涉及其他领域,如科学计算。 14 | 15 | 本书有意简洁,更注重广度而非深度,使其阅读起来迅速。在适当的情况下,它会链接到提供更深入信息的外部资源。 16 | 17 | 本书面向中级和高级 Rust 用户。初学者 Rust 用户有很多东西要学习,这些技术可能会对他们造成不必要的干扰。 -------------------------------------------------------------------------------- /src/io.md: -------------------------------------------------------------------------------- 1 | # I/O 2 | 3 | ## Locking 4 | 5 | Rust's [`print!`] and [`println!`] macros lock stdout on every call. If you 6 | have repeated calls to these macros it may be better to lock stdout manually. 7 | 8 | [`print!`]: https://doc.rust-lang.org/std/macro.print.html 9 | [`println!`]: https://doc.rust-lang.org/std/macro.println.html 10 | 11 | For example, change this code: 12 | ```rust 13 | # let lines = vec!["one", "two", "three"]; 14 | for line in lines { 15 | println!("{}", line); 16 | } 17 | ``` 18 | to this: 19 | ```rust 20 | # fn blah() -> Result<(), std::io::Error> { 21 | # let lines = vec!["one", "two", "three"]; 22 | use std::io::Write; 23 | let mut stdout = std::io::stdout(); 24 | let mut lock = stdout.lock(); 25 | for line in lines { 26 | writeln!(lock, "{}", line)?; 27 | } 28 | // stdout is unlocked when `lock` is dropped 29 | # Ok(()) 30 | # } 31 | ``` 32 | stdin and stderr can likewise be locked when doing repeated operations on them. 
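
As a sketch of the same idea applied to stderr, the following hypothetical function `report_all` takes the lock once and writes every message through it.

```rust
use std::io::{self, Write};

/// Writes each message to stderr on its own line, locking stderr only once.
/// (`report_all` is a hypothetical name used for illustration.)
fn report_all(messages: &[&str]) -> io::Result<()> {
    let stderr = io::stderr();
    let mut lock = stderr.lock();
    for message in messages {
        writeln!(lock, "{}", message)?;
    }
    // stderr is unlocked when `lock` is dropped at the end of the function.
    Ok(())
}
```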
33 | 34 | ## Buffering 35 | 36 | Rust file I/O is unbuffered by default. If you have many small and repeated 37 | read or write calls to a file or network socket, use [`BufReader`] or 38 | [`BufWriter`]. They maintain an in-memory buffer for input and output, 39 | minimizing the number of system calls required. 40 | 41 | [`BufReader`]: https://doc.rust-lang.org/std/io/struct.BufReader.html 42 | [`BufWriter`]: https://doc.rust-lang.org/std/io/struct.BufWriter.html 43 | 44 | For example, change this unbuffered writer code: 45 | ```rust 46 | # fn blah() -> Result<(), std::io::Error> { 47 | # let lines = vec!["one", "two", "three"]; 48 | use std::io::Write; 49 | let mut out = std::fs::File::create("test.txt")?; 50 | for line in lines { 51 | writeln!(out, "{}", line)?; 52 | } 53 | # Ok(()) 54 | # } 55 | ``` 56 | to this: 57 | ```rust 58 | # fn blah() -> Result<(), std::io::Error> { 59 | # let lines = vec!["one", "two", "three"]; 60 | use std::io::{BufWriter, Write}; 61 | let mut out = BufWriter::new(std::fs::File::create("test.txt")?); 62 | for line in lines { 63 | writeln!(out, "{}", line)?; 64 | } 65 | out.flush()?; 66 | # Ok(()) 67 | # } 68 | ``` 69 | [**Example 1**](https://github.com/rust-lang/rust/pull/93954), 70 | [**Example 2**](https://github.com/nnethercote/dhat-rs/pull/22/commits/8c3ae26f1219474ee55c30bc9981e6af2e869be2). 71 | 72 | The explicit call to [`flush`] is not strictly necessary, as flushing will 73 | happen automatically when `out` is dropped. However, in that case any error 74 | that occurs on flushing will be ignored, whereas an explicit flush will make 75 | that error explicit. 76 | 77 | [`flush`]: https://doc.rust-lang.org/std/io/trait.Write.html#tymethod.flush 78 | 79 | Forgetting to buffer is more common when writing. Both unbuffered and buffered 80 | writers implement the [`Write`] trait, which means the code for writing 81 | to an unbuffered writer and a buffered writer is much the same. 
In contrast, 82 | unbuffered readers implement the [`Read`] trait but buffered readers implement 83 | the [`BufRead`] trait, which means the code for reading from an unbuffered reader 84 | and a buffered reader is different. For example, it is difficult to read a file 85 | line by line with an unbuffered reader, but it is trivial with a buffered 86 | reader by using [`BufRead::read_line`] or [`BufRead::lines`]. For this reason, 87 | it is hard to write an example for readers like the one above for writers, 88 | where the before and after versions are so similar. 89 | 90 | [`Write`]: https://doc.rust-lang.org/std/io/trait.Write.html 91 | [`Read`]: https://doc.rust-lang.org/std/io/trait.Read.html 92 | [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html 93 | [`BufRead::read_line`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_line 94 | [`BufRead::lines`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.lines 95 | 96 | Finally, note that buffering also works with stdout, so you might want to 97 | combine manual locking *and* buffering when making many writes to stdout. 98 | 99 | ## Reading Lines from a File 100 | 101 | [This section] explains how to avoid excessive allocations when using 102 | [`BufRead`] to read a file one line at a time. 103 | 104 | [This section]: heap-allocations.md#reading-lines-from-a-file 105 | [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html 106 | 107 | ## Reading Input as Raw Bytes 108 | 109 | The built-in [String] type uses UTF-8 internally, which adds a small, but 110 | nonzero overhead caused by UTF-8 validation when you read input into it. If you 111 | just want to process input bytes without worrying about UTF-8 (for example if 112 | you handle ASCII text), you can use [`BufRead::read_until`]. 
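A rough sketch of byte-oriented line reading with `read_until` might look like the following. The function name `count_lines_and_bytes` is hypothetical; the point is only that the data lands in a reusable `Vec<u8>` and is never validated as UTF-8.

```rust
use std::io::{self, BufRead};

/// Counts lines and total bytes by reading raw bytes, with no UTF-8
/// validation of the input.
fn count_lines_and_bytes<R: BufRead>(mut reader: R) -> io::Result<(usize, usize)> {
    let mut buf = Vec::new(); // reused for every line
    let mut lines = 0;
    let mut bytes = 0;
    loop {
        // `read_until` appends to the buffer, so clear it first.
        buf.clear();
        // Reads up to and including the b'\n' delimiter; returns 0 at EOF.
        let n = reader.read_until(b'\n', &mut buf)?;
        if n == 0 {
            break;
        }
        lines += 1;
        bytes += n;
    }
    Ok((lines, bytes))
}
```

Because byte slices implement `BufRead`, this can be tried on in-memory data such as `&b"ab\ncd\xff\n"[..]`, which contains a byte that is not valid UTF-8.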
113 | 114 | [String]: https://doc.rust-lang.org/std/string/struct.String.html 115 | [`BufRead::read_until`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_until 116 | 117 | There are also dedicated crates for reading [byte-oriented lines of data] 118 | and working with [byte strings]. 119 | 120 | [byte-oriented lines of data]: https://github.com/Freaky/rust-linereader 121 | [byte strings]: https://github.com/BurntSushi/bstr 122 | -------------------------------------------------------------------------------- /src/io_zh.md: -------------------------------------------------------------------------------- 1 | # I/O 2 | 3 | ## Locking 4 | 5 | Rust的[`print!`]和[`println!`]宏在每次调用时锁定stdout。如果你需要重复调用这些宏,最好手动锁定stdout。 6 | 7 | [`print!`]: https://doc.rust-lang.org/std/macro.print.html 8 | [`println!`]: https://doc.rust-lang.org/std/macro.println.html 9 | 10 | 例如,修改这段代码。 11 | ```rust 12 | # let lines = vec!["one", "two", "three"]; 13 | for line in lines { 14 | println!("{}", line); 15 | } 16 | ``` 17 | to this: 18 | ```rust 19 | # fn blah() -> Result<(), std::io::Error> { 20 | # let lines = vec!["one", "two", "three"]; 21 | use std::io::Write; 22 | let mut stdout = std::io::stdout(); 23 | let mut lock = stdout.lock(); 24 | for line in lines { 25 | writeln!(lock, "{}", line)?; 26 | } 27 | // stdout is unlocked when `lock` is dropped 28 | # Ok(()) 29 | # } 30 | ``` 31 | 当对stdin和stderr进行重复操作时,同样可以锁定它们。 32 | 33 | ## 缓冲 34 | 35 | Rust文件I/O默认是无缓冲的。如果你对文件或网络套接字有许多小的和重复的读写调用,使用[`BufReader`]或[`BufWriter`]。它们为输入和输出维护了一个内存缓冲区,最大限度地减少了系统调用的次数。 36 | 37 | [`BufReader`]: https://doc.rust-lang.org/std/io/struct.BufReader.html 38 | [`BufWriter`]: https://doc.rust-lang.org/std/io/struct.BufWriter.html 39 | 40 | 例如,修改这个无缓冲的输出代码。 41 | ```rust 42 | # fn blah() -> Result<(), std::io::Error> { 43 | # let lines = vec!["one", "two", "three"]; 44 | use std::io::Write; 45 | let mut out = std::fs::File::create("test.txt").unwrap(); 46 | for line in lines { 47 | writeln!(out, "{}", 
line)?; 48 | } 49 | # Ok(()) 50 | # } 51 | ``` 52 | 修改为: 53 | ```rust 54 | # fn blah() -> Result<(), std::io::Error> { 55 | # let lines = vec!["one", "two", "three"]; 56 | use std::io::{BufWriter, Write}; 57 | let mut out = std::fs::File::create("test.txt")?; 58 | let mut buf = BufWriter::new(out); 59 | for line in lines { 60 | writeln!(buf, "{}", line)?; 61 | } 62 | buf.flush()?; 63 | # Ok(()) 64 | # } 65 | ``` 66 | 对[`flush`]的显式调用并不是绝对必要的,因为当`buf`被丢弃时,刷新将自动发生。然而,在这种情况下,刷新时发生的任何错误都将被忽略,而显式刷新将使该错误显式化。 67 | 68 | [`flush`]: https://doc.rust-lang.org/std/io/trait.Write.html#tymethod.flush 69 | 70 | 在编写时忘记使用缓冲区是比较常见的。无缓冲和有缓冲的写入器都实现了 [Write] trait,这意味着向无缓冲写入器和有缓冲写入器写入的代码基本相同。相比之下,无缓冲读取器实现了 [Read] trait,但有缓冲读取器实现了 [BufRead] trait,这意味着从无缓冲读取器和有缓冲读取器读取的代码是不同的。例如,使用无缓冲读取器逐行读取文件是困难的,但使用有缓冲读取器通过 [BufRead::read_line] 或 [BufRead::lines] 则很简单。因此,对于读取器来说,很难像对写入器那样编写一个示例,其中之前和之后的版本是如此相似。 71 | 72 | [`Write`]: https://doc.rust-lang.org/std/io/trait.Write.html 73 | [`Read`]: https://doc.rust-lang.org/std/io/trait.Read.html 74 | [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html 75 | [`BufRead::read_line`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_line 76 | [`BufRead::lines`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.lines 77 | 78 | 最后,请注意,缓冲也适用于标准输出(stdout),因此当向标准输出频繁写入时,您可能希望结合手动锁定和缓冲。 79 | 80 | ## Reading Lines from a File 81 | 82 | [这一部分]解释了如何在使用 [BufRead] 逐行读取文件时避免过多的内存分配。 83 | 84 | [这一部分]: heap-allocations_zh.md#reading-lines-from-a-file 85 | [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html 86 | 87 | ## Reading Input as Raw Bytes 88 | 89 | 内置的[String]类型在内部使用UTF-8,当你读取输入到string类型时,会增加一个由UTF-8验证引起的小但非零的开销。如果你只想处理输入字节而不担心UTF-8(例如,如果你处理ASCII文本),你可以使用[`BufRead::read_until`]。 90 | 91 | [String]: https://doc.rust-lang.org/std/string/struct.String.html 92 | [`BufRead::read_until`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_until 93 | 94 | 还有专门的箱子用于读取[面向字节的数据行]和处理[byte strings]。 95 | 96 | 
[面向字节的数据行]: https://github.com/Freaky/rust-linereader 97 | [byte strings]: https://github.com/BurntSushi/bstr 98 | -------------------------------------------------------------------------------- /src/iterators.md: -------------------------------------------------------------------------------- 1 | # Iterators 2 | 3 | ## `collect` and `extend` 4 | 5 | [`Iterator::collect`] converts an iterator into a collection such as `Vec`, 6 | which typically requires an allocation. You should avoid calling `collect` if 7 | the collection is then only iterated over again. 8 | 9 | [`Iterator::collect`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.collect 10 | 11 | For this reason, it is often better to return an iterator type like `impl 12 | Iterator` from a function than a `Vec`. Note that sometimes 13 | additional lifetimes are required on these return types, as [this blog post] 14 | explains. 15 | [**Example**](https://github.com/rust-lang/rust/pull/77990/commits/660d8a6550a126797aa66a417137e39a5639451b). 16 | 17 | [this blog post]: https://blog.katona.me/2019/12/29/Rust-Lifetimes-and-Iterators/ 18 | 19 | Similarly, you can use [`extend`] to extend an existing collection (such as a 20 | `Vec`) with an iterator, rather than collecting the iterator into a `Vec` and 21 | then using [`append`]. 22 | 23 | [`extend`]: https://doc.rust-lang.org/std/iter/trait.Extend.html#tymethod.extend 24 | [`append`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.append 25 | 26 | Finally, when you write an iterator it is often worth implementing the 27 | [`Iterator::size_hint`] or [`ExactSizeIterator::len`] method, if possible. 28 | `collect` and `extend` calls that use the iterator may then do fewer 29 | allocations, because they have advance information about the number of elements 30 | yielded by the iterator. 
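To illustrate the last point, here is a small sketch of an iterator that reports its exact length. The `Countdown` type is hypothetical and not taken from any of the linked examples.

```rust
/// A hypothetical iterator that yields `remaining`, `remaining - 1`, ..., 1.
struct Countdown {
    remaining: usize,
}

impl Iterator for Countdown {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.remaining == 0 {
            return None;
        }
        let current = self.remaining;
        self.remaining -= 1;
        Some(current)
    }

    // The exact number of remaining items is known, so the lower and upper
    // bounds are the same.
    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.remaining, Some(self.remaining))
    }
}

// With an exact `size_hint`, the default `len` method is correct as-is.
impl ExactSizeIterator for Countdown {}
```

With this in place, a call such as `Countdown { remaining: 3 }.collect::<Vec<_>>()` has the information it needs to reserve space up front, although whether that happens is up to the collection's implementation.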
31 | 32 | [`Iterator::size_hint`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.size_hint 33 | [`ExactSizeIterator::len`]: https://doc.rust-lang.org/std/iter/trait.ExactSizeIterator.html#method.len 34 | 35 | ## Chaining 36 | 37 | [`chain`] can be very convenient, but it can also be slower than a single 38 | iterator. It may be worth avoiding for hot iterators, if possible. 39 | [**Example**](https://github.com/rust-lang/rust/pull/64801/commits/5ca99b750e455e9b5e13e83d0d7886486231e48a). 40 | 41 | Similarly, [`filter_map`] may be faster than using [`filter`] followed by 42 | [`map`]. 43 | 44 | [`chain`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.chain 45 | [`filter_map`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter_map 46 | [`filter`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter 47 | [`map`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.map 48 | 49 | ## Chunks 50 | 51 | When a chunking iterator is required and the chunk size is known to exactly 52 | divide the slice length, use the faster [`slice::chunks_exact`] instead of [`slice::chunks`]. 53 | 54 | When the chunk size is not known to exactly divide the slice length, it can 55 | still be faster to use `slice::chunks_exact` in combination with either 56 | [`ChunksExact::remainder`] or manual handling of excess elements. 57 | [**Example 1**](https://github.com/johannesvollmer/exrs/pull/173/files), 58 | [**Example 2**](https://github.com/johannesvollmer/exrs/pull/175/files). 59 | 60 | The same is true for related iterators: 61 | - [`slice::rchunks`], [`slice::rchunks_exact`], and [`RChunksExact::remainder`]; 62 | - [`slice::chunks_mut`], [`slice::chunks_exact_mut`], and [`ChunksExactMut::into_remainder`]; 63 | - [`slice::rchunks_mut`], [`slice::rchunks_exact_mut`], and [`RChunksExactMut::into_remainder`]. 
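The combination of `chunks_exact` and `remainder` can be sketched as follows. The function `sum_in_chunks` and the chunk size of 4 are assumptions chosen for illustration; whether this is faster than `chunks` for a given workload should be confirmed by measurement.

```rust
/// Sums a slice four elements at a time, then handles the leftover elements.
fn sum_in_chunks(data: &[u32]) -> u32 {
    let mut chunks = data.chunks_exact(4);
    let mut total = 0;
    // `by_ref` keeps `chunks` usable after the loop so that `remainder` can
    // still be called on it.
    for chunk in chunks.by_ref() {
        // Every chunk here has exactly 4 elements.
        total += chunk[0] + chunk[1] + chunk[2] + chunk[3];
    }
    // The 0 to 3 elements that did not fill a whole chunk.
    for &x in chunks.remainder() {
        total += x;
    }
    total
}
```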
64 | 65 | [`slice::chunks`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks 66 | [`slice::chunks_exact`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_exact 67 | [`ChunksExact::remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.ChunksExact.html#method.remainder 68 | 69 | [`slice::rchunks`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks 70 | [`slice::rchunks_exact`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_exact 71 | [`RChunksExact::remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.RChunksExact.html#method.remainder 72 | 73 | [`slice::chunks_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_mut 74 | [`slice::chunks_exact_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_exact_mut 75 | [`ChunksExactMut::into_remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.ChunksExactMut.html#method.into_remainder 76 | 77 | [`slice::rchunks_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_mut 78 | [`slice::rchunks_exact_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_exact_mut 79 | [`RChunksExactMut::into_remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.RChunksExactMut.html#method.into_remainder 80 | 81 | -------------------------------------------------------------------------------- /src/iterators_zh.md: -------------------------------------------------------------------------------- 1 | # Iterators 2 | 3 | ## `collect` 4 | 5 | [`Iterator::collect`]将一个迭代器转换为一个集合,如`Vec`,它通常需要一个分配。如果该集合只是再次迭代,你应该避免调用`collect`。 6 | 7 | [`Iterator::collect`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.collect 8 | 9 | 出于这个原因,从函数中返回一个迭代器类型,比如`impl Iterator`,往往比`Vec`更好。请注意,有时这些返回类型需要额外的生存期,正如[this post]所解释的那样。 10 | 
[**Example**](https://github.com/rust-lang/rust/pull/77990/commits/660d8a6550a126797aa66a417137e39a5639451b). 11 | 12 | [this post]: https://blog.katona.me/2019/12/29/Rust-Lifetimes-and-Iterators/ 13 | 14 | 同样,你可以使用[`extend`]用迭代器扩展一个现有的集合(如`Vec`),而不是将迭代器收集到`Vec`中,然后使用[`append`]。 15 | 16 | [`extend`]: https://doc.rust-lang.org/std/iter/trait.Extend.html#tymethod.extend 17 | [`append`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.append 18 | 19 | 最后,当编写迭代器时,如果可能的话,实现 [`Iterator::size_hint`] 或 [`ExactSizeIterator::len`] 方法通常是值得的。使用该迭代器的 `collect` 和 `extend` 调用可能会减少分配,因为它们提前了解迭代器产生的元素数量信息。 20 | 21 | [`Iterator::size_hint`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.size_hint 22 | [`ExactSizeIterator::len`]: https://doc.rust-lang.org/std/iter/trait.ExactSizeIterator.html#method.len 23 | 24 | ## Chaining 25 | 26 | [`chain`]可以非常方便,但也可能比单个迭代器慢。如果可能的话,热迭代器可能值得避免。 27 | [**Example**](https://github.com/rust-lang/rust/pull/64801/commits/5ca99b750e455e9b5e13e83d0d7886486231e48a). 28 | 29 | 类似地,[`filter_map`]可能比使用[`filter`]和[`map`]更快。 30 | 31 | [`chain`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.chain 32 | [`filter_map`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter_map 33 | [`filter`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter 34 | [`map`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.map 35 | 36 | ## Chunks 37 | 38 | 当需要一个分块迭代器,并且已知分块大小恰好能整除切片长度时,应该使用更快的 [`slice::chunks_exact`] 而不是 [`slice::chunks`]。 39 | 40 | 当分块大小不确定能否恰好整除切片长度时,仍然可以更快地使用 `slice::chunks_exact`,结合 [`ChunksExact::remainder`] 或手动处理多余元素。 41 | [**Example 1**](https://github.com/johannesvollmer/exrs/pull/173/files), 42 | [**Example 2**](https://github.com/johannesvollmer/exrs/pull/175/files). 
43 | 44 | 同样适用于相关的迭代器: 45 | - [`slice::rchunks`], [`slice::rchunks_exact`], and [`RChunksExact::remainder`]; 46 | - [`slice::chunks_mut`], [`slice::chunks_exact_mut`], and [`ChunksExactMut::into_remainder`]; 47 | - [`slice::rchunks_mut`], [`slice::rchunks_exact_mut`], and [`RChunksExactMut::into_remainder`]. 48 | 49 | [`slice::chunks`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks 50 | [`slice::chunks_exact`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_exact 51 | [`ChunksExact::remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.ChunksExact.html#method.remainder 52 | 53 | [`slice::rchunks`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks 54 | [`slice::rchunks_exact`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_exact 55 | [`RChunksExact::remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.RChunksExact.html#method.remainder 56 | 57 | [`slice::chunks_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_mut 58 | [`slice::chunks_exact_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_exact_mut 59 | [`ChunksExactMut::into_remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.ChunksExactMut.html#method.into_remainder 60 | 61 | [`slice::rchunks_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_mut 62 | [`slice::rchunks_exact_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_exact_mut 63 | [`RChunksExactMut::into_remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.RChunksExactMut.html#method.into_remainder 64 | 65 | -------------------------------------------------------------------------------- /src/linting.md: -------------------------------------------------------------------------------- 1 | # Linting 2 | 3 | [Clippy] is a collection of lints to catch common mistakes in Rust code. 
It is 4 | an excellent tool to run on Rust code in general. It can also help with 5 | performance, because a number of the lints relate to code patterns that can 6 | cause sub-optimal performance. 7 | 8 | Given that automated detection of problems is preferable to manual detection, 9 | the rest of this book will not mention performance problems that Clippy detects 10 | by default. 11 | 12 | ## Basics 13 | 14 | [Clippy]: https://github.com/rust-lang/rust-clippy 15 | 16 | Once installed, it is easy to run: 17 | ```text 18 | cargo clippy 19 | ``` 20 | The full list of performance lints can be seen by visiting the [lint list] and 21 | deselecting all the lint groups except for "Perf". 22 | 23 | [lint list]: https://rust-lang.github.io/rust-clippy/master/ 24 | 25 | As well as making the code faster, the performance lint suggestions usually 26 | result in code that is simpler and more idiomatic, so they are worth following 27 | even for code that is not executed frequently. 28 | 29 | Conversely, some non-performance lint suggestions can improve performance. For 30 | example, the [`ptr_arg`] style lint suggests changing various container 31 | arguments to slices, such as changing `&mut Vec<T>` arguments to `&mut [T]`. 32 | The primary motivation here is that a slice gives a more flexible API, but it 33 | may also result in faster code due to less indirection and better optimization 34 | opportunities for the compiler. 35 | [**Example**](https://github.com/fschutt/fastblur/pull/3/files). 36 | 37 | [`ptr_arg`]: https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg 38 | 39 | ## Disallowing Types 40 | 41 | In the following chapters we will see that it is sometimes worth avoiding 42 | certain standard library types in favour of alternatives that are faster. If 43 | you decide to use these alternatives, it is easy to use the standard library 44 | types in some places by mistake.
45 | 46 | You can use Clippy's [`disallowed_types`] lint to avoid this problem. For 47 | example, to disallow the use of the standard hash tables (for reasons explained 48 | in the [Hashing] section) add a `clippy.toml` file to your code with the 49 | following line. 50 | ```toml 51 | disallowed-types = ["std::collections::HashMap", "std::collections::HashSet"] 52 | ``` 53 | 54 | [Hashing]: hashing.md 55 | [`disallowed_types`]: https://rust-lang.github.io/rust-clippy/master/index.html#disallowed_types 56 | -------------------------------------------------------------------------------- /src/linting_zh.md: -------------------------------------------------------------------------------- 1 | # Linting 2 | 3 | [Clippy]是一个用来捕捉Rust代码中常见错误的lints集合。它是运行在Rust代码上的一个优秀工具。它还可以帮助提高性能,因为许多lints与可能导致次优性能的代码模式有关。 4 | 5 | [Clippy]: https://github.com/rust-lang/rust-clippy 6 | 7 | 鉴于自动检测问题优于手动检测问题,本书的其余部分将不会提及 Clippy 默认检测到的性能问题。 8 | 9 | ## 基础 10 | 11 | Clippy 是一个 Rust 的 lint 工具,用于静态代码分析。安装后,可以通过以下命令轻松运行: 12 | 13 | ```text 14 | cargo clippy 15 | ``` 16 | 17 | 可以通过访问 [lint list] 并取消选择除了 "Perf" 之外的所有 lint 组,查看完整的性能 lint 列表。 18 | 19 | [lint list]: https://rust-lang.github.io/rust-clippy/master/ 20 | 21 | 除了使代码更快之外,性能 lint 建议通常会导致更简单、更符合惯例的代码,因此即使对于不经常执行的代码,也值得遵循这些建议。 22 | 23 | 相反,一些非性能 lint 建议可能会提高性能。例如,[`ptr_arg`] 风格 lint 建议将各种容器参数更改为切片,例如将 `&mut Vec` 参数更改为 `&mut [T]`。这里的主要动机是切片提供了更灵活的 API,但也可能由于减少间接性和为编译器提供更好的优化机会而导致更快的代码。 24 | [**Example**](https://github.com/fschutt/fastblur/pull/3/files). 
25 | 26 | [`ptr_arg`]: https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg 27 | 28 | ## Disallowing Types 29 | 30 | 在接下来的章节中,我们将看到有时候值得避免使用某些标准库类型,而选择更快的替代方案。如果你决定使用这些替代方案,很容易在某些地方意外地错误使用标准库类型。 31 | 32 | 你可以使用 Clippy 的 [`disallowed_types`] lint 来避免这个问题。例如,为了禁止使用标准哈希表(原因在 [Hashing] 部分有解释),可以在你的代码中添加一个 `clippy.toml` 文件,并包含以下行。 33 | 34 | ```toml 35 | disallowed-types = ["std::collections::HashMap", "std::collections::HashSet"] 36 | ``` 37 | 38 | [Hashing]: hashing.md 39 | [`disallowed_types`]: https://rust-lang.github.io/rust-clippy/master/index.html#disallowed_types -------------------------------------------------------------------------------- /src/logging-and-debugging.md: -------------------------------------------------------------------------------- 1 | # Logging and Debugging 2 | 3 | Sometimes logging code or debugging code can slow down a program significantly. 4 | Either the logging/debugging code itself is slow, or data collection code that 5 | feeds into logging/debugging code is slow. Make sure that no unnecessary work 6 | is done for logging/debugging purposes when logging/debugging is not enabled. 7 | [**Example 1**](https://github.com/rust-lang/rust/pull/50246/commits/2e4f66a86f7baa5644d18bb2adc07a8cd1c7409d), 8 | [**Example 2**](https://github.com/rust-lang/rust/pull/75133/commits/eeb4b83289e09956e0dda174047729ca87c709fe). 9 | 10 | Note that [`assert!`] calls always run, but [`debug_assert!`] calls only run in 11 | dev builds. If you have an assertion that is hot but is not necessary for 12 | safety, consider making it a `debug_assert!`. 13 | [**Example 1**](https://github.com/rust-lang/rust/pull/58210/commits/f7ed6e18160bc8fccf27a73c05f3935c9e8f672e), 14 | [**Example 2**](https://github.com/rust-lang/rust/pull/90746/commits/580d357b5adef605fc731d295ca53ab8532e26fb). 
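Both ideas can be sketched in a few lines. The names `midpoint` and `log_with` are hypothetical, and the `enabled` flag stands in for whatever mechanism a real program uses to switch logging on and off.

```rust
/// Hypothetical hot function: returns the midpoint of the range `lo..hi`.
fn midpoint(lo: usize, hi: usize) -> usize {
    // Checked only when debug assertions are enabled, which is the default
    // for dev builds but not for release builds.
    debug_assert!(lo <= hi, "lo must not exceed hi");
    lo + (hi - lo) / 2
}

/// Hypothetical logging helper: the message is built only if logging is on,
/// so a disabled log call costs little more than a branch.
fn log_with<F: FnOnce() -> String>(enabled: bool, make_message: F) {
    if enabled {
        eprintln!("{}", make_message());
    }
}
```

Passing a closure means that any expensive formatting or data collection inside it is skipped entirely when `enabled` is false.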
15 | 16 | [`assert!`]: https://doc.rust-lang.org/std/macro.assert.html 17 | [`debug_assert!`]: https://doc.rust-lang.org/std/macro.debug_assert.html 18 | -------------------------------------------------------------------------------- /src/logging-and-debugging_zh.md: -------------------------------------------------------------------------------- 1 | # Logging and Debugging 2 | 3 | 有时,日志代码或调试代码会大大降低程序的速度。要么是日志记录/调试代码本身很慢,要么是反馈到日志记录/调试代码的数据收集代码很慢。确保在不启用日志记录/调试时,不为日志记录/调试目的做不必要的工作。 4 | [**Example 1**](https://github.com/rust-lang/rust/pull/50246/commits/2e4f66a86f7baa5644d18bb2adc07a8cd1c7409d), 5 | [**Example 2**](https://github.com/rust-lang/rust/pull/75133/commits/eeb4b83289e09956e0dda174047729ca87c709fe). 6 | 7 | 请注意,[`assert!`]调用总是运行,但[`debug_assert!`]调用只在调试构建中运行。如果你有一个热的断言,但对安全来说不是必需的,可以考虑把它变成一个`debug_assert!`。 8 | [**Example**](https://github.com/rust-lang/rust/pull/58210/commits/f7ed6e18160bc8fccf27a73c05f3935c9e8f672e). 9 | 10 | [`assert!`]: https://doc.rust-lang.org/std/macro.assert.html 11 | [`debug_assert!`]: https://doc.rust-lang.org/std/macro.debug_assert.html 12 | -------------------------------------------------------------------------------- /src/machine-code.md: -------------------------------------------------------------------------------- 1 | # Machine Code 2 | 3 | When you have a small piece of very hot code it may be worth inspecting the 4 | generated machine code to see if it has any inefficiencies, such as removable 5 | [bounds checks]. The [Compiler Explorer] website is an excellent resource when 6 | doing this on small snippets. [`cargo-show-asm`] is an alternative tool that 7 | can be used on full Rust projects. 8 | 9 | [bounds checks]: bounds-checks.md 10 | [Compiler Explorer]: https://godbolt.org/ 11 | [`cargo-show-asm`]: https://github.com/pacak/cargo-show-asm 12 | 13 | Relatedly, the [`core::arch`] module provides access to architecture-specific 14 | intrinsics, many of which relate to SIMD instructions. 
15 | 16 | [`core::arch`]: https://doc.rust-lang.org/core/arch/index.html 17 | -------------------------------------------------------------------------------- /src/machine-code_zh.md: -------------------------------------------------------------------------------- 1 | # Machine Code 2 | 3 | When you have a small piece of very hot code it may be worth inspecting the generated machine code to see if it has any inefficiencies, such as removable [bounds checks]. The [Compiler Explorer] website is an excellent resource when doing this on small snippets. [`cargo-show-asm`] is an alternative tool that can be used on full Rust projects. 4 | 5 | [bounds checks]: https://en.wikipedia.org/wiki/Bounds_checking 6 | [Compiler Explorer]: https://godbolt.org/ 7 | [`cargo-show-asm`]: https://github.com/pacak/cargo-show-asm 8 | 9 | Relatedly, the [`core::arch`] module provides access to architecture-specific intrinsics, many of which relate to SIMD instructions. 10 | 11 | [`core::arch`]: https://doc.rust-lang.org/core/arch/index.html 12 | 13 | 14 | -------------------------------------------------------------------------------- /src/parallelism.md: -------------------------------------------------------------------------------- 1 | # Parallelism 2 | 3 | Rust provides excellent support for safe parallel programming, which can lead 4 | to large performance improvements. There are a variety of ways to introduce 5 | parallelism into a program and the best way for any program will depend greatly 6 | on its design. 7 | 8 | An in-depth treatment of parallelism is beyond the scope of this book. If you 9 | are interested in this topic, the documentation for the [`rayon`] and 10 | [`crossbeam`] crates is a good place to start.
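For a flavour of what safe parallelism looks like, here is a small sketch that uses only the standard library's scoped threads rather than either of the crates mentioned above. The function `parallel_sum` and its splitting strategy are assumptions made for illustration; a real program would need profiling to decide whether the thread overhead pays off.

```rust
use std::thread;

/// Sums a slice by splitting it into at most `parts` pieces and summing each
/// piece on its own thread.
fn parallel_sum(data: &[u64], parts: usize) -> u64 {
    let parts = parts.max(1);
    // Round up so that at most `parts` chunks are produced; never zero.
    let chunk_len = ((data.len() + parts - 1) / parts).max(1);
    thread::scope(|s| {
        // Collect the handles first so that all threads are spawned before
        // any of them is joined.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}
```

The scope guarantees that every spawned thread finishes before `thread::scope` returns, which is what lets the threads borrow `data` without any `unsafe` code.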
11 | 12 | [`rayon`]: https://crates.io/crates/rayon 13 | [`crossbeam`]: https://crates.io/crates/crossbeam 14 | 15 | -------------------------------------------------------------------------------- /src/parallelism_zh.md: -------------------------------------------------------------------------------- 1 | # Parallelism 2 | 3 | Rust provides excellent support for safe parallel programming, which can lead to large performance improvements. There are a variety of ways to introduce parallelism into a program and the best way for any program will depend greatly on its design. 4 | 5 | An in-depth treatment of parallelism is beyond the scope of this book. If you are interested in this topic, the documentation for the [`rayon`] and [`crossbeam`] crates is a good place to start. 6 | 7 | [`rayon`]: https://crates.io/crates/rayon 8 | [`crossbeam`]: https://crates.io/crates/crossbeam 9 | 10 | -------------------------------------------------------------------------------- /src/profiling.md: -------------------------------------------------------------------------------- 1 | # Profiling 2 | 3 | When optimizing a program, you also need a way to determine which parts of the 4 | program are "hot" (executed frequently enough to affect runtime) and worth 5 | modifying. This is best done via profiling. 6 | 7 | ## Profilers 8 | 9 | There are many different profilers available, each with its own strengths and 10 | weaknesses. The following is an incomplete list of profilers that have been 11 | used successfully on Rust programs. 12 | - [perf] is a general-purpose profiler that uses hardware performance counters. 13 | [Hotspot] and [Firefox Profiler] are good for viewing data recorded by perf. 14 | perf works on Linux. 15 | - [Instruments] is a general-purpose profiler that comes with Xcode on macOS. 16 | - [Intel VTune Profiler] is a general-purpose profiler. It works on Windows, 17 | Linux, and macOS. 18 | - [AMD μProf] is a general-purpose profiler. It works on Windows and Linux. 19 | - [samply] is a sampling profiler that produces profiles that can be viewed 20 | in the Firefox Profiler. It works on macOS and Linux. 21 | - [flamegraph] is a Cargo command that uses perf/DTrace to profile your 22 | code and then displays the results in a flame graph.
It works on Linux and 23 | all platforms that support DTrace (macOS, FreeBSD, NetBSD, and possibly 24 | Windows). 25 | - [Cachegrind] & [Callgrind] give global, per-function, and per-source-line 26 | instruction counts and simulated cache and branch prediction data. They work 27 | on Linux and some other Unixes. 28 | - [DHAT] is good for finding which parts of the code are causing a lot of 29 | allocations, and for giving insight into peak memory usage. It can also be 30 | used to identify hot calls to `memcpy`. It works on Linux and some other 31 | Unixes. [dhat-rs] is an experimental alternative that is a little less 32 | powerful and requires minor changes to your Rust program, but works on all 33 | platforms. 34 | - [heaptrack] and [bytehound] are heap profiling tools. They work on Linux. 35 | - [`counts`] supports ad hoc profiling, which combines the use of `eprintln!` 36 | statement with frequency-based post-processing, which is good for getting 37 | domain-specific insights into parts of your code. It works on all platforms. 38 | - [Coz] performs *causal profiling* to measure optimization potential, and has 39 | Rust support via [coz-rs]. It works on Linux. 
40 | 41 | [perf]: https://perf.wiki.kernel.org/index.php/Main_Page 42 | [Hotspot]: https://github.com/KDAB/hotspot 43 | [Firefox Profiler]: https://profiler.firefox.com/ 44 | [Instruments]: https://developer.apple.com/forums/tags/instruments 45 | [Intel VTune Profiler]: https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html 46 | [AMD μProf]: https://developer.amd.com/amd-uprof/ 47 | [samply]: https://github.com/mstange/samply/ 48 | [flamegraph]: https://github.com/flamegraph-rs/flamegraph 49 | [Cachegrind]: https://www.valgrind.org/docs/manual/cg-manual.html 50 | [Callgrind]: https://www.valgrind.org/docs/manual/cl-manual.html 51 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html 52 | [dhat-rs]: https://github.com/nnethercote/dhat-rs/ 53 | [heaptrack]: https://github.com/KDE/heaptrack 54 | [bytehound]: https://github.com/koute/bytehound 55 | [`counts`]: https://github.com/nnethercote/counts/ 56 | [Coz]: https://github.com/plasma-umass/coz 57 | [coz-rs]: https://github.com/plasma-umass/coz/tree/master/rust 58 | 59 | ## Debug Info 60 | 61 | To profile a release build effectively you might need to enable source line 62 | debug info. To do this, add the following lines to your `Cargo.toml` file: 63 | ```toml 64 | [profile.release] 65 | debug = 1 66 | ``` 67 | See the [Cargo documentation] for more details about the `debug` setting. 68 | 69 | [Cargo documentation]: https://doc.rust-lang.org/cargo/reference/profiles.html#debug 70 | 71 | Unfortunately, even after doing the above step you won't get detailed profiling 72 | information for standard library code. This is because shipped versions of the 73 | Rust standard library are not built with debug info. 
74 | 75 | The most reliable way around this is to build your own version of the compiler 76 | and standard library, following [these instructions], and adding the following 77 | lines to the `config.toml` file: 78 | ```toml 79 | [rust] 80 | debuginfo-level = 1 81 | ``` 82 | This is a hassle, but may be worth the effort in some cases. 83 | 84 | [these instructions]: https://github.com/rust-lang/rust 85 | 86 | Alternatively, the unstable [build-std] feature lets you compile the standard 87 | library as part of your program's normal compilation, with the same build 88 | configuration. However, filenames present in the debug info for the standard 89 | library will not point to source code files, because this feature does not also 90 | download standard library source code. So this approach will not help with 91 | profilers such as Cachegrind and Samply that require source code to work fully. 92 | 93 | [build-std]: https://doc.rust-lang.org/cargo/reference/unstable.html#build-std 94 | 95 | ## Symbol Demangling 96 | 97 | Rust uses a form of name mangling to encode function names in compiled code. If 98 | a profiler is unaware of this, its output may contain symbol names beginning 99 | with `_ZN` or `_R`, such as `_ZN3foo3barE` or 100 | `_ZN28_$u7b$$u7b$closure$u7d$$u7d$E` or 101 | `_RMCsno73SFvQKx_1cINtB0_3StrKRe616263_E` 102 | 103 | Names like these can be manually demangled using [`rustfilt`]. 104 | 105 | [`rustfilt`]: https://crates.io/crates/rustfilt 106 | 107 | If you are having trouble with symbol demangling while profiling, it may be 108 | worth changing the [mangling format] from the default legacy format to the newer 109 | v0 format. 110 | 111 | [mangling format]: https://doc.rust-lang.org/rustc/codegen-options/index.html#symbol-mangling-version 112 | 113 | To use the v0 format from the command line, use the `-C 114 | symbol-mangling-version=v0` flag. 
For example: 115 | ```bash 116 | RUSTFLAGS="-C symbol-mangling-version=v0" cargo build --release 117 | ``` 118 | 119 | Alternatively, to request these instructions from a [`config.toml`] file (for 120 | one or more projects), add these lines: 121 | ```toml 122 | [build] 123 | rustflags = ["-C", "symbol-mangling-version=v0"] 124 | ``` 125 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html 126 | 127 | -------------------------------------------------------------------------------- /src/profiling_zh.md: -------------------------------------------------------------------------------- 1 | # Profiling 2 | 3 | 在优化程序时,你还需要一种方法来确定程序的哪些部分是 "热 "的(执行频率足以影响运行时间),值得修改。这一点最好通过Profiling来完成。 4 | 5 | ## Profilers 6 | 7 | 有许多不同的性能分析工具可供选择,每种工具都有其优势和劣势。以下是一份未完整的性能分析工具列表,这些工具已成功用于 Rust 程序。 8 | - [perf] 是一个使用硬件性能计数器的通用性能分析工具。[Hotspot] 和 [Firefox Profiler] 适用于查看 perf 记录的数据。它适用于 Linux。 9 | - [Instruments] 是一个随 macOS 上的 Xcode 提供的通用性能分析工具。 10 | - [Intel VTune Profiler] 是一个通用性能分析工具。它适用于 Windows、Linux 和 macOS。 11 | - [AMD μProf] 是一个通用性能分析工具。它适用于 Windows 和 Linux。 12 | - [samply] 是一个采样性能分析工具,生成的分析结果可以在 Firefox Profiler 中查看。它适用于 Mac 和 Linux。 13 | - [flamegraph] 是一个 Cargo 命令,使用 perf/DTrace 对代码进行性能分析,然后在火焰图中显示结果。它适用于 Linux 和支持 DTrace 的所有平台(macOS、FreeBSD、NetBSD,可能还包括 Windows)。 14 | - [Cachegrind] 和 [Callgrind] 提供全局、每个函数和每个源代码行的指令计数以及模拟缓存和分支预测数据。它们适用于 Linux 和其他一些 Unix 系统。 15 | - [DHAT] 适用于找出代码中导致大量分配的部分,并提供有关峰值内存使用情况的见解。它还可以用于识别对 `memcpy` 的热调用。它适用于 Linux 和其他一些 Unix 系统。[dhat-rs] 是一个实验性的替代方案,功能稍弱,需要对 Rust 程序进行轻微更改,但适用于所有平台。 16 | - [heaptrack] 和 [bytehound] 是堆分析工具。它们适用于 Linux。 17 | - [`counts`] 支持临时性能分析,结合 `eprintln!` 语句和基于频率的后处理,适用于获取代码部分的领域特定见解。它适用于所有平台。 18 | - [Coz] 执行*因果分析*以测量优化潜力,并通过 [coz-rs] 支持 Rust。它适用于 Linux。 19 | - 20 | [perf]: https://perf.wiki.kernel.org/index.php/Main_Page 21 | [Hotspot]: https://github.com/KDAB/hotspot 22 | [Firefox Profiler]: https://profiler.firefox.com/ 23 | [Cachegrind]: https://www.valgrind.org/docs/manual/cg-manual.html 24 | [Callgrind]: 
https://www.valgrind.org/docs/manual/cl-manual.html 25 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html 26 | [heaptrack]: https://github.com/KDE/heaptrack 27 | [`counts`]: https://github.com/nnethercote/counts/ 28 | [Coz]: https://github.com/plasma-umass/coz 29 | [coz-rs]: https://github.com/plasma-umass/coz/tree/master/rust 30 | [flamegraph]: https://github.com/flamegraph-rs/flamegraph 31 | 32 | 为了有效地对发布版本进行性能分析,你可能需要启用源代码行调试信息。要做到这一点,在你的 `Cargo.toml` 文件中添加以下行: 33 | ```toml 34 | [profile.release] 35 | debug = 1 36 | ``` 37 | 查看 [Cargo 文档] 以获取有关 `debug` 设置的更多详细信息。 38 | 39 | [Cargo 文档]: https://doc.rust-lang.org/cargo/reference/profiles.html#debug 40 | 41 | 不幸的是,即使执行了上述步骤,你也无法获得标准库代码的详细性能分析信息。这是因为 Rust 标准库的发布版本未使用调试信息构建。 42 | 43 | 最可靠的解决方法是构建自己的编译器和标准库版本,遵循 [这些说明],并在 `config.toml` 文件中添加以下行: 44 | ```toml 45 | [rust] 46 | debuginfo-level = 1 47 | ``` 48 | 这可能有些麻烦,但在某些情况下值得努力。 49 | 50 | [这些说明]: https://github.com/rust-lang/rust 51 | 52 | 另外,不稳定的 [build-std] 功能允许你将标准库作为程序正常编译的一部分进行编译,使用相同的构建配置。然而,标准库调试信息中存在的文件名将不指向源代码文件,因为此功能不会下载标准库源代码。因此,这种方法对于像 Cachegrind 和 Samply 这样需要源代码才能完全工作的性能分析工具并不适用。 53 | 54 | [build-std]: https://doc.rust-lang.org/cargo/reference/unstable.html#build-std 55 | 56 | ## Symbol Demangling 57 | 58 | Rust在编译代码中使用了一个杂乱的方案来编码函数名。如果一个剖析器不知道这个方案,它的输出可能会包含像这样的符号名 59 | `_ZN3foo3barE`或`_ZN28_$u7b$$u7b$closure$u7d$$u7d$E`或`_ZN28_$u7b$$$u7d$E`或 60 | `_ZN88_$LT$core.result.Result$LT$$u21$$C$$u20$E$GT$u20$as$u20$std.process.Termination$GT$6report17hfc41d0da4a40b3e8E`。 61 | 像这样的名字,可以用[`rustfilt`]手动拆分。 62 | 63 | [`rustfilt`]: https://crates.io/crates/rustfilt 64 | 65 | 如果在进行性能分析时遇到符号解缠混淆的问题,可能值得将 [编码格式] 从默认的传统格式更改为更新的 v0 格式。 66 | 67 | [编码格式]: https://doc.rust-lang.org/rustc/codegen-options/index.html#symbol-mangling-version 68 | 69 | 要从命令行中使用 v0 格式,可以使用 `-C symbol-mangling-version=v0` 标志。例如: 70 | ```bash 71 | RUSTFLAGS="-C symbol-mangling-version=v0" cargo build --release 72 | ``` 73 | 74 | 另外,要从 [`config.toml`] 文件(针对一个或多个项目)请求这些说明,添加以下行: 75 | ```toml 
76 | [build] 77 | rustflags = ["-C", "symbol-mangling-version=v0"] 78 | ``` 79 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html -------------------------------------------------------------------------------- /src/standard-library-types.md: -------------------------------------------------------------------------------- 1 | # Standard Library Types 2 | 3 | It is worth reading through the documentation for common standard library 4 | types—such as [`Box`], [`Vec`], [`Option`], [`Result`], and [`Rc`]/[`Arc`]—to find interesting 5 | functions that can sometimes be used to improve performance. 6 | 7 | [`Box`]: https://doc.rust-lang.org/std/boxed/struct.Box.html 8 | [`Vec`]: https://doc.rust-lang.org/std/vec/struct.Vec.html 9 | [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html 10 | [`Result`]: https://doc.rust-lang.org/std/result/enum.Result.html 11 | [`Rc`]: https://doc.rust-lang.org/std/rc/struct.Rc.html 12 | [`Arc`]: https://doc.rust-lang.org/std/sync/struct.Arc.html 13 | 14 | It is also worth knowing about high-performance alternatives to standard 15 | library types, such as [`Mutex`], [`RwLock`], [`Condvar`], and 16 | [`Once`]. 17 | 18 | [`Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html 19 | [`RwLock`]: https://doc.rust-lang.org/std/sync/struct.RwLock.html 20 | [`Condvar`]: https://doc.rust-lang.org/std/sync/struct.Condvar.html 21 | [`Once`]: https://doc.rust-lang.org/std/sync/struct.Once.html 22 | 23 | ## `Box` 24 | 25 | The expression [`Box::default()`] has the same effect as 26 | `Box::new(T::default())` but can be faster because the compiler can create the 27 | value directly on the heap, rather than constructing it on the stack and then 28 | copying it over. 29 | [**Example**](https://github.com/komora-io/art/commit/d5dc58338f475709c375e15976d0d77eb5d7f7ef). 
30 | 31 | [`Box::default()`]: https://doc.rust-lang.org/std/boxed/struct.Box.html#method.default 32 | 33 | ## `Vec` 34 | 35 | The best way to create a zero-filled `Vec` of length `n` is with `vec![0; n]`. 36 | This is simple and probably [as fast or faster] than alternatives, such as 37 | using `resize`, `extend`, or anything involving `unsafe`, because it can use OS 38 | assistance. 39 | 40 | [as fast or faster]: https://github.com/rust-lang/rust/issues/54628 41 | 42 | [`Vec::remove`] removes an element at a particular index and shifts all 43 | subsequent elements one to the left, which makes it O(n). [`Vec::swap_remove`] 44 | replaces an element at a particular index with the final element, which does 45 | not preserve ordering, but is O(1). 46 | 47 | [`Vec::retain`] efficiently removes multiple items from a `Vec`. There is an 48 | equivalent method for other collection types such as `String`, `HashSet`, and 49 | `HashMap`. 50 | 51 | [`Vec::remove`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.remove 52 | [`Vec::swap_remove`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.swap_remove 53 | [`Vec::retain`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain 54 | 55 | ## `Option` and `Result` 56 | 57 | [`Option::ok_or`] converts an `Option` into a `Result`, and is passed an `err` 58 | parameter that is used if the `Option` value is `None`. `err` is computed 59 | eagerly. If its computation is expensive, you should instead use 60 | [`Option::ok_or_else`], which computes the error value lazily via a closure. 
61 | For example, this: 62 | ```rust 63 | # fn expensive() {} 64 | # let o: Option<u32> = None; 65 | let r = o.ok_or(expensive()); // always evaluates `expensive()` 66 | ``` 67 | should be changed to this: 68 | ```rust 69 | # fn expensive() {} 70 | # let o: Option<u32> = None; 71 | let r = o.ok_or_else(|| expensive()); // evaluates `expensive()` only when needed 72 | ``` 73 | [**Example**](https://github.com/rust-lang/rust/pull/50051/commits/5070dea2366104fb0b5c344ce7f2a5cf8af176b0). 74 | 75 | [`Option::ok_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.ok_or 76 | [`Option::ok_or_else`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.ok_or_else 77 | 78 | There are similar alternatives for [`Option::map_or`], [`Option::unwrap_or`], 79 | [`Result::or`], [`Result::map_or`], and [`Result::unwrap_or`]. 80 | 81 | [`Option::map_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.map_or 82 | [`Option::unwrap_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.unwrap_or 83 | [`Result::or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.or 84 | [`Result::map_or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.map_or 85 | [`Result::unwrap_or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.unwrap_or 86 | 87 | ## `Rc`/`Arc` 88 | 89 | [`Rc::make_mut`]/[`Arc::make_mut`] provide clone-on-write semantics. They return 90 | a mutable reference to the value inside an `Rc`/`Arc`. If the refcount is greater than one, they 91 | will `clone` the inner value to ensure unique ownership; otherwise, the reference 92 | points at the original value. They are not needed often, but they can be extremely 93 | useful on occasion. 94 | [**Example 1**](https://github.com/rust-lang/rust/pull/65198/commits/3832a634d3aa6a7c60448906e6656a22f7e35628), 95 | [**Example 2**](https://github.com/rust-lang/rust/pull/65198/commits/75e0078a1703448a19e25eac85daaa5a4e6e68ac).
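
The clone-on-write behaviour of `Rc::make_mut` can be seen in a minimal sketch. `push_cow` is a hypothetical helper introduced only for illustration; the same pattern applies to `Arc::make_mut`.

```rust
use std::rc::Rc;

/// Appends `x` to the vector behind `v`, cloning the vector first only if
/// another `Rc` currently shares it.
fn push_cow(v: &mut Rc<Vec<u32>>, x: u32) {
    Rc::make_mut(v).push(x);
}

fn main() {
    let mut a = Rc::new(vec![1, 2, 3]);

    // Refcount is one: the vector is modified in place, no clone.
    push_cow(&mut a, 4);

    // Refcount becomes two: the next write must not disturb `b`.
    let b = Rc::clone(&a);
    push_cow(&mut a, 5);

    assert_eq!(*a, vec![1, 2, 3, 4, 5]);
    assert_eq!(*b, vec![1, 2, 3, 4]);
    // `a` now points at its own fresh allocation.
    assert!(!Rc::ptr_eq(&a, &b));
}
```

The clone happens only on the write that would otherwise be observed through another handle, which is why this pattern can pay off when sharing is common but mutation of shared values is rare.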
96 | 97 | [`Rc::make_mut`]: https://doc.rust-lang.org/std/rc/struct.Rc.html#method.make_mut 98 | [`Arc::make_mut`]: https://doc.rust-lang.org/std/sync/struct.Arc.html#method.make_mut 99 | 100 | ## `Mutex`, `RwLock`, `Condvar`, and `Once` 101 | 102 | The [`parking_lot`] crate provides alternative implementations of these 103 | synchronization types. The APIs and semantics of the `parking_lot` types are 104 | similar but not identical to those of the equivalent types in the standard 105 | library. 106 | 107 | The `parking_lot` versions used to be reliably smaller, faster, and more 108 | flexible than those in the standard library, but the standard library versions 109 | have greatly improved on some platforms. So you should measure before switching 110 | to `parking_lot`. 111 | 112 | [`parking_lot`]: https://crates.io/crates/parking_lot 113 | 114 | If you decide to universally use the `parking_lot` types it is easy to 115 | accidentally use the standard library equivalents in some places. You can [use 116 | Clippy] to avoid this problem. 
117 | 118 | [use Clippy]: linting.md#disallowing-types 119 | -------------------------------------------------------------------------------- /src/standard-library-types_zh.md: -------------------------------------------------------------------------------- 1 | # Standard Library Types 2 | 3 | 值得阅读常见标准库类型的文档--如[`Vec`]、[`Option`]、[`Result`]和[`Rc`]--以找到有趣的函数,有时可以用来提高性能。 4 | 5 | [`Vec`]: https://doc.rust-lang.org/std/vec/struct.Vec.html 6 | [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html 7 | [`Result`]: https://doc.rust-lang.org/std/result/enum.Result.html 8 | [`Rc`]: https://doc.rust-lang.org/std/rc/struct.Rc.html 9 | 10 | 还值得了解标准库类型的高性能替代品,如[`Mutex`]、[`RwLock`]、[`Condvar`]和[`Once`]。 11 | 12 | [`Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html 13 | [`RwLock`]: https://doc.rust-lang.org/std/sync/struct.RwLock.html 14 | [`Condvar`]: https://doc.rust-lang.org/std/sync/struct.Condvar.html 15 | [`Once`]: https://doc.rust-lang.org/std/sync/struct.Once.html 16 | 17 | ## `Box` 18 | 表达式 [`Box::default()`] 的效果与 `Box::new(T::default())` 相同,但可能更快,因为编译器可以直接在堆上创建值,而不是在堆栈上构造值然后复制它。 19 | [**示例**](https://github.com/komora-io/art/commit/d5dc58338f475709c375e15976d0d77eb5d7f7ef)。 20 | 21 | [`Box::default()`]: https://doc.rust-lang.org/std/boxed/struct.Box.html#method.default 22 | 23 | ## `Vec` 24 | 25 | 创建长度为 `n` 的零填充 `Vec` 的最佳方法是使用 `vec![0; n]`。这种方法简单且可能比其他方法更快,比如使用 `resize`、`extend` 或涉及 `unsafe` 的任何操作,因为它可以利用操作系统的帮助。 26 | 27 | [`Vec::remove`] 会移除特定索引处的元素,并将所有后续元素向左移动一个位置,这使得它的时间复杂度为 O(n)。[`Vec::swap_remove`] 会用最后一个元素替换特定索引处的元素,这不会保留顺序,但时间复杂度为 O(1)。 28 | 29 | [`Vec::retain`] 可以高效地从 `Vec` 中移除多个项。其他集合类型如 `String`、`HashSet` 和 `HashMap` 也有类似的方法。 30 | 31 | [`Vec::remove`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.remove 32 | [`Vec::swap_remove`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.swap_remove 33 | [`Vec::retain`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain 34 | 35 | ## `Option` and `Result` 36 | 37 | 
[`Option::ok_or']将`Option'转换为`Result',并传递一个`err'参数,如果`Option'值为`None',则使用该参数。`err`是急于计算的。如果它的计算很昂贵,你应该使用[`Option::ok_or_else`],它通过一个闭包缓慢地计算错误值。 38 | 例如,这个。 39 | ```rust 40 | # fn expensive() {} 41 | # let o: Option = None; 42 | let r = o.ok_or(expensive()); // always evaluates `expensive()` 43 | ``` 44 | should be changed to this: 45 | ```rust 46 | # fn expensive() {} 47 | # let o: Option = None; 48 | let r = o.ok_or_else(|| expensive()); // evaluates `expensive()` only when needed 49 | ``` 50 | [**Example**](https://github.com/rust-lang/rust/pull/50051/commits/5070dea2366104fb0b5c344ce7f2a5cf8af176b0). 51 | 52 | [`Option::ok_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.ok_or 53 | [`Option::ok_or_else`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.ok_or_else 54 | 55 | [`Option::map_or`]、[`Option::unwrap_or`]、[`Result::or`]、[`Result::map_or`]和[`Result::unwrap_or`]有类似的替代方案。 56 | 57 | [`Option::map_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.map_or 58 | [`Option::unwrap_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.unwrap_or 59 | [`Result::or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.or 60 | [`Result::map_or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.map_or 61 | [`Result::unwrap_or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.unwrap_or 62 | 63 | ## `Rc`/`Arc` 64 | 65 | [`Rc::make_mut`]/[`Arc::make_mut`]提供了clone-on-write语义。它对`Rc`做了一个可改变的引用。如果refcount大于1,它将`clone`内部值以确保唯一的所有权;否则,它将修改原始值。它不经常需要,但偶尔会非常有用。 66 | [**Example 1**](https://github.com/rust-lang/rust/pull/65198/commits/3832a634d3aa6a7c60448906e6656a22f7e35628), 67 | [**Example 2**](https://github.com/rust-lang/rust/pull/65198/commits/75e0078a1703448a19e25eac85daaa5a4e6e68ac). 
68 | 69 | [`Rc::make_mut`]: https://doc.rust-lang.org/std/rc/struct.Rc.html#method.make_mut 70 | [`Arc::make_mut`]: https://doc.rust-lang.org/std/sync/struct.Arc.html#method.make_mut 71 | 72 | ## `Mutex`, `RwLock`, `Condvar`, and `Once` 73 | 74 | [`parking_lot`] crate提供了这些同步类型的替代实现。`parking_lot` 类型的API和语义与标准库中等效类型的类似但并非完全相同。 75 | 76 | 过去,`parking_lot` 版本通常比标准库中的版本更小、更快、更灵活,但在某些平台上,标准库版本已经有了很大改进。因此,在切换到 `parking_lot` 之前,您应该进行测量。 77 | 78 | [`parking_lot`]: https://crates.io/crates/parking_lot 79 | 80 | 如果决定普遍使用 `parking_lot` 类型,很容易在某些地方意外地使用标准库的等效类型。您可以使用 [Clippy] 来避免这个问题。 81 | 82 | [Clippy]: linting.md#disallowing-types 83 | -------------------------------------------------------------------------------- /src/title-page.md: -------------------------------------------------------------------------------- 1 | # RUST性能手册 2 | 3 | **First published in November 2020** 4 | 5 | **Written by Nicholas Nethercote and others** 6 | 7 | **Chinese translated by Blues-star** 8 | -------------------------------------------------------------------------------- /src/type-sizes.md: -------------------------------------------------------------------------------- 1 | # Type Sizes 2 | 3 | Shrinking oft-instantiated types can help performance. 4 | 5 | For example, if memory usage is high, a heap profiler like [DHAT] can identify 6 | the hot allocation points and the types involved. Shrinking these types can 7 | reduce peak memory usage, and possibly improve performance by reducing memory 8 | traffic and cache pressure. 9 | 10 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html 11 | 12 | Furthermore, Rust types that are larger than 128 bytes are copied with `memcpy` 13 | rather than inline code. If `memcpy` shows up in non-trivial amounts in 14 | profiles, DHAT's "copy profiling" mode will tell you exactly where the hot 15 | `memcpy` calls are and the types involved. 
Shrinking these types to 128 bytes 16 | or less can make the code faster by avoiding `memcpy` calls and reducing memory 17 | traffic. 18 | 19 | ## Measuring Type Sizes 20 | 21 | [`std::mem::size_of`] gives the size of a type, in bytes, but often you want to 22 | know the exact layout as well. For example, an enum might be surprisingly large 23 | due to a single outsized variant. 24 | 25 | [`std::mem::size_of`]: https://doc.rust-lang.org/std/mem/fn.size_of.html 26 | 27 | The `-Zprint-type-sizes` option does exactly this. It isn’t enabled on release 28 | versions of rustc, so you’ll need to use a nightly version of rustc. Here is 29 | one possible invocation via Cargo: 30 | ```text 31 | RUSTFLAGS=-Zprint-type-sizes cargo +nightly build --release 32 | ``` 33 | And here is a possible invocation of rustc: 34 | ```text 35 | rustc +nightly -Zprint-type-sizes input.rs 36 | ``` 37 | It will print out details of the size, layout, and alignment of all types in 38 | use. For example, for this type: 39 | ```rust 40 | enum E { 41 | A, 42 | B(i32), 43 | C(u64, u8, u64, u8), 44 | D(Vec<u64>), 45 | } 46 | ``` 47 | it prints the following, plus information about a few built-in types. 48 | ```text 49 | print-type-size type: `E`: 32 bytes, alignment: 8 bytes 50 | print-type-size discriminant: 1 bytes 51 | print-type-size variant `D`: 31 bytes 52 | print-type-size padding: 7 bytes 53 | print-type-size field `.0`: 24 bytes, alignment: 8 bytes 54 | print-type-size variant `C`: 23 bytes 55 | print-type-size field `.1`: 1 bytes 56 | print-type-size field `.3`: 1 bytes 57 | print-type-size padding: 5 bytes 58 | print-type-size field `.0`: 8 bytes, alignment: 8 bytes 59 | print-type-size field `.2`: 8 bytes 60 | print-type-size variant `B`: 7 bytes 61 | print-type-size padding: 3 bytes 62 | print-type-size field `.0`: 4 bytes, alignment: 4 bytes 63 | print-type-size variant `A`: 0 bytes 64 | ``` 65 | The output shows the following. 66 | - The size and alignment of the type.
67 | - For enums, the size of the discriminant. 68 | - For enums, the size of each variant (sorted from largest to smallest). 69 | - The size, alignment, and ordering of all fields. (Note that the compiler has 70 | reordered variant `C`'s fields to minimize the size of `E`.) 71 | - The size and location of all padding. 72 | 73 | Alternatively, the [top-type-sizes] crate can be used to display the output in 74 | a more compact form. 75 | 76 | [top-type-sizes]: https://crates.io/crates/top-type-sizes 77 | 78 | Once you know the layout of a hot type, there are multiple ways to shrink it. 79 | 80 | ## Field Ordering 81 | 82 | The Rust compiler automatically sorts the fields in structs and enums to 83 | minimize their sizes (unless the `#[repr(C)]` attribute is specified), so you 84 | do not have to worry about field ordering. But there are other ways to minimize 85 | the size of hot types. 86 | 87 | ## Smaller Enums 88 | 89 | If an enum has an outsized variant, consider boxing one or more fields. For 90 | example, you could change this type: 91 | ```rust 92 | type LargeType = [u8; 100]; 93 | enum A { 94 | X, 95 | Y(i32), 96 | Z(i32, LargeType), 97 | } 98 | ``` 99 | to this: 100 | ```rust 101 | # type LargeType = [u8; 100]; 102 | enum A { 103 | X, 104 | Y(i32), 105 | Z(Box<(i32, LargeType)>), 106 | } 107 | ``` 108 | This reduces the type size at the cost of requiring an extra heap allocation 109 | for the `A::Z` variant. This is more likely to be a net performance win if the 110 | `A::Z` variant is relatively rare. The `Box` will also make `A::Z` slightly 111 | less ergonomic to use, especially in `match` patterns.
112 | [**Example 1**](https://github.com/rust-lang/rust/pull/37445/commits/a920e355ea837a950b484b5791051337cd371f5d), 113 | [**Example 2**](https://github.com/rust-lang/rust/pull/55346/commits/38d9277a77e982e49df07725b62b21c423b6428e), 114 | [**Example 3**](https://github.com/rust-lang/rust/pull/64302/commits/b972ac818c98373b6d045956b049dc34932c41be), 115 | [**Example 4**](https://github.com/rust-lang/rust/pull/64374/commits/2fcd870711ce267c79408ec631f7eba8e0afcdf6), 116 | [**Example 5**](https://github.com/rust-lang/rust/pull/64394/commits/7f0637da5144c7435e88ea3805021882f077d50c), 117 | [**Example 6**](https://github.com/rust-lang/rust/pull/71942/commits/27ae2f0d60d9201133e1f9ec7a04c05c8e55e665). 118 | 119 | ## Smaller Integers 120 | 121 | It is often possible to shrink types by using smaller integer types. For 122 | example, while it is most natural to use `usize` for indices, it is often 123 | reasonable to store indices as `u32`, `u16`, or even `u8`, and then cast to 124 | `usize` at use points. 125 | [**Example 1**](https://github.com/rust-lang/rust/pull/49993/commits/4d34bfd00a57f8a8bdb60ec3f908c5d4256f8a9a), 126 | [**Example 2**](https://github.com/rust-lang/rust/pull/50981/commits/8d0fad5d3832c6c1f14542ea0be038274e454524). 127 | 128 | ## Boxed Slices 129 | 130 | Rust vectors contain three words: a length, a capacity, and a pointer. If you 131 | have a vector that is unlikely to be changed in the future, you can convert it 132 | to a *boxed slice* with [`Vec::into_boxed_slice`]. A boxed slice contains only 133 | two words, a length and a pointer. Any excess element capacity is dropped, 134 | which may cause a reallocation.
135 | ```rust 136 | # use std::mem::{size_of, size_of_val}; 137 | let v: Vec<u32> = vec![1, 2, 3]; 138 | assert_eq!(size_of_val(&v), 3 * size_of::<usize>()); 139 | 140 | let bs: Box<[u32]> = v.into_boxed_slice(); 141 | assert_eq!(size_of_val(&bs), 2 * size_of::<usize>()); 142 | ``` 143 | The boxed slice can be converted back to a vector with [`slice::into_vec`] 144 | without any cloning or reallocation. 145 | 146 | [`Vec::into_boxed_slice`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.into_boxed_slice 147 | [`slice::into_vec`]: https://doc.rust-lang.org/std/primitive.slice.html#method.into_vec 148 | 149 | ## `ThinVec` 150 | 151 | An alternative to boxed slices is `ThinVec`, from the [`thin_vec`] crate. It is 152 | functionally equivalent to `Vec`, but stores the length and capacity in the 153 | same allocation as the elements (if there are any). This means that 154 | `size_of::<ThinVec<T>>` is only one word. 155 | 156 | `ThinVec` is a good choice within oft-instantiated types for vectors that are 157 | often empty. It can also be used to shrink the largest variant of an enum, if 158 | that variant contains a `Vec`. 159 | 160 | [`thin_vec`]: https://crates.io/crates/thin-vec 161 | 162 | ## Avoiding Regressions 163 | 164 | If a type is hot enough that its size can affect performance, it is a good idea 165 | to use a static assertion to ensure that it does not accidentally regress. The 166 | following example uses a macro from the [`static_assertions`] crate. 167 | ```rust,ignore 168 | // This type is used a lot. Make sure it doesn't unintentionally get bigger. 169 | #[cfg(target_arch = "x86_64")] 170 | static_assertions::assert_eq_size!(HotType, [u8; 64]); 171 | ``` 172 | The `cfg` attribute is important, because type sizes can vary on different 173 | platforms. Restricting the assertion to `x86_64` (which is typically the most 174 | widely-used platform) is likely to be good enough to prevent regressions in 175 | practice.
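
If adding a dependency is undesirable, a similar compile-time check can be sketched using only the standard library. This is an alternative to the `static_assertions` macro above, not a description of how that macro is implemented. The layout of `HotType` and the helper `hot_type_size` are hypothetical and exist only for illustration.

```rust
use std::mem::size_of;

/// Hypothetical hot type: 8 + 48 + 4 bytes of fields, padded to 64 bytes
/// on platforms where `u64` has 8-byte alignment.
#[allow(dead_code)]
struct HotType {
    id: u64,
    payload: [u8; 48],
    flags: u32,
}

// Compilation fails on x86_64 if `HotType` ever stops being 64 bytes.
#[cfg(target_arch = "x86_64")]
const _: () = assert!(size_of::<HotType>() == 64);

/// Reports the current size, e.g. for logging or a unit test.
fn hot_type_size() -> usize {
    size_of::<HotType>()
}

fn main() {
    println!("HotType is {} bytes", hot_type_size());
}
```

Because the check runs during constant evaluation, an accidental size change is caught at build time, not in a benchmark run later.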
176 | 177 | [`static_assertions`]: https://crates.io/crates/static_assertions 178 | -------------------------------------------------------------------------------- /src/type-sizes_zh.md: -------------------------------------------------------------------------------- 1 | # 类型大小 2 | 3 | 缩小经常实例化的类型可以提高性能。 4 | 5 | 例如,如果内存使用量很高,像 [DHAT] 这样的堆分析器可以识别热点分配点和涉及的类型。缩小这些类型可以减少峰值内存使用量,并通过减少内存流量和缓存压力可能改善性能。 6 | 7 | 此外,Rust 中大于 128 字节的类型会使用 `memcpy` 进行复制,而不是内联代码。如果在性能分析中出现大量 `memcpy`,DHAT 的 "copy profiling" 模式将告诉您热点 `memcpy` 调用的确切位置和涉及的类型。将这些类型缩小到 128 字节或更小可以通过避免 `memcpy` 调用和减少内存流量使代码更快。 8 | 9 | ## 测量类型大小 10 | 11 | [`std::mem::size_of`]给出了一个类型的大小,以字节为单位,但通常你也想知道确切的布局。例如,一个枚举可能会出乎意料的大,这可能是由一个超大的变体造成的。 12 | 13 | [`std::mem::size_of`]: https://doc.rust-lang.org/std/mem/fn.size_of.html 14 | 15 | `-Zprint-type-sizes`选项正是这样做的,它在rustc的发行版上没有被启用,所以你需要使用`rustc`的夜间版本。 下面是一个通过`Cargo`的可能调用 16 | ```text 17 | RUSTFLAGS=-Zprint-type-sizes cargo +nightly build --release 18 | ``` 19 | 而这里是一个`rustc`的可能调用 20 | ```text 21 | rustc +nightly -Zprint-type-sizes input.rs 22 | ``` 23 | 它将打印出所有使用中的类型的尺寸、布局和对齐方式的详细信息。例如,对于这种类型。 24 | ```rust 25 | enum E { 26 | A, 27 | B(i32), 28 | C(u64, u8, u64, u8), 29 | D(Vec), 30 | } 31 | ``` 32 | 它打印以下信息,以及一些内置类型的信息。 33 | ```text 34 | print-type-size type: `E`: 32 bytes, alignment: 8 bytes 35 | print-type-size discriminant: 1 bytes 36 | print-type-size variant `D`: 31 bytes 37 | print-type-size padding: 7 bytes 38 | print-type-size field `.0`: 24 bytes, alignment: 8 bytes 39 | print-type-size variant `C`: 23 bytes 40 | print-type-size field `.1`: 1 bytes 41 | print-type-size field `.3`: 1 bytes 42 | print-type-size padding: 5 bytes 43 | print-type-size field `.0`: 8 bytes, alignment: 8 bytes 44 | print-type-size field `.2`: 8 bytes 45 | print-type-size variant `B`: 7 bytes 46 | print-type-size padding: 3 bytes 47 | print-type-size field `.0`: 4 bytes, alignment: 4 bytes 48 | print-type-size variant `A`: 0 bytes 49 | ``` 50 | 输出显示以下内容。 51 | - 类型的大小和排列。 52 | - 
对于enums,判别子的大小。 53 | - 对于enums,每个变量的大小(从最大到最小排序)。 54 | - 所有字段的大小、对齐和排序。(请注意,编译器对变体`C`的字段进行了重新排序,以最小化`E`的大小。) 55 | - 所有padding的大小和位置。 56 | 57 | 另外,可以使用 [top-type-sizes] crate 来以更紧凑的形式显示输出。 58 | 59 | [top-type-sizes]: https://crates.io/crates/top-type-sizes 60 | 61 | 一旦你知道了热型的布局,就有多种方法来收缩它。 62 | 63 | ## 字段顺序 64 | 65 | Rust编译器会自动对结构体和枚举的字段进行排序,以最小化它们的大小(除非指定了 `#[repr(C)]` 属性),因此您无需担心字段顺序的问题。但是,还有其他方法可以最小化热门类型的大小。 66 | 67 | ## 更小的枚举 68 | 69 | 如果一个枚举有一个超大的变体,可以考虑将一个或多个字段装箱。例如,你可以改变这个类型。 70 | ```rust 71 | type LargeType = [u8; 100]; 72 | enum A { 73 | X, 74 | Y(i32), 75 | Z(i32, LargeType), 76 | } 77 | ``` 78 | 修改为: 79 | ```rust 80 | # type LargeType = [u8; 100]; 81 | enum A { 82 | X, 83 | Y(i32), 84 | Z(Box<(i32, LargeType)>), 85 | } 86 | ``` 87 | 这减少了类型大小,但代价是需要为`A::Z`变体分配一个额外的堆。如果`A::Z`变体比较少见,这更有可能成为提高性能的好方法。`Box`也会使`A::Z`的使用略微不符合人的直觉,特别是在`匹配`模式中。 88 | [**Example 1**](https://github.com/rust-lang/rust/pull/37445/commits/a920e355ea837a950b484b5791051337cd371f5d), 89 | [**Example 2**](https://github.com/rust-lang/rust/pull/55346/commits/38d9277a77e982e49df07725b62b21c423b6428e), 90 | [**Example 3**](https://github.com/rust-lang/rust/pull/64302/commits/b972ac818c98373b6d045956b049dc34932c41be), 91 | [**Example 4**](https://github.com/rust-lang/rust/pull/64374/commits/2fcd870711ce267c79408ec631f7eba8e0afcdf6), 92 | [**Example 5**](https://github.com/rust-lang/rust/pull/64394/commits/7f0637da5144c7435e88ea3805021882f077d50c), 93 | [**Example 6**](https://github.com/rust-lang/rust/pull/71942/commits/27ae2f0d60d9201133e1f9ec7a04c05c8e55e665). 94 | 95 | ## 更小的intergers 96 | 97 | 通常可以通过使用较小的整数类型来缩小类型。例如,虽然对索引使用 "usize "是最自然的,但将索引存储为 "u32"、"u16"、甚至 "u8",然后在使用点强制使用 "usize",往往是合理的。 98 | [**Example 1**](https://github.com/rust-lang/rust/pull/49993/commits/4d34bfd00a57f8a8bdb60ec3f908c5d4256f8a9a), 99 | [**Example 2**](https://github.com/rust-lang/rust/pull/50981/commits/8d0fad5d3832c6c1f14542ea0be038274e454524). 
100 | 101 | ## Boxed Slices 102 | 103 | Rust向量包含三个词:一个长度、一个容量和一个指针。如果你有一个将来不太可能被改变的向量,你可以用[`Vec::into_boxed_slice`]把它转换为一个*boxed slice*。一个boxed slice只包含两个词,一个长度和一个指针。任何多余的元素容量都会被丢弃,这可能会导致重新分配。 104 | ```rust 105 | # use std::mem::{size_of, size_of_val}; 106 | let v: Vec = vec![1, 2, 3]; 107 | assert_eq!(size_of_val(&v), 3 * size_of::()); 108 | 109 | let bs: Box<[u32]> = v.into_boxed_slice(); 110 | assert_eq!(size_of_val(&bs), 2 * size_of::()); 111 | ``` 112 | 盒状切片可以用[`slice::into_vec`]转换回一个矢量,而无需任何克隆或重新分配。 113 | 114 | [`Vec::into_boxed_slice`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.into_boxed_slice 115 | [`slice::into_vec`]: https://doc.rust-lang.org/std/primitive.slice.html#method.into_vec 116 | 117 | ## `ThinVec` 118 | 119 | `ThinVec`是一个替代Boxed slices的选择,来自于`thin_vec` crate。它在功能上等同于`Vec`,但是在与元素相同的分配中存储长度和容量。这意味着`size_of::>`只占用一个字。 120 | 121 | 在经常实例化的类型中,`ThinVec`是一个不错的选择,适用于经常为空的向量。它还可以用于缩小枚举的最大变体,如果该变体包含一个`Vec`。 122 | 123 | [`thin_vec`]: https://crates.io/crates/thin-vec 124 | 125 | ## Avoiding Regressions 126 | 127 | 如果一个类型足够热,它的大小会影响性能,那么最好使用静态断言来确保它不会意外地回归。下面的例子使用了[`static_assertions`]中的一个宏。 128 | ```rust,ignore 129 | // This type is used a lot. Make sure it doesn't unintentionally get bigger. 130 | #[cfg(target_arch = "x86_64")] 131 | static_assertions::assert_eq_size!(HotType, [u8; 64]); 132 | ``` 133 | `cfg`属性很重要,因为类型大小在不同的平台上会有所不同。将断言限制在 "`x86_64`"(通常是最广泛使用的平台)可能足以防止实际中的回落。 134 | 135 | [`static_assertions`]: https://crates.io/crates/static_assertions 136 | 137 | -------------------------------------------------------------------------------- /src/wrapper-types.md: -------------------------------------------------------------------------------- 1 | # Wrapper Types 2 | 3 | Rust has a variety of "wrapper" types, such as [`RefCell`] and [`Mutex`], that 4 | provide special behavior for values. Accessing these values can take a 5 | non-trivial amount of time. 
If multiple such values are typically accessed 6 | together, it may be better to put them within a single wrapper. 7 | 8 | [`RefCell`]: https://doc.rust-lang.org/std/cell/struct.RefCell.html 9 | [`Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html 10 | 11 | For example, a struct like this: 12 | ```rust 13 | # use std::sync::{Arc, Mutex}; 14 | struct S { 15 | x: Arc<Mutex<u32>>, 16 | y: Arc<Mutex<u32>>, 17 | } 18 | ``` 19 | may be better represented like this: 20 | ```rust 21 | # use std::sync::{Arc, Mutex}; 22 | struct S { 23 | xy: Arc<Mutex<(u32, u32)>>, 24 | } 25 | ``` 26 | Whether or not this helps performance will depend on the exact access patterns 27 | of the values. 28 | [**Example**](https://github.com/rust-lang/rust/pull/68694/commits/7426853ba255940b880f2e7f8026d60b94b42404). 29 | -------------------------------------------------------------------------------- /src/wrapper-types_zh.md: -------------------------------------------------------------------------------- 1 | # Wrapper Types 2 | 3 | Rust有多种 "封装 "类型,如[`RefCell`]和[`Mutex`],它们为值提供了特殊行为。访问这些值可能会耗费大量的时间。如果多个这样的值通常是一起访问的,那么最好将它们放在一个包装器中。 4 | 5 | [`RefCell`]: https://doc.rust-lang.org/std/cell/struct.RefCell.html 6 | [`Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html 7 | 8 | 例如,这样的结构。 9 | ```rust 10 | # use std::sync::{Arc, Mutex}; 11 | struct S { 12 | x: Arc<Mutex<u32>>, 13 | y: Arc<Mutex<u32>>, 14 | } 15 | ``` 16 | 也许这样更典型。 17 | ```rust 18 | # use std::sync::{Arc, Mutex}; 19 | struct S { 20 | xy: Arc<Mutex<(u32, u32)>>, 21 | } 22 | ``` 23 | 这是否有助于性能,将取决于值的具体访问模式。 24 | [**Example**](https://github.com/rust-lang/rust/pull/68694/commits/7426853ba255940b880f2e7f8026d60b94b42404). 25 | --------------------------------------------------------------------------------