├── .editorconfig
├── .github
│   └── workflows
│       └── ci.yml
├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE-APACHE
├── LICENSE-MIT
├── README.md
├── book.toml
├── history.txt
└── src
    ├── SUMMARY.md
    ├── benchmarking.md
    ├── benchmarking_zh.md
    ├── bounds-checks.md
    ├── bounds-checks_zh.md
    ├── build-configuration.md
    ├── build-configuration_zh.md
    ├── compile-times.md
    ├── compile-times_zh.md
    ├── general-tips.md
    ├── general-tips_zh.md
    ├── hashing.md
    ├── hashing_zh.md
    ├── heap-allocations.md
    ├── heap-allocations_zh.md
    ├── inlining.md
    ├── inlining_zh.md
    ├── introduction.md
    ├── introduction_zh.md
    ├── io.md
    ├── io_zh.md
    ├── iterators.md
    ├── iterators_zh.md
    ├── linting.md
    ├── linting_zh.md
    ├── logging-and-debugging.md
    ├── logging-and-debugging_zh.md
    ├── machine-code.md
    ├── machine-code_zh.md
    ├── parallelism.md
    ├── parallelism_zh.md
    ├── profiling.md
    ├── profiling_zh.md
    ├── standard-library-types.md
    ├── standard-library-types_zh.md
    ├── title-page.md
    ├── type-sizes.md
    ├── type-sizes_zh.md
    ├── wrapper-types.md
    └── wrapper-types_zh.md


/.editorconfig:
--------------------------------------------------------------------------------
1 | [*.md]
2 | max_line_length = 79
3 | 


--------------------------------------------------------------------------------
/.github/workflows/ci.yml:
--------------------------------------------------------------------------------
 1 | name: CI
 2 | 
 3 | on:
 4 |   pull_request:
 5 |   push:
 6 |     branches:
 7 |       - master
 8 | 
 9 | jobs:
10 |   test_and_maybe_deploy:
11 |     runs-on: ubuntu-latest
12 |     steps:
13 |       - name: Clone repository
14 |         uses: actions/checkout@v3
15 | 
16 |       - name: Setup mdbook
17 |         uses: peaceiris/actions-mdbook@v1
18 |         with:
19 |           mdbook-version: 'latest'
20 | 
21 |       # EPUB
22 |       # Currently disabled due to
23 |       # https://github.com/nnethercote/perf-book/actions/runs/6358429874/job/17270643057
24 |       #- name: Setup mdbook-epub
25 |       #  run: cargo install mdbook-epub
26 | 
27 |       - name: Build
28 |         run: mdbook build
29 | 
30 |       - name: Test
31 |         run: mdbook test
32 | 
33 |       # EPUB
34 |       #- name: Copy ePub
35 |       #  run: cp book/epub/The\ Rust\ Performance\ Book.epub book/html
36 | 
37 |       - name: Deploy
38 |         uses: peaceiris/actions-gh-pages@v3
39 |         with:
40 |           github_token: ${{ secrets.GITHUB_TOKEN }}
41 |           #publish_dir: ./book/html  # use if EPUB is enabled
42 |           publish_dir: ./book      # use if EPUB is disabled
43 |         # Only deploy on a push to master, not on a pull request.
44 |         if: github.event_name == 'push' && github.ref == 'refs/heads/master' && github.repository == 'Blues-star/perf-book-zh'
45 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | book
2 | 
3 | # Prevent Vim swap files from making `mdbook serve` regenerate HTML frequently.
4 | *.sw*
5 | 
6 | # Also `diff` files, which I generate a lot.
7 | diff
8 | 


--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # The Rust Performance Book Code of Conduct
2 | 
3 | This repository uses the [Rust Code of Conduct].
4 | 
5 | [Rust Code of Conduct]: https://www.rust-lang.org/conduct.html
6 | 


--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | # Contributing to The Rust Performance Book
 2 | 
 3 | Please follow these style guidelines when contributing to the book.
 4 | 
 5 | ## Line Lengths
 6 | 
 7 | Lines of text are limited to 79 characters. (There is a `.editorconfig` file
 8 | that specifies this.) Lines containing non-text elements, such as links, can be
 9 | longer.
10 | 
11 | ## Examples
12 | 
13 | Links to examples that demonstrate performance techniques on real-world
14 | programs are encouraged. These examples might be pull requests, blog posts,
15 | etc.
16 | 
17 | Single examples are written like this:
18 | ```markdown
19 | [**Example**](https://github.com/rust-lang/rust/pull/37373/commits/c440a7ae654fb641e68a9ee53b03bf3f7133c2fe).
20 | ```
21 | 
22 | Multiple examples are written like this:
23 | ```markdown
24 | [**Example 1**](https://github.com/rust-lang/rust/pull/77990/commits/45faeb43aecdc98c9e3f2b24edf2ecc71f39d323),
25 | [**Example 2**](https://github.com/rust-lang/rust/pull/51870/commits/b0c78120e3ecae5f4043781f7a3f79e2277293e7).
26 | ```
27 | 
28 | ## Title Style
29 | 
30 | Section titles are capitalized, which means that all words within the title are
31 | capitalized, other than "small" words such as conjunctions. For example, "Using
32 | an Alternative Allocator", rather than "Using an alternative allocator".
33 | 
34 | ## External Link Style
35 | 
36 | For external links—those that point outside the book—reference links are
37 | preferred to inline links. For example, this:
38 | ```markdown
39 | The book's title is [The Rust Performance Book].
40 | 
41 | [The Rust Performance Book]: https://nnethercote.github.io/perf-book/
42 | ```
43 | is preferred to this:
44 | ```markdown
45 | The book's title is [The Rust Performance Book](https://nnethercote.github.io/perf-book/).
46 | ```
47 | The reason for this preference is that external links are usually relatively
48 | long, and long inline links often break awkwardly across lines.
49 | 
50 | One exception to this rule is that **Example** links are inline, with each one
51 | put on its own line, as seen above.
52 | 


--------------------------------------------------------------------------------
/LICENSE-APACHE:
--------------------------------------------------------------------------------
  1 |                               Apache License
  2 |                         Version 2.0, January 2004
  3 |                      http://www.apache.org/licenses/
  4 | 
  5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
  6 | 
  7 | 1. Definitions.
  8 | 
  9 |    "License" shall mean the terms and conditions for use, reproduction,
 10 |    and distribution as defined by Sections 1 through 9 of this document.
 11 | 
 12 |    "Licensor" shall mean the copyright owner or entity authorized by
 13 |    the copyright owner that is granting the License.
 14 | 
 15 |    "Legal Entity" shall mean the union of the acting entity and all
 16 |    other entities that control, are controlled by, or are under common
 17 |    control with that entity. For the purposes of this definition,
 18 |    "control" means (i) the power, direct or indirect, to cause the
 19 |    direction or management of such entity, whether by contract or
 20 |    otherwise, or (ii) ownership of fifty percent (50%) or more of the
 21 |    outstanding shares, or (iii) beneficial ownership of such entity.
 22 | 
 23 |    "You" (or "Your") shall mean an individual or Legal Entity
 24 |    exercising permissions granted by this License.
 25 | 
 26 |    "Source" form shall mean the preferred form for making modifications,
 27 |    including but not limited to software source code, documentation
 28 |    source, and configuration files.
 29 | 
 30 |    "Object" form shall mean any form resulting from mechanical
 31 |    transformation or translation of a Source form, including but
 32 |    not limited to compiled object code, generated documentation,
 33 |    and conversions to other media types.
 34 | 
 35 |    "Work" shall mean the work of authorship, whether in Source or
 36 |    Object form, made available under the License, as indicated by a
 37 |    copyright notice that is included in or attached to the work
 38 |    (an example is provided in the Appendix below).
 39 | 
 40 |    "Derivative Works" shall mean any work, whether in Source or Object
 41 |    form, that is based on (or derived from) the Work and for which the
 42 |    editorial revisions, annotations, elaborations, or other modifications
 43 |    represent, as a whole, an original work of authorship. For the purposes
 44 |    of this License, Derivative Works shall not include works that remain
 45 |    separable from, or merely link (or bind by name) to the interfaces of,
 46 |    the Work and Derivative Works thereof.
 47 | 
 48 |    "Contribution" shall mean any work of authorship, including
 49 |    the original version of the Work and any modifications or additions
 50 |    to that Work or Derivative Works thereof, that is intentionally
 51 |    submitted to Licensor for inclusion in the Work by the copyright owner
 52 |    or by an individual or Legal Entity authorized to submit on behalf of
 53 |    the copyright owner. For the purposes of this definition, "submitted"
 54 |    means any form of electronic, verbal, or written communication sent
 55 |    to the Licensor or its representatives, including but not limited to
 56 |    communication on electronic mailing lists, source code control systems,
 57 |    and issue tracking systems that are managed by, or on behalf of, the
 58 |    Licensor for the purpose of discussing and improving the Work, but
 59 |    excluding communication that is conspicuously marked or otherwise
 60 |    designated in writing by the copyright owner as "Not a Contribution."
 61 | 
 62 |    "Contributor" shall mean Licensor and any individual or Legal Entity
 63 |    on behalf of whom a Contribution has been received by Licensor and
 64 |    subsequently incorporated within the Work.
 65 | 
 66 | 2. Grant of Copyright License. Subject to the terms and conditions of
 67 |    this License, each Contributor hereby grants to You a perpetual,
 68 |    worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 69 |    copyright license to reproduce, prepare Derivative Works of,
 70 |    publicly display, publicly perform, sublicense, and distribute the
 71 |    Work and such Derivative Works in Source or Object form.
 72 | 
 73 | 3. Grant of Patent License. Subject to the terms and conditions of
 74 |    this License, each Contributor hereby grants to You a perpetual,
 75 |    worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 76 |    (except as stated in this section) patent license to make, have made,
 77 |    use, offer to sell, sell, import, and otherwise transfer the Work,
 78 |    where such license applies only to those patent claims licensable
 79 |    by such Contributor that are necessarily infringed by their
 80 |    Contribution(s) alone or by combination of their Contribution(s)
 81 |    with the Work to which such Contribution(s) was submitted. If You
 82 |    institute patent litigation against any entity (including a
 83 |    cross-claim or counterclaim in a lawsuit) alleging that the Work
 84 |    or a Contribution incorporated within the Work constitutes direct
 85 |    or contributory patent infringement, then any patent licenses
 86 |    granted to You under this License for that Work shall terminate
 87 |    as of the date such litigation is filed.
 88 | 
 89 | 4. Redistribution. You may reproduce and distribute copies of the
 90 |    Work or Derivative Works thereof in any medium, with or without
 91 |    modifications, and in Source or Object form, provided that You
 92 |    meet the following conditions:
 93 | 
 94 |    (a) You must give any other recipients of the Work or
 95 |        Derivative Works a copy of this License; and
 96 | 
 97 |    (b) You must cause any modified files to carry prominent notices
 98 |        stating that You changed the files; and
 99 | 
100 |    (c) You must retain, in the Source form of any Derivative Works
101 |        that You distribute, all copyright, patent, trademark, and
102 |        attribution notices from the Source form of the Work,
103 |        excluding those notices that do not pertain to any part of
104 |        the Derivative Works; and
105 | 
106 |    (d) If the Work includes a "NOTICE" text file as part of its
107 |        distribution, then any Derivative Works that You distribute must
108 |        include a readable copy of the attribution notices contained
109 |        within such NOTICE file, excluding those notices that do not
110 |        pertain to any part of the Derivative Works, in at least one
111 |        of the following places: within a NOTICE text file distributed
112 |        as part of the Derivative Works; within the Source form or
113 |        documentation, if provided along with the Derivative Works; or,
114 |        within a display generated by the Derivative Works, if and
115 |        wherever such third-party notices normally appear. The contents
116 |        of the NOTICE file are for informational purposes only and
117 |        do not modify the License. You may add Your own attribution
118 |        notices within Derivative Works that You distribute, alongside
119 |        or as an addendum to the NOTICE text from the Work, provided
120 |        that such additional attribution notices cannot be construed
121 |        as modifying the License.
122 | 
123 |    You may add Your own copyright statement to Your modifications and
124 |    may provide additional or different license terms and conditions
125 |    for use, reproduction, or distribution of Your modifications, or
126 |    for any such Derivative Works as a whole, provided Your use,
127 |    reproduction, and distribution of the Work otherwise complies with
128 |    the conditions stated in this License.
129 | 
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 |    any Contribution intentionally submitted for inclusion in the Work
132 |    by You to the Licensor shall be under the terms and conditions of
133 |    this License, without any additional terms or conditions.
134 |    Notwithstanding the above, nothing herein shall supersede or modify
135 |    the terms of any separate license agreement you may have executed
136 |    with Licensor regarding such Contributions.
137 | 
138 | 6. Trademarks. This License does not grant permission to use the trade
139 |    names, trademarks, service marks, or product names of the Licensor,
140 |    except as required for reasonable and customary use in describing the
141 |    origin of the Work and reproducing the content of the NOTICE file.
142 | 
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 |    agreed to in writing, Licensor provides the Work (and each
145 |    Contributor provides its Contributions) on an "AS IS" BASIS,
146 |    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 |    implied, including, without limitation, any warranties or conditions
148 |    of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 |    PARTICULAR PURPOSE. You are solely responsible for determining the
150 |    appropriateness of using or redistributing the Work and assume any
151 |    risks associated with Your exercise of permissions under this License.
152 | 
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 |    whether in tort (including negligence), contract, or otherwise,
155 |    unless required by applicable law (such as deliberate and grossly
156 |    negligent acts) or agreed to in writing, shall any Contributor be
157 |    liable to You for damages, including any direct, indirect, special,
158 |    incidental, or consequential damages of any character arising as a
159 |    result of this License or out of the use or inability to use the
160 |    Work (including but not limited to damages for loss of goodwill,
161 |    work stoppage, computer failure or malfunction, or any and all
162 |    other commercial damages or losses), even if such Contributor
163 |    has been advised of the possibility of such damages.
164 | 
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 |    the Work or Derivative Works thereof, You may choose to offer,
167 |    and charge a fee for, acceptance of support, warranty, indemnity,
168 |    or other liability obligations and/or rights consistent with this
169 |    License. However, in accepting such obligations, You may act only
170 |    on Your own behalf and on Your sole responsibility, not on behalf
171 |    of any other Contributor, and only if You agree to indemnify,
172 |    defend, and hold each Contributor harmless for any liability
173 |    incurred by, or claims asserted against, such Contributor by reason
174 |    of your accepting any such warranty or additional liability.
175 | 
176 | END OF TERMS AND CONDITIONS
177 | 


--------------------------------------------------------------------------------
/LICENSE-MIT:
--------------------------------------------------------------------------------
 1 | Permission is hereby granted, free of charge, to any
 2 | person obtaining a copy of this software and associated
 3 | documentation files (the "Software"), to deal in the
 4 | Software without restriction, including without
 5 | limitation the rights to use, copy, modify, merge,
 6 | publish, distribute, sublicense, and/or sell copies of
 7 | the Software, and to permit persons to whom the Software
 8 | is furnished to do so, subject to the following
 9 | conditions:
10 | 
11 | The above copyright notice and this permission notice
12 | shall be included in all copies or substantial portions
13 | of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF
16 | ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED
17 | TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
18 | PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
19 | SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
20 | CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
22 | IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
23 | DEALINGS IN THE SOFTWARE.
24 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # perf-book-zh
 2 | 
 3 | Chinese translation of The Rust Performance Book.
 4 | 
 5 | ## Viewing
 6 | 
 7 | View the English version online [here](https://nnethercote.github.io/perf-book/).
 8 | View the Chinese version online [here](https://blues-star.github.io/perf-book-zh/).
 9 | 
10 | ## Building
11 | 
12 | The book is built with [`mdbook`](https://github.com/rust-lang/mdBook), which can be installed with this command:
13 | ```
14 | cargo install mdbook
15 | ```
16 | To build the book, run this command:
17 | ```
18 | mdbook build
19 | ```
20 | The generated files are placed in the `book/` directory.
21 | 
22 | ## Development
23 | 
24 | To view the built book, run this command:
25 | ```
26 | mdbook serve
27 | ```
28 | This will launch a local web server to serve the book. View the built book by
29 | navigating to `localhost:3000` in a web browser. While the web server is
30 | running, the rendered book will automatically update if the book's files
31 | change.
32 | 
33 | To test the code within the book, run this command:
34 | ```
35 | mdbook test
36 | ```
37 | 
38 | ## Improvements
39 | 
40 | Suggestions for improvements are welcome, but I prefer them to be filed as
41 | issues rather than pull requests. This is because I am very particular about
42 | the wording used in the book. When pull requests are made, I typically take the
43 | underlying idea of a pull request and rewrite it into my own words anyway.
44 | 
45 | ## License
46 | 
47 | Licensed under either of
48 | * Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or
49 |   http://www.apache.org/licenses/LICENSE-2.0)
50 | * MIT license ([LICENSE-MIT](LICENSE-MIT) or
51 |   http://opensource.org/licenses/MIT)
52 | 
53 | at your option.
54 | 
55 | ## Contribution
56 | 
57 | Unless you explicitly state otherwise, any contribution intentionally submitted
58 | for inclusion in the work by you, as defined in the Apache-2.0 license, shall
59 | be dual licensed as above, without any additional terms or conditions.
60 | 


--------------------------------------------------------------------------------
/book.toml:
--------------------------------------------------------------------------------
 1 | [book]
 2 | title = "The Rust Performance Book"
 3 | authors = ["Nicholas Nethercote"]
 4 | src = "src"
 5 | language = "zh"
 6 | multilingual = false
 7 | 
 8 | [build]
 9 | create-missing = false
10 | 
11 | [rust]
12 | edition = "2018"
13 | 
14 | [output.html]
15 | curly-quotes = true
16 | default-theme = "rust"
17 | git-repository-url = "https://github.com/Blues-star/perf-book-zh/"
18 | site-url = "https://blues-star.github.io/perf-book-zh/"
19 | 


--------------------------------------------------------------------------------
/history.txt:
--------------------------------------------------------------------------------
 1 |         modified:   .github/workflows/ci.yml
 2 |         modified:   .gitignore
 3 |         modified:   src/benchmarking.md
 4 |         new file:   src/bounds-checks.md
 5 |         modified:   src/build-configuration.md
 6 |         modified:   src/compile-times.md
 7 |         modified:   src/general-tips.md
 8 |         modified:   src/hashing.md
 9 |         modified:   src/heap-allocations.md
10 |         modified:   src/inlining.md
11 |         modified:   src/introduction.md
12 |         modified:   src/io.md
13 |         modified:   src/iterators.md
14 |         modified:   src/linting.md
15 |         modified:   src/logging-and-debugging.md
16 |         modified:   src/machine-code.md
17 |         modified:   src/profiling.md
18 |         modified:   src/standard-library-types.md
19 |         modified:   src/type-sizes.md


--------------------------------------------------------------------------------
/src/SUMMARY.md:
--------------------------------------------------------------------------------
 1 | # Summary
 2 | 
 3 | [Title Page](title-page.md)
 4 | 
 5 | - [Introduction](introduction_zh.md)
 6 | - [Benchmarking](benchmarking_zh.md)
 7 | - [Build Configuration](build-configuration_zh.md)
 8 | - [Linting](linting_zh.md)
 9 | - [Profiling](profiling_zh.md)
10 | - [Inlining](inlining_zh.md)
11 | - [Hashing](hashing_zh.md)
12 | - [Heap Allocations](heap-allocations_zh.md)
13 | - [Type Sizes](type-sizes_zh.md)
14 | - [Standard Library Types](standard-library-types_zh.md)
15 | - [Iterators](iterators_zh.md)
16 | - [Bounds Checks](bounds-checks_zh.md)
17 | - [I/O](io_zh.md)
18 | - [Logging and Debugging](logging-and-debugging_zh.md)
19 | - [Wrapper Types](wrapper-types_zh.md)
20 | - [Machine Code](machine-code_zh.md)
21 | - [Parallelism](parallelism_zh.md)
22 | - [General Tips](general-tips_zh.md)
23 | - [Compile Times](compile-times_zh.md)
24 | 
25 | 


--------------------------------------------------------------------------------
/src/benchmarking.md:
--------------------------------------------------------------------------------
 1 | # Benchmarking
 2 | 
 3 | Benchmarking typically involves comparing the performance of two or more
 4 | programs that do the same thing. Sometimes this might involve comparing two or
 5 | more different programs, e.g. Firefox vs Safari vs Chrome. Sometimes it
 6 | involves comparing two different versions of the same program. This latter case
 7 | lets us reliably answer the question "did this change speed things up?"
 8 | 
 9 | Benchmarking is a complex topic and a thorough coverage is beyond the scope of
10 | this book, but here are the basics.
11 | 
12 | First, you need workloads to measure. Ideally, you would have a variety of
13 | workloads that represent realistic usage of your program. Workloads using
14 | real-world inputs are best, but [microbenchmarks] and [stress tests] can be
15 | useful in moderation.
16 | 
17 | [microbenchmarks]: https://stackoverflow.com/questions/2842695/what-is-microbenchmarking
18 | [stress tests]: https://en.wikipedia.org/wiki/Stress_testing_(software)
19 | 
20 | Second, you need a way to run the workloads, which will also dictate the
21 | metrics used.
22 | - Rust's built-in [benchmark tests] are a simple starting point, but they use
23 |   unstable features and therefore only work on nightly Rust.
24 | - [Criterion] and [Divan] are more sophisticated alternatives.
25 | - [Hyperfine] is an excellent general-purpose benchmarking tool.
26 | - Custom benchmarking harnesses are also possible. For example, [rustc-perf] is
27 |   the harness used to benchmark the Rust compiler.
28 | 
29 | [benchmark tests]: https://doc.rust-lang.org/nightly/unstable-book/library-features/test.html
30 | [Criterion]: https://github.com/bheisler/criterion.rs
31 | [Divan]: https://github.com/nvzqz/divan
32 | [Hyperfine]: https://github.com/sharkdp/hyperfine
33 | [rustc-perf]: https://github.com/rust-lang/rustc-perf/
34 | 
35 | When it comes to metrics, there are many choices, and the right one(s) will
36 | depend on the nature of the program being benchmarked. For example, metrics
37 | that make sense for a batch program might not make sense for an interactive
38 | program. Wall-time is an obvious choice in many cases because it corresponds to
39 | what users perceive. However, it can suffer from high variance. In particular,
40 | tiny changes in memory layout can cause significant but ephemeral performance
41 | fluctuations. Therefore, other metrics with lower variance (such as cycles or
42 | instruction counts) may be a reasonable alternative.
43 | 
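As a rough illustration of why wall-time needs care, the sketch below times a
workload several times with `std::time::Instant` and reports the minimum and
median rather than a single sample. It is a minimal hand-rolled harness, not a
substitute for the tools listed above. The names `time_runs`, `min_and_median`
and `sum_below` are hypothetical and chosen only for this example, and the
workload is a stand-in for a real one.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `workload` `runs` times and returns the wall-time of each run.
fn time_runs<F: FnMut()>(runs: usize, mut workload: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            workload();
            start.elapsed()
        })
        .collect()
}

/// Returns the (minimum, median) of the samples. For an even number of
/// samples the upper of the two middle values is used. Panics if `samples`
/// is empty.
fn min_and_median(samples: &[Duration]) -> (Duration, Duration) {
    let mut sorted = samples.to_vec();
    sorted.sort();
    (sorted[0], sorted[sorted.len() / 2])
}

/// Hypothetical stand-in workload: sums the integers below `n`.
/// `black_box` discourages the compiler from precomputing the result.
fn sum_below(n: u64) -> u64 {
    (0..black_box(n)).sum()
}

fn main() {
    let samples = time_runs(15, || {
        black_box(sum_below(1_000_000));
    });
    let (min, median) = min_and_median(&samples);
    // The gap between these two numbers gives a feel for the variance.
    println!("min: {min:?}, median: {median:?}");
}
```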
44 | Summarizing measurements from multiple workloads is also a challenge, and there
45 | are a variety of ways to do it, with no single method being obviously best.
46 | 
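To see how much the choice of summary can matter, consider a made-up case with
two workloads, where a change halves the time of one and doubles the time of
the other. The sketch below summarizes the per-workload time ratios (new / old)
in two common ways. The ratios are invented for illustration, and neither mean
is being recommended as the right answer. The arithmetic mean of the ratios
suggests a 25% slowdown, while the geometric mean suggests no change overall.

```rust
/// Arithmetic mean of the values. `xs` must be non-empty.
fn arithmetic_mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Geometric mean of the values. `xs` must be non-empty and all values
/// must be positive.
fn geometric_mean(xs: &[f64]) -> f64 {
    let log_sum: f64 = xs.iter().map(|x| x.ln()).sum();
    (log_sum / xs.len() as f64).exp()
}

fn main() {
    // Hypothetical time ratios (new / old) for two workloads:
    // the first got twice as fast, the second got twice as slow.
    let ratios = [0.5, 2.0];
    println!("arithmetic mean: {:.3}", arithmetic_mean(&ratios));
    println!("geometric mean:  {:.3}", geometric_mean(&ratios));
}
```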
47 | Good benchmarking is hard. Having said that, do not stress too much about
48 | having a perfect benchmarking setup, particularly when you start optimizing a
49 | program. Mediocre benchmarking is far better than no benchmarking. Keep an open
50 | mind about what you are measuring, and over time you can make benchmarking
51 | improvements as you learn about the performance characteristics of your
52 | program.
53 | 


--------------------------------------------------------------------------------
/src/benchmarking_zh.md:
--------------------------------------------------------------------------------
 1 | # Benchmarking
 2 | 
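 3 | Benchmarking typically involves comparing the performance of two or more programs that do the same thing. Sometimes this means comparing two or more different programs, e.g. Firefox vs Safari vs Chrome. Sometimes it means comparing two different versions of the same program. The latter case lets us reliably answer the question "did this change speed things up?"
 4 | 
 5 | Benchmarking is a complex topic and thorough coverage is beyond the scope of this book, but here are the basics.
 6 | 
 7 | First, you need workloads to measure. Ideally, you would have a variety of workloads that represent realistic usage of your program. Workloads using real-world inputs are best, but [microbenchmarks] and [stress tests] can be useful in moderation.
 8 | 
 9 | [microbenchmarks]: https://stackoverflow.com/questions/2842695/what-is-microbenchmarking
10 | [stress tests]: https://en.wikipedia.org/wiki/Stress_testing_(software)
11 | 
12 | Second, you need a way to run the workloads, which will also dictate the metrics used.
13 | 
14 | - Rust's built-in [benchmark tests] are a simple starting point, but they use unstable features and therefore only work on nightly Rust.
15 | - [Criterion] and [Divan] are more sophisticated alternatives.
16 | - [Hyperfine] is an excellent general-purpose benchmarking tool.
17 | - Custom benchmarking harnesses are also possible. For example, [rustc-perf] is the harness used to benchmark the Rust compiler.
18 | 
19 | [benchmark tests]: https://doc.rust-lang.org/nightly/unstable-book/library-features/test.html
20 | [Criterion]: https://github.com/bheisler/criterion.rs
21 | [Divan]: https://github.com/nvzqz/divan
22 | [Hyperfine]: https://github.com/sharkdp/hyperfine
23 | [rustc-perf]: https://github.com/rust-lang/rustc-perf/
24 | 
25 | When it comes to metrics, there are many choices, and the right one(s) will depend on the nature of the program being benchmarked. For example, metrics that make sense for a batch program might not make sense for an interactive program. Wall-time is an obvious choice in many cases because it corresponds to what users perceive. However, it can suffer from high variance. In particular, tiny changes in memory layout can cause significant but ephemeral performance fluctuations. Therefore, other metrics with lower variance (such as cycles or instruction counts) may be a reasonable alternative.
26 | 
27 | Summarizing measurements from multiple workloads is also a challenge, and there are a variety of ways to do it, with no single method being obviously best.
28 | 
29 | Good benchmarking is hard. Having said that, do not stress too much about having a perfect benchmarking setup, particularly when you start optimizing a program. Mediocre benchmarking is far better than no benchmarking. Keep an open mind about what you are measuring, and over time you can make benchmarking improvements as you learn about the performance characteristics of your program.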
30 | 
31 | 


--------------------------------------------------------------------------------
/src/bounds-checks.md:
--------------------------------------------------------------------------------
 1 | # Bounds Checks
 2 | 
 3 | By default, accesses to container types such as slices and vectors involve
 4 | bounds checks in Rust. These can affect performance, e.g. within hot loops,
 5 | though less often than you might expect.
 6 | 
 7 | There are several safe ways to change code so that the compiler knows about
 8 | container lengths and can optimize away bounds checks.
 9 | 
10 | - Replace direct element accesses in a loop by using iteration.
11 | - Instead of indexing into a `Vec` within a loop, make a slice of the `Vec`
12 |   before the loop and then index into the slice within the loop.
13 | - Add assertions on the ranges of index variables.
14 |   [**Example 1**](https://github.com/rust-random/rand/pull/960/commits/de9dfdd86851032d942eb583d8d438e06085867b),
15 |   [**Example 2**](https://github.com/image-rs/jpeg-decoder/pull/167/files).
16 | 
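The three techniques in the list above can be sketched as follows. The
function names are hypothetical and exist only for this illustration. Whether
the compiler actually removes a given bounds check depends on the compiler
version and optimization level, so treat these as patterns to try and then
confirm by inspecting the generated machine code or by benchmarking.

```rust
/// Baseline: as far as the compiler can tell, `n` is unrelated to `v.len()`,
/// so each `v[i]` may keep its bounds check.
fn sum_indexed(v: &[u32], n: usize) -> u32 {
    let mut total = 0;
    for i in 0..n {
        total += v[i];
    }
    total
}

/// Technique 1: iterate instead of indexing.
fn sum_iter(v: &[u32]) -> u32 {
    v.iter().sum()
}

/// Technique 2: take a slice of the `Vec` once before the loop. The range
/// is checked here, and the loop bound is then the slice's own length.
fn sum_sliced(v: &Vec<u32>, n: usize) -> u32 {
    let s = &v[..n];
    let mut total = 0;
    for i in 0..s.len() {
        total += s[i];
    }
    total
}

/// Technique 3: assert the relationship between lengths up front, so the
/// compiler can see that `i` is in range for both `a` and `b`.
fn dot(a: &[u32], b: &[u32]) -> u32 {
    assert_eq!(a.len(), b.len());
    let mut total = 0;
    for i in 0..a.len() {
        total += a[i] * b[i];
    }
    total
}

fn main() {
    let v: Vec<u32> = vec![1, 2, 3, 4];
    println!("{}", sum_indexed(&v, 4));
    println!("{}", sum_iter(&v));
    println!("{}", sum_sliced(&v, 4));
    println!("{}", dot(&v, &v));
}
```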
17 | Getting these to work can be tricky. The [Bounds Check Cookbook] goes into more
18 | detail on this topic.
19 | 
20 | [Bounds Check Cookbook]: https://github.com/Shnatsel/bounds-check-cookbook/
21 | 
22 | As a last resort, there are the unsafe methods [`get_unchecked`] and
23 | [`get_unchecked_mut`].
24 | 
25 | [`get_unchecked`]: https://doc.rust-lang.org/std/primitive.slice.html#method.get_unchecked
26 | [`get_unchecked_mut`]: https://doc.rust-lang.org/std/primitive.slice.html#method.get_unchecked_mut
27 | 
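A minimal sketch of the unsafe approach is shown below, assuming the caller
can establish the index bounds some other way (here, a single assertion or the
loop range itself). The function names are hypothetical. Each `unsafe` block
carries a `SAFETY` comment stating why the access is in bounds, because an
out-of-bounds index passed to these methods is undefined behavior.

```rust
/// Sums the first `n` elements without per-access bounds checks.
fn sum_first_n(v: &[u64], n: usize) -> u64 {
    // A single up-front check; it is what makes the `unsafe` block sound.
    assert!(n <= v.len(), "n exceeds the slice length");
    let mut total = 0;
    for i in 0..n {
        // SAFETY: `i < n` and `n <= v.len()`, so `i` is in bounds.
        total += unsafe { *v.get_unchecked(i) };
    }
    total
}

/// Doubles every element in place using `get_unchecked_mut`.
fn double_all(v: &mut [u64]) {
    for i in 0..v.len() {
        // SAFETY: `i < v.len()` by the loop range.
        unsafe { *v.get_unchecked_mut(i) *= 2 };
    }
}

fn main() {
    let mut data = vec![1, 2, 3, 4];
    println!("{}", sum_first_n(&data, 3));
    double_all(&mut data);
    println!("{:?}", data);
}
```

In many loops like these a safe iterator would likely perform just as well, so
it is worth measuring before reaching for `unsafe`.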
28 | 


--------------------------------------------------------------------------------
/src/bounds-checks_zh.md:
--------------------------------------------------------------------------------
 1 | # Bounds Checks
 2 | 
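 3 | By default, accesses to container types such as slices and vectors involve bounds checks in Rust. These can affect performance, e.g. within hot loops, though less often than you might expect.
 4 | 
 5 | There are several safe ways to change code so that the compiler knows about container lengths and can optimize away bounds checks.
 6 | 
 7 | - Replace direct element accesses in a loop by using iteration.
 8 | - Instead of indexing into a `Vec` within a loop, make a slice of the `Vec` before the loop and then index into the slice within the loop.
 9 | - Add assertions on the ranges of index variables.
10 |   [**Example 1**](https://github.com/rust-random/rand/pull/960/commits/de9dfdd86851032d942eb583d8d438e06085867b),
11 |   [**Example 2**](https://github.com/image-rs/jpeg-decoder/pull/167/files).
12 | 
13 | Getting these to work can be tricky. The [Bounds Check Cookbook] goes into more detail on this topic.
14 | 
15 | [Bounds Check Cookbook]: https://github.com/Shnatsel/bounds-check-cookbook/
16 | 
17 | As a last resort, there are the unsafe methods [`get_unchecked`] and [`get_unchecked_mut`].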
18 | 
19 | [`get_unchecked`]: https://doc.rust-lang.org/std/primitive.slice.html#method.get_unchecked
20 | [`get_unchecked_mut`]: https://doc.rust-lang.org/std/primitive.slice.html#method.get_unchecked_mut
21 | 
22 | 


--------------------------------------------------------------------------------
/src/build-configuration.md:
--------------------------------------------------------------------------------
  1 | # Build Configuration
  2 | 
  3 | You can drastically change the performance of a Rust program without changing
  4 | its code, just by changing its build configuration. There are many possible
  5 | build configurations for each Rust program. The one chosen will affect several
  6 | characteristics of the compiled code, such as compile times, runtime speed,
  7 | memory use, binary size, debuggability, profilability, and which architectures
  8 | your compiled program will run on.
  9 | 
 10 | Most configuration choices will improve one or more characteristics while
 11 | worsening one or more others. For example, a common trade-off is to accept
 12 | worse compile times in exchange for higher runtime speeds. The right choice
 13 | for your program depends on your needs and the specifics of your program, and
 14 | performance-related choices (which is most of them) should be validated with
 15 | benchmarking.
 16 | 
 17 | Note that Cargo only looks at the profile settings in the `Cargo.toml` file at
 18 | the root of the workspace. Profile settings defined in dependencies are
 19 | ignored. Therefore, these options are mostly relevant for binary crates, not
 20 | library crates.
 21 | 
 22 | ## Release Builds
 23 | 
 24 | The single most important build configuration choice is simple but [easy to
 25 | overlook]: make sure you are using a [release build] rather than a [dev build]
 26 | when you want high performance. This is usually done by specifying the
 27 | `--release` flag to Cargo.
 28 | 
 29 | [easy to overlook]: https://users.rust-lang.org/t/why-my-rust-program-is-so-slow/47764/5
 30 | [release build]: https://doc.rust-lang.org/cargo/reference/profiles.html#release
 31 | [dev build]: https://doc.rust-lang.org/cargo/reference/profiles.html#dev
 32 | 
 33 | Dev builds are the default. They are good for debugging, but are not optimized.
 34 | They are produced if you run `cargo build` or `cargo run`. (Alternatively,
 35 | running `rustc` without additional options also produces an unoptimized build.)
 36 | 
 37 | Consider the following final line of output from a `cargo build` run.
 38 | ```text
 39 | Finished dev [unoptimized + debuginfo] target(s) in 29.80s
 40 | ```
 41 | This output indicates that a dev build has been produced. The compiled code
 42 | will be placed in the `target/debug/` directory. `cargo run` will run the dev
 43 | build.
 44 | 
 45 | In comparison, release builds are much more optimized, omit debug assertions
 46 | and integer overflow checks, and omit debug info. 10-100x speedups over dev
 47 | builds are common! They are produced if you run `cargo build --release` or
 48 | `cargo run --release`. (Alternatively, `rustc` has multiple options for
 49 | optimized builds, such as `-O` and `-C opt-level`.) A release build typically
 50 | takes longer to compile than a dev build because of the extra optimizations.
 51 | 
 52 | Consider the following final line of output from a `cargo build --release` run.
 53 | ```text
 54 | Finished release [optimized] target(s) in 1m 01s
 55 | ```
 56 | This output indicates that a release build has been produced. The compiled code
 57 | will be placed in the `target/release/` directory. `cargo run --release` will
 58 | run the release build.
 59 | 
 60 | See the [Cargo profile documentation] for more details about the differences
 61 | between dev builds (which use the `dev` profile) and release builds (which use
 62 | the `release` profile).
 63 | 
 64 | [Cargo profile documentation]: https://doc.rust-lang.org/cargo/reference/profiles.html
 65 | 
 66 | The default build configuration choices used in release builds provide a good
 67 | balance between the characteristics mentioned above, such as compile times,
 68 | runtime speed, and binary size. But there are many possible adjustments, as
 69 | the following sections explain.
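    |
    | For reference, the Cargo profile documentation lists defaults for the
    | `release` profile along the following lines. This is a partial sketch of
    | those documented defaults. They can change between Cargo versions, so treat
    | the values as a snapshot and check the documentation for your toolchain.
    | ```toml
    | [profile.release]
    | opt-level = 3            # full optimizations
    | debug = false            # no debug info
    | debug-assertions = false # debug assertions are compiled out
    | overflow-checks = false  # no integer overflow checks
    | lto = false              # thin local LTO only
    | panic = "unwind"
    | incremental = false
    | codegen-units = 16
    | ```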
 70 | 
 71 | ## Maximizing Runtime Speed
 72 | 
 73 | The following build configuration options are designed primarily to maximize
 74 | runtime speed. Some of them may also reduce binary size.
 75 | 
 76 | ### Codegen Units
 77 | 
 78 | The Rust compiler splits crates into multiple [codegen units] to parallelize
 79 | (and thus speed up) compilation. However, this might cause it to miss some
 80 | potential optimizations. You may be able to improve runtime speed and reduce
 81 | binary size, at the cost of increased compile times, by setting the number of
 82 | units to one. Add these lines to the `Cargo.toml` file:
 83 | ```toml
 84 | [profile.release]
 85 | codegen-units = 1
 86 | ```
 87 | <!-- Using `https` for this link triggers "potential security risk" warnings due
 88 | to a certificate problem. -->
 89 | [**Example 1**](http://likebike.com/posts/How_To_Write_Fast_Rust_Code.html#emit-asm),
 90 | [**Example 2**](https://github.com/rust-lang/rust/pull/115554#issuecomment-1742192440).
 91 | 
 92 | [codegen units]: https://doc.rust-lang.org/cargo/reference/profiles.html#codegen-units
 93 | 
 94 | ### Link-time Optimization
 95 | 
 96 | [Link-time optimization] (LTO) is a whole-program optimization technique that
 97 | can improve runtime speed by 10-20% or more, and also reduce binary size, at
 98 | the cost of worse compile times. It comes in several forms.
 99 | 
100 | [Link-time optimization]: https://doc.rust-lang.org/cargo/reference/profiles.html#lto
101 | 
102 | The first form of LTO is *thin local LTO*, a lightweight form of LTO. By
103 | default the compiler uses this for any build that involves a non-zero level of
104 | optimization. This includes release builds. To explicitly request this level of
105 | LTO, put these lines in the `Cargo.toml` file:
106 | ```toml
107 | [profile.release]
108 | lto = false
109 | ```
110 | 
111 | The second form of LTO is *thin LTO*, which is a little more aggressive, and
112 | likely to improve runtime speed and reduce binary size while also increasing
113 | compile times. Use `lto = "thin"` in `Cargo.toml` to enable it.
114 | 
115 | The third form of LTO is *fat LTO*, which is even more aggressive, and may
116 | improve performance and reduce binary size further while increasing build
117 | times again. Use `lto = "fat"` in `Cargo.toml` to enable it.
118 | 
119 | Finally, it is possible to fully disable LTO, which will likely worsen runtime
120 | speed and increase binary size but reduce compile times. Use `lto = "off"` in
121 | `Cargo.toml` for this. Note that this is different to the `lto = false` option,
122 | which, as mentioned above, leaves thin local LTO enabled.
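    |
    | As a minimal sketch, all of the LTO forms described above are values of the
    | same `lto` key, so only one of them can be active in a given profile. The
    | choice of `"fat"` here is purely illustrative:
    | ```toml
    | [profile.release]
    | # Pick exactly one value. Alternatives: "thin", "off", or false (the
    | # default, which leaves thin local LTO enabled).
    | lto = "fat"
    | ```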
123 | 
124 | ### Alternative Allocators
125 | 
126 | It is possible to replace the default (system) heap allocator used by a Rust
127 | program with an alternative allocator. The exact effect will depend on the
128 | individual program and the alternative allocator chosen, but large improvements
129 | in runtime speed and large reductions in memory usage have been seen in
130 | practice. The effect will also vary across platforms, because each platform's
131 | system allocator has its own strengths and weaknesses. The use of an
132 | alternative allocator is also likely to increase binary size and compile times.
133 | 
134 | #### jemalloc
135 | 
136 | One popular alternative allocator for Linux and Mac is [jemalloc], usable via
137 | the [`tikv-jemallocator`] crate. To use it, add a dependency to your
138 | `Cargo.toml` file:
139 | ```toml
140 | [dependencies]
141 | tikv-jemallocator = "0.5"
142 | ```
143 | Then add the following to your Rust code, e.g. at the top of `src/main.rs`:
144 | ```rust,ignore
145 | #[global_allocator]
146 | static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
147 | ```
148 | 
149 | Furthermore, on Linux, jemalloc can be configured to use [transparent huge
150 | pages][THP] (THP). This can further speed up programs, possibly at the cost of
151 | higher memory usage.
152 | 
153 | [THP]: https://www.kernel.org/doc/html/next/admin-guide/mm/transhuge.html
154 | 
155 | Do this by setting the `MALLOC_CONF` environment variable appropriately before
156 | building your program, for example:
157 | ```bash
158 | MALLOC_CONF="thp:always,metadata_thp:always" cargo build --release
159 | ```
160 | The system running the compiled program also has to be configured to support
161 | THP. See [this blog post] for more details.
162 | 
163 | [this blog post]: https://kobzol.github.io/rust/rustc/2023/10/21/make-rust-compiler-5percent-faster.html
164 | 
165 | #### mimalloc
166 | 
167 | Another alternative allocator that works on many platforms is [mimalloc],
168 | usable via the [`mimalloc`] crate. To use it, add a dependency to your
169 | `Cargo.toml` file:
170 | ```toml
171 | [dependencies]
172 | mimalloc = "0.1"
173 | ```
174 | Then add the following to your Rust code, e.g. at the top of `src/main.rs`:
175 | ```rust,ignore
176 | #[global_allocator]
177 | static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
178 | ```
179 | 
180 | [jemalloc]: https://github.com/jemalloc/jemalloc
181 | [`tikv-jemallocator`]: https://crates.io/crates/tikv-jemallocator
182 | [better performance]: https://github.com/rust-lang/rust/pull/83152
183 | [mimalloc]: https://github.com/microsoft/mimalloc
184 | [`mimalloc`]: https://crates.io/crates/mimalloc
185 | 
186 | ### CPU Specific Instructions
187 | 
188 | If you do not care about the compatibility of your binary on older (or other
189 | types of) processors, you can tell the compiler to generate the newest (and
190 | potentially fastest) instructions specific to a [certain CPU architecture],
191 | such as AVX SIMD instructions for x86-64 CPUs.
192 | 
193 | [certain CPU architecture]: https://doc.rust-lang.org/rustc/codegen-options/index.html#target-cpu
194 | 
195 | To request these instructions from the command line, use the `-C
196 | target-cpu=native` flag. For example:
197 | ```bash
198 | RUSTFLAGS="-C target-cpu=native" cargo build --release
199 | ```
200 | 
201 | Alternatively, to request these instructions from a [`config.toml`] file (for
202 | one or more projects), add these lines:
203 | ```toml
204 | [build]
205 | rustflags = ["-C", "target-cpu=native"]
206 | ```
207 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html
208 | 
209 | This can improve runtime speed, especially if the compiler finds vectorization
210 | opportunities in your code.
211 | 
212 | If you are unsure whether `-C target-cpu=native` is working optimally, compare
213 | the output of `rustc --print cfg` and `rustc --print cfg -C target-cpu=native`
214 | to see if the CPU features are being detected correctly in the latter case. If
215 | not, you can use `-C target-feature` to target specific features.
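    |
    | For example, a `config.toml` along the following lines requests specific
    | features rather than everything the build machine supports. The `+avx2` and
    | `+fma` selections are illustrative assumptions for an x86-64 target, not a
    | recommendation, and the resulting binary will not run on CPUs lacking them.
    | ```toml
    | [build]
    | # Enable only the named x86-64 features (hypothetical selection).
    | rustflags = ["-C", "target-feature=+avx2,+fma"]
    | ```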
216 | 
217 | ### Profile-guided Optimization
218 | 
219 | Profile-guided optimization (PGO) is a compilation model where you compile
220 | your program, run it on sample data while collecting profiling data, and then
221 | use that profiling data to guide a second compilation of the program. This can
222 | improve runtime speed by 10% or more.
223 | [**Example 1**](https://blog.rust-lang.org/inside-rust/2020/11/11/exploring-pgo-for-the-rust-compiler.html),
224 | [**Example 2**](https://github.com/rust-lang/rust/pull/96978).
225 | 
226 | It is an advanced technique that takes some effort to set up, but is worthwhile
227 | in some cases. See the [rustc PGO documentation] for details. Also, the
228 | [`cargo-pgo`] command makes it easier to use PGO (and [BOLT], which is similar)
229 | to optimize Rust binaries.
230 | 
231 | Unfortunately, PGO is not supported for binaries hosted on crates.io and
232 | distributed via `cargo install`, which limits its usability.
233 | 
234 | [rustc PGO documentation]: https://doc.rust-lang.org/rustc/profile-guided-optimization.html
235 | [`cargo-pgo`]: https://github.com/Kobzol/cargo-pgo
236 | [BOLT]: https://github.com/llvm/llvm-project/tree/main/bolt
237 | 
238 | ## Minimizing Binary Size
239 | 
240 | The following build configuration options are designed primarily to minimize
241 | binary size. Their effects on runtime speed vary.
242 | 
243 | ### Optimization Level
244 | 
245 | You can request an [optimization level] that aims to minimize binary size by
246 | adding these lines to the `Cargo.toml` file:
247 | ```toml
248 | [profile.release]
249 | opt-level = "z"
250 | ```
251 | [optimization level]: https://doc.rust-lang.org/cargo/reference/profiles.html#opt-level
252 | 
253 | This may also reduce runtime speed.
254 | 
255 | An alternative is `opt-level = "s"`, which targets minimal binary size a little
256 | less aggressively. Compared to `opt-level = "z"`, it allows [slightly more
257 | inlining] and also the vectorization of loops.
258 | 
259 | [slightly more inlining]: https://doc.rust-lang.org/rustc/codegen-options/index.html#inline-threshold
260 | 
261 | ### Abort on `panic!`
262 | 
263 | If you do not need to unwind on panic, e.g. because your program doesn't use
264 | [`catch_unwind`], you can tell the compiler to simply [abort on panic]. On
265 | panic, your program will still produce a backtrace.
266 | 
267 | [`catch_unwind`]: https://doc.rust-lang.org/std/panic/fn.catch_unwind.html
268 | [abort on panic]: https://doc.rust-lang.org/cargo/reference/profiles.html#panic
269 | 
270 | This might reduce binary size and increase runtime speed slightly, and may even
271 | reduce compile times slightly. Add these lines to the `Cargo.toml` file:
272 | ```toml
273 | [profile.release]
274 | panic = "abort"
275 | ```
276 | 
277 | 
278 | ### Strip Debug Info and Symbols
279 | 
280 | You can tell the compiler to [strip] debug info and symbols from the compiled
281 | binary. Add these lines to `Cargo.toml` to strip just debug info:
282 | ```toml
283 | [profile.release]
284 | strip = "debuginfo"
285 | ```
286 | Alternatively, use `strip = "symbols"` to strip both debug info and symbols.
287 | 
288 | [strip]: https://doc.rust-lang.org/cargo/reference/profiles.html#strip
289 | 
290 | Stripping debug info can greatly reduce binary size. On Linux, the binary size
291 | of a small Rust program might shrink by 4x when debug info is stripped.
292 | Stripping symbols can also reduce binary size, though generally not by as much.
293 | [**Example**](https://github.com/nnethercote/counts/commit/53cab44cd09ff1aa80de70a6dbe1893ff8a41142).
294 | The exact effects are platform-dependent.
295 | 
296 | However, stripping makes your compiled program more difficult to debug and
297 | profile. For example, if a stripped program panics, the backtrace produced may
298 | contain less useful information than normal. The exact effects for the two
299 | levels of stripping depend on the platform.
300 | 
301 | ### Other Ideas
302 | 
303 | For more advanced binary size minimization techniques, consult the
304 | comprehensive documentation in the excellent [`min-sized-rust`] repository.
305 | 
306 | [`min-sized-rust`]: https://github.com/johnthagen/min-sized-rust
307 | 
308 | ## Minimizing Compile Times
309 | 
310 | The following build configuration options are designed primarily to minimize
311 | compile times.
312 | 
313 | ### Linking
314 | 
315 | A big part of compile time is actually linking time, particularly when
316 | rebuilding a program after a small change. It is possible to select a faster
317 | linker than the default one.
318 | 
319 | One option is [lld], which is available on Linux and Windows. To specify lld
320 | from the command line, use the `-C link-arg=-fuse-ld=lld` flag. For example:
321 | ```bash
322 | RUSTFLAGS="-C link-arg=-fuse-ld=lld" cargo build --release
323 | ```
324 | 
325 | [lld]: https://lld.llvm.org/
326 | 
327 | Alternatively, to specify lld from a [`config.toml`] file (for one or more
328 | projects), add these lines:
329 | ```toml
330 | [build]
331 | rustflags = ["-C", "link-arg=-fuse-ld=lld"]
332 | ```
333 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html
334 | 
335 | lld is not fully supported for use with Rust, but it should work for most use
336 | cases on Linux and Windows. There is a [GitHub Issue] tracking full support for
337 | lld.
338 | 
339 | Another option is [mold], which is currently available on Linux and macOS.
340 | Simply substitute `mold` for `lld` in the instructions above. mold is often
341 | faster than lld. It is also much newer and may not work in all cases.
342 | 
343 | [mold]: https://github.com/rui314/mold
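    |
    | For instance, assuming mold is installed and that the compiler driver used
    | for linking accepts `-fuse-ld=mold`, the `config.toml` form of that
    | substitution would look like this:
    | ```toml
    | [build]
    | # Same setting as for lld above, with `mold` substituted for `lld`.
    | rustflags = ["-C", "link-arg=-fuse-ld=mold"]
    | ```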
344 | 
345 | Unlike the other options in this chapter, there are no trade-offs here!
346 | Alternative linkers can be dramatically faster, without any downsides.
347 | 
348 | [GitHub Issue]: https://github.com/rust-lang/rust/issues/39915#issuecomment-618726211
349 | 
350 | ### Experimental Parallel Front-end
351 | 
352 | If you use nightly Rust, you can enable the experimental [parallel front-end].
353 | It may reduce compile times at the cost of higher compile-time memory usage. It
354 | won't affect the quality of the generated code.
355 | 
356 | [parallel front-end]: https://blog.rust-lang.org/2023/11/09/parallel-rustc.html
357 | 
358 | You can do that by adding `-Zthreads=N` to RUSTFLAGS, for example:
359 | ```bash
360 | RUSTFLAGS="-Zthreads=8" cargo build --release
361 | ```
362 | 
363 | Alternatively, to enable the parallel front-end from a [`config.toml`] file (for
364 | one or more projects), add these lines:
365 | ```toml
366 | [build]
367 | rustflags = ["-Z", "threads=8"]
368 | ```
369 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html
370 | 
371 | Values other than `8` are possible, but that is the number that tends to give
372 | the best results.
373 | 
374 | In the best cases, the experimental parallel front-end reduces compile times by
375 | up to 50%. But the effects vary widely and depend on the characteristics of the
376 | code and its build configuration, and for some programs there is no compile
377 | time improvement.
378 | 
379 | ### Cranelift Codegen Back-end
380 | 
381 | If you use nightly Rust on x86-64/Linux or ARM/Linux, you can enable the
382 | Cranelift codegen back-end. It may reduce compile times at the cost of lower
383 | quality generated code, and therefore is recommended for dev builds rather than
384 | release builds.
385 | 
386 | First, install the back-end with this `rustup` command:
387 | ```bash
388 | rustup component add rustc-codegen-cranelift-preview --toolchain nightly
389 | ```
390 | 
391 | To select Cranelift from the command line, use the
392 | `-Zcodegen-backend=cranelift` flag. For example:
393 | ```bash
394 | RUSTFLAGS="-Zcodegen-backend=cranelift" cargo +nightly build
395 | ```
396 | 
397 | Alternatively, to specify Cranelift from a [`config.toml`] file (for one or
398 | more projects), add these lines:
399 | ```toml
400 | [unstable]
401 | codegen-backend = true
402 | 
403 | [profile.dev]
404 | codegen-backend = "cranelift"
405 | ```
406 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html
407 | 
408 | For more information, see the [Cranelift documentation].
409 | 
410 | [Cranelift documentation]: https://github.com/rust-lang/rustc_codegen_cranelift
411 | 
412 | ## Custom Profiles
413 | 
414 | In addition to the `dev` and `release` profiles, Cargo supports [custom
415 | profiles]. It might be useful, for example, to create a custom profile halfway
416 | between `dev` and `release` if you find the runtime speed of dev builds
417 | insufficient and the compile times of release builds too long for everyday
418 | development.
419 | 
420 | [custom profiles]: https://doc.rust-lang.org/cargo/reference/profiles.html#custom-profiles
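    |
    | As a sketch of what such a profile might look like, the following defines a
    | hypothetical profile named `fast-dev` in `Cargo.toml`. The name and the
    | chosen settings are assumptions for illustration, and should be tuned with
    | benchmarking.
    | ```toml
    | [profile.fast-dev]
    | inherits = "dev"   # start from the dev profile's settings
    | opt-level = 1      # add basic optimizations on top
    | ```
    | Select it with `cargo build --profile fast-dev`. The compiled code will be
    | placed in the `target/fast-dev/` directory.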
421 | 
422 | ## Summary
423 | 
424 | There are many choices to be made when it comes to build configurations. The
425 | following points summarize the above information into some recommendations.
426 | 
427 | - If you want to maximize runtime speed, consider all of the following:
428 |   `codegen-units = 1`, `lto = "fat"`, an alternative allocator, and `panic =
429 |   "abort"`.
430 | - If you want to minimize binary size, consider `opt-level = "z"`,
431 |   `codegen-units = 1`, `lto = "fat"`, `panic = "abort"`, and `strip =
432 |   "symbols"`.
433 | - In either case, consider `-C target-cpu=native` if broad architecture support
434 |   is not needed, and `cargo-pgo` if it works with your distribution mechanism.
435 | - Always use a faster linker if you are on a platform that supports it, because
436 |   there are no downsides to doing so.
437 | - Benchmark all changes, one at a time, to ensure they have the expected
438 |   effects.
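    |
    | Put together, a starting point for the runtime-speed recommendations above
    | might look like this in `Cargo.toml`. It is a sketch rather than a
    | universally best configuration, and each setting should still be benchmarked
    | on your own program:
    | ```toml
    | [profile.release]
    | codegen-units = 1
    | lto = "fat"
    | panic = "abort"
    | ```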
439 | 
440 | Finally, [this issue] tracks the evolution of the Rust compiler's own build
441 | configuration. The Rust compiler's build system is stranger and more complex
442 | than that of most Rust programs. Nonetheless, this issue may be instructive in
443 | showing how build configuration choices can be applied to a large program.
444 | 
445 | [this issue]: https://github.com/rust-lang/rust/issues/103595
446 | 


--------------------------------------------------------------------------------
/src/build-configuration_zh.md:
--------------------------------------------------------------------------------
  1 | # 构建配置
  2 | 
  3 | 您可以通过更改构建配置而不更改代码，从而显著改变 Rust 程序的性能。对于每个 Rust 程序，都有许多可能的构建配置。所选择的配置将影响编译代码的几个特征，如编译时间、运行时速度、内存使用、二进制大小、调试性、性能分析性以及编译程序将在哪些架构上运行。
  4 | 
  5 | 大多数配置选择会改善一个或多个特征，同时恶化一个或多个其他特征。例如，一个常见的权衡是为了获得更高的运行时速度而接受更差的编译时间。对于您的程序来说，正确的选择取决于您的需求和程序的具体情况，与性能相关的选择（其中大部分都是）应该通过基准测试来验证。
  6 | 
  7 | 请注意，Cargo 只查看工作区根目录下 Cargo.toml 文件中的配置设置。在依赖项中定义的配置设置将被忽略。因此，这些选项主要与二进制 crate 相关，而不是库 crate。
  8 | 
  9 | ## 发布构建
 10 | 
 11 | 最重要的构建配置选择很简单，但[很容易被忽视]：当你想要高性能时，确保你使用的是发布构建而不是开发构建。这通常是通过向Cargo指定`--release`标志来实现的。
 12 | 
 13 | [很容易被忽视]: https://users.rust-lang.org/t/why-my-rust-program-is-so-slow/47764/5
 14 | 
 15 | 开发构建是默认设置。它们适用于调试，但没有经过优化。如果运行 cargo build 或 cargo run，则会生成这些构建。（另外，运行 `rustc` 而不添加额外选项也会生成未经优化的构建。）
 16 | 
 17 | 考虑以下来自 cargo build 运行的输出的最后一行。
 18 | ```text
 19 | Finished dev [unoptimized + debuginfo] target(s) in 29.80s
 20 | ```
 21 | 这个输出表明已生成了一个开发构建。编译后的代码将放在 `target/debug/` 目录中。`cargo run` 将运行开发构建。
 22 | 
 23 | 相比之下，发布构建经过了更多优化，省略了调试断言和整数溢出检查，也省略了调试信息。相对于开发构建，通常可以实现 10-100 倍的速度提升！如果运行 `cargo build --release` 或 `cargo run --release`，则会生成这些构建。（另外，`rustc` 有多个选项用于优化构建，如 `-O` 和 `-C opt-level`。）由于额外的优化，这通常会比开发构建花费更长的时间。
 24 | 
 25 | 考虑以下来自 `cargo build --release` 运行的输出的最后一行。
 26 | ```text
 27 | Finished release [optimized] target(s) in 1m 01s
 28 | ```
 29 | 这个输出表明已生成了一个发布构建。编译后的代码将放在 `target/release/` 目录中。`cargo run --release` 将运行发布构建。
 30 | 
 31 | 查看 [Cargo 配置文件文档] 以获取有关开发构建（使用 `dev` 配置文件）和发布构建（使用 `release` 配置文件）之间差异的更多详细信息。
 32 | 
 33 | [Cargo 配置文件文档]: https://doc.rust-lang.org/cargo/reference/profiles.html
 34 | 
 35 | 发布构建中使用的默认构建配置选择在编译时间、运行时速度和二进制文件大小等方面提供了良好的平衡。但正如下文所述，还有许多可能的调整。
 36 | 
 37 | ## 最大化运行时速度
 38 | 
 39 | 以下构建配置选项主要旨在最大化运行时速度。其中一些选项也可能会减小二进制文件大小。
 40 | 
 41 | ### codegen units
 42 | 
 43 | Rust编译器将crate分割为多个[codegen units]以并行化编译（从而加快速度）。然而，这可能导致它错过一些潜在的优化。您可以通过将单元数设置为1来提高运行时速度并减小二进制文件大小，但这会增加编译时间。请将以下行添加到`Cargo.toml`文件中：
 44 | ```toml
 45 | [profile.release]
 46 | codegen-units = 1
 47 | ```
 48 | <!-- Using `https` for this link triggers "potential security risk" warnings due
 49 | to a certificate problem. -->
 50 | [**Example 1**](http://likebike.com/posts/How_To_Write_Fast_Rust_Code.html#emit-asm),
 51 | [**Example 2**](https://github.com/rust-lang/rust/pull/115554#issuecomment-1742192440).
 52 | 
 53 | [codegen units]: https://doc.rust-lang.org/cargo/reference/profiles.html#codegen-units
 54 | 
 55 | ### 链接时优化
 56 | 
 57 | [链接时优化]（LTO）是一种整体程序优化技术，可以提高运行时速度10-20%或更多，并减小二进制文件大小，但会导致较差的编译时间。它有几种形式。
 58 | 
 59 | [链接时优化]: https://doc.rust-lang.org/cargo/reference/profiles.html#lto
 60 | 
 61 | LTO的第一种形式是thin local LTO，这是一种轻量级的LTO形式。默认情况下，编译器会在涉及非零优化级别的任何构建中使用此形式。这包括发布构建。要显式请求此级别的LTO，请将以下行放入Cargo.toml文件中：
 62 | ```toml
 63 | [profile.release]
 64 | lto = false
 65 | ```
 66 | LTO的第二种形式是thin LTO，它稍微更具侵略性，可能会提高运行时速度并减小二进制文件大小，同时也会增加编译时间。在Cargo.toml中使用lto = "thin"来启用它。
 67 | 
 68 | LTO的第三种形式是fat LTO，它更具侵略性，可能会进一步提高性能并减小二进制文件大小，同时再次增加构建时间。在Cargo.toml中使用lto = "fat"来启用它。
 69 | 
 70 | 最后，可以完全禁用LTO，这可能会降低运行时速度并增加二进制文件大小，但会减少编译时间。在Cargo.toml中使用lto = "off"来实现此目的。请注意，这与lto = false选项不同，如上所述，后者会保留thin local LTO。
 71 | 
 72 | ### 替代分配器
 73 | 可以使用替代分配器替换Rust程序使用的默认（系统）堆分配器。具体效果取决于个别程序和所选择的替代分配器，但在实践中已经看到了运行时速度大幅提升和内存使用大幅减少。效果还会因平台而异，因为每个平台的系统分配器都有其优势和劣势。使用替代分配器还可能增加二进制文件大小和编译时间。
 74 | 
 75 | #### jemalloc
 76 | 
 77 | 一种流行的适用于Linux和Mac的替代分配器是[jemalloc]，可通过[`tikv-jemallocator`] crate使用。要使用它，请在您的Cargo.toml文件中添加一个依赖项：
 78 | ```toml
 79 | [dependencies]
 80 | tikv-jemallocator = "0.5"
 81 | ```
 82 | 然后在您的Rust代码中添加以下内容，例如在`src/main.rs`的顶部：
 83 | ```rust,ignore
 84 | #[global_allocator]
 85 | static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
 86 | ```
 87 | 
 88 | 此外，在Linux上，jemalloc可以配置为使用[透明大页][THP]。这可以进一步加快程序的运行速度，可能会以更高的内存使用为代价。
 89 | 
 90 | [THP]: https://www.kernel.org/doc/html/next/admin-guide/mm/transhuge.html
 91 | 
 92 | 在构建程序之前，通过适当设置 `MALLOC_CONF` 环境变量来执行此操作，例如：
 93 | ```bash
 94 | MALLOC_CONF="thp:always,metadata_thp:always" cargo build --release
 95 | ```
 96 | 运行编译程序的系统还必须配置为支持THP。有关更多详细信息，请参阅[此博客]。
 97 | 
 98 | [此博客]: https://kobzol.github.io/rust/rustc/2023/10/21/make-rust-compiler-5percent-faster.html
 99 | 
100 | #### mimalloc
101 | 
102 | 另一个适用于许多平台的替代分配器是[mimalloc]，可通过[`mimalloc`] crate使用。要使用它，请在您的`Cargo.toml`文件中添加一个依赖项：
103 | ```toml
104 | [dependencies]
105 | mimalloc = "0.1"
106 | ```
107 | 然后在您的Rust代码中添加以下内容，例如在`src/main.rs`的顶部：
108 | ```rust,ignore
109 | #[global_allocator]
110 | static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
111 | ```
112 | 
113 | [jemalloc]: https://github.com/jemalloc/jemalloc
114 | [`tikv-jemallocator`]: https://crates.io/crates/tikv-jemallocator
115 | [better performance]: https://github.com/rust-lang/rust/pull/83152
116 | [mimalloc]: https://github.com/microsoft/mimalloc
117 | [`mimalloc`]: https://crates.io/crates/mimalloc
118 | 
119 | ### 使用CPU专用指令
120 | 
121 | 如果您不关心二进制文件在旧版（或其他类型的）处理器上的兼容性，您可以告诉编译器生成针对[特定CPU架构]的最新（可能是最快的）指令，比如针对x86-64 CPU的AVX SIMD指令。
122 | 
123 | [特定CPU架构]: https://doc.rust-lang.org/1.41.1/rustc/codegen-options/index.html#target-cpu
124 | 
125 | 例如，如果你把`-C target-cpu=native`传给rustc，它将使用当前CPU的最佳指令。
126 | ```bash
127 | RUSTFLAGS="-C target-cpu=native" cargo build --release
128 | ```
129 | 或者，要从一个[`config.toml`]文件（用于一个或多个项目）中请求这些指令，请添加以下行：
130 | ```toml
131 | [build]
132 | rustflags = ["-C", "target-cpu=native"]
133 | ```
134 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html
135 | 
136 | 这可能会有助于提升运行时的性能，特别是当编译器在你的代码中发现矢量化的机会时。
137 | 
138 | 如果您不确定`-C target-cpu=native`是否达到了最佳效果，请比较`rustc --print cfg`和`rustc --print cfg -C target-cpu=native`的输出，看看在后一种情况下是否正确检测到了CPU特性。如果没有，您可以使用`-C target-feature`来针对特定特性。
139 | 
140 | ### Profile-guided Optimization
141 | 
142 | Profile-guided optimization（PGO）是一种编译模式：先编译程序，在样本数据上运行它并收集性能分析数据，然后利用这些性能分析数据指导程序的第二次编译。这可以将运行时速度提升10%或更多。
143 | [**Example 1**](https://blog.rust-lang.org/inside-rust/2020/11/11/exploring-pgo-for-the-rust-compiler.html),
144 | [**Example 2**](https://github.com/rust-lang/rust/pull/96978).
145 | 
146 | 这是一种高级技术，需要一些设置工作，但在某些情况下是值得的。详细信息请参阅[rustc PGO文档]。此外，[`cargo-pgo`]命令使使用PGO（以及类似的[BOLT]）来优化Rust二进制文件变得更加容易。
147 | 
148 | 不幸的是，对于托管在crates.io上并通过cargo install分发的二进制文件，不支持PGO，这限制了其可用性。
149 | 
150 | [rustc PGO文档]: https://doc.rust-lang.org/rustc/profile-guided-optimization.html
151 | [`cargo-pgo`]: https://github.com/Kobzol/cargo-pgo
152 | [BOLT]: https://github.com/llvm/llvm-project/tree/main/bolt
153 | 
154 | ## 最小化二进制文件大小
155 | 
156 | 以下构建配置选项主要旨在最小化二进制文件大小。它们对运行时速度的影响各不相同。
157 | 
158 | ### 优化级别
159 | 
160 | 您可以通过向`Cargo.toml`文件添加以下行来请求一个旨在最小化二进制文件大小的[优化级别][optimization level]：
161 | ```toml
162 | [profile.release]
163 | opt-level = "z"
164 | ```
165 | [optimization level]: https://doc.rust-lang.org/cargo/reference/profiles.html#opt-level
166 | 
167 | 这可能会降低运行时速度。
168 | 
169 | 另一种选择是`opt-level = "s"`，它针对最小化二进制文件大小的目标略微不那么激进。与`opt-level = "z"`相比，它允许[稍微更多的内联]和循环的矢量化。
170 | 
171 | [稍微更多的内联]: https://doc.rust-lang.org/rustc/codegen-options/index.html#inline-threshold
172 | 
173 | ### 在`panic!`时中止
174 | 
175 | 如果您不需要在发生恐慌时展开，例如因为您的程序不使用[`catch_unwind`]，您可以告诉编译器在恐慌时直接[中止][abort on panic]。在发生恐慌时，您的程序仍将生成回溯信息。
176 | 
177 | [`catch_unwind`]: https://doc.rust-lang.org/std/panic/fn.catch_unwind.html
178 | [abort on panic]: https://doc.rust-lang.org/cargo/reference/profiles.html#panic
179 | 
180 | 这可能会减小二进制文件大小并略微增加运行时速度，甚至可能会略微减少编译时间。将以下内容添加到`Cargo.toml`文件中：
181 | ```toml
182 | [profile.release]
183 | panic = "abort"
184 | ```
185 | 
186 | ### 剥离调试信息和符号
187 | 
188 | 您可以告诉编译器从编译后的二进制文件中[剥离](https://doc.rust-lang.org/cargo/reference/profiles.html#strip)调试信息和符号。将以下内容添加到`Cargo.toml`中以仅剥离调试信息：
189 | 
190 | ```toml
191 | [profile.release]
192 | strip = "debuginfo"
193 | ```
194 | 
195 | 或者，使用`strip = "symbols"`来同时剥离调试信息和符号。
196 | 
197 | 剥离调试信息可以极大地减小二进制文件大小。在Linux上，当剥离调试信息时，一个小型Rust程序的二进制文件大小可能会缩小4倍。剥离符号也可以减小二进制文件大小，尽管通常不会减少那么多。[**示例**](https://github.com/nnethercote/counts/commit/53cab44cd09ff1aa80de70a6dbe1893ff8a41142)。具体效果取决于平台。
198 | 
199 | 然而，剥离会使您编译的程序更难以调试和分析性能。例如，如果一个被剥离的程序发生恐慌，生成的回溯信息可能会比正常情况下包含的信息更少。两种剥离级别的具体效果取决于平台。
200 | 
201 | ### 其他想法
202 | 
203 | 要了解更多高级的二进制文件大小最小化技术，请参考优秀的[`min-sized-rust`]存储库中的全面文档。
204 | 
205 | [`min-sized-rust`]: https://github.com/johnthagen/min-sized-rust
206 | 
207 | ## 最小化编译时间
208 | 
209 | 以下构建配置选项主要旨在最小化编译时间。
210 | 
211 | ### 链接
212 | 
213 | 编译时间的一个重要部分实际上是链接时间，特别是在对程序进行小改动后重新构建时。可以选择比默认链接器更快的链接器。
214 | 
215 | 一个选择是[lld]，它在Linux和Windows上都可用。要从命令行指定lld，请使用`-C link-arg=-fuse-ld=lld`标志。例如：
216 | ```bash
217 | RUSTFLAGS="-C link-arg=-fuse-ld=lld" cargo build --release
218 | ```
219 | 
220 | [lld]: https://lld.llvm.org/
221 | 
222 | 另一种方法是从[`config.toml`]文件（针对一个或多个项目）中指定lld，添加以下内容：
223 | ```toml
224 | [build]
225 | rustflags = ["-C", "link-arg=-fuse-ld=lld"]
226 | ```
227 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html
228 | 
229 | lld目前并不完全支持与Rust一起使用，但在Linux和Windows上的大多数用例中应该可以工作。有一个[GitHub Issue]跟踪lld的完全支持。
230 | 
231 | 另一个选择是[mold]，目前在Linux和macOS上可用。只需在上述说明中用`mold`替换`lld`。mold通常比lld更快。它也要新得多，可能不适用于所有情况。
232 | 
233 | [mold]: https://github.com/rui314/mold
234 | 
235 | 与本章中的其他选项不同，这里没有任何权衡！替代链接器可以显著提高速度，而没有任何不利影响。
236 | 
237 | [GitHub Issue]: https://github.com/rust-lang/rust/issues/39915#issuecomment-618726211
238 | 
239 | ### 实验性并行前端
240 | 
241 | 如果您使用nightly版的Rust，可以启用实验性的[并行前端]。这可能会减少编译时间，但会增加编译时内存的使用。它不会影响生成的代码质量。
242 | 
243 | [并行前端]: https://blog.rust-lang.org/2023/11/09/parallel-rustc.html
244 | 
245 | 您可以通过将`-Zthreads=N`添加到RUSTFLAGS来实现，例如：
246 | ```bash
247 | RUSTFLAGS="-Zthreads=8" cargo build --release
248 | ```
249 | 
250 | 或者，要从[`config.toml`]文件（针对一个或多个项目）启用并行前端，添加以下内容：
251 | ```toml
252 | [build]
253 | rustflags = ["-Z", "threads=8"]
254 | ```
255 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html
256 | 
257 | 除了`8`之外，还可以使用其他值，但这个数字通常会产生最佳结果。
258 | 
259 | 在最佳情况下，实验性并行前端可以将编译时间缩短高达50%。但效果因代码特性和构建配置的不同而异，对于某些程序，编译时间可能不会有所改善。
260 | 
261 | ### Cranelift代码生成后端
262 | 
263 | 如果您在x86-64/Linux或ARM/Linux上使用nightly版的Rust，可以启用Cranelift代码生成后端。它可能会减少编译时间，但会以生成的代码质量降低为代价，因此建议用于开发构建而不是发布构建。
264 | 
265 | 首先，使用以下`rustup`命令安装后端：
266 | ```bash
267 | rustup component add rustc-codegen-cranelift-preview --toolchain nightly
268 | ```
269 | 
270 | 要从命令行选择Cranelift，请使用`-Zcodegen-backend=cranelift`标志。例如：
271 | ```bash
272 | RUSTFLAGS="-Zcodegen-backend=cranelift" cargo +nightly build
273 | ```
274 | 
275 | 或者，要从[`config.toml`]文件（针对一个或多个项目）指定Cranelift，添加以下内容：
276 | ```toml
277 | [unstable]
278 | codegen-backend = true
279 | 
280 | [profile.dev]
281 | codegen-backend = "cranelift"
282 | ```
283 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html
284 | 
285 | 有关更多信息，请参阅[Cranelift文档]。
286 | 
287 | [Cranelift文档]: https://github.com/rust-lang/rustc_codegen_cranelift
288 | 
289 | ## 自定义配置文件
290 | 
291 | 除了`dev`和`release`配置文件外，Cargo还支持[自定义配置文件]。例如，如果您发现开发构建的运行时速度不够，发布构建的编译时间对日常开发来说太慢，那么创建一个介于`dev`和`release`之间的自定义配置文件可能会很有用。
292 | 
293 | [自定义配置文件]: https://doc.rust-lang.org/cargo/reference/profiles.html#custom-profiles
294 | 
295 | ## 总结
296 | 
297 | 在构建配置方面有许多选择需要考虑。以下总结了上述信息并提出了一些建议。
298 | 
299 | - 如果您想最大化运行时速度，请考虑以下所有内容：`codegen-units = 1`、`lto = "fat"`、替代分配器和`panic = "abort"`。
300 | - 如果您想最小化二进制文件大小，请考虑`opt-level = "z"`、`codegen-units = 1`、`lto = "fat"`、`panic = "abort"`和`strip = "symbols"`。
301 | - 在任何情况下，如果不需要广泛的架构支持，请考虑使用`-C target-cpu=native`，如果与您的分发机制兼容，请考虑使用`cargo-pgo`。
302 | - 如果您所在的平台支持更快的链接器，请始终使用它，因为这样做没有任何不利之处。
303 | - 逐个对所有更改进行基准测试，以确保它们产生预期效果。
304 | 
305 | 最后，[此问题]跟踪了Rust编译器自身构建配置的演变。Rust编译器的构建系统比大多数Rust程序更奇特和复杂。尽管如此，这个问题可能有助于展示如何将构建配置选择应用于大型程序。
306 | 
307 | [此问题]: https://github.com/rust-lang/rust/issues/103595


--------------------------------------------------------------------------------
/src/compile-times.md:
--------------------------------------------------------------------------------
 1 | # Compile Times
 2 | 
 3 | Although this book is primarily about improving the performance of Rust
 4 | programs, this section is about reducing the compile times of Rust programs,
 5 | because that is a related topic of interest to many people.
 6 | 
 7 | The [Minimizing Compile Times] section discussed ways to reduce compile times
 8 | via build configuration choices. The rest of this section discusses ways to
 9 | reduce compile times that require modifying your program's code.
10 | 
11 | [Minimizing Compile Times]: build-configuration.md#minimizing-compile-times
12 | 
13 | ## Visualization
14 | 
15 | Cargo has a feature that lets you visualize compilation of your
16 | program. Build with this command:
17 | ```text
18 | cargo build --timings
19 | ```
20 | On completion it will print the name of an HTML file. Open that file in a web
21 | browser. It contains a [Gantt chart] that shows the dependencies between the
22 | various crates in your program. This shows how much parallelism there is in
23 | your crate graph, which can indicate if any large crates that serialize
24 | compilation should be broken up. See [the documentation][timings] for more
25 | details on how to read the graphs.
26 | 
27 | [Gantt chart]: https://en.wikipedia.org/wiki/Gantt_chart
28 | [timings]: https://doc.rust-lang.org/nightly/cargo/reference/timings.html
29 | 
30 | ## LLVM IR
31 | 
32 | The Rust compiler uses [LLVM] for its back end. LLVM's execution can be a large
33 | part of compile times, especially when the Rust compiler's front end generates
34 | a lot of [IR], which takes LLVM a long time to optimize.
35 | 
36 | [LLVM]: https://llvm.org/
37 | [IR]: https://en.wikipedia.org/wiki/Intermediate_representation
38 | 
39 | These problems can be diagnosed with [`cargo llvm-lines`], which shows which
40 | Rust functions cause the most LLVM IR to be generated. Generic functions are
41 | often the most important ones, because they can be instantiated dozens or even
42 | hundreds of times in large programs.
43 | 
44 | [`cargo llvm-lines`]: https://github.com/dtolnay/cargo-llvm-lines/
45 | 
46 | If a generic function causes IR bloat, there are several ways to fix it. The
47 | simplest is to just make the function smaller.
48 | [**Example 1**](https://github.com/rust-lang/rust/pull/72166/commits/5a0ac0552e05c079f252482cfcdaab3c4b39d614),
49 | [**Example 2**](https://github.com/rust-lang/rust/pull/91246/commits/f3bda74d363a060ade5e5caeb654ba59bfed51a4).
50 | 
51 | Another way is to move the non-generic parts of the function into a separate,
52 | non-generic function, which will only be instantiated once. Whether this is
53 | possible will depend on the details of the generic function. When it is
54 | possible, the non-generic function can often be written neatly as an inner
55 | function within the generic function, as shown by the code for
56 | [`std::fs::read`]:
57 | ```rust,ignore
58 | pub fn read<P: AsRef<Path>>(path: P) -> io::Result<Vec<u8>> {
59 |     fn inner(path: &Path) -> io::Result<Vec<u8>> {
60 |         let mut file = File::open(path)?;
61 |         let size = file.metadata().map(|m| m.len()).unwrap_or(0);
62 |         let mut bytes = Vec::with_capacity(size as usize);
63 |         io::default_read_to_end(&mut file, &mut bytes)?;
64 |         Ok(bytes)
65 |     }
66 |     inner(path.as_ref())
67 | }
68 | ```
69 | [`std::fs::read`]: https://doc.rust-lang.org/std/fs/fn.read.html
70 | 
71 | [**Example**](https://github.com/rust-lang/rust/pull/72013/commits/68b75033ad78d88872450a81745cacfc11e58178).
72 | 
73 | Sometimes common utility functions like [`Option::map`] and [`Result::map_err`]
74 | are instantiated many times. Replacing them with equivalent `match` expressions
75 | can help compile times.
76 | 
77 | [`Option::map`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.map
78 | [`Result::map_err`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.map_err
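
As a minimal sketch of this replacement, the two functions below are
equivalent. The names `double_with_map` and `double_with_match` are
hypothetical and chosen only for illustration. Note that Clippy's `manual_map`
lint flags the `match` form by default, so it may need to be allowed where
this trade-off is intentional.

```rust
/// Uses `Option::map`, which instantiates the generic method and a closure.
fn double_with_map(x: Option<u32>) -> Option<u32> {
    x.map(|n| n * 2)
}

/// Equivalent `match` expression; no generic method is instantiated.
fn double_with_match(x: Option<u32>) -> Option<u32> {
    match x {
        Some(n) => Some(n * 2),
        None => None,
    }
}

fn main() {
    assert_eq!(double_with_map(Some(21)), double_with_match(Some(21)));
    assert_eq!(double_with_map(None), double_with_match(None));
}
```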
79 | 
80 | The effects of these sorts of changes on compile times will usually be small,
81 | though occasionally they can be large.
82 | [**Example**](https://github.com/servo/servo/issues/26585).
83 | 
84 | Such changes can also reduce binary size.
85 | 


--------------------------------------------------------------------------------
/src/compile-times_zh.md:
--------------------------------------------------------------------------------
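 1 | # Compile Times
 2 | 
 3 | Although this book is primarily about improving the performance of Rust programs, this section is about reducing the compile times of Rust programs, because that is a related topic of interest to many people.
 4 | 
 5 | The [Minimizing Compile Times] section discussed ways to reduce compile times via build configuration choices. The rest of this section discusses ways to reduce compile times that require modifying your program's code.
 6 | 
 7 | [Minimizing Compile Times]: build-configuration.md#minimizing-compile-times
 8 | 
 9 | ## Visualization
10 | 
11 | Cargo has a feature that lets you visualize compilation of your program. Build with this command: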
12 | ```text
13 | cargo build --timings
14 | ```
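15 | On completion it will print the name of an HTML file. Open that file in a web browser. It contains a [Gantt chart] that shows the dependencies between the various crates in your program. This shows how much parallelism there is in your crate graph, which can indicate if any large crates that serialize compilation should be broken up. See [the documentation][timings] for more details on how to read the graphs.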
16 | 
17 | [Gantt chart]: https://en.wikipedia.org/wiki/Gantt_chart
18 | [timings]: https://doc.rust-lang.org/nightly/cargo/reference/timings.html
19 | 
20 | ## LLVM IR
21 | 
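22 | The Rust compiler uses [LLVM] for its back end. LLVM's execution can be a large part of compile times, especially when the Rust compiler's front end generates a lot of [IR], which takes LLVM a long time to optimize.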
23 | 
24 | [LLVM]: https://llvm.org/
25 | [IR]: https://en.wikipedia.org/wiki/Intermediate_representation
26 | 
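27 | These problems can be diagnosed with [`cargo llvm-lines`], which shows which Rust functions cause the most LLVM IR to be generated. Generic functions are often the most important ones, because they can be instantiated dozens or even hundreds of times in large programs.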
28 | 
29 | [`cargo llvm-lines`]: https://github.com/dtolnay/cargo-llvm-lines/
30 | 
31 | If a generic function causes IR bloat, there are several ways to fix it. The simplest is to just make the function smaller.
32 | [**Example 1**](https://github.com/rust-lang/rust/pull/72166/commits/5a0ac0552e05c079f252482cfcdaab3c4b39d614),
33 | [**Example 2**](https://github.com/rust-lang/rust/pull/91246/commits/f3bda74d363a060ade5e5caeb654ba59bfed51a4).
34 | 
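35 | Another way is to move the non-generic parts of the function into a separate, non-generic function, which will only be instantiated once. Whether this is possible will depend on the details of the generic function. When it is possible, the non-generic function can often be written neatly as an inner function within the generic function, as shown by the code for [`std::fs::read`]: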
36 | ```rust,ignore
37 | pub fn read<P: AsRef<Path>>(path: P) -> io::Result<Vec<u8>> {
38 |     fn inner(path: &Path) -> io::Result<Vec<u8>> {
39 |         let mut file = File::open(path)?;
40 |         let size = file.metadata().map(|m| m.len()).unwrap_or(0);
41 |         let mut bytes = Vec::with_capacity(size as usize);
42 |         io::default_read_to_end(&mut file, &mut bytes)?;
43 |         Ok(bytes)
44 |     }
45 |     inner(path.as_ref())
46 | }
47 | ```
48 | [`std::fs::read`]: https://doc.rust-lang.org/std/fs/fn.read.html
49 | 
50 | [**Example**](https://github.com/rust-lang/rust/pull/72013/commits/68b75033ad78d88872450a81745cacfc11e58178).
51 | 
52 | Sometimes common utility functions like [`Option::map`] and [`Result::map_err`] are instantiated many times. Replacing them with equivalent `match` expressions can help compile times.
53 | 
54 | [`Option::map`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.map
55 | [`Result::map_err`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.map_err
56 | 
57 | The effects of these sorts of changes on compile times will usually be small, though occasionally they can be large.
58 | [**Example**](https://github.com/servo/servo/issues/26585).
59 | 
60 | Such changes can also reduce binary size.


--------------------------------------------------------------------------------
/src/general-tips.md:
--------------------------------------------------------------------------------
 1 | # General Tips
 2 | 
 3 | The previous sections of this book have discussed Rust-specific techniques.
 4 | This section gives a brief overview of some general performance principles.
 5 | 
 6 | As long as the obvious pitfalls are avoided (e.g. [using non-release builds]),
 7 | Rust code generally is fast and uses little memory, especially if you are used
 8 | to dynamically-typed languages such as Python and Ruby, or statically-typed
 9 | languages with a garbage collector such as Java and C#.
10 | 
11 | [using non-release builds]: build-configuration.md
12 | 
13 | Optimized code is often more complex and takes more effort to write than
14 | unoptimized code. For this reason, it is only worth optimizing hot code.
15 | 
16 | The biggest performance improvements often come from changes to algorithms or
17 | data structures, rather than low-level optimizations.
18 | [**Example 1**](https://github.com/rust-lang/rust/pull/53383/commits/5745597e6195fe0591737f242d02350001b6c590),
19 | [**Example 2**](https://github.com/rust-lang/rust/pull/54318/commits/154be2c98cf348de080ce951df3f73649e8bb1a6).
20 | 
21 | Writing code that works well with modern hardware is not always easy, but worth
22 | striving for. For example, try to minimize cache misses and branch
23 | mispredictions, where possible.
24 | 
25 | Most optimizations result in small speedups. Although no single small speedup
26 | is noticeable, they really add up if you can do enough of them.
27 | 
28 | Different profilers have different strengths. It is good to use more than one.
29 | 
30 | When profiling indicates that a function is hot, there are two common ways to
31 | speed things up: (a) make the function faster, and/or (b) avoid calling it as
32 | much.
33 | 
34 | It is often easier to eliminate silly slowdowns than it is to introduce clever
35 | speedups.
36 | 
37 | Avoid computing things unless necessary. Lazy/on-demand computations are
38 | often a win.
39 | [**Example 1**](https://github.com/rust-lang/rust/pull/36592/commits/80a44779f7a211e075da9ed0ff2763afa00f43dc),
40 | [**Example 2**](https://github.com/rust-lang/rust/pull/50339/commits/989815d5670826078d9984a3515eeb68235a4687).
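
One way to express on-demand computation is sketched below, assuming Rust 1.70
or later for `std::cell::OnceCell`. The `Samples` type and its methods are
hypothetical and exist only for illustration: the sorted copy of the data is
built the first time a median is requested, and never if no caller asks.

```rust
use std::cell::OnceCell;

/// A data set whose sorted copy is only built if a median is requested.
struct Samples {
    data: Vec<u64>,
    sorted: OnceCell<Vec<u64>>,
}

impl Samples {
    fn new(data: Vec<u64>) -> Self {
        Samples { data, sorted: OnceCell::new() }
    }

    /// Returns the upper median, sorting on the first call only.
    fn median(&self) -> Option<u64> {
        let sorted = self.sorted.get_or_init(|| {
            let mut v = self.data.clone();
            v.sort_unstable();
            v
        });
        sorted.get(sorted.len() / 2).copied()
    }

    /// Reports whether the sorted copy has been computed yet.
    fn is_sorted_cached(&self) -> bool {
        self.sorted.get().is_some()
    }
}

fn main() {
    let s = Samples::new(vec![5, 1, 3]);
    assert!(!s.is_sorted_cached()); // nothing computed yet
    assert_eq!(s.median(), Some(3));
    assert!(s.is_sorted_cached()); // later calls reuse the sorted copy
}
```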
41 | 
42 | Complex general cases can often be avoided by optimistically checking for
43 | common special cases that are simpler.
44 | [**Example 1**](https://github.com/rust-lang/rust/pull/68790/commits/d62b6f204733d255a3e943388ba99f14b053bf4a),
45 | [**Example 2**](https://github.com/rust-lang/rust/pull/53733/commits/130e55665f8c9f078dec67a3e92467853f400250),
46 | [**Example 3**](https://github.com/rust-lang/rust/pull/65260/commits/59e41edcc15ed07de604c61876ea091900f73649).
47 | In particular, specially handling collections with 0, 1, or 2 elements is often
48 | a win when small sizes dominate.
49 | [**Example 1**](https://github.com/rust-lang/rust/pull/50932/commits/2ff632484cd8c2e3b123fbf52d9dd39b54a94505),
50 | [**Example 2**](https://github.com/rust-lang/rust/pull/64627/commits/acf7d4dcdba4046917c61aab141c1dec25669ce9),
51 | [**Example 3**](https://github.com/rust-lang/rust/pull/64949/commits/14192607d38f5501c75abea7a4a0e46349df5b5f),
52 | [**Example 4**](https://github.com/rust-lang/rust/pull/64949/commits/d1a7bb36ad0a5932384eac03d3fb834efc0317e5).
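
A minimal sketch of this kind of special-casing follows. The function
`sort_small` is hypothetical, and whether it beats calling `sort_unstable`
directly depends on the workload, so it should be benchmarked rather than
assumed to be a win.

```rust
/// Sorts a slice, handling the 0-, 1-, and 2-element cases without calling
/// into the general sorting routine.
fn sort_small(v: &mut [u32]) {
    match v.len() {
        0 | 1 => {} // already sorted
        2 => {
            if v[0] > v[1] {
                v.swap(0, 1);
            }
        }
        _ => v.sort_unstable(), // general case
    }
}

fn main() {
    let mut pair = [2, 1];
    sort_small(&mut pair);
    assert_eq!(pair, [1, 2]);

    let mut longer = [3, 1, 2];
    sort_small(&mut longer);
    assert_eq!(longer, [1, 2, 3]);
}
```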
53 | 
54 | Similarly, when dealing with repetitive data, it is often possible to use a
55 | simple form of data compression, by using a compact representation for common
56 | values and then having a fallback to a secondary table for unusual values.
57 | [**Example 1**](https://github.com/rust-lang/rust/pull/54420/commits/b2f25e3c38ff29eebe6c8ce69b8c69243faa440d),
58 | [**Example 2**](https://github.com/rust-lang/rust/pull/59693/commits/fd7f605365b27bfdd3cd6763124e81bddd61dd28),
59 | [**Example 3**](https://github.com/rust-lang/rust/pull/65750/commits/eea6f23a0ed67fd8c6b8e1b02cda3628fee56b2f).
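
The idea can be sketched as follows, under the assumption that most values
fit in 16 bits. The `CompactValues` type and the `OVERFLOW` marker are
hypothetical and not taken from the linked examples: common values are stored
inline in two bytes, and rare large values go to a secondary table keyed by
index.

```rust
use std::collections::HashMap;

/// Marker stored inline when the real value lives in the side table.
const OVERFLOW: u16 = u16::MAX;

#[derive(Default)]
struct CompactValues {
    inline: Vec<u16>,              // two bytes per element in the common case
    overflow: HashMap<usize, u64>, // index -> value, for rare large values
}

impl CompactValues {
    fn push(&mut self, value: u64) {
        if value < u64::from(OVERFLOW) {
            self.inline.push(value as u16);
        } else {
            self.overflow.insert(self.inline.len(), value);
            self.inline.push(OVERFLOW);
        }
    }

    fn get(&self, index: usize) -> Option<u64> {
        match *self.inline.get(index)? {
            OVERFLOW => self.overflow.get(&index).copied(),
            small => Some(u64::from(small)),
        }
    }
}

fn main() {
    let mut values = CompactValues::default();
    values.push(7);
    values.push(1_000_000);
    assert_eq!(values.get(0), Some(7));
    assert_eq!(values.get(1), Some(1_000_000));
    assert_eq!(values.get(2), None);
}
```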
60 | 
61 | When code deals with multiple cases, measure case frequencies and handle the
62 | most common ones first.
63 | 
64 | When dealing with lookups that involve high locality, it can be a win to put a
65 | small cache in front of a data structure.
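
A one-entry cache is the simplest form of this. The sketch below uses a
hypothetical `CachedTable` wrapper around a `BTreeMap`; the `hits` counter is
included only to make the effect observable. A real implementation would also
need to invalidate the cached entry whenever the underlying table is modified.

```rust
use std::collections::BTreeMap;

struct CachedTable {
    table: BTreeMap<u32, u64>,
    last: Option<(u32, u64)>, // most recent successful lookup
    hits: usize,
}

impl CachedTable {
    fn new(table: BTreeMap<u32, u64>) -> Self {
        CachedTable { table, last: None, hits: 0 }
    }

    fn get(&mut self, key: u32) -> Option<u64> {
        // Fast path: the same key as last time.
        if let Some((k, v)) = self.last {
            if k == key {
                self.hits += 1;
                return Some(v);
            }
        }
        // Slow path: consult the table and remember the result.
        let v = *self.table.get(&key)?;
        self.last = Some((key, v));
        Some(v)
    }
}

fn main() {
    let table: BTreeMap<u32, u64> =
        (0..1000).map(|k| (k, u64::from(k) * 10)).collect();
    let mut cached = CachedTable::new(table);
    for key in [7, 7, 7, 8, 8] {
        assert_eq!(cached.get(key), Some(u64::from(key) * 10));
    }
    assert_eq!(cached.hits, 3); // three of the five lookups hit the cache
}
```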
66 | 
67 | Optimized code often has a non-obvious structure, which means that explanatory
68 | comments are valuable, particularly those that reference profiling
69 | measurements. A comment like "99% of the time this vector has 0 or 1 elements,
70 | so handle those cases first" can be illuminating.
71 | 


--------------------------------------------------------------------------------
/src/general-tips_zh.md:
--------------------------------------------------------------------------------
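 1 | # General Tips
 2 | 
 3 | The previous sections of this book have discussed Rust-specific techniques. This section gives a brief overview of some general performance principles.
 4 | 
 5 | As long as the obvious pitfalls are avoided (e.g. [using non-release builds]), Rust code generally is fast and uses little memory, especially if you are used to dynamically-typed languages such as Python and Ruby, or statically-typed languages with a garbage collector such as Java and C#.
 6 | 
 7 | [using non-release builds]: build-configuration.md
 8 | 
 9 | Optimized code is often more complex and takes more effort to write than unoptimized code. For this reason, it is only worth optimizing hot code.
10 | 
11 | The biggest performance improvements often come from changes to algorithms or data structures, rather than low-level optimizations.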
12 | [**Example 1**](https://github.com/rust-lang/rust/pull/53383/commits/5745597e6195fe0591737f242d02350001b6c590),
13 | [**Example 2**](https://github.com/rust-lang/rust/pull/54318/commits/154be2c98cf348de080ce951df3f73649e8bb1a6).
14 | 
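15 | Writing code that works well with modern hardware is not always easy, but it is worth striving for. For example, try to minimize cache misses and branch mispredictions.
16 | 
17 | Most optimizations result in small speedups. Although no single small speedup is noticeable, they really add up if you can do enough of them.
18 | 
19 | Different profilers have different strengths. It is good to use more than one.
20 | 
21 | When profiling indicates that a function is hot, there are two common ways to speed things up: (a) make the function faster, and/or (b) avoid calling it as much.
22 | 
23 | It is often easier to eliminate silly slowdowns than it is to introduce clever speedups.
24 | 
25 | Avoid computing things unless necessary. Lazy/on-demand computations are often a win.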
26 | [**Example 1**](https://github.com/rust-lang/rust/pull/36592/commits/80a44779f7a211e075da9ed0ff2763afa00f43dc),
27 | [**Example 2**](https://github.com/rust-lang/rust/pull/50339/commits/989815d5670826078d9984a3515eeb68235a4687).
28 | 
29 | Complex general cases can often be avoided by optimistically checking for common special cases that are simpler.
30 | [**Example 1**](https://github.com/rust-lang/rust/pull/68790/commits/d62b6f204733d255a3e943388ba99f14b053bf4a),
31 | [**Example 2**](https://github.com/rust-lang/rust/pull/53733/commits/130e55665f8c9f078dec67a3e92467853f400250),
32 | [**Example 3**](https://github.com/rust-lang/rust/pull/65260/commits/59e41edcc15ed07de604c61876ea091900f73649).
33 | In particular, specially handling collections with 0, 1, or 2 elements is often a win when small sizes dominate.
34 | [**Example 1**](https://github.com/rust-lang/rust/pull/50932/commits/2ff632484cd8c2e3b123fbf52d9dd39b54a94505),
35 | [**Example 2**](https://github.com/rust-lang/rust/pull/64627/commits/acf7d4dcdba4046917c61aab141c1dec25669ce9),
36 | [**Example 3**](https://github.com/rust-lang/rust/pull/64949/commits/14192607d38f5501c75abea7a4a0e46349df5b5f),
37 | [**Example 4**](https://github.com/rust-lang/rust/pull/64949/commits/d1a7bb36ad0a5932384eac03d3fb834efc0317e5).
38 | 
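39 | Similarly, when dealing with repetitive data, it is often possible to use a simple form of data compression, by using a compact representation for common values and then having a fallback to a secondary table for unusual values.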
40 | [**Example 1**](https://github.com/rust-lang/rust/pull/54420/commits/b2f25e3c38ff29eebe6c8ce69b8c69243faa440d),
41 | [**Example 2**](https://github.com/rust-lang/rust/pull/59693/commits/fd7f605365b27bfdd3cd6763124e81bddd61dd28),
42 | [**Example 3**](https://github.com/rust-lang/rust/pull/65750/commits/eea6f23a0ed67fd8c6b8e1b02cda3628fee56b2f).
43 | 
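44 | When code deals with multiple cases, measure case frequencies and handle the most common ones first.
45 | 
46 | When dealing with lookups that involve high locality, it can be a win to put a small cache in front of a data structure.
47 | 
48 | Optimized code often has a non-obvious structure, which means that explanatory comments are valuable, particularly those that reference profiling measurements. A comment like "99% of the time this vector has 0 or 1 elements, so handle those cases first" can be illuminating.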
49 | 


--------------------------------------------------------------------------------
/src/hashing.md:
--------------------------------------------------------------------------------
 1 | # Hashing
 2 | 
 3 | `HashSet` and `HashMap` are two widely-used types. The default hashing
 4 | algorithm is not specified, but at the time of writing the default is an
 5 | algorithm called [SipHash 1-3]. This algorithm is high quality—it provides high
 6 | protection against collisions—but is relatively slow, particularly for short keys
 7 | such as integers.
 8 | 
 9 | [SipHash 1-3]: https://en.wikipedia.org/wiki/SipHash
10 | 
11 | If profiling shows that hashing is hot, and [HashDoS attacks] are not a concern
12 | for your application, the use of hash tables with faster hash algorithms can
13 | provide large speed wins.
14 | - [`rustc-hash`] provides `FxHashSet` and `FxHashMap` types that are drop-in
15 |   replacements for `HashSet` and `HashMap`. Its hashing algorithm is
16 |   low-quality but very fast, especially for integer keys, and has been found to
17 |   out-perform all other hash algorithms within rustc. ([`fxhash`] is an older,
18 |   less well maintained implementation of the same algorithm and types.)
19 | - [`fnv`] provides `FnvHashSet` and `FnvHashMap` types. Its hashing algorithm
20 |   is higher quality than `rustc-hash`'s but a little slower.
21 | - [`ahash`] provides `AHashSet` and `AHashMap`. Its hashing algorithm can take
22 |   advantage of AES instruction support that is available on some processors.
23 | 
24 | [HashDoS attacks]: https://en.wikipedia.org/wiki/Collision_attack
25 | [`rustc-hash`]: https://crates.io/crates/rustc-hash
26 | [`fxhash`]: https://crates.io/crates/fxhash
27 | [`fnv`]: https://crates.io/crates/fnv
28 | [`ahash`]: https://crates.io/crates/ahash
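
All of these crates work by supplying a different hasher through the third
type parameter of `HashMap`/`HashSet`. The sketch below shows that mechanism
using only the standard library. `Fnv1aHasher` and `FnvMap` are hypothetical
names, and the hasher is a hand-written 64-bit FNV-1a used only to keep the
example dependency-free; it is not the `fnv` crate's code, and in a real
program one of the crates above is the better choice.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hand-written 64-bit FNV-1a, for illustration only.
struct Fnv1aHasher(u64);

impl Default for Fnv1aHasher {
    fn default() -> Self {
        Fnv1aHasher(0xcbf29ce484222325) // FNV offset basis
    }
}

impl Hasher for Fnv1aHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A `HashMap` that uses the hasher above instead of the default.
type FnvMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1aHasher>>;

fn main() {
    // `HashMap::new` is only available with the default hasher, so use
    // `default` to construct the map.
    let mut ages: FnvMap<&str, u32> = FnvMap::default();
    ages.insert("alice", 30);
    ages.insert("bob", 25);
    assert_eq!(ages.get("alice"), Some(&30));
}
```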
29 | 
30 | If hashing performance is important in your program, it is worth trying more
31 | than one of these alternatives. For example, the following results were seen in
32 | rustc.
33 | - The switch from `fnv` to `fxhash` gave [speedups of up to 6%][fnv2fx].
34 | - An attempt to switch from `fxhash` to `ahash` resulted in [slowdowns of
35 |   1-4%][fx2a].
36 | - An attempt to switch from `fxhash` back to the default hasher resulted in
37 |   [slowdowns ranging from 4-84%][fx2default]!
38 | 
39 | [fnv2fx]: https://github.com/rust-lang/rust/pull/37229/commits/00e48affde2d349e3b3bfbd3d0f6afb5d76282a7
40 | [fx2a]: https://github.com/rust-lang/rust/issues/69153#issuecomment-589504301
41 | [fx2default]: https://github.com/rust-lang/rust/issues/69153#issuecomment-589338446
42 | 
43 | If you decide to universally use one of the alternatives, such as
44 | `FxHashSet`/`FxHashMap`, it is easy to accidentally use `HashSet`/`HashMap` in
45 | some places. You can [use Clippy] to avoid this problem.
46 | 
47 | [use Clippy]: linting.md#disallowing-types
48 | 
49 | Some types don't need hashing. For example, you might have a newtype that wraps
50 | an integer and the integer values are random, or close to random. For such a
51 | type, the distribution of the hashed values won't be that different to the
52 | distribution of the values themselves. In this case the [`nohash_hasher`] crate
53 | can be useful.
54 | 
55 | [`nohash_hasher`]: https://crates.io/crates/nohash-hasher
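
The underlying idea can be sketched with the standard library alone. The
`IdentityHasher`, `NodeId`, and `NodeIdSet` names below are hypothetical, and
this is a simplified stand-in for what such a crate provides, not its actual
implementation: the hasher passes the wrapped `u64` through unchanged.

```rust
use std::collections::HashSet;
use std::hash::{BuildHasherDefault, Hash, Hasher};

/// Passes a `u64` key through unchanged instead of hashing it.
#[derive(Default)]
struct IdentityHasher(u64);

impl Hasher for IdentityHasher {
    fn write(&mut self, _bytes: &[u8]) {
        unimplemented!("IdentityHasher only supports u64 keys");
    }

    fn write_u64(&mut self, n: u64) {
        self.0 = n;
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A newtype whose values are assumed to be close to random already.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct NodeId(u64);

type NodeIdSet = HashSet<NodeId, BuildHasherDefault<IdentityHasher>>;

/// Returns the hash the set will use for `id`.
fn hash_of(id: NodeId) -> u64 {
    let mut hasher = IdentityHasher::default();
    id.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let mut set = NodeIdSet::default();
    set.insert(NodeId(0x9e3779b97f4a7c15));
    assert!(set.contains(&NodeId(0x9e3779b97f4a7c15)));
    assert_eq!(hash_of(NodeId(42)), 42); // the "hash" is the value itself
}
```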
56 | 
57 | Hash function design is a complex topic and is beyond the scope of this book.
58 | The [`ahash` documentation] has a good discussion.
59 | 
60 | [`ahash` documentation]: https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md
61 | 


--------------------------------------------------------------------------------
/src/hashing_zh.md:
--------------------------------------------------------------------------------
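 1 | # Hashing
 2 | 
 3 | `HashSet` and `HashMap` are two widely-used types. The default hashing algorithm is not specified, but at the time of writing the default is an algorithm called [SipHash 1-3]. This algorithm is high quality, providing strong protection against collisions, but is relatively slow for short keys such as integers.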
 4 | 
 5 | [SipHash 1-3]: https://en.wikipedia.org/wiki/SipHash
 6 | 
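 7 | If profiling shows that hashing is hot, and [HashDoS attacks] are not a concern for your application, the use of hash tables with faster hash algorithms can provide large speed wins.
 8 | - [`rustc-hash`] provides `FxHashSet` and `FxHashMap` types that are drop-in replacements for `HashSet` and `HashMap`. Its hashing algorithm is low-quality but very fast, especially for integer keys, and has been found to out-perform all other hash algorithms within rustc.
 9 | - [`fnv`] provides `FnvHashSet` and `FnvHashMap` types. Its hashing algorithm is higher quality than `rustc-hash`'s but a little slower.
10 | - [`ahash`] provides `AHashSet` and `AHashMap`. Its hashing algorithm can take advantage of AES instruction support that is available on some processors.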
11 | 
12 | [HashDoS attacks]: https://en.wikipedia.org/wiki/Collision_attack
13 | [`rustc-hash`]: https://crates.io/crates/rustc-hash
14 | [`fnv`]: https://crates.io/crates/fnv
15 | [`ahash`]: https://crates.io/crates/ahash
16 | 
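17 | If hashing performance is important in your program, it is worth trying more than one of these alternatives. For example, the following results were seen in rustc.
18 | - The switch from `fnv` to `rustc-hash` gave [speedups of up to 6%][fnv2fx].
19 | - An attempt to switch from `rustc-hash` to `ahash` resulted in [slowdowns of 1-4%][fx2a].
20 | - An attempt to switch from `rustc-hash` back to the default hasher resulted in [slowdowns ranging from 4-84%][fx2default]!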
21 | 
22 | [fnv2fx]: https://github.com/rust-lang/rust/pull/37229/commits/00e48affde2d349e3b3bfbd3d0f6afb5d76282a7
23 | [fx2a]: https://github.com/rust-lang/rust/issues/69153#issuecomment-589504301
24 | [fx2default]: https://github.com/rust-lang/rust/issues/69153#issuecomment-589338446
25 | 
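26 | If you decide to universally use one of the alternatives, such as `FxHashSet`/`FxHashMap`, it is easy to accidentally use `HashSet`/`HashMap` in some places. You can use [Clippy] to avoid this problem.
27 | 
28 | Some types don't need hashing. For example, you might have a newtype that wraps an integer and the integer values are random, or close to random. For such a type, the distribution of the hashed values won't be that different to the distribution of the values themselves. In this case the [`nohash_hasher`] crate can be useful.
29 | 
30 | Hash function design is a complex topic and is beyond the scope of this book. The [`ahash` documentation] has a good discussion.
31 | 
32 | [Clippy]: linting.md#disallowing-types
33 | [`nohash_hasher`]: https://crates.io/crates/nohash-hasher
34 | [`ahash` documentation]: https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md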
35 | 


--------------------------------------------------------------------------------
/src/heap-allocations.md:
--------------------------------------------------------------------------------
  1 | # Heap Allocations
  2 | 
  3 | Heap allocations are moderately expensive. The exact details depend on which
  4 | allocator is in use, but each allocation (and deallocation) typically involves
  5 | acquiring a global lock, doing some non-trivial data structure manipulation,
  6 | and possibly executing a system call. Small allocations are not necessarily
  7 | cheaper than large allocations. It is worth understanding which Rust data
  8 | structures and operations cause allocations, because avoiding them can greatly
  9 | improve performance.
 10 | 
 11 | The [Rust Container Cheat Sheet] has visualizations of common Rust types, and
 12 | is an excellent companion to the following sections.
 13 | 
 14 | [Rust Container Cheat Sheet]: https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/
 15 | 
 16 | ## Profiling
 17 | 
 18 | If a general-purpose profiler shows `malloc`, `free`, and related functions as
 19 | hot, then it is likely worth trying to reduce the allocation rate and/or use
 20 | an alternative allocator.
 21 | 
 22 | [DHAT] is an excellent profiler to use when reducing allocation rates. It works
 23 | on Linux and some other Unixes. It precisely identifies hot allocation
 24 | sites and their allocation rates. Exact results will vary, but experience with
 25 | rustc has shown that reducing allocation rates by 10 allocations per million
 26 | instructions executed can have measurable performance improvements (e.g. ~1%).
 27 | 
 28 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html
 29 | 
 30 | Here is some example output from DHAT.
 31 | ```text
 32 | AP 1.1/25 (2 children) {
 33 |   Total:     54,533,440 bytes (4.02%, 2,714.28/Minstr) in 458,839 blocks (7.72%, 22.84/Minstr), avg size 118.85 bytes, avg lifetime 1,127,259,403.64 instrs (5.61% of program duration)
 34 |   At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
 35 |   At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
 36 |   Reads:     15,993,012 bytes (0.29%, 796.02/Minstr), 0.29/byte
 37 |   Writes:    20,974,752 bytes (1.03%, 1,043.97/Minstr), 0.38/byte
 38 |   Allocated at {
 39 |     #1: 0x95CACC9: alloc (alloc.rs:72)
 40 |     #2: 0x95CACC9: alloc (alloc.rs:148)
 41 |     #3: 0x95CACC9: reserve_internal<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:669)
 42 |     #4: 0x95CACC9: reserve<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:492)
 43 |     #5: 0x95CACC9: reserve<syntax::tokenstream::TokenStream> (vec.rs:460)
 44 |     #6: 0x95CACC9: push<syntax::tokenstream::TokenStream> (vec.rs:989)
 45 |     #7: 0x95CACC9: parse_token_trees_until_close_delim (tokentrees.rs:27)
 46 |     #8: 0x95CACC9: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader<'a>>::parse_token_tree (tokentrees.rs:81)
 47 |   }
 48 | }
 49 | ```
 50 | It is beyond the scope of this book to describe everything in this example, but
 51 | it should be clear that DHAT gives a wealth of information about allocations,
 52 | such as where and how often they happen, how big they are, how long they live
 53 | for, and how often they are accessed.
 54 | 
 55 | ## `Box`
 56 | 
 57 | [`Box`] is the simplest heap-allocated type. A `Box<T>` value is a `T` value
 58 | that is allocated on the heap.
 59 | 
 60 | [`Box`]: https://doc.rust-lang.org/std/boxed/struct.Box.html
 61 | 
 62 | It is sometimes worth boxing one or more fields in a struct or enum to make a
 63 | type smaller. (See the [Type Sizes](type-sizes.md) chapter for more about
 64 | this.)
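
The effect can be observed with `std::mem::size_of`. In this sketch the
`Unboxed` and `Boxed` enums are hypothetical; exact sizes depend on the
platform, so the example only checks that the boxed version is smaller.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Unboxed {
    Small(u8),
    Large([u8; 256]), // the rare variant dictates the size of every value
}

#[allow(dead_code)]
enum Boxed {
    Small(u8),
    Large(Box<[u8; 256]>), // the rare variant now costs one pointer inline
}

fn main() {
    println!("Unboxed: {} bytes", size_of::<Unboxed>());
    println!("Boxed:   {} bytes", size_of::<Boxed>());
    assert!(size_of::<Boxed>() < size_of::<Unboxed>());
}
```

The trade-off is that constructing the `Large` variant now requires a heap
allocation, so this is most useful when that variant is rare.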
 65 | 
 66 | Other than that, `Box` is straightforward and does not offer much scope for
 67 | optimizations.
 68 | 
 69 | ## `Rc`/`Arc`
 70 | 
 71 | [`Rc`]/[`Arc`] are similar to `Box`, but the value on the heap is accompanied by
 72 | two reference counts. They allow value sharing, which can be an effective way
 73 | to reduce memory usage.
 74 | 
 75 | [`Rc`]: https://doc.rust-lang.org/std/rc/struct.Rc.html
 76 | [`Arc`]: https://doc.rust-lang.org/std/sync/struct.Arc.html
 77 | 
 78 | However, if used for values that are rarely shared, they can increase allocation
 79 | rates by heap allocating values that might otherwise not be heap-allocated.
 80 | [**Example**](https://github.com/rust-lang/rust/pull/37373/commits/c440a7ae654fb641e68a9ee53b03bf3f7133c2fe).
 81 | 
 82 | Unlike `Box`, calling `clone` on an `Rc`/`Arc` value does not involve an
 83 | allocation. Instead, it merely increments a reference count.
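
A small sketch of this behaviour follows; the `share` function is
hypothetical. Both handles point at the same heap allocation, and cloning
only changes the strong count.

```rust
use std::rc::Rc;

/// Returns two handles to the same heap-allocated vector.
fn share(values: Vec<u32>) -> (Rc<Vec<u32>>, Rc<Vec<u32>>) {
    // Allocates once, for the reference counts and the value together.
    let first = Rc::new(values);
    // No allocation: this only increments the strong count.
    let second = Rc::clone(&first);
    (first, second)
}

fn main() {
    let (a, b) = share(vec![1, 2, 3]);
    assert!(Rc::ptr_eq(&a, &b));
    assert_eq!(Rc::strong_count(&a), 2);
}
```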
 84 | 
 85 | ## `Vec`
 86 | 
 87 | [`Vec`] is a heap-allocated type with a great deal of scope for optimizing the
 88 | number of allocations, and/or minimizing the amount of wasted space. To do this
 89 | requires understanding how its elements are stored.
 90 | 
 91 | [`Vec`]: https://doc.rust-lang.org/std/vec/struct.Vec.html
 92 | 
 93 | A `Vec` contains three words: a length, a capacity, and a pointer. The pointer
 94 | will point to heap-allocated memory if the capacity is nonzero and the element
 95 | size is nonzero; otherwise, it will not point to allocated memory.
 96 | 
 97 | Even if the `Vec` itself is not heap-allocated, the elements (if present and
 98 | nonzero-sized) always will be. If nonzero-sized elements are present, the
 99 | memory holding those elements may be larger than necessary, providing space for
100 | additional future elements. The number of elements present is the length, and
101 | the number of elements that could be held without reallocating is the capacity.
102 | 
103 | When the vector needs to grow beyond its current capacity, the elements will be
104 | copied into a larger heap allocation, and the old heap allocation will be
105 | freed.
106 | 
107 | ### `Vec` Growth
108 | 
109 | A new, empty `Vec` created by the common means
110 | ([`vec![]`](https://doc.rust-lang.org/std/macro.vec.html)
111 | or [`Vec::new`] or [`Vec::default`]) has a length and capacity of zero, and no
112 | heap allocation is required. If you repeatedly push individual elements onto
113 | the end of the `Vec`, it will periodically reallocate. The growth strategy is
114 | not specified, but at the time of writing it uses a quasi-doubling strategy
115 | resulting in the following capacities: 0, 4, 8, 16, 32, 64, and so on. (It
116 | skips directly from 0 to 4, instead of going via 1 and 2, because this [avoids
117 | many allocations] in practice.) As a vector grows, the frequency of
118 | reallocations will decrease exponentially, but the amount of possibly-wasted
119 | excess capacity will increase exponentially.
120 | 
121 | [`Vec::new`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.new
122 | [`Vec::default`]: https://doc.rust-lang.org/std/default/trait.Default.html#tymethod.default
123 | [avoids many allocations]: https://github.com/rust-lang/rust/pull/72227
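
The growth sequence on a given toolchain can be observed directly. The
hypothetical `capacity_steps` function below records each distinct capacity
seen while pushing; because the strategy is unspecified, the printed values
may differ between Rust versions and element types.

```rust
/// Pushes `n` elements one at a time and records every capacity observed.
fn capacity_steps(n: usize) -> Vec<usize> {
    let mut v: Vec<u32> = Vec::new();
    let mut steps = vec![v.capacity()];
    for i in 0..n {
        v.push(i as u32);
        if v.capacity() != *steps.last().unwrap() {
            // The capacity changed, so a (re)allocation just happened.
            steps.push(v.capacity());
        }
    }
    steps
}

fn main() {
    println!("{:?}", capacity_steps(100));
}
```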
124 | 
125 | This growth strategy is typical for growable data structures and reasonable in
126 | the general case, but if you know in advance the likely length of a vector you
127 | can often do better. If you have a hot vector allocation site (e.g. a hot
128 | [`Vec::push`] call), it is worth using [`eprintln!`] to print the vector length
129 | at that site and then doing some post-processing (e.g. with [`counts`]) to
130 | determine the length distribution. For example, you might have many short
131 | vectors, or you might have a smaller number of very long vectors, and the best
132 | way to optimize the allocation site will vary accordingly.
133 | 
134 | [`Vec::push`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.push
135 | [`eprintln!`]: https://doc.rust-lang.org/std/macro.eprintln.html
136 | [`counts`]: https://github.com/nnethercote/counts/
137 | 
138 | ### Short `Vec`s
139 | 
140 | If you have many short vectors, you can use the `SmallVec` type from the
141 | [`smallvec`] crate. `SmallVec<[T; N]>` is a drop-in replacement for `Vec` that
142 | can store `N` elements within the `SmallVec` itself, and then switches to a
143 | heap allocation if the number of elements exceeds that. (Note also that
144 | `vec![]` literals must be replaced with `smallvec![]` literals.)
145 | [**Example 1**](https://github.com/rust-lang/rust/pull/50565/commits/78262e700dc6a7b57e376742f344e80115d2d3f2),
146 | [**Example 2**](https://github.com/rust-lang/rust/pull/55383/commits/526dc1421b48e3ee8357d58d997e7a0f4bb26915).
147 | 
148 | [`smallvec`]: https://crates.io/crates/smallvec
149 | 
150 | `SmallVec` reliably reduces the allocation rate when used appropriately, but
151 | its use does not guarantee improved performance. It is slightly slower than
152 | `Vec` for normal operations because it must always check if the elements are
153 | heap-allocated or not. Also, if `N` is high or `T` is large, then the
154 | `SmallVec<[T; N]>` itself can be larger than `Vec<T>`, and copying of
155 | `SmallVec` values will be slower. As always, benchmarking is required to
156 | confirm that an optimization is effective.
157 | 
158 | If you have many short vectors *and* you precisely know their maximum length,
159 | `ArrayVec` from the [`arrayvec`] crate is a better choice than `SmallVec`. It
160 | does not require the fallback to heap allocation, which makes it a little
161 | faster.
162 | [**Example**](https://github.com/rust-lang/rust/pull/74310/commits/c492ca40a288d8a85353ba112c4d38fe87ef453e).
163 | 
164 | [`arrayvec`]: https://crates.io/crates/arrayvec
165 | 
166 | ### Longer `Vec`s
167 | 
168 | If you know the minimum or exact size of a vector, you can reserve a specific
169 | capacity with [`Vec::with_capacity`], [`Vec::reserve`], or
170 | [`Vec::reserve_exact`]. For example, if you know a vector will grow to have at
171 | least 20 elements, these functions can immediately provide a vector with a
172 | capacity of at least 20 using a single allocation, whereas pushing the items
173 | one at a time would result in four allocations (for capacities of 4, 8, 16, and
174 | 32).
175 | [**Example**](https://github.com/rust-lang/rust/pull/77990/commits/a7f2bb634308a5f05f2af716482b67ba43701681).
176 | 
177 | [`Vec::with_capacity`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.with_capacity
178 | [`Vec::reserve`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve
179 | [`Vec::reserve_exact`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve_exact
180 | 
181 | If you know the maximum length of a vector, the above functions also let you
182 | avoid allocating excess space. Similarly, [`Vec::shrink_to_fit`] can be used
183 | to minimize wasted space, but note that it may cause a reallocation.
184 | 
185 | [`Vec::shrink_to_fit`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit
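
The following sketch illustrates both directions, using the hypothetical
functions `squares` and `evens_compacted`. The first reserves the full
capacity up front so that pushing never reallocates; the second releases
unused capacity once the final length is known.

```rust
/// Builds the first `n` squares with a single up-front allocation.
fn squares(n: usize) -> Vec<u64> {
    let mut v = Vec::with_capacity(n);
    let initial_capacity = v.capacity();
    for i in 0..n as u64 {
        v.push(i * i);
    }
    // The capacity never changed, so no reallocation happened while pushing.
    assert_eq!(v.capacity(), initial_capacity);
    v
}

/// Keeps only the even values, then releases the unused part of the buffer.
fn evens_compacted(mut v: Vec<u64>) -> Vec<u64> {
    v.retain(|x| x % 2 == 0);
    v.shrink_to_fit(); // may reallocate; capacity ends up close to the length
    v
}

fn main() {
    let all = squares(20);
    assert!(all.capacity() >= 20);
    let evens = evens_compacted(all);
    assert_eq!(evens.len(), 10);
    assert!(evens.capacity() >= evens.len());
}
```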
186 | 
187 | ## `String`
188 | 
189 | A [`String`] contains heap-allocated bytes. The representation and operation of
190 | `String` are very similar to that of `Vec<u8>`. Many `Vec` methods relating to
191 | growth and capacity have equivalents for `String`, such as
192 | [`String::with_capacity`].
193 | 
194 | [`String`]: https://doc.rust-lang.org/std/string/struct.String.html
195 | [`String::with_capacity`]: https://doc.rust-lang.org/std/string/struct.String.html#method.with_capacity
196 | 
197 | The `SmallString` type from the [`smallstr`] crate is similar to the `SmallVec`
198 | type.
199 | 
200 | [`smallstr`]: https://crates.io/crates/smallstr
201 | 
202 | The `String` type from the [`smartstring`] crate is a drop-in replacement for
203 | `String` that avoids heap allocations for strings with fewer than three words'
204 | worth of characters. On 64-bit platforms, this is any string shorter than
205 | 24 bytes, which includes all strings containing 23 or fewer ASCII characters.
206 | [**Example**](https://github.com/djc/topfew-rs/commit/803fd566e9b889b7ba452a2a294a3e4df76e6c4c).
207 | 
208 | [`smartstring`]: https://crates.io/crates/smartstring
209 | 
210 | Note that the `format!` macro produces a `String`, which means it performs an
211 | allocation. If you can avoid a `format!` call by using a string literal, that
212 | will avoid this allocation.
213 | [**Example**](https://github.com/rust-lang/rust/pull/55905/commits/c6862992d947331cd6556f765f6efbde0a709cf9).
214 | [`std::format_args`] and/or the [`lazy_format`] crate may help with this.
215 | 
216 | [`std::format_args`]: https://doc.rust-lang.org/std/macro.format_args.html
217 | [`lazy_format`]: https://crates.io/crates/lazy_format
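
When the formatted text is needed repeatedly, one related option is to format
into an existing `String` with `write!` from `std::fmt::Write`, which reuses
the buffer's allocation when it is large enough. This is a different
technique from using `format_args!` directly, shown here as a sketch with the
hypothetical functions `label_fresh` and `label_into`.

```rust
use std::fmt::Write;

/// Allocates a fresh `String` on every call.
fn label_fresh(id: u32) -> String {
    format!("item-{}", id)
}

/// Writes into a caller-provided buffer, reusing its allocation when it is
/// already large enough.
fn label_into(buf: &mut String, id: u32) {
    buf.clear(); // empties the string but keeps its capacity
    write!(buf, "item-{}", id).expect("writing to a String cannot fail");
}

fn main() {
    let mut buf = String::with_capacity(32);
    let capacity = buf.capacity();
    for id in 0..3 {
        label_into(&mut buf, id);
        assert_eq!(buf, label_fresh(id));
    }
    // The buffer was never reallocated.
    assert_eq!(buf.capacity(), capacity);
}
```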
218 | 
219 | ## Hash Tables
220 | 
221 | [`HashSet`] and [`HashMap`] are hash tables. Their representation and
222 | operations are similar to those of `Vec`, in terms of allocations: they have
223 | a single contiguous heap allocation, holding keys and values, which is
224 | reallocated as necessary as the table grows. Many `Vec` methods relating to
225 | growth and capacity have equivalents for `HashSet`/`HashMap`, such as
226 | [`HashSet::with_capacity`].
227 | 
228 | [`HashSet`]: https://doc.rust-lang.org/std/collections/struct.HashSet.html
229 | [`HashMap`]: https://doc.rust-lang.org/std/collections/struct.HashMap.html
230 | [`HashSet::with_capacity`]: https://doc.rust-lang.org/std/collections/struct.HashSet.html#method.with_capacity
231 | 
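For instance, when an upper bound on the number of keys is known, the table
can be sized once. This sketch uses a hypothetical `count_words` function, and
it assumes that over-reserving for inputs with many repeated words is
acceptable.

```rust
use std::collections::HashMap;

/// Counts how often each word occurs, sizing the table once up front.
/// (`count_words` is a made-up helper for illustration.)
fn count_words<'a>(words: &[&'a str]) -> HashMap<&'a str, usize> {
    // At most `words.len()` distinct keys are possible, so reserving that
    // many slots means the table does not need to grow while being filled.
    let mut counts = HashMap::with_capacity(words.len());
    for &word in words {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}
```
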
232 | ## `clone`
233 | 
234 | Calling [`clone`] on a value that contains heap-allocated memory typically
235 | involves additional allocations. For example, calling `clone` on a non-empty
236 | `Vec` requires a new allocation for the elements (but note that the capacity of
237 | the new `Vec` might not be the same as the capacity of the original `Vec`). The
238 | exception is `Rc`/`Arc`, where a `clone` call just increments the reference
239 | count.
240 | 
241 | [`clone`]: https://doc.rust-lang.org/std/clone/trait.Clone.html#tymethod.clone
242 | 
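The difference for `Rc` can be observed directly. In this sketch the
hypothetical `share` function returns two handles, and both point to the same
heap allocation.

```rust
use std::rc::Rc;

/// Returns two handles to one shared, heap-allocated vector.
/// (`share` is a made-up helper for illustration.)
fn share(data: Vec<u32>) -> (Rc<Vec<u32>>, Rc<Vec<u32>>) {
    let first = Rc::new(data); // the `Rc` allocation happens here
    let second = Rc::clone(&first); // only increments the reference count
    (first, second)
}
```

`Rc::ptr_eq` and `Rc::strong_count` can be used to confirm that the two
handles share one allocation.
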
243 | [`clone_from`] is an alternative to `clone`. `a.clone_from(&b)` is equivalent
244 | to `a = b.clone()` but may avoid unnecessary allocations. For example, if you
245 | want to clone one `Vec` over the top of an existing `Vec`, the existing `Vec`'s
246 | heap allocation will be reused if possible, as the following example shows.
247 | ```rust
248 | let mut v1: Vec<u32> = Vec::with_capacity(99);
249 | let v2: Vec<u32> = vec![1, 2, 3];
250 | v1.clone_from(&v2); // v1's allocation is reused
251 | assert_eq!(v1.capacity(), 99);
252 | ```
253 | Although `clone` usually causes allocations, it is a reasonable thing to use in
254 | many circumstances and can often make code simpler. Use profiling data to see
255 | which `clone` calls are hot and worth taking the effort to avoid.
256 | 
257 | [`clone_from`]: https://doc.rust-lang.org/std/clone/trait.Clone.html#method.clone_from
258 | 
259 | Sometimes Rust code ends up containing unnecessary `clone` calls, due to (a)
260 | programmer error, or (b) changes in the code that render previously-necessary
261 | `clone` calls unnecessary. If you see a hot `clone` call that does not seem
262 | necessary, sometimes it can simply be removed.
263 | [**Example 1**](https://github.com/rust-lang/rust/pull/37318/commits/e382267cfb9133ef12d59b66a2935ee45b546a61),
264 | [**Example 2**](https://github.com/rust-lang/rust/pull/37705/commits/11c1126688bab32f76dbe1a973906c7586da143f),
265 | [**Example 3**](https://github.com/rust-lang/rust/pull/64302/commits/36b37e22de92b584b9cf4464ed1d4ad317b798be).
266 | 
267 | ## `to_owned`
268 | 
269 | [`ToOwned::to_owned`] is implemented for many common types. It creates owned
270 | data from borrowed data, usually by cloning, and therefore often causes heap
271 | allocations. For example, it can be used to create a `String` from a `&str`.
272 | 
273 | [`ToOwned::to_owned`]: https://doc.rust-lang.org/std/borrow/trait.ToOwned.html#tymethod.to_owned
274 | 
275 | Sometimes `to_owned` calls (and related calls such as `clone` and `to_string`)
276 | can be avoided by storing a reference to borrowed data in a struct rather than
277 | an owned copy. This requires lifetime annotations on the struct, complicating
278 | the code, and should only be done when profiling and benchmarking show that it
279 | is worthwhile.
280 | [**Example**](https://github.com/rust-lang/rust/pull/50855/commits/6872377357dbbf373cfd2aae352cb74cfcc66f34).
281 | 
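A minimal sketch of the two approaches follows. The `OwnedRecord` and
`BorrowedRecord` types and the parsing functions are hypothetical, chosen only
for illustration.

```rust
/// Owned version: every record copies its name into a new `String`.
struct OwnedRecord {
    name: String,
}

/// Borrowed version: the record points into the input text instead.
struct BorrowedRecord<'a> {
    name: &'a str,
}

fn parse_owned(line: &str) -> OwnedRecord {
    // `to_owned` allocates a fresh `String` for the trimmed text.
    OwnedRecord { name: line.trim().to_owned() }
}

fn parse_borrowed(line: &str) -> BorrowedRecord<'_> {
    // No allocation: the field is a slice of `line`.
    BorrowedRecord { name: line.trim() }
}
```

The borrowed version cannot outlive the text it was parsed from, which is the
complication that the lifetime annotation makes explicit.
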
282 | ## `Cow`
283 | 
284 | Sometimes code deals with a mixture of borrowed and owned data. Imagine a
285 | vector of error messages, some of which are static string literals and some of
286 | which are constructed with `format!`. The obvious representation is
287 | `Vec<String>`, as the following example shows.
288 | ```rust
289 | let mut errors: Vec<String> = vec![];
290 | errors.push("something went wrong".to_string());
291 | errors.push(format!("something went wrong on line {}", 100));
292 | ```
293 | That requires a `to_string` call to promote the static string literal to a
294 | `String`, which incurs an allocation.
295 | 
296 | Instead you can use the [`Cow`] type, which can hold either borrowed or owned
297 | data. A borrowed value `x` is wrapped with `Cow::Borrowed(x)`, and an owned
298 | value `y` is wrapped with `Cow::Owned(y)`. `Cow` also implements the `From<T>`
299 | trait for various string, slice, and path types, so you can usually use `into`
300 | as well. (Or `Cow::from`, which is longer but results in more readable code,
301 | because it makes the type clearer.) The following example puts all this together.
302 | 
303 | [`Cow`]: https://doc.rust-lang.org/std/borrow/enum.Cow.html
304 | 
305 | ```rust
306 | use std::borrow::Cow;
307 | let mut errors: Vec<Cow<'static, str>> = vec![];
308 | errors.push(Cow::Borrowed("something went wrong"));
309 | errors.push(Cow::Owned(format!("something went wrong on line {}", 100)));
310 | errors.push(Cow::from("something else went wrong"));
311 | errors.push(format!("something else went wrong on line {}", 101).into());
312 | ```
313 | `errors` now holds a mixture of borrowed and owned data without requiring any
314 | extra allocations. This example involves `&str`/`String`, but other pairings
315 | such as `&[T]`/`Vec<T>` and `&Path`/`PathBuf` are also possible. 
316 | 
317 | [**Example 1**](https://github.com/rust-lang/rust/pull/37064/commits/b043e11de2eb2c60f7bfec5e15960f537b229e20),
318 | [**Example 2**](https://github.com/rust-lang/rust/pull/56336/commits/787959c20d062d396b97a5566e0a766d963af022).
319 | 
320 | All of the above applies if the data is immutable. But `Cow` also allows
321 | borrowed data to be promoted to owned data if it needs to be mutated.
322 | [`Cow::to_mut`] will obtain a mutable reference to an owned value, cloning if
323 | necessary. This is called "clone-on-write", which is where the name `Cow` comes
324 | from.
325 | 
326 | [`Deref`]: https://doc.rust-lang.org/std/ops/trait.Deref.html
327 | [`Cow::to_mut`]: https://doc.rust-lang.org/std/borrow/enum.Cow.html#method.to_mut
328 | 
329 | This clone-on-write behaviour is useful when you have some borrowed data, such
330 | as a `&str`, that is mostly read-only but occasionally needs to be modified.
331 | 
332 | [**Example 1**](https://github.com/rust-lang/rust/pull/50855/commits/ad471452ba6fbbf91ad566dc4bdf1033a7281811),
333 | [**Example 2**](https://github.com/rust-lang/rust/pull/68848/commits/67da45f5084f98eeb20cc6022d68788510dc832a).
334 | 
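As a sketch of clone-on-write, the hypothetical `shout` function below returns
its input untouched when it contains no lowercase ASCII letters, and only
calls `to_mut` (which allocates) when a change is needed.

```rust
use std::borrow::Cow;

/// Uppercases ASCII letters, copying the input only if something changes.
/// (`shout` is a made-up helper for illustration.)
fn shout(input: &str) -> Cow<'_, str> {
    let mut text = Cow::Borrowed(input);
    if input.bytes().any(|b| b.is_ascii_lowercase()) {
        // The first `to_mut` call clones the borrowed `str` into a `String`,
        // which is then modified in place.
        text.to_mut().make_ascii_uppercase();
    }
    text
}
```
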
335 | Finally, because `Cow` implements [`Deref`], you can call methods directly on
336 | the data it encloses. 
337 | 
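For example, `len` is a method of `str`, but it can be called on a `Cow<str>`
regardless of which variant it holds. The `longest` function here is
hypothetical.

```rust
use std::borrow::Cow;

/// Returns the length in bytes of the longest message, or 0 if there are none.
/// (`longest` is a made-up helper for illustration.)
fn longest(messages: &[Cow<'_, str>]) -> usize {
    // `len` is defined on `str`; `Deref` lets it be called on each `Cow`.
    messages.iter().map(|m| m.len()).max().unwrap_or(0)
}
```
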
338 | `Cow` can be fiddly to get working, but it is often worth the effort.
339 | 
340 | ## Reusing Collections
341 | 
342 | Sometimes you need to build up a collection such as a `Vec` in stages. It is
343 | usually better to do this by modifying a single `Vec` than by building multiple
344 | `Vec`s and then combining them.
345 | 
346 | For example, suppose you have a function `do_stuff` that produces a `Vec` and
347 | might be called multiple times:
348 | ```rust
349 | fn do_stuff(x: u32, y: u32) -> Vec<u32> {
350 |     vec![x, y]
351 | }
352 | ```
353 | It might be better to instead modify a passed-in `Vec`:
354 | ```rust
355 | fn do_stuff(x: u32, y: u32, vec: &mut Vec<u32>) {
356 |     vec.push(x);
357 |     vec.push(y);
358 | }
359 | ```
360 | Sometimes it is worth keeping around a "workhorse" collection that can be
361 | reused. For example, if a `Vec` is needed for each iteration of a loop, you
362 | could declare the `Vec` outside the loop, use it within the loop body, and then
363 | call [`clear`] at the end of the loop body (to empty the `Vec` without affecting
364 | its capacity). This avoids allocations at the cost of obscuring the fact that
365 | each iteration's usage of the `Vec` is unrelated to the others.
366 | [**Example 1**](https://github.com/rust-lang/rust/pull/77990/commits/45faeb43aecdc98c9e3f2b24edf2ecc71f39d323),
367 | [**Example 2**](https://github.com/rust-lang/rust/pull/51870/commits/b0c78120e3ecae5f4043781f7a3f79e2277293e7).
368 | 
369 | [`clear`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.clear
370 | 
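The loop pattern might look like the following sketch, in which
`sum_evens_per_row` and its scratch vector `evens` are hypothetical names. (A
real version of this particular computation would not need the intermediate
`Vec` at all; it is used here only to show the pattern.)

```rust
/// Sums the even numbers of each row, reusing one scratch `Vec` throughout.
fn sum_evens_per_row(rows: &[Vec<u32>]) -> Vec<u32> {
    let mut sums: Vec<u32> = Vec::with_capacity(rows.len());
    // The workhorse: declared once, outside the loop.
    let mut evens: Vec<u32> = Vec::new();
    for row in rows {
        evens.extend(row.iter().copied().filter(|n| n % 2 == 0));
        sums.push(evens.iter().sum());
        // Empties the `Vec` but keeps its capacity for the next iteration.
        evens.clear();
    }
    sums
}
```
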
371 | Similarly, it is sometimes worth keeping a workhorse collection within a
372 | struct, to be reused in one or more methods that are called repeatedly.
373 | 
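A sketch of the struct variant follows, using a hypothetical `Normalizer` type
whose `scratch` buffer keeps its capacity between calls.

```rust
/// Holds a scratch buffer that is reused by every `normalize` call.
/// (`Normalizer` is a made-up type for illustration.)
struct Normalizer {
    scratch: String,
}

impl Normalizer {
    fn new() -> Self {
        Normalizer { scratch: String::new() }
    }

    /// Lowercases `input` and removes whitespace, returning a view of the
    /// internal buffer rather than a newly allocated `String`.
    fn normalize(&mut self, input: &str) -> &str {
        // Drop the previous contents but keep the capacity.
        self.scratch.clear();
        for c in input.chars().filter(|c| !c.is_whitespace()) {
            self.scratch.extend(c.to_lowercase());
        }
        &self.scratch
    }
}
```

The returned `&str` borrows from the `Normalizer`, so the result must be used
(or copied) before the next call.
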
374 | ## Reading Lines from a File
375 | 
376 | [`BufRead::lines`] makes it easy to read a file one line at a time:
377 | ```rust
378 | # fn blah() -> Result<(), std::io::Error> {
379 | # fn process(_: &str) {}
380 | use std::io::{self, BufRead};
381 | let lock = io::stdin().lock();
382 | for line in lock.lines() {
383 |     process(&line?);
384 | }
385 | # Ok(())
386 | # }
387 | ```
388 | But the iterator it produces returns `io::Result<String>`, which means it
389 | allocates for every line in the file.
390 | 
391 | [`BufRead::lines`]: https://doc.rust-lang.org/stable/std/io/trait.BufRead.html#method.lines
392 | 
393 | An alternative is to use a workhorse `String` in a loop over
394 | [`BufRead::read_line`]:
395 | ```rust
396 | # fn blah() -> Result<(), std::io::Error> {
397 | # fn process(_: &str) {}
398 | use std::io::{self, BufRead};
399 | let mut lock = io::stdin().lock();
400 | let mut line = String::new();
401 | while lock.read_line(&mut line)? != 0 {
402 |     process(&line);
403 |     line.clear();
404 | }
405 | # Ok(())
406 | # }
407 | ```
408 | This reduces the number of allocations to at most a handful, and possibly just
409 | one. (The exact number depends on how many times `line` needs to be
410 | reallocated, which depends on the distribution of line lengths in the file.)
411 | 
412 | This will only work if the loop body can operate on a `&str`, rather than a
413 | `String`.
414 | 
415 | [`BufRead::read_line`]: https://doc.rust-lang.org/stable/std/io/trait.BufRead.html#method.read_line
416 | 
417 | [**Example**](https://github.com/nnethercote/counts/commit/7d39bbb1867720ef3b9799fee739cd717ad1539a).
418 | 
419 | ## Using an Alternative Allocator
420 | 
421 | It is also possible to improve heap allocation performance without changing
422 | your code, simply by using a different allocator. See the [Alternative
423 | Allocators] section for details.
424 | 
425 | [Alternative Allocators]: build-configuration.md#alternative-allocators
426 | 
427 | ## Avoiding Regressions
428 | 
429 | To ensure the number and/or size of allocations done by your code doesn't
430 | increase unintentionally, you can use the *heap usage testing* feature of
431 | [dhat-rs] to write tests that check particular code snippets allocate the
432 | expected amount of heap memory.
433 | 
434 | [dhat-rs]: https://crates.io/crates/dhat
435 | 


--------------------------------------------------------------------------------
/src/heap-allocations_zh.md:
--------------------------------------------------------------------------------
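  1 | # Heap Allocations
  2 | 
  3 | Heap allocations are moderately expensive. The exact details depend on which allocator is in use, but each allocation (and deallocation) typically involves acquiring a global lock, doing some non-trivial data structure manipulation, and possibly executing a system call. Small allocations are not necessarily cheaper than large allocations. It is worth understanding which Rust data structures and operations cause allocations, because avoiding them can greatly improve performance.
  4 | 
  5 | The [Rust Container Cheat Sheet] has visualizations of common Rust types, and is an excellent companion to the following sections.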
  6 | 
  7 | [Rust Container Cheat Sheet]: https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/
  8 | 
  9 | ## Profiling
 10 | 
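 11 | If a general-purpose profiler shows `malloc`, `free`, and related functions as hot, then it is probably worth trying to reduce the allocation rate and/or use an alternative allocator.
 12 | 
 13 | [DHAT] is an excellent profiler to use when reducing allocation rates. It precisely identifies hot allocation sites and their allocation rates. Exact results will vary, but experience with rustc has shown that reducing allocation rates by 10 allocations per million instructions executed can have measurable performance improvements (e.g. ~1%).
 14 | 
 15 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html
 16 | 
 17 | Here is some example output from DHAT.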
 18 | ```text
 19 | AP 1.1/25 (2 children) {
 20 |   Total:     54,533,440 bytes (4.02%, 2,714.28/Minstr) in 458,839 blocks (7.72%, 22.84/Minstr), avg size 118.85 bytes, avg lifetime 1,127,259,403.64 instrs (5.61% of program duration)
 21 |   At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
 22 |   At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
 23 |   Reads:     15,993,012 bytes (0.29%, 796.02/Minstr), 0.29/byte
 24 |   Writes:    20,974,752 bytes (1.03%, 1,043.97/Minstr), 0.38/byte
 25 |   Allocated at {
 26 |     #1: 0x95CACC9: alloc (alloc.rs:72)
 27 |     #2: 0x95CACC9: alloc (alloc.rs:148)
 28 |     #3: 0x95CACC9: reserve_internal<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:669)
 29 |     #4: 0x95CACC9: reserve<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:492)
 30 |     #5: 0x95CACC9: reserve<syntax::tokenstream::TokenStream> (vec.rs:460)
 31 |     #6: 0x95CACC9: push<syntax::tokenstream::TokenStream> (vec.rs:989)
 32 |     #7: 0x95CACC9: parse_token_trees_until_close_delim (tokentrees.rs:27)
 33 |     #8: 0x95CACC9: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader<'a>>::parse_token_tree (tokentrees.rs:81)
 34 |   }
 35 | }
 36 | ```
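 37 | It is beyond the scope of this book to describe everything in this example, but it should be clear that DHAT gives a wealth of information about allocations, such as where and how often they happen, how big they are, how long they live for, and how often they are accessed.
 38 | 
 39 | ## `Box`
 40 | 
 41 | [`Box`] is the simplest heap-allocated type. A `Box<T>` value is a `T` value that is allocated on the heap.
 42 | 
 43 | [`Box`]: https://doc.rust-lang.org/std/boxed/struct.Box.html
 44 | 
 45 | It is sometimes worth boxing one or more fields in a struct or enum to make a type smaller. (See the [Type Sizes](type-sizes.md) chapter for more about this.)
 46 | 
 47 | Other than that, `Box` is straightforward and does not offer much scope for optimizations.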
 48 | 
 49 | ## `Rc`/`Arc`
 50 | 
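 51 | [`Rc`]/[`Arc`] are similar to `Box`, but the value on the heap is accompanied by two reference counts. They allow value sharing, which can be an effective way to reduce memory usage.
 52 | 
 53 | [`Rc`]: https://doc.rust-lang.org/std/rc/struct.Rc.html
 54 | [`Arc`]: https://doc.rust-lang.org/std/sync/struct.Arc.html
 55 | 
 56 | However, if used for values that are rarely shared, they can increase allocation rates by heap allocating values that might otherwise not be heap-allocated.
 57 | [**Example**](https://github.com/rust-lang/rust/pull/37373/commits/c440a7ae654fb641e68a9ee53b03bf3f7133c2fe).
 58 | 
 59 | Unlike `Box`, calling `clone` on an `Rc`/`Arc` value does not involve an allocation. Instead, it simply increments a reference count.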
 60 | 
 61 | ## `Vec`
 62 | 
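 63 | [`Vec`] is a heap-allocated type with a great deal of scope for optimizing the number of allocations, and/or minimizing the amount of wasted space. To do this requires understanding how its elements are stored.
 64 | 
 65 | [`Vec`]: https://doc.rust-lang.org/std/vec/struct.Vec.html
 66 | 
 67 | A `Vec` contains three words: a length, a capacity, and a pointer. The pointer will point to heap-allocated memory if the capacity is nonzero and the element size is nonzero; otherwise, it will not point to allocated memory.
 68 | 
 69 | Even if the `Vec` itself is not heap-allocated, the elements (if present and nonzero-sized) always will be. If nonzero-sized elements are present, the memory holding those elements may be larger than necessary, providing space for additional future elements. The number of elements present is the length, and the number of elements that could be held without reallocating is the capacity.
 70 | 
 71 | When the vector needs to grow beyond its current capacity, the elements will be copied into a larger heap allocation, and the old heap allocation will be freed.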
 72 | 
 73 | ### `Vec` growth
 74 | 
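 75 | A new, empty `Vec` created by the common means
 76 | ([`vec![]`](https://doc.rust-lang.org/std/macro.vec.html) or [`Vec::new`] or [`Vec::default`]) has a length and capacity of zero, and no heap allocation is required. If you repeatedly push individual elements onto the end of the `Vec`, it will periodically reallocate. The growth strategy is not specified, but at the time of writing it uses a quasi-doubling strategy resulting in the following capacities: 0, 4, 8, 16, 32, 64, and so on. (It skips directly from 0 to 4, instead of going via 1 and 2, because this [avoids many allocations] in practice.) As a vector grows, the frequency of reallocations will decrease exponentially, but the amount of possibly-wasted excess capacity will increase exponentially.
 77 | 
 78 | [`Vec::new`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.new
 79 | [`Vec::default`]: https://doc.rust-lang.org/std/default/trait.Default.html#tymethod.default
 80 | [avoids many allocations]: https://github.com/rust-lang/rust/pull/72227
 81 | 
 82 | This growth strategy is typical for growable data structures and reasonable in the general case, but if you know in advance the likely length of a vector you can often do better. If you have a hot vector allocation site (e.g. a hot [`Vec::push`] call), it is worth using [`eprintln!`] to print the vector length at that site and then doing some post-processing (e.g. with [`counts`]) to determine the length distribution. For example, you might have many short vectors, or you might have a smaller number of very long vectors, and the best way to optimize the allocation site will vary accordingly.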
 83 | 
 84 | [`Vec::push`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.push
 85 | [`eprintln!`]: https://doc.rust-lang.org/std/macro.eprintln.html
 86 | [`counts`]: https://github.com/nnethercote/counts/
 87 | 
 88 | ### Short `Vec`s
 89 | 
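 90 | If you have many short vectors, you can use the `SmallVec` type from the [`smallvec`] crate. `SmallVec<[T; N]>` is a drop-in replacement for `Vec` that can store `N` elements within the `SmallVec` itself, and then switches to a heap allocation if the number of elements exceeds that. (Note also that `vec![]` literals must be replaced with `smallvec![]` literals.)
 91 | [**Example 1**](https://github.com/rust-lang/rust/pull/50565/commits/78262e700dc6a7b57e376742f344e80115d2d3f2),
 92 | [**Example 2**](https://github.com/rust-lang/rust/pull/55383/commits/526dc1421b48e3ee8357d58d997e7a0f4bb26915).
 93 | 
 94 | [`smallvec`]: https://crates.io/crates/smallvec
 95 | 
 96 | `SmallVec` reliably reduces the allocation rate when used appropriately, but its use does not guarantee improved performance. It is slightly slower than `Vec` for normal operations because it must always check if the elements are heap-allocated or not. Also, if `N` is high or `T` is large, then the `SmallVec<[T; N]>` itself can be larger than `Vec<T>`, and copying of `SmallVec` values will be slower. As always, benchmarking is required to confirm that an optimization is effective.
 97 | 
 98 | If you have many short vectors *and* you precisely know their maximum length, `ArrayVec` from the [`arrayvec`] crate is a better choice than `SmallVec`. It does not require the fallback to heap allocation, which makes it a little faster.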
 99 | [**Example**](https://github.com/rust-lang/rust/pull/74310/commits/c492ca40a288d8a85353ba112c4d38fe87ef453e).
100 | 
101 | [`arrayvec`]: https://crates.io/crates/arrayvec
102 | 
103 | ### Longer `Vec`s
104 | 
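105 | If you know the minimum or exact size of a vector, you can reserve a specific capacity with [`Vec::with_capacity`], [`Vec::reserve`], or [`Vec::reserve_exact`]. For example, if you know a vector will grow to have at least 20 elements, these functions can immediately provide a vector with a capacity of at least 20 using a single allocation, whereas pushing the items one at a time would result in four allocations (for capacities of 4, 8, 16, and 32).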
106 | [**Example**](https://github.com/rust-lang/rust/pull/77990/commits/a7f2bb634308a5f05f2af716482b67ba43701681).
107 | 
108 | [`Vec::with_capacity`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.with_capacity
109 | [`Vec::reserve`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve
110 | [`Vec::reserve_exact`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve_exact
111 | 
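112 | If you know the maximum length of a vector, the above functions also let you avoid allocating excess space unnecessarily. Similarly, [`Vec::shrink_to_fit`] can be used to minimize wasted space, but note that it may cause a reallocation.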
113 | 
114 | [`Vec::shrink_to_fit`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit
115 | 
116 | ## `String`
117 | 
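118 | A [`String`] contains heap-allocated bytes. The representation and operation of `String` are very similar to those of `Vec<u8>`. Many `Vec` methods relating to growth and capacity have equivalents for `String`, such as [`String::with_capacity`].
119 | 
120 | [`String`]: https://doc.rust-lang.org/std/string/struct.String.html
121 | [`String::with_capacity`]: https://doc.rust-lang.org/std/string/struct.String.html#method.with_capacity
122 | 
123 | The `SmallString` type from the [`smallstr`] crate is similar to the `SmallVec` type.
124 | 
125 | [`smallstr`]: https://crates.io/crates/smallstr
126 | 
127 | Note that the `format!` macro produces a `String`, which means it performs an allocation. If you can avoid a `format!` call by using a string literal, that will avoid this allocation.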
128 | [**Example**](https://github.com/rust-lang/rust/pull/55905/commits/c6862992d947331cd6556f765f6efbde0a709cf9).
129 | 
130 | ## Hash tables
131 | 
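132 | [`HashSet`] and [`HashMap`] are hash tables. In terms of allocations, their representation and operations are similar to those of `Vec`: they have a single contiguous heap allocation, holding keys and values, which is reallocated as necessary as the table grows. Many `Vec` methods relating to growth and capacity have equivalents for `HashSet`/`HashMap`, such as [`HashSet::with_capacity`].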
133 | 
134 | [`HashSet`]: https://doc.rust-lang.org/std/collections/struct.HashSet.html
135 | [`HashMap`]: https://doc.rust-lang.org/std/collections/struct.HashMap.html
136 | [`HashSet::with_capacity`]: https://doc.rust-lang.org/std/collections/struct.HashSet.html#method.with_capacity
137 | 
138 | ## `Cow`
139 | 
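140 | Sometimes you have some borrowed data, such as a `&str`, that is mostly read-only but occasionally needs to be modified. Cloning the data every time would be wasteful. Instead you can use "clone-on-write" semantics via the [`Cow`] type, which can represent both borrowed and owned data.
141 | 
142 | [`Cow`]: https://doc.rust-lang.org/std/borrow/enum.Cow.html
143 | 
144 | Typically, when starting with a borrowed value `x` you wrap it in a `Cow` with `Cow::Borrowed(x)`. Because `Cow` implements [`Deref`], you can call non-mutating methods directly on the data it encloses. If mutation is desired, [`Cow::to_mut`] will obtain a mutable reference to an owned value, cloning if necessary.
145 | 
146 | [`Deref`]: https://doc.rust-lang.org/std/ops/trait.Deref.html
147 | [`Cow::to_mut`]: https://doc.rust-lang.org/std/borrow/enum.Cow.html#method.to_mut
148 | 
149 | `Cow` can be fiddly to get working, but it is often worth the effort.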
150 | [**Example 1**](https://github.com/rust-lang/rust/pull/37064/commits/b043e11de2eb2c60f7bfec5e15960f537b229e20),
151 | [**Example 2**](https://github.com/rust-lang/rust/pull/50855/commits/ad471452ba6fbbf91ad566dc4bdf1033a7281811),
152 | [**Example 3**](https://github.com/rust-lang/rust/pull/56336/commits/787959c20d062d396b97a5566e0a766d963af022),
153 | [**Example 4**](https://github.com/rust-lang/rust/pull/68848/commits/67da45f5084f98eeb20cc6022d68788510dc832a).
154 | 
155 | ## `clone`
156 | 
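157 | Calling [`clone`] on a value that contains heap-allocated memory typically involves additional allocations. For example, calling `clone` on a non-empty `Vec` requires a new allocation for the elements (but note that the capacity of the new `Vec` might not be the same as the capacity of the original `Vec`). The exception is `Rc`/`Arc`, where a `clone` call just increments the reference count.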
158 | 
159 | [`clone`]: https://doc.rust-lang.org/std/clone/trait.Clone.html#tymethod.clone
160 | 
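161 | [`clone_from`] is an alternative to `clone`. `a.clone_from(&b)` is equivalent to `a = b.clone()` but may avoid unnecessary allocations. For example, if you want to clone one `Vec` over the top of an existing `Vec`, the existing `Vec`'s heap allocation will be reused if possible, as the following example shows.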
162 | ```rust
163 | let mut v1: Vec<u32> = Vec::with_capacity(99);
164 | let v2: Vec<u32> = vec![1, 2, 3];
165 | v1.clone_from(&v2); // v1's allocation is reused
166 | assert_eq!(v1.capacity(), 99);
167 | ```
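168 | Although `clone` usually causes allocations, it is a reasonable thing to use in many circumstances and can often make code simpler. Use profiling data to see which `clone` calls are hot and worth taking the effort to avoid.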
169 | 
170 | [`clone_from`]: https://doc.rust-lang.org/std/clone/trait.Clone.html#method.clone_from
171 | 
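172 | Sometimes Rust code ends up containing unnecessary `clone` calls, due to (a) programmer error, or (b) changes in the code that render previously-necessary `clone` calls unnecessary. If you see a hot `clone` call that does not seem necessary, sometimes it can simply be removed.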
173 | [**Example 1**](https://github.com/rust-lang/rust/pull/37318/commits/e382267cfb9133ef12d59b66a2935ee45b546a61),
174 | [**Example 2**](https://github.com/rust-lang/rust/pull/37705/commits/11c1126688bab32f76dbe1a973906c7586da143f),
175 | [**Example 3**](https://github.com/rust-lang/rust/pull/64302/commits/36b37e22de92b584b9cf4464ed1d4ad317b798be).
176 | 
177 | ## `to_owned`
178 | 
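179 | [`ToOwned::to_owned`] is implemented for many common types. It creates owned data from borrowed data, usually by cloning, and therefore often causes heap allocations. For example, it can be used to create a `String` from a `&str`.
180 | 
181 | [`ToOwned::to_owned`]: https://doc.rust-lang.org/std/borrow/trait.ToOwned.html#tymethod.to_owned
182 | 
183 | Sometimes `to_owned` calls can be avoided by storing a reference to borrowed data in a struct rather than an owned copy. This requires lifetime annotations on the struct, complicating the code, and should only be done when profiling and benchmarking show that it is worthwhile.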
184 | [**Example**](https://github.com/rust-lang/rust/pull/50855/commits/6872377357dbbf373cfd2aae352cb74cfcc66f34).
185 | 
186 | ## `Cow`
187 | 
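188 | Sometimes code deals with a mixture of borrowed and owned data. Imagine a vector of error messages, some of which are static string literals and some of which are constructed with `format!`. The obvious representation is `Vec<String>`, as the following example shows.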
189 | ```rust
190 | let mut errors: Vec<String> = vec![];
191 | errors.push("something went wrong".to_string());
192 | errors.push(format!("something went wrong on line {}", 100));
193 | ```
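194 | That requires a `to_string` call to promote the static string literal to a `String`, which incurs an allocation.
195 | 
196 | Instead you can use the [`Cow`] type, which can hold either borrowed or owned data. A borrowed value `x` is wrapped with `Cow::Borrowed(x)`, and an owned value `y` is wrapped with `Cow::Owned(y)`. `Cow` also implements the `From<T>` trait for various string, slice, and path types, so you can usually use `into` as well. (Or `Cow::from`, which is longer but results in more readable code, because it makes the type clearer.) The following example puts all this together.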
197 | 
198 | [`Cow`]: https://doc.rust-lang.org/std/borrow/enum.Cow.html
199 | 
200 | ```rust
201 | use std::borrow::Cow;
202 | let mut errors: Vec<Cow<'static, str>> = vec![];
203 | errors.push(Cow::Borrowed("something went wrong"));
204 | errors.push(Cow::Owned(format!("something went wrong on line {}", 100)));
205 | errors.push(Cow::from("something else went wrong"));
206 | errors.push(format!("something else went wrong on line {}", 101).into());
207 | ```
208 | 
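209 | `errors` now holds a mixture of borrowed and owned data without requiring any extra allocations. This example involves `&str`/`String`, but other pairings such as `&[T]`/`Vec<T>` and `&Path`/`PathBuf` are also possible.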
210 | 
211 | 
212 | [**Example 1**](https://github.com/rust-lang/rust/pull/37064/commits/b043e11de2eb2c60f7bfec5e15960f537b229e20),
213 | [**Example 2**](https://github.com/rust-lang/rust/pull/56336/commits/787959c20d062d396b97a5566e0a766d963af022).
214 | 
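215 | All of the above applies if the data is immutable. But `Cow` also allows borrowed data to be promoted to owned data if it needs to be mutated. [`Cow::to_mut`] will obtain a mutable reference to an owned value, cloning if necessary. This is called "clone-on-write", which is where the name `Cow` comes from.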
216 | 
217 | ## Reusing Collections
218 | 
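219 | Sometimes you need to build up a collection such as a `Vec` in stages. It is usually better to do this by modifying a single `Vec` than by building multiple `Vec`s and then combining them.
220 | 
221 | For example, suppose you have a function `do_stuff` that produces a `Vec` and might be called multiple times: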
222 | ```rust
223 | fn do_stuff(x: u32, y: u32) -> Vec<u32> {
224 |     vec![x, y]
225 | }
226 | ```
227 | It might be better to instead modify a passed-in `Vec`:
228 | ```rust
229 | fn do_stuff(x: u32, y: u32, vec: &mut Vec<u32>) {
230 |     vec.push(x);
231 |     vec.push(y);
232 | }
233 | ```
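234 | Sometimes it is worth keeping around a "workhorse" collection that can be reused. For example, if a `Vec` is needed for each iteration of a loop, you could declare the `Vec` outside the loop, use it within the loop body, and then call [`clear`] at the end of the loop body (to empty the `Vec` without affecting its capacity). This avoids allocations at the cost of obscuring the fact that each iteration's usage of the `Vec` is unrelated to the others.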
235 | [**Example 1**](https://github.com/rust-lang/rust/pull/77990/commits/45faeb43aecdc98c9e3f2b24edf2ecc71f39d323),
236 | [**Example 2**](https://github.com/rust-lang/rust/pull/51870/commits/b0c78120e3ecae5f4043781f7a3f79e2277293e7).
237 | 
238 | [`clear`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.clear
239 | 
240 | Similarly, it is sometimes worth keeping a workhorse collection within a struct, to be reused in one or more methods that are called repeatedly.
241 | 
242 | ## Reading Lines from a File
243 | 
244 | [`BufRead::lines`] makes it easy to read a file one line at a time:
245 | ```rust
246 | # fn blah() -> Result<(), std::io::Error> {
247 | # fn process(_: &str) {}
248 | use std::io::{self, BufRead};
249 | let lock = io::stdin().lock();
250 | for line in lock.lines() {
251 |     process(&line?);
252 | }
253 | # Ok(())
254 | # }
255 | ```
256 | But the iterator it produces returns `io::Result<String>`, which means it allocates for every line in the file.
257 | 
258 | [`BufRead::lines`]: https://doc.rust-lang.org/stable/std/io/trait.BufRead.html#method.lines
259 | 
260 | An alternative is to use a workhorse `String` in a loop over [`BufRead::read_line`]:
261 | ```rust
262 | # fn blah() -> Result<(), std::io::Error> {
263 | # fn process(_: &str) {}
264 | use std::io::{self, BufRead};
265 | let mut lock = io::stdin().lock();
266 | let mut line = String::new();
267 | while lock.read_line(&mut line)? != 0 {
268 |     process(&line);
269 |     line.clear();
270 | }
271 | # Ok(())
272 | # }
273 | ```
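274 | This reduces the number of allocations to at most a handful, and possibly just one. (The exact number depends on how many times `line` needs to be reallocated, which depends on the distribution of line lengths in the file.)
275 | 
276 | This will only work if the loop body can operate on a `&str`, rather than a `String`.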
277 | 
278 | [`BufRead::read_line`]: https://doc.rust-lang.org/stable/std/io/trait.BufRead.html#method.read_line
279 | 
280 | [**Example**](https://github.com/nnethercote/counts/commit/7d39bbb1867720ef3b9799fee739cd717ad1539a).
281 | 
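282 | ## Using an Alternative Allocator
283 | 
284 | Besides changing your code, it is also possible to improve heap allocation performance by using a different allocator. See the [Alternative Allocators] section for details.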
285 | 
286 | [Alternative Allocators]: build-configuration.md#alternative-allocators_zh
287 | 
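288 | ## Avoiding Regressions
289 | 
290 | To ensure the number and/or size of allocations done by your code doesn't increase unintentionally, you can use the *heap usage testing* feature of [dhat-rs] to write tests that check particular code snippets allocate the expected amount of heap memory.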
291 | 
292 | [dhat-rs]: https://crates.io/crates/dhat


--------------------------------------------------------------------------------
/src/inlining.md:
--------------------------------------------------------------------------------
  1 | # Inlining
  2 | 
  3 | Entry to and exit from hot, uninlined functions often account for a
  4 | non-trivial fraction of execution time. Inlining these functions can provide
  5 | small but easy speed wins. 
  6 | 
  7 | There are four inline attributes that can be used on Rust functions.
  8 | - **None**. The compiler will decide itself if the function should be inlined.
  9 |   This will depend on factors such as the optimization level and the size of
 10 |   the function. Non-generic functions will never be inlined across crate
 11 |   boundaries unless link-time optimization is used; generic functions might be.
 12 | - **`#[inline]`**. This suggests that the function should be inlined, including
 13 |   across crate boundaries.
 14 | - **`#[inline(always)]`**. This strongly suggests that the function should be
 15 |   inlined, including across crate boundaries.
 16 | - **`#[inline(never)]`**. This strongly suggests that the function should not
 17 |   be inlined.
 18 | 
 19 | Inline attributes do not guarantee that a function is inlined or not inlined,
 20 | but in practice `#[inline(always)]` will cause inlining in all but the most
 21 | exceptional cases.
 22 | 
 23 | Inlining is non-transitive. If a function `f` calls a function `g` and you want
 24 | both functions to be inlined together at a callsite to `f`, both functions
 25 | should be marked with an inline attribute.
 26 | 
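In this sketch, with hypothetical functions `f` and `g`, both carry an
attribute so that a call to `f` can have the bodies of both functions inlined
into it.

```rust
// Marked so that it can be inlined into `f`.
#[inline]
fn g(x: u32) -> u32 {
    x.wrapping_mul(31)
}

// Marked so that it can be inlined into its callers. Without the attribute
// on `g` as well, the calls to `g` below might remain as real calls.
#[inline]
fn f(x: u32) -> u32 {
    g(x) ^ g(x.wrapping_add(1))
}
```
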
 27 | ## Simple Cases
 28 | 
 29 | The best candidates for inlining are (a) functions that are very small, or (b)
 30 | functions that have a single call site. The compiler will often inline these
 31 | functions itself even without an inline attribute. But the compiler cannot
 32 | always make the best choices, so attributes are sometimes needed.
 33 | [**Example 1**](https://github.com/rust-lang/rust/pull/37083/commits/6a4bb35b70862f33ac2491ffe6c55fb210c8490d),
 34 | [**Example 2**](https://github.com/rust-lang/rust/pull/50407/commits/e740b97be699c9445b8a1a7af6348ca2d4c460ce),
 35 | [**Example 3**](https://github.com/rust-lang/rust/pull/50564/commits/77c40f8c6f8cc472f6438f7724d60bf3b7718a0c),
 36 | [**Example 4**](https://github.com/rust-lang/rust/pull/57719/commits/92fd6f9d30d0b6b4ecbcf01534809fb66393f139),
 37 | [**Example 5**](https://github.com/rust-lang/rust/pull/69256/commits/e761f3af904b3c275bdebc73bb29ffc45384945d).
 38 | 
 39 | Cachegrind is a good profiler for determining if a function is inlined. When
 40 | looking at Cachegrind's output, you can tell that a function has been inlined
 41 | if (and only if) its first and last lines are *not* marked with event counts.
 42 | For example:
 43 | ```text
 44 |       .  #[inline(always)]
 45 |       .  fn inlined(x: u32, y: u32) -> u32 {
 46 | 700,000      eprintln!("inlined: {} + {}", x, y);
 47 | 200,000      x + y
 48 |       .  }
 49 |       .  
 50 |       .  #[inline(never)]
 51 | 400,000  fn not_inlined(x: u32, y: u32) -> u32 {
 52 | 700,000      eprintln!("not_inlined: {} + {}", x, y);
 53 | 200,000      x + y
 54 | 200,000  }
 55 | ```
 56 | You should measure again after adding inline attributes, because the effects
 57 | can be unpredictable. Sometimes it has no effect because a nearby function that
 58 | was previously inlined no longer is. Sometimes it slows the code down. Inlining
 59 | can also affect compile times, especially cross-crate inlining which involves
 60 | duplicating internal representations of the functions.
 61 | 
 62 | ## Harder Cases
 63 | 
 64 | Sometimes you have a function that is large and has multiple call sites, but
 65 | only one call site is hot. You would like to inline the hot call site for
 66 | speed, but not inline the cold call sites to avoid unnecessary code bloat. The
 67 | way to handle this is to split the function into always-inlined and
 68 | never-inlined variants, with the latter calling the former.
 69 | 
 70 | For example, this function:
 71 | ```rust
 72 | # fn one() {};
 73 | # fn two() {};
 74 | # fn three() {};
 75 | fn my_function() {
 76 |     one();
 77 |     two();
 78 |     three();
 79 | }
 80 | ```
 81 | Would become these two functions:
 82 | ```rust
 83 | # fn one() {};
 84 | # fn two() {};
 85 | # fn three() {};
 86 | // Use this at the hot call site.
 87 | #[inline(always)]
 88 | fn inlined_my_function() {
 89 |     one();
 90 |     two();
 91 |     three();
 92 | }
 93 | 
 94 | // Use this at the cold call sites.
 95 | #[inline(never)]
 96 | fn uninlined_my_function() {
 97 |     inlined_my_function();
 98 | }
 99 | ```
100 | [**Example 1**](https://github.com/rust-lang/rust/pull/53513/commits/b73843f9422fb487b2d26ac2d65f79f73a4c9ae3),
101 | [**Example 2**](https://github.com/rust-lang/rust/pull/64420/commits/a2261ad66400c3145f96ebff0d9b75e910fa89dd).
102 | 
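The following sketch shows how the two variants might be used at call sites.
All the function names are hypothetical, and which call site counts as hot is
an assumption that profiling would need to confirm.

```rust
// The real work lives in the always-inlined variant.
#[inline(always)]
fn checksum_inlined(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(u32::from(b)))
}

// The never-inlined variant is a thin wrapper around it.
#[inline(never)]
fn checksum_uninlined(bytes: &[u8]) -> u32 {
    checksum_inlined(bytes)
}

// Assumed hot path: runs for every record, so the body is inlined here.
fn checksum_all(records: &[&[u8]]) -> u32 {
    records.iter().fold(0u32, |acc, r| acc ^ checksum_inlined(r))
}

// Assumed cold path: runs rarely, so an ordinary call keeps code size down.
fn checksum_header(header: &[u8]) -> u32 {
    checksum_uninlined(header)
}
```
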
103 | 


--------------------------------------------------------------------------------
/src/inlining_zh.md:
--------------------------------------------------------------------------------
 1 | # Inlining
 2 | 
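 3 | Entry to and exit from hot, uninlined functions often account for a non-trivial fraction of execution time. Inlining these functions can provide small but easy speed wins.
 4 | 
 5 | There are four inline attributes that can be used on Rust functions.
 6 | - **None**. The compiler will decide itself if the function should be inlined. This will depend on factors such as the optimization level and the size of the function. Non-generic functions will never be inlined across crate boundaries unless link-time optimization is used; generic functions might be.
 7 | - **`#[inline]`**. This suggests that the function should be inlined, including across crate boundaries.
 8 | - **`#[inline(always)]`**. This strongly suggests that the function should be inlined, including across crate boundaries.
 9 | - **`#[inline(never)]`**. This strongly suggests that the function should not be inlined.
10 | 
11 | Inline attributes do not guarantee that a function is inlined or not inlined, but in practice `#[inline(always)]` will cause inlining in all but the most exceptional cases.
12 | 
13 | Inlining is non-transitive. If a function `f` calls a function `g` and you want both functions to be inlined together at a callsite to `f`, both functions should be marked with an inline attribute.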
14 | 
15 | ## Simple Cases
16 | 
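17 | The best candidates for inlining are (a) functions that are very small, or (b) functions that have a single call site. The compiler will often inline these functions itself even without an inline attribute. But the compiler cannot always make the best choices, so attributes are sometimes needed.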
18 | [**Example 1**](https://github.com/rust-lang/rust/pull/37083/commits/6a4bb35b70862f33ac2491ffe6c55fb210c8490d),
19 | [**Example 2**](https://github.com/rust-lang/rust/pull/50407/commits/e740b97be699c9445b8a1a7af6348ca2d4c460ce),
20 | [**Example 3**](https://github.com/rust-lang/rust/pull/50564/commits/77c40f8c6f8cc472f6438f7724d60bf3b7718a0c),
21 | [**Example 4**](https://github.com/rust-lang/rust/pull/57719/commits/92fd6f9d30d0b6b4ecbcf01534809fb66393f139),
22 | [**Example 5**](https://github.com/rust-lang/rust/pull/69256/commits/e761f3af904b3c275bdebc73bb29ffc45384945d).
23 | 
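24 | Cachegrind is a good profiler for determining if a function is inlined. When looking at Cachegrind's output, you can tell that a function has been inlined if (and only if) its first and last lines are *not* marked with event counts.
25 | For example: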
26 | ```text
27 |       .  #[inline(always)]
28 |       .  fn inlined(x: u32, y: u32) -> u32 {
29 | 700,000      eprintln!("inlined: {} + {}", x, y);
30 | 200,000      x + y
31 |       .  }
32 |       .  
33 |       .  #[inline(never)]
34 | 400,000  fn not_inlined(x: u32, y: u32) -> u32 {
35 | 700,000      eprintln!("not_inlined: {} + {}", x, y);
36 | 200,000      x + y
37 | 200,000  }
38 | ```
39 | 添加内联属性后你应该再测一次，因为效果可能是不可预知的。有时它没有效果，因为附近一个之前内联的函数不再内联了。有时会拖慢代码的速度。内联也会影响编译时间，特别是跨crate内联，它涉及到重复函数的内部表示。
40 | 
41 | ## Harder Cases
42 | 
43 | 有时候，你有一个函数很大，有多个调用站点，但只有一个调用点是热调用点。你希望内联热调用点以提高速度，但不内联冷调用点以避免不必要的代码膨胀。处理的方法是将函数分成总是内联和从不内联的部分，后者调用前者。
44 | 
45 | 例如，这个函数。
46 | ```rust
47 | # fn one() {};
48 | # fn two() {};
49 | # fn three() {};
50 | fn my_function() {
51 |     one();
52 |     two();
53 |     three();
54 | }
55 | ```
56 | 应该修改为如下函数
57 | ```rust
58 | # fn one() {};
59 | # fn two() {};
60 | # fn three() {};
61 | // Use this at the hot call site.
62 | #[inline(always)]
63 | fn inlined_my_function() {
64 |     one();
65 |     two();
66 |     three();
67 | }
68 | 
69 | // Use this at the cold call sites.
70 | #[inline(never)]
71 | fn uninlined_my_function() {
72 |     inlined_my_function();
73 | }
74 | ```
75 | [**Example 1**](https://github.com/rust-lang/rust/pull/53513/commits/b73843f9422fb487b2d26ac2d65f79f73a4c9ae3),
76 | [**Example 2**](https://github.com/rust-lang/rust/pull/64420/commits/a2261ad66400c3145f96ebff0d9b75e910fa89dd).
77 | 
78 | 


--------------------------------------------------------------------------------
/src/introduction.md:
--------------------------------------------------------------------------------
 1 | # Introduction
 2 | 
 3 | Performance is important for many Rust programs. 
 4 | 
 5 | This book contains techniques that can improve the performance-related
 6 | characteristics of Rust programs, such as runtime speed, memory usage, and
 7 | binary size. The [Compile Times] section also contains techniques that will
 8 | improve the compile times of Rust programs. Some techniques only require
 9 | changing build configurations, but many require changing code.
10 | 
11 | [Compile Times]: compile-times.md
12 | 
13 | Some techniques are entirely Rust-specific, and some involve ideas that can be
14 | applied (often with modifications) to programs written in other languages. The
15 | [General Tips] section also includes some general principles that apply to any
16 | programming language. Nonetheless, this book is mostly about the performance of
17 | Rust programs and is no substitute for a general purpose guide to profiling and
18 | optimization.
19 | 
20 | [General Tips]: general-tips.md
21 | 
22 | This book also focuses on techniques that are practical and proven: many are
23 | accompanied by links to pull requests or other resources that show how the
24 | technique was used on a real-world Rust program. It reflects the primary
25 | author's background, being somewhat biased towards compiler development and
26 | away from other areas such as scientific computing.
27 | 
28 | This book is deliberately terse, favouring breadth over depth, so that it is
29 | quick to read. It links to external sources that provide more depth when
30 | appropriate.
31 | 
32 | This book is aimed at intermediate and advanced Rust users. Beginner Rust users
33 | have more than enough to learn and these techniques are likely to be an
34 | unhelpful distraction to them.
35 | 


--------------------------------------------------------------------------------
/src/introduction_zh.md:
--------------------------------------------------------------------------------
 1 | # 简介
 2 | 
 3 | 性能对许多Rust程序来说都很重要。
 4 | 
 5 | 本书包含了许多可以改善Rust程序性能相关特性（如运行速度、内存使用和二进制大小）的技术。[编译时间]部分也包含了一些可以改善Rust程序编译时间的技术。本书的一些技术只需要改变构建配置，但许多技术需要改变代码。
 6 | 
 7 | [编译时间]: compile-times_zh.md
 8 | 
 9 | 一些技术完全是 Rust 特有的，而一些涉及的思想可以应用于其他编程语言编写的程序（通常需要进行修改）。[General Tips] 部分还包括适用于任何编程语言的一些一般原则。尽管如此，这本书主要关注 Rust 程序的性能，不能替代一本关于分析和优化的通用指南。
10 | 
11 | [General Tips]: general-tips_zh.md
12 | 
13 | 本书侧重于实用且经过验证的技术：许多技术都附有指向拉取请求或其他资源的链接，展示了这些技术在真实的 Rust 程序中的应用。这反映了主要作者的背景，有点偏向编译器开发，而不太涉及其他领域，如科学计算。
14 | 
15 | 本书有意简洁，更注重广度而非深度，使其阅读起来迅速。在适当的情况下，它会链接到提供更深入信息的外部资源。
16 | 
17 | 本书面向中级和高级 Rust 用户。初学者 Rust 用户有很多东西要学习，这些技术可能会对他们造成不必要的干扰。


--------------------------------------------------------------------------------
/src/io.md:
--------------------------------------------------------------------------------
  1 | # I/O
  2 | 
  3 | ## Locking
  4 | 
  5 | Rust's [`print!`] and [`println!`] macros lock stdout on every call. If you
  6 | have repeated calls to these macros it may be better to lock stdout manually.
  7 | 
  8 | [`print!`]: https://doc.rust-lang.org/std/macro.print.html
  9 | [`println!`]: https://doc.rust-lang.org/std/macro.println.html
 10 | 
 11 | For example, change this code:
 12 | ```rust
 13 | # let lines = vec!["one", "two", "three"];
 14 | for line in lines {
 15 |     println!("{}", line);
 16 | }
 17 | ```
 18 | to this:
 19 | ```rust
 20 | # fn blah() -> Result<(), std::io::Error> {
 21 | # let lines = vec!["one", "two", "three"];
 22 | use std::io::Write;
 23 | let stdout = std::io::stdout();
 24 | let mut lock = stdout.lock();
 25 | for line in lines {
 26 |     writeln!(lock, "{}", line)?;
 27 | }
 28 | // stdout is unlocked when `lock` is dropped
 29 | # Ok(())
 30 | # }
 31 | ```
 32 | stdin and stderr can likewise be locked when doing repeated operations on them.
 33 | 
 34 | ## Buffering
 35 | 
 36 | Rust file I/O is unbuffered by default. If you have many small and repeated
 37 | read or write calls to a file or network socket, use [`BufReader`] or
 38 | [`BufWriter`]. They maintain an in-memory buffer for input and output,
 39 | minimizing the number of system calls required.
 40 | 
 41 | [`BufReader`]: https://doc.rust-lang.org/std/io/struct.BufReader.html
 42 | [`BufWriter`]: https://doc.rust-lang.org/std/io/struct.BufWriter.html
 43 | 
 44 | For example, change this unbuffered writer code:
 45 | ```rust
 46 | # fn blah() -> Result<(), std::io::Error> {
 47 | # let lines = vec!["one", "two", "three"];
 48 | use std::io::Write;
 49 | let mut out = std::fs::File::create("test.txt")?;
 50 | for line in lines {
 51 |     writeln!(out, "{}", line)?;
 52 | }
 53 | # Ok(())
 54 | # }
 55 | ```
 56 | to this:
 57 | ```rust
 58 | # fn blah() -> Result<(), std::io::Error> {
 59 | # let lines = vec!["one", "two", "three"];
 60 | use std::io::{BufWriter, Write};
 61 | let mut out = BufWriter::new(std::fs::File::create("test.txt")?);
 62 | for line in lines {
 63 |     writeln!(out, "{}", line)?;
 64 | }
 65 | out.flush()?;
 66 | # Ok(())
 67 | # }
 68 | ```
 69 | [**Example 1**](https://github.com/rust-lang/rust/pull/93954),
 70 | [**Example 2**](https://github.com/nnethercote/dhat-rs/pull/22/commits/8c3ae26f1219474ee55c30bc9981e6af2e869be2).
 71 | 
 72 | The explicit call to [`flush`] is not strictly necessary, as flushing will
 73 | happen automatically when `out` is dropped. However, in that case any error
 74 | that occurs on flushing will be ignored, whereas an explicit flush will make
 75 | that error explicit.
 76 | 
 77 | [`flush`]: https://doc.rust-lang.org/std/io/trait.Write.html#tymethod.flush
 78 | 
 79 | Forgetting to buffer is more common when writing. Both unbuffered and buffered
 80 | writers implement the [`Write`] trait, which means the code for writing
 81 | to an unbuffered writer and a buffered writer is much the same. In contrast,
 82 | unbuffered readers implement the [`Read`] trait but buffered readers implement
 83 | the [`BufRead`] trait, which means the code for reading from an unbuffered reader
 84 | and a buffered reader is different. For example, it is difficult to read a file
 85 | line by line with an unbuffered reader, but it is trivial with a buffered
 86 | reader by using [`BufRead::read_line`] or [`BufRead::lines`]. For this reason,
 87 | it is hard to write an example for readers like the one above for writers,
 88 | where the before and after versions are so similar.
 89 | 
 90 | [`Write`]: https://doc.rust-lang.org/std/io/trait.Write.html
 91 | [`Read`]: https://doc.rust-lang.org/std/io/trait.Read.html
 92 | [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html
 93 | [`BufRead::read_line`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_line
 94 | [`BufRead::lines`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.lines
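To illustrate the reading side, here is a minimal sketch that wraps any
`Read` source in a `BufReader` and walks it line by line with
`BufRead::lines`. The function name `count_nonempty_lines` is made up for this
example and is not part of any library.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Counts the lines in `source` that contain something other than whitespace.
/// (Hypothetical helper, for illustration only.)
fn count_nonempty_lines<R: Read>(source: R) -> io::Result<usize> {
    // Wrapping the reader means many small line reads are served from an
    // in-memory buffer instead of each one hitting the underlying source.
    let reader = BufReader::new(source);
    let mut count = 0;
    for line in reader.lines() {
        // `lines` yields `io::Result<String>`, so I/O errors surface here.
        if !line?.trim().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}
```

Passing a `std::fs::File` as `source` is the typical use; any other `Read`
implementation, such as a byte slice, works the same way.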
 95 | 
 96 | Finally, note that buffering also works with stdout, so you might want to
 97 | combine manual locking *and* buffering when making many writes to stdout.
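A minimal sketch of that combination might look like the following. The
helper names `write_lines` and `print_lines` are assumptions made for this
example.

```rust
use std::io::{self, BufWriter, Write};

/// Writes each line to `out` through a buffer, then flushes explicitly so
/// that any error is reported rather than ignored on drop.
/// (Hypothetical helper, for illustration only.)
fn write_lines<W: Write>(out: W, lines: &[&str]) -> io::Result<()> {
    let mut out = BufWriter::new(out);
    for line in lines {
        writeln!(out, "{}", line)?;
    }
    out.flush()
}

/// Locks stdout once and buffers on top of the lock.
fn print_lines(lines: &[&str]) -> io::Result<()> {
    let stdout = io::stdout();
    write_lines(stdout.lock(), lines)
}
```

Keeping `write_lines` generic over `Write` means the same code can target a
file or an in-memory `Vec<u8>`, which is convenient for testing.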
 98 | 
 99 | ## Reading Lines from a File
100 | 
101 | [This section] explains how to avoid excessive allocations when using
102 | [`BufRead`] to read a file one line at a time.
103 | 
104 | [This section]: heap-allocations.md#reading-lines-from-a-file
105 | [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html
106 | 
107 | ## Reading Input as Raw Bytes
108 | 
109 | The built-in [String] type uses UTF-8 internally, which adds a small, but
110 | nonzero overhead caused by UTF-8 validation when you read input into it. If you
111 | just want to process input bytes without worrying about UTF-8 (for example if
112 | you handle ASCII text), you can use [`BufRead::read_until`].
113 | 
114 | [String]: https://doc.rust-lang.org/std/string/struct.String.html
115 | [`BufRead::read_until`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_until
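As a rough sketch of this approach, the function below reads
newline-terminated records into a reusable `Vec<u8>` with
`BufRead::read_until` and inspects them as raw bytes. The function name
`count_lines_with_prefix` is hypothetical.

```rust
use std::io::{self, BufRead};

/// Counts the lines in `reader` that start with `prefix`, comparing raw
/// bytes and never validating the input as UTF-8.
/// (Hypothetical helper, for illustration only.)
fn count_lines_with_prefix<R: BufRead>(
    mut reader: R,
    prefix: &[u8],
) -> io::Result<usize> {
    let mut line = Vec::new();
    let mut count = 0;
    // `read_until` appends to `line` (including the delimiter, if found)
    // and returns `Ok(0)` at end of input.
    while reader.read_until(b'\n', &mut line)? != 0 {
        if line.starts_with(prefix) {
            count += 1;
        }
        // Clear the buffer but keep its allocation for the next line.
        line.clear();
    }
    Ok(count)
}
```

Because nothing is converted to `String`, input containing bytes that are not
valid UTF-8 is handled without error.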
116 | 
117 | There are also dedicated crates for reading [byte-oriented lines of data]
118 | and working with [byte strings].
119 | 
120 | [byte-oriented lines of data]: https://github.com/Freaky/rust-linereader
121 | [byte strings]: https://github.com/BurntSushi/bstr
122 | 


--------------------------------------------------------------------------------
/src/io_zh.md:
--------------------------------------------------------------------------------
 1 | # I/O
 2 | 
 3 | ## Locking
 4 | 
 5 | Rust的[`print!`]和[`println!`]宏在每次调用时锁定stdout。如果你需要重复调用这些宏，最好手动锁定stdout。
 6 | 
 7 | [`print!`]: https://doc.rust-lang.org/std/macro.print.html
 8 | [`println!`]: https://doc.rust-lang.org/std/macro.println.html
 9 | 
10 | 例如，修改这段代码。
11 | ```rust
12 | # let lines = vec!["one", "two", "three"];
13 | for line in lines {
14 |     println!("{}", line);
15 | }
16 | ```
17 | 修改为：
18 | ```rust
19 | # fn blah() -> Result<(), std::io::Error> {
20 | # let lines = vec!["one", "two", "three"];
21 | use std::io::Write;
22 | let stdout = std::io::stdout();
23 | let mut lock = stdout.lock();
24 | for line in lines {
25 |     writeln!(lock, "{}", line)?;
26 | }
27 | // stdout is unlocked when `lock` is dropped
28 | # Ok(())
29 | # }
30 | ```
31 | 当对stdin和stderr进行重复操作时，同样可以锁定它们。
32 | 
33 | ## 缓冲
34 | 
35 | Rust文件I/O默认是无缓冲的。如果你对文件或网络套接字有许多小的和重复的读写调用，使用[`BufReader`]或[`BufWriter`]。它们为输入和输出维护了一个内存缓冲区，最大限度地减少了系统调用的次数。
36 | 
37 | [`BufReader`]: https://doc.rust-lang.org/std/io/struct.BufReader.html
38 | [`BufWriter`]: https://doc.rust-lang.org/std/io/struct.BufWriter.html
39 | 
40 | 例如，修改这个无缓冲的输出代码。
41 | ```rust
42 | # fn blah() -> Result<(), std::io::Error> {
43 | # let lines = vec!["one", "two", "three"];
44 | use std::io::Write;
45 | let mut out = std::fs::File::create("test.txt")?;
46 | for line in lines {
47 |     writeln!(out, "{}", line)?;
48 | }
49 | # Ok(())
50 | # }
51 | ```
52 | 修改为：
53 | ```rust
54 | # fn blah() -> Result<(), std::io::Error> {
55 | # let lines = vec!["one", "two", "three"];
56 | use std::io::{BufWriter, Write};
57 | let out = std::fs::File::create("test.txt")?;
58 | let mut buf = BufWriter::new(out);
59 | for line in lines {
60 |     writeln!(buf, "{}", line)?;
61 | }
62 | buf.flush()?;
63 | # Ok(())
64 | # }
65 | ```
66 | 对[`flush`]的显式调用并不是绝对必要的，因为当`buf`被丢弃时，刷新将自动发生。然而，在这种情况下，刷新时发生的任何错误都将被忽略，而显式刷新将使该错误显式化。
67 | 
68 | [`flush`]: https://doc.rust-lang.org/std/io/trait.Write.html#tymethod.flush
69 | 
70 | 在写入时忘记使用缓冲更为常见。无缓冲和有缓冲的写入器都实现了 [`Write`] trait，这意味着向无缓冲写入器和有缓冲写入器写入的代码基本相同。相比之下，无缓冲读取器实现了 [`Read`] trait，但有缓冲读取器实现了 [`BufRead`] trait，这意味着从无缓冲读取器和有缓冲读取器读取的代码是不同的。例如，使用无缓冲读取器逐行读取文件是困难的，但使用有缓冲读取器通过 [`BufRead::read_line`] 或 [`BufRead::lines`] 则很简单。因此，对于读取器来说，很难像对写入器那样编写一个示例，其中之前和之后的版本是如此相似。
71 | 
72 | [`Write`]: https://doc.rust-lang.org/std/io/trait.Write.html
73 | [`Read`]: https://doc.rust-lang.org/std/io/trait.Read.html
74 | [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html
75 | [`BufRead::read_line`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_line
76 | [`BufRead::lines`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.lines
77 | 
78 | 最后，请注意，缓冲也适用于标准输出(stdout)，因此当向标准输出频繁写入时，您可能希望结合手动锁定和缓冲。
79 | 
80 | ## Reading Lines from a File
81 | 
82 | [这一部分]解释了如何在使用 [`BufRead`] 逐行读取文件时避免过多的内存分配。
83 | 
84 | [这一部分]: heap-allocations_zh.md#reading-lines-from-a-file
85 | [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html
86 | 
87 | ## Reading Input as Raw Bytes
88 | 
89 | 内置的[String]类型在内部使用UTF-8，当你读取输入到string类型时，会增加一个由UTF-8验证引起的小但非零的开销。如果你只想处理输入字节而不担心UTF-8（例如，如果你处理ASCII文本），你可以使用[`BufRead::read_until`]。
90 | 
91 | [String]: https://doc.rust-lang.org/std/string/struct.String.html
92 | [`BufRead::read_until`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_until
93 | 
94 | 还有专门的crate用于读取[面向字节的数据行]和处理[byte strings]。
95 | 
96 | [面向字节的数据行]: https://github.com/Freaky/rust-linereader
97 | [byte strings]: https://github.com/BurntSushi/bstr
98 | 


--------------------------------------------------------------------------------
/src/iterators.md:
--------------------------------------------------------------------------------
 1 | # Iterators
 2 | 
 3 | ## `collect` and `extend`
 4 | 
 5 | [`Iterator::collect`] converts an iterator into a collection such as `Vec`,
 6 | which typically requires an allocation. You should avoid calling `collect` if
 7 | the collection is then only iterated over again.
 8 | 
 9 | [`Iterator::collect`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.collect
10 | 
11 | For this reason, it is often better to return an iterator type like `impl
12 | Iterator<Item=T>` from a function than a `Vec<T>`. Note that sometimes
13 | additional lifetimes are required on these return types, as [this blog post]
14 | explains.
15 | [**Example**](https://github.com/rust-lang/rust/pull/77990/commits/660d8a6550a126797aa66a417137e39a5639451b).
16 | 
17 | [this blog post]: https://blog.katona.me/2019/12/29/Rust-Lifetimes-and-Iterators/
18 | 
19 | Similarly, you can use [`extend`] to extend an existing collection (such as a
20 | `Vec`) with an iterator, rather than collecting the iterator into a `Vec` and
21 | then using [`append`].
22 | 
23 | [`extend`]: https://doc.rust-lang.org/std/iter/trait.Extend.html#tymethod.extend
24 | [`append`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.append
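The difference can be sketched as follows. Both functions are hypothetical
and exist only to contrast the two styles; they produce the same result, but
the first allocates a temporary `Vec`.

```rust
/// Appends the squares of `values` to `out` by collecting into a temporary
/// `Vec` first. (Hypothetical helper, for illustration only.)
fn append_squares_via_collect(out: &mut Vec<u64>, values: &[u64]) {
    let mut tmp: Vec<u64> = values.iter().map(|v| v * v).collect();
    out.append(&mut tmp);
}

/// Appends the same elements directly from the iterator, with no temporary
/// collection. (Hypothetical helper, for illustration only.)
fn append_squares_via_extend(out: &mut Vec<u64>, values: &[u64]) {
    out.extend(values.iter().map(|v| v * v));
}
```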
25 | 
26 | Finally, when you write an iterator it is often worth implementing the
27 | [`Iterator::size_hint`] or [`ExactSizeIterator::len`] method, if possible.
28 | `collect` and `extend` calls that use the iterator may then do fewer
29 | allocations, because they have advance information about the number of elements
30 | yielded by the iterator.
31 | 
32 | [`Iterator::size_hint`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.size_hint
33 | [`ExactSizeIterator::len`]: https://doc.rust-lang.org/std/iter/trait.ExactSizeIterator.html#method.len
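As a small sketch, here is a made-up `Countdown` iterator that reports exact
bounds from `size_hint` and also implements `ExactSizeIterator`. The type and
its field are hypothetical.

```rust
/// Yields `remaining`, `remaining - 1`, ..., `1`.
/// (Hypothetical iterator, for illustration only.)
struct Countdown {
    remaining: usize,
}

impl Iterator for Countdown {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.remaining == 0 {
            return None;
        }
        let current = self.remaining;
        self.remaining -= 1;
        Some(current)
    }

    // The default implementation returns `(0, None)`, which gives callers
    // no information. Here the exact count is known.
    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.remaining, Some(self.remaining))
    }
}

// `size_hint` is exact, so the default `len` implementation can be used.
impl ExactSizeIterator for Countdown {}
```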
34 | 
35 | ## Chaining
36 | 
37 | [`chain`] can be very convenient, but it can also be slower than a single
38 | iterator. It may be worth avoiding for hot iterators, if possible.
39 | [**Example**](https://github.com/rust-lang/rust/pull/64801/commits/5ca99b750e455e9b5e13e83d0d7886486231e48a).
40 | 
41 | Similarly, [`filter_map`] may be faster than using [`filter`] followed by
42 | [`map`].
43 | 
44 | [`chain`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.chain
45 | [`filter_map`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter_map
46 | [`filter`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter
47 | [`map`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.map
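One situation where `filter_map` can help is when the filter and the map
would otherwise repeat the same work. The sketch below uses two hypothetical
functions to show this; whether it is measurably faster in a given program
should be checked by profiling.

```rust
/// Keeps the entries that parse as `i32`, using `filter` then `map`.
/// Every valid entry is parsed twice.
/// (Hypothetical helper, for illustration only.)
fn parse_valid_two_steps(entries: &[&str]) -> Vec<i32> {
    entries
        .iter()
        .filter(|s| s.parse::<i32>().is_ok())
        .map(|s| s.parse::<i32>().unwrap())
        .collect()
}

/// Same result with `filter_map`: each entry is parsed once, and failures
/// are dropped by converting the `Result` to an `Option`.
fn parse_valid_one_step(entries: &[&str]) -> Vec<i32> {
    entries
        .iter()
        .filter_map(|s| s.parse::<i32>().ok())
        .collect()
}
```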
48 | 
49 | ## Chunks
50 | 
51 | When a chunking iterator is required and the chunk size is known to exactly
52 | divide the slice length, use the faster [`slice::chunks_exact`] instead of [`slice::chunks`].
53 | 
54 | When the chunk size is not known to exactly divide the slice length, it can
55 | still be faster to use `slice::chunks_exact` in combination with either
56 | [`ChunksExact::remainder`] or manual handling of excess elements.
57 | [**Example 1**](https://github.com/johannesvollmer/exrs/pull/173/files),
58 | [**Example 2**](https://github.com/johannesvollmer/exrs/pull/175/files).
59 | 
60 | The same is true for related iterators:
61 | - [`slice::rchunks`], [`slice::rchunks_exact`], and [`RChunksExact::remainder`];
62 | - [`slice::chunks_mut`], [`slice::chunks_exact_mut`], and [`ChunksExactMut::into_remainder`];
63 | - [`slice::rchunks_mut`], [`slice::rchunks_exact_mut`], and [`RChunksExactMut::into_remainder`].
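For the `slice::chunks_exact` case, a minimal sketch of handling the leftover
elements with `ChunksExact::remainder` might look like this. The function
name and the chunk size of four are arbitrary choices for the example.

```rust
/// Sums `values`, processing four elements at a time.
/// (Hypothetical helper, for illustration only.)
fn sum_in_chunks_of_four(values: &[u32]) -> u32 {
    let mut chunks = values.chunks_exact(4);
    let mut total = 0u32;
    // `by_ref` lets the loop borrow the iterator so that `remainder` can
    // still be called on it afterwards.
    for chunk in chunks.by_ref() {
        // Every chunk yielded here has exactly four elements.
        total += chunk[0] + chunk[1] + chunk[2] + chunk[3];
    }
    // At most three elements are left over.
    total += chunks.remainder().iter().sum::<u32>();
    total
}
```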
64 | 
65 | [`slice::chunks`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks
66 | [`slice::chunks_exact`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_exact
67 | [`ChunksExact::remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.ChunksExact.html#method.remainder
68 | 
69 | [`slice::rchunks`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks
70 | [`slice::rchunks_exact`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_exact
71 | [`RChunksExact::remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.RChunksExact.html#method.remainder
72 | 
73 | [`slice::chunks_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_mut
74 | [`slice::chunks_exact_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_exact_mut
75 | [`ChunksExactMut::into_remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.ChunksExactMut.html#method.into_remainder
76 | 
77 | [`slice::rchunks_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_mut
78 | [`slice::rchunks_exact_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_exact_mut
79 | [`RChunksExactMut::into_remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.RChunksExactMut.html#method.into_remainder
80 | 
81 | 


--------------------------------------------------------------------------------
/src/iterators_zh.md:
--------------------------------------------------------------------------------
 1 | # Iterators
 2 | 
 3 | ## `collect`
 4 | 
 5 | [`Iterator::collect`]将一个迭代器转换为一个集合，如`Vec`，它通常需要一个分配。如果该集合只是再次迭代，你应该避免调用`collect`。
 6 | 
 7 | [`Iterator::collect`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.collect
 8 | 
 9 | 出于这个原因，从函数中返回一个迭代器类型，比如`impl Iterator<Item=T>`，往往比`Vec<T>`更好。请注意，有时这些返回类型需要额外的生存期，正如[this post]所解释的那样。
10 | [**Example**](https://github.com/rust-lang/rust/pull/77990/commits/660d8a6550a126797aa66a417137e39a5639451b).
11 | 
12 | [this post]: https://blog.katona.me/2019/12/29/Rust-Lifetimes-and-Iterators/
13 | 
14 | 同样，你可以使用[`extend`]用迭代器扩展一个现有的集合（如`Vec`），而不是将迭代器收集到`Vec`中，然后使用[`append`]。
15 | 
16 | [`extend`]: https://doc.rust-lang.org/std/iter/trait.Extend.html#tymethod.extend
17 | [`append`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.append
18 | 
19 | 最后，当编写迭代器时，如果可能的话，实现 [`Iterator::size_hint`] 或 [`ExactSizeIterator::len`] 方法通常是值得的。使用该迭代器的 `collect` 和 `extend` 调用可能会减少分配，因为它们提前了解迭代器产生的元素数量信息。
20 | 
21 | [`Iterator::size_hint`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.size_hint
22 | [`ExactSizeIterator::len`]: https://doc.rust-lang.org/std/iter/trait.ExactSizeIterator.html#method.len
23 | 
24 | ## Chaining
25 | 
26 | [`chain`]可以非常方便，但也可能比单个迭代器慢。对于热迭代器，如果可能的话，也许值得避免使用它。
27 | [**Example**](https://github.com/rust-lang/rust/pull/64801/commits/5ca99b750e455e9b5e13e83d0d7886486231e48a).
28 | 
29 | 类似地，[`filter_map`]可能比使用[`filter`]和[`map`]更快。
30 | 
31 | [`chain`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.chain
32 | [`filter_map`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter_map
33 | [`filter`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter
34 | [`map`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.map
35 | 
36 | ## Chunks
37 | 
38 | 当需要一个分块迭代器，并且已知分块大小恰好能整除切片长度时，应该使用更快的 [`slice::chunks_exact`] 而不是 [`slice::chunks`]。
39 | 
40 | 当分块大小不确定能否恰好整除切片长度时，仍然可以更快地使用 `slice::chunks_exact`，结合 [`ChunksExact::remainder`] 或手动处理多余元素。
41 | [**Example 1**](https://github.com/johannesvollmer/exrs/pull/173/files),
42 | [**Example 2**](https://github.com/johannesvollmer/exrs/pull/175/files).
43 | 
44 | 同样适用于相关的迭代器:
45 | - [`slice::rchunks`], [`slice::rchunks_exact`], and [`RChunksExact::remainder`];
46 | - [`slice::chunks_mut`], [`slice::chunks_exact_mut`], and [`ChunksExactMut::into_remainder`];
47 | - [`slice::rchunks_mut`], [`slice::rchunks_exact_mut`], and [`RChunksExactMut::into_remainder`].
48 | 
49 | [`slice::chunks`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks
50 | [`slice::chunks_exact`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_exact
51 | [`ChunksExact::remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.ChunksExact.html#method.remainder
52 | 
53 | [`slice::rchunks`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks
54 | [`slice::rchunks_exact`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_exact
55 | [`RChunksExact::remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.RChunksExact.html#method.remainder
56 | 
57 | [`slice::chunks_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_mut
58 | [`slice::chunks_exact_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_exact_mut
59 | [`ChunksExactMut::into_remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.ChunksExactMut.html#method.into_remainder
60 | 
61 | [`slice::rchunks_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_mut
62 | [`slice::rchunks_exact_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_exact_mut
63 | [`RChunksExactMut::into_remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.RChunksExactMut.html#method.into_remainder
64 | 
65 | 


--------------------------------------------------------------------------------
/src/linting.md:
--------------------------------------------------------------------------------
 1 | # Linting
 2 | 
 3 | [Clippy] is a collection of lints to catch common mistakes in Rust code. It is
 4 | an excellent tool to run on Rust code in general. It can also help with
 5 | performance, because a number of the lints relate to code patterns that can
 6 | cause sub-optimal performance.
 7 | 
 8 | Given that automated detection of problems is preferable to manual detection,
 9 | the rest of this book will not mention performance problems that Clippy detects
10 | by default.
11 | 
12 | ## Basics
13 | 
14 | [Clippy]: https://github.com/rust-lang/rust-clippy
15 | 
16 | Once installed, Clippy is easy to run:
17 | ```text
18 | cargo clippy
19 | ```
20 | The full list of performance lints can be seen by visiting the [lint list] and
21 | deselecting all the lint groups except for "Perf". 
22 | 
23 | [lint list]: https://rust-lang.github.io/rust-clippy/master/
24 | 
25 | As well as making the code faster, the performance lint suggestions usually
26 | result in code that is simpler and more idiomatic, so they are worth following
27 | even for code that is not executed frequently.
28 | 
29 | Conversely, some non-performance lint suggestions can improve performance. For
30 | example, the [`ptr_arg`] style lint suggests changing various container
31 | arguments to slices, such as changing `&mut Vec<T>` arguments to `&mut [T]`.
32 | The primary motivation here is that a slice gives a more flexible API, but it
33 | may also result in faster code due to less indirection and better optimization
34 | opportunities for the compiler.
35 | [**Example**](https://github.com/fschutt/fastblur/pull/3/files).
36 | 
37 | [`ptr_arg`]: https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg
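The kind of change described above can be sketched as follows. Both function
names are hypothetical; the bodies are identical and only the parameter type
differs.

```rust
/// Before: callers must have a `Vec`, even though no `Vec`-specific
/// operation is used. (Hypothetical function, for illustration only.)
fn double_all_vec(values: &mut Vec<i32>) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

/// After: any mutable slice is accepted, including a `Vec`, an array, or a
/// sub-slice of either.
fn double_all(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}
```

Existing callers that pass `&mut vec` keep compiling after the change,
because `&mut Vec<i32>` coerces to `&mut [i32]`.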
38 | 
39 | ## Disallowing Types
40 | 
41 | In the following chapters we will see that it is sometimes worth avoiding
42 | certain standard library types in favour of alternatives that are faster. If
43 | you decide to use these alternatives, it is easy to accidentally use the
44 | standard library types in some places by mistake.
45 | 
46 | You can use Clippy's [`disallowed_types`] lint to avoid this problem. For
47 | example, to disallow the use of the standard hash tables (for reasons explained
48 | in the [Hashing] section) add a `clippy.toml` file to your code with the
49 | following line.
50 | ```toml
51 | disallowed-types = ["std::collections::HashMap", "std::collections::HashSet"]
52 | ```
53 | 
54 | [Hashing]: hashing.md
55 | [`disallowed_types`]: https://rust-lang.github.io/rust-clippy/master/index.html#disallowed_types
56 | 


--------------------------------------------------------------------------------
/src/linting_zh.md:
--------------------------------------------------------------------------------
 1 | # Linting
 2 | 
 3 | [Clippy]是一个用来捕捉Rust代码中常见错误的lints集合。它是运行在Rust代码上的一个优秀工具。它还可以帮助提高性能，因为许多lints与可能导致次优性能的代码模式有关。
 4 | 
 5 | [Clippy]: https://github.com/rust-lang/rust-clippy
 6 | 
 7 | 鉴于自动检测问题优于手动检测问题，本书的其余部分将不会提及 Clippy 默认检测到的性能问题。
 8 | 
 9 | ## 基础
10 | 
11 | Clippy 是一个 Rust 的 lint 工具，用于静态代码分析。安装后，可以通过以下命令轻松运行：
12 | 
13 | ```text
14 | cargo clippy
15 | ```
16 | 
17 | 可以通过访问 [lint list] 并取消选择除了 "Perf" 之外的所有 lint 组，查看完整的性能 lint 列表。
18 | 
19 | [lint list]: https://rust-lang.github.io/rust-clippy/master/
20 | 
21 | 除了使代码更快之外，性能 lint 建议通常会导致更简单、更符合惯例的代码，因此即使对于不经常执行的代码，也值得遵循这些建议。
22 | 
23 | 相反，一些非性能 lint 建议可能会提高性能。例如，[`ptr_arg`] 风格 lint 建议将各种容器参数更改为切片，例如将 `&mut Vec<T>` 参数更改为 `&mut [T]`。这里的主要动机是切片提供了更灵活的 API，但也可能由于减少间接性和为编译器提供更好的优化机会而导致更快的代码。
24 | [**Example**](https://github.com/fschutt/fastblur/pull/3/files).
25 | 
26 | [`ptr_arg`]: https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg
27 | 
28 | ## Disallowing Types
29 | 
30 | 在接下来的章节中，我们将看到有时候值得避免使用某些标准库类型，而选择更快的替代方案。如果你决定使用这些替代方案，很容易在某些地方意外地错误使用标准库类型。
31 | 
32 | 你可以使用 Clippy 的 [`disallowed_types`] lint 来避免这个问题。例如，为了禁止使用标准哈希表（原因在 [Hashing] 部分有解释），可以在你的代码中添加一个 `clippy.toml` 文件，并包含以下行。
33 | 
34 | ```toml
35 | disallowed-types = ["std::collections::HashMap", "std::collections::HashSet"]
36 | ```
37 | 
38 | [Hashing]: hashing_zh.md
39 | [`disallowed_types`]: https://rust-lang.github.io/rust-clippy/master/index.html#disallowed_types


--------------------------------------------------------------------------------
/src/logging-and-debugging.md:
--------------------------------------------------------------------------------
 1 | # Logging and Debugging
 2 | 
 3 | Sometimes logging code or debugging code can slow down a program significantly.
 4 | Either the logging/debugging code itself is slow, or data collection code that
 5 | feeds into logging/debugging code is slow. Make sure that no unnecessary work
 6 | is done for logging/debugging purposes when logging/debugging is not enabled.
 7 | [**Example 1**](https://github.com/rust-lang/rust/pull/50246/commits/2e4f66a86f7baa5644d18bb2adc07a8cd1c7409d),
 8 | [**Example 2**](https://github.com/rust-lang/rust/pull/75133/commits/eeb4b83289e09956e0dda174047729ca87c709fe).
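One way to avoid such work is to build the log message inside a closure that
only runs when logging is enabled. The sketch below is an assumption-laden
illustration: the `VERBOSE` flag, `log_lazy`, and `process` are all
hypothetical, and a real program would more likely rely on the level checks
of whatever logging crate it uses.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical global switch for this example.
static VERBOSE: AtomicBool = AtomicBool::new(false);

/// Calls `make_message` only when logging is enabled, and reports whether
/// anything was logged. (Hypothetical helper, for illustration only.)
fn log_lazy<F: FnOnce() -> String>(make_message: F) -> bool {
    if VERBOSE.load(Ordering::Relaxed) {
        eprintln!("{}", make_message());
        true
    } else {
        false
    }
}

fn process(items: &[u64]) -> u64 {
    let total: u64 = items.iter().sum();
    // The formatting below is skipped entirely unless `VERBOSE` is set.
    log_lazy(|| format!("processed {} items: {:?}", items.len(), items));
    total
}
```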
 9 | 
10 | Note that [`assert!`] calls always run, but [`debug_assert!`] calls only run
11 | when debug assertions are on (the default in dev builds). If an assertion is
12 | hot but not necessary for safety, consider making it a `debug_assert!`.
13 | [**Example 1**](https://github.com/rust-lang/rust/pull/58210/commits/f7ed6e18160bc8fccf27a73c05f3935c9e8f672e),
14 | [**Example 2**](https://github.com/rust-lang/rust/pull/90746/commits/580d357b5adef605fc731d295ca53ab8532e26fb).
15 | 
16 | [`assert!`]: https://doc.rust-lang.org/std/macro.assert.html
17 | [`debug_assert!`]: https://doc.rust-lang.org/std/macro.debug_assert.html
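As a small sketch, the hypothetical function below keeps a cheap check as an
`assert!` and demotes an expensive, non-essential one to a `debug_assert!`.

```rust
/// Returns the middle element of a non-empty slice that the caller promises
/// is sorted. (Hypothetical helper, for illustration only.)
fn median_of_sorted(values: &[u32]) -> u32 {
    // Cheap check that always runs.
    assert!(!values.is_empty(), "input must be non-empty");
    // O(n) check on every call: useful while developing, but skipped when
    // debug assertions are disabled, as in default release builds.
    debug_assert!(
        values.windows(2).all(|w| w[0] <= w[1]),
        "input must be sorted"
    );
    values[values.len() / 2]
}
```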
18 | 


--------------------------------------------------------------------------------
/src/logging-and-debugging_zh.md:
--------------------------------------------------------------------------------
 1 | # Logging and Debugging
 2 | 
 3 | 有时，日志代码或调试代码会大大降低程序的速度。要么是日志记录/调试代码本身很慢，要么是反馈到日志记录/调试代码的数据收集代码很慢。确保在不启用日志记录/调试时，不为日志记录/调试目的做不必要的工作。
 4 | [**Example 1**](https://github.com/rust-lang/rust/pull/50246/commits/2e4f66a86f7baa5644d18bb2adc07a8cd1c7409d),
 5 | [**Example 2**](https://github.com/rust-lang/rust/pull/75133/commits/eeb4b83289e09956e0dda174047729ca87c709fe).
 6 | 
 7 | 请注意，[`assert!`]调用总是运行，但[`debug_assert!`]调用只在调试构建中运行。如果你有一个热的断言，但对安全来说不是必需的，可以考虑把它变成一个`debug_assert!`。
 8 | [**Example**](https://github.com/rust-lang/rust/pull/58210/commits/f7ed6e18160bc8fccf27a73c05f3935c9e8f672e).
 9 | 
10 | [`assert!`]: https://doc.rust-lang.org/std/macro.assert.html
11 | [`debug_assert!`]: https://doc.rust-lang.org/std/macro.debug_assert.html
12 | 


--------------------------------------------------------------------------------
/src/machine-code.md:
--------------------------------------------------------------------------------
 1 | # Machine Code
 2 | 
 3 | When you have a small piece of very hot code it may be worth inspecting the
 4 | generated machine code to see if it has any inefficiencies, such as removable
 5 | [bounds checks]. The [Compiler Explorer] website is an excellent resource when
 6 | doing this on small snippets. [`cargo-show-asm`] is an alternative tool that
 7 | can be used on full Rust projects.
 8 | 
 9 | [bounds checks]: bounds-checks.md
10 | [Compiler Explorer]: https://godbolt.org/
11 | [`cargo-show-asm`]: https://github.com/pacak/cargo-show-asm
12 | 
13 | Relatedly, the [`core::arch`] module provides access to architecture-specific
14 | intrinsics, many of which relate to SIMD instructions.
15 | 
16 | [`core::arch`]: https://doc.rust-lang.org/core/arch/index.html
17 | 


--------------------------------------------------------------------------------
/src/machine-code_zh.md:
--------------------------------------------------------------------------------
 1 | # Machine Code
 2 | 
 3 | 当你有一小段非常频繁执行的热点代码时，值得检查生成的机器代码，看看是否存在一些低效之处，比如可移除的[边界检查]。在处理小片段时，[Compiler Explorer] 网站是一个很好的资源。而 [`cargo-show-asm`] 则是另一个工具，可以用于完整的 Rust 项目。
 4 | 
 5 | [边界检查]: bounds-checks_zh.md
 6 | [Compiler Explorer]: https://godbolt.org/
 7 | [`cargo-show-asm`]: https://github.com/pacak/cargo-show-asm
 8 | 
 9 | 与此相关的是，[`core::arch`]模块提供了对特定架构的内部函数（intrinsics）的访问，其中许多与SIMD指令有关。
10 | 
11 | [`core::arch`]: https://doc.rust-lang.org/core/arch/index.html
12 | 
13 | 
14 | 


--------------------------------------------------------------------------------
/src/parallelism.md:
--------------------------------------------------------------------------------
 1 | # Parallelism
 2 | 
 3 | Rust provides excellent support for safe parallel programming, which can lead
 4 | to large performance improvements. There are a variety of ways to introduce
 5 | parallelism into a program and the best way for any program will depend greatly
 6 | on its design. 
 7 | 
 8 | An in-depth treatment of parallelism is beyond the scope of this book. If you
 9 | are interested in this topic, the documentation for the [`rayon`] and
10 | [`crossbeam`] crates is a good place to start.
11 | 
12 | [`rayon`]: https://crates.io/crates/rayon
13 | [`crossbeam`]: https://crates.io/crates/crossbeam
14 | 
15 | 


--------------------------------------------------------------------------------
/src/parallelism_zh.md:
--------------------------------------------------------------------------------
 1 | # Parallelism
 2 | 
 3 | Rust为安全的并行编程提供了很好的支持，这可以带来很大的性能提升。有多种方法可以将并行性引入到程序中，对于任何程序来说，最好的方式在很大程度上取决于其设计。
 4 | 
 5 | 对并行性的深入研究超出了本书的范围。如果你对这一主题感兴趣，[`rayon`]和[`crossbeam`] crate的文档是一个很好的开始。
 6 | 
 7 | [`rayon`]: https://crates.io/crates/rayon
 8 | [`crossbeam`]: https://crates.io/crates/crossbeam
 9 | 
10 | 


--------------------------------------------------------------------------------
/src/profiling.md:
--------------------------------------------------------------------------------
  1 | # Profiling
  2 | 
  3 | When optimizing a program, you also need a way to determine which parts of the
  4 | program are "hot" (executed frequently enough to affect runtime) and worth
  5 | modifying. This is best done via profiling.
  6 | 
  7 | ## Profilers
  8 | 
  9 | There are many different profilers available, each with their strengths and
 10 | weaknesses. The following is an incomplete list of profilers that have been
 11 | used successfully on Rust programs.
 12 | - [perf] is a general-purpose profiler that uses hardware performance counters.
 13 |   [Hotspot] and [Firefox Profiler] are good for viewing data recorded by perf.
 14 |   It works on Linux.
 15 | - [Instruments] is a general-purpose profiler that comes with Xcode on macOS.
 16 | - [Intel VTune Profiler] is a general-purpose profiler. It works on Windows,
 17 |   Linux, and macOS.
 18 | - [AMD μProf] is a general-purpose profiler. It works on Windows and Linux.
 19 | - [samply] is a sampling profiler that produces profiles that can be viewed
 20 |   in the Firefox Profiler. It works on macOS and Linux.
 21 | - [flamegraph] is a Cargo command that uses perf/DTrace to profile your
 22 |   code and then displays the results in a flame graph. It works on Linux and
 23 |   all platforms that support DTrace (macOS, FreeBSD, NetBSD, and possibly
 24 |   Windows).
 25 | - [Cachegrind] & [Callgrind] give global, per-function, and per-source-line
 26 |   instruction counts and simulated cache and branch prediction data. They work
 27 |   on Linux and some other Unixes.
 28 | - [DHAT] is good for finding which parts of the code are causing a lot of
 29 |   allocations, and for giving insight into peak memory usage. It can also be
 30 |   used to identify hot calls to `memcpy`. It works on Linux and some other
 31 |   Unixes. [dhat-rs] is an experimental alternative that is a little less
 32 |   powerful and requires minor changes to your Rust program, but works on all
 33 |   platforms.
 34 | - [heaptrack] and [bytehound] are heap profiling tools. They work on Linux.
 35 | - [`counts`] supports ad hoc profiling, which combines the use of `eprintln!`
 36 |   statements with frequency-based post-processing, which is good for getting
 37 |   domain-specific insights into parts of your code. It works on all platforms.
 38 | - [Coz] performs *causal profiling* to measure optimization potential, and has
 39 |   Rust support via [coz-rs]. It works on Linux. 
 40 | 
 41 | [perf]: https://perf.wiki.kernel.org/index.php/Main_Page
 42 | [Hotspot]: https://github.com/KDAB/hotspot
 43 | [Firefox Profiler]: https://profiler.firefox.com/
 44 | [Instruments]: https://developer.apple.com/forums/tags/instruments
 45 | [Intel VTune Profiler]: https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html
 46 | [AMD μProf]: https://developer.amd.com/amd-uprof/
 47 | [samply]: https://github.com/mstange/samply/
 48 | [flamegraph]: https://github.com/flamegraph-rs/flamegraph
 49 | [Cachegrind]: https://www.valgrind.org/docs/manual/cg-manual.html
 50 | [Callgrind]: https://www.valgrind.org/docs/manual/cl-manual.html
 51 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html
 52 | [dhat-rs]: https://github.com/nnethercote/dhat-rs/
 53 | [heaptrack]: https://github.com/KDE/heaptrack
 54 | [bytehound]: https://github.com/koute/bytehound
 55 | [`counts`]: https://github.com/nnethercote/counts/
 56 | [Coz]: https://github.com/plasma-umass/coz
 57 | [coz-rs]: https://github.com/plasma-umass/coz/tree/master/rust
 58 | 
 59 | ## Debug Info
 60 | 
 61 | To profile a release build effectively you might need to enable source line
 62 | debug info. To do this, add the following lines to your `Cargo.toml` file:
 63 | ```toml
 64 | [profile.release]
 65 | debug = 1
 66 | ```
 67 | See the [Cargo documentation] for more details about the `debug` setting.
 68 | 
 69 | [Cargo documentation]: https://doc.rust-lang.org/cargo/reference/profiles.html#debug
 70 | 
 71 | Unfortunately, even after doing the above step you won't get detailed profiling
 72 | information for standard library code. This is because shipped versions of the
 73 | Rust standard library are not built with debug info.
 74 | 
 75 | The most reliable way around this is to build your own version of the compiler
 76 | and standard library, following [these instructions], and adding the following
 77 | lines to the `config.toml` file:
 78 | ```toml
 79 | [rust]
 80 | debuginfo-level = 1
 81 | ```
 82 | This is a hassle, but may be worth the effort in some cases.
 83 | 
 84 | [these instructions]: https://github.com/rust-lang/rust
 85 | 
 86 | Alternatively, the unstable [build-std] feature lets you compile the standard
 87 | library as part of your program's normal compilation, with the same build
 88 | configuration. However, filenames present in the debug info for the standard
 89 | library will not point to source code files, because this feature does not also
 90 | download standard library source code. So this approach will not help with
 91 | profilers such as Cachegrind and Samply that require source code to work fully.
 92 | 
 93 | [build-std]: https://doc.rust-lang.org/cargo/reference/unstable.html#build-std
 94 | 
 95 | ## Symbol Demangling
 96 | 
 97 | Rust uses a form of name mangling to encode function names in compiled code. If
 98 | a profiler is unaware of this, its output may contain symbol names beginning
 99 | with `_ZN` or `_R`, such as `_ZN3foo3barE` or
100 | `_ZN28_$u7b$$u7b$closure$u7d$$u7d$E` or
101 | `_RMCsno73SFvQKx_1cINtB0_3StrKRe616263_E`.
102 | 
103 | Names like these can be manually demangled using [`rustfilt`].
104 | 
105 | [`rustfilt`]: https://crates.io/crates/rustfilt
106 | 
107 | If you are having trouble with symbol demangling while profiling, it may be
108 | worth changing the [mangling format] from the default legacy format to the newer
109 | v0 format.
110 | 
111 | [mangling format]: https://doc.rust-lang.org/rustc/codegen-options/index.html#symbol-mangling-version
112 | 
113 | To use the v0 format from the command line, use the `-C
114 | symbol-mangling-version=v0` flag. For example:
115 | ```bash
116 | RUSTFLAGS="-C symbol-mangling-version=v0" cargo build --release
117 | ```
118 | 
119 | Alternatively, to set this flag from a [`config.toml`] file (for one or more
120 | projects), add these lines:
121 | ```toml
122 | [build]
123 | rustflags = ["-C", "symbol-mangling-version=v0"]
124 | ```
125 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html
126 | 
127 | 


--------------------------------------------------------------------------------
/src/profiling_zh.md:
--------------------------------------------------------------------------------
 1 | # Profiling
 2 | 
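 3 | When optimizing a program, you also need a way to determine which parts of the program are "hot" (executed frequently enough to affect runtime) and worth modifying. This is best done via profiling.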
 4 | 
 5 | ## Profilers
 6 | 
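 7 | There are many different profilers available, each with their strengths and weaknesses. The following is an incomplete list of profilers that have been used successfully on Rust programs.
 8 | - [perf] is a general-purpose profiler that uses hardware performance counters. [Hotspot] and [Firefox Profiler] are good for viewing data recorded by perf. It works on Linux.
 9 | - [Instruments] is a general-purpose profiler that comes with Xcode on macOS.
10 | - [Intel VTune Profiler] is a general-purpose profiler. It works on Windows, Linux, and macOS.
11 | - [AMD μProf] is a general-purpose profiler. It works on Windows and Linux.
12 | - [samply] is a sampling profiler that produces profiles that can be viewed in the Firefox Profiler. It works on macOS and Linux.
13 | - [flamegraph] is a Cargo command that uses perf/DTrace to profile your code and then displays the results in a flame graph. It works on Linux and all platforms that support DTrace (macOS, FreeBSD, NetBSD, and possibly Windows).
14 | - [Cachegrind] and [Callgrind] give global, per-function, and per-source-line instruction counts and simulated cache and branch prediction data. They work on Linux and some other Unixes.
15 | - [DHAT] is good for finding which parts of the code are causing a lot of allocations, and for giving insight into peak memory usage. It can also be used to identify hot calls to `memcpy`. It works on Linux and some other Unixes. [dhat-rs] is an experimental alternative that is a little less powerful and requires minor changes to your Rust program, but works on all platforms.
16 | - [heaptrack] and [bytehound] are heap profiling tools. They work on Linux.
17 | - [`counts`] supports ad hoc profiling, which combines the use of `eprintln!` statements with frequency-based post-processing, which is good for getting domain-specific insights into parts of your code. It works on all platforms.
18 | - [Coz] performs *causal profiling* to measure optimization potential, and has Rust support via [coz-rs]. It works on Linux.
19 | 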
20 | [perf]: https://perf.wiki.kernel.org/index.php/Main_Page
21 | [Hotspot]: https://github.com/KDAB/hotspot
22 | [Firefox Profiler]: https://profiler.firefox.com/
23 | [Cachegrind]: https://www.valgrind.org/docs/manual/cg-manual.html
24 | [Callgrind]: https://www.valgrind.org/docs/manual/cl-manual.html
25 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html
26 | [heaptrack]: https://github.com/KDE/heaptrack
27 | [`counts`]: https://github.com/nnethercote/counts/
28 | [Coz]: https://github.com/plasma-umass/coz
29 | [coz-rs]: https://github.com/plasma-umass/coz/tree/master/rust
30 | [flamegraph]: https://github.com/flamegraph-rs/flamegraph
31 | 
32 | To profile a release build effectively you might need to enable source line debug info. To do this, add the following lines to your `Cargo.toml` file:
33 | ```toml
34 | [profile.release]
35 | debug = 1
36 | ```
37 | See the [Cargo documentation] for more details about the `debug` setting.
38 | 
39 | [Cargo documentation]: https://doc.rust-lang.org/cargo/reference/profiles.html#debug
40 | 
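41 | Unfortunately, even after doing the above step you won't get detailed profiling information for standard library code. This is because shipped versions of the Rust standard library are not built with debug info.
42 | 
43 | The most reliable way around this is to build your own version of the compiler and standard library, following [these instructions], and adding the following lines to the `config.toml` file:
44 | ```toml
45 | [rust]
46 | debuginfo-level = 1
47 | ```
48 | This is a hassle, but may be worth the effort in some cases.
49 | 
50 | [these instructions]: https://github.com/rust-lang/rust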
51 | 
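52 | Alternatively, the unstable [build-std] feature lets you compile the standard library as part of your program's normal compilation, with the same build configuration. However, filenames present in the debug info for the standard library will not point to source code files, because this feature does not also download standard library source code. So this approach will not help with profilers such as Cachegrind and Samply that require source code to work fully.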
53 | 
54 | [build-std]: https://doc.rust-lang.org/cargo/reference/unstable.html#build-std
55 | 
56 | ## Symbol Demangling
57 | 
58 | Rust uses a form of name mangling to encode function names in compiled code. If a profiler is unaware of this, its output may contain symbol names such as
59 | `_ZN3foo3barE` or `_ZN28_$u7b$$u7b$closure$u7d$$u7d$E` or `_ZN28_$u7b$$$u7d$E` or
60 | `_ZN88_$LT$core.result.Result$LT$$u21$$C$$u20$E$GT$u20$as$u20$std.process.Termination$GT$6report17hfc41d0da4a40b3e8E`.
61 | Names like these can be manually demangled using [`rustfilt`].
62 | 
63 | [`rustfilt`]: https://crates.io/crates/rustfilt
64 | 
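65 | If you are having trouble with symbol demangling while profiling, it may be worth changing the [mangling format] from the default legacy format to the newer v0 format.
66 | 
67 | [mangling format]: https://doc.rust-lang.org/rustc/codegen-options/index.html#symbol-mangling-version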
68 | 
69 | To use the v0 format from the command line, use the `-C symbol-mangling-version=v0` flag. For example:
70 | ```bash
71 | RUSTFLAGS="-C symbol-mangling-version=v0" cargo build --release
72 | ```
73 | 
74 | Alternatively, to set this flag from a [`config.toml`] file (for one or more projects), add these lines:
75 | ```toml
76 | [build]
77 | rustflags = ["-C", "symbol-mangling-version=v0"]
78 | ```
79 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html


--------------------------------------------------------------------------------
/src/standard-library-types.md:
--------------------------------------------------------------------------------
  1 | # Standard Library Types
  2 | 
  3 | It is worth reading through the documentation for common standard library
  4 | types, such as [`Box`], [`Vec`], [`Option`], [`Result`], and [`Rc`]/[`Arc`], to
  5 | find interesting functions that can sometimes be used to improve performance.
  6 | 
  7 | [`Box`]: https://doc.rust-lang.org/std/boxed/struct.Box.html
  8 | [`Vec`]: https://doc.rust-lang.org/std/vec/struct.Vec.html
  9 | [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html
 10 | [`Result`]: https://doc.rust-lang.org/std/result/enum.Result.html
 11 | [`Rc`]: https://doc.rust-lang.org/std/rc/struct.Rc.html
 12 | [`Arc`]: https://doc.rust-lang.org/std/sync/struct.Arc.html
 13 | 
 14 | It is also worth knowing about high-performance alternatives to standard
 15 | library types such as [`Mutex`], [`RwLock`], [`Condvar`], and
 16 | [`Once`].
 17 | 
 18 | [`Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html
 19 | [`RwLock`]: https://doc.rust-lang.org/std/sync/struct.RwLock.html
 20 | [`Condvar`]: https://doc.rust-lang.org/std/sync/struct.Condvar.html
 21 | [`Once`]: https://doc.rust-lang.org/std/sync/struct.Once.html
 22 | 
 23 | ## `Box`
 24 | 
 25 | The expression [`Box::default()`] has the same effect as
 26 | `Box::new(T::default())` but can be faster because the compiler can create the
 27 | value directly on the heap, rather than constructing it on the stack and then
 28 | copying it over.
 29 | [**Example**](https://github.com/komora-io/art/commit/d5dc58338f475709c375e15976d0d77eb5d7f7ef).
 30 | 
 31 | [`Box::default()`]: https://doc.rust-lang.org/std/boxed/struct.Box.html#method.default
 32 | 
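The two forms described above can be sketched side by side. The `Buffer` type
and both function names are hypothetical, chosen only to give `Box` something
large to hold. Whether the second form is measurably faster depends on the
type and on what the optimizer does, so treat this as an illustration rather
than a guarantee.

```rust
/// A hypothetical type that is large enough for a stack copy to matter.
struct Buffer {
    bytes: [u8; 4096],
    len: usize,
}

impl Default for Buffer {
    fn default() -> Self {
        Buffer { bytes: [0; 4096], len: 0 }
    }
}

/// Builds the value first, then moves it into the heap allocation.
fn boxed_via_new() -> Box<Buffer> {
    Box::new(Buffer::default())
}

/// Asks `Box` for a default value directly.
fn boxed_via_default() -> Box<Buffer> {
    Box::default()
}
```

Both functions return an equivalent `Box<Buffer>`, so switching between them
is a local change that is easy to benchmark.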
 33 | ## `Vec`
 34 | 
 35 | The best way to create a zero-filled `Vec` of length `n` is with `vec![0; n]`.
 36 | This is simple and probably [as fast as or faster than] alternatives, such as
 37 | using `resize`, `extend`, or anything involving `unsafe`, because it can use OS
 38 | assistance.
 39 | 
 40 | [as fast as or faster than]: https://github.com/rust-lang/rust/issues/54628
 41 | 
 42 | [`Vec::remove`] removes an element at a particular index and shifts all
 43 | subsequent elements one to the left, which makes it O(n). [`Vec::swap_remove`]
 44 | replaces an element at a particular index with the final element, which does
 45 | not preserve ordering, but is O(1).
 46 | 
 47 | [`Vec::retain`] efficiently removes multiple items from a `Vec`. There are
 48 | equivalent methods for other collection types such as `String`, `HashSet`, and
 49 | `HashMap`.
 50 | 
 51 | [`Vec::remove`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.remove
 52 | [`Vec::swap_remove`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.swap_remove
 53 | [`Vec::retain`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain
 54 | 
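The difference between the three removal methods above is easiest to see on a
small vector. The function name `removal_examples` is made up for this sketch,
and the values are arbitrary.

```rust
/// Applies `remove`, `swap_remove`, and `retain` to copies of the same
/// vector and returns the three results.
fn removal_examples() -> (Vec<u32>, Vec<u32>, Vec<u32>) {
    // `remove` shifts every later element left, preserving order: O(n).
    let mut ordered = vec![10, 20, 30, 40, 50];
    ordered.remove(1);

    // `swap_remove` moves the last element into the hole: O(1), but the
    // original ordering is lost.
    let mut unordered = vec![10, 20, 30, 40, 50];
    unordered.swap_remove(1);

    // `retain` removes every element that fails the predicate in one pass.
    let mut filtered = vec![10, 20, 30, 40, 50];
    filtered.retain(|&x| x % 20 != 0);

    (ordered, unordered, filtered)
}
```

Here `ordered` ends up as `[10, 30, 40, 50]`, `unordered` as
`[10, 50, 30, 40]`, and `filtered` as `[10, 30, 50]`.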
 55 | ## `Option` and `Result`
 56 | 
 57 | [`Option::ok_or`] converts an `Option` into a `Result`, and is passed an `err`
 58 | parameter that is used if the `Option` value is `None`. `err` is computed
 59 | eagerly. If its computation is expensive, you should instead use
 60 | [`Option::ok_or_else`], which computes the error value lazily via a closure.
 61 | For example, this:
 62 | ```rust
 63 | # fn expensive() {}
 64 | # let o: Option<u32> = None;
 65 | let r = o.ok_or(expensive()); // always evaluates `expensive()`
 66 | ```
 67 | should be changed to this:
 68 | ```rust
 69 | # fn expensive() {}
 70 | # let o: Option<u32> = None;
 71 | let r = o.ok_or_else(|| expensive()); // evaluates `expensive()` only when needed
 72 | ```
 73 | [**Example**](https://github.com/rust-lang/rust/pull/50051/commits/5070dea2366104fb0b5c344ce7f2a5cf8af176b0).
 74 | 
 75 | [`Option::ok_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.ok_or
 76 | [`Option::ok_or_else`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.ok_or_else
 77 | 
 78 | There are similar alternatives for [`Option::map_or`], [`Option::unwrap_or`],
 79 | [`Result::or`], [`Result::map_or`], and [`Result::unwrap_or`].
 80 | 
 81 | [`Option::map_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.map_or
 82 | [`Option::unwrap_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.unwrap_or
 83 | [`Result::or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.or
 84 | [`Result::map_or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.map_or
 85 | [`Result::unwrap_or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.unwrap_or
 86 | 
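The lazy counterparts are `Option::map_or_else`, `Option::unwrap_or_else`,
`Result::or_else`, `Result::map_or_else`, and `Result::unwrap_or_else`. The
sketch below counts how often a fallback runs for the eager and lazy forms of
`unwrap_or`. The helper name `fallback_call_counts` and the counting closure
are invented for illustration.

```rust
use std::cell::Cell;

/// Returns how many times the fallback ran for the eager form and for the
/// lazy form, in that order. This is a hypothetical helper.
fn fallback_call_counts() -> (u32, u32) {
    let calls = Cell::new(0u32);
    // Stand-in for an expensive computation; it records each invocation.
    let expensive = || {
        calls.set(calls.get() + 1);
        0u32
    };
    let present: Option<u32> = Some(7);

    // Eager: `expensive()` runs before `unwrap_or` is even called.
    let a = present.unwrap_or(expensive());
    let eager_calls = calls.get();

    calls.set(0);
    // Lazy: the closure is only invoked if the `Option` is `None`.
    let b = present.unwrap_or_else(expensive);
    let lazy_calls = calls.get();

    assert_eq!((a, b), (7, 7));
    (eager_calls, lazy_calls)
}
```

Because the `Option` is `Some`, the eager form runs the fallback once and the
lazy form never runs it.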
 87 | ## `Rc`/`Arc`
 88 | 
 89 | [`Rc::make_mut`]/[`Arc::make_mut`] provide clone-on-write semantics. They
 90 | return a mutable reference to the value inside an `Rc`/`Arc`. If the refcount
 91 | is greater than one, they first `clone` the inner value to ensure unique
 92 | ownership; otherwise, the reference points to the original value. They are not
 93 | needed often, but they can be extremely useful on occasion.
 94 | [**Example 1**](https://github.com/rust-lang/rust/pull/65198/commits/3832a634d3aa6a7c60448906e6656a22f7e35628),
 95 | [**Example 2**](https://github.com/rust-lang/rust/pull/65198/commits/75e0078a1703448a19e25eac85daaa5a4e6e68ac).
 96 | 
 97 | [`Rc::make_mut`]: https://doc.rust-lang.org/std/rc/struct.Rc.html#method.make_mut
 98 | [`Arc::make_mut`]: https://doc.rust-lang.org/std/sync/struct.Arc.html#method.make_mut
 99 | 
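A minimal sketch of the clone-on-write behavior, using `Rc` (the `Arc`
version has the same shape). The function name `make_mut_example` is
hypothetical.

```rust
use std::rc::Rc;

/// Returns the contents of two handles after clone-on-write, plus whether
/// they still share one allocation.
fn make_mut_example() -> (Vec<i32>, Vec<i32>, bool) {
    let mut a = Rc::new(vec![1, 2, 3]);

    // The refcount is one, so this mutates the value in place.
    Rc::make_mut(&mut a).push(4);

    // A second handle bumps the refcount to two.
    let b = Rc::clone(&a);

    // The value is now shared, so `make_mut` clones it first. After this
    // call `a` points to its own copy and `b` keeps the old contents.
    Rc::make_mut(&mut a).push(5);

    let still_shared = Rc::ptr_eq(&a, &b);
    ((*a).clone(), (*b).clone(), still_shared)
}
```

The first `push` costs no clone, while the second pays for one copy of the
vector, which is the trade-off clone-on-write is meant to offer.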
100 | ## `Mutex`, `RwLock`, `Condvar`, and `Once`
101 | 
102 | The [`parking_lot`] crate provides alternative implementations of these
103 | synchronization types. The APIs and semantics of the `parking_lot` types are
104 | similar but not identical to those of the equivalent types in the standard
105 | library.
106 | 
107 | The `parking_lot` versions used to be reliably smaller, faster, and more
108 | flexible than those in the standard library, but the standard library versions
109 | have greatly improved on some platforms. So you should measure before switching
110 | to `parking_lot`. 
111 | 
112 | [`parking_lot`]: https://crates.io/crates/parking_lot
113 | 
114 | If you decide to universally use the `parking_lot` types, it is easy to
115 | accidentally use the standard library equivalents in some places. You can [use
116 | Clippy] to avoid this problem.
117 | 
118 | [use Clippy]: linting.md#disallowing-types
119 | 


--------------------------------------------------------------------------------
/src/standard-library-types_zh.md:
--------------------------------------------------------------------------------
 1 | # Standard Library Types
 2 | 
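 3 | It is worth reading through the documentation for common standard library types, such as [`Vec`], [`Option`], [`Result`], and [`Rc`], to find interesting functions that can sometimes be used to improve performance.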
 4 | 
 5 | [`Vec`]: https://doc.rust-lang.org/std/vec/struct.Vec.html
 6 | [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html
 7 | [`Result`]: https://doc.rust-lang.org/std/result/enum.Result.html
 8 | [`Rc`]: https://doc.rust-lang.org/std/rc/struct.Rc.html
 9 | 
10 | It is also worth knowing about high-performance alternatives to standard library types such as [`Mutex`], [`RwLock`], [`Condvar`], and [`Once`].
11 | 
12 | [`Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html
13 | [`RwLock`]: https://doc.rust-lang.org/std/sync/struct.RwLock.html
14 | [`Condvar`]: https://doc.rust-lang.org/std/sync/struct.Condvar.html
15 | [`Once`]: https://doc.rust-lang.org/std/sync/struct.Once.html
16 | 
17 | ## `Box`
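18 | The expression [`Box::default()`] has the same effect as `Box::new(T::default())` but can be faster because the compiler can create the value directly on the heap, rather than constructing it on the stack and then copying it over.
19 | [**Example**](https://github.com/komora-io/art/commit/d5dc58338f475709c375e15976d0d77eb5d7f7ef).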
20 | 
21 | [`Box::default()`]: https://doc.rust-lang.org/std/boxed/struct.Box.html#method.default
22 | 
23 | ## `Vec`
24 | 
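25 | The best way to create a zero-filled `Vec` of length `n` is with `vec![0; n]`. This is simple and probably as fast as or faster than alternatives, such as using `resize`, `extend`, or anything involving `unsafe`, because it can use OS assistance.
26 | 
27 | [`Vec::remove`] removes an element at a particular index and shifts all subsequent elements one to the left, which makes it O(n). [`Vec::swap_remove`] replaces an element at a particular index with the final element, which does not preserve ordering, but is O(1).
28 | 
29 | [`Vec::retain`] efficiently removes multiple items from a `Vec`. There are equivalent methods for other collection types such as `String`, `HashSet`, and `HashMap`.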
30 | 
31 | [`Vec::remove`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.remove
32 | [`Vec::swap_remove`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.swap_remove
33 | [`Vec::retain`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain
34 | 
35 | ## `Option` and `Result`
36 | 
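37 | [`Option::ok_or`] converts an `Option` into a `Result`, and is passed an `err` parameter that is used if the `Option` value is `None`. `err` is computed eagerly. If its computation is expensive, you should instead use [`Option::ok_or_else`], which computes the error value lazily via a closure.
38 | For example, this: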
39 | ```rust
40 | # fn expensive() {}
41 | # let o: Option<u32> = None;
42 | let r = o.ok_or(expensive()); // always evaluates `expensive()`
43 | ```
44 | should be changed to this:
45 | ```rust
46 | # fn expensive() {}
47 | # let o: Option<u32> = None;
48 | let r = o.ok_or_else(|| expensive()); // evaluates `expensive()` only when needed
49 | ```
50 | [**Example**](https://github.com/rust-lang/rust/pull/50051/commits/5070dea2366104fb0b5c344ce7f2a5cf8af176b0).
51 | 
52 | [`Option::ok_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.ok_or
53 | [`Option::ok_or_else`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.ok_or_else
54 | 
55 | There are similar alternatives for [`Option::map_or`], [`Option::unwrap_or`], [`Result::or`], [`Result::map_or`], and [`Result::unwrap_or`].
56 | 
57 | [`Option::map_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.map_or
58 | [`Option::unwrap_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.unwrap_or
59 | [`Result::or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.or
60 | [`Result::map_or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.map_or
61 | [`Result::unwrap_or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.unwrap_or
62 | 
63 | ## `Rc`/`Arc`
64 | 
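65 | [`Rc::make_mut`]/[`Arc::make_mut`] provide clone-on-write semantics. They return a mutable reference to the value inside an `Rc`/`Arc`. If the refcount is greater than one, they first `clone` the inner value to ensure unique ownership; otherwise, the reference points to the original value. They are not needed often, but they can be extremely useful on occasion.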
66 | [**Example 1**](https://github.com/rust-lang/rust/pull/65198/commits/3832a634d3aa6a7c60448906e6656a22f7e35628),
67 | [**Example 2**](https://github.com/rust-lang/rust/pull/65198/commits/75e0078a1703448a19e25eac85daaa5a4e6e68ac).
68 | 
69 | [`Rc::make_mut`]: https://doc.rust-lang.org/std/rc/struct.Rc.html#method.make_mut
70 | [`Arc::make_mut`]: https://doc.rust-lang.org/std/sync/struct.Arc.html#method.make_mut
71 | 
72 | ## `Mutex`, `RwLock`, `Condvar`, and `Once`
73 | 
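74 | The [`parking_lot`] crate provides alternative implementations of these synchronization types. The APIs and semantics of the `parking_lot` types are similar but not identical to those of the equivalent types in the standard library.
75 | 
76 | The `parking_lot` versions used to be reliably smaller, faster, and more flexible than those in the standard library, but the standard library versions have greatly improved on some platforms. So you should measure before switching to `parking_lot`.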
77 | 
78 | [`parking_lot`]: https://crates.io/crates/parking_lot
79 | 
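80 | If you decide to universally use the `parking_lot` types, it is easy to accidentally use the standard library equivalents in some places. You can use [Clippy] to avoid this problem.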
81 | 
82 | [Clippy]: linting.md#disallowing-types
83 | 


--------------------------------------------------------------------------------
/src/title-page.md:
--------------------------------------------------------------------------------
1 | # <span style="font-size: 150%">The Rust Performance Book</span>
2 | 
3 | **<span style="font-size: 130%">First published in November 2020</span>**
4 | 
5 | **<span style="font-size: 130%">Written by Nicholas Nethercote and others</span>**
6 | 
7 | **Chinese translation by Blues-star**
8 | 


--------------------------------------------------------------------------------
/src/type-sizes.md:
--------------------------------------------------------------------------------
  1 | # Type Sizes
  2 | 
  3 | Shrinking oft-instantiated types can help performance.
  4 | 
  5 | For example, if memory usage is high, a heap profiler like [DHAT] can identify
  6 | the hot allocation points and the types involved. Shrinking these types can
  7 | reduce peak memory usage, and possibly improve performance by reducing memory
  8 | traffic and cache pressure.
  9 | 
 10 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html
 11 | 
 12 | Furthermore, Rust types that are larger than 128 bytes are copied with `memcpy`
 13 | rather than inline code. If `memcpy` shows up in non-trivial amounts in
 14 | profiles, DHAT's "copy profiling" mode will tell you exactly where the hot
 15 | `memcpy` calls are and the types involved. Shrinking these types to 128 bytes
 16 | or less can make the code faster by avoiding `memcpy` calls and reducing memory
 17 | traffic.
 18 | 
 19 | ## Measuring Type Sizes
 20 | 
 21 | [`std::mem::size_of`] gives the size of a type, in bytes, but often you want to
 22 | know the exact layout as well. For example, an enum might be surprisingly large
 23 | due to a single outsized variant.
 24 | 
 25 | [`std::mem::size_of`]: https://doc.rust-lang.org/std/mem/fn.size_of.html
 26 | 
 27 | The `-Zprint-type-sizes` option does exactly this. It isn’t enabled on release
 28 | versions of rustc, so you’ll need to use a nightly version of rustc. Here is
 29 | one possible invocation via Cargo:
 30 | ```text
 31 | RUSTFLAGS=-Zprint-type-sizes cargo +nightly build --release
 32 | ```
 33 | And here is a possible invocation of rustc:
 34 | ```text
 35 | rustc +nightly -Zprint-type-sizes input.rs
 36 | ```
 37 | It will print out details of the size, layout, and alignment of all types in
 38 | use. For example, for this type:
 39 | ```rust
 40 | enum E {
 41 |     A,
 42 |     B(i32),
 43 |     C(u64, u8, u64, u8),
 44 |     D(Vec<u32>),
 45 | }
 46 | ```
 47 | it prints the following, plus information about a few built-in types.
 48 | ```text
 49 | print-type-size type: `E`: 32 bytes, alignment: 8 bytes
 50 | print-type-size     discriminant: 1 bytes
 51 | print-type-size     variant `D`: 31 bytes
 52 | print-type-size         padding: 7 bytes
 53 | print-type-size         field `.0`: 24 bytes, alignment: 8 bytes
 54 | print-type-size     variant `C`: 23 bytes
 55 | print-type-size         field `.1`: 1 bytes
 56 | print-type-size         field `.3`: 1 bytes
 57 | print-type-size         padding: 5 bytes
 58 | print-type-size         field `.0`: 8 bytes, alignment: 8 bytes
 59 | print-type-size         field `.2`: 8 bytes
 60 | print-type-size     variant `B`: 7 bytes
 61 | print-type-size         padding: 3 bytes
 62 | print-type-size         field `.0`: 4 bytes, alignment: 4 bytes
 63 | print-type-size     variant `A`: 0 bytes
 64 | ```
 65 | The output shows the following.
 66 | - The size and alignment of the type.
 67 | - For enums, the size of the discriminant.
 68 | - For enums, the size of each variant (sorted from largest to smallest).
 69 | - The size, alignment, and ordering of all fields. (Note that the compiler has
 70 |   reordered variant `C`'s fields to minimize the size of `E`.)
 71 | - The size and location of all padding.
 72 | 
 73 | Alternatively, the [top-type-sizes] crate can be used to display the output in
 74 | a more compact form.
 75 | 
 76 | [top-type-sizes]: https://crates.io/crates/top-type-sizes
 77 | 
 78 | Once you know the layout of a hot type, there are multiple ways to shrink it.
 79 | 
 80 | ## Field Ordering
 81 | 
 82 | The Rust compiler automatically sorts the fields in structs and enums to
 83 | minimize their sizes (unless the `#[repr(C)]` attribute is specified), so you
 84 | do not have to worry about field ordering. But there are other ways to minimize
 85 | the size of hot types.
 86 | 
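The effect of automatic field reordering can be observed by comparing a
default-layout struct with a `#[repr(C)]` one. The struct and function names
below are hypothetical. The default layout is unspecified, so the sketch only
relies on the `#[repr(C)]` size, which follows from C layout rules on
platforms where `u32` is 4-byte aligned.

```rust
use std::mem::size_of;

/// Default layout: the compiler is free to reorder these fields.
struct Reordered {
    a: u8,
    b: u32,
    c: u8,
}

/// C layout: fields stay in declaration order, with padding after `a` and
/// after `c`.
#[repr(C)]
struct DeclaredOrder {
    a: u8,
    b: u32,
    c: u8,
}

/// Returns the sizes of the default-layout and `#[repr(C)]` structs.
fn layout_sizes() -> (usize, usize) {
    (size_of::<Reordered>(), size_of::<DeclaredOrder>())
}
```

With C layout the struct occupies 12 bytes. The default layout can place the
two `u8` fields together and is typically smaller.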
 87 | ## Smaller Enums
 88 | 
 89 | If an enum has an outsized variant, consider boxing one or more fields. For
 90 | example, you could change this type:
 91 | ```rust
 92 | type LargeType = [u8; 100];
 93 | enum A {
 94 |     X,
 95 |     Y(i32),
 96 |     Z(i32, LargeType),
 97 | }
 98 | ```
 99 | to this:
100 | ```rust
101 | # type LargeType = [u8; 100];
102 | enum A {
103 |     X,
104 |     Y(i32),
105 |     Z(Box<(i32, LargeType)>),
106 | }
107 | ```
108 | This reduces the type size at the cost of requiring an extra heap allocation
109 | for the `A::Z` variant. This is more likely to be a net performance win if the
110 | `A::Z` variant is relatively rare. The `Box` will also make `A::Z` slightly
111 | less ergonomic to use, especially in `match` patterns.
112 | [**Example 1**](https://github.com/rust-lang/rust/pull/37445/commits/a920e355ea837a950b484b5791051337cd371f5d),
113 | [**Example 2**](https://github.com/rust-lang/rust/pull/55346/commits/38d9277a77e982e49df07725b62b21c423b6428e),
114 | [**Example 3**](https://github.com/rust-lang/rust/pull/64302/commits/b972ac818c98373b6d045956b049dc34932c41be),
115 | [**Example 4**](https://github.com/rust-lang/rust/pull/64374/commits/2fcd870711ce267c79408ec631f7eba8e0afcdf6),
116 | [**Example 5**](https://github.com/rust-lang/rust/pull/64394/commits/7f0637da5144c7435e88ea3805021882f077d50c),
117 | [**Example 6**](https://github.com/rust-lang/rust/pull/71942/commits/27ae2f0d60d9201133e1f9ec7a04c05c8e55e665).
118 | 
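The size difference can be checked with `std::mem::size_of`. In this sketch
the two versions of the enum are renamed `Unboxed` and `Boxed` (names invented
here) so that both can live in one file. The exact sizes depend on the
platform and compiler version, so only the relative comparison is relied on.

```rust
use std::mem::size_of;

type LargeType = [u8; 100];

/// The original shape: every value is as large as the `Z` variant.
enum Unboxed {
    X,
    Y(i32),
    Z(i32, LargeType),
}

/// The boxed shape: `Z` holds only a pointer to its heap-allocated payload.
enum Boxed {
    X,
    Y(i32),
    Z(Box<(i32, LargeType)>),
}

/// Returns the sizes of the unboxed and boxed enums, in bytes.
fn enum_sizes() -> (usize, usize) {
    (size_of::<Unboxed>(), size_of::<Boxed>())
}
```

The unboxed enum must be at least 104 bytes to hold the `Z` payload, while
the boxed one needs little more than a pointer and a discriminant.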
119 | ## Smaller Integers
120 | 
121 | It is often possible to shrink types by using smaller integer types. For
122 | example, while it is most natural to use `usize` for indices, it is often
123 | reasonable to store indices as `u32`, `u16`, or even `u8`, and then cast to
124 | `usize` at use points.
125 | [**Example 1**](https://github.com/rust-lang/rust/pull/49993/commits/4d34bfd00a57f8a8bdb60ec3f908c5d4256f8a9a),
126 | [**Example 2**](https://github.com/rust-lang/rust/pull/50981/commits/8d0fad5d3832c6c1f14542ea0be038274e454524).
127 | 
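A minimal sketch of this pattern, assuming a hypothetical graph edge type
whose fields index into a slice of node names. All names here are invented
for illustration. Narrowing is only valid when every index fits in the
smaller type, which real code should check, for example with
`u32::try_from`.

```rust
use std::mem::size_of;

/// Hypothetical edge that stores node indices as `usize`.
struct WideEdge {
    from: usize,
    to: usize,
}

/// The same edge with indices narrowed to `u32`.
struct NarrowEdge {
    from: u32,
    to: u32,
}

/// Looks up the target node's name, widening the index at the use point.
fn target_name<'a>(names: &'a [&'a str], edge: &NarrowEdge) -> &'a str {
    names[edge.to as usize]
}

/// Returns the sizes of the wide and narrow edge types, in bytes.
fn edge_sizes() -> (usize, usize) {
    (size_of::<WideEdge>(), size_of::<NarrowEdge>())
}
```

On a 64-bit platform the narrow edge is half the size of the wide one, at
the cost of a cast wherever an index is used.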
128 | ## Boxed Slices
129 | 
130 | Rust vectors contain three words: a length, a capacity, and a pointer. If you
131 | have a vector that is unlikely to be changed in the future, you can convert it
132 | to a *boxed slice* with [`Vec::into_boxed_slice`]. A boxed slice contains only
133 | two words, a length and a pointer. Any excess element capacity is dropped,
134 | which may cause a reallocation.
135 | ```rust
136 | # use std::mem::{size_of, size_of_val};
137 | let v: Vec<u32> = vec![1, 2, 3];
138 | assert_eq!(size_of_val(&v), 3 * size_of::<usize>());
139 | 
140 | let bs: Box<[u32]> = v.into_boxed_slice();
141 | assert_eq!(size_of_val(&bs), 2 * size_of::<usize>());
142 | ```
143 | The boxed slice can be converted back to a vector with [`slice::into_vec`]
144 | without any cloning or reallocation.
145 | 
146 | [`Vec::into_boxed_slice`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.into_boxed_slice
147 | [`slice::into_vec`]: https://doc.rust-lang.org/std/primitive.slice.html#method.into_vec
148 | 
149 | ## `ThinVec`
150 | 
151 | An alternative to boxed slices is `ThinVec`, from the [`thin_vec`] crate. It is
152 | functionally equivalent to `Vec`, but stores the length and capacity in the
153 | same allocation as the elements (if there are any). This means that
154 | `size_of::<ThinVec<T>>` is only one word.
155 | 
156 | `ThinVec` is a good choice within oft-instantiated types for vectors that are
157 | often empty. It can also be used to shrink the largest variant of an enum, if
158 | that variant contains a `Vec`.
159 | 
160 | [`thin_vec`]: https://crates.io/crates/thin-vec
161 | 
162 | ## Avoiding Regressions
163 | 
164 | If a type is hot enough that its size can affect performance, it is a good idea
165 | to use a static assertion to ensure that it does not accidentally regress. The
166 | following example uses a macro from the [`static_assertions`] crate.
167 | ```rust,ignore
168 |   // This type is used a lot. Make sure it doesn't unintentionally get bigger.
169 |   #[cfg(target_arch = "x86_64")]
170 |   static_assertions::assert_eq_size!(HotType, [u8; 64]);
171 | ```
172 | The `cfg` attribute is important, because type sizes can vary on different
173 | platforms. Restricting the assertion to `x86_64` (which is typically the most
174 | widely-used platform) is likely to be good enough to prevent regressions in
175 | practice.
176 | 
177 | [`static_assertions`]: https://crates.io/crates/static_assertions
178 | 
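If adding a dependency is undesirable, a similar check can be sketched with a
`const` assertion from the standard library alone. The `HotType` fields below
are placeholders, and the expected size of 16 bytes is specific to this
made-up type.

```rust
use std::mem::size_of;

/// Hypothetical hot type; the fields are placeholders.
struct HotType {
    id: u64,
    flags: u32,
    kind: u8,
}

// Compilation fails on x86_64 if `HotType` ever stops being 16 bytes.
#[cfg(target_arch = "x86_64")]
const _: () = assert!(size_of::<HotType>() == 16);

/// Returns the size of `HotType` in bytes, for use in ordinary tests.
fn hot_type_size() -> usize {
    size_of::<HotType>()
}
```

The assertion is evaluated at compile time, so a size regression shows up as
a build error rather than a failing test.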


--------------------------------------------------------------------------------
/src/type-sizes_zh.md:
--------------------------------------------------------------------------------
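  1 | # Type Sizes
  2 | 
  3 | Shrinking oft-instantiated types can help performance.
  4 | 
  5 | For example, if memory usage is high, a heap profiler like [DHAT] can identify the hot allocation points and the types involved. Shrinking these types can reduce peak memory usage, and possibly improve performance by reducing memory traffic and cache pressure.
  6 | 
  7 | Furthermore, Rust types that are larger than 128 bytes are copied with `memcpy` rather than inline code. If `memcpy` shows up in non-trivial amounts in profiles, DHAT's "copy profiling" mode will tell you exactly where the hot `memcpy` calls are and the types involved. Shrinking these types to 128 bytes or less can make the code faster by avoiding `memcpy` calls and reducing memory traffic.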
  8 | 
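  9 | ## Measuring Type Sizes
 10 | 
 11 | [`std::mem::size_of`] gives the size of a type, in bytes, but often you want to know the exact layout as well. For example, an enum might be surprisingly large due to a single outsized variant.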
 12 | 
 13 | [`std::mem::size_of`]: https://doc.rust-lang.org/std/mem/fn.size_of.html
 14 | 
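 15 | The `-Zprint-type-sizes` option does exactly this. It isn't enabled on release versions of rustc, so you'll need to use a nightly version of `rustc`. Here is one possible invocation via Cargo: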
 16 | ```text
 17 | RUSTFLAGS=-Zprint-type-sizes cargo +nightly build --release
 18 | ```
 19 | And here is a possible invocation of `rustc`:
 20 | ```text
 21 | rustc +nightly -Zprint-type-sizes input.rs
 22 | ```
 23 | It will print out details of the size, layout, and alignment of all types in use. For example, for this type:
 24 | ```rust
 25 | enum E {
 26 |     A,
 27 |     B(i32),
 28 |     C(u64, u8, u64, u8),
 29 |     D(Vec<u32>),
 30 | }
 31 | ```
 32 | it prints the following, plus information about a few built-in types.
 33 | ```text
 34 | print-type-size type: `E`: 32 bytes, alignment: 8 bytes
 35 | print-type-size     discriminant: 1 bytes
 36 | print-type-size     variant `D`: 31 bytes
 37 | print-type-size         padding: 7 bytes
 38 | print-type-size         field `.0`: 24 bytes, alignment: 8 bytes
 39 | print-type-size     variant `C`: 23 bytes
 40 | print-type-size         field `.1`: 1 bytes
 41 | print-type-size         field `.3`: 1 bytes
 42 | print-type-size         padding: 5 bytes
 43 | print-type-size         field `.0`: 8 bytes, alignment: 8 bytes
 44 | print-type-size         field `.2`: 8 bytes
 45 | print-type-size     variant `B`: 7 bytes
 46 | print-type-size         padding: 3 bytes
 47 | print-type-size         field `.0`: 4 bytes, alignment: 4 bytes
 48 | print-type-size     variant `A`: 0 bytes
 49 | ```
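 50 | The output shows the following.
 51 | - The size and alignment of the type.
 52 | - For enums, the size of the discriminant.
 53 | - For enums, the size of each variant (sorted from largest to smallest).
 54 | - The size, alignment, and ordering of all fields. (Note that the compiler has reordered variant `C`'s fields to minimize the size of `E`.)
 55 | - The size and location of all padding.
 56 | 
 57 | Alternatively, the [top-type-sizes] crate can be used to display the output in a more compact form.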
 58 | 
 59 | [top-type-sizes]: https://crates.io/crates/top-type-sizes
 60 | 
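 61 | Once you know the layout of a hot type, there are multiple ways to shrink it.
 62 | 
 63 | ## Field Ordering
 64 | 
 65 | The Rust compiler automatically reorders the fields of structs and enums to minimize their sizes (unless the `#[repr(C)]` attribute is specified), so you do not have to worry about field ordering. But there are other ways to minimize the size of hot types.
 66 | 
 67 | ## Smaller Enums
 68 | 
 69 | If an enum has an outsized variant, consider boxing one or more fields. For example, you could change this type: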
 70 | ```rust
 71 | type LargeType = [u8; 100];
 72 | enum A {
 73 |     X,
 74 |     Y(i32),
 75 |     Z(i32, LargeType),
 76 | }
 77 | ```
 78 | to this:
 79 | ```rust
 80 | # type LargeType = [u8; 100];
 81 | enum A {
 82 |     X,
 83 |     Y(i32),
 84 |     Z(Box<(i32, LargeType)>),
 85 | }
 86 | ```
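 87 | This reduces the type size at the cost of requiring an extra heap allocation for the `A::Z` variant. This is more likely to be a net performance win if the `A::Z` variant is relatively rare. The `Box` will also make `A::Z` slightly less ergonomic to use, especially in `match` patterns.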
 88 | [**Example 1**](https://github.com/rust-lang/rust/pull/37445/commits/a920e355ea837a950b484b5791051337cd371f5d),
 89 | [**Example 2**](https://github.com/rust-lang/rust/pull/55346/commits/38d9277a77e982e49df07725b62b21c423b6428e),
 90 | [**Example 3**](https://github.com/rust-lang/rust/pull/64302/commits/b972ac818c98373b6d045956b049dc34932c41be),
 91 | [**Example 4**](https://github.com/rust-lang/rust/pull/64374/commits/2fcd870711ce267c79408ec631f7eba8e0afcdf6),
 92 | [**Example 5**](https://github.com/rust-lang/rust/pull/64394/commits/7f0637da5144c7435e88ea3805021882f077d50c),
 93 | [**Example 6**](https://github.com/rust-lang/rust/pull/71942/commits/27ae2f0d60d9201133e1f9ec7a04c05c8e55e665).
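
The effect of boxing can be checked with `size_of`. The sketch below defines both forms side by side under the hypothetical names `Unboxed` and `Boxed` (the text above calls both `A`). Exact sizes depend on the target, so only relative sizes are asserted.

```rust
use std::mem::size_of;

type LargeType = [u8; 100];

// Original form: every value is as large as the biggest variant.
#[allow(dead_code)]
enum Unboxed {
    X,
    Y(i32),
    Z(i32, LargeType),
}

// Boxed form: the large payload lives on the heap behind a pointer.
#[allow(dead_code)]
enum Boxed {
    X,
    Y(i32),
    Z(Box<(i32, LargeType)>),
}

fn main() {
    // The inline payload alone needs more than 100 bytes.
    assert!(size_of::<Unboxed>() > size_of::<LargeType>());
    // After boxing, the enum is smaller than the unboxed version.
    assert!(size_of::<Boxed>() < size_of::<Unboxed>());

    // Constructing the boxed variant allocates; matching goes through the box.
    let value = Boxed::Z(Box::new((7, [0; 100])));
    if let Boxed::Z(payload) = &value {
        assert_eq!(payload.0, 7);
        assert_eq!(payload.1.len(), 100);
    }
}
```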
 94 | 
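 95 | ## Smaller Integers
 96 | 
 97 | It is often possible to shrink types by using smaller integer types. For example, while it is most natural to use `usize` for indices, it is often reasonable to store indices as `u32`, `u16`, or even `u8`, and then convert to `usize` at use points.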
 98 | [**Example 1**](https://github.com/rust-lang/rust/pull/49993/commits/4d34bfd00a57f8a8bdb60ec3f908c5d4256f8a9a),
 99 | [**Example 2**](https://github.com/rust-lang/rust/pull/50981/commits/8d0fad5d3832c6c1f14542ea0be038274e454524).
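
A minimal sketch of this pattern follows. The types `WideEdge` and `NarrowEdge` and their methods are hypothetical, and the sketch assumes a target where `usize` is at least 32 bits wide, so that widening a `u32` back to `usize` is lossless.

```rust
use std::convert::TryFrom;
use std::mem::size_of;

// Hypothetical record that stores two indices into some table.
#[allow(dead_code)]
struct WideEdge {
    src: usize,
    dst: usize,
}

// The same data stored as `u32`. This assumes indices never exceed
// `u32::MAX`, which `new` checks.
struct NarrowEdge {
    src: u32,
    dst: u32,
}

impl NarrowEdge {
    // Returns `None` if either index does not fit in a `u32`.
    fn new(src: usize, dst: usize) -> Option<Self> {
        Some(NarrowEdge {
            src: u32::try_from(src).ok()?,
            dst: u32::try_from(dst).ok()?,
        })
    }

    // Widen back to `usize` at the use point.
    fn src(&self) -> usize {
        self.src as usize
    }

    fn dst(&self) -> usize {
        self.dst as usize
    }
}

fn main() {
    let nodes = vec!["a", "b", "c"];
    let edge = NarrowEdge::new(0, 2).expect("indices fit in u32");
    assert_eq!(nodes[edge.src()], "a");
    assert_eq!(nodes[edge.dst()], "c");

    // Two `u32` fields take 8 bytes; two `usize` fields take two words.
    assert_eq!(size_of::<NarrowEdge>(), 8);
    assert_eq!(size_of::<WideEdge>(), 2 * size_of::<usize>());
}
```

Doing the narrowing in a checked constructor keeps the possible overflow in one place rather than at every use point.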
100 | 
101 | ## Boxed Slices
102 | 
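103 | A Rust vector contains three words: a length, a capacity, and a pointer. If you have a vector that is unlikely to be changed in the future, you can convert it to a *boxed slice* with [`Vec::into_boxed_slice`]. A boxed slice contains only two words, a length and a pointer. Any excess element capacity is dropped, which may cause a reallocation.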
104 | ```rust
105 | # use std::mem::{size_of, size_of_val};
106 | let v: Vec<u32> = vec![1, 2, 3];
107 | assert_eq!(size_of_val(&v), 3 * size_of::<usize>());
108 | 
109 | let bs: Box<[u32]> = v.into_boxed_slice();
110 | assert_eq!(size_of_val(&bs), 2 * size_of::<usize>());
111 | ```
112 | The boxed slice can be converted back to a vector with [`slice::into_vec`] without any cloning or reallocation.
113 | 
114 | [`Vec::into_boxed_slice`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.into_boxed_slice
115 | [`slice::into_vec`]: https://doc.rust-lang.org/std/primitive.slice.html#method.into_vec
116 | 
117 | ## `ThinVec`
118 | 
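119 | An alternative to boxed slices is `ThinVec`, from the [`thin_vec`] crate. It is functionally equivalent to `Vec`, but stores the length and capacity in the same allocation as the elements. This means that `size_of::<ThinVec<T>>` is only one word.
120 | 
121 | `ThinVec` is a good choice within oft-instantiated types for vectors that are often empty. It can also be used to shrink the largest variant of an enum, if that variant contains a `Vec`.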
122 | 
123 | [`thin_vec`]: https://crates.io/crates/thin-vec
124 | 
125 | ## Avoiding Regressions
126 | 
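127 | If a type is hot enough that its size can affect performance, it is a good idea to use a static assertion to ensure that it does not accidentally regress. The following example uses a macro from the [`static_assertions`] crate.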
128 | ```rust,ignore
129 |   // This type is used a lot. Make sure it doesn't unintentionally get bigger.
130 |   #[cfg(target_arch = "x86_64")]
131 |   static_assertions::assert_eq_size!(HotType, [u8; 64]);
132 | ```
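133 | The `cfg` attribute is important, because type sizes can vary on different platforms. Restricting the assertion to `x86_64` (which is typically the most widely-used platform) is likely to be good enough to prevent regressions in practice.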
134 | 
135 | [`static_assertions`]: https://crates.io/crates/static_assertions
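
If adding a dependency is undesirable, a similar check can be sketched with the standard library alone, using `assert!` in a `const` item. This is an alternative to the macro above, not what the `static_assertions` crate does internally, and `HotType` here is a hypothetical stand-in for the real type.

```rust
use std::mem::size_of;

// Hypothetical hot type; an array of eight `u64` values occupies 64 bytes.
#[allow(dead_code)]
struct HotType {
    words: [u64; 8],
}

// This type is used a lot. Make sure it doesn't unintentionally get bigger.
// The assertion is evaluated at compile time, so a size change fails the build.
#[cfg(target_arch = "x86_64")]
const _: () = assert!(size_of::<HotType>() == 64);

fn main() {
    println!("HotType is {} bytes", size_of::<HotType>());
}
```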
136 | 
137 | 


--------------------------------------------------------------------------------
/src/wrapper-types.md:
--------------------------------------------------------------------------------
 1 | # Wrapper Types
 2 | 
 3 | Rust has a variety of "wrapper" types, such as [`RefCell`] and [`Mutex`], that
 4 | provide special behavior for values. Accessing these values can take a
 5 | non-trivial amount of time. If multiple such values are typically accessed
 6 | together, it may be better to put them within a single wrapper.
 7 | 
 8 | [`RefCell`]: https://doc.rust-lang.org/std/cell/struct.RefCell.html
 9 | [`Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html
10 | 
11 | For example, a struct like this:
12 | ```rust
13 | # use std::sync::{Arc, Mutex};
14 | struct S {
15 |     x: Arc<Mutex<u32>>,
16 |     y: Arc<Mutex<u32>>,
17 | }
18 | ```
19 | may be better represented like this:
20 | ```rust
21 | # use std::sync::{Arc, Mutex};
22 | struct S {
23 |     xy: Arc<Mutex<(u32, u32)>>,
24 | }
25 | ```
26 | Whether or not this helps performance will depend on the exact access patterns
27 | of the values.
28 | [**Example**](https://github.com/rust-lang/rust/pull/68694/commits/7426853ba255940b880f2e7f8026d60b94b42404).
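
As an illustration of the combined form, the sketch below updates both values under a single lock acquisition. With separate mutexes, the same update would need two acquisitions. The methods `new`, `bump_both`, and `snapshot` and the function `run` are hypothetical helpers added for this example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

struct S {
    xy: Arc<Mutex<(u32, u32)>>,
}

impl S {
    fn new() -> Self {
        S { xy: Arc::new(Mutex::new((0, 0))) }
    }

    // Updates both values while holding the lock once.
    fn bump_both(&self) {
        let mut guard = self.xy.lock().unwrap();
        guard.0 += 1;
        guard.1 += 1;
    }

    // Copies out both values under one lock acquisition.
    fn snapshot(&self) -> (u32, u32) {
        *self.xy.lock().unwrap()
    }
}

// Runs `threads` scoped threads that each call `bump_both` `iterations` times.
fn run(threads: u32, iterations: u32) -> (u32, u32) {
    let s = S::new();
    thread::scope(|scope| {
        for _ in 0..threads {
            scope.spawn(|| {
                for _ in 0..iterations {
                    s.bump_both();
                }
            });
        }
    });
    s.snapshot()
}

fn main() {
    let (x, y) = run(4, 1000);
    println!("x = {x}, y = {y}");
}
```

The trade-off is that a thread needing only `x` now contends with threads that need only `y`, which is why the access pattern matters.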
29 | 


--------------------------------------------------------------------------------
/src/wrapper-types_zh.md:
--------------------------------------------------------------------------------
 1 | # Wrapper Types
 2 | 
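 3 | Rust has a variety of "wrapper" types, such as [`RefCell`] and [`Mutex`], that provide special behavior for values. Accessing these values can take a non-trivial amount of time. If multiple such values are typically accessed together, it may be better to put them within a single wrapper.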
 4 | 
 5 | [`RefCell`]: https://doc.rust-lang.org/std/cell/struct.RefCell.html
 6 | [`Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html
 7 | 
 8 | For example, a struct like this:
 9 | ```rust
10 | # use std::sync::{Arc, Mutex};
11 | struct S {
12 |     x: Arc<Mutex<u32>>,
13 |     y: Arc<Mutex<u32>>,
14 | }
15 | ```
16 | may be better represented like this:
17 | ```rust
18 | # use std::sync::{Arc, Mutex};
19 | struct S {
20 |     xy: Arc<Mutex<(u32, u32)>>,
21 | }
22 | ```
23 | Whether or not this helps performance will depend on the exact access patterns of the values.
24 | [**Example**](https://github.com/rust-lang/rust/pull/68694/commits/7426853ba255940b880f2e7f8026d60b94b42404).
25 | 


--------------------------------------------------------------------------------