├── .editorconfig
├── .github
│   └── workflows
│       └── ci.yml
├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE-APACHE
├── LICENSE-MIT
├── README.md
├── book.toml
└── src
    ├── SUMMARY.md
    ├── benchmarking.md
    ├── bounds-checks.md
    ├── build-configuration.md
    ├── compile-times.md
    ├── general-tips.md
    ├── hashing.md
    ├── heap-allocations.md
    ├── inlining.md
    ├── introduction.md
    ├── io.md
    ├── iterators.md
    ├── linting.md
    ├── logging-and-debugging.md
    ├── machine-code.md
    ├── parallelism.md
    ├── profiling.md
    ├── standard-library-types.md
    ├── title-page.md
    ├── type-sizes.md
    └── wrapper-types.md


/.editorconfig:
--------------------------------------------------------------------------------
1 | [*.md]
2 | max_line_length = 79
3 | 


--------------------------------------------------------------------------------
/.github/workflows/ci.yml:
--------------------------------------------------------------------------------
 1 | name: CI
 2 | 
 3 | on:
 4 |   pull_request:
 5 |   push:
 6 |     branches:
 7 |       - master
 8 | 
 9 | jobs:
10 |   test_and_maybe_deploy:
11 |     runs-on: ubuntu-latest
12 |     steps:
13 |       - name: Clone repository
14 |         uses: actions/checkout@v4
15 | 
16 |       - name: Setup mdbook
17 |         uses: peaceiris/actions-mdbook@v2
18 |         with:
19 |           mdbook-version: "latest"
20 | 
21 |       # EPUB
22 |       # Currently disabled due to
23 |       # https://github.com/nnethercote/perf-book/actions/runs/6358429874/job/17270643057
24 |       #- name: Setup mdbook-epub
25 |       #  run: cargo install mdbook-epub
26 | 
27 |       - name: Build
28 |         run: mdbook build
29 | 
30 |       - name: Test
31 |         run: mdbook test
32 | 
33 |       # EPUB
34 |       #- name: Copy ePub
35 |       #  run: cp book/epub/The\ Rust\ Performance\ Book.epub book/html
36 | 
37 |       - name: Deploy
38 |         uses: peaceiris/actions-gh-pages@v4
39 |         with:
40 |           github_token: ${{ secrets.GITHUB_TOKEN }}
41 |           #publish_dir: ./book/html  # use if EPUB is enabled
42 |           publish_dir: ./book # use if EPUB is disabled
43 |         # Only deploy on a push to master, not on a pull request.
44 |         if: github.event_name == 'push' && github.ref == 'refs/heads/master' && github.repository == 'nnethercote/perf-book'
45 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | book
2 | 
3 | # Prevent Vim swap files from making `mdbook serve` regenerate HTML frequently.
4 | *.sw*
5 | 
6 | # Also `diff` files, which I generate a lot.
7 | diff
8 | 


--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # The Rust Performance Book Code of Conduct
2 | 
3 | This repository uses the [Rust Code of Conduct].
4 | 
5 | [Rust Code of Conduct]: https://www.rust-lang.org/conduct.html
6 | 


--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | # Style Guide for The Rust Performance Book
 2 | 
 3 | These style guidelines are used for the book.
 4 | 
 5 | ## Line Lengths
 6 | 
 7 | Lines of text are limited to 79 characters. (There is a `.editorconfig` file
 8 | that specifies this.) Lines containing non-text elements, such as links, can be
 9 | longer.
10 | 
11 | ## Examples
12 | 
13 | Links to examples that demonstrate performance techniques on real-world
14 | programs are encouraged. These examples might be pull requests, blog posts,
15 | etc.
16 | 
17 | Single examples are written like this:
18 | ```markdown
19 | [**Example**](https://github.com/rust-lang/rust/pull/37373/commits/c440a7ae654fb641e68a9ee53b03bf3f7133c2fe).
20 | ```
21 | 
22 | Multiple examples are written like this:
23 | ```markdown
24 | [**Example 1**](https://github.com/rust-lang/rust/pull/77990/commits/45faeb43aecdc98c9e3f2b24edf2ecc71f39d323),
25 | [**Example 2**](https://github.com/rust-lang/rust/pull/51870/commits/b0c78120e3ecae5f4043781f7a3f79e2277293e7).
26 | ```
27 | 
28 | ## Title Style
29 | 
30 | Section titles are capitalized, which means that all words within the title are
31 | capitalized, other than "small" words such as conjunctions. For example, "Using
32 | an Alternative Allocator", rather than "Using an alternative allocator".
33 | 
34 | ## External Link Style
35 | 
36 | For external links—those that point outside the book—reference links are
37 | preferred to inline links. For example, this:
38 | ```markdown
39 | The book's title is [The Rust Performance Book].
40 | 
41 | [The Rust Performance Book]: https://nnethercote.github.io/perf-book/
42 | ```
43 | is preferred to this:
44 | ```markdown
45 | The book's title is [The Rust Performance Book](https://nnethercote.github.io/perf-book/).
46 | ```
47 | The reason for this preference is that external links are usually relatively
48 | long, and long inline links often break awkwardly across lines.
49 | 
50 | One exception to this rule is that **Example** links are inline, with each one
51 | put on its own line, as seen above.
52 | 


--------------------------------------------------------------------------------
/LICENSE-APACHE:
--------------------------------------------------------------------------------
  1 |                               Apache License
  2 |                         Version 2.0, January 2004
  3 |                      http://www.apache.org/licenses/
  4 | 
  5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
  6 | 
  7 | 1. Definitions.
  8 | 
  9 |    "License" shall mean the terms and conditions for use, reproduction,
 10 |    and distribution as defined by Sections 1 through 9 of this document.
 11 | 
 12 |    "Licensor" shall mean the copyright owner or entity authorized by
 13 |    the copyright owner that is granting the License.
 14 | 
 15 |    "Legal Entity" shall mean the union of the acting entity and all
 16 |    other entities that control, are controlled by, or are under common
 17 |    control with that entity. For the purposes of this definition,
 18 |    "control" means (i) the power, direct or indirect, to cause the
 19 |    direction or management of such entity, whether by contract or
 20 |    otherwise, or (ii) ownership of fifty percent (50%) or more of the
 21 |    outstanding shares, or (iii) beneficial ownership of such entity.
 22 | 
 23 |    "You" (or "Your") shall mean an individual or Legal Entity
 24 |    exercising permissions granted by this License.
 25 | 
 26 |    "Source" form shall mean the preferred form for making modifications,
 27 |    including but not limited to software source code, documentation
 28 |    source, and configuration files.
 29 | 
 30 |    "Object" form shall mean any form resulting from mechanical
 31 |    transformation or translation of a Source form, including but
 32 |    not limited to compiled object code, generated documentation,
 33 |    and conversions to other media types.
 34 | 
 35 |    "Work" shall mean the work of authorship, whether in Source or
 36 |    Object form, made available under the License, as indicated by a
 37 |    copyright notice that is included in or attached to the work
 38 |    (an example is provided in the Appendix below).
 39 | 
 40 |    "Derivative Works" shall mean any work, whether in Source or Object
 41 |    form, that is based on (or derived from) the Work and for which the
 42 |    editorial revisions, annotations, elaborations, or other modifications
 43 |    represent, as a whole, an original work of authorship. For the purposes
 44 |    of this License, Derivative Works shall not include works that remain
 45 |    separable from, or merely link (or bind by name) to the interfaces of,
 46 |    the Work and Derivative Works thereof.
 47 | 
 48 |    "Contribution" shall mean any work of authorship, including
 49 |    the original version of the Work and any modifications or additions
 50 |    to that Work or Derivative Works thereof, that is intentionally
 51 |    submitted to Licensor for inclusion in the Work by the copyright owner
 52 |    or by an individual or Legal Entity authorized to submit on behalf of
 53 |    the copyright owner. For the purposes of this definition, "submitted"
 54 |    means any form of electronic, verbal, or written communication sent
 55 |    to the Licensor or its representatives, including but not limited to
 56 |    communication on electronic mailing lists, source code control systems,
 57 |    and issue tracking systems that are managed by, or on behalf of, the
 58 |    Licensor for the purpose of discussing and improving the Work, but
 59 |    excluding communication that is conspicuously marked or otherwise
 60 |    designated in writing by the copyright owner as "Not a Contribution."
 61 | 
 62 |    "Contributor" shall mean Licensor and any individual or Legal Entity
 63 |    on behalf of whom a Contribution has been received by Licensor and
 64 |    subsequently incorporated within the Work.
 65 | 
 66 | 2. Grant of Copyright License. Subject to the terms and conditions of
 67 |    this License, each Contributor hereby grants to You a perpetual,
 68 |    worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 69 |    copyright license to reproduce, prepare Derivative Works of,
 70 |    publicly display, publicly perform, sublicense, and distribute the
 71 |    Work and such Derivative Works in Source or Object form.
 72 | 
 73 | 3. Grant of Patent License. Subject to the terms and conditions of
 74 |    this License, each Contributor hereby grants to You a perpetual,
 75 |    worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 76 |    (except as stated in this section) patent license to make, have made,
 77 |    use, offer to sell, sell, import, and otherwise transfer the Work,
 78 |    where such license applies only to those patent claims licensable
 79 |    by such Contributor that are necessarily infringed by their
 80 |    Contribution(s) alone or by combination of their Contribution(s)
 81 |    with the Work to which such Contribution(s) was submitted. If You
 82 |    institute patent litigation against any entity (including a
 83 |    cross-claim or counterclaim in a lawsuit) alleging that the Work
 84 |    or a Contribution incorporated within the Work constitutes direct
 85 |    or contributory patent infringement, then any patent licenses
 86 |    granted to You under this License for that Work shall terminate
 87 |    as of the date such litigation is filed.
 88 | 
 89 | 4. Redistribution. You may reproduce and distribute copies of the
 90 |    Work or Derivative Works thereof in any medium, with or without
 91 |    modifications, and in Source or Object form, provided that You
 92 |    meet the following conditions:
 93 | 
 94 |    (a) You must give any other recipients of the Work or
 95 |        Derivative Works a copy of this License; and
 96 | 
 97 |    (b) You must cause any modified files to carry prominent notices
 98 |        stating that You changed the files; and
 99 | 
100 |    (c) You must retain, in the Source form of any Derivative Works
101 |        that You distribute, all copyright, patent, trademark, and
102 |        attribution notices from the Source form of the Work,
103 |        excluding those notices that do not pertain to any part of
104 |        the Derivative Works; and
105 | 
106 |    (d) If the Work includes a "NOTICE" text file as part of its
107 |        distribution, then any Derivative Works that You distribute must
108 |        include a readable copy of the attribution notices contained
109 |        within such NOTICE file, excluding those notices that do not
110 |        pertain to any part of the Derivative Works, in at least one
111 |        of the following places: within a NOTICE text file distributed
112 |        as part of the Derivative Works; within the Source form or
113 |        documentation, if provided along with the Derivative Works; or,
114 |        within a display generated by the Derivative Works, if and
115 |        wherever such third-party notices normally appear. The contents
116 |        of the NOTICE file are for informational purposes only and
117 |        do not modify the License. You may add Your own attribution
118 |        notices within Derivative Works that You distribute, alongside
119 |        or as an addendum to the NOTICE text from the Work, provided
120 |        that such additional attribution notices cannot be construed
121 |        as modifying the License.
122 | 
123 |    You may add Your own copyright statement to Your modifications and
124 |    may provide additional or different license terms and conditions
125 |    for use, reproduction, or distribution of Your modifications, or
126 |    for any such Derivative Works as a whole, provided Your use,
127 |    reproduction, and distribution of the Work otherwise complies with
128 |    the conditions stated in this License.
129 | 
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 |    any Contribution intentionally submitted for inclusion in the Work
132 |    by You to the Licensor shall be under the terms and conditions of
133 |    this License, without any additional terms or conditions.
134 |    Notwithstanding the above, nothing herein shall supersede or modify
135 |    the terms of any separate license agreement you may have executed
136 |    with Licensor regarding such Contributions.
137 | 
138 | 6. Trademarks. This License does not grant permission to use the trade
139 |    names, trademarks, service marks, or product names of the Licensor,
140 |    except as required for reasonable and customary use in describing the
141 |    origin of the Work and reproducing the content of the NOTICE file.
142 | 
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 |    agreed to in writing, Licensor provides the Work (and each
145 |    Contributor provides its Contributions) on an "AS IS" BASIS,
146 |    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 |    implied, including, without limitation, any warranties or conditions
148 |    of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 |    PARTICULAR PURPOSE. You are solely responsible for determining the
150 |    appropriateness of using or redistributing the Work and assume any
151 |    risks associated with Your exercise of permissions under this License.
152 | 
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 |    whether in tort (including negligence), contract, or otherwise,
155 |    unless required by applicable law (such as deliberate and grossly
156 |    negligent acts) or agreed to in writing, shall any Contributor be
157 |    liable to You for damages, including any direct, indirect, special,
158 |    incidental, or consequential damages of any character arising as a
159 |    result of this License or out of the use or inability to use the
160 |    Work (including but not limited to damages for loss of goodwill,
161 |    work stoppage, computer failure or malfunction, or any and all
162 |    other commercial damages or losses), even if such Contributor
163 |    has been advised of the possibility of such damages.
164 | 
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 |    the Work or Derivative Works thereof, You may choose to offer,
167 |    and charge a fee for, acceptance of support, warranty, indemnity,
168 |    or other liability obligations and/or rights consistent with this
169 |    License. However, in accepting such obligations, You may act only
170 |    on Your own behalf and on Your sole responsibility, not on behalf
171 |    of any other Contributor, and only if You agree to indemnify,
172 |    defend, and hold each Contributor harmless for any liability
173 |    incurred by, or claims asserted against, such Contributor by reason
174 |    of your accepting any such warranty or additional liability.
175 | 
176 | END OF TERMS AND CONDITIONS
177 | 


--------------------------------------------------------------------------------
/LICENSE-MIT:
--------------------------------------------------------------------------------
 1 | Permission is hereby granted, free of charge, to any
 2 | person obtaining a copy of this software and associated
 3 | documentation files (the "Software"), to deal in the
 4 | Software without restriction, including without
 5 | limitation the rights to use, copy, modify, merge,
 6 | publish, distribute, sublicense, and/or sell copies of
 7 | the Software, and to permit persons to whom the Software
 8 | is furnished to do so, subject to the following
 9 | conditions:
10 | 
11 | The above copyright notice and this permission notice
12 | shall be included in all copies or substantial portions
13 | of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF
16 | ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED
17 | TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
18 | PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
19 | SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
20 | CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
22 | IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
23 | DEALINGS IN THE SOFTWARE.
24 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # perf-book
 2 | 
 3 | The Rust Performance Book.
 4 | 
 5 | ## Viewing
 6 | 
 7 | The rendered (HTML) book is [here](https://nnethercote.github.io/perf-book/).
 8 | 
 9 | <!-- EPUB
10 | Currently disabled due to
11 | https://github.com/nnethercote/perf-book/actions/runs/6358429874/job/17270643057
12 | 
13 | An ePub version is available
14 | [here](https://nnethercote.github.io/perf-book/The%20Rust%20Performance%20Book.epub).
15 | The ePub file is generated with
16 | [mdbook-epub](https://crates.io/crates/mdbook-epub), which is experimental. It
17 | has excessive whitespace and is not as nice to read as the HTML version.
18 | Nonetheless, it is usable if you really want to read the book on an e-reader.
19 | -->
20 | 
21 | ## Building
22 | 
23 | The book is built with [`mdbook`](https://github.com/rust-lang/mdBook), which
24 | can be installed with this command:
25 | ```
26 | cargo install mdbook
27 | ```
28 | To build the book, run this command:
29 | ```
30 | mdbook build
31 | ```
32 | The generated files are put in the `book/` directory.
33 | 
34 | ## Development
35 | 
36 | To view the built book, run this command:
37 | ```
38 | mdbook serve
39 | ```
40 | This will launch a local web server to serve the book. View the built book by
41 | navigating to `localhost:3000` in a web browser. While the web server is
42 | running, the rendered book will automatically update if the book's files
43 | change.
44 | 
45 | To test the code within the book, run this command:
46 | ```
47 | mdbook test
48 | ```
49 | 
50 | ## Improvements
51 | 
52 | Suggestions for improvements are welcome, but I prefer them to be filed as
53 | issues rather than pull requests. This is because I am very particular about
54 | the wording used in the book. When pull requests are made, I typically take the
55 | underlying idea of a pull request and rewrite it into my own words anyway.
56 | 
57 | This book contains no material produced by generative AI, and none will be
58 | accepted.
59 | 
60 | ## License
61 | 
62 | Licensed under either of
63 | * Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or
64 |   http://www.apache.org/licenses/LICENSE-2.0)
65 | * MIT license ([LICENSE-MIT](LICENSE-MIT) or
66 |   http://opensource.org/licenses/MIT)
67 | 
68 | at your option.
69 | 
70 | ## Contribution
71 | 
72 | Unless you explicitly state otherwise, any contribution intentionally submitted
73 | for inclusion in the work by you, as defined in the Apache-2.0 license, shall
74 | be dual licensed as above, without any additional terms or conditions.
75 | 


--------------------------------------------------------------------------------
/book.toml:
--------------------------------------------------------------------------------
 1 | [book]
 2 | title = "The Rust Performance Book"
 3 | authors = ["Nicholas Nethercote"]
 4 | src = "src"
 5 | language = "en"
 6 | multilingual = false
 7 | 
 8 | [build]
 9 | create-missing = false
10 | 
11 | [rust]
12 | edition = "2018"
13 | 
14 | [output.html]
15 | curly-quotes = true
16 | default-theme = "rust"
17 | git-repository-url = "https://github.com/nnethercote/perf-book"
18 | edit-url-template = "https://github.com/nnethercote/perf-book/edit/master/{path}"
19 | site-url = "https://nnethercote.github.io/perf-book/"
20 | 
21 | # EPUB
22 | # Currently disabled due to
23 | # https://github.com/nnethercote/perf-book/actions/runs/6358429874/job/17270643057
24 | #[output.epub]
25 | #optional = true  # So epub generation is skipped if mdbook-epub isn't installed.
26 | 
27 | 


--------------------------------------------------------------------------------
/src/SUMMARY.md:
--------------------------------------------------------------------------------
 1 | # Summary
 2 | 
 3 | [Title Page](title-page.md)
 4 | 
 5 | - [Introduction](introduction.md)
 6 | - [Benchmarking](benchmarking.md)
 7 | - [Build Configuration](build-configuration.md)
 8 | - [Linting](linting.md)
 9 | - [Profiling](profiling.md)
10 | - [Inlining](inlining.md)
11 | - [Hashing](hashing.md)
12 | - [Heap Allocations](heap-allocations.md)
13 | - [Type Sizes](type-sizes.md)
14 | - [Standard Library Types](standard-library-types.md)
15 | - [Iterators](iterators.md)
16 | - [Bounds Checks](bounds-checks.md)
17 | - [I/O](io.md)
18 | - [Logging and Debugging](logging-and-debugging.md)
19 | - [Wrapper Types](wrapper-types.md)
20 | - [Machine Code](machine-code.md)
21 | - [Parallelism](parallelism.md)
22 | - [General Tips](general-tips.md)
23 | - [Compile Times](compile-times.md)
24 | 
25 | 


--------------------------------------------------------------------------------
/src/benchmarking.md:
--------------------------------------------------------------------------------
 1 | # Benchmarking
 2 | 
 3 | Benchmarking typically involves comparing the performance of two or more
 4 | programs that do the same thing. Sometimes this might involve comparing two or
 5 | more different programs, e.g. Firefox vs Safari vs Chrome. Sometimes it
 6 | involves comparing two different versions of the same program. This latter case
 7 | lets us reliably answer the question "did this change speed things up?"
 8 | 
 9 | Benchmarking is a complex topic and thorough coverage is beyond the scope of
10 | this book, but here are the basics.
11 | 
12 | First, you need workloads to measure. Ideally, you would have a variety of
13 | workloads that represent realistic usage of your program. Workloads using
14 | real-world inputs are best, but [microbenchmarks] and [stress tests] can be
15 | useful in moderation.
16 | 
17 | [microbenchmarks]: https://stackoverflow.com/questions/2842695/what-is-microbenchmarking
18 | [stress tests]: https://en.wikipedia.org/wiki/Stress_testing_(software)
19 | 
20 | Second, you need a way to run the workloads, which will also dictate the
21 | metrics used.
22 | - Rust's built-in [benchmark tests] are a simple starting point, but they use
23 |   unstable features and therefore only work on nightly Rust.
24 | - [Criterion] and [Divan] are more sophisticated alternatives.
25 | - [Hyperfine] is an excellent general-purpose benchmarking tool.
26 | - [Bencher] can do continuous benchmarking on CI, including GitHub CI.
27 | - Custom benchmarking harnesses are also possible. For example, [rustc-perf] is
28 |   the harness used to benchmark the Rust compiler.
29 | 
30 | [benchmark tests]: https://doc.rust-lang.org/nightly/unstable-book/library-features/test.html
31 | [Criterion]: https://github.com/bheisler/criterion.rs
32 | [Divan]: https://github.com/nvzqz/divan
33 | [Hyperfine]: https://github.com/sharkdp/hyperfine
34 | [Bencher]: https://github.com/bencherdev/bencher
35 | [rustc-perf]: https://github.com/rust-lang/rustc-perf/
36 | 
37 | When it comes to metrics, there are many choices, and the right one(s) will
38 | depend on the nature of the program being benchmarked. For example, metrics
39 | that make sense for a batch program might not make sense for an interactive
40 | program. Wall-time is an obvious choice in many cases because it corresponds to
41 | what users perceive. However, it can suffer from high variance. In particular,
42 | tiny changes in memory layout can cause significant but ephemeral performance
43 | fluctuations. Therefore, other metrics with lower variance (such as cycles or
44 | instruction counts) may be a reasonable alternative.
45 | 
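As a minimal sketch of a custom harness (not taken from any of the tools
above; `time_runs` and the summing workload are hypothetical names chosen for
illustration), the following code times a workload several times and reports
the minimum and median wall-time. These are less sensitive to outliers than a
single measurement.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `workload` `runs` times and returns the wall-times, sorted ascending.
fn time_runs<F: FnMut()>(runs: usize, mut workload: F) -> Vec<Duration> {
    let mut times = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        workload();
        times.push(start.elapsed());
    }
    times.sort();
    times
}

fn main() {
    // Hypothetical workload: sum a vector of integers.
    let input: Vec<u64> = (0..100_000).collect();
    let times = time_runs(11, || {
        // `black_box` discourages the optimizer from removing the work.
        let sum: u64 = black_box(&input).iter().sum();
        black_box(sum);
    });
    println!("min: {:?}, median: {:?}", times[0], times[times.len() / 2]);
}
```

Run a harness like this with a release build, otherwise the measurements
will reflect unoptimized code.
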
46 | Summarizing measurements from multiple workloads is also a challenge, and there
47 | are a variety of ways to do it, with no single method being obviously best.
48 | 
49 | Good benchmarking is hard. Having said that, do not stress too much about
50 | having a perfect benchmarking setup, particularly when you start optimizing a
51 | program. Mediocre benchmarking is far better than no benchmarking. Keep an open
52 | mind about what you are measuring, and over time you can make benchmarking
53 | improvements as you learn about the performance characteristics of your
54 | program.
55 | 


--------------------------------------------------------------------------------
/src/bounds-checks.md:
--------------------------------------------------------------------------------
 1 | # Bounds Checks
 2 | 
 3 | By default, accesses to container types such as slices and vectors involve
 4 | bounds checks in Rust. These can affect performance, e.g. within hot loops,
 5 | though less often than you might expect.
 6 | 
 7 | There are several safe ways to change code so that the compiler knows about
 8 | container lengths and can optimize away bounds checks.
 9 | 
10 | - Replace direct element accesses in a loop by using iteration.
11 | - Instead of indexing into a `Vec` within a loop, make a slice of the `Vec`
12 |   before the loop and then index into the slice within the loop.
13 | - Add assertions on the ranges of index variables.
14 | [**Example 1**](https://github.com/rust-random/rand/pull/960/commits/de9dfdd86851032d942eb583d8d438e06085867b),
15 | [**Example 2**](https://github.com/image-rs/jpeg-decoder/pull/167/files).
16 | 
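The techniques in the list above can be sketched as follows. This is an
illustration only: `sum_indexed` and `sum_iter` are hypothetical functions,
and whether the compiler actually removes the bounds checks depends on the
compiler version and surrounding code, so inspect the generated machine code
if it matters.

```rust
/// Sums the first `n` elements of `v` using indexing.
fn sum_indexed(v: &[u32], n: usize) -> u32 {
    // A single up-front assertion tells the compiler that every `i < n` is
    // also less than `v.len()`, which may let it drop the per-access checks.
    assert!(n <= v.len());
    let mut total = 0u32;
    for i in 0..n {
        total = total.wrapping_add(v[i]);
    }
    total
}

/// Sums the first `n` elements of `v` using iteration over a sub-slice.
fn sum_iter(v: &[u32], n: usize) -> u32 {
    // The slice operation is checked once; the iteration needs no
    // per-element index checks.
    v[..n].iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

fn main() {
    let data = vec![1, 2, 3, 4, 5];
    assert_eq!(sum_indexed(&data, 3), 6);
    assert_eq!(sum_iter(&data, 3), 6);
}
```
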
17 | Getting these to work can be tricky. The [Bounds Check Cookbook] goes into more
18 | detail on this topic.
19 | 
20 | [Bounds Check Cookbook]: https://github.com/Shnatsel/bounds-check-cookbook/
21 | 
22 | As a last resort, there are the unsafe methods [`get_unchecked`] and
23 | [`get_unchecked_mut`].
24 | 
25 | [`get_unchecked`]: https://doc.rust-lang.org/std/primitive.slice.html#method.get_unchecked
26 | [`get_unchecked_mut`]: https://doc.rust-lang.org/std/primitive.slice.html#method.get_unchecked_mut
27 | 
28 | 


--------------------------------------------------------------------------------
/src/build-configuration.md:
--------------------------------------------------------------------------------
  1 | # Build Configuration
  2 | 
  3 | You can drastically change the performance of a Rust program without changing
  4 | its code, just by changing its build configuration. There are many possible
  5 | build configurations for each Rust program. The one chosen will affect several
  6 | characteristics of the compiled code, such as compile times, runtime speed,
  7 | memory use, binary size, debuggability, profilability, and which architectures
  8 | your compiled program will run on.
  9 | 
 10 | Most configuration choices will improve one or more characteristics while
 11 | worsening one or more others. For example, a common trade-off is to accept
 12 | worse compile times in exchange for higher runtime speeds. The right choice
 13 | for your program depends on your needs and the specifics of your program, and
 14 | performance-related choices (which is most of them) should be validated with
 15 | benchmarking.
 16 | 
 17 | It is worth reading this chapter carefully to understand all the build
 18 | configuration choices. However, for the impatient or forgetful,
 19 | [`cargo-wizard`] encapsulates this information and can help you choose an
 20 | appropriate build configuration.
 21 | 
 22 | Note that Cargo only looks at the profile settings in the `Cargo.toml` file at
 23 | the root of the workspace. Profile settings defined in dependencies are
 24 | ignored. Therefore, these options are mostly relevant for binary crates, not
 25 | library crates.
 26 | 
 27 | [`cargo-wizard`]: https://github.com/Kobzol/cargo-wizard
 28 | 
 29 | ## Release Builds
 30 | 
 31 | The single most important build configuration choice is simple but [easy to
 32 | overlook]: make sure you are using a [release build] rather than a [dev build]
 33 | when you want high performance. This is usually done by specifying the
 34 | `--release` flag to Cargo.
 35 | 
 36 | [easy to overlook]: https://users.rust-lang.org/t/why-my-rust-program-is-so-slow/47764/5
 37 | [release build]: https://doc.rust-lang.org/cargo/reference/profiles.html#release
 38 | [dev build]: https://doc.rust-lang.org/cargo/reference/profiles.html#dev
 39 | 
 40 | Dev builds are the default. They are good for debugging, but are not optimized.
 41 | They are produced if you run `cargo build` or `cargo run`. (Alternatively,
 42 | running `rustc` without additional options also produces an unoptimized build.)
 43 | 
 44 | Consider the following final line of output from a `cargo build` run.
 45 | ```text
 46 | Finished dev [unoptimized + debuginfo] target(s) in 29.80s
 47 | ```
 48 | This output indicates that a dev build has been produced. The compiled code
 49 | will be placed in the `target/debug/` directory. `cargo run` will run the dev
 50 | build.
 51 | 
 52 | In comparison, release builds are much more optimized, omit debug assertions
 53 | and integer overflow checks, and omit debug info. 10-100x speedups over dev
 54 | builds are common! They are produced if you run `cargo build --release` or
 55 | `cargo run --release`. (Alternatively, `rustc` has multiple options for
 56 | optimized builds, such as `-O` and `-C opt-level`.) Release builds typically
 57 | take longer to compile than dev builds because of the additional optimizations.
 58 | 
 59 | Consider the following final line of output from a `cargo build --release` run.
 60 | ```text
 61 | Finished release [optimized] target(s) in 1m 01s
 62 | ```
 63 | This output indicates that a release build has been produced. The compiled code
 64 | will be placed in the `target/release/` directory. `cargo run --release` will
 65 | run the release build.
 66 | 
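One way to confirm which kind of build is running is to check whether debug
assertions are enabled. By default the `dev` profile enables them and the
`release` profile disables them, though either profile can override this. A
minimal sketch, where `build_kind` is a hypothetical helper:

```rust
/// Reports whether this binary was compiled with debug assertions enabled.
fn build_kind() -> &'static str {
    // `cfg!(debug_assertions)` is evaluated at compile time.
    if cfg!(debug_assertions) {
        "dev-like"
    } else {
        "release-like"
    }
}

fn main() {
    println!("This looks like a {} build.", build_kind());
}
```
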
 67 | See the [Cargo profile documentation] for more details about the differences
 68 | between dev builds (which use the `dev` profile) and release builds (which use
 69 | the `release` profile).
 70 | 
 71 | [Cargo profile documentation]: https://doc.rust-lang.org/cargo/reference/profiles.html
 72 | 
 73 | The default build configuration choices used in release builds provide a good
 74 | balance between the characteristics mentioned above, such as compile times,
 75 | runtime speed, and binary size. But there are many possible adjustments, as the
 76 | following sections explain.
 77 | 
 78 | ## Maximizing Runtime Speed
 79 | 
 80 | The following build configuration options are designed primarily to maximize
 81 | runtime speed. Some of them may also reduce binary size.
 82 | 
 83 | ### Codegen Units
 84 | 
 85 | The Rust compiler splits crates into multiple [codegen units] to parallelize
 86 | (and thus speed up) compilation. However, this might cause it to miss some
 87 | potential optimizations. You may be able to improve runtime speed and reduce
 88 | binary size, at the cost of increased compile times, by setting the number of
 89 | units to one. Add these lines to the `Cargo.toml` file:
 90 | ```toml
 91 | [profile.release]
 92 | codegen-units = 1
 93 | ```
 94 | <!-- Using `https` for this link triggers "potential security risk" warnings due
 95 | to a certificate problem. -->
 96 | [**Example 1**](http://likebike.com/posts/How_To_Write_Fast_Rust_Code.html#emit-asm),
 97 | [**Example 2**](https://github.com/rust-lang/rust/pull/115554#issuecomment-1742192440).
 98 | 
 99 | [codegen units]: https://doc.rust-lang.org/cargo/reference/profiles.html#codegen-units
100 | 
101 | ### Link-time Optimization
102 | 
103 | [Link-time optimization] (LTO) is a whole-program optimization technique that
104 | can improve runtime speed by 10-20% or more, and also reduce binary size, at
105 | the cost of worse compile times. It comes in several forms.
106 | 
107 | [Link-time optimization]: https://doc.rust-lang.org/cargo/reference/profiles.html#lto
108 | 
109 | The first form of LTO is *thin local LTO*, a lightweight form of LTO. By
110 | default the compiler uses this for any build that involves a non-zero level of
111 | optimization. This includes release builds. To explicitly request this level of
112 | LTO, put these lines in the `Cargo.toml` file:
113 | ```toml
114 | [profile.release]
115 | lto = false
116 | ```
117 | 
118 | The second form of LTO is *thin LTO*, which is a little more aggressive, and
119 | likely to improve runtime speed and reduce binary size while also increasing
120 | compile times. Use `lto = "thin"` in `Cargo.toml` to enable it.
121 | 
122 | The third form of LTO is *fat LTO*, which is even more aggressive, and may
123 | improve performance and reduce binary size further (but [not always]) while
124 | increasing build times again. Use `lto = "fat"` in `Cargo.toml` to enable it.
125 | 
126 | [not always]: https://github.com/rust-lang/rust/pull/103453
127 | 
128 | Finally, it is possible to fully disable LTO, which will likely worsen runtime
129 | speed and increase binary size but reduce compile times. Use `lto = "off"` in
130 | `Cargo.toml` for this. Note that this is different to the `lto = false` option,
131 | which, as mentioned above, leaves thin local LTO enabled.
132 | 
133 | ### Alternative Allocators
134 | 
135 | It is possible to replace the default (system) heap allocator used by a Rust
136 | program with an alternative allocator. The exact effect will depend on the
137 | individual program and the alternative allocator chosen, but large improvements
138 | in runtime speed and large reductions in memory usage have been seen in
139 | practice. The effect will also vary across platforms, because each platform's
140 | system allocator has its own strengths and weaknesses. The use of an
141 | alternative allocator is also likely to increase binary size and compile times.
142 | 
143 | #### jemalloc
144 | 
145 | One popular alternative allocator for Linux and Mac is [jemalloc], usable via
146 | the [`tikv-jemallocator`] crate. To use it, add a dependency to your
147 | `Cargo.toml` file:
148 | ```toml
149 | [dependencies]
150 | tikv-jemallocator = "0.5"
151 | ```
152 | Then add the following to your Rust code, e.g. at the top of `src/main.rs`:
153 | ```rust,ignore
154 | #[global_allocator]
155 | static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
156 | ```
157 | 
158 | Furthermore, on Linux, jemalloc can be configured to use [transparent huge
159 | pages][THP] (THP). This can further speed up programs, possibly at the cost of
160 | higher memory usage.
161 | 
162 | [THP]: https://www.kernel.org/doc/html/next/admin-guide/mm/transhuge.html
163 | 
164 | Do this by setting the `MALLOC_CONF` environment variable (or perhaps
165 | [`_RJEM_MALLOC_CONF`]) appropriately before building your program, for example:
166 | ```bash
167 | MALLOC_CONF="thp:always,metadata_thp:always" cargo build --release
168 | ```
169 | The system running the compiled program also has to be configured to support
170 | THP. See [this blog post] for more details.
171 | 
172 | [`_RJEM_MALLOC_CONF`]: https://github.com/tikv/jemallocator/issues/65
173 | [this blog post]: https://kobzol.github.io/rust/rustc/2023/10/21/make-rust-compiler-5percent-faster.html
174 | 
175 | #### mimalloc
176 | 
177 | Another alternative allocator that works on many platforms is [mimalloc],
178 | usable via the [`mimalloc`] crate. To use it, add a dependency to your
179 | `Cargo.toml` file:
180 | ```toml
181 | [dependencies]
182 | mimalloc = "0.1"
183 | ```
184 | Then add the following to your Rust code, e.g. at the top of `src/main.rs`:
185 | ```rust,ignore
186 | #[global_allocator]
187 | static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
188 | ```
189 | 
190 | [jemalloc]: https://github.com/jemalloc/jemalloc
191 | [`tikv-jemallocator`]: https://crates.io/crates/tikv-jemallocator
192 | [better performance]: https://github.com/rust-lang/rust/pull/83152
193 | [mimalloc]: https://github.com/microsoft/mimalloc
194 | [`mimalloc`]: https://crates.io/crates/mimalloc
195 | 
196 | ### CPU Specific Instructions
197 | 
198 | If you do not care about the compatibility of your binary on older (or other
199 | types of) processors, you can tell the compiler to generate the newest (and
200 | potentially fastest) instructions specific to a [certain CPU architecture],
201 | such as AVX SIMD instructions for x86-64 CPUs.
202 | 
203 | [certain CPU architecture]: https://doc.rust-lang.org/rustc/codegen-options/index.html#target-cpu
204 | 
205 | To request these instructions from the command line, use the `-C
206 | target-cpu=native` flag. For example:
207 | ```bash
208 | RUSTFLAGS="-C target-cpu=native" cargo build --release
209 | ```
210 | 
211 | Alternatively, to request these instructions from a [`config.toml`] file (for
212 | one or more projects), add these lines:
213 | ```toml
214 | [build]
215 | rustflags = ["-C", "target-cpu=native"]
216 | ```
217 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html
218 | 
219 | This can improve runtime speed, especially if the compiler finds vectorization
220 | opportunities in your code.
221 | 
222 | If you are unsure whether `-C target-cpu=native` is working optimally, compare
223 | the output of `rustc --print cfg` and `rustc --print cfg -C target-cpu=native`
224 | to see if the CPU features are being detected correctly in the latter case. If
225 | not, you can use `-C target-feature` to target specific features.
226 | 
227 | ### Profile-guided Optimization
228 | 
229 | Profile-guided optimization (PGO) is a compilation model where you compile
230 | your program, run it on sample data while collecting profiling data, and then
231 | use that profiling data to guide a second compilation of the program. This can
232 | improve runtime speed by 10% or more.
233 | [**Example 1**](https://blog.rust-lang.org/inside-rust/2020/11/11/exploring-pgo-for-the-rust-compiler.html),
234 | [**Example 2**](https://github.com/rust-lang/rust/pull/96978).
235 | 
236 | It is an advanced technique that takes some effort to set up, but is worthwhile
237 | in some cases. See the [rustc PGO documentation] for details. Also, the
238 | [`cargo-pgo`] command makes it easier to use PGO (and [BOLT], which is similar)
239 | to optimize Rust binaries.
240 | 
241 | Unfortunately, PGO is not supported for binaries hosted on crates.io and
242 | distributed via `cargo install`, which limits its usability.
243 | 
244 | [rustc PGO documentation]: https://doc.rust-lang.org/rustc/profile-guided-optimization.html
245 | [`cargo-pgo`]: https://github.com/Kobzol/cargo-pgo
246 | [BOLT]: https://github.com/llvm/llvm-project/tree/main/bolt
247 | 
248 | ## Minimizing Binary Size
249 | 
250 | The following build configuration options are designed primarily to minimize
251 | binary size. Their effects on runtime speed vary.
252 | 
253 | ### Optimization Level
254 | 
255 | You can request an [optimization level] that aims to minimize binary size by
256 | adding these lines to the `Cargo.toml` file:
257 | ```toml
258 | [profile.release]
259 | opt-level = "z"
260 | ```
261 | [optimization level]: https://doc.rust-lang.org/cargo/reference/profiles.html#opt-level
262 | 
263 | This may also reduce runtime speed.
264 | 
265 | An alternative is `opt-level = "s"`, which targets minimal binary size a little
266 | less aggressively. Compared to `opt-level = "z"`, it allows [slightly more
267 | inlining] and also the vectorization of loops.
268 | 
269 | [slightly more inlining]: https://doc.rust-lang.org/rustc/codegen-options/index.html#inline-threshold
270 | 
271 | ### Abort on `panic!`
272 | 
273 | If you do not need to unwind on panic, e.g. because your program doesn't use
274 | [`catch_unwind`], you can tell the compiler to simply [abort on panic]. On
275 | panic, your program will still produce a backtrace.
276 | 
277 | [`catch_unwind`]: https://doc.rust-lang.org/std/panic/fn.catch_unwind.html
278 | [abort on panic]: https://doc.rust-lang.org/cargo/reference/profiles.html#panic
279 | 
280 | This might reduce binary size and increase runtime speed slightly, and may even
281 | reduce compile times slightly. Add these lines to the `Cargo.toml` file:
282 | ```toml
283 | [profile.release]
284 | panic = "abort"
285 | ```
286 | 
287 | ### Strip Symbols
288 | 
289 | You can tell the compiler to [strip] symbols from a release build by adding
290 | these lines to `Cargo.toml`:
291 | ```toml
292 | [profile.release]
293 | strip = "symbols"
294 | ```
295 | [strip]: https://doc.rust-lang.org/cargo/reference/profiles.html#strip
296 | 
297 | [**Example**](https://github.com/nnethercote/counts/commit/53cab44cd09ff1aa80de70a6dbe1893ff8a41142).
298 | 
299 | However, stripping symbols may make your compiled program more difficult to
300 | debug and profile. For example, if a stripped program panics, the backtrace
301 | produced may contain less useful information than normal. The exact effects
302 | depend on the platform.
303 | 
304 | Debug info does not need to be stripped from release builds. By default, debug
305 | info is not generated for local release builds, and debug info for the standard
306 | library has been stripped automatically in release builds [since Rust 1.77].
307 | 
308 | [since Rust 1.77]: https://blog.rust-lang.org/2024/03/21/Rust-1.77.0.html#enable-strip-in-release-profiles-by-default
309 | 
310 | ### Other Ideas
311 | 
312 | For more advanced binary size minimization techniques, consult the
313 | comprehensive documentation in the excellent [`min-sized-rust`] repository.
314 | 
315 | [`min-sized-rust`]: https://github.com/johnthagen/min-sized-rust
316 | 
317 | ## Minimizing Compile Times
318 | 
319 | The following build configuration options are designed primarily to minimize
320 | compile times.
321 | 
322 | ### Linking
323 | 
324 | A big part of compile time is actually linking time, particularly when
325 | rebuilding a program after a small change. On some platforms it is possible to
326 | select a faster linker than the default one.
327 | 
328 | One option is [lld], which is available on Linux and Windows. lld has been the
329 | default linker on Linux [since Rust 1.90]. It is not yet the default on
330 | Windows, but it should work for most use cases.
331 | 
332 | [since Rust 1.90]: https://blog.rust-lang.org/2025/09/01/rust-lld-on-1.90.0-stable/
333 | 
334 | To specify lld from the command line, use the `-C link-arg=-fuse-ld=lld`
335 | flag. For example:
336 | ```bash
337 | RUSTFLAGS="-C link-arg=-fuse-ld=lld" cargo build --release
338 | ```
339 | 
340 | [lld]: https://lld.llvm.org/
341 | 
342 | Alternatively, to specify lld from a [`config.toml`] file (for one or more
343 | projects), add these lines:
344 | ```toml
345 | [build]
346 | rustflags = ["-C", "link-arg=-fuse-ld=lld"]
347 | ```
348 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html
349 | 
350 | There is a [GitHub Issue] tracking full
351 | support for lld.
352 | 
353 | [GitHub Issue]: https://github.com/rust-lang/rust/issues/39915#issuecomment-618726211
354 | 
355 | Another option is [mold], which is currently available on Linux.
356 | Simply substitute `mold` for `lld` in the instructions above. mold is often
357 | faster than lld.
358 | [**Example**](https://davidlattimore.github.io/posts/2024/02/04/speeding-up-the-rust-edit-build-run-cycle.html).
359 | It is also much newer and may not work in all cases.
360 | 
361 | [mold]: https://github.com/rui314/mold
362 | 
363 | A final option is [wild], which is currently only available on Linux. It may be
364 | even faster than mold, but it is less mature.
365 | 
366 | [wild]: https://github.com/davidlattimore/wild
367 | 
368 | On Mac, an alternative linker isn't necessary because the system linker is
369 | fast.
370 | 
371 | Unlike the other options in this chapter, there are no trade-offs to choosing
372 | another linker. As long as the linker works correctly for your program, which
373 | is likely to be true unless you are doing unusual things, an alternative
374 | linker can be dramatically faster without any downsides.
375 | 
376 | ### Disable Debug Info Generation
377 | 
378 | Although release builds give the best performance, many people use dev builds
379 | while developing because they build more quickly. If you use dev builds but
380 | don't often use a debugger, consider disabling debuginfo. This can improve dev
381 | build times significantly, by as much as 20-40%.
382 | [**Example.**](https://kobzol.github.io/rust/rustc/2025/05/20/disable-debuginfo-to-improve-rust-compile-times.html)
383 | 
384 | To disable debug info generation, add these lines to the `Cargo.toml` file:
385 | ```toml
386 | [profile.dev]
387 | debug = false
388 | ```
389 | Note that this means that stack traces will not contain line information. If
390 | you want to keep that line information, but do not require full information for
391 | the debugger, you can use `debug = "line-tables-only"` instead, which still
392 | gives most of the compile time benefits.
393 | 
394 | ### Experimental Parallel Front-end
395 | 
396 | If you use nightly Rust, you can enable the experimental [parallel front-end].
397 | It may reduce compile times at the cost of higher compile-time memory usage. It
398 | won't affect the quality of the generated code.
399 | 
400 | [parallel front-end]: https://blog.rust-lang.org/2023/11/09/parallel-rustc.html
401 | 
402 | Enable it by adding `-Zthreads=N` to `RUSTFLAGS`, for example:
403 | ```bash
404 | RUSTFLAGS="-Zthreads=8" cargo build --release
405 | ```
406 | 
407 | Alternatively, to enable the parallel front-end from a [`config.toml`] file (for
408 | one or more projects), add these lines:
409 | ```toml
410 | [build]
411 | rustflags = ["-Z", "threads=8"]
412 | ```
413 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html
414 | 
415 | Values other than `8` are possible, but that is the number that tends to give
416 | the best results.
417 | 
418 | In the best cases, the experimental parallel front-end reduces compile times by
419 | up to 50%. But the effects vary widely and depend on the characteristics of the
420 | code and its build configuration, and for some programs there is no compile
421 | time improvement.
422 | 
423 | ### Cranelift Codegen Back-end
424 | 
425 | If you use nightly Rust you can enable the Cranelift codegen back-end on [some
426 | platforms]. It may reduce compile times at the cost of lower quality generated
427 | code, and therefore is recommended for dev builds rather than release builds.
428 | 
429 | First, install the back-end with this `rustup` command:
430 | ```bash
431 | rustup component add rustc-codegen-cranelift-preview --toolchain nightly
432 | ```
433 | 
434 | To select Cranelift from the command line, use the
435 | `-Zcodegen-backend=cranelift` flag. For example:
436 | ```bash
437 | RUSTFLAGS="-Zcodegen-backend=cranelift" cargo +nightly build
438 | ```
439 | 
440 | Alternatively, to specify Cranelift from a [`config.toml`] file (for one or
441 | more projects), add these lines:
442 | ```toml
443 | [unstable]
444 | codegen-backend = true
445 | 
446 | [profile.dev]
447 | codegen-backend = "cranelift"
448 | ```
449 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html
450 | 
451 | For more information, see the [Cranelift documentation].
452 | 
453 | [some platforms]: https://github.com/rust-lang/rustc_codegen_cranelift#platform-support
454 | [Cranelift documentation]: https://github.com/rust-lang/rustc_codegen_cranelift
455 | 
456 | ## Custom Profiles
457 | 
458 | In addition to the `dev` and `release` profiles, Cargo supports [custom
459 | profiles]. It might be useful, for example, to create a custom profile halfway
460 | between `dev` and `release` if you find the runtime speed of dev builds
461 | insufficient and the compile times of release builds too slow for everyday
462 | development.
463 | 
464 | [custom profiles]: https://doc.rust-lang.org/cargo/reference/profiles.html#custom-profiles
465 | 
466 | ## Summary
467 | 
468 | There are many choices to be made when it comes to build configurations. The
469 | following points summarize the above information into some recommendations.
470 | 
471 | - If you want to maximize runtime speed, consider all of the following:
472 |   `codegen-units = 1`, `lto = "fat"`, an alternative allocator, and `panic =
473 |   "abort"`.
474 | - If you want to minimize binary size, consider `opt-level = "z"`,
475 |   `codegen-units = 1`, `lto = "fat"`, `panic = "abort"`, and `strip =
476 |   "symbols"`.
477 | - In either case, consider `-C target-cpu=native` if broad architecture support
478 |   is not needed, and `cargo-pgo` if it works with your distribution mechanism.
479 | - Always use a faster linker if you are on a platform that supports it, because
480 |   there are no downsides to doing so.
481 | - Use `cargo-wizard` if you need additional help with these choices.
482 | - Benchmark all changes, one at a time, to ensure they have the expected
483 |   effects.
484 | 
485 | Finally, [this issue] tracks the evolution of the Rust compiler's own build
486 | configuration. The Rust compiler's build system is stranger and more complex
487 | than that of most Rust programs. Nonetheless, this issue may be instructive in
488 | showing how build configuration choices can be applied to a large program.
489 | 
490 | [this issue]: https://github.com/rust-lang/rust/issues/103595
491 | 


--------------------------------------------------------------------------------
/src/compile-times.md:
--------------------------------------------------------------------------------
  1 | # Compile Times
  2 | 
  3 | Although this book is primarily about improving the performance of Rust
  4 | programs, this section is about reducing the compile times of Rust programs,
  5 | because that is a related topic of interest to many people.
  6 | 
  7 | The [Minimizing Compile Times] section discussed ways to reduce compile times
  8 | via build configuration choices. The rest of this section discusses ways to
  9 | reduce compile times that require modifying your program's code.
 10 | 
 11 | [Minimizing Compile Times]: build-configuration.md#minimizing-compile-times
 12 | 
 13 | For additional compile time reduction techniques, consult Corrode's
 14 | comprehensive list of [Tips for Faster Rust Compile Times][Tips].
 15 | 
 16 | [Tips]: https://corrode.dev/blog/tips-for-faster-rust-compile-times/
 17 | 
 18 | ## Visualization 
 19 | 
 20 | Cargo has a feature that lets you visualize compilation of your
 21 | program. Build with this command:
 22 | ```text
 23 | cargo build --timings
 24 | ```
 25 | On completion it will print the name of an HTML file. Open that file in a web
 26 | browser. It contains a [Gantt chart] that shows the dependencies between the
 27 | various crates in your program. This shows how much parallelism there is in
 28 | your crate graph, which can indicate if any large crates that serialize
 29 | compilation should be broken up. See [the documentation][timings] for more
 30 | details on how to read the graphs.
 31 | 
 32 | [Gantt chart]: https://en.wikipedia.org/wiki/Gantt_chart
 33 | [timings]: https://doc.rust-lang.org/nightly/cargo/reference/timings.html
 34 | 
 35 | ## Macros
 36 | 
 37 | Some macros generate a lot of code. That code then takes time to compile. The
 38 | Rust compiler's `-Zmacro-stats` flag can help identify such cases.
 39 | 
 40 | For example, if you just want to measure a leaf crate of your project:
 41 | ```text
 42 | cargo +nightly rustc -- -Zmacro-stats
 43 | ```
 44 | The compiler will print information about the amount of code generated by both
 45 | procedural macros and declarative macros. The former are usually more notable.
 46 | 
 47 | Or, if you want to measure all the crates in your project:
 48 | ```text
 49 | RUSTFLAGS="-Zmacro-stats" cargo +nightly build
 50 | ```
 51 | To see the generated code itself, you can use [cargo-expand].
 52 | 
 53 | [cargo-expand]: https://github.com/dtolnay/cargo-expand
 54 | 
 55 | It's not worth worrying over macros that produce small amounts of code, but if
 56 | a macro is generating an amount of code comparable to the amount of
 57 | hand-written code, it might be possible to remove the use of that macro
 58 | entirely, or replace it with a cheaper alternative.
 59 | [**Example**](https://nnethercote.github.io/2025/06/26/how-much-code-does-that-proc-macro-generate.html).
 60 | 
 61 | Alternatively, it might be possible to modify the macro to generate less code.
 62 | [**Example 1**](https://github.com/bevyengine/bevy/issues/19873),
 63 | [**Example 2**](https://nnethercote.github.io/2025/08/16/speed-wins-when-fuzzing-rust-code-with-derive-arbitrary.html).
 64 | 
 65 | ## LLVM IR
 66 | 
 67 | The Rust compiler uses [LLVM] for its back-end. LLVM's execution can be a large
 68 | part of compile times, especially when the Rust compiler's front-end generates
 69 | a lot of [IR] that takes LLVM a long time to optimize.
 70 | 
 71 | [LLVM]: https://llvm.org/
 72 | [IR]: https://en.wikipedia.org/wiki/Intermediate_representation
 73 | 
 74 | These problems can be diagnosed with [`cargo llvm-lines`], which shows which
 75 | Rust functions cause the most LLVM IR to be generated. Generic functions are
 76 | often the most important ones, because they can be instantiated dozens or even
 77 | hundreds of times in large programs.
 78 | 
 79 | [`cargo llvm-lines`]: https://github.com/dtolnay/cargo-llvm-lines/
 80 | 
 81 | If a generic function causes IR bloat, there are several ways to fix it. The
 82 | simplest is to just make the function smaller.
 83 | [**Example 1**](https://github.com/rust-lang/rust/pull/72166/commits/5a0ac0552e05c079f252482cfcdaab3c4b39d614),
 84 | [**Example 2**](https://github.com/rust-lang/rust/pull/91246/commits/f3bda74d363a060ade5e5caeb654ba59bfed51a4).
 85 | 
 86 | Another way is to move the non-generic parts of the function into a separate,
 87 | non-generic function, which will only be instantiated once. Whether this is
 88 | possible will depend on the details of the generic function. When it is
 89 | possible, the non-generic function can often be written neatly as an inner
 90 | function within the generic function, as shown by the code for
 91 | [`std::fs::read`]:
 92 | ```rust,ignore
 93 | pub fn read<P: AsRef<Path>>(path: P) -> io::Result<Vec<u8>> {
 94 |     fn inner(path: &Path) -> io::Result<Vec<u8>> {
 95 |         let mut file = File::open(path)?;
 96 |         let size = file.metadata().map(|m| m.len()).unwrap_or(0);
 97 |         let mut bytes = Vec::with_capacity(size as usize);
 98 |         io::default_read_to_end(&mut file, &mut bytes)?;
 99 |         Ok(bytes)
100 |     }
101 |     inner(path.as_ref())
102 | }
103 | ```
104 | [`std::fs::read`]: https://doc.rust-lang.org/std/fs/fn.read.html
105 | 
106 | [**Example**](https://github.com/rust-lang/rust/pull/72013/commits/68b75033ad78d88872450a81745cacfc11e58178).
107 | 
108 | Sometimes common utility functions like [`Option::map`] and [`Result::map_err`]
109 | are instantiated many times. Replacing them with equivalent `match` expressions
110 | can help compile times.
111 | 
112 | [`Option::map`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.map
113 | [`Result::map_err`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.map_err
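
As a minimal sketch of this rewrite, the two functions below behave
identically. The names `double_with_map` and `double_with_match` are
hypothetical and chosen only for illustration. The first instantiates
`Option::map` and a closure for its particular types, while the second uses a
hand-written `match` and instantiates nothing generic.

```rust
/// Uses `Option::map`, which instantiates the generic method (and a closure)
/// for this particular combination of types.
pub fn double_with_map(x: Option<u32>) -> Option<u32> {
    x.map(|n| n * 2)
}

/// Equivalent hand-written `match`; no generic method is instantiated.
pub fn double_with_match(x: Option<u32>) -> Option<u32> {
    match x {
        Some(n) => Some(n * 2),
        None => None,
    }
}
```

The `match` form is more verbose, so it is probably only worth considering for
call sites that `cargo llvm-lines` shows to be significant.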
114 | 
115 | The effects of these sorts of changes on compile times will usually be small,
116 | though occasionally they can be large.
117 | [**Example**](https://github.com/servo/servo/issues/26585).
118 | 
119 | Such changes can also reduce binary size.
120 | 


--------------------------------------------------------------------------------
/src/general-tips.md:
--------------------------------------------------------------------------------
 1 | # General Tips
 2 | 
 3 | The previous sections of this book have discussed Rust-specific techniques.
 4 | This section gives a brief overview of some general performance principles.
 5 | 
 6 | As long as the obvious pitfalls are avoided (e.g. [using non-release builds]),
 7 | Rust code is generally fast and uses little memory, especially if you are used
 8 | to dynamically-typed languages such as Python and Ruby, or statically-typed
 9 | languages with a garbage collector such as Java and C#.
10 | 
11 | [using non-release builds]: build-configuration.md
12 | 
13 | Optimized code is often more complex and takes more effort to write than
14 | unoptimized code. For this reason, it is only worth optimizing hot code.
15 | 
16 | The biggest performance improvements often come from changes to algorithms or
17 | data structures, rather than low-level optimizations.
18 | [**Example 1**](https://github.com/rust-lang/rust/pull/53383/commits/5745597e6195fe0591737f242d02350001b6c590),
19 | [**Example 2**](https://github.com/rust-lang/rust/pull/54318/commits/154be2c98cf348de080ce951df3f73649e8bb1a6).
20 | 
21 | Writing code that works well with modern hardware is not always easy, but worth
22 | striving for. For example, try to minimize cache misses and branch
23 | mispredictions, where possible.
24 | 
25 | Most optimizations result in small speedups. Although no single small speedup
26 | is noticeable, they really add up if you can do enough of them.
27 | 
28 | Different profilers have different strengths. It is good to use more than one.
29 | 
30 | When profiling indicates that a function is hot, there are two common ways to
31 | speed things up: (a) make the function faster, and/or (b) avoid calling it as
32 | much.
33 | 
34 | It is often easier to eliminate silly slowdowns than it is to introduce clever
35 | speedups.
36 | 
37 | Avoid computing things unless necessary. Lazy/on-demand computations are
38 | often a win.
39 | [**Example 1**](https://github.com/rust-lang/rust/pull/36592/commits/80a44779f7a211e075da9ed0ff2763afa00f43dc),
40 | [**Example 2**](https://github.com/rust-lang/rust/pull/50339/commits/989815d5670826078d9984a3515eeb68235a4687).
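
One way to sketch on-demand computation uses `std::cell::OnceCell` from the
standard library. The `Report` type and its fields are hypothetical; the point
is only that the sum is computed on the first call to `total` and reused
afterwards, and never computed if `total` is not called.

```rust
use std::cell::OnceCell;

/// Hypothetical type whose `total` is expensive enough to compute lazily.
pub struct Report {
    data: Vec<u64>,
    // Empty until `total` is first called.
    total: OnceCell<u64>,
}

impl Report {
    pub fn new(data: Vec<u64>) -> Self {
        Self { data, total: OnceCell::new() }
    }

    /// Computes the sum on first use and returns the stored value afterwards.
    pub fn total(&self) -> u64 {
        *self.total.get_or_init(|| self.data.iter().sum())
    }
}
```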
41 | 
42 | Complex general cases can often be avoided by optimistically checking for
43 | common special cases that are simpler.
44 | [**Example 1**](https://github.com/rust-lang/rust/pull/68790/commits/d62b6f204733d255a3e943388ba99f14b053bf4a),
45 | [**Example 2**](https://github.com/rust-lang/rust/pull/53733/commits/130e55665f8c9f078dec67a3e92467853f400250),
46 | [**Example 3**](https://github.com/rust-lang/rust/pull/65260/commits/59e41edcc15ed07de604c61876ea091900f73649).
47 | In particular, specially handling collections with 0, 1, or 2 elements is often
48 | a win when small sizes dominate.
49 | [**Example 1**](https://github.com/rust-lang/rust/pull/50932/commits/2ff632484cd8c2e3b123fbf52d9dd39b54a94505),
50 | [**Example 2**](https://github.com/rust-lang/rust/pull/64627/commits/acf7d4dcdba4046917c61aab141c1dec25669ce9),
51 | [**Example 3**](https://github.com/rust-lang/rust/pull/64949/commits/14192607d38f5501c75abea7a4a0e46349df5b5f),
52 | [**Example 4**](https://github.com/rust-lang/rust/pull/64949/commits/d1a7bb36ad0a5932384eac03d3fb834efc0317e5).
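
The small-collection case might look like the following sketch. The function
`sorted_copy` is hypothetical, and whether the fast paths pay off depends on
how often tiny inputs actually occur, which should be measured.

```rust
/// Returns a sorted copy of `items`, with fast paths for tiny inputs.
pub fn sorted_copy(items: &[u32]) -> Vec<u32> {
    match items {
        // Zero or one element: already sorted.
        [] => Vec::new(),
        [a] => vec![*a],
        // Two elements: a single comparison is enough.
        [a, b] => {
            if a <= b {
                vec![*a, *b]
            } else {
                vec![*b, *a]
            }
        }
        // General case: fall back to a full sort.
        _ => {
            let mut v = items.to_vec();
            v.sort_unstable();
            v
        }
    }
}
```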
53 | 
54 | Similarly, when dealing with repetitive data, it is often possible to use a
55 | simple form of data compression, by using a compact representation for common
56 | values and then having a fallback to a secondary table for unusual values.
57 | [**Example 1**](https://github.com/rust-lang/rust/pull/54420/commits/b2f25e3c38ff29eebe6c8ce69b8c69243faa440d),
58 | [**Example 2**](https://github.com/rust-lang/rust/pull/59693/commits/fd7f605365b27bfdd3cd6763124e81bddd61dd28),
59 | [**Example 3**](https://github.com/rust-lang/rust/pull/65750/commits/eea6f23a0ed67fd8c6b8e1b02cda3628fee56b2f).
60 | 
61 | When code deals with multiple cases, measure case frequencies and handle the
62 | most common ones first.
63 | 
64 | When dealing with lookups that involve high locality, it can be a win to put a
65 | small cache in front of a data structure.
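
A minimal sketch of such a cache is a single remembered entry in front of a
`HashMap`. The `CachedMap` type is hypothetical. It assumes the map is not
modified after construction, so the cached entry can never become stale.

```rust
use std::collections::HashMap;

/// A read-only map with a one-entry cache of the most recent successful
/// lookup.
pub struct CachedMap {
    map: HashMap<u64, u32>,
    last: Option<(u64, u32)>,
}

impl CachedMap {
    pub fn new(map: HashMap<u64, u32>) -> Self {
        Self { map, last: None }
    }

    pub fn get(&mut self, key: u64) -> Option<u32> {
        // Fast path: the same key as last time, so no hashing is needed.
        if let Some((k, v)) = self.last {
            if k == key {
                return Some(v);
            }
        }
        // Slow path: consult the map and remember the result if found.
        let v = *self.map.get(&key)?;
        self.last = Some((key, v));
        Some(v)
    }
}
```

This only helps when consecutive lookups frequently repeat the same key;
otherwise the extra comparison is pure overhead.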
66 | 
67 | Optimized code often has a non-obvious structure, which means that explanatory
68 | comments are valuable, particularly those that reference profiling
69 | measurements. A comment like "99% of the time this vector has 0 or 1 elements,
70 | so handle those cases first" can be illuminating.
71 | 


--------------------------------------------------------------------------------
/src/hashing.md:
--------------------------------------------------------------------------------
 1 | # Hashing
 2 | 
 3 | `HashSet` and `HashMap` are two widely-used types and there are ways to make
 4 | them faster.
 5 | 
 6 | ## Alternative Hashers
 7 | 
 8 | The default hashing algorithm is not specified, but at the time of writing the
 9 | default is an algorithm called [SipHash 1-3]. This algorithm is high quality—it
10 | provides high protection against collisions—but is relatively slow,
11 | particularly for short keys such as integers.
12 | 
13 | [SipHash 1-3]: https://en.wikipedia.org/wiki/SipHash
14 | 
15 | If profiling shows that hashing is hot, and [HashDoS attacks] are not a concern
16 | for your application, the use of hash tables with faster hash algorithms can
17 | provide large speed wins.
18 | - [`rustc-hash`] provides `FxHashSet` and `FxHashMap` types that are drop-in
19 |   replacements for `HashSet` and `HashMap`. Its hashing algorithm is
20 |   low-quality but very fast, especially for integer keys, and has been found to
21 |   out-perform all other hash algorithms within rustc. ([`fxhash`] is an older,
22 |   less well maintained implementation of the same algorithm and types.)
23 | - [`fnv`] provides `FnvHashSet` and `FnvHashMap` types. Its hashing algorithm
24 |   is higher quality than `rustc-hash`'s but a little slower.
25 | - [`ahash`] provides `AHashSet` and `AHashMap`. Its hashing algorithm can take
26 |   advantage of AES instruction support that is available on some processors.
27 | 
28 | [HashDoS attacks]: https://en.wikipedia.org/wiki/Collision_attack
29 | [`rustc-hash`]: https://crates.io/crates/rustc-hash
30 | [`fxhash`]: https://crates.io/crates/fxhash
31 | [`fnv`]: https://crates.io/crates/fnv
32 | [`ahash`]: https://crates.io/crates/ahash
33 | 
34 | If hashing performance is important in your program, it is worth trying more
35 | than one of these alternatives. For example, the following results were seen in
36 | rustc.
37 | - The switch from `fnv` to `fxhash` gave [speedups of up to 6%][fnv2fx].
38 | - An attempt to switch from `fxhash` to `ahash` resulted in [slowdowns of
39 |   1-4%][fx2a].
40 | - An attempt to switch from `fxhash` back to the default hasher resulted in
41 |   [slowdowns ranging from 4-84%][fx2default]!
42 | 
43 | [fnv2fx]: https://github.com/rust-lang/rust/pull/37229/commits/00e48affde2d349e3b3bfbd3d0f6afb5d76282a7
44 | [fx2a]: https://github.com/rust-lang/rust/issues/69153#issuecomment-589504301
45 | [fx2default]: https://github.com/rust-lang/rust/issues/69153#issuecomment-589338446
46 | 
47 | If you decide to universally use one of the alternatives, such as
48 | `FxHashSet`/`FxHashMap`, it is easy to accidentally use `HashSet`/`HashMap` in
49 | some places. You can [use Clippy] to avoid this problem.
50 | 
51 | [use Clippy]: linting.md#disallowing-types
52 | 
53 | Some types don't need hashing. For example, you might have a newtype that wraps
54 | an integer and the integer values are random, or close to random. For such a
55 | type, the distribution of the hashed values won't be that different to the
56 | distribution of the values themselves. In this case the [`nohash_hasher`] crate
57 | can be useful.
58 | 
59 | [`nohash_hasher`]: https://crates.io/crates/nohash-hasher
60 | 
61 | Hash function design is a complex topic and is beyond the scope of this book.
62 | The [`ahash` documentation] has a good discussion. 
63 | 
64 | [`ahash` documentation]: https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md
65 | 
66 | ## Byte-wise Hashing
67 | 
68 | When you annotate a type with `#[derive(Hash)]` the generated `hash` method
69 | will hash each field separately. For some hash functions it may be faster to
70 | convert the type to raw bytes and hash the bytes as a stream. This is possible
71 | for types that satisfy certain properties such as having no padding bytes.
72 | 
73 | The [`zerocopy`] and [`bytemuck`] crates both provide a `#[derive(ByteHash)]`
74 | macro that generates a `hash` method that does this kind of byte-wise hashing.
75 | The README for the [`derive_hash_fast`] crate provides more detail for this
76 | technique.
77 | 
78 | [`zerocopy`]: https://crates.io/crates/zerocopy
79 | [`bytemuck`]: https://crates.io/crates/bytemuck
80 | [`derive_hash_fast`]: https://crates.io/crates/derive_hash_fast
81 | 
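To make the idea concrete without any extra crates, here is a hand-written,
safe approximation of byte-wise hashing. The `Point` type and `hash_of` helper
are hypothetical, and this is not what the derive macros above generate; it
only shows the difference between feeding the hasher one field at a time and
feeding it a single byte slice.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Two `u32` fields, so the type has no padding bytes.
#[derive(PartialEq, Eq)]
struct Point {
    x: u32,
    y: u32,
}

impl Hash for Point {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // `#[derive(Hash)]` would hash `x` and then `y` separately. Here the
        // fields are packed into one buffer and hashed with a single call.
        let mut bytes = [0u8; 8];
        bytes[..4].copy_from_slice(&self.x.to_ne_bytes());
        bytes[4..].copy_from_slice(&self.y.to_ne_bytes());
        state.write(&bytes);
    }
}

// Helper for experimenting: hashes a value with the default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Equal values still produce equal hashes, which is the property hash tables
rely on.
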
82 | This is an advanced technique, and the performance effects are highly dependent
83 | on the hash function and the exact structure of the types being hashed. Measure
84 | carefully.
85 | 


--------------------------------------------------------------------------------
/src/heap-allocations.md:
--------------------------------------------------------------------------------
  1 | # Heap Allocations
  2 | 
  3 | Heap allocations are moderately expensive. The exact details depend on which
  4 | allocator is in use, but each allocation (and deallocation) typically involves
  5 | acquiring a global lock, doing some non-trivial data structure manipulation,
  6 | and possibly executing a system call. Small allocations are not necessarily
  7 | cheaper than large allocations. It is worth understanding which Rust data
  8 | structures and operations cause allocations, because avoiding them can greatly
  9 | improve performance.
 10 | 
 11 | The [Rust Container Cheat Sheet] has visualizations of common Rust types, and
 12 | is an excellent companion to the following sections.
 13 | 
 14 | [Rust Container Cheat Sheet]: https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/
 15 | 
 16 | ## Profiling
 17 | 
 18 | If a general-purpose profiler shows `malloc`, `free`, and related functions as
 19 | hot, then it is likely worth trying to reduce the allocation rate and/or using
 20 | an alternative allocator.
 21 | 
 22 | [DHAT] is an excellent profiler to use when reducing allocation rates. It works
 23 | on Linux and some other Unixes. It precisely identifies hot allocation
 24 | sites and their allocation rates. Exact results will vary, but experience with
 25 | rustc has shown that reducing allocation rates by 10 allocations per million
 26 | instructions executed can have measurable performance improvements (e.g. ~1%).
 27 | 
 28 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html
 29 | 
 30 | Here is some example output from DHAT.
 31 | ```text
 32 | AP 1.1/25 (2 children) {
 33 |   Total:     54,533,440 bytes (4.02%, 2,714.28/Minstr) in 458,839 blocks (7.72%, 22.84/Minstr), avg size 118.85 bytes, avg lifetime 1,127,259,403.64 instrs (5.61% of program duration)
 34 |   At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
 35 |   At t-end:  0 bytes (0%) in 0 blocks (0%), avg size 0 bytes
 36 |   Reads:     15,993,012 bytes (0.29%, 796.02/Minstr), 0.29/byte
 37 |   Writes:    20,974,752 bytes (1.03%, 1,043.97/Minstr), 0.38/byte
 38 |   Allocated at {
 39 |     #1: 0x95CACC9: alloc (alloc.rs:72)
 40 |     #2: 0x95CACC9: alloc (alloc.rs:148)
 41 |     #3: 0x95CACC9: reserve_internal<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:669)
 42 |     #4: 0x95CACC9: reserve<syntax::tokenstream::TokenStream,alloc::alloc::Global> (raw_vec.rs:492)
 43 |     #5: 0x95CACC9: reserve<syntax::tokenstream::TokenStream> (vec.rs:460)
 44 |     #6: 0x95CACC9: push<syntax::tokenstream::TokenStream> (vec.rs:989)
 45 |     #7: 0x95CACC9: parse_token_trees_until_close_delim (tokentrees.rs:27)
 46 |     #8: 0x95CACC9: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader<'a>>::parse_token_tree (tokentrees.rs:81)
 47 |   }
 48 | }
 49 | ```
 50 | It is beyond the scope of this book to describe everything in this example, but
 51 | it should be clear that DHAT gives a wealth of information about allocations,
 52 | such as where and how often they happen, how big they are, how long they live
 53 | for, and how often they are accessed.
 54 | 
 55 | ## `Box`
 56 | 
 57 | [`Box`] is the simplest heap-allocated type. A `Box<T>` value is a `T` value
 58 | that is allocated on the heap.
 59 | 
 60 | [`Box`]: https://doc.rust-lang.org/std/boxed/struct.Box.html
 61 | 
 62 | It is sometimes worth boxing one or more fields in a struct or enum to make a
 63 | type smaller. (See the [Type Sizes](type-sizes.md) chapter for more about
 64 | this.)
 65 | 
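As a rough illustration of boxing a field to shrink a type, the following
sketch compares two hypothetical enums, `Unboxed` and `Boxed`, that differ
only in whether the large payload is boxed. The names and the payload size are
made up for this example.

```rust
use std::mem::size_of;

// Every value of this enum is as large as its biggest variant.
enum Unboxed {
    Small(u32),
    Large([u8; 256]),
}

// Boxing the large payload shrinks every value of the enum, at the cost of
// one heap allocation whenever `Large` is constructed.
enum Boxed {
    Small(u32),
    Large(Box<[u8; 256]>),
}

// Returns the sizes in bytes of the unboxed and boxed versions.
fn sizes() -> (usize, usize) {
    (size_of::<Unboxed>(), size_of::<Boxed>())
}
```

This is only a win if the large variant is rare enough that the extra
allocations cost less than moving the larger values around.
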
 66 | Other than that, `Box` is straightforward and does not offer much scope for
 67 | optimizations.
 68 | 
 69 | ## `Rc`/`Arc`
 70 | 
 71 | [`Rc`]/[`Arc`] are similar to `Box`, but the value on the heap is accompanied by
 72 | two reference counts. They allow value sharing, which can be an effective way
 73 | to reduce memory usage.
 74 | 
 75 | [`Rc`]: https://doc.rust-lang.org/std/rc/struct.Rc.html
 76 | [`Arc`]: https://doc.rust-lang.org/std/sync/struct.Arc.html
 77 | 
 78 | However, if used for values that are rarely shared, they can increase allocation
 79 | rates by heap allocating values that might otherwise not be heap-allocated.
 80 | [**Example**](https://github.com/rust-lang/rust/pull/37373/commits/c440a7ae654fb641e68a9ee53b03bf3f7133c2fe).
 81 | 
 82 | Unlike `Box`, calling `clone` on an `Rc`/`Arc` value does not involve an
 83 | allocation. Instead, it merely increments a reference count.
 84 | 
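A small sketch of this behaviour follows. The `share` function is a
hypothetical helper; the point is that both handles refer to the same heap
value.

```rust
use std::rc::Rc;

fn share(text: &str) -> (Rc<String>, Rc<String>) {
    // Allocates: the `String` contents, plus the `Rc` box holding the
    // reference counts and the `String` itself.
    let first = Rc::new(text.to_string());
    // Does not allocate: only the strong count is incremented.
    let second = Rc::clone(&first);
    (first, second)
}
```

`Rc::ptr_eq` and `Rc::strong_count` can be used to confirm that the two
handles point at one shared allocation.
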
 85 | ## `Vec`
 86 | 
 87 | [`Vec`] is a heap-allocated type with a great deal of scope for optimizing the
 88 | number of allocations, and/or minimizing the amount of wasted space. To do this
 89 | requires understanding how its elements are stored.
 90 | 
 91 | [`Vec`]: https://doc.rust-lang.org/std/vec/struct.Vec.html
 92 | 
 93 | A `Vec` contains three words: a length, a capacity, and a pointer. The pointer
 94 | will point to heap-allocated memory if the capacity is nonzero and the element
 95 | size is nonzero; otherwise, it will not point to allocated memory.
 96 | 
 97 | Even if the `Vec` itself is not heap-allocated, the elements (if present and
 98 | nonzero-sized) always will be. If nonzero-sized elements are present, the
 99 | memory holding those elements may be larger than necessary, providing space for
100 | additional future elements. The number of elements present is the length, and
101 | the number of elements that could be held without reallocating is the capacity.
102 | 
103 | When the vector needs to grow beyond its current capacity, the elements will be
104 | copied into a larger heap allocation, and the old heap allocation will be
105 | freed.
106 | 
107 | ### `Vec` Growth
108 | 
109 | A new, empty `Vec` created by the common means
110 | ([`vec![]`](https://doc.rust-lang.org/std/macro.vec.html)
111 | or [`Vec::new`] or [`Vec::default`]) has a length and capacity of zero, and no
112 | heap allocation is required. If you repeatedly push individual elements onto
113 | the end of the `Vec`, it will periodically reallocate. The growth strategy is
114 | not specified, but at the time of writing it uses a quasi-doubling strategy
115 | resulting in the following capacities: 0, 4, 8, 16, 32, 64, and so on. (It
116 | skips directly from 0 to 4, instead of going via 1 and 2, because this [avoids
117 | many allocations] in practice.) As a vector grows, the frequency of
118 | reallocations will decrease exponentially, but the amount of possibly-wasted
119 | excess capacity will increase exponentially.
120 | 
121 | [`Vec::new`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.new
122 | [`Vec::default`]: https://doc.rust-lang.org/std/default/trait.Default.html#tymethod.default
123 | [avoids many allocations]: https://github.com/rust-lang/rust/pull/72227
124 | 
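You can observe the growth strategy directly. The following sketch uses a
hypothetical helper, `capacity_steps`, that records each distinct capacity
seen while pushing `n` elements. Because the strategy is unspecified, the
exact values may differ between Rust versions.

```rust
// Records every distinct capacity a `Vec<u64>` passes through while `n`
// elements are pushed one at a time.
fn capacity_steps(n: usize) -> Vec<usize> {
    let mut v: Vec<u64> = Vec::new();
    let mut steps = vec![v.capacity()]; // starts at 0: no allocation yet
    for i in 0..n as u64 {
        v.push(i);
        if v.capacity() != *steps.last().unwrap() {
            // The capacity changed, so this push reallocated.
            steps.push(v.capacity());
        }
    }
    steps
}
```

The number of recorded steps, minus the initial zero, is the number of
allocations that repeated pushing performed.
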
125 | This growth strategy is typical for growable data structures and reasonable in
126 | the general case, but if you know in advance the likely length of a vector you
127 | can often do better. If you have a hot vector allocation site (e.g. a hot
128 | [`Vec::push`] call), it is worth using [`eprintln!`] to print the vector length
129 | at that site and then doing some post-processing (e.g. with [`counts`]) to
130 | determine the length distribution. For example, you might have many short
131 | vectors, or you might have a smaller number of very long vectors, and the best
132 | way to optimize the allocation site will vary accordingly.
133 | 
134 | [`Vec::push`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.push
135 | [`eprintln!`]: https://doc.rust-lang.org/std/macro.eprintln.html
136 | [`counts`]: https://github.com/nnethercote/counts/
137 | 
138 | ### Short `Vec`s
139 | 
140 | If you have many short vectors, you can use the `SmallVec` type from the
141 | [`smallvec`] crate. `SmallVec<[T; N]>` is a drop-in replacement for `Vec` that
142 | can store `N` elements within the `SmallVec` itself, and then switches to a
143 | heap allocation if the number of elements exceeds that. (Note also that
144 | `vec![]` literals must be replaced with `smallvec![]` literals.)
145 | [**Example 1**](https://github.com/rust-lang/rust/pull/50565/commits/78262e700dc6a7b57e376742f344e80115d2d3f2),
146 | [**Example 2**](https://github.com/rust-lang/rust/pull/55383/commits/526dc1421b48e3ee8357d58d997e7a0f4bb26915).
147 | 
148 | [`smallvec`]: https://crates.io/crates/smallvec
149 | 
150 | `SmallVec` reliably reduces the allocation rate when used appropriately, but
151 | its use does not guarantee improved performance. It is slightly slower than
152 | `Vec` for normal operations because it must always check if the elements are
153 | heap-allocated or not. Also, if `N` is high or `T` is large, then the
154 | `SmallVec<[T; N]>` itself can be larger than `Vec<T>`, and copying of
155 | `SmallVec` values will be slower. As always, benchmarking is required to
156 | confirm that an optimization is effective.
157 | 
158 | If you have many short vectors *and* you precisely know their maximum length,
159 | `ArrayVec` from the [`arrayvec`] crate is a better choice than `SmallVec`. It
160 | does not require the fallback to heap allocation, which makes it a little
161 | faster.
162 | [**Example**](https://github.com/rust-lang/rust/pull/74310/commits/c492ca40a288d8a85353ba112c4d38fe87ef453e).
163 | 
164 | [`arrayvec`]: https://crates.io/crates/arrayvec
165 | 
166 | ### Longer `Vec`s
167 | 
168 | If you know the minimum or exact size of a vector, you can reserve a specific
169 | capacity with [`Vec::with_capacity`], [`Vec::reserve`], or
170 | [`Vec::reserve_exact`]. For example, if you know a vector will grow to have at
171 | least 20 elements, these functions can immediately provide a vector with a
172 | capacity of at least 20 using a single allocation, whereas pushing the items
173 | one at a time would result in four allocations (for capacities of 4, 8, 16, and
174 | 32).
175 | [**Example**](https://github.com/rust-lang/rust/pull/77990/commits/a7f2bb634308a5f05f2af716482b67ba43701681).
176 | 
177 | [`Vec::with_capacity`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.with_capacity
178 | [`Vec::reserve`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve
179 | [`Vec::reserve_exact`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve_exact
180 | 
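For instance, when the final length is known up front, a sketch like the
following performs a single allocation. The `squares` function is a
hypothetical example; the debug assertion checks that the buffer never moved
while it was being filled.

```rust
fn squares(n: usize) -> Vec<usize> {
    // One allocation, big enough for all `n` elements.
    let mut v: Vec<usize> = Vec::with_capacity(n);
    let buffer = v.as_ptr();
    for i in 0..n {
        v.push(i * i); // never reallocates: the length stays within capacity
    }
    // The elements still live in the original buffer.
    debug_assert_eq!(buffer, v.as_ptr());
    v
}
```
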
181 | If you know the maximum length of a vector, the above functions also let you
182 | avoid allocating excess space. Similarly, [`Vec::shrink_to_fit`] can be used to
183 | minimize wasted space, but note that it may cause a reallocation.
184 | 
185 | [`Vec::shrink_to_fit`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit
186 | 
187 | ## `String`
188 | 
189 | A [`String`] contains heap-allocated bytes. The representation and operation of
190 | `String` are very similar to that of `Vec<u8>`. Many `Vec` methods relating to
191 | growth and capacity have equivalents for `String`, such as
192 | [`String::with_capacity`].
193 | 
194 | [`String`]: https://doc.rust-lang.org/std/string/struct.String.html
195 | [`String::with_capacity`]: https://doc.rust-lang.org/std/string/struct.String.html#method.with_capacity
196 | 
197 | The `SmallString` type from the [`smallstr`] crate is similar to the `SmallVec`
198 | type.
199 | 
200 | [`smallstr`]: https://crates.io/crates/smallstr
201 | 
202 | The `String` type from the [`smartstring`] crate is a drop-in replacement for
203 | `String` that avoids heap allocations for strings with less than three words'
204 | worth of characters. On 64-bit platforms, this is any string that is less than
205 | 24 bytes, which includes all strings containing 23 or fewer ASCII characters.
206 | [**Example**](https://github.com/djc/topfew-rs/commit/803fd566e9b889b7ba452a2a294a3e4df76e6c4c).
207 | 
208 | [`smartstring`]: https://crates.io/crates/smartstring
209 | 
210 | Note that the `format!` macro produces a `String`, which means it performs an
211 | allocation. If you can avoid a `format!` call by using a string literal, that
212 | will avoid this allocation.
213 | [**Example**](https://github.com/rust-lang/rust/pull/55905/commits/c6862992d947331cd6556f765f6efbde0a709cf9).
214 | [`std::format_args`] and/or the [`lazy_format`] crate may help with this.
215 | 
216 | [`std::format_args`]: https://doc.rust-lang.org/std/macro.format_args.html
217 | [`lazy_format`]: https://crates.io/crates/lazy_format
218 | 
219 | ## Hash Tables
220 | 
221 | [`HashSet`] and [`HashMap`] are hash tables. Their representation and
222 | operations are similar to those of `Vec`, in terms of allocations: they have
223 | a single contiguous heap allocation, holding keys and values, which is
224 | reallocated as necessary as the table grows. Many `Vec` methods relating to
225 | growth and capacity have equivalents for `HashSet`/`HashMap`, such as
226 | [`HashSet::with_capacity`].
227 | 
228 | [`HashSet`]: https://doc.rust-lang.org/std/collections/struct.HashSet.html
229 | [`HashMap`]: https://doc.rust-lang.org/std/collections/struct.HashMap.html
230 | [`HashSet::with_capacity`]: https://doc.rust-lang.org/std/collections/struct.HashSet.html#method.with_capacity
231 | 
232 | ## `clone`
233 | 
234 | Calling [`clone`] on a value that contains heap-allocated memory typically
235 | involves additional allocations. For example, calling `clone` on a non-empty
236 | `Vec` requires a new allocation for the elements (but note that the capacity of
237 | the new `Vec` might not be the same as the capacity of the original `Vec`). The
238 | exception is `Rc`/`Arc`, where a `clone` call just increments the reference
239 | count.
240 | 
241 | [`clone`]: https://doc.rust-lang.org/std/clone/trait.Clone.html#tymethod.clone
242 | 
243 | [`clone_from`] is an alternative to `clone`. `a.clone_from(&b)` is equivalent
244 | to `a = b.clone()` but may avoid unnecessary allocations. For example, if you
245 | want to clone one `Vec` over the top of an existing `Vec`, the existing `Vec`'s
246 | heap allocation will be reused if possible, as the following example shows.
247 | ```rust
248 | let mut v1: Vec<u32> = Vec::with_capacity(99);
249 | let v2: Vec<u32> = vec![1, 2, 3];
250 | v1.clone_from(&v2); // v1's allocation is reused
251 | assert_eq!(v1.capacity(), 99);
252 | ```
253 | Although `clone` usually causes allocations, it is a reasonable thing to use in
254 | many circumstances and can often make code simpler. Use profiling data to see
255 | which `clone` calls are hot and worth taking the effort to avoid.
256 | 
257 | [`clone_from`]: https://doc.rust-lang.org/std/clone/trait.Clone.html#method.clone_from
258 | 
259 | Sometimes Rust code ends up containing unnecessary `clone` calls, due to (a)
260 | programmer error, or (b) changes in the code that render previously-necessary
261 | `clone` calls unnecessary. If you see a hot `clone` call that does not seem
262 | necessary, sometimes it can simply be removed.
263 | [**Example 1**](https://github.com/rust-lang/rust/pull/37318/commits/e382267cfb9133ef12d59b66a2935ee45b546a61),
264 | [**Example 2**](https://github.com/rust-lang/rust/pull/37705/commits/11c1126688bab32f76dbe1a973906c7586da143f),
265 | [**Example 3**](https://github.com/rust-lang/rust/pull/64302/commits/36b37e22de92b584b9cf4464ed1d4ad317b798be).
266 | 
267 | ## `to_owned`
268 | 
269 | [`ToOwned::to_owned`] is implemented for many common types. It creates owned
270 | data from borrowed data, usually by cloning, and therefore often causes heap
271 | allocations. For example, it can be used to create a `String` from a `&str`.
272 | 
273 | [`ToOwned::to_owned`]: https://doc.rust-lang.org/std/borrow/trait.ToOwned.html#tymethod.to_owned
274 | 
275 | Sometimes `to_owned` calls (and related calls such as `clone` and `to_string`)
276 | can be avoided by storing a reference to borrowed data in a struct rather than
277 | an owned copy. This requires lifetime annotations on the struct, complicating
278 | the code, and should only be done when profiling and benchmarking shows that it
279 | is worthwhile.
280 | [**Example**](https://github.com/rust-lang/rust/pull/50855/commits/6872377357dbbf373cfd2aae352cb74cfcc66f34).
281 | 
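The following sketch shows the shape of this change. The `Token` type and
`tokenize` function are hypothetical; each token borrows a slice of the input
instead of owning a `String`, so the only allocation is the `Vec` itself.

```rust
// Borrows its text from the input, so creating a `Token` does not allocate.
// An owning version would hold a `String` and call `to_owned` per token.
struct Token<'a> {
    text: &'a str,
}

// The returned tokens cannot outlive `input`, which is what the lifetime
// annotation expresses.
fn tokenize(input: &str) -> Vec<Token<'_>> {
    input
        .split_whitespace()
        .map(|text| Token { text })
        .collect()
}
```

The cost is that callers must keep the input alive for as long as the tokens
are in use.
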
282 | ## `Cow`
283 | 
284 | Sometimes code deals with a mixture of borrowed and owned data. Imagine a
285 | vector of error messages, some of which are static string literals and some of
286 | which are constructed with `format!`. The obvious representation is
287 | `Vec<String>`, as the following example shows.
288 | ```rust
289 | let mut errors: Vec<String> = vec![];
290 | errors.push("something went wrong".to_string());
291 | errors.push(format!("something went wrong on line {}", 100));
292 | ```
293 | That requires a `to_string` call to promote the static string literal to a
294 | `String`, which incurs an allocation.
295 | 
296 | Instead you can use the [`Cow`] type, which can hold either borrowed or owned
297 | data. A borrowed value `x` is wrapped with `Cow::Borrowed(x)`, and an owned
298 | value `y` is wrapped with `Cow::Owned(y)`. `Cow` also implements the `From<T>`
299 | trait for various string, slice, and path types, so you can usually use `into`
300 | as well. (Or `Cow::from`, which is longer but results in more readable code,
301 | because it makes the type clearer.) The following example puts all this together.
302 | 
303 | [`Cow`]: https://doc.rust-lang.org/std/borrow/enum.Cow.html
304 | 
305 | ```rust
306 | use std::borrow::Cow;
307 | let mut errors: Vec<Cow<'static, str>> = vec![];
308 | errors.push(Cow::Borrowed("something went wrong"));
309 | errors.push(Cow::Owned(format!("something went wrong on line {}", 100)));
310 | errors.push(Cow::from("something else went wrong"));
311 | errors.push(format!("something else went wrong on line {}", 101).into());
312 | ```
313 | `errors` now holds a mixture of borrowed and owned data without requiring any
314 | extra allocations. This example involves `&str`/`String`, but other pairings
315 | such as `&[T]`/`Vec<T>` and `&Path`/`PathBuf` are also possible. 
316 | 
317 | [**Example 1**](https://github.com/rust-lang/rust/pull/37064/commits/b043e11de2eb2c60f7bfec5e15960f537b229e20),
318 | [**Example 2**](https://github.com/rust-lang/rust/pull/56336/commits/787959c20d062d396b97a5566e0a766d963af022).
319 | 
320 | All of the above applies if the data is immutable. But `Cow` also allows
321 | borrowed data to be promoted to owned data if it needs to be mutated.
322 | [`Cow::to_mut`] will obtain a mutable reference to an owned value, cloning if
323 | necessary. This is called "clone-on-write", which is where the name `Cow` comes
324 | from.
325 | 
326 | [`Deref`]: https://doc.rust-lang.org/std/ops/trait.Deref.html
327 | [`Cow::to_mut`]: https://doc.rust-lang.org/std/borrow/enum.Cow.html#method.to_mut
328 | 
329 | This clone-on-write behaviour is useful when you have some borrowed data, such
330 | as a `&str`, that is mostly read-only but occasionally needs to be modified.
331 | 
332 | [**Example 1**](https://github.com/rust-lang/rust/pull/50855/commits/ad471452ba6fbbf91ad566dc4bdf1033a7281811),
333 | [**Example 2**](https://github.com/rust-lang/rust/pull/68848/commits/67da45f5084f98eeb20cc6022d68788510dc832a).
334 | 
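A minimal sketch of clone-on-write follows. The `ensure_trailing_newline`
function is a hypothetical example: it returns the input unchanged, with no
allocation, in the common case, and only clones when a modification is needed.

```rust
use std::borrow::Cow;

fn ensure_trailing_newline(text: &str) -> Cow<'_, str> {
    let mut text = Cow::Borrowed(text);
    if !text.ends_with('\n') {
        // `to_mut` clones the borrowed data into a `String` (allocating)
        // and returns a mutable reference to it.
        text.to_mut().push('\n');
    }
    text
}
```
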
335 | Finally, because `Cow` implements [`Deref`], you can call methods directly on
336 | the data it encloses. 
337 | 
338 | `Cow` can be fiddly to get working, but it is often worth the effort.
339 | 
340 | ## Reusing Collections
341 | 
342 | Sometimes you need to build up a collection such as a `Vec` in stages. It is
343 | usually better to do this by modifying a single `Vec` than by building multiple
344 | `Vec`s and then combining them.
345 | 
346 | For example, suppose you have a function `do_stuff` that returns a new `Vec`
347 | and is called multiple times:
348 | ```rust
349 | fn do_stuff(x: u32, y: u32) -> Vec<u32> {
350 |     vec![x, y]
351 | }
352 | ```
353 | It might be better to instead modify a passed-in `Vec`:
354 | ```rust
355 | fn do_stuff(x: u32, y: u32, vec: &mut Vec<u32>) {
356 |     vec.push(x);
357 |     vec.push(y);
358 | }
359 | ```
360 | Sometimes it is worth keeping around a "workhorse" collection that can be
361 | reused. For example, if a `Vec` is needed for each iteration of a loop, you
362 | could declare the `Vec` outside the loop, use it within the loop body, and then
363 | call [`clear`] at the end of the loop body (to empty the `Vec` without affecting
364 | its capacity). This avoids allocations at the cost of obscuring the fact that
365 | each iteration's usage of the `Vec` is unrelated to the others.
366 | [**Example 1**](https://github.com/rust-lang/rust/pull/77990/commits/45faeb43aecdc98c9e3f2b24edf2ecc71f39d323),
367 | [**Example 2**](https://github.com/rust-lang/rust/pull/51870/commits/b0c78120e3ecae5f4043781f7a3f79e2277293e7).
368 | 
369 | [`clear`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.clear
370 | 
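Here is a sketch of the loop pattern. The `count_lines_with_repeats` function
is a hypothetical example that needs a scratch `Vec` of words for every line.

```rust
// Counts the lines that contain at least one repeated word.
fn count_lines_with_repeats(lines: &[&str]) -> usize {
    // The workhorse: declared once, outside the loop.
    let mut words: Vec<&str> = Vec::new();
    let mut count = 0;
    for line in lines {
        words.extend(line.split_whitespace());
        words.sort_unstable();
        if words.windows(2).any(|pair| pair[0] == pair[1]) {
            count += 1;
        }
        // Empties the `Vec` but keeps its capacity for the next iteration.
        words.clear();
    }
    count
}
```

After the first few iterations `words` is usually large enough, and later
iterations do not allocate at all.
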
371 | Similarly, it is sometimes worth keeping a workhorse collection within a
372 | struct, to be reused in one or more methods that are called repeatedly.
373 | 
374 | ## Reading Lines from a File
375 | 
376 | [`BufRead::lines`] makes it easy to read a file one line at a time:
377 | ```rust
378 | # fn blah() -> Result<(), std::io::Error> {
379 | # fn process(_: &str) {}
380 | use std::io::{self, BufRead};
381 | let mut lock = io::stdin().lock();
382 | for line in lock.lines() {
383 |     process(&line?);
384 | }
385 | # Ok(())
386 | # }
387 | ```
388 | But the iterator it produces returns `io::Result<String>`, which means it
389 | allocates for every line in the file.
390 | 
391 | [`BufRead::lines`]: https://doc.rust-lang.org/stable/std/io/trait.BufRead.html#method.lines
392 | 
393 | An alternative is to use a workhorse `String` in a loop over
394 | [`BufRead::read_line`]:
395 | ```rust
396 | # fn blah() -> Result<(), std::io::Error> {
397 | # fn process(_: &str) {}
398 | use std::io::{self, BufRead};
399 | let mut lock = io::stdin().lock();
400 | let mut line = String::new();
401 | while lock.read_line(&mut line)? != 0 {
402 |     process(&line);
403 |     line.clear();
404 | }
405 | # Ok(())
406 | # }
407 | ```
408 | This reduces the number of allocations to at most a handful, and possibly just
409 | one. (The exact number depends on how many times `line` needs to be
410 | reallocated, which depends on the distribution of line lengths in the file.)
411 | 
412 | This will only work if the loop body can operate on a `&str`, rather than a
413 | `String`.
414 | 
415 | [`BufRead::read_line`]: https://doc.rust-lang.org/stable/std/io/trait.BufRead.html#method.read_line
416 | 
417 | [**Example**](https://github.com/nnethercote/counts/commit/7d39bbb1867720ef3b9799fee739cd717ad1539a).
418 | 
419 | ## Using an Alternative Allocator
420 | 
421 | It is also possible to improve heap allocation performance without changing
422 | your code, simply by using a different allocator. See the [Alternative
423 | Allocators] section for details.
424 | 
425 | [Alternative Allocators]: build-configuration.md#alternative-allocators
426 | 
427 | ## Avoiding Regressions
428 | 
429 | To ensure the number and/or size of allocations done by your code doesn't
430 | increase unintentionally, you can use the *heap usage testing* feature of
431 | [dhat-rs] to write tests that check particular code snippets allocate the
432 | expected amount of heap memory.
433 | 
434 | [dhat-rs]: https://crates.io/crates/dhat
435 | 


--------------------------------------------------------------------------------
/src/inlining.md:
--------------------------------------------------------------------------------
  1 | # Inlining
  2 | 
  3 | Entry to and exit from hot, uninlined functions often accounts for a
  4 | non-trivial fraction of execution time. Inlining these functions removes these
  5 | entries and exits and can enable additional low-level optimizations by the
  6 | compiler. In the best case the result is a small but easy speed win.
  7 | 
  8 | There are four inline attributes that can be used on Rust functions.
  9 | - **None**. The compiler will decide itself if the function should be inlined.
 10 |   This will depend on factors such as the optimization level, the size of the
 11 |   function, whether the function is generic, and if the inlining is across a
 12 |   crate boundary.
 13 | - **`#[inline]`**. This suggests that the function should be inlined.
 14 | - **`#[inline(always)]`**. This strongly suggests that the function should be
 15 |   inlined.
 16 | - **`#[inline(never)]`**. This strongly suggests that the function should not
 17 |   be inlined.
 18 | 
 19 | Inline attributes do not guarantee that a function is inlined or not inlined,
 20 | but in practice `#[inline(always)]` will cause inlining in all but the most
 21 | exceptional cases.
 22 | 
 23 | Inlining is non-transitive. If a function `f` calls a function `g` and you want
 24 | both functions to be inlined together at a callsite to `f`, both functions
 25 | should be marked with an inline attribute.
 26 | 
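As a sketch of the non-transitive case, with hypothetical functions `f`, `g`,
and `caller`: if only `f` were marked, the call to `g` inside it could remain
an ordinary call after `f` is inlined. The attributes remain suggestions, so
the result should still be checked with a profiler.

```rust
#[inline]
fn g(x: u32) -> u32 {
    x.wrapping_mul(3)
}

// Marking `f` alone says nothing about `g`, so both carry the attribute.
#[inline]
fn f(x: u32) -> u32 {
    g(x).wrapping_add(1)
}

// The hot call site where `f` and `g` should be inlined together.
fn caller(values: &[u32]) -> u32 {
    values.iter().fold(0, |acc, &v| acc.wrapping_add(f(v)))
}
```
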
 27 | ## Simple Cases
 28 | 
 29 | The best candidates for inlining are (a) functions that are very small, or (b)
 30 | functions that have a single call site. The compiler will often inline these
 31 | functions itself even without an inline attribute. But the compiler cannot
 32 | always make the best choices, so attributes are sometimes needed.
 33 | [**Example 1**](https://github.com/rust-lang/rust/pull/37083/commits/6a4bb35b70862f33ac2491ffe6c55fb210c8490d),
 34 | [**Example 2**](https://github.com/rust-lang/rust/pull/50407/commits/e740b97be699c9445b8a1a7af6348ca2d4c460ce),
 35 | [**Example 3**](https://github.com/rust-lang/rust/pull/50564/commits/77c40f8c6f8cc472f6438f7724d60bf3b7718a0c),
 36 | [**Example 4**](https://github.com/rust-lang/rust/pull/57719/commits/92fd6f9d30d0b6b4ecbcf01534809fb66393f139),
 37 | [**Example 5**](https://github.com/rust-lang/rust/pull/69256/commits/e761f3af904b3c275bdebc73bb29ffc45384945d).
 38 | 
 39 | Cachegrind is a good profiler for determining if a function is inlined. When
 40 | looking at Cachegrind's output, you can tell that a function has been inlined
 41 | if (and only if) its first and last lines are *not* marked with event counts.
 42 | For example:
 43 | ```text
 44 |       .  #[inline(always)]
 45 |       .  fn inlined(x: u32, y: u32) -> u32 {
 46 | 700,000      eprintln!("inlined: {} + {}", x, y);
 47 | 200,000      x + y
 48 |       .  }
 49 |       .  
 50 |       .  #[inline(never)]
 51 | 400,000  fn not_inlined(x: u32, y: u32) -> u32 {
 52 | 700,000      eprintln!("not_inlined: {} + {}", x, y);
 53 | 200,000      x + y
 54 | 200,000  }
 55 | ```
 56 | You should measure again after adding inline attributes, because the effects
 57 | can be unpredictable. Sometimes it has no effect because a nearby function that
 58 | was previously inlined no longer is. Sometimes it slows the code down. Inlining
 59 | can also affect compile times, especially cross-crate inlining which involves
 60 | duplicating internal representations of the functions.
 61 | 
 62 | ## Harder Cases
 63 | 
 64 | Sometimes you have a function that is large and has multiple call sites, but
 65 | only one call site is hot. You would like to inline the hot call site for
 66 | speed, but not inline the cold call sites to avoid unnecessary code bloat. The
 67 | way to handle this is to split the function into always-inlined and
 68 | never-inlined variants, with the latter calling the former.
 69 | 
 70 | For example, this function:
 71 | ```rust
 72 | # fn one() {};
 73 | # fn two() {};
 74 | # fn three() {};
 75 | fn my_function() {
 76 |     one();
 77 |     two();
 78 |     three();
 79 | }
 80 | ```
 81 | Would become these two functions:
 82 | ```rust
 83 | # fn one() {};
 84 | # fn two() {};
 85 | # fn three() {};
 86 | // Use this at the hot call site.
 87 | #[inline(always)]
 88 | fn inlined_my_function() {
 89 |     one();
 90 |     two();
 91 |     three();
 92 | }
 93 | 
 94 | // Use this at the cold call sites.
 95 | #[inline(never)]
 96 | fn uninlined_my_function() {
 97 |     inlined_my_function();
 98 | }
 99 | ```
100 | [**Example 1**](https://github.com/rust-lang/rust/pull/53513/commits/b73843f9422fb487b2d26ac2d65f79f73a4c9ae3),
101 | [**Example 2**](https://github.com/rust-lang/rust/pull/64420/commits/a2261ad66400c3145f96ebff0d9b75e910fa89dd).
102 | 
103 | ## Outlining
104 | 
105 | The inverse of inlining is *outlining*: moving rarely executed code into a
106 | separate function. You can add a `#[cold]` attribute to such functions to tell
107 | the compiler that the function is rarely called. This can result in better code
108 | generation for the hot path.
109 | [**Example 1**](https://github.com/Lokathor/tinyvec/pull/127),
110 | [**Example 2**](https://crates.io/crates/fast_assert).
111 | 
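A minimal sketch of outlining follows. The function names are hypothetical,
and pairing `#[cold]` with `#[inline(never)]` is a choice made for this
example, to keep the rare path out of the caller's body.

```rust
// The rare path: reporting is moved out of the hot function.
#[cold]
#[inline(never)]
fn report_overflow(a: u32, b: u32) -> u32 {
    eprintln!("overflow adding {} and {}; saturating", a, b);
    u32::MAX
}

// The hot path stays small: one checked addition and a rarely taken branch.
fn add_or_saturate(a: u32, b: u32) -> u32 {
    match a.checked_add(b) {
        Some(sum) => sum,
        None => report_overflow(a, b),
    }
}
```

As with inlining, measure before and after to confirm that the change helps.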


--------------------------------------------------------------------------------
/src/introduction.md:
--------------------------------------------------------------------------------
 1 | # Introduction
 2 | 
 3 | Performance is important for many Rust programs. 
 4 | 
 5 | This book contains techniques that can improve the performance-related
 6 | characteristics of Rust programs, such as runtime speed, memory usage, and
 7 | binary size. The [Compile Times] section also contains techniques that will
 8 | improve the compile times of Rust programs. Some techniques only require
 9 | changing build configurations, but many require changing code.
10 | 
11 | [Compile Times]: compile-times.md
12 | 
13 | Some techniques are entirely Rust-specific, and some involve ideas that can be
14 | applied (often with modifications) to programs written in other languages. The
15 | [General Tips] section also includes some general principles that apply to any
16 | programming language. Nonetheless, this book is mostly about the performance of
17 | Rust programs and is no substitute for a general purpose guide to profiling and
18 | optimization.
19 | 
20 | [General Tips]: general-tips.md
21 | 
22 | This book also focuses on techniques that are practical and proven: many are
23 | accompanied by links to pull requests or other resources that show how the
24 | technique was used on a real-world Rust program. It reflects the primary
25 | author's background, being somewhat biased towards compiler development and
26 | away from other areas such as scientific computing.
27 | 
28 | This book is deliberately terse, favouring breadth over depth, so that it is
29 | quick to read. It links to external sources that provide more depth when
30 | appropriate.
31 | 
32 | This book is aimed at intermediate and advanced Rust users. Beginner Rust users
33 | have more than enough to learn and these techniques are likely to be an
34 | unhelpful distraction to them.
35 | 


--------------------------------------------------------------------------------
/src/io.md:
--------------------------------------------------------------------------------
  1 | # I/O
  2 | 
  3 | ## Locking
  4 | 
  5 | Rust's [`print!`] and [`println!`] macros lock stdout on every call. If you
  6 | have repeated calls to these macros it may be better to lock stdout manually.
  7 | 
  8 | [`print!`]: https://doc.rust-lang.org/std/macro.print.html
  9 | [`println!`]: https://doc.rust-lang.org/std/macro.println.html
 10 | 
 11 | For example, change this code:
 12 | ```rust
 13 | # let lines = vec!["one", "two", "three"];
 14 | for line in lines {
 15 |     println!("{}", line);
 16 | }
 17 | ```
 18 | to this:
 19 | ```rust
 20 | # fn blah() -> Result<(), std::io::Error> {
 21 | # let lines = vec!["one", "two", "three"];
 22 | use std::io::Write;
 23 | let stdout = std::io::stdout();
 24 | let mut lock = stdout.lock();
 25 | for line in lines {
 26 |     writeln!(lock, "{}", line)?;
 27 | }
 28 | // stdout is unlocked when `lock` is dropped
 29 | # Ok(())
 30 | # }
 31 | ```
 32 | stdin and stderr can likewise be locked when doing repeated operations on them.
 33 | 
 34 | ## Buffering
 35 | 
 36 | Rust file I/O is unbuffered by default. If you have many small and repeated
 37 | read or write calls to a file or network socket, use [`BufReader`] or
 38 | [`BufWriter`]. They maintain an in-memory buffer for input and output,
 39 | minimizing the number of system calls required.
 40 | 
 41 | [`BufReader`]: https://doc.rust-lang.org/std/io/struct.BufReader.html
 42 | [`BufWriter`]: https://doc.rust-lang.org/std/io/struct.BufWriter.html
 43 | 
 44 | For example, change this unbuffered writer code:
 45 | ```rust
 46 | # fn blah() -> Result<(), std::io::Error> {
 47 | # let lines = vec!["one", "two", "three"];
 48 | use std::io::Write;
 49 | let mut out = std::fs::File::create("test.txt")?;
 50 | for line in lines {
 51 |     writeln!(out, "{}", line)?;
 52 | }
 53 | # Ok(())
 54 | # }
 55 | ```
 56 | to this:
 57 | ```rust
 58 | # fn blah() -> Result<(), std::io::Error> {
 59 | # let lines = vec!["one", "two", "three"];
 60 | use std::io::{BufWriter, Write};
 61 | let mut out = BufWriter::new(std::fs::File::create("test.txt")?);
 62 | for line in lines {
 63 |     writeln!(out, "{}", line)?;
 64 | }
 65 | out.flush()?;
 66 | # Ok(())
 67 | # }
 68 | ```
 69 | [**Example 1**](https://github.com/rust-lang/rust/pull/93954),
 70 | [**Example 2**](https://github.com/nnethercote/dhat-rs/pull/22/commits/8c3ae26f1219474ee55c30bc9981e6af2e869be2).
 71 | 
 72 | The explicit call to [`flush`] is not strictly necessary, as flushing will
 73 | happen automatically when `out` is dropped. However, in that case any error
 74 | that occurs on flushing will be ignored, whereas an explicit flush lets the
 75 | caller see and handle that error.
 76 | 
 77 | [`flush`]: https://doc.rust-lang.org/std/io/trait.Write.html#tymethod.flush
 78 | 
 79 | Forgetting to buffer is more common when writing. Both unbuffered and buffered
 80 | writers implement the [`Write`] trait, which means the code for writing
 81 | to an unbuffered writer and a buffered writer is much the same. In contrast,
 82 | unbuffered readers implement the [`Read`] trait but buffered readers implement
 83 | the [`BufRead`] trait, which means the code for reading from an unbuffered reader
 84 | and a buffered reader is different. For example, it is difficult to read a file
 85 | line by line with an unbuffered reader, but it is trivial with a buffered
 86 | reader by using [`BufRead::read_line`] or [`BufRead::lines`]. For this reason,
 87 | it is hard to write an example for readers like the one above for writers,
 88 | where the before and after versions are so similar.
 89 | 
 90 | [`Write`]: https://doc.rust-lang.org/std/io/trait.Write.html
 91 | [`Read`]: https://doc.rust-lang.org/std/io/trait.Read.html
 92 | [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html
 93 | [`BufRead::read_line`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_line
 94 | [`BufRead::lines`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.lines
 95 | 
 96 | Finally, note that buffering also works with stdout, so you might want to
 97 | combine manual locking *and* buffering when making many writes to stdout.
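
The following is a minimal sketch of combining the two techniques. The helper
names `write_lines` and `print_lines` are hypothetical. `write_lines` is
generic over the writer so that the same code can target a locked stdout, a
file, or an in-memory buffer.

```rust
use std::io::{self, BufWriter, Write};

// Writes each line through a `BufWriter`, then flushes explicitly so that
// any write error is reported to the caller.
fn write_lines<W: Write>(out: W, lines: &[&str]) -> io::Result<()> {
    let mut out = BufWriter::new(out);
    for line in lines {
        writeln!(out, "{}", line)?;
    }
    out.flush()
}

// Locks stdout once and buffers all writes to it.
fn print_lines(lines: &[&str]) -> io::Result<()> {
    let stdout = io::stdout();
    write_lines(stdout.lock(), lines)
}
```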
 98 | 
 99 | ## Reading Lines from a File
100 | 
101 | [This section] explains how to avoid excessive allocations when using
102 | [`BufRead`] to read a file one line at a time.
103 | 
104 | [This section]: heap-allocations.md#reading-lines-from-a-file
105 | [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html
106 | 
107 | ## Reading Input as Raw Bytes
108 | 
109 | The built-in [String] type uses UTF-8 internally, which adds a small but
110 | nonzero overhead caused by UTF-8 validation when you read input into it. If you
111 | just want to process input bytes without worrying about UTF-8 (for example if
112 | you handle ASCII text), you can use [`BufRead::read_until`].
113 | 
114 | [String]: https://doc.rust-lang.org/std/string/struct.String.html
115 | [`BufRead::read_until`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_until
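
As an illustrative sketch, the hypothetical function below counts the lines
that contain a given byte. It reads into a reusable `Vec<u8>` with
`read_until`, so no UTF-8 validation takes place and invalid UTF-8 in the
input is not an error.

```rust
use std::io::{self, BufRead};

// Counts the lines that contain `needle`, treating the input as raw bytes.
fn count_lines_with_byte<R: BufRead>(
    mut reader: R,
    needle: u8,
) -> io::Result<usize> {
    let mut buf = Vec::new();
    let mut count = 0;
    // `read_until` appends to `buf` and returns the number of bytes read;
    // zero means end of input.
    while reader.read_until(b'\n', &mut buf)? != 0 {
        if buf.contains(&needle) {
            count += 1;
        }
        // Reuse the same allocation for the next line.
        buf.clear();
    }
    Ok(count)
}
```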
116 | 
117 | There are also dedicated crates for reading [byte-oriented lines of data]
118 | and working with [byte strings].
119 | 
120 | [byte-oriented lines of data]: https://github.com/Freaky/rust-linereader
121 | [byte strings]: https://github.com/BurntSushi/bstr
122 | 


--------------------------------------------------------------------------------
/src/iterators.md:
--------------------------------------------------------------------------------
 1 | # Iterators
 2 | 
 3 | ## `collect` and `extend`
 4 | 
 5 | [`Iterator::collect`] converts an iterator into a collection such as `Vec`,
 6 | which typically requires an allocation. You should avoid calling `collect` if
 7 | the collection is then only iterated over again.
 8 | 
 9 | [`Iterator::collect`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.collect
10 | 
11 | For this reason, it is often better to return an iterator type like `impl
12 | Iterator<Item=T>` from a function than a `Vec<T>`. Note that sometimes
13 | additional lifetimes are required on these return types, as [this blog post]
14 | explains.
15 | [**Example**](https://github.com/rust-lang/rust/pull/77990/commits/660d8a6550a126797aa66a417137e39a5639451b).
16 | 
17 | [this blog post]: https://blog.katona.me/2019/12/29/Rust-Lifetimes-and-Iterators/
18 | 
19 | Similarly, you can use [`extend`] to extend an existing collection (such as a
20 | `Vec`) with an iterator, rather than collecting the iterator into a `Vec` and
21 | then using [`append`].
22 | 
23 | [`extend`]: https://doc.rust-lang.org/std/iter/trait.Extend.html#tymethod.extend
24 | [`append`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.append
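
A small sketch of the difference, using a hypothetical `push_squares`
function. The commented-out version allocates a temporary `Vec` that is
immediately drained, while `extend` consumes the iterator directly.

```rust
// Appends the squares of `inputs` to `out` without an intermediate `Vec`.
fn push_squares(out: &mut Vec<u64>, inputs: &[u64]) {
    // Instead of:
    //   let mut tmp: Vec<u64> = inputs.iter().map(|x| x * x).collect();
    //   out.append(&mut tmp);
    out.extend(inputs.iter().map(|x| x * x));
}
```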
25 | 
26 | Finally, when you write an iterator it is often worth implementing the
27 | [`Iterator::size_hint`] or [`ExactSizeIterator::len`] method, if possible.
28 | `collect` and `extend` calls that use the iterator may then do fewer
29 | allocations, because they have advance information about the number of elements
30 | yielded by the iterator.
31 | 
32 | [`Iterator::size_hint`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.size_hint
33 | [`ExactSizeIterator::len`]: https://doc.rust-lang.org/std/iter/trait.ExactSizeIterator.html#method.len
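
The sketch below shows one way this might look for a hand-written iterator.
`Countdown` is a hypothetical type that yields `remaining`, `remaining - 1`,
and so on down to 1. Because its `size_hint` is exact, it can also implement
`ExactSizeIterator` and rely on the default `len` method.

```rust
// Hypothetical iterator that counts down from `remaining` to 1.
struct Countdown {
    remaining: usize,
}

impl Iterator for Countdown {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.remaining == 0 {
            None
        } else {
            let n = self.remaining;
            self.remaining -= 1;
            Some(n)
        }
    }

    // The default implementation returns `(0, None)`, which gives
    // `collect` and `extend` no information about how much to reserve.
    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.remaining, Some(self.remaining))
    }
}

// The default `len` is derived from the exact `size_hint` above.
impl ExactSizeIterator for Countdown {}
```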
34 | 
35 | ## Chaining
36 | 
37 | [`chain`] can be very convenient, but it can also be slower than a single
38 | iterator. It may be worth avoiding for hot iterators, if possible.
39 | [**Example**](https://github.com/rust-lang/rust/pull/64801/commits/5ca99b750e455e9b5e13e83d0d7886486231e48a).
40 | 
41 | Similarly, [`filter_map`] may be faster than using [`filter`] followed by
42 | [`map`].
43 | 
44 | [`chain`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.chain
45 | [`filter_map`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter_map
46 | [`filter`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter
47 | [`map`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.map
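
For instance, the hypothetical `parse_valid` function below keeps only the
fields that parse as integers. The `filter` plus `map` version shown in the
comment parses every valid field twice, while `filter_map` does the test and
the conversion in one step. Any speed difference should be confirmed by
measurement.

```rust
// Parses the fields that are valid integers and skips the rest.
fn parse_valid(fields: &[&str]) -> Vec<i32> {
    // Instead of:
    //   fields.iter()
    //       .filter(|s| s.parse::<i32>().is_ok())
    //       .map(|s| s.parse::<i32>().unwrap())
    //       .collect()
    fields
        .iter()
        .filter_map(|s| s.parse::<i32>().ok())
        .collect()
}
```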
48 | 
49 | ## Chunks
50 | 
51 | When a chunking iterator is required and the chunk size is known to exactly
52 | divide the slice length, use the faster [`slice::chunks_exact`] instead of [`slice::chunks`].
53 | 
54 | When the chunk size is not known to exactly divide the slice length, it can
55 | still be faster to use `slice::chunks_exact` in combination with either
56 | [`ChunksExact::remainder`] or manual handling of excess elements.
57 | [**Example 1**](https://github.com/johannesvollmer/exrs/pull/173/files),
58 | [**Example 2**](https://github.com/johannesvollmer/exrs/pull/175/files).
59 | 
60 | The same is true for related iterators:
61 | - [`slice::rchunks`], [`slice::rchunks_exact`], and [`RChunksExact::remainder`];
62 | - [`slice::chunks_mut`], [`slice::chunks_exact_mut`], and [`ChunksExactMut::into_remainder`];
63 | - [`slice::rchunks_mut`], [`slice::rchunks_exact_mut`], and [`RChunksExactMut::into_remainder`].
64 | 
65 | [`slice::chunks`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks
66 | [`slice::chunks_exact`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_exact
67 | [`ChunksExact::remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.ChunksExact.html#method.remainder
68 | 
69 | [`slice::rchunks`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks
70 | [`slice::rchunks_exact`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_exact
71 | [`RChunksExact::remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.RChunksExact.html#method.remainder
72 | 
73 | [`slice::chunks_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_mut
74 | [`slice::chunks_exact_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_exact_mut
75 | [`ChunksExactMut::into_remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.ChunksExactMut.html#method.into_remainder
76 | 
77 | [`slice::rchunks_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_mut
78 | [`slice::rchunks_exact_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_exact_mut
79 | [`RChunksExactMut::into_remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.RChunksExactMut.html#method.into_remainder
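
Here is a minimal sketch of the second case, where the chunk size may not
divide the slice length. The function name `sum_in_chunks` and the chunk size
of 4 are assumptions chosen for illustration.

```rust
// Sums `data` four elements at a time, then handles the leftovers.
fn sum_in_chunks(data: &[u32]) -> u32 {
    let mut chunks = data.chunks_exact(4);
    let mut total = 0u32;
    // `by_ref` borrows the iterator so that `remainder` can still be
    // called on it after the loop.
    for chunk in chunks.by_ref() {
        // Every `chunk` here has exactly 4 elements.
        total += chunk[0] + chunk[1] + chunk[2] + chunk[3];
    }
    // At most 3 elements that did not fill a complete chunk.
    for &x in chunks.remainder() {
        total += x;
    }
    total
}
```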
80 | 
81 | ## `copied`
82 | 
83 | When iterating over collections of small data types, such as integers, it may
84 | be better to use `iter().copied()` instead of `iter()`. Whatever consumes that
85 | iterator will receive the integers by value instead of by reference, and LLVM
86 | may generate better code in that case.
87 | [**Example 1**](https://github.com/rust-lang/rust/issues/106539),
88 | [**Example 2**](https://github.com/rust-lang/rust/issues/113789).
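
A tiny sketch of the pattern, using a hypothetical `largest` function. With
`copied`, `max` compares and returns `u32` values rather than `&u32`
references. Whether this produces better machine code depends on the
surrounding code.

```rust
// Returns the largest value by value, or `None` for an empty slice.
fn largest(values: &[u32]) -> Option<u32> {
    values.iter().copied().max()
}
```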
89 | 
90 | This is an advanced technique. You might need to check the generated machine
91 | code to be certain it is having an effect. See the [Machine
92 | Code](machine-code.md) chapter for details on how to do that.
93 | 


--------------------------------------------------------------------------------
/src/linting.md:
--------------------------------------------------------------------------------
 1 | # Linting
 2 | 
 3 | [Clippy] is a collection of lints to catch common mistakes in Rust code. It is
 4 | an excellent tool to run on Rust code in general. It can also help with
 5 | performance, because a number of the lints relate to code patterns that can
 6 | cause sub-optimal performance.
 7 | 
 8 | Given that automated detection of problems is preferable to manual detection,
 9 | the rest of this book will not mention performance problems that Clippy detects
10 | by default.
11 | 
12 | ## Basics
13 | 
14 | [Clippy]: https://github.com/rust-lang/rust-clippy
15 | 
16 | Once installed, it is easy to run:
17 | ```text
18 | cargo clippy
19 | ```
20 | The full list of performance lints can be seen by visiting the [lint list] and
21 | deselecting all the lint groups except for "Perf". 
22 | 
23 | [lint list]: https://rust-lang.github.io/rust-clippy/master/
24 | 
25 | As well as making the code faster, the performance lint suggestions usually
26 | result in code that is simpler and more idiomatic, so they are worth following
27 | even for code that is not executed frequently.
28 | 
29 | Conversely, some non-performance lint suggestions can improve performance. For
30 | example, the [`ptr_arg`] style lint suggests changing various container
31 | arguments to slices, such as changing `&mut Vec<T>` arguments to `&mut [T]`.
32 | The primary motivation here is that a slice gives a more flexible API, but it
33 | may also result in faster code due to less indirection and better optimization
34 | opportunities for the compiler.
35 | [**Example**](https://github.com/fschutt/fastblur/pull/3/files).
36 | 
37 | [`ptr_arg`]: https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg
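
As a sketch of the kind of change this lint suggests, the two hypothetical
functions below do the same work. The slice version also accepts arrays and
sub-slices, and it avoids going through a reference to the `Vec` itself.

```rust
// Before: callers must own a `Vec`, even though only its elements are used.
fn double_all_vec(values: &mut Vec<u32>) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

// After: any contiguous `u32` storage can be passed in.
fn double_all(values: &mut [u32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}
```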
38 | 
39 | ## Disallowing Types
40 | 
41 | In the following chapters we will see that it is sometimes worth avoiding
42 | certain standard library types in favour of alternatives that are faster. If
43 | you decide to use these alternatives, it is easy to accidentally use the
44 | standard library types in some places by mistake.
45 | 
46 | You can use Clippy's [`disallowed_types`] lint to avoid this problem. For
47 | example, to disallow the use of the standard hash tables (for reasons explained
48 | in the [Hashing] section) add a `clippy.toml` file to your code with the
49 | following line.
50 | ```toml
51 | disallowed-types = ["std::collections::HashMap", "std::collections::HashSet"]
52 | ```
53 | 
54 | [Hashing]: hashing.md
55 | [`disallowed_types`]: https://rust-lang.github.io/rust-clippy/master/index.html#disallowed_types
56 | 


--------------------------------------------------------------------------------
/src/logging-and-debugging.md:
--------------------------------------------------------------------------------
 1 | # Logging and Debugging
 2 | 
 3 | Sometimes logging code or debugging code can slow down a program significantly.
 4 | Either the logging/debugging code itself is slow, or data collection code that
 5 | feeds into logging/debugging code is slow. Make sure that no unnecessary work
 6 | is done for logging/debugging purposes when logging/debugging is not enabled.
 7 | [**Example 1**](https://github.com/rust-lang/rust/pull/50246/commits/2e4f66a86f7baa5644d18bb2adc07a8cd1c7409d),
 8 | [**Example 2**](https://github.com/rust-lang/rust/pull/75133/commits/eeb4b83289e09956e0dda174047729ca87c709fe),
 9 | [**Example 3**](https://github.com/rust-lang/rust/pull/147293/commits/cb0f969b623a7e12a0d8166c9a498e17a8b5a3c4).
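
A minimal sketch of the idea, with a hypothetical `process` function and a
plain `verbose` flag standing in for whatever mechanism enables logging. The
sorted copy exists only for the log message, so it is built only inside the
guarded branch.

```rust
// Sums `items`, and logs a sorted view of them when `verbose` is set.
fn process(items: &[u64], verbose: bool) -> u64 {
    let total: u64 = items.iter().sum();
    if verbose {
        // This allocation and sort are skipped when logging is disabled.
        let mut sorted = items.to_vec();
        sorted.sort_unstable();
        eprintln!("processed {:?}, total {}", sorted, total);
    }
    total
}
```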
10 | 
11 | Note that [`assert!`] calls always run, but [`debug_assert!`] calls only run in
12 | builds with debug assertions enabled (dev builds, by default). If you have a
13 | hot assertion that is not needed for safety, consider using `debug_assert!`.
14 | [**Example 1**](https://github.com/rust-lang/rust/pull/58210/commits/f7ed6e18160bc8fccf27a73c05f3935c9e8f672e),
15 | [**Example 2**](https://github.com/rust-lang/rust/pull/90746/commits/580d357b5adef605fc731d295ca53ab8532e26fb).
16 | 
17 | [`assert!`]: https://doc.rust-lang.org/std/macro.assert.html
18 | [`debug_assert!`]: https://doc.rust-lang.org/std/macro.debug_assert.html
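
For example, the hypothetical function below relies on its input being
sorted. Checking that takes O(n) time, which would dominate the O(log n)
search, so the check uses `debug_assert!` and is skipped when debug
assertions are disabled.

```rust
// Hypothetical hot function; callers are expected to pass a sorted slice.
fn contains_sorted(sorted: &[u32], target: u32) -> bool {
    // Linear-time sanity check that only runs with debug assertions on.
    debug_assert!(sorted.windows(2).all(|w| w[0] <= w[1]));
    sorted.binary_search(&target).is_ok()
}
```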
19 | 


--------------------------------------------------------------------------------
/src/machine-code.md:
--------------------------------------------------------------------------------
 1 | # Machine Code
 2 | 
 3 | When you have a small piece of very hot code it may be worth inspecting the
 4 | generated machine code to see if it has any inefficiencies, such as removable
 5 | [bounds checks]. The [Compiler Explorer] website is an excellent resource when
 6 | doing this on small snippets. [`cargo-show-asm`] is an alternative tool that
 7 | can be used on full Rust projects.
 8 | 
 9 | [bounds checks]: bounds-checks.md
10 | [Compiler Explorer]: https://godbolt.org/
11 | [`cargo-show-asm`]: https://github.com/pacak/cargo-show-asm
12 | 
13 | Relatedly, the [`core::arch`] module provides access to architecture-specific
14 | intrinsics, many of which relate to SIMD instructions.
15 | 
16 | [`core::arch`]: https://doc.rust-lang.org/core/arch/index.html
17 | 


--------------------------------------------------------------------------------
/src/parallelism.md:
--------------------------------------------------------------------------------
 1 | # Parallelism
 2 | 
 3 | Rust provides excellent support for safe parallel programming, which can lead
 4 | to large performance improvements. There are a variety of ways to introduce
 5 | parallelism into a program and the best way for any program will depend greatly
 6 | on its design. 
 7 | 
 8 | Having said that, an in-depth treatment of parallelism is beyond the scope of
 9 | this book.
10 | 
11 | If you are interested in thread-based parallelism, the documentation for the
12 | [`rayon`] and [`crossbeam`] crates is a good place to start. [Rust Atomics and
13 | Locks][Atomics] is also an excellent resource.
14 | 
15 | [`rayon`]: https://crates.io/crates/rayon
16 | [`crossbeam`]: https://crates.io/crates/crossbeam
17 | [Atomics]: https://marabos.nl/atomics/
18 | 
19 | If you are interested in fine-grained data parallelism, this [blog post] is a
20 | good overview of the state of SIMD support in Rust as of November 2025.
21 | 
22 | [blog post]: https://shnatsel.medium.com/the-state-of-simd-in-rust-in-2025-32c263e5f53d
23 | 
24 | 


--------------------------------------------------------------------------------
/src/profiling.md:
--------------------------------------------------------------------------------
  1 | # Profiling
  2 | 
  3 | When optimizing a program, you also need a way to determine which parts of the
  4 | program are "hot" (executed frequently enough to affect runtime) and worth
  5 | modifying. This is best done via profiling.
  6 | 
  7 | ## Profilers
  8 | 
  9 | There are many different profilers available, each with their strengths and
 10 | weaknesses. The following is an incomplete list of profilers that have been
 11 | used successfully on Rust programs.
 12 | - [perf] is a general-purpose profiler that uses hardware performance counters.
 13 |   It works on Linux. [Hotspot] and [Firefox Profiler] are good for viewing data
 14 |   recorded by perf.
 15 | - [Instruments] is a general-purpose profiler that comes with Xcode on macOS.
 16 | - [Intel VTune Profiler] is a general-purpose profiler. It works on Windows,
 17 |   Linux, and macOS.
 18 | - [AMD μProf] is a general-purpose profiler. It works on Windows and Linux.
 19 | - [samply] is a sampling profiler that produces profiles that can be viewed
 20 |   in the Firefox Profiler. It works on macOS, Linux, and Windows.
 21 | - [flamegraph] is a Cargo command that uses perf/DTrace to profile your
 22 |   code and then displays the results in a flame graph. It works on Linux and
 23 |   all platforms that support DTrace (macOS, FreeBSD, NetBSD, and possibly
 24 |   Windows).
 25 | - [Cachegrind] & [Callgrind] give global, per-function, and per-source-line
 26 |   instruction counts and simulated cache and branch prediction data. They work
 27 |   on Linux and some other Unixes.
 28 | - [DHAT] is good for finding which parts of the code are causing a lot of
 29 |   allocations, and for giving insight into peak memory usage. It can also be
 30 |   used to identify hot calls to `memcpy`. It works on Linux and some other
 31 |   Unixes. [dhat-rs] is an experimental alternative that is a little less
 32 |   powerful and requires minor changes to your Rust program, but works on all
 33 |   platforms.
 34 | - [Iai-Callgrind] provides `cargo bench` integration for the [Valgrind]-based
 35 |   profilers: Cachegrind, Callgrind, DHAT, and others. It works on Linux and
 36 |   some other Unixes.
 37 | - [heaptrack] and [bytehound] are heap profiling tools. They work on Linux.
 38 | - [`counts`] supports ad hoc profiling, which combines the use of `eprintln!`
 39 |   statements with frequency-based post-processing. This is good for getting
 40 |   domain-specific insights into parts of your code. It works on all platforms.
 41 | - [Coz] performs *causal profiling* to measure optimization potential, and has
 42 |   Rust support via [coz-rs]. It works on Linux. 
 43 | 
 44 | [perf]: https://perf.wiki.kernel.org/index.php/Main_Page
 45 | [Hotspot]: https://github.com/KDAB/hotspot
 46 | [Firefox Profiler]: https://profiler.firefox.com/
 47 | [Instruments]: https://developer.apple.com/forums/tags/instruments
 48 | [Intel VTune Profiler]: https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html
 49 | [AMD μProf]: https://developer.amd.com/amd-uprof/
 50 | [samply]: https://github.com/mstange/samply/
 51 | [flamegraph]: https://github.com/flamegraph-rs/flamegraph
 52 | [Cachegrind]: https://www.valgrind.org/docs/manual/cg-manual.html
 53 | [Callgrind]: https://www.valgrind.org/docs/manual/cl-manual.html
 54 | [Iai-Callgrind]: https://github.com/iai-callgrind/iai-callgrind
 55 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html
 56 | [dhat-rs]: https://github.com/nnethercote/dhat-rs/
 57 | [Valgrind]: https://valgrind.org/
 58 | [heaptrack]: https://github.com/KDE/heaptrack
 59 | [bytehound]: https://github.com/koute/bytehound
 60 | [`counts`]: https://github.com/nnethercote/counts/
 61 | [Coz]: https://github.com/plasma-umass/coz
 62 | [coz-rs]: https://github.com/plasma-umass/coz/tree/master/rust
 63 | 
 64 | ## Debug Info
 65 | 
 66 | To profile a release build effectively you might need to enable source line
 67 | debug info. To do this, add the following lines to your `Cargo.toml` file:
 68 | ```toml
 69 | [profile.release]
 70 | debug = "line-tables-only"
 71 | ```
 72 | See the [Cargo documentation] for more details about the `debug` setting.
 73 | 
 74 | [Cargo documentation]: https://doc.rust-lang.org/cargo/reference/profiles.html#debug
 75 | 
 76 | Unfortunately, even after doing the above step you won't get detailed profiling
 77 | information for standard library code. This is because shipped versions of the
 78 | Rust standard library are not built with debug info.
 79 | 
 80 | The most reliable way around this is to build your own version of the compiler
 81 | and standard library, following [these instructions], and adding the following
 82 | lines to a `bootstrap.toml` file in the repository root:
 83 | ```toml
 84 | [rust]
 85 | debuginfo-level = 1
 86 | ```
 87 | This is a hassle, but may be worth the effort in some cases.
 88 | 
 89 | [these instructions]: https://github.com/rust-lang/rust
 90 | 
 91 | Alternatively, the unstable [build-std] feature lets you compile the standard
 92 | library as part of your program's normal compilation, with the same build
 93 | configuration. However, filenames present in the debug info for the standard
 94 | library will not point to source code files, because this feature does not also
 95 | download standard library source code. So this approach will not help with
 96 | profilers such as Cachegrind and samply that require source code to work fully.
 97 | 
 98 | [build-std]: https://doc.rust-lang.org/cargo/reference/unstable.html#build-std
 99 | 
100 | ## Frame pointers
101 | 
102 | The Rust compiler may optimize away frame pointers, which can hurt the quality
103 | of profiling information such as stack traces. To force the compiler to use
104 | frame pointers, use the `-C force-frame-pointers=yes` flag. For example:
105 | ```bash
106 | RUSTFLAGS="-C force-frame-pointers=yes" cargo build --release
107 | ```
108 | 
109 | Alternatively, to force the use of frame pointers from a [`config.toml`] file
110 | (for one or more projects), add these lines:
111 | ```toml
112 | [build]
113 | rustflags = ["-C", "force-frame-pointers=yes"]
114 | ```
115 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html
116 | 
117 | ## Symbol Demangling
118 | 
119 | Rust uses a form of name mangling to encode function names in compiled code. If
120 | a profiler is unaware of this, its output may contain symbol names beginning
121 | with `_ZN` or `_R`, such as `_ZN3foo3barE` or
122 | `_ZN28_$u7b$$u7b$closure$u7d$$u7d$E` or
123 | `_RMCsno73SFvQKx_1cINtB0_3StrKRe616263_E`.
124 | 
125 | Names like these can be manually demangled using [`rustfilt`].
126 | 
127 | [`rustfilt`]: https://crates.io/crates/rustfilt
128 | 
129 | If you are having trouble with symbol demangling while profiling, it may be
130 | worth changing the [mangling format] from the default legacy format to the newer
131 | v0 format.
132 | 
133 | [mangling format]: https://doc.rust-lang.org/rustc/codegen-options/index.html#symbol-mangling-version
134 | 
135 | To use the v0 format from the command line, use the `-C
136 | symbol-mangling-version=v0` flag. For example:
137 | ```bash
138 | RUSTFLAGS="-C symbol-mangling-version=v0" cargo build --release
139 | ```
140 | 
141 | Alternatively, to use the v0 format from a [`config.toml`] file (for one or
142 | more projects), add these lines:
143 | ```toml
144 | [build]
145 | rustflags = ["-C", "symbol-mangling-version=v0"]
146 | ```
147 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html
148 | 
149 | 


--------------------------------------------------------------------------------
/src/standard-library-types.md:
--------------------------------------------------------------------------------
  1 | # Standard Library Types
  2 | 
  3 | It is worth reading through the documentation for common standard library
  4 | types—such as [`Vec`], [`Option`], [`Result`], and [`Rc`]/[`Arc`]—to find interesting
  5 | functions that can sometimes be used to improve performance.
  6 | 
  7 | [`Vec`]: https://doc.rust-lang.org/std/vec/struct.Vec.html
  8 | [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html
  9 | [`Result`]: https://doc.rust-lang.org/std/result/enum.Result.html
 10 | [`Rc`]: https://doc.rust-lang.org/std/rc/struct.Rc.html
 11 | [`Arc`]: https://doc.rust-lang.org/std/sync/struct.Arc.html
 12 | 
 13 | It is also worth knowing about high-performance alternatives to standard
 14 | library types, such as [`Mutex`], [`RwLock`], [`Condvar`], and
 15 | [`Once`].
 16 | 
 17 | [`Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html
 18 | [`RwLock`]: https://doc.rust-lang.org/std/sync/struct.RwLock.html
 19 | [`Condvar`]: https://doc.rust-lang.org/std/sync/struct.Condvar.html
 20 | [`Once`]: https://doc.rust-lang.org/std/sync/struct.Once.html
 21 | 
 22 | ## `Vec`
 23 | 
 24 | The best way to create a zero-filled `Vec` of length `n` is with `vec![0; n]`.
 25 | This is simple and probably [as fast as or faster than] alternatives, such as
 26 | using `resize`, `extend`, or anything involving `unsafe`, because it can use OS
 27 | assistance.
 28 | 
 29 | [as fast as or faster than]: https://github.com/rust-lang/rust/issues/54628
 30 | 
 31 | [`Vec::remove`] removes an element at a particular index and shifts all
 32 | subsequent elements one to the left, which makes it O(n). [`Vec::swap_remove`]
 33 | replaces an element at a particular index with the final element, which does
 34 | not preserve ordering, but is O(1).
 35 | 
 36 | [`Vec::retain`] efficiently removes multiple items from a `Vec`. There is an
 37 | equivalent method for other collection types such as `String`, `HashSet`, and
 38 | `HashMap`.
 39 | 
 40 | [`Vec::remove`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.remove
 41 | [`Vec::swap_remove`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.swap_remove
 42 | [`Vec::retain`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain
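
The following sketch illustrates both methods with two hypothetical helper
functions. Note how `swap_remove` changes the order of the remaining
elements, while `retain` preserves it.

```rust
// Removes the element at `index` in O(1); the last element takes its place.
fn remove_unordered(v: &mut Vec<u32>, index: usize) -> u32 {
    v.swap_remove(index)
}

// Removes every odd element in a single pass, keeping the original order.
fn keep_even(v: &mut Vec<u32>) {
    v.retain(|x| x % 2 == 0);
}
```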
 43 | 
 44 | ## `Option` and `Result`
 45 | 
 46 | [`Option::ok_or`] converts an `Option` into a `Result`, and is passed an `err`
 47 | parameter that is used if the `Option` value is `None`. `err` is computed
 48 | eagerly. If its computation is expensive, you should instead use
 49 | [`Option::ok_or_else`], which computes the error value lazily via a closure.
 50 | For example, this:
 51 | ```rust
 52 | # fn expensive() {}
 53 | # let o: Option<u32> = None;
 54 | let r = o.ok_or(expensive()); // always evaluates `expensive()`
 55 | ```
 56 | should be changed to this:
 57 | ```rust
 58 | # fn expensive() {}
 59 | # let o: Option<u32> = None;
 60 | let r = o.ok_or_else(|| expensive()); // evaluates `expensive()` only when needed
 61 | ```
 62 | [**Example**](https://github.com/rust-lang/rust/pull/50051/commits/5070dea2366104fb0b5c344ce7f2a5cf8af176b0).
 63 | 
 64 | [`Option::ok_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.ok_or
 65 | [`Option::ok_or_else`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.ok_or_else
 66 | 
 67 | There are similar alternatives for [`Option::map_or`], [`Option::unwrap_or`],
 68 | [`Result::or`], [`Result::map_or`], and [`Result::unwrap_or`].
 69 | 
 70 | [`Option::map_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.map_or
 71 | [`Option::unwrap_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.unwrap_or
 72 | [`Result::or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.or
 73 | [`Result::map_or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.map_or
 74 | [`Result::unwrap_or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.unwrap_or
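
As a short sketch, the lazy counterparts take a closure (or function) in
place of the eagerly computed value, as in `Option::unwrap_or_else` and
`Option::map_or_else` below. The function `expensive_default` is a
hypothetical stand-in for a costly computation.

```rust
// Hypothetical stand-in for a default value that is costly to build.
fn expensive_default() -> String {
    "default".repeat(1000)
}

fn name_or_default(name: Option<String>) -> String {
    // `name.unwrap_or(expensive_default())` would build the default even
    // when `name` is `Some`.
    name.unwrap_or_else(expensive_default)
}

fn len_or_default(name: Option<&str>) -> usize {
    // The first closure runs only when `name` is `None`.
    name.map_or_else(|| expensive_default().len(), |s| s.len())
}
```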
 75 | 
 76 | ## `Rc`/`Arc`
 77 | 
 78 | [`Rc::make_mut`]/[`Arc::make_mut`] provide clone-on-write semantics. They make
 79 | a mutable reference to an `Rc`/`Arc`. If the refcount is greater than one, they
 80 | will `clone` the inner value to ensure unique ownership; otherwise, they will
 81 | modify the original value. They are not needed often, but they can be extremely
 82 | useful on occasion.
 83 | [**Example 1**](https://github.com/rust-lang/rust/pull/65198/commits/3832a634d3aa6a7c60448906e6656a22f7e35628),
 84 | [**Example 2**](https://github.com/rust-lang/rust/pull/65198/commits/75e0078a1703448a19e25eac85daaa5a4e6e68ac).
 85 | 
 86 | [`Rc::make_mut`]: https://doc.rust-lang.org/std/rc/struct.Rc.html#method.make_mut
 87 | [`Arc::make_mut`]: https://doc.rust-lang.org/std/sync/struct.Arc.html#method.make_mut
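
A minimal sketch of the clone-on-write behaviour, using a hypothetical
`push_shared` function. The inner `Vec` is cloned only if some other `Rc`
still points to it.

```rust
use std::rc::Rc;

// Appends to a possibly shared vector. If `data` is the only reference,
// the vector is modified in place; otherwise `data` is first given its
// own copy, leaving the other references untouched.
fn push_shared(data: &mut Rc<Vec<u32>>, value: u32) {
    Rc::make_mut(data).push(value);
}
```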
 88 | 
 89 | ## `Mutex`, `RwLock`, `Condvar`, and `Once`
 90 | 
 91 | The [`parking_lot`] crate provides alternative implementations of these
 92 | synchronization types. The APIs and semantics of the `parking_lot` types are
 93 | similar but not identical to those of the equivalent types in the standard
 94 | library.
 95 | 
 96 | The `parking_lot` versions used to be reliably smaller, faster, and more
 97 | flexible than those in the standard library, but the standard library versions
 98 | have greatly improved on some platforms. So you should measure before switching
 99 | to `parking_lot`.
100 | 
101 | [`parking_lot`]: https://crates.io/crates/parking_lot
102 | 
103 | If you decide to universally use the `parking_lot` types, it is easy to
104 | accidentally use the standard library equivalents in some places. You can [use
105 | Clippy] to avoid this problem.
106 | 
107 | [use Clippy]: linting.md#disallowing-types
108 | 


--------------------------------------------------------------------------------
/src/title-page.md:
--------------------------------------------------------------------------------
1 | # <span style="font-size: 150%">The Rust Performance Book</span>
2 | 
3 | **<span style="font-size: 130%">First published in November 2020</span>**
4 | 
5 | **<span style="font-size: 130%">Written by Nicholas Nethercote and others</span>**
6 | 
7 | [Source code](https://github.com/nnethercote/perf-book)
8 | 


--------------------------------------------------------------------------------
/src/type-sizes.md:
--------------------------------------------------------------------------------
  1 | # Type Sizes
  2 | 
  3 | Shrinking oft-instantiated types can help performance.
  4 | 
  5 | For example, if memory usage is high, a heap profiler like [DHAT] can identify
  6 | the hot allocation points and the types involved. Shrinking these types can
  7 | reduce peak memory usage, and possibly improve performance by reducing memory
  8 | traffic and cache pressure.
  9 | 
 10 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html
 11 | 
 12 | Furthermore, Rust types that are larger than 128 bytes are copied with `memcpy`
 13 | rather than inline code. If `memcpy` shows up in non-trivial amounts in
 14 | profiles, DHAT's "copy profiling" mode will tell you exactly where the hot
 15 | `memcpy` calls are and the types involved. Shrinking these types to 128 bytes
 16 | or less can make the code faster by avoiding `memcpy` calls and reducing memory
 17 | traffic.
 18 | 
 19 | ## Measuring Type Sizes
 20 | 
 21 | [`std::mem::size_of`] gives the size of a type, in bytes, but often you want to
 22 | know the exact layout as well. For example, an enum might be surprisingly large
 23 | due to a single outsized variant.
 24 | 
 25 | [`std::mem::size_of`]: https://doc.rust-lang.org/std/mem/fn.size_of.html
 26 | 
 27 | The `-Zprint-type-sizes` option does exactly this. It isn’t enabled on release
 28 | versions of rustc, so you’ll need to use a nightly version of rustc. Here is
 29 | one possible invocation via Cargo:
 30 | ```text
 31 | RUSTFLAGS=-Zprint-type-sizes cargo +nightly build --release
 32 | ```
 33 | And here is a possible invocation of rustc:
 34 | ```text
 35 | rustc +nightly -Zprint-type-sizes input.rs
 36 | ```
 37 | It will print out details of the size, layout, and alignment of all types in
 38 | use. For example, for this type:
 39 | ```rust
 40 | enum E {
 41 |     A,
 42 |     B(i32),
 43 |     C(u64, u8, u64, u8),
 44 |     D(Vec<u32>),
 45 | }
 46 | ```
 47 | it prints the following, plus information about a few built-in types.
 48 | ```text
 49 | print-type-size type: `E`: 32 bytes, alignment: 8 bytes
 50 | print-type-size     discriminant: 1 bytes
 51 | print-type-size     variant `D`: 31 bytes
 52 | print-type-size         padding: 7 bytes
 53 | print-type-size         field `.0`: 24 bytes, alignment: 8 bytes
 54 | print-type-size     variant `C`: 23 bytes
 55 | print-type-size         field `.1`: 1 bytes
 56 | print-type-size         field `.3`: 1 bytes
 57 | print-type-size         padding: 5 bytes
 58 | print-type-size         field `.0`: 8 bytes, alignment: 8 bytes
 59 | print-type-size         field `.2`: 8 bytes
 60 | print-type-size     variant `B`: 7 bytes
 61 | print-type-size         padding: 3 bytes
 62 | print-type-size         field `.0`: 4 bytes, alignment: 4 bytes
 63 | print-type-size     variant `A`: 0 bytes
 64 | ```
 65 | The output shows the following.
 66 | - The size and alignment of the type.
 67 | - For enums, the size of the discriminant.
 68 | - For enums, the size of each variant (sorted from largest to smallest).
 69 | - The size, alignment, and ordering of all fields. (Note that the compiler has
 70 |   reordered variant `C`'s fields to minimize the size of `E`.)
 71 | - The size and location of all padding.
 72 | 
 73 | Alternatively, the [top-type-sizes] crate can be used to display the output in
 74 | a more compact form.
 75 | 
 76 | [top-type-sizes]: https://crates.io/crates/top-type-sizes
 77 | 
 78 | Once you know the layout of a hot type, there are multiple ways to shrink it.
 79 | 
 80 | ## Field Ordering
 81 | 
 82 | The Rust compiler automatically sorts the fields in structs and enums to
 83 | minimize their sizes (unless the `#[repr(C)]` attribute is specified), so you
 84 | do not have to worry about field ordering. But there are other ways to minimize
 85 | the size of hot types.
 86 | 
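The effect of automatic reordering can be observed with `size_of`. In this
sketch the two struct names are hypothetical. The `#[repr(C)]` size follows
from the C layout rules on platforms where `u32` is 4-byte aligned, while the
default layout is unspecified, so the example only asserts that it is no
larger.

```rust
use std::mem::size_of;

// Default representation: the compiler is free to reorder the fields.
#[allow(dead_code)]
struct Reordered {
    a: u8,
    b: u32,
    c: u8,
}

// `#[repr(C)]` keeps declaration order, so padding is needed around `b`.
#[allow(dead_code)]
#[repr(C)]
struct DeclOrder {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    // 1 byte + 3 padding + 4 bytes + 1 byte + 3 padding.
    assert_eq!(size_of::<DeclOrder>(), 12);
    // The default layout is not guaranteed, so only compare the sizes.
    assert!(size_of::<Reordered>() <= size_of::<DeclOrder>());
    println!(
        "default: {} bytes, repr(C): {} bytes",
        size_of::<Reordered>(),
        size_of::<DeclOrder>()
    );
}
```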
 87 | ## Smaller Enums
 88 | 
 89 | If an enum has an outsized variant, consider boxing one or more fields. For
 90 | example, you could change this type:
 91 | ```rust
 92 | type LargeType = [u8; 100];
 93 | enum A {
 94 |     X,
 95 |     Y(i32),
 96 |     Z(i32, LargeType),
 97 | }
 98 | ```
 99 | to this:
100 | ```rust
101 | # type LargeType = [u8; 100];
102 | enum A {
103 |     X,
104 |     Y(i32),
105 |     Z(Box<(i32, LargeType)>),
106 | }
107 | ```
108 | This reduces the type size at the cost of requiring an extra heap allocation
109 | for the `A::Z` variant. This is more likely to be a net performance win if the
110 | `A::Z` variant is relatively rare. The `Box` will also make `A::Z` slightly
111 | less ergonomic to use, especially in `match` patterns.
112 | [**Example 1**](https://github.com/rust-lang/rust/pull/37445/commits/a920e355ea837a950b484b5791051337cd371f5d),
113 | [**Example 2**](https://github.com/rust-lang/rust/pull/55346/commits/38d9277a77e982e49df07725b62b21c423b6428e),
114 | [**Example 3**](https://github.com/rust-lang/rust/pull/64302/commits/b972ac818c98373b6d045956b049dc34932c41be),
115 | [**Example 4**](https://github.com/rust-lang/rust/pull/64374/commits/2fcd870711ce267c79408ec631f7eba8e0afcdf6),
116 | [**Example 5**](https://github.com/rust-lang/rust/pull/64394/commits/7f0637da5144c7435e88ea3805021882f077d50c),
117 | [**Example 6**](https://github.com/rust-lang/rust/pull/71942/commits/27ae2f0d60d9201133e1f9ec7a04c05c8e55e665).
118 | 
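To check the effect of boxing, compare the two layouts with `size_of`. This is
a sketch: the names `Inline`, `Boxed`, and `first_byte` are hypothetical, and
exact sizes depend on the platform, so the assertions only state what must
hold.

```rust
use std::mem::size_of;

type LargeType = [u8; 100];

#[allow(dead_code)]
enum Inline {
    X,
    Y(i32),
    Z(i32, LargeType),
}

#[allow(dead_code)]
enum Boxed {
    X,
    Y(i32),
    Z(Box<(i32, LargeType)>),
}

/// Reads the first byte of the large payload, if there is one. The boxed
/// tuple is bound as a whole and its fields are reached through the `Box`.
fn first_byte(value: &Boxed) -> Option<u8> {
    match value {
        Boxed::Z(payload) => Some(payload.1[0]),
        _ => None,
    }
}

fn main() {
    // The inline variant must hold an `i32` and 100 bytes.
    assert!(size_of::<Inline>() >= 104);
    // The boxed variant only holds a pointer.
    assert!(size_of::<Boxed>() < size_of::<Inline>());
    println!("{} vs {}", size_of::<Inline>(), size_of::<Boxed>());

    let z = Boxed::Z(Box::new((1, [7; 100])));
    assert_eq!(first_byte(&z), Some(7));
}
```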
119 | ## Smaller Integers
120 | 
121 | It is often possible to shrink types by using smaller integer types. For
122 | example, while it is most natural to use `usize` for indices, it is often
123 | reasonable to store indices as `u32`, `u16`, or even `u8`, and then cast to
124 | `usize` at use points.
125 | [**Example 1**](https://github.com/rust-lang/rust/pull/49993/commits/4d34bfd00a57f8a8bdb60ec3f908c5d4256f8a9a),
126 | [**Example 2**](https://github.com/rust-lang/rust/pull/50981/commits/8d0fad5d3832c6c1f14542ea0be038274e454524).
127 | 
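A minimal sketch of this pattern follows. The `WideSpan` and `NarrowSpan`
types are hypothetical. The narrow version checks the range once at
construction and casts back to `usize` only where it indexes.

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct WideSpan {
    start: usize,
    len: usize,
}

struct NarrowSpan {
    start: u32,
    len: u16,
}

impl NarrowSpan {
    /// Returns `None` if the values do not fit in the narrower fields.
    fn new(start: usize, len: usize) -> Option<Self> {
        if start > u32::MAX as usize || len > u16::MAX as usize {
            return None;
        }
        Some(NarrowSpan { start: start as u32, len: len as u16 })
    }

    /// Casts back to `usize` at the use point.
    fn slice<'a>(&self, text: &'a str) -> &'a str {
        let start = self.start as usize;
        &text[start..start + self.len as usize]
    }
}

fn main() {
    assert!(size_of::<NarrowSpan>() <= size_of::<WideSpan>());
    let span = NarrowSpan::new(6, 5).unwrap();
    assert_eq!(span.slice("hello world"), "world");
    assert!(NarrowSpan::new(0, 100_000).is_none());
}
```

The range check matters: a plain `as` cast from `usize` to a smaller type
silently truncates values that do not fit.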
128 | ## Boxed Slices
129 | 
130 | Rust vectors contain three words: a length, a capacity, and a pointer. If you
131 | have a vector that is unlikely to be changed in the future, you can convert it
132 | to a *boxed slice* with [`Vec::into_boxed_slice`]. A boxed slice contains only
133 | two words, a length and a pointer. Any excess element capacity is dropped,
134 | which may cause a reallocation.
135 | ```rust
136 | # use std::mem::{size_of, size_of_val};
137 | let v: Vec<u32> = vec![1, 2, 3];
138 | assert_eq!(size_of_val(&v), 3 * size_of::<usize>());
139 | 
140 | let bs: Box<[u32]> = v.into_boxed_slice();
141 | assert_eq!(size_of_val(&bs), 2 * size_of::<usize>());
142 | ```
143 | Alternatively, a boxed slice can be constructed directly from an iterator with
144 | [`Iterator::collect`], avoiding the need for reallocation.
145 | ```rust
146 | let bs: Box<[u32]> = (1..3).collect();
147 | ```
148 | A boxed slice can be converted to a vector with [`slice::into_vec`] without any
149 | cloning or reallocation.
150 | 
151 | [`Vec::into_boxed_slice`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.into_boxed_slice
152 | [`Iterator::collect`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.collect
153 | [`slice::into_vec`]: https://doc.rust-lang.org/std/primitive.slice.html#method.into_vec
154 | 
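The round trip can be sketched as follows. The `round_trip` helper is
hypothetical. It compares buffer addresses to show that `into_vec` reuses the
existing allocation.

```rust
/// Converts a vector to a boxed slice and back. Returns the final vector and
/// whether the heap buffer kept its address across the `into_vec` step.
fn round_trip(v: Vec<u32>) -> (Vec<u32>, bool) {
    let bs: Box<[u32]> = v.into_boxed_slice();
    let before = bs.as_ptr();
    let v2 = bs.into_vec();
    let same_buffer = before == v2.as_ptr();
    (v2, same_buffer)
}

fn main() {
    // Start with excess capacity, which `into_boxed_slice` drops.
    let mut v = Vec::with_capacity(10);
    v.extend([1, 2, 3]);

    let (v2, same_buffer) = round_trip(v);
    assert!(same_buffer);
    assert_eq!(v2, [1, 2, 3]);
    assert_eq!(v2.capacity(), 3);
}
```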
155 | ## `ThinVec`
156 | 
157 | An alternative to boxed slices is `ThinVec`, from the [`thin_vec`] crate. It is
158 | functionally equivalent to `Vec`, but stores the length and capacity in the
159 | same allocation as the elements (if there are any). This means that
160 | `size_of::<ThinVec<T>>` is only one word.
161 | 
162 | `ThinVec` is a good choice within oft-instantiated types for vectors that are
163 | often empty. It can also be used to shrink the largest variant of an enum, if
164 | that variant contains a `Vec`.
165 | 
166 | [`thin_vec`]: https://crates.io/crates/thin-vec
167 | 
168 | ## Avoiding Regressions
169 | 
170 | If a type is hot enough that its size can affect performance, it is a good idea
171 | to use a static assertion to ensure that it does not accidentally regress. The
172 | following example uses a macro from the [`static_assertions`] crate.
173 | ```rust,ignore
174 | // This type is used a lot. Make sure it doesn't unintentionally get bigger.
175 | #[cfg(target_arch = "x86_64")]
176 | static_assertions::assert_eq_size!(HotType, [u8; 64]);
177 | ```
178 | The `cfg` attribute is important, because type sizes can vary on different
179 | platforms. Restricting the assertion to `x86_64` (which is typically the most
180 | widely used platform) is likely to be good enough to prevent regressions in
181 | practice.
182 | 
183 | [`static_assertions`]: https://crates.io/crates/static_assertions
184 | 
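If you would rather not add a dependency, a similar check can be sketched with
a `const` item and `assert!`, which works in constant contexts on reasonably
recent compilers. The `HotType` definition below is a hypothetical example
whose fields add up to 16 bytes.

```rust
use std::mem::size_of;

// Hypothetical hot type: 8 + 4 + 2 + 1 + 1 = 16 bytes, with no padding.
#[allow(dead_code)]
struct HotType {
    a: u64,
    b: u32,
    c: u16,
    d: u8,
    e: u8,
}

// Evaluated at compile time. The build fails if the size changes.
#[cfg(target_arch = "x86_64")]
const _: () = assert!(size_of::<HotType>() == 16);

fn main() {
    println!("HotType is {} bytes", size_of::<HotType>());
}
```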


--------------------------------------------------------------------------------
/src/wrapper-types.md:
--------------------------------------------------------------------------------
 1 | # Wrapper Types
 2 | 
 3 | Rust has a variety of "wrapper" types, such as [`RefCell`] and [`Mutex`], that
 4 | provide special behavior for values. Accessing these values can take a
 5 | non-trivial amount of time. If multiple such values are typically accessed
 6 | together, it may be better to put them within a single wrapper.
 7 | 
 8 | [`RefCell`]: https://doc.rust-lang.org/std/cell/struct.RefCell.html
 9 | [`Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html
10 | 
11 | For example, a struct like this:
12 | ```rust
13 | # use std::sync::{Arc, Mutex};
14 | struct S {
15 |     x: Arc<Mutex<u32>>,
16 |     y: Arc<Mutex<u32>>,
17 | }
18 | ```
19 | may be better represented like this:
20 | ```rust
21 | # use std::sync::{Arc, Mutex};
22 | struct S {
23 |     xy: Arc<Mutex<(u32, u32)>>,
24 | }
25 | ```
26 | Whether or not this helps performance will depend on the exact access patterns
27 | of the values.
28 | [**Example**](https://github.com/rust-lang/rust/pull/68694/commits/7426853ba255940b880f2e7f8026d60b94b42404).
29 | 
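The difference shows up at the access sites. In this sketch the two layouts
are named `Separate` and `Combined` (hypothetical names) so they can sit side
by side. Reading both values takes two lock acquisitions in the first case and
one in the second.

```rust
use std::sync::{Arc, Mutex};

struct Separate {
    x: Arc<Mutex<u32>>,
    y: Arc<Mutex<u32>>,
}

struct Combined {
    xy: Arc<Mutex<(u32, u32)>>,
}

/// Two lock/unlock pairs. Each guard is dropped at the end of its statement.
fn sum_separate(s: &Separate) -> u32 {
    let x = *s.x.lock().unwrap();
    let y = *s.y.lock().unwrap();
    x + y
}

/// One lock/unlock pair covers both values.
fn sum_combined(s: &Combined) -> u32 {
    let guard = s.xy.lock().unwrap();
    guard.0 + guard.1
}

fn main() {
    let s = Separate {
        x: Arc::new(Mutex::new(2)),
        y: Arc::new(Mutex::new(3)),
    };
    let c = Combined { xy: Arc::new(Mutex::new((2, 3))) };
    assert_eq!(sum_separate(&s), 5);
    assert_eq!(sum_combined(&c), 5);
}
```

The combined form also reads both values under the same lock, so they cannot
change between the two reads. The cost is that threads touching only one of
the values now contend for the same lock.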


--------------------------------------------------------------------------------