├── .editorconfig ├── .github └── workflows │ └── ci.yml ├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE-APACHE ├── LICENSE-MIT ├── README.md ├── book.toml └── src ├── SUMMARY.md ├── benchmarking.md ├── bounds-checks.md ├── build-configuration.md ├── compile-times.md ├── general-tips.md ├── hashing.md ├── heap-allocations.md ├── inlining.md ├── introduction.md ├── io.md ├── iterators.md ├── linting.md ├── logging-and-debugging.md ├── machine-code.md ├── parallelism.md ├── profiling.md ├── standard-library-types.md ├── title-page.md ├── type-sizes.md └── wrapper-types.md /.editorconfig: -------------------------------------------------------------------------------- 1 | [*.md] 2 | max_line_length = 79 3 | -------------------------------------------------------------------------------- /.github/workflows/ci.yml: -------------------------------------------------------------------------------- 1 | name: CI 2 | 3 | on: 4 | pull_request: 5 | push: 6 | branches: 7 | - master 8 | 9 | jobs: 10 | test_and_maybe_deploy: 11 | runs-on: ubuntu-latest 12 | steps: 13 | - name: Clone repository 14 | uses: actions/checkout@v4 15 | 16 | - name: Setup mdbook 17 | uses: peaceiris/actions-mdbook@v2 18 | with: 19 | mdbook-version: "latest" 20 | 21 | # EPUB 22 | # Currently disabled due to 23 | # https://github.com/nnethercote/perf-book/actions/runs/6358429874/job/17270643057 24 | #- name: Setup mdbook-epub 25 | # run: cargo install mdbook-epub 26 | 27 | - name: Build 28 | run: mdbook build 29 | 30 | - name: Test 31 | run: mdbook test 32 | 33 | # EPUB 34 | #- name: Copy ePub 35 | # run: cp book/epub/The\ Rust\ Performance\ Book.epub book/html 36 | 37 | - name: Deploy 38 | uses: peaceiris/actions-gh-pages@v4 39 | with: 40 | github_token: ${{ secrets.GITHUB_TOKEN }} 41 | #publish_dir: ./book/html # use if EPUB is enabled 42 | publish_dir: ./book # use if EPUB is disabled 43 | # Only deploy on a push to master, not on a pull request. 
44 | if: github.event_name == 'push' && github.ref == 'refs/heads/master' && github.repository == 'nnethercote/perf-book' 45 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | book 2 | 3 | # Prevent Vim swap files from making `mdbook serve` regenerate HTML frequently. 4 | *.sw* 5 | 6 | # Also `diff` files, which I generate a lot. 7 | diff 8 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # The Rust Performance Book Code of Conduct 2 | 3 | This repository uses the [Rust Code of Conduct]. 4 | 5 | [Rust Code of Conduct]: https://www.rust-lang.org/conduct.html 6 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Style Guide for The Rust Performance Book 2 | 3 | These style guidelines are used for the book. 4 | 5 | ## Line Lengths 6 | 7 | Lines of text are limited to 79 characters. (There is a `.editorconfig` file 8 | that specifies this.) Lines containing non-text elements, such as links, can be 9 | longer. 10 | 11 | ## Examples 12 | 13 | Links to examples that demonstrate performance techniques on real-world 14 | programs are encouraged. These examples might be pull requests, blog posts, 15 | etc. 16 | 17 | Single examples are written like this: 18 | ```markdown 19 | [**Example**](https://github.com/rust-lang/rust/pull/37373/commits/c440a7ae654fb641e68a9ee53b03bf3f7133c2fe). 
20 | ``` 21 | 22 | Multiple examples are written like this: 23 | ```markdown 24 | [**Example 1**](https://github.com/rust-lang/rust/pull/77990/commits/45faeb43aecdc98c9e3f2b24edf2ecc71f39d323), 25 | [**Example 2**](https://github.com/rust-lang/rust/pull/51870/commits/b0c78120e3ecae5f4043781f7a3f79e2277293e7). 26 | ``` 27 | 28 | ## Title Style 29 | 30 | Section titles are capitalized, which means that all words within the title are 31 | capitalized, other than "small" words such as conjunctions. For example, "Using 32 | an Alternative Allocator", rather than "Using an alternative allocator". 33 | 34 | ## External Link Style 35 | 36 | For external links—those that point outside the book—reference links are 37 | preferred to inline links. For example, this: 38 | ```markdown 39 | The book's title is [The Rust Performance Book]. 40 | 41 | [The Rust Performance Book]: https://nnethercote.github.io/perf-book/ 42 | ``` 43 | is preferred to this: 44 | ```markdown 45 | The book's title is [The Rust Performance Book](https://nnethercote.github.io/perf-book/). 46 | ``` 47 | The reason for this preference is that external links are usually relatively 48 | long, and long inline links often break awkwardly across lines. 49 | 50 | One exception to this rule is that **Example** links are inline, with each one 51 | put on its own line, as seen above. 52 | -------------------------------------------------------------------------------- /LICENSE-APACHE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 
14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 
175 | 176 | END OF TERMS AND CONDITIONS 177 | -------------------------------------------------------------------------------- /LICENSE-MIT: -------------------------------------------------------------------------------- 1 | Permission is hereby granted, free of charge, to any 2 | person obtaining a copy of this software and associated 3 | documentation files (the "Software"), to deal in the 4 | Software without restriction, including without 5 | limitation the rights to use, copy, modify, merge, 6 | publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software 8 | is furnished to do so, subject to the following 9 | conditions: 10 | 11 | The above copyright notice and this permission notice 12 | shall be included in all copies or substantial portions 13 | of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF 16 | ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED 17 | TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 18 | PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT 19 | SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY 20 | CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR 22 | IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 23 | DEALINGS IN THE SOFTWARE. 24 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # perf-book 2 | 3 | The Rust Performance Book. 4 | 5 | ## Viewing 6 | 7 | The rendered (HTML) book is [here](https://nnethercote.github.io/perf-book/). 
8 | 9 | 20 | 21 | ## Building 22 | 23 | The book is built with [`mdbook`](https://github.com/rust-lang/mdBook), which 24 | can be installed with this command: 25 | ``` 26 | cargo install mdbook 27 | ``` 28 | To build the book, run this command: 29 | ``` 30 | mdbook build 31 | ``` 32 | The generated files are put in the `book/` directory. 33 | 34 | ## Development 35 | 36 | To view the built book, run this command: 37 | ``` 38 | mdbook serve 39 | ``` 40 | This will launch a local web server to serve the book. View the built book by 41 | navigating to `localhost:3000` in a web browser. While the web server is 42 | running, the rendered book will automatically update if the book's files 43 | change. 44 | 45 | To test the code within the book, run this command: 46 | ``` 47 | mdbook test 48 | ``` 49 | 50 | ## Improvements 51 | 52 | Suggestions for improvements are welcome, but I prefer them to be filed as 53 | issues rather than pull requests. This is because I am very particular about 54 | the wording used in the book. When pull requests are made, I typically take the 55 | underlying idea of a pull request and rewrite it into my own words anyway. 56 | 57 | This book contains no material produced by generative AI, and none will be 58 | accepted. 59 | 60 | ## License 61 | 62 | Licensed under either of 63 | * Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or 64 | http://www.apache.org/licenses/LICENSE-2.0) 65 | * MIT license ([LICENSE-MIT](LICENSE-MIT) or 66 | http://opensource.org/licenses/MIT) 67 | 68 | at your option. 69 | 70 | ## Contribution 71 | 72 | Unless you explicitly state otherwise, any contribution intentionally submitted 73 | for inclusion in the work by you, as defined in the Apache-2.0 license, shall 74 | be dual licensed as above, without any additional terms or conditions. 
75 | -------------------------------------------------------------------------------- /book.toml: -------------------------------------------------------------------------------- 1 | [book] 2 | title = "The Rust Performance Book" 3 | authors = ["Nicholas Nethercote"] 4 | src = "src" 5 | language = "en" 6 | multilingual = false 7 | 8 | [build] 9 | create-missing = false 10 | 11 | [rust] 12 | edition = "2018" 13 | 14 | [output.html] 15 | curly-quotes = true 16 | default-theme = "rust" 17 | git-repository-url = "https://github.com/nnethercote/perf-book" 18 | edit-url-template = "https://github.com/nnethercote/perf-book/edit/master/{path}" 19 | site-url = "https://nnethercote.github.io/perf-book/" 20 | 21 | # EPUB 22 | # Currently disabled due to 23 | # https://github.com/nnethercote/perf-book/actions/runs/6358429874/job/17270643057 24 | #[output.epub] 25 | #optional = true # So epub generation is skipped if mdbook-epub isn't installed. 26 | 27 | -------------------------------------------------------------------------------- /src/SUMMARY.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | 3 | [Title Page](title-page.md) 4 | 5 | - [Introduction](introduction.md) 6 | - [Benchmarking](benchmarking.md) 7 | - [Build Configuration](build-configuration.md) 8 | - [Linting](linting.md) 9 | - [Profiling](profiling.md) 10 | - [Inlining](inlining.md) 11 | - [Hashing](hashing.md) 12 | - [Heap Allocations](heap-allocations.md) 13 | - [Type Sizes](type-sizes.md) 14 | - [Standard Library Types](standard-library-types.md) 15 | - [Iterators](iterators.md) 16 | - [Bounds Checks](bounds-checks.md) 17 | - [I/O](io.md) 18 | - [Logging and Debugging](logging-and-debugging.md) 19 | - [Wrapper Types](wrapper-types.md) 20 | - [Machine Code](machine-code.md) 21 | - [Parallelism](parallelism.md) 22 | - [General Tips](general-tips.md) 23 | - [Compile Times](compile-times.md) 24 | 25 | 
-------------------------------------------------------------------------------- /src/benchmarking.md: -------------------------------------------------------------------------------- 1 | # Benchmarking 2 | 3 | Benchmarking typically involves comparing the performance of two or more 4 | programs that do the same thing. Sometimes this might involve comparing two or 5 | more different programs, e.g. Firefox vs Safari vs Chrome. Sometimes it 6 | involves comparing two different versions of the same program. This latter case 7 | lets us reliably answer the question "did this change speed things up?" 8 | 9 | Benchmarking is a complex topic and thorough coverage is beyond the scope of 10 | this book, but here are the basics. 11 | 12 | First, you need workloads to measure. Ideally, you would have a variety of 13 | workloads that represent realistic usage of your program. Workloads using 14 | real-world inputs are best, but [microbenchmarks] and [stress tests] can be 15 | useful in moderation. 16 | 17 | [microbenchmarks]: https://stackoverflow.com/questions/2842695/what-is-microbenchmarking 18 | [stress tests]: https://en.wikipedia.org/wiki/Stress_testing_(software) 19 | 20 | Second, you need a way to run the workloads, which will also dictate the 21 | metrics used. 22 | - Rust's built-in [benchmark tests] are a simple starting point, but they use 23 | unstable features and therefore only work on nightly Rust. 24 | - [Criterion] and [Divan] are more sophisticated alternatives. 25 | - [Hyperfine] is an excellent general-purpose benchmarking tool. 26 | - [Bencher] can do continuous benchmarking on CI, including GitHub CI. 27 | - Custom benchmarking harnesses are also possible. For example, [rustc-perf] is 28 | the harness used to benchmark the Rust compiler.
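
The last bullet above mentions custom harnesses. The following is only a
minimal sketch of one, assuming wall-time is the metric and that a handful of
repeated runs is enough. The names `sum_of_squares`, `measure`, and `median`
are hypothetical and exist only for illustration; they are not part of any of
the tools listed above.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical workload standing in for the code being measured.
fn sum_of_squares(n: u64) -> u64 {
    (0..n).map(|x| x * x).sum()
}

/// Runs `workload` `runs` times and records the wall-time of each run.
fn measure<F: FnMut()>(runs: usize, mut workload: F) -> Vec<Duration> {
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        workload();
        samples.push(start.elapsed());
    }
    samples
}

/// Returns the (upper) median, which is less sensitive to outliers than
/// the mean. Panics if `samples` is empty.
fn median(samples: &mut [Duration]) -> Duration {
    samples.sort();
    samples[samples.len() / 2]
}

fn main() {
    let mut samples = measure(11, || {
        // `black_box` discourages the optimizer from deleting the work.
        black_box(sum_of_squares(black_box(1_000_000)));
    });
    println!("median wall-time: {:?}", median(&mut samples));
}
```

Run a harness like this with `cargo run --release`, because timings from an
unoptimized build say little about optimized code. A dedicated tool from the
list above is likely to give more trustworthy statistics than this sketch.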
29 | 30 | [benchmark tests]: https://doc.rust-lang.org/nightly/unstable-book/library-features/test.html 31 | [Criterion]: https://github.com/bheisler/criterion.rs 32 | [Divan]: https://github.com/nvzqz/divan 33 | [Hyperfine]: https://github.com/sharkdp/hyperfine 34 | [Bencher]: https://github.com/bencherdev/bencher 35 | [rustc-perf]: https://github.com/rust-lang/rustc-perf/ 36 | 37 | When it comes to metrics, there are many choices, and the right one(s) will 38 | depend on the nature of the program being benchmarked. For example, metrics 39 | that make sense for a batch program might not make sense for an interactive 40 | program. Wall-time is an obvious choice in many cases because it corresponds to 41 | what users perceive. However, it can suffer from high variance. In particular, 42 | tiny changes in memory layout can cause significant but ephemeral performance 43 | fluctuations. Therefore, other metrics with lower variance (such as cycles or 44 | instruction counts) may be a reasonable alternative. 45 | 46 | Summarizing measurements from multiple workloads is also a challenge, and there 47 | are a variety of ways to do it, with no single method being obviously best. 48 | 49 | Good benchmarking is hard. Having said that, do not stress too much about 50 | having a perfect benchmarking setup, particularly when you start optimizing a 51 | program. Mediocre benchmarking is far better than no benchmarking. Keep an open 52 | mind about what you are measuring, and over time you can make benchmarking 53 | improvements as you learn about the performance characteristics of your 54 | program. 55 | -------------------------------------------------------------------------------- /src/bounds-checks.md: -------------------------------------------------------------------------------- 1 | # Bounds Checks 2 | 3 | By default, accesses to container types such as slices and vectors involve 4 | bounds checks in Rust. These can affect performance, e.g. 
within hot loops, 5 | though less often than you might expect. 6 | 7 | There are several safe ways to change code so that the compiler knows about 8 | container lengths and can optimize away bounds checks. 9 | 10 | - Replace direct element accesses in a loop by using iteration. 11 | - Instead of indexing into a `Vec` within a loop, make a slice of the `Vec` 12 | before the loop and then index into the slice within the loop. 13 | - Add assertions on the ranges of index variables. 14 | [**Example 1**](https://github.com/rust-random/rand/pull/960/commits/de9dfdd86851032d942eb583d8d438e06085867b), 15 | [**Example 2**](https://github.com/image-rs/jpeg-decoder/pull/167/files). 16 | 17 | Getting these to work can be tricky. The [Bounds Check Cookbook] goes into more 18 | detail on this topic. 19 | 20 | [Bounds Check Cookbook]: https://github.com/Shnatsel/bounds-check-cookbook/ 21 | 22 | As a last resort, there are the unsafe methods [`get_unchecked`] and 23 | [`get_unchecked_mut`]. 24 | 25 | [`get_unchecked`]: https://doc.rust-lang.org/std/primitive.slice.html#method.get_unchecked 26 | [`get_unchecked_mut`]: https://doc.rust-lang.org/std/primitive.slice.html#method.get_unchecked_mut 27 | 28 | -------------------------------------------------------------------------------- /src/build-configuration.md: -------------------------------------------------------------------------------- 1 | # Build Configuration 2 | 3 | You can drastically change the performance of a Rust program without changing 4 | its code, just by changing its build configuration. There are many possible 5 | build configurations for each Rust program. The one chosen will affect several 6 | characteristics of the compiled code, such as compile times, runtime speed, 7 | memory use, binary size, debuggability, profilability, and which architectures 8 | your compiled program will run on. 9 | 10 | Most configuration choices will improve one or more characteristics while 11 | worsening one or more others. 
For example, a common trade-off is to accept 12 | worse compile times in exchange for higher runtime speeds. The right choice 13 | for your program depends on your needs and the specifics of your program, and 14 | performance-related choices (which is most of them) should be validated with 15 | benchmarking. 16 | 17 | It is worth reading this chapter carefully to understand all the build 18 | configuration choices. However, for the impatient or forgetful, 19 | [`cargo-wizard`] encapsulates this information and can help you choose an 20 | appropriate build configuration. 21 | 22 | Note that Cargo only looks at the profile settings in the `Cargo.toml` file at 23 | the root of the workspace. Profile settings defined in dependencies are 24 | ignored. Therefore, these options are mostly relevant for binary crates, not 25 | library crates. 26 | 27 | [`cargo-wizard`]: https://github.com/Kobzol/cargo-wizard 28 | 29 | ## Release Builds 30 | 31 | The single most important build configuration choice is simple but [easy to 32 | overlook]: make sure you are using a [release build] rather than a [dev build] 33 | when you want high performance. This is usually done by specifying the 34 | `--release` flag to Cargo. 35 | 36 | [easy to overlook]: https://users.rust-lang.org/t/why-my-rust-program-is-so-slow/47764/5 37 | [release build]: https://doc.rust-lang.org/cargo/reference/profiles.html#release 38 | [dev build]: https://doc.rust-lang.org/cargo/reference/profiles.html#dev 39 | 40 | Dev builds are the default. They are good for debugging, but are not optimized. 41 | They are produced if you run `cargo build` or `cargo run`. (Alternatively, 42 | running `rustc` without additional options also produces an unoptimized build.) 43 | 44 | Consider the following final line of output from a `cargo build` run. 45 | ```text 46 | Finished dev [unoptimized + debuginfo] target(s) in 29.80s 47 | ``` 48 | This output indicates that a dev build has been produced. 
The compiled code 49 | will be placed in the `target/debug/` directory. `cargo run` will run the dev 50 | build. 51 | 52 | In comparison, release builds are much more optimized, omit debug assertions 53 | and integer overflow checks, and omit debug info. 10-100x speedups over dev 54 | builds are common! Release builds are produced if you run `cargo build --release` or 55 | `cargo run --release`. (Alternatively, `rustc` has multiple options for 56 | optimized builds, such as `-O` and `-C opt-level`.) A release build typically takes 57 | longer to compile than a dev build because of the additional optimizations. 58 | 59 | Consider the following final line of output from a `cargo build --release` run. 60 | ```text 61 | Finished release [optimized] target(s) in 1m 01s 62 | ``` 63 | This output indicates that a release build has been produced. The compiled code 64 | will be placed in the `target/release/` directory. `cargo run --release` will 65 | run the release build. 66 | 67 | See the [Cargo profile documentation] for more details about the differences 68 | between dev builds (which use the `dev` profile) and release builds (which use 69 | the `release` profile). 70 | 71 | [Cargo profile documentation]: https://doc.rust-lang.org/cargo/reference/profiles.html 72 | 73 | The default build configuration choices used in release builds provide a good 74 | balance between the above-mentioned characteristics such as compile times, runtime 75 | speed, and binary size. But there are many possible adjustments, as the 76 | following sections explain. 77 | 78 | ## Maximizing Runtime Speed 79 | 80 | The following build configuration options are designed primarily to maximize 81 | runtime speed. Some of them may also reduce binary size. 82 | 83 | ### Codegen Units 84 | 85 | The Rust compiler splits crates into multiple [codegen units] to parallelize 86 | (and thus speed up) compilation. However, this might cause it to miss some 87 | potential optimizations.
You may be able to improve runtime speed and reduce 88 | binary size, at the cost of increased compile times, by setting the number of 89 | units to one. Add these lines to the `Cargo.toml` file: 90 | ```toml 91 | [profile.release] 92 | codegen-units = 1 93 | ``` 94 | 96 | [**Example 1**](http://likebike.com/posts/How_To_Write_Fast_Rust_Code.html#emit-asm), 97 | [**Example 2**](https://github.com/rust-lang/rust/pull/115554#issuecomment-1742192440). 98 | 99 | [codegen units]: https://doc.rust-lang.org/cargo/reference/profiles.html#codegen-units 100 | 101 | ### Link-time Optimization 102 | 103 | [Link-time optimization] (LTO) is a whole-program optimization technique that 104 | can improve runtime speed by 10-20% or more, and also reduce binary size, at 105 | the cost of worse compile times. It comes in several forms. 106 | 107 | [Link-time optimization]: https://doc.rust-lang.org/cargo/reference/profiles.html#lto 108 | 109 | The first form of LTO is *thin local LTO*, a lightweight form of LTO. By 110 | default the compiler uses this for any build that involves a non-zero level of 111 | optimization. This includes release builds. To explicitly request this level of 112 | LTO, put these lines in the `Cargo.toml` file: 113 | ```toml 114 | [profile.release] 115 | lto = false 116 | ``` 117 | 118 | The second form of LTO is *thin LTO*, which is a little more aggressive, and 119 | likely to improve runtime speed and reduce binary size while also increasing 120 | compile times. Use `lto = "thin"` in `Cargo.toml` to enable it. 121 | 122 | The third form of LTO is *fat LTO*, which is even more aggressive, and may 123 | improve performance and reduce binary size further (but [not always]) while 124 | increasing build times again. Use `lto = "fat"` in `Cargo.toml` to enable it. 
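
As a sketch of how the thin and fat forms described above would be requested,
assuming the setting goes in the same `[profile.release]` section already shown
for `lto = false`:

```toml
[profile.release]
# Pick one value; "thin" and "fat" are the two forms described above.
lto = "fat"   # or: lto = "thin"
```

Because fat LTO does not always beat thin LTO, it is worth benchmarking both
settings on your own program before settling on one.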
125 | 126 | [not always]: https://github.com/rust-lang/rust/pull/103453 127 | 128 | Finally, it is possible to fully disable LTO, which will likely worsen runtime 129 | speed and increase binary size but reduce compile times. Use `lto = "off"` in 130 | `Cargo.toml` for this. Note that this is different to the `lto = false` option, 131 | which, as mentioned above, leaves thin local LTO enabled. 132 | 133 | ### Alternative Allocators 134 | 135 | It is possible to replace the default (system) heap allocator used by a Rust 136 | program with an alternative allocator. The exact effect will depend on the 137 | individual program and the alternative allocator chosen, but large improvements 138 | in runtime speed and large reductions in memory usage have been seen in 139 | practice. The effect will also vary across platforms, because each platform's 140 | system allocator has its own strengths and weaknesses. The use of an 141 | alternative allocator is also likely to increase binary size and compile times. 142 | 143 | #### jemalloc 144 | 145 | One popular alternative allocator for Linux and Mac is [jemalloc], usable via 146 | the [`tikv-jemallocator`] crate. To use it, add a dependency to your 147 | `Cargo.toml` file: 148 | ```toml 149 | [dependencies] 150 | tikv-jemallocator = "0.5" 151 | ``` 152 | Then add the following to your Rust code, e.g. at the top of `src/main.rs`: 153 | ```rust,ignore 154 | #[global_allocator] 155 | static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc; 156 | ``` 157 | 158 | Furthermore, on Linux, jemalloc can be configured to use [transparent huge 159 | pages][THP] (THP). This can further speed up programs, possibly at the cost of 160 | higher memory usage. 
161 | 162 | [THP]: https://www.kernel.org/doc/html/next/admin-guide/mm/transhuge.html 163 | 164 | Do this by setting the `MALLOC_CONF` environment variable (or perhaps 165 | [`_RJEM_MALLOC_CONF`]) appropriately before building your program, for example: 166 | ```bash 167 | MALLOC_CONF="thp:always,metadata_thp:always" cargo build --release 168 | ``` 169 | The system running the compiled program also has to be configured to support 170 | THP. See [this blog post] for more details. 171 | 172 | [`_RJEM_MALLOC_CONF`]: https://github.com/tikv/jemallocator/issues/65 173 | [this blog post]: https://kobzol.github.io/rust/rustc/2023/10/21/make-rust-compiler-5percent-faster.html 174 | 175 | #### mimalloc 176 | 177 | Another alternative allocator that works on many platforms is [mimalloc], 178 | usable via the [`mimalloc`] crate. To use it, add a dependency to your 179 | `Cargo.toml` file: 180 | ```toml 181 | [dependencies] 182 | mimalloc = "0.1" 183 | ``` 184 | Then add the following to your Rust code, e.g. at the top of `src/main.rs`: 185 | ```rust,ignore 186 | #[global_allocator] 187 | static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc; 188 | ``` 189 | 190 | [jemalloc]: https://github.com/jemalloc/jemalloc 191 | [`tikv-jemallocator`]: https://crates.io/crates/tikv-jemallocator 192 | [better performance]: https://github.com/rust-lang/rust/pull/83152 193 | [mimalloc]: https://github.com/microsoft/mimalloc 194 | [`mimalloc`]: https://crates.io/crates/mimalloc 195 | 196 | ### CPU Specific Instructions 197 | 198 | If you do not care about the compatibility of your binary on older (or other 199 | types of) processors, you can tell the compiler to generate the newest (and 200 | potentially fastest) instructions specific to a [certain CPU architecture], 201 | such as AVX SIMD instructions for x86-64 CPUs. 
202 | 203 | [certain CPU architecture]: https://doc.rust-lang.org/rustc/codegen-options/index.html#target-cpu 204 | 205 | To request these instructions from the command line, use the `-C 206 | target-cpu=native` flag. For example: 207 | ```bash 208 | RUSTFLAGS="-C target-cpu=native" cargo build --release 209 | ``` 210 | 211 | Alternatively, to request these instructions from a [`config.toml`] file (for 212 | one or more projects), add these lines: 213 | ```toml 214 | [build] 215 | rustflags = ["-C", "target-cpu=native"] 216 | ``` 217 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html 218 | 219 | This can improve runtime speed, especially if the compiler finds vectorization 220 | opportunities in your code. 221 | 222 | If you are unsure whether `-C target-cpu=native` is working optimally, compare 223 | the output of `rustc --print cfg` and `rustc --print cfg -C target-cpu=native` 224 | to see if the CPU features are being detected correctly in the latter case. If 225 | not, you can use `-C target-feature` to target specific features. 226 | 227 | ### Profile-guided Optimization 228 | 229 | Profile-guided optimization (PGO) is a compilation model where you compile 230 | your program, run it on sample data while collecting profiling data, and then 231 | use that profiling data to guide a second compilation of the program. This can 232 | improve runtime speed by 10% or more. 233 | [**Example 1**](https://blog.rust-lang.org/inside-rust/2020/11/11/exploring-pgo-for-the-rust-compiler.html), 234 | [**Example 2**](https://github.com/rust-lang/rust/pull/96978). 235 | 236 | It is an advanced technique that takes some effort to set up, but is worthwhile 237 | in some cases. See the [rustc PGO documentation] for details. Also, the 238 | [`cargo-pgo`] command makes it easier to use PGO (and [BOLT], which is similar) 239 | to optimize Rust binaries. 
240 | 241 | Unfortunately, PGO is not supported for binaries hosted on crates.io and 242 | distributed via `cargo install`, which limits its usability. 243 | 244 | [rustc PGO documentation]: https://doc.rust-lang.org/rustc/profile-guided-optimization.html 245 | [`cargo-pgo`]: https://github.com/Kobzol/cargo-pgo 246 | [BOLT]: https://github.com/llvm/llvm-project/tree/main/bolt 247 | 248 | ## Minimizing Binary Size 249 | 250 | The following build configuration options are designed primarily to minimize 251 | binary size. Their effects on runtime speed vary. 252 | 253 | ### Optimization Level 254 | 255 | You can request an [optimization level] that aims to minimize binary size by 256 | adding these lines to the `Cargo.toml` file: 257 | ```toml 258 | [profile.release] 259 | opt-level = "z" 260 | ``` 261 | [optimization level]: https://doc.rust-lang.org/cargo/reference/profiles.html#opt-level 262 | 263 | This may also reduce runtime speed. 264 | 265 | An alternative is `opt-level = "s"`, which targets minimal binary size a little 266 | less aggressively. Compared to `opt-level = "z"`, it allows [slightly more 267 | inlining] and also the vectorization of loops. 268 | 269 | [slightly more inlining]: https://doc.rust-lang.org/rustc/codegen-options/index.html#inline-threshold 270 | 271 | ### Abort on `panic!` 272 | 273 | If you do not need to unwind on panic, e.g. because your program doesn't use 274 | [`catch_unwind`], you can tell the compiler to simply [abort on panic]. On 275 | panic, your program will still produce a backtrace. 276 | 277 | [`catch_unwind`]: https://doc.rust-lang.org/std/panic/fn.catch_unwind.html 278 | [abort on panic]: https://doc.rust-lang.org/cargo/reference/profiles.html#panic 279 | 280 | This might reduce binary size and increase runtime speed slightly, and may even 281 | reduce compile times slightly. 
Add these lines to the `Cargo.toml` file: 282 | ```toml 283 | [profile.release] 284 | panic = "abort" 285 | ``` 286 | 287 | ### Strip Symbols 288 | 289 | You can tell the compiler to [strip] symbols from a release build by adding 290 | these lines to `Cargo.toml`: 291 | ```toml 292 | [profile.release] 293 | strip = "symbols" 294 | ``` 295 | [strip]: https://doc.rust-lang.org/cargo/reference/profiles.html#strip 296 | 297 | [**Example**](https://github.com/nnethercote/counts/commit/53cab44cd09ff1aa80de70a6dbe1893ff8a41142). 298 | 299 | However, stripping symbols may make your compiled program more difficult to 300 | debug and profile. For example, if a stripped program panics, the backtrace 301 | produced may contain less useful information than normal. The exact effects 302 | depend on the platform. 303 | 304 | Debug info does not need to be stripped from release builds. By default, debug 305 | info is not generated for local release builds, and debug info for the standard 306 | library has been stripped automatically in release builds [since Rust 1.77]. 307 | 308 | [since Rust 1.77]: https://blog.rust-lang.org/2024/03/21/Rust-1.77.0.html#enable-strip-in-release-profiles-by-default 309 | 310 | ### Other Ideas 311 | 312 | For more advanced binary size minimization techniques, consult the 313 | comprehensive documentation in the excellent [`min-sized-rust`] repository. 314 | 315 | [`min-sized-rust`]: https://github.com/johnthagen/min-sized-rust 316 | 317 | ## Minimizing Compile Times 318 | 319 | The following build configuration options are designed primarily to minimize 320 | compile times. 321 | 322 | ### Linking 323 | 324 | A big part of compile time is actually linking time, particularly when 325 | rebuilding a program after a small change. On some platforms it is possible to 326 | select a faster linker than the default one. 327 | 328 | One option is [lld], which is available on Linux and Windows. lld has been the 329 | default linker on Linux [since Rust 1.90]. 
It is not yet the default on 330 | Windows, but it should work for most use cases. 331 | 332 | [since Rust 1.90]: https://blog.rust-lang.org/2025/09/01/rust-lld-on-1.90.0-stable/ 333 | 334 | To specify lld 335 | from the command line, use the `-C link-arg=-fuse-ld=lld` flag. For example: 336 | ```bash 337 | RUSTFLAGS="-C link-arg=-fuse-ld=lld" cargo build --release 338 | ``` 339 | 340 | [lld]: https://lld.llvm.org/ 341 | 342 | Alternatively, to specify lld from a [`config.toml`] file (for one or more 343 | projects), add these lines: 344 | ```toml 345 | [build] 346 | rustflags = ["-C", "link-arg=-fuse-ld=lld"] 347 | ``` 348 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html 349 | 350 | There is a [GitHub Issue] tracking full 351 | support for lld. 352 | 353 | [GitHub Issue]: https://github.com/rust-lang/rust/issues/39915#issuecomment-618726211 354 | 355 | Another option is [mold], which is currently available on Linux. 356 | Simply substitute `mold` for `lld` in the instructions above. mold is often 357 | faster than lld. 358 | [**Example**](https://davidlattimore.github.io/posts/2024/02/04/speeding-up-the-rust-edit-build-run-cycle.html). 359 | It is also much newer and may not work in all cases. 360 | 361 | [mold]: https://github.com/rui314/mold 362 | 363 | A final option is [wild], which is currently only available on Linux. It may be 364 | even faster than mold, but it is less mature. 365 | 366 | [wild]: https://github.com/davidlattimore/wild 367 | 368 | On Mac, an alternative linker isn't necessary because the system linker is 369 | fast. 370 | 371 | Unlike the other options in this chapter, there are no trade-offs to choosing 372 | another linker. As long as the linker works correctly for your program, which 373 | is likely to be true unless you are doing unusual things, an alternative 374 | linker can be dramatically faster without any downsides. 
375 | 376 | ### Disable Debug Info Generation 377 | 378 | Although release builds give the best performance, many people use dev builds 379 | while developing because they build more quickly. If you use dev builds but 380 | don't often use a debugger, consider disabling debuginfo. This can improve dev 381 | build times significantly, by as much as 20-40%. 382 | [**Example.**](https://kobzol.github.io/rust/rustc/2025/05/20/disable-debuginfo-to-improve-rust-compile-times.html) 383 | 384 | To disable debug info generation, add these lines to the `Cargo.toml` file: 385 | ```toml 386 | [profile.dev] 387 | debug = false 388 | ``` 389 | Note that this means that stack traces will not contain line information. If 390 | you want to keep that line information, but do not require full information for 391 | the debugger, you can use `debug = "line-tables-only"` instead, which still 392 | gives most of the compile time benefits. 393 | 394 | ### Experimental Parallel Front-end 395 | 396 | If you use nightly Rust, you can enable the experimental [parallel front-end]. 397 | It may reduce compile times at the cost of higher compile-time memory usage. It 398 | won't affect the quality of the generated code. 399 | 400 | [parallel front-end]: https://blog.rust-lang.org/2023/11/09/parallel-rustc.html 401 | 402 | You can do that by adding `-Zthreads=N` to RUSTFLAGS, for example: 403 | ```bash 404 | RUSTFLAGS="-Zthreads=8" cargo build --release 405 | ``` 406 | 407 | Alternatively, to enable the parallel front-end from a [`config.toml`] file (for 408 | one or more projects), add these lines: 409 | ```toml 410 | [build] 411 | rustflags = ["-Z", "threads=8"] 412 | ``` 413 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html 414 | 415 | Values other than `8` are possible, but that is the number that tends to give 416 | the best results. 417 | 418 | In the best cases, the experimental parallel front-end reduces compile times by 419 | up to 50%. 
But the effects vary widely and depend on the characteristics of the 420 | code and its build configuration, and for some programs there is no compile 421 | time improvement. 422 | 423 | ### Cranelift Codegen Back-end 424 | 425 | If you use nightly Rust you can enable the Cranelift codegen back-end on [some 426 | platforms]. It may reduce compile times at the cost of lower quality generated 427 | code, and therefore is recommended for dev builds rather than release builds. 428 | 429 | First, install the back-end with this `rustup` command: 430 | ```bash 431 | rustup component add rustc-codegen-cranelift-preview --toolchain nightly 432 | ``` 433 | 434 | To select Cranelift from the command line, use the 435 | `-Zcodegen-backend=cranelift` flag. For example: 436 | ```bash 437 | RUSTFLAGS="-Zcodegen-backend=cranelift" cargo +nightly build 438 | ``` 439 | 440 | Alternatively, to specify Cranelift from a [`config.toml`] file (for one or 441 | more projects), add these lines: 442 | ```toml 443 | [unstable] 444 | codegen-backend = true 445 | 446 | [profile.dev] 447 | codegen-backend = "cranelift" 448 | ``` 449 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html 450 | 451 | For more information, see the [Cranelift documentation]. 452 | 453 | [some platforms]: https://github.com/rust-lang/rustc_codegen_cranelift#platform-support 454 | [Cranelift documentation]: https://github.com/rust-lang/rustc_codegen_cranelift 455 | 456 | ## Custom profiles 457 | 458 | In addition to the `dev` and `release` profiles, Cargo supports [custom 459 | profiles]. It might be useful, for example, to create a custom profile halfway 460 | between `dev` and `release` if you find the runtime speed of dev builds 461 | insufficient and the compile times of release builds too slow for everyday 462 | development. 
463 | 464 | [custom profiles]: https://doc.rust-lang.org/cargo/reference/profiles.html#custom-profiles 465 | 466 | ## Summary 467 | 468 | There are many choices to be made when it comes to build configurations. The 469 | following points summarize the above information into some recommendations. 470 | 471 | - If you want to maximize runtime speed, consider all of the following: 472 | `codegen-units = 1`, `lto = "fat"`, an alternative allocator, and `panic = 473 | "abort"`. 474 | - If you want to minimize binary size, consider `opt-level = "z"`, 475 | `codegen-units = 1`, `lto = "fat"`, `panic = "abort"`, and `strip = 476 | "symbols"`. 477 | - In either case, consider `-C target-cpu=native` if broad architecture support 478 | is not needed, and `cargo-pgo` if it works with your distribution mechanism. 479 | - Always use a faster linker if you are on a platform that supports it, because 480 | there are no downsides to doing so. 481 | - Use `cargo-wizard` if you need additional help with these choices. 482 | - Benchmark all changes, one at a time, to ensure they have the expected 483 | effects. 484 | 485 | Finally, [this issue] tracks the evolution of the Rust compiler's own build 486 | configuration. The Rust compiler's build system is stranger and more complex 487 | than that of most Rust programs. Nonetheless, this issue may be instructive in 488 | showing how build configuration choices can be applied to a large program. 489 | 490 | [this issue]: https://github.com/rust-lang/rust/issues/103595 491 | -------------------------------------------------------------------------------- /src/compile-times.md: -------------------------------------------------------------------------------- 1 | # Compile Times 2 | 3 | Although this book is primarily about improving the performance of Rust 4 | programs, this section is about reducing the compile times of Rust programs, 5 | because that is a related topic of interest to many people. 
6 | 7 | The [Minimizing Compile Times] section discussed ways to reduce compile times 8 | via build configuration choices. The rest of this section discusses ways to 9 | reduce compile times that require modifying your program's code. 10 | 11 | [Minimizing Compile Times]: build-configuration.md#minimizing-compile-times 12 | 13 | For additional compile time reduction techniques, consult Corrode's 14 | comprehensive list of [Tips for Faster Rust Compile Times][Tips]. 15 | 16 | [Tips]: https://corrode.dev/blog/tips-for-faster-rust-compile-times/ 17 | 18 | ## Visualization 19 | 20 | Cargo has a feature that lets you visualize compilation of your 21 | program. Build with this command: 22 | ```text 23 | cargo build --timings 24 | ``` 25 | On completion it will print the name of an HTML file. Open that file in a web 26 | browser. It contains a [Gantt chart] that shows the dependencies between the 27 | various crates in your program. This shows how much parallelism there is in 28 | your crate graph, which can indicate if any large crates that serialize 29 | compilation should be broken up. See [the documentation][timings] for more 30 | details on how to read the graphs. 31 | 32 | [Gantt chart]: https://en.wikipedia.org/wiki/Gantt_chart 33 | [timings]: https://doc.rust-lang.org/nightly/cargo/reference/timings.html 34 | 35 | ## Macros 36 | 37 | Some macros generate a lot of code. That code then takes time to compile. The 38 | Rust compiler's `-Zmacro-stats` flag can help identify such cases. 39 | 40 | For example, if you just want to measure a leaf crate of your project: 41 | ```text 42 | cargo +nightly rustc -- -Zmacro-stats 43 | ``` 44 | The compiler will print information about the amount of code generated by both 45 | procedural macros and declarative macros. The former are usually more notable. 
46 | 47 | Or, if you want to measure all the crates in your project: 48 | ```text 49 | RUSTFLAGS="-Zmacro-stats" cargo +nightly build 50 | ``` 51 | To see the generated code itself, you can use [cargo-expand]. 52 | 53 | [cargo-expand]: https://github.com/dtolnay/cargo-expand 54 | 55 | It's not worth worrying over macros that produce small amounts of code, but if 56 | a macro is generating an amount of code comparable to the amount of 57 | hand-written code, it might be possible to remove the use of that macro 58 | entirely, or replace it with a cheaper alternative. 59 | [**Example**](https://nnethercote.github.io/2025/06/26/how-much-code-does-that-proc-macro-generate.html). 60 | 61 | Alternatively, it might be possible to modify the macro to generate less code. 62 | [**Example 1**](https://github.com/bevyengine/bevy/issues/19873), 63 | [**Example 2**](https://nnethercote.github.io/2025/08/16/speed-wins-when-fuzzing-rust-code-with-derive-arbitrary.html). 64 | 65 | ## LLVM IR 66 | 67 | The Rust compiler uses [LLVM] for its back-end. LLVM's execution can be a large 68 | part of compile times, especially when the Rust compiler's front end generates 69 | a lot of [IR] which takes LLVM a long time to optimize. 70 | 71 | [LLVM]: https://llvm.org/ 72 | [IR]: https://en.wikipedia.org/wiki/Intermediate_representation 73 | 74 | These problems can be diagnosed with [`cargo llvm-lines`], which shows which 75 | Rust functions cause the most LLVM IR to be generated. Generic functions are 76 | often the most important ones, because they can be instantiated dozens or even 77 | hundreds of times in large programs. 78 | 79 | [`cargo llvm-lines`]: https://github.com/dtolnay/cargo-llvm-lines/ 80 | 81 | If a generic function causes IR bloat, there are several ways to fix it. The 82 | simplest is to just make the function smaller. 
83 | [**Example 1**](https://github.com/rust-lang/rust/pull/72166/commits/5a0ac0552e05c079f252482cfcdaab3c4b39d614), 84 | [**Example 2**](https://github.com/rust-lang/rust/pull/91246/commits/f3bda74d363a060ade5e5caeb654ba59bfed51a4). 85 | 86 | Another way is to move the non-generic parts of the function into a separate, 87 | non-generic function, which will only be instantiated once. Whether this is 88 | possible will depend on the details of the generic function. When it is 89 | possible, the non-generic function can often be written neatly as an inner 90 | function within the generic function, as shown by the code for 91 | [`std::fs::read`]: 92 | ```rust,ignore 93 | pub fn read<P: AsRef<Path>>(path: P) -> io::Result<Vec<u8>> { 94 | fn inner(path: &Path) -> io::Result<Vec<u8>> { 95 | let mut file = File::open(path)?; 96 | let size = file.metadata().map(|m| m.len()).unwrap_or(0); 97 | let mut bytes = Vec::with_capacity(size as usize); 98 | io::default_read_to_end(&mut file, &mut bytes)?; 99 | Ok(bytes) 100 | } 101 | inner(path.as_ref()) 102 | } 103 | ``` 104 | [`std::fs::read`]: https://doc.rust-lang.org/std/fs/fn.read.html 105 | 106 | [**Example**](https://github.com/rust-lang/rust/pull/72013/commits/68b75033ad78d88872450a81745cacfc11e58178). 107 | 108 | Sometimes common utility functions like [`Option::map`] and [`Result::map_err`] 109 | are instantiated many times. Replacing them with equivalent `match` expressions 110 | can help compile times. 111 | 112 | [`Option::map`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.map 113 | [`Result::map_err`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.map_err 114 | 115 | The effects of these sorts of changes on compile times will usually be small, 116 | though occasionally they can be large. 117 | [**Example**](https://github.com/servo/servo/issues/26585). 118 | 119 | Such changes can also reduce binary size.
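To make the `Option::map` replacement mentioned above concrete, here is a
minimal sketch. The two function names are hypothetical and chosen only for
illustration. Both functions compute the same result, but the first one
instantiates `Option::map` for its closure, while the second uses a plain
`match` and needs no generic instantiation.
```rust
/// Uses `Option::map`, which is instantiated for this particular closure.
fn len_with_map(name: Option<&str>) -> Option<usize> {
    name.map(|s| s.len())
}

/// Equivalent `match` expression. No generic function is instantiated.
fn len_with_match(name: Option<&str>) -> Option<usize> {
    match name {
        Some(s) => Some(s.len()),
        None => None,
    }
}

fn main() {
    // The two versions agree on both the `Some` and the `None` case.
    for input in [Some("perf"), None] {
        assert_eq!(len_with_map(input), len_with_match(input));
    }
}
```
The `match` form is more verbose, and Clippy may suggest converting it back to
`map`, so a rewrite like this is probably only worthwhile at call sites that
`cargo llvm-lines` shows to be significant.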
120 | -------------------------------------------------------------------------------- /src/general-tips.md: -------------------------------------------------------------------------------- 1 | # General Tips 2 | 3 | The previous sections of this book have discussed Rust-specific techniques. 4 | This section gives a brief overview of some general performance principles. 5 | 6 | As long as the obvious pitfalls are avoided (e.g. [using non-release builds]), 7 | Rust code is generally fast and uses little memory, especially if you are used 8 | to dynamically-typed languages such as Python and Ruby, or statically-typed 9 | languages with a garbage collector such as Java and C#. 10 | 11 | [using non-release builds]: build-configuration.md 12 | 13 | Optimized code is often more complex and takes more effort to write than 14 | unoptimized code. For this reason, it is only worth optimizing hot code. 15 | 16 | The biggest performance improvements often come from changes to algorithms or 17 | data structures, rather than low-level optimizations. 18 | [**Example 1**](https://github.com/rust-lang/rust/pull/53383/commits/5745597e6195fe0591737f242d02350001b6c590), 19 | [**Example 2**](https://github.com/rust-lang/rust/pull/54318/commits/154be2c98cf348de080ce951df3f73649e8bb1a6). 20 | 21 | Writing code that works well with modern hardware is not always easy, but worth 22 | striving for. For example, try to minimize cache misses and branch 23 | mispredictions, where possible. 24 | 25 | Most optimizations result in small speedups. Although no single small speedup 26 | is noticeable, they really add up if you can do enough of them. 27 | 28 | Different profilers have different strengths. It is good to use more than one. 29 | 30 | When profiling indicates that a function is hot, there are two common ways to 31 | speed things up: (a) make the function faster, and/or (b) avoid calling it as 32 | much.
33 | 34 | It is often easier to eliminate silly slowdowns than it is to introduce clever 35 | speedups. 36 | 37 | Avoid computing things unless necessary. Lazy/on-demand computations are 38 | often a win. 39 | [**Example 1**](https://github.com/rust-lang/rust/pull/36592/commits/80a44779f7a211e075da9ed0ff2763afa00f43dc), 40 | [**Example 2**](https://github.com/rust-lang/rust/pull/50339/commits/989815d5670826078d9984a3515eeb68235a4687). 41 | 42 | Complex general cases can often be avoided by optimistically checking for 43 | common special cases that are simpler. 44 | [**Example 1**](https://github.com/rust-lang/rust/pull/68790/commits/d62b6f204733d255a3e943388ba99f14b053bf4a), 45 | [**Example 2**](https://github.com/rust-lang/rust/pull/53733/commits/130e55665f8c9f078dec67a3e92467853f400250), 46 | [**Example 3**](https://github.com/rust-lang/rust/pull/65260/commits/59e41edcc15ed07de604c61876ea091900f73649). 47 | In particular, specially handling collections with 0, 1, or 2 elements is often 48 | a win when small sizes dominate. 49 | [**Example 1**](https://github.com/rust-lang/rust/pull/50932/commits/2ff632484cd8c2e3b123fbf52d9dd39b54a94505), 50 | [**Example 2**](https://github.com/rust-lang/rust/pull/64627/commits/acf7d4dcdba4046917c61aab141c1dec25669ce9), 51 | [**Example 3**](https://github.com/rust-lang/rust/pull/64949/commits/14192607d38f5501c75abea7a4a0e46349df5b5f), 52 | [**Example 4**](https://github.com/rust-lang/rust/pull/64949/commits/d1a7bb36ad0a5932384eac03d3fb834efc0317e5). 53 | 54 | Similarly, when dealing with repetitive data, it is often possible to use a 55 | simple form of data compression, by using a compact representation for common 56 | values and then having a fallback to a secondary table for unusual values. 
57 | [**Example 1**](https://github.com/rust-lang/rust/pull/54420/commits/b2f25e3c38ff29eebe6c8ce69b8c69243faa440d), 58 | [**Example 2**](https://github.com/rust-lang/rust/pull/59693/commits/fd7f605365b27bfdd3cd6763124e81bddd61dd28), 59 | [**Example 3**](https://github.com/rust-lang/rust/pull/65750/commits/eea6f23a0ed67fd8c6b8e1b02cda3628fee56b2f). 60 | 61 | When code deals with multiple cases, measure case frequencies and handle the 62 | most common ones first. 63 | 64 | When dealing with lookups that involve high locality, it can be a win to put a 65 | small cache in front of a data structure. 66 | 67 | Optimized code often has a non-obvious structure, which means that explanatory 68 | comments are valuable, particularly those that reference profiling 69 | measurements. A comment like "99% of the time this vector has 0 or 1 elements, 70 | so handle those cases first" can be illuminating. 71 | -------------------------------------------------------------------------------- /src/hashing.md: -------------------------------------------------------------------------------- 1 | # Hashing 2 | 3 | `HashSet` and `HashMap` are two widely-used types and there are ways to make 4 | them faster. 5 | 6 | ## Alternative Hashers 7 | 8 | The default hashing algorithm is not specified, but at the time of writing the 9 | default is an algorithm called [SipHash 1-3]. This algorithm is high quality—it 10 | provides high protection against collisions—but is relatively slow, 11 | particularly for short keys such as integers. 12 | 13 | [SipHash 1-3]: https://en.wikipedia.org/wiki/SipHash 14 | 15 | If profiling shows that hashing is hot, and [HashDoS attacks] are not a concern 16 | for your application, the use of hash tables with faster hash algorithms can 17 | provide large speed wins. 18 | - [`rustc-hash`] provides `FxHashSet` and `FxHashMap` types that are drop-in 19 | replacements for `HashSet` and `HashMap`. 
Its hashing algorithm is 20 | low-quality but very fast, especially for integer keys, and has been found to 21 | out-perform all other hash algorithms within rustc. ([`fxhash`] is an older, 22 | less well maintained implementation of the same algorithm and types.) 23 | - [`fnv`] provides `FnvHashSet` and `FnvHashMap` types. Its hashing algorithm 24 | is higher quality than `rustc-hash`'s but a little slower. 25 | - [`ahash`] provides `AHashSet` and `AHashMap`. Its hashing algorithm can take 26 | advantage of AES instruction support that is available on some processors. 27 | 28 | [HashDoS attacks]: https://en.wikipedia.org/wiki/Collision_attack 29 | [`rustc-hash`]: https://crates.io/crates/rustc-hash 30 | [`fxhash`]: https://crates.io/crates/fxhash 31 | [`fnv`]: https://crates.io/crates/fnv 32 | [`ahash`]: https://crates.io/crates/ahash 33 | 34 | If hashing performance is important in your program, it is worth trying more 35 | than one of these alternatives. For example, the following results were seen in 36 | rustc. 37 | - The switch from `fnv` to `fxhash` gave [speedups of up to 6%][fnv2fx]. 38 | - An attempt to switch from `fxhash` to `ahash` resulted in [slowdowns of 39 | 1-4%][fx2a]. 40 | - An attempt to switch from `fxhash` back to the default hasher resulted in 41 | [slowdowns ranging from 4-84%][fx2default]! 42 | 43 | [fnv2fx]: https://github.com/rust-lang/rust/pull/37229/commits/00e48affde2d349e3b3bfbd3d0f6afb5d76282a7 44 | [fx2a]: https://github.com/rust-lang/rust/issues/69153#issuecomment-589504301 45 | [fx2default]: https://github.com/rust-lang/rust/issues/69153#issuecomment-589338446 46 | 47 | If you decide to universally use one of the alternatives, such as 48 | `FxHashSet`/`FxHashMap`, it is easy to accidentally use `HashSet`/`HashMap` in 49 | some places. You can [use Clippy] to avoid this problem. 50 | 51 | [use Clippy]: linting.md#disallowing-types 52 | 53 | Some types don't need hashing. 
For example, you might have a newtype that wraps 54 | an integer and the integer values are random, or close to random. For such a 55 | type, the distribution of the hashed values won't be that different to the 56 | distribution of the values themselves. In this case the [`nohash_hasher`] crate 57 | can be useful. 58 | 59 | [`nohash_hasher`]: https://crates.io/crates/nohash-hasher 60 | 61 | Hash function design is a complex topic and is beyond the scope of this book. 62 | The [`ahash` documentation] has a good discussion. 63 | 64 | [`ahash` documentation]: https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md 65 | 66 | ## Byte-wise Hashing 67 | 68 | When you annotate a type with `#[derive(Hash)]` the generated `hash` method 69 | will hash each field separately. For some hash functions it may be faster to 70 | convert the type to raw bytes and hash the bytes as a stream. This is possible 71 | for types that satisfy certain properties such as having no padding bytes. 72 | 73 | The [`zerocopy`] and [`bytemuck`] crates both provide a `#[derive(ByteHash)]` 74 | macro that generates a `hash` method that does this kind of byte-wise hashing. 75 | The README for the [`derive_hash_fast`] crate provides more detail for this 76 | technique. 77 | 78 | [`zerocopy`]: https://crates.io/crates/zerocopy 79 | [`bytemuck`]: https://crates.io/crates/bytemuck 80 | [`derive_hash_fast`]: https://crates.io/crates/derive_hash_fast 81 | 82 | This is an advanced technique, and the performance effects are highly dependent 83 | on the hash function and the exact structure of the types being hashed. Measure 84 | carefully. 85 | -------------------------------------------------------------------------------- /src/heap-allocations.md: -------------------------------------------------------------------------------- 1 | # Heap Allocations 2 | 3 | Heap allocations are moderately expensive. 
The exact details depend on which 4 | allocator is in use, but each allocation (and deallocation) typically involves 5 | acquiring a global lock, doing some non-trivial data structure manipulation, 6 | and possibly executing a system call. Small allocations are not necessarily 7 | cheaper than large allocations. It is worth understanding which Rust data 8 | structures and operations cause allocations, because avoiding them can greatly 9 | improve performance. 10 | 11 | The [Rust Container Cheat Sheet] has visualizations of common Rust types, and 12 | is an excellent companion to the following sections. 13 | 14 | [Rust Container Cheat Sheet]: https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/ 15 | 16 | ## Profiling 17 | 18 | If a general-purpose profiler shows `malloc`, `free`, and related functions as 19 | hot, then it is likely worth trying to reduce the allocation rate and/or using 20 | an alternative allocator. 21 | 22 | [DHAT] is an excellent profiler to use when reducing allocation rates. It works 23 | on Linux and some other Unixes. It precisely identifies hot allocation 24 | sites and their allocation rates. Exact results will vary, but experience with 25 | rustc has shown that reducing allocation rates by 10 allocations per million 26 | instructions executed can have measurable performance improvements (e.g. ~1%). 27 | 28 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html 29 | 30 | Here is some example output from DHAT. 
31 | ```text 32 | AP 1.1/25 (2 children) { 33 | Total: 54,533,440 bytes (4.02%, 2,714.28/Minstr) in 458,839 blocks (7.72%, 22.84/Minstr), avg size 118.85 bytes, avg lifetime 1,127,259,403.64 instrs (5.61% of program duration) 34 | At t-gmax: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes 35 | At t-end: 0 bytes (0%) in 0 blocks (0%), avg size 0 bytes 36 | Reads: 15,993,012 bytes (0.29%, 796.02/Minstr), 0.29/byte 37 | Writes: 20,974,752 bytes (1.03%, 1,043.97/Minstr), 0.38/byte 38 | Allocated at { 39 | #1: 0x95CACC9: alloc (alloc.rs:72) 40 | #2: 0x95CACC9: alloc (alloc.rs:148) 41 | #3: 0x95CACC9: reserve_internal (raw_vec.rs:669) 42 | #4: 0x95CACC9: reserve (raw_vec.rs:492) 43 | #5: 0x95CACC9: reserve (vec.rs:460) 44 | #6: 0x95CACC9: push (vec.rs:989) 45 | #7: 0x95CACC9: parse_token_trees_until_close_delim (tokentrees.rs:27) 46 | #8: 0x95CACC9: syntax::parse::lexer::tokentrees::>::parse_token_tree (tokentrees.rs:81) 47 | } 48 | } 49 | ``` 50 | It is beyond the scope of this book to describe everything in this example, but 51 | it should be clear that DHAT gives a wealth of information about allocations, 52 | such as where and how often they happen, how big they are, how long they live 53 | for, and how often they are accessed. 54 | 55 | ## `Box` 56 | 57 | [`Box`] is the simplest heap-allocated type. A `Box<T>` value is a `T` value 58 | that is allocated on the heap. 59 | 60 | [`Box`]: https://doc.rust-lang.org/std/boxed/struct.Box.html 61 | 62 | It is sometimes worth boxing one or more fields of a struct or enum to 63 | make a type smaller. (See the [Type Sizes](type-sizes.md) chapter for more 64 | about this.) 65 | 66 | Other than that, `Box` is straightforward and does not offer much scope for 67 | optimizations. 68 | 69 | ## `Rc`/`Arc` 70 | 71 | [`Rc`]/[`Arc`] are similar to `Box`, but the value on the heap is accompanied by 72 | two reference counts. They allow value sharing, which can be an effective way 73 | to reduce memory usage.
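As a minimal sketch of this kind of sharing, the following code creates two
handles to one heap-allocated value. The `Config` type and the `share` helper
are hypothetical names invented for this illustration.

```rust
use std::rc::Rc;

// Hypothetical type, used only for illustration.
struct Config {
    name: String,
}

// Returns two handles that share a single heap-allocated `Config`.
fn share(name: &str) -> (Rc<Config>, Rc<Config>) {
    let a = Rc::new(Config { name: name.to_string() });
    // This creates a second handle to the same value, not a second `Config`.
    let b = Rc::clone(&a);
    (a, b)
}

fn main() {
    let (a, b) = share("example");
    assert!(Rc::ptr_eq(&a, &b)); // Both handles point at the same value.
    assert_eq!(Rc::strong_count(&a), 2);
    assert_eq!(a.name, b.name);
}
```

Only one `Config` and one `String` exist here, however many handles are
created.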
74 | 75 | [`Rc`]: https://doc.rust-lang.org/std/rc/struct.Rc.html 76 | [`Arc`]: https://doc.rust-lang.org/std/sync/struct.Arc.html 77 | 78 | However, if used for values that are rarely shared, they can increase allocation 79 | rates by heap allocating values that might otherwise not be heap-allocated. 80 | [**Example**](https://github.com/rust-lang/rust/pull/37373/commits/c440a7ae654fb641e68a9ee53b03bf3f7133c2fe). 81 | 82 | Unlike `Box`, calling `clone` on an `Rc`/`Arc` value does not involve an 83 | allocation. Instead, it merely increments a reference count. 84 | 85 | ## `Vec` 86 | 87 | [`Vec`] is a heap-allocated type with a great deal of scope for optimizing the 88 | number of allocations, and/or minimizing the amount of wasted space. To do this 89 | requires understanding how its elements are stored. 90 | 91 | [`Vec`]: https://doc.rust-lang.org/std/vec/struct.Vec.html 92 | 93 | A `Vec` contains three words: a length, a capacity, and a pointer. The pointer 94 | will point to heap-allocated memory if the capacity is nonzero and the element 95 | size is nonzero; otherwise, it will not point to allocated memory. 96 | 97 | Even if the `Vec` itself is not heap-allocated, the elements (if present and 98 | nonzero-sized) always will be. If nonzero-sized elements are present, the 99 | memory holding those elements may be larger than necessary, providing space for 100 | additional future elements. The number of elements present is the length, and 101 | the number of elements that could be held without reallocating is the capacity. 102 | 103 | When the vector needs to grow beyond its current capacity, the elements will be 104 | copied into a larger heap allocation, and the old heap allocation will be 105 | freed. 
106 | 107 | ### `Vec` Growth 108 | 109 | A new, empty `Vec` created by the common means 110 | ([`vec![]`](https://doc.rust-lang.org/std/macro.vec.html) 111 | or [`Vec::new`] or [`Vec::default`]) has a length and capacity of zero, and no 112 | heap allocation is required. If you repeatedly push individual elements onto 113 | the end of the `Vec`, it will periodically reallocate. The growth strategy is 114 | not specified, but at the time of writing it uses a quasi-doubling strategy 115 | resulting in the following capacities: 0, 4, 8, 16, 32, 64, and so on. (It 116 | skips directly from 0 to 4, instead of going via 1 and 2, because this [avoids 117 | many allocations] in practice.) As a vector grows, the frequency of 118 | reallocations will decrease exponentially, but the amount of possibly-wasted 119 | excess capacity will increase exponentially. 120 | 121 | [`Vec::new`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.new 122 | [`Vec::default`]: https://doc.rust-lang.org/std/default/trait.Default.html#tymethod.default 123 | [avoids many allocations]: https://github.com/rust-lang/rust/pull/72227 124 | 125 | This growth strategy is typical for growable data structures and reasonable in 126 | the general case, but if you know in advance the likely length of a vector you 127 | can often do better. If you have a hot vector allocation site (e.g. a hot 128 | [`Vec::push`] call), it is worth using [`eprintln!`] to print the vector length 129 | at that site and then doing some post-processing (e.g. with [`counts`]) to 130 | determine the length distribution. For example, you might have many short 131 | vectors, or you might have a smaller number of very long vectors, and the best 132 | way to optimize the allocation site will vary accordingly. 
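Because the growth strategy is unspecified, the simplest way to learn what
your toolchain does is to observe it. The following sketch counts how often
the capacity changes while pushing elements one at a time. The helper name
`count_capacity_changes` is made up for this example, and the exact count may
differ between Rust versions, so the assertions only check properties that
should hold for any amortized growth strategy.

```rust
// Pushes `n` elements one at a time and counts how many times the capacity
// changed, which approximates the number of (re)allocations.
fn count_capacity_changes(n: usize) -> usize {
    let mut v: Vec<usize> = Vec::new();
    let mut changes = 0;
    let mut cap = v.capacity();
    for i in 0..n {
        v.push(i);
        if v.capacity() != cap {
            cap = v.capacity();
            changes += 1;
        }
    }
    changes
}

fn main() {
    let changes = count_capacity_changes(1000);
    // At least one allocation is needed, but far fewer than one per push.
    assert!(changes >= 1 && changes < 1000);
    eprintln!("capacity changed {changes} times for 1000 pushes");
}
```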
133 | 134 | [`Vec::push`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.push 135 | [`eprintln!`]: https://doc.rust-lang.org/std/macro.eprintln.html 136 | [`counts`]: https://github.com/nnethercote/counts/ 137 | 138 | ### Short `Vec`s 139 | 140 | If you have many short vectors, you can use the `SmallVec` type from the 141 | [`smallvec`] crate. `SmallVec<[T; N]>` is a drop-in replacement for `Vec<T>` that 142 | can store `N` elements within the `SmallVec` itself, and then switches to a 143 | heap allocation if the number of elements exceeds that. (Note also that 144 | `vec![]` literals must be replaced with `smallvec![]` literals.) 145 | [**Example 1**](https://github.com/rust-lang/rust/pull/50565/commits/78262e700dc6a7b57e376742f344e80115d2d3f2), 146 | [**Example 2**](https://github.com/rust-lang/rust/pull/55383/commits/526dc1421b48e3ee8357d58d997e7a0f4bb26915). 147 | 148 | [`smallvec`]: https://crates.io/crates/smallvec 149 | 150 | `SmallVec` reliably reduces the allocation rate when used appropriately, but 151 | its use does not guarantee improved performance. It is slightly slower than 152 | `Vec` for normal operations because it must always check if the elements are 153 | heap-allocated or not. Also, if `N` is high or `T` is large, then the 154 | `SmallVec<[T; N]>` itself can be larger than `Vec<T>`, and copying of 155 | `SmallVec` values will be slower. As always, benchmarking is required to 156 | confirm that an optimization is effective. 157 | 158 | If you have many short vectors *and* you precisely know their maximum length, 159 | `ArrayVec` from the [`arrayvec`] crate is a better choice than `SmallVec`. It 160 | does not require the fallback to heap allocation, which makes it a little 161 | faster. 162 | [**Example**](https://github.com/rust-lang/rust/pull/74310/commits/c492ca40a288d8a85353ba112c4d38fe87ef453e).
163 | 164 | [`arrayvec`]: https://crates.io/crates/arrayvec 165 | 166 | ### Longer `Vec`s 167 | 168 | If you know the minimum or exact size of a vector, you can reserve a specific 169 | capacity with [`Vec::with_capacity`], [`Vec::reserve`], or 170 | [`Vec::reserve_exact`]. For example, if you know a vector will grow to have at 171 | least 20 elements, these functions can immediately provide a vector with a 172 | capacity of at least 20 using a single allocation, whereas pushing the items 173 | one at a time would result in four allocations (for capacities of 4, 8, 16, and 174 | 32). 175 | [**Example**](https://github.com/rust-lang/rust/pull/77990/commits/a7f2bb634308a5f05f2af716482b67ba43701681). 176 | 177 | [`Vec::with_capacity`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.with_capacity 178 | [`Vec::reserve`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve 179 | [`Vec::reserve_exact`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve_exact 180 | 181 | If you know the maximum length of a vector, the above functions also let you 182 | not allocate excess space unnecessarily. Similarly, [`Vec::shrink_to_fit`] can be 183 | used to minimize wasted space, but note that it may cause a reallocation. 184 | 185 | [`Vec::shrink_to_fit`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit 186 | 187 | ## `String` 188 | 189 | A [`String`] contains heap-allocated bytes. The representation and operation of 190 | `String` are very similar to that of `Vec`. Many `Vec` methods relating to 191 | growth and capacity have equivalents for `String`, such as 192 | [`String::with_capacity`]. 193 | 194 | [`String`]: https://doc.rust-lang.org/std/string/struct.String.html 195 | [`String::with_capacity`]: https://doc.rust-lang.org/std/string/struct.String.html#method.with_capacity 196 | 197 | The `SmallString` type from the [`smallstr`] crate is similar to the `SmallVec` 198 | type. 
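As a rough sketch of how the capacity methods carry over from `Vec` to
`String`, the following helper computes the final length up front and
reserves it with `String::with_capacity`, so the later `push` and `push_str`
calls never need to reallocate. The name `join_words` is hypothetical.

```rust
// Joins `words` with single spaces, reserving the exact number of bytes
// needed before pushing anything.
fn join_words(words: &[&str]) -> String {
    let len = words.iter().map(|w| w.len()).sum::<usize>()
        + words.len().saturating_sub(1); // One space between each pair.
    let mut s = String::with_capacity(len);
    let cap = s.capacity();
    for (i, w) in words.iter().enumerate() {
        if i > 0 {
            s.push(' ');
        }
        s.push_str(w);
    }
    // The reserved capacity was sufficient, so it did not change.
    assert_eq!(s.capacity(), cap);
    s
}

fn main() {
    assert_eq!(join_words(&["heap", "allocations"]), "heap allocations");
}
```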
199 | 200 | [`smallstr`]: https://crates.io/crates/smallstr 201 | 202 | The `String` type from the [`smartstring`] crate is a drop-in replacement for 203 | `String` that avoids heap allocations for strings with less than three words' 204 | worth of characters. On 64-bit platforms, this is any string that is less than 205 | 24 bytes, which includes all strings containing 23 or fewer ASCII characters. 206 | [**Example**](https://github.com/djc/topfew-rs/commit/803fd566e9b889b7ba452a2a294a3e4df76e6c4c). 207 | 208 | [`smartstring`]: https://crates.io/crates/smartstring 209 | 210 | Note that the `format!` macro produces a `String`, which means it performs an 211 | allocation. If you can avoid a `format!` call by using a string literal, that 212 | will avoid this allocation. 213 | [**Example**](https://github.com/rust-lang/rust/pull/55905/commits/c6862992d947331cd6556f765f6efbde0a709cf9). 214 | [`std::format_args`] and/or the [`lazy_format`] crate may help with this. 215 | 216 | [`std::format_args`]: https://doc.rust-lang.org/std/macro.format_args.html 217 | [`lazy_format`]: https://crates.io/crates/lazy_format 218 | 219 | ## Hash Tables 220 | 221 | [`HashSet`] and [`HashMap`] are hash tables. Their representation and 222 | operations are similar to those of `Vec`, in terms of allocations: they have 223 | a single contiguous heap allocation, holding keys and values, which is 224 | reallocated as necessary as the table grows. Many `Vec` methods relating to 225 | growth and capacity have equivalents for `HashSet`/`HashMap`, such as 226 | [`HashSet::with_capacity`]. 
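A minimal sketch of pre-sizing a hash table, in the same spirit as
`Vec::with_capacity`, might look like the following. The function
`index_words` is a hypothetical example. It relies on the documented
behaviour that a map created with `HashMap::with_capacity(n)` can hold at
least `n` elements without reallocating.

```rust
use std::collections::HashMap;

// Maps each word to the index of its last occurrence in `words`. The table
// is sized up front, so it does not grow while being filled.
fn index_words<'a>(words: &[&'a str]) -> HashMap<&'a str, usize> {
    let mut map = HashMap::with_capacity(words.len());
    let cap = map.capacity();
    for (i, w) in words.iter().enumerate() {
        map.insert(*w, i);
    }
    // No growth happened during the insertions.
    assert_eq!(map.capacity(), cap);
    map
}

fn main() {
    let map = index_words(&["box", "rc", "vec", "rc"]);
    assert_eq!(map.len(), 3);
    assert_eq!(map["rc"], 3);
}
```

If the number of distinct keys is much smaller than the number of inputs,
sizing by the input length over-allocates, so measure before adopting this.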
227 | 228 | [`HashSet`]: https://doc.rust-lang.org/std/collections/struct.HashSet.html 229 | [`HashMap`]: https://doc.rust-lang.org/std/collections/struct.HashMap.html 230 | [`HashSet::with_capacity`]: https://doc.rust-lang.org/std/collections/struct.HashSet.html#method.with_capacity 231 | 232 | ## `clone` 233 | 234 | Calling [`clone`] on a value that contains heap-allocated memory typically 235 | involves additional allocations. For example, calling `clone` on a non-empty 236 | `Vec` requires a new allocation for the elements (but note that the capacity of 237 | the new `Vec` might not be the same as the capacity of the original `Vec`). The 238 | exception is `Rc`/`Arc`, where a `clone` call just increments the reference 239 | count. 240 | 241 | [`clone`]: https://doc.rust-lang.org/std/clone/trait.Clone.html#tymethod.clone 242 | 243 | [`clone_from`] is an alternative to `clone`. `a.clone_from(&b)` is equivalent 244 | to `a = b.clone()` but may avoid unnecessary allocations. For example, if you 245 | want to clone one `Vec` over the top of an existing `Vec`, the existing `Vec`'s 246 | heap allocation will be reused if possible, as the following example shows. 247 | ```rust 248 | let mut v1: Vec<u32> = Vec::with_capacity(99); 249 | let v2: Vec<u32> = vec![1, 2, 3]; 250 | v1.clone_from(&v2); // v1's allocation is reused 251 | assert_eq!(v1.capacity(), 99); 252 | ``` 253 | Although `clone` usually causes allocations, it is a reasonable thing to use in 254 | many circumstances and can often make code simpler. Use profiling data to see 255 | which `clone` calls are hot and worth taking the effort to avoid. 256 | 257 | [`clone_from`]: https://doc.rust-lang.org/std/clone/trait.Clone.html#method.clone_from 258 | 259 | Sometimes Rust code ends up containing unnecessary `clone` calls, due to (a) 260 | programmer error, or (b) changes in the code that render previously-necessary 261 | `clone` calls unnecessary.
If you see a hot `clone` call that does not seem 262 | necessary, sometimes it can simply be removed. 263 | [**Example 1**](https://github.com/rust-lang/rust/pull/37318/commits/e382267cfb9133ef12d59b66a2935ee45b546a61), 264 | [**Example 2**](https://github.com/rust-lang/rust/pull/37705/commits/11c1126688bab32f76dbe1a973906c7586da143f), 265 | [**Example 3**](https://github.com/rust-lang/rust/pull/64302/commits/36b37e22de92b584b9cf4464ed1d4ad317b798be). 266 | 267 | ## `to_owned` 268 | 269 | [`ToOwned::to_owned`] is implemented for many common types. It creates owned 270 | data from borrowed data, usually by cloning, and therefore often causes heap 271 | allocations. For example, it can be used to create a `String` from a `&str`. 272 | 273 | [`ToOwned::to_owned`]: https://doc.rust-lang.org/std/borrow/trait.ToOwned.html#tymethod.to_owned 274 | 275 | Sometimes `to_owned` calls (and related calls such as `clone` and `to_string`) 276 | can be avoided by storing a reference to borrowed data in a struct rather than 277 | an owned copy. This requires lifetime annotations on the struct, complicating 278 | the code, and should only be done when profiling and benchmarking show that it 279 | is worthwhile. 280 | [**Example**](https://github.com/rust-lang/rust/pull/50855/commits/6872377357dbbf373cfd2aae352cb74cfcc66f34). 281 | 282 | ## `Cow` 283 | 284 | Sometimes code deals with a mixture of borrowed and owned data. Imagine a 285 | vector of error messages, some of which are static string literals and some of 286 | which are constructed with `format!`. The obvious representation is 287 | `Vec<String>`, as the following example shows. 288 | ```rust 289 | let mut errors: Vec<String> = vec![]; 290 | errors.push("something went wrong".to_string()); 291 | errors.push(format!("something went wrong on line {}", 100)); 292 | ``` 293 | That requires a `to_string` call to promote the static string literal to a 294 | `String`, which incurs an allocation.
295 | 296 | Instead you can use the [`Cow`] type, which can hold either borrowed or owned 297 | data. A borrowed value `x` is wrapped with `Cow::Borrowed(x)`, and an owned 298 | value `y` is wrapped with `Cow::Owned(y)`. `Cow` also implements the `From` 299 | trait for various string, slice, and path types, so you can usually use `into` 300 | as well. (Or `Cow::from`, which is longer but results in more readable code, 301 | because it makes the type clearer.) The following example puts all this together. 302 | 303 | [`Cow`]: https://doc.rust-lang.org/std/borrow/enum.Cow.html 304 | 305 | ```rust 306 | use std::borrow::Cow; 307 | let mut errors: Vec<Cow<'static, str>> = vec![]; 308 | errors.push(Cow::Borrowed("something went wrong")); 309 | errors.push(Cow::Owned(format!("something went wrong on line {}", 100))); 310 | errors.push(Cow::from("something else went wrong")); 311 | errors.push(format!("something else went wrong on line {}", 101).into()); 312 | ``` 313 | `errors` now holds a mixture of borrowed and owned data without requiring any 314 | extra allocations. This example involves `&str`/`String`, but other pairings 315 | such as `&[T]`/`Vec<T>` and `&Path`/`PathBuf` are also possible. 316 | 317 | [**Example 1**](https://github.com/rust-lang/rust/pull/37064/commits/b043e11de2eb2c60f7bfec5e15960f537b229e20), 318 | [**Example 2**](https://github.com/rust-lang/rust/pull/56336/commits/787959c20d062d396b97a5566e0a766d963af022). 319 | 320 | All of the above applies if the data is immutable. But `Cow` also allows 321 | borrowed data to be promoted to owned data if it needs to be mutated. 322 | [`Cow::to_mut`] will obtain a mutable reference to an owned value, cloning if 323 | necessary. This is called "clone-on-write", which is where the name `Cow` comes 324 | from.
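The clone-on-write behaviour can be sketched as follows. The function
`capitalize_first` is a hypothetical example, not part of the standard
library. It returns the input still borrowed when no change is needed, and
only calls `to_mut` (which clones into a `String`) when the first character
must be modified.

```rust
use std::borrow::Cow;

// Upper-cases a leading ASCII lowercase letter, cloning the input only if
// a change is actually required.
fn capitalize_first(input: &str) -> Cow<'_, str> {
    let mut text = Cow::Borrowed(input);
    if input.starts_with(|c: char| c.is_ascii_lowercase()) {
        // `to_mut` clones the borrowed `&str` into an owned `String`. The
        // first character is ASCII, so slicing at byte 1 is valid.
        text.to_mut()[..1].make_ascii_uppercase();
    }
    text
}

fn main() {
    let unchanged = capitalize_first("Already fine");
    assert!(matches!(unchanged, Cow::Borrowed(_))); // No allocation.

    let changed = capitalize_first("needs fixing");
    assert!(matches!(changed, Cow::Owned(_))); // Cloned, then mutated.
    assert_eq!(&*changed, "Needs fixing");
}
```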
325 | 326 | [`Deref`]: https://doc.rust-lang.org/std/ops/trait.Deref.html 327 | [`Cow::to_mut`]: https://doc.rust-lang.org/std/borrow/enum.Cow.html#method.to_mut 328 | 329 | This clone-on-write behaviour is useful when you have some borrowed data, such 330 | as a `&str`, that is mostly read-only but occasionally needs to be modified. 331 | 332 | [**Example 1**](https://github.com/rust-lang/rust/pull/50855/commits/ad471452ba6fbbf91ad566dc4bdf1033a7281811), 333 | [**Example 2**](https://github.com/rust-lang/rust/pull/68848/commits/67da45f5084f98eeb20cc6022d68788510dc832a). 334 | 335 | Finally, because `Cow` implements [`Deref`], you can call methods directly on 336 | the data it encloses. 337 | 338 | `Cow` can be fiddly to get working, but it is often worth the effort. 339 | 340 | ## Reusing Collections 341 | 342 | Sometimes you need to build up a collection such as a `Vec` in stages. It is 343 | usually better to do this by modifying a single `Vec` than by building multiple 344 | `Vec`s and then combining them. 345 | 346 | For example, if you have a function `do_stuff` that produces a `Vec` that might 347 | be called multiple times: 348 | ```rust 349 | fn do_stuff(x: u32, y: u32) -> Vec<u32> { 350 | vec![x, y] 351 | } 352 | ``` 353 | It might be better to instead modify a passed-in `Vec`: 354 | ```rust 355 | fn do_stuff(x: u32, y: u32, vec: &mut Vec<u32>) { 356 | vec.push(x); 357 | vec.push(y); 358 | } 359 | ``` 360 | Sometimes it is worth keeping around a "workhorse" collection that can be 361 | reused. For example, if a `Vec` is needed for each iteration of a loop, you 362 | could declare the `Vec` outside the loop, use it within the loop body, and then 363 | call [`clear`] at the end of the loop body (to empty the `Vec` without affecting 364 | its capacity). This avoids allocations at the cost of obscuring the fact that 365 | each iteration's usage of the `Vec` is unrelated to the others.
366 | [**Example 1**](https://github.com/rust-lang/rust/pull/77990/commits/45faeb43aecdc98c9e3f2b24edf2ecc71f39d323), 367 | [**Example 2**](https://github.com/rust-lang/rust/pull/51870/commits/b0c78120e3ecae5f4043781f7a3f79e2277293e7). 368 | 369 | [`clear`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.clear 370 | 371 | Similarly, it is sometimes worth keeping a workhorse collection within a 372 | struct, to be reused in one or more methods that are called repeatedly. 373 | 374 | ## Reading Lines from a File 375 | 376 | [`BufRead::lines`] makes it easy to read a file one line at a time: 377 | ```rust 378 | # fn blah() -> Result<(), std::io::Error> { 379 | # fn process(_: &str) {} 380 | use std::io::{self, BufRead}; 381 | let mut lock = io::stdin().lock(); 382 | for line in lock.lines() { 383 | process(&line?); 384 | } 385 | # Ok(()) 386 | # } 387 | ``` 388 | But the iterator it produces returns `io::Result<String>`, which means it 389 | allocates for every line in the file. 390 | 391 | [`BufRead::lines`]: https://doc.rust-lang.org/stable/std/io/trait.BufRead.html#method.lines 392 | 393 | An alternative is to use a workhorse `String` in a loop over 394 | [`BufRead::read_line`]: 395 | ```rust 396 | # fn blah() -> Result<(), std::io::Error> { 397 | # fn process(_: &str) {} 398 | use std::io::{self, BufRead}; 399 | let mut lock = io::stdin().lock(); 400 | let mut line = String::new(); 401 | while lock.read_line(&mut line)? != 0 { 402 | process(&line); 403 | line.clear(); 404 | } 405 | # Ok(()) 406 | # } 407 | ``` 408 | This reduces the number of allocations to at most a handful, and possibly just 409 | one. (The exact number depends on how many times `line` needs to be 410 | reallocated, which depends on the distribution of line lengths in the file.) 411 | 412 | This will only work if the loop body can operate on a `&str`, rather than a 413 | `String`.
414 | 415 | [`BufRead::read_line`]: https://doc.rust-lang.org/stable/std/io/trait.BufRead.html#method.read_line 416 | 417 | [**Example**](https://github.com/nnethercote/counts/commit/7d39bbb1867720ef3b9799fee739cd717ad1539a). 418 | 419 | ## Using an Alternative Allocator 420 | 421 | It is also possible to improve heap allocation performance without changing 422 | your code, simply by using a different allocator. See the [Alternative 423 | Allocators] section for details. 424 | 425 | [Alternative Allocators]: build-configuration.md#alternative-allocators 426 | 427 | ## Avoiding Regressions 428 | 429 | To ensure the number and/or size of allocations done by your code doesn't 430 | increase unintentionally, you can use the *heap usage testing* feature of 431 | [dhat-rs] to write tests that check particular code snippets allocate the 432 | expected amount of heap memory. 433 | 434 | [dhat-rs]: https://crates.io/crates/dhat 435 | -------------------------------------------------------------------------------- /src/inlining.md: -------------------------------------------------------------------------------- 1 | # Inlining 2 | 3 | Entry to and exit from hot, uninlined functions often accounts for a 4 | non-trivial fraction of execution time. Inlining these functions removes these 5 | entries and exits and can enable additional low-level optimizations by the 6 | compiler. In the best case the overall effect is small but easy speed wins. 7 | 8 | There are four inline attributes that can be used on Rust functions. 9 | - **None**. The compiler will decide itself if the function should be inlined. 10 | This will depend on factors such as the optimization level, the size of the 11 | function, whether the function is generic, and if the inlining is across a 12 | crate boundary. 13 | - **`#[inline]`**. This suggests that the function should be inlined. 14 | - **`#[inline(always)]`**. This strongly suggests that the function should be 15 | inlined. 16 | - **`#[inline(never)]`**. 
This strongly suggests that the function should not 17 | be inlined. 18 | 19 | Inline attributes do not guarantee that a function is inlined or not inlined, 20 | but in practice `#[inline(always)]` will cause inlining in all but the most 21 | exceptional cases. 22 | 23 | Inlining is non-transitive. If a function `f` calls a function `g` and you want 24 | both functions to be inlined together at a callsite to `f`, both functions 25 | should be marked with an inline attribute. 26 | 27 | ## Simple Cases 28 | 29 | The best candidates for inlining are (a) functions that are very small, or (b) 30 | functions that have a single call site. The compiler will often inline these 31 | functions itself even without an inline attribute. But the compiler cannot 32 | always make the best choices, so attributes are sometimes needed. 33 | [**Example 1**](https://github.com/rust-lang/rust/pull/37083/commits/6a4bb35b70862f33ac2491ffe6c55fb210c8490d), 34 | [**Example 2**](https://github.com/rust-lang/rust/pull/50407/commits/e740b97be699c9445b8a1a7af6348ca2d4c460ce), 35 | [**Example 3**](https://github.com/rust-lang/rust/pull/50564/commits/77c40f8c6f8cc472f6438f7724d60bf3b7718a0c), 36 | [**Example 4**](https://github.com/rust-lang/rust/pull/57719/commits/92fd6f9d30d0b6b4ecbcf01534809fb66393f139), 37 | [**Example 5**](https://github.com/rust-lang/rust/pull/69256/commits/e761f3af904b3c275bdebc73bb29ffc45384945d). 38 | 39 | Cachegrind is a good profiler for determining if a function is inlined. When 40 | looking at Cachegrind's output, you can tell that a function has been inlined 41 | if (and only if) its first and last lines are *not* marked with event counts. 42 | For example: 43 | ```text 44 | . #[inline(always)] 45 | . fn inlined(x: u32, y: u32) -> u32 { 46 | 700,000 eprintln!("inlined: {} + {}", x, y); 47 | 200,000 x + y 48 | . } 49 | . 50 | . 
#[inline(never)] 51 | 400,000 fn not_inlined(x: u32, y: u32) -> u32 { 52 | 700,000 eprintln!("not_inlined: {} + {}", x, y); 53 | 200,000 x + y 54 | 200,000 } 55 | ``` 56 | You should measure again after adding inline attributes, because the effects 57 | can be unpredictable. Sometimes it has no effect because a nearby function that 58 | was previously inlined no longer is. Sometimes it slows the code down. Inlining 59 | can also affect compile times, especially cross-crate inlining which involves 60 | duplicating internal representations of the functions. 61 | 62 | ## Harder Cases 63 | 64 | Sometimes you have a function that is large and has multiple call sites, but 65 | only one call site is hot. You would like to inline the hot call site for 66 | speed, but not inline the cold call sites to avoid unnecessary code bloat. The 67 | way to handle this is to split the function into always-inlined and never-inlined 68 | variants, with the latter calling the former. 69 | 70 | For example, this function: 71 | ```rust 72 | # fn one() {}; 73 | # fn two() {}; 74 | # fn three() {}; 75 | fn my_function() { 76 | one(); 77 | two(); 78 | three(); 79 | } 80 | ``` 81 | Would become these two functions: 82 | ```rust 83 | # fn one() {}; 84 | # fn two() {}; 85 | # fn three() {}; 86 | // Use this at the hot call site. 87 | #[inline(always)] 88 | fn inlined_my_function() { 89 | one(); 90 | two(); 91 | three(); 92 | } 93 | 94 | // Use this at the cold call sites. 95 | #[inline(never)] 96 | fn uninlined_my_function() { 97 | inlined_my_function(); 98 | } 99 | ``` 100 | [**Example 1**](https://github.com/rust-lang/rust/pull/53513/commits/b73843f9422fb487b2d26ac2d65f79f73a4c9ae3), 101 | [**Example 2**](https://github.com/rust-lang/rust/pull/64420/commits/a2261ad66400c3145f96ebff0d9b75e910fa89dd). 102 | 103 | ## Outlining 104 | 105 | The inverse of inlining is *outlining*: moving rarely executed code into a 106 | separate function.
You can add a `#[cold]` attribute to such functions to tell 107 | the compiler that the function is rarely called. This can result in better code 108 | generation for the hot path. 109 | [**Example 1**](https://github.com/Lokathor/tinyvec/pull/127), 110 | [**Example 2**](https://crates.io/crates/fast_assert). 111 | -------------------------------------------------------------------------------- /src/introduction.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | Performance is important for many Rust programs. 4 | 5 | This book contains techniques that can improve the performance-related 6 | characteristics of Rust programs, such as runtime speed, memory usage, and 7 | binary size. The [Compile Times] section also contains techniques that will 8 | improve the compile times of Rust programs. Some techniques only require 9 | changing build configurations, but many require changing code. 10 | 11 | [Compile Times]: compile-times.md 12 | 13 | Some techniques are entirely Rust-specific, and some involve ideas that can be 14 | applied (often with modifications) to programs written in other languages. The 15 | [General Tips] section also includes some general principles that apply to any 16 | programming language. Nonetheless, this book is mostly about the performance of 17 | Rust programs and is no substitute for a general purpose guide to profiling and 18 | optimization. 19 | 20 | [General Tips]: general-tips.md 21 | 22 | This book also focuses on techniques that are practical and proven: many are 23 | accompanied by links to pull requests or other resources that show how the 24 | technique was used on a real-world Rust program. It reflects the primary 25 | author's background, being somewhat biased towards compiler development and 26 | away from other areas such as scientific computing. 27 | 28 | This book is deliberately terse, favouring breadth over depth, so that it is 29 | quick to read. 
It links to external sources that provide more depth when 30 | appropriate. 31 | 32 | This book is aimed at intermediate and advanced Rust users. Beginner Rust users 33 | have more than enough to learn and these techniques are likely to be an 34 | unhelpful distraction to them. 35 | -------------------------------------------------------------------------------- /src/io.md: -------------------------------------------------------------------------------- 1 | # I/O 2 | 3 | ## Locking 4 | 5 | Rust's [`print!`] and [`println!`] macros lock stdout on every call. If you 6 | have repeated calls to these macros it may be better to lock stdout manually. 7 | 8 | [`print!`]: https://doc.rust-lang.org/std/macro.print.html 9 | [`println!`]: https://doc.rust-lang.org/std/macro.println.html 10 | 11 | For example, change this code: 12 | ```rust 13 | # let lines = vec!["one", "two", "three"]; 14 | for line in lines { 15 | println!("{}", line); 16 | } 17 | ``` 18 | to this: 19 | ```rust 20 | # fn blah() -> Result<(), std::io::Error> { 21 | # let lines = vec!["one", "two", "three"]; 22 | use std::io::Write; 23 | let mut stdout = std::io::stdout(); 24 | let mut lock = stdout.lock(); 25 | for line in lines { 26 | writeln!(lock, "{}", line)?; 27 | } 28 | // stdout is unlocked when `lock` is dropped 29 | # Ok(()) 30 | # } 31 | ``` 32 | stdin and stderr can likewise be locked when doing repeated operations on them. 33 | 34 | ## Buffering 35 | 36 | Rust file I/O is unbuffered by default. If you have many small and repeated 37 | read or write calls to a file or network socket, use [`BufReader`] or 38 | [`BufWriter`]. They maintain an in-memory buffer for input and output, 39 | minimizing the number of system calls required. 
40 | 41 | [`BufReader`]: https://doc.rust-lang.org/std/io/struct.BufReader.html 42 | [`BufWriter`]: https://doc.rust-lang.org/std/io/struct.BufWriter.html 43 | 44 | For example, change this unbuffered writer code: 45 | ```rust 46 | # fn blah() -> Result<(), std::io::Error> { 47 | # let lines = vec!["one", "two", "three"]; 48 | use std::io::Write; 49 | let mut out = std::fs::File::create("test.txt")?; 50 | for line in lines { 51 | writeln!(out, "{}", line)?; 52 | } 53 | # Ok(()) 54 | # } 55 | ``` 56 | to this: 57 | ```rust 58 | # fn blah() -> Result<(), std::io::Error> { 59 | # let lines = vec!["one", "two", "three"]; 60 | use std::io::{BufWriter, Write}; 61 | let mut out = BufWriter::new(std::fs::File::create("test.txt")?); 62 | for line in lines { 63 | writeln!(out, "{}", line)?; 64 | } 65 | out.flush()?; 66 | # Ok(()) 67 | # } 68 | ``` 69 | [**Example 1**](https://github.com/rust-lang/rust/pull/93954), 70 | [**Example 2**](https://github.com/nnethercote/dhat-rs/pull/22/commits/8c3ae26f1219474ee55c30bc9981e6af2e869be2). 71 | 72 | The explicit call to [`flush`] is not strictly necessary, as flushing will 73 | happen automatically when `out` is dropped. However, in that case any error 74 | that occurs on flushing will be ignored, whereas an explicit flush will make 75 | that error explicit. 76 | 77 | [`flush`]: https://doc.rust-lang.org/std/io/trait.Write.html#tymethod.flush 78 | 79 | Forgetting to buffer is more common when writing. Both unbuffered and buffered 80 | writers implement the [`Write`] trait, which means the code for writing 81 | to an unbuffered writer and a buffered writer is much the same. In contrast, 82 | unbuffered readers implement the [`Read`] trait but buffered readers implement 83 | the [`BufRead`] trait, which means the code for reading from an unbuffered reader 84 | and a buffered reader is different. 
For example, it is difficult to read a file 85 | line by line with an unbuffered reader, but it is trivial with a buffered 86 | reader by using [`BufRead::read_line`] or [`BufRead::lines`]. For this reason, 87 | it is hard to write an example for readers like the one above for writers, 88 | where the before and after versions are so similar. 89 | 90 | [`Write`]: https://doc.rust-lang.org/std/io/trait.Write.html 91 | [`Read`]: https://doc.rust-lang.org/std/io/trait.Read.html 92 | [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html 93 | [`BufRead::read_line`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_line 94 | [`BufRead::lines`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.lines 95 | 96 | Finally, note that buffering also works with stdout, so you might want to 97 | combine manual locking *and* buffering when making many writes to stdout. 98 | 99 | ## Reading Lines from a File 100 | 101 | [This section] explains how to avoid excessive allocations when using 102 | [`BufRead`] to read a file one line at a time. 103 | 104 | [This section]: heap-allocations.md#reading-lines-from-a-file 105 | [`BufRead`]: https://doc.rust-lang.org/std/io/trait.BufRead.html 106 | 107 | ## Reading Input as Raw Bytes 108 | 109 | The built-in [String] type uses UTF-8 internally, which adds a small, but 110 | nonzero overhead caused by UTF-8 validation when you read input into it. If you 111 | just want to process input bytes without worrying about UTF-8 (for example if 112 | you handle ASCII text), you can use [`BufRead::read_until`]. 113 | 114 | [String]: https://doc.rust-lang.org/std/string/struct.String.html 115 | [`BufRead::read_until`]: https://doc.rust-lang.org/std/io/trait.BufRead.html#method.read_until 116 | 117 | There are also dedicated crates for reading [byte-oriented lines of data] 118 | and working with [byte strings]. 
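As a sketch of the `BufRead::read_until` approach described above, the
following helper processes input line by line as raw bytes, reusing a single
`Vec<u8>` buffer, so no UTF-8 validation is performed. The function name
`count_matching_lines` and its byte-search task are assumptions chosen for
illustration.

```rust
use std::io::{self, BufRead};

// Counts the lines that contain the byte `needle`. Lines are handled as raw
// bytes, so invalid UTF-8 in the input is not an error.
fn count_matching_lines<R: BufRead>(
    mut reader: R,
    needle: u8,
) -> io::Result<usize> {
    let mut buf: Vec<u8> = Vec::new(); // Reused for every line.
    let mut count = 0;
    while reader.read_until(b'\n', &mut buf)? != 0 {
        if buf.contains(&needle) {
            count += 1;
        }
        buf.clear();
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // A byte slice implements `BufRead`, which is convenient for testing.
    // The `\xff` byte would be rejected by `read_line`.
    let input: &[u8] = b"alpha\nbeta\n\xffgamma\n";
    assert_eq!(count_matching_lines(input, b'a')?, 3);
    Ok(())
}
```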
119 | 120 | [byte-oriented lines of data]: https://github.com/Freaky/rust-linereader 121 | [byte strings]: https://github.com/BurntSushi/bstr 122 | -------------------------------------------------------------------------------- /src/iterators.md: -------------------------------------------------------------------------------- 1 | # Iterators 2 | 3 | ## `collect` and `extend` 4 | 5 | [`Iterator::collect`] converts an iterator into a collection such as `Vec`, 6 | which typically requires an allocation. You should avoid calling `collect` if 7 | the collection is then only iterated over again. 8 | 9 | [`Iterator::collect`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.collect 10 | 11 | For this reason, it is often better to return an iterator type like `impl 12 | Iterator` from a function than a `Vec`. Note that sometimes 13 | additional lifetimes are required on these return types, as [this blog post] 14 | explains. 15 | [**Example**](https://github.com/rust-lang/rust/pull/77990/commits/660d8a6550a126797aa66a417137e39a5639451b). 16 | 17 | [this blog post]: https://blog.katona.me/2019/12/29/Rust-Lifetimes-and-Iterators/ 18 | 19 | Similarly, you can use [`extend`] to extend an existing collection (such as a 20 | `Vec`) with an iterator, rather than collecting the iterator into a `Vec` and 21 | then using [`append`]. 22 | 23 | [`extend`]: https://doc.rust-lang.org/std/iter/trait.Extend.html#tymethod.extend 24 | [`append`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.append 25 | 26 | Finally, when you write an iterator it is often worth implementing the 27 | [`Iterator::size_hint`] or [`ExactSizeIterator::len`] method, if possible. 28 | `collect` and `extend` calls that use the iterator may then do fewer 29 | allocations, because they have advance information about the number of elements 30 | yielded by the iterator. 
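The following sketch shows one way to supply this information for a hand-written iterator. `Repeated` is a hypothetical type invented for this example. Whether a given `collect` or `extend` call actually allocates less depends on the collection, so treat this as an illustration rather than a guarantee.

```rust
/// Yields `remaining` copies of `value`.
/// (Hypothetical iterator, for illustration only.)
struct Repeated {
    value: u32,
    remaining: usize,
}

impl Iterator for Repeated {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        Some(self.value)
    }

    // The default implementation returns `(0, None)`, which gives `collect`
    // and `extend` no information. Here the exact count is known.
    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.remaining, Some(self.remaining))
    }
}

// The provided `len` method is derived from `size_hint`, so no body is needed.
impl ExactSizeIterator for Repeated {}
```

With this in place, a call such as `v.extend(Repeated { value: 7, remaining: 3 })` can reserve space for three elements up front instead of growing the `Vec` as elements arrive.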
31 | 32 | [`Iterator::size_hint`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.size_hint 33 | [`ExactSizeIterator::len`]: https://doc.rust-lang.org/std/iter/trait.ExactSizeIterator.html#method.len 34 | 35 | ## Chaining 36 | 37 | [`chain`] can be very convenient, but it can also be slower than a single 38 | iterator. It may be worth avoiding for hot iterators, if possible. 39 | [**Example**](https://github.com/rust-lang/rust/pull/64801/commits/5ca99b750e455e9b5e13e83d0d7886486231e48a). 40 | 41 | Similarly, [`filter_map`] may be faster than using [`filter`] followed by 42 | [`map`]. 43 | 44 | [`chain`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.chain 45 | [`filter_map`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter_map 46 | [`filter`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.filter 47 | [`map`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.map 48 | 49 | ## Chunks 50 | 51 | When a chunking iterator is required and the chunk size is known to exactly 52 | divide the slice length, use the faster [`slice::chunks_exact`] instead of [`slice::chunks`]. 53 | 54 | When the chunk size is not known to exactly divide the slice length, it can 55 | still be faster to use `slice::chunks_exact` in combination with either 56 | [`ChunksExact::remainder`] or manual handling of excess elements. 57 | [**Example 1**](https://github.com/johannesvollmer/exrs/pull/173/files), 58 | [**Example 2**](https://github.com/johannesvollmer/exrs/pull/175/files). 59 | 60 | The same is true for related iterators: 61 | - [`slice::rchunks`], [`slice::rchunks_exact`], and [`RChunksExact::remainder`]; 62 | - [`slice::chunks_mut`], [`slice::chunks_exact_mut`], and [`ChunksExactMut::into_remainder`]; 63 | - [`slice::rchunks_mut`], [`slice::rchunks_exact_mut`], and [`RChunksExactMut::into_remainder`]. 
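As a small sketch of the `chunks_exact` plus `remainder` pattern, the function below sums each complete group of four values and then handles any leftover elements separately. The function name `sum_groups_of_four` is hypothetical, and whether this beats `chunks` in a particular program should be confirmed by measurement.

```rust
/// Sums each complete group of four values, then appends the sum of any
/// leftover values as a final entry. (Hypothetical helper.)
fn sum_groups_of_four(data: &[u32]) -> Vec<u32> {
    let chunks = data.chunks_exact(4);
    // `remainder` borrows from the original slice, so it can be read before
    // the iterator is consumed.
    let rest = chunks.remainder();
    // Every chunk has exactly four elements, so these indices are always valid.
    let mut sums: Vec<u32> = chunks.map(|c| c[0] + c[1] + c[2] + c[3]).collect();
    if !rest.is_empty() {
        sums.push(rest.iter().sum());
    }
    sums
}
```

Because each chunk is known to have exactly four elements, the compiler has a better chance of removing bounds checks in the hot loop, while the short tail is handled once outside it.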
64 | 65 | [`slice::chunks`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks 66 | [`slice::chunks_exact`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_exact 67 | [`ChunksExact::remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.ChunksExact.html#method.remainder 68 | 69 | [`slice::rchunks`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks 70 | [`slice::rchunks_exact`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_exact 71 | [`RChunksExact::remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.RChunksExact.html#method.remainder 72 | 73 | [`slice::chunks_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_mut 74 | [`slice::chunks_exact_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunks_exact_mut 75 | [`ChunksExactMut::into_remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.ChunksExactMut.html#method.into_remainder 76 | 77 | [`slice::rchunks_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_mut 78 | [`slice::rchunks_exact_mut`]: https://doc.rust-lang.org/stable/std/primitive.slice.html#method.rchunks_exact_mut 79 | [`RChunksExactMut::into_remainder`]: https://doc.rust-lang.org/stable/std/slice/struct.RChunksExactMut.html#method.into_remainder 80 | 81 | ## `copied` 82 | 83 | When iterating over collections of small data types, such as integers, it may 84 | be better to use `iter().copied()` instead of `iter()`. Whatever consumes that 85 | iterator will receive the integers by value instead of by reference, and LLVM 86 | may generate better code in that case. 87 | [**Example 1**](https://github.com/rust-lang/rust/issues/106539), 88 | [**Example 2**](https://github.com/rust-lang/rust/issues/113789). 89 | 90 | This is an advanced technique. You might need to check the generated machine 91 | code to be certain it is having an effect. 
See the [Machine 92 | Code](machine-code.md) chapter for details on how to do that. 93 | -------------------------------------------------------------------------------- /src/linting.md: -------------------------------------------------------------------------------- 1 | # Linting 2 | 3 | [Clippy] is a collection of lints to catch common mistakes in Rust code. It is 4 | an excellent tool to run on Rust code in general. It can also help with 5 | performance, because a number of the lints relate to code patterns that can 6 | cause sub-optimal performance. 7 | 8 | Given that automated detection of problems is preferable to manual detection, 9 | the rest of this book will not mention performance problems that Clippy detects 10 | by default. 11 | 12 | ## Basics 13 | 14 | [Clippy]: https://github.com/rust-lang/rust-clippy 15 | 16 | Once installed, it is easy to run: 17 | ```text 18 | cargo clippy 19 | ``` 20 | The full list of performance lints can be seen by visiting the [lint list] and 21 | deselecting all the lint groups except for "Perf". 22 | 23 | [lint list]: https://rust-lang.github.io/rust-clippy/master/ 24 | 25 | As well as making the code faster, the performance lint suggestions usually 26 | result in code that is simpler and more idiomatic, so they are worth following 27 | even for code that is not executed frequently. 28 | 29 | Conversely, some non-performance lint suggestions can improve performance. For 30 | example, the [`ptr_arg`] style lint suggests changing various container 31 | arguments to slices, such as changing `&mut Vec<T>` arguments to `&mut [T]`. 32 | The primary motivation here is that a slice gives a more flexible API, but it 33 | may also result in faster code due to less indirection and better optimization 34 | opportunities for the compiler. 35 | [**Example**](https://github.com/fschutt/fastblur/pull/3/files). 
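To illustrate the kind of signature that `ptr_arg` steers you towards, here is a minimal sketch of a function that takes `&mut [u32]` rather than `&mut Vec<u32>`. The function `double_all` is hypothetical. This form is only possible when the function does not need to change the length of the vector.

```rust
/// Doubles every element in place. Taking `&mut [u32]` rather than
/// `&mut Vec<u32>` lets callers pass vectors, arrays, or sub-slices.
/// (Hypothetical function, for illustration only.)
fn double_all(values: &mut [u32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}
```

A caller holding a `Vec<u32>` named `v` can still write `double_all(&mut v)`, because `&mut Vec<u32>` dereferences to `&mut [u32]` automatically. A caller can also pass part of an array, as in `double_all(&mut a[1..3])`.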
36 | 37 | [`ptr_arg`]: https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg 38 | 39 | ## Disallowing Types 40 | 41 | In the following chapters we will see that it is sometimes worth avoiding 42 | certain standard library types in favour of alternatives that are faster. If 43 | you decide to use these alternatives, it is easy to accidentally use the 44 | standard library types in some places by mistake. 45 | 46 | You can use Clippy's [`disallowed_types`] lint to avoid this problem. For 47 | example, to disallow the use of the standard hash tables (for reasons explained 48 | in the [Hashing] section) add a `clippy.toml` file to your code with the 49 | following line. 50 | ```toml 51 | disallowed-types = ["std::collections::HashMap", "std::collections::HashSet"] 52 | ``` 53 | 54 | [Hashing]: hashing.md 55 | [`disallowed_types`]: https://rust-lang.github.io/rust-clippy/master/index.html#disallowed_types 56 | -------------------------------------------------------------------------------- /src/logging-and-debugging.md: -------------------------------------------------------------------------------- 1 | # Logging and Debugging 2 | 3 | Sometimes logging code or debugging code can slow down a program significantly. 4 | Either the logging/debugging code itself is slow, or data collection code that 5 | feeds into logging/debugging code is slow. Make sure that no unnecessary work 6 | is done for logging/debugging purposes when logging/debugging is not enabled. 7 | [**Example 1**](https://github.com/rust-lang/rust/pull/50246/commits/2e4f66a86f7baa5644d18bb2adc07a8cd1c7409d), 8 | [**Example 2**](https://github.com/rust-lang/rust/pull/75133/commits/eeb4b83289e09956e0dda174047729ca87c709fe), 9 | [**Example 3**](https://github.com/rust-lang/rust/pull/147293/commits/cb0f969b623a7e12a0d8166c9a498e17a8b5a3c4). 10 | 11 | Note that [`assert!`] calls always run, but [`debug_assert!`] calls only run in 12 | dev builds. 
If you have an assertion that is hot but is not necessary for 13 | safety, consider making it a `debug_assert!`. 14 | [**Example 1**](https://github.com/rust-lang/rust/pull/58210/commits/f7ed6e18160bc8fccf27a73c05f3935c9e8f672e), 15 | [**Example 2**](https://github.com/rust-lang/rust/pull/90746/commits/580d357b5adef605fc731d295ca53ab8532e26fb). 16 | 17 | [`assert!`]: https://doc.rust-lang.org/std/macro.assert.html 18 | [`debug_assert!`]: https://doc.rust-lang.org/std/macro.debug_assert.html 19 | -------------------------------------------------------------------------------- /src/machine-code.md: -------------------------------------------------------------------------------- 1 | # Machine Code 2 | 3 | When you have a small piece of very hot code it may be worth inspecting the 4 | generated machine code to see if it has any inefficiencies, such as removable 5 | [bounds checks]. The [Compiler Explorer] website is an excellent resource when 6 | doing this on small snippets. [`cargo-show-asm`] is an alternative tool that 7 | can be used on full Rust projects. 8 | 9 | [bounds checks]: bounds-checks.md 10 | [Compiler Explorer]: https://godbolt.org/ 11 | [`cargo-show-asm`]: https://github.com/pacak/cargo-show-asm 12 | 13 | Relatedly, the [`core::arch`] module provides access to architecture-specific 14 | intrinsics, many of which relate to SIMD instructions. 15 | 16 | [`core::arch`]: https://doc.rust-lang.org/core/arch/index.html 17 | -------------------------------------------------------------------------------- /src/parallelism.md: -------------------------------------------------------------------------------- 1 | # Parallelism 2 | 3 | Rust provides excellent support for safe parallel programming, which can lead 4 | to large performance improvements. There are a variety of ways to introduce 5 | parallelism into a program and the best way for any program will depend greatly 6 | on its design. 
7 | 8 | Having said that, an in-depth treatment of parallelism is beyond the scope of 9 | this book. 10 | 11 | If you are interested in thread-based parallelism, the documentation for the 12 | [`rayon`] and [`crossbeam`] crates is a good place to start. [Rust Atomics and 13 | Locks][Atomics] is also an excellent resource. 14 | 15 | [`rayon`]: https://crates.io/crates/rayon 16 | [`crossbeam`]: https://crates.io/crates/crossbeam 17 | [Atomics]: https://marabos.nl/atomics/ 18 | 19 | If you are interested in fine-grained data parallelism, this [blog post] is a 20 | good overview of the state of SIMD support in Rust as of November 2025. 21 | 22 | [blog post]: https://shnatsel.medium.com/the-state-of-simd-in-rust-in-2025-32c263e5f53d 23 | 24 | -------------------------------------------------------------------------------- /src/profiling.md: -------------------------------------------------------------------------------- 1 | # Profiling 2 | 3 | When optimizing a program, you also need a way to determine which parts of the 4 | program are "hot" (executed frequently enough to affect runtime) and worth 5 | modifying. This is best done via profiling. 6 | 7 | ## Profilers 8 | 9 | There are many different profilers available, each with their strengths and 10 | weaknesses. The following is an incomplete list of profilers that have been 11 | used successfully on Rust programs. 12 | - [perf] is a general-purpose profiler that uses hardware performance counters. 13 | [Hotspot] and [Firefox Profiler] are good for viewing data recorded by perf. 14 | It works on Linux. 15 | - [Instruments] is a general-purpose profiler that comes with Xcode on macOS. 16 | - [Intel VTune Profiler] is a general-purpose profiler. It works on Windows, 17 | Linux, and macOS. 18 | - [AMD μProf] is a general-purpose profiler. It works on Windows and Linux. 19 | - [samply] is a sampling profiler that produces profiles that can be viewed 20 | in the Firefox Profiler. It works on Mac, Linux, and Windows. 
21 | - [flamegraph] is a Cargo command that uses perf/DTrace to profile your 22 | code and then displays the results in a flame graph. It works on Linux and 23 | all platforms that support DTrace (macOS, FreeBSD, NetBSD, and possibly 24 | Windows). 25 | - [Cachegrind] & [Callgrind] give global, per-function, and per-source-line 26 | instruction counts and simulated cache and branch prediction data. They work 27 | on Linux and some other Unixes. 28 | - [DHAT] is good for finding which parts of the code are causing a lot of 29 | allocations, and for giving insight into peak memory usage. It can also be 30 | used to identify hot calls to `memcpy`. It works on Linux and some other 31 | Unixes. [dhat-rs] is an experimental alternative that is a little less 32 | powerful and requires minor changes to your Rust program, but works on all 33 | platforms. 34 | - [Iai-Callgrind] provides `cargo bench` integration for the [Valgrind]-based 35 | profilers: Cachegrind, Callgrind, 36 | DHAT, and others. It works on Linux and some other Unixes. 37 | - [heaptrack] and [bytehound] are heap profiling tools. They work on Linux. 38 | - [`counts`] supports ad hoc profiling, which combines the use of `eprintln!` 39 | statements with frequency-based post-processing, and is good for getting 40 | domain-specific insights into parts of your code. It works on all platforms. 41 | - [Coz] performs *causal profiling* to measure optimization potential, and has 42 | Rust support via [coz-rs]. It works on Linux. 
43 | 44 | [perf]: https://perf.wiki.kernel.org/index.php/Main_Page 45 | [Hotspot]: https://github.com/KDAB/hotspot 46 | [Firefox Profiler]: https://profiler.firefox.com/ 47 | [Instruments]: https://developer.apple.com/forums/tags/instruments 48 | [Intel VTune Profiler]: https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html 49 | [AMD μProf]: https://developer.amd.com/amd-uprof/ 50 | [samply]: https://github.com/mstange/samply/ 51 | [flamegraph]: https://github.com/flamegraph-rs/flamegraph 52 | [Cachegrind]: https://www.valgrind.org/docs/manual/cg-manual.html 53 | [Callgrind]: https://www.valgrind.org/docs/manual/cl-manual.html 54 | [Iai-Callgrind]: https://github.com/iai-callgrind/iai-callgrind 55 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html 56 | [dhat-rs]: https://github.com/nnethercote/dhat-rs/ 57 | [Valgrind]: https://valgrind.org/ 58 | [heaptrack]: https://github.com/KDE/heaptrack 59 | [bytehound]: https://github.com/koute/bytehound 60 | [`counts`]: https://github.com/nnethercote/counts/ 61 | [Coz]: https://github.com/plasma-umass/coz 62 | [coz-rs]: https://github.com/plasma-umass/coz/tree/master/rust 63 | 64 | ## Debug Info 65 | 66 | To profile a release build effectively you might need to enable source line 67 | debug info. To do this, add the following lines to your `Cargo.toml` file: 68 | ```toml 69 | [profile.release] 70 | debug = "line-tables-only" 71 | ``` 72 | See the [Cargo documentation] for more details about the `debug` setting. 73 | 74 | [Cargo documentation]: https://doc.rust-lang.org/cargo/reference/profiles.html#debug 75 | 76 | Unfortunately, even after doing the above step you won't get detailed profiling 77 | information for standard library code. This is because shipped versions of the 78 | Rust standard library are not built with debug info. 
79 | 80 | The most reliable way around this is to build your own version of the compiler 81 | and standard library, following [these instructions], and adding the following 82 | lines to a `bootstrap.toml` file in the repository root: 83 | ```toml 84 | [rust] 85 | debuginfo-level = 1 86 | ``` 87 | This is a hassle, but may be worth the effort in some cases. 88 | 89 | [these instructions]: https://github.com/rust-lang/rust 90 | 91 | Alternatively, the unstable [build-std] feature lets you compile the standard 92 | library as part of your program's normal compilation, with the same build 93 | configuration. However, filenames present in the debug info for the standard 94 | library will not point to source code files, because this feature does not also 95 | download standard library source code. So this approach will not help with 96 | profilers such as Cachegrind and samply that require source code to work fully. 97 | 98 | [build-std]: https://doc.rust-lang.org/cargo/reference/unstable.html#build-std 99 | 100 | ## Frame Pointers 101 | 102 | The Rust compiler may optimize away frame pointers, which can hurt the quality 103 | of profiling information such as stack traces. To force the compiler to use 104 | frame pointers, use the `-C force-frame-pointers=yes` flag. For example: 105 | ```bash 106 | RUSTFLAGS="-C force-frame-pointers=yes" cargo build --release 107 | ``` 108 | 109 | Alternatively, to force the use of frame pointers from a [`config.toml`] file (for 110 | one or more projects), add these lines: 111 | ```toml 112 | [build] 113 | rustflags = ["-C", "force-frame-pointers=yes"] 114 | ``` 115 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html 116 | 117 | ## Symbol Demangling 118 | 119 | Rust uses a form of name mangling to encode function names in compiled code. 
If 120 | a profiler is unaware of this, its output may contain symbol names beginning 121 | with `_ZN` or `_R`, such as `_ZN3foo3barE` or 122 | `_ZN28_$u7b$$u7b$closure$u7d$$u7d$E` or 123 | `_RMCsno73SFvQKx_1cINtB0_3StrKRe616263_E`. 124 | 125 | Names like these can be manually demangled using [`rustfilt`]. 126 | 127 | [`rustfilt`]: https://crates.io/crates/rustfilt 128 | 129 | If you are having trouble with symbol demangling while profiling, it may be 130 | worth changing the [mangling format] from the default legacy format to the newer 131 | v0 format. 132 | 133 | [mangling format]: https://doc.rust-lang.org/rustc/codegen-options/index.html#symbol-mangling-version 134 | 135 | To use the v0 format from the command line, use the `-C 136 | symbol-mangling-version=v0` flag. For example: 137 | ```bash 138 | RUSTFLAGS="-C symbol-mangling-version=v0" cargo build --release 139 | ``` 140 | 141 | Alternatively, to request the v0 format from a [`config.toml`] file (for 142 | one or more projects), add these lines: 143 | ```toml 144 | [build] 145 | rustflags = ["-C", "symbol-mangling-version=v0"] 146 | ``` 147 | [`config.toml`]: https://doc.rust-lang.org/cargo/reference/config.html 148 | 149 | -------------------------------------------------------------------------------- /src/standard-library-types.md: -------------------------------------------------------------------------------- 1 | # Standard Library Types 2 | 3 | It is worth reading through the documentation for common standard library 4 | types (such as [`Vec`], [`Option`], [`Result`], and [`Rc`]/[`Arc`]) to find interesting 5 | functions that can sometimes be used to improve performance. 
6 | 7 | [`Vec`]: https://doc.rust-lang.org/std/vec/struct.Vec.html 8 | [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html 9 | [`Result`]: https://doc.rust-lang.org/std/result/enum.Result.html 10 | [`Rc`]: https://doc.rust-lang.org/std/rc/struct.Rc.html 11 | [`Arc`]: https://doc.rust-lang.org/std/sync/struct.Arc.html 12 | 13 | It is also worth knowing about high-performance alternatives to standard 14 | library types, such as [`Mutex`], [`RwLock`], [`Condvar`], and 15 | [`Once`]. 16 | 17 | [`Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html 18 | [`RwLock`]: https://doc.rust-lang.org/std/sync/struct.RwLock.html 19 | [`Condvar`]: https://doc.rust-lang.org/std/sync/struct.Condvar.html 20 | [`Once`]: https://doc.rust-lang.org/std/sync/struct.Once.html 21 | 22 | ## `Vec` 23 | 24 | The best way to create a zero-filled `Vec` of length `n` is with `vec![0; n]`. 25 | This is simple and probably [as fast or faster] than alternatives, such as 26 | using `resize`, `extend`, or anything involving `unsafe`, because it can use OS 27 | assistance. 28 | 29 | [as fast or faster]: https://github.com/rust-lang/rust/issues/54628 30 | 31 | [`Vec::remove`] removes an element at a particular index and shifts all 32 | subsequent elements one to the left, which makes it O(n). [`Vec::swap_remove`] 33 | replaces an element at a particular index with the final element, which does 34 | not preserve ordering, but is O(1). 35 | 36 | [`Vec::retain`] efficiently removes multiple items from a `Vec`. There is an 37 | equivalent method for other collection types such as `String`, `HashSet`, and 38 | `HashMap`. 
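The difference between these removal methods can be sketched as follows. The helper names `remove_unordered` and `keep_even` are hypothetical wrappers, written only to make the behaviour easy to see.

```rust
/// Removes the element at `index` in O(1) by moving the last element into
/// its place. Element order is not preserved. (Hypothetical helper.)
fn remove_unordered(v: &mut Vec<u32>, index: usize) -> u32 {
    v.swap_remove(index)
}

/// Removes every odd value in a single pass, preserving the order of the
/// values that remain. (Hypothetical helper.)
fn keep_even(v: &mut Vec<u32>) {
    v.retain(|x| x % 2 == 0);
}
```

For instance, calling `remove_unordered` with index 0 on `[10, 20, 30, 40]` returns 10 and leaves `[40, 20, 30]`, whereas `Vec::remove` would leave `[20, 30, 40]` at the cost of shifting three elements. Removing many elements with repeated `Vec::remove` calls shifts elements each time, while one `retain` call visits each element once.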
39 | 40 | [`Vec::remove`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.remove 41 | [`Vec::swap_remove`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.swap_remove 42 | [`Vec::retain`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain 43 | 44 | ## `Option` and `Result` 45 | 46 | [`Option::ok_or`] converts an `Option` into a `Result`, and is passed an `err` 47 | parameter that is used if the `Option` value is `None`. `err` is computed 48 | eagerly. If its computation is expensive, you should instead use 49 | [`Option::ok_or_else`], which computes the error value lazily via a closure. 50 | For example, this: 51 | ```rust 52 | # fn expensive() {} 53 | # let o: Option<u32> = None; 54 | let r = o.ok_or(expensive()); // always evaluates `expensive()` 55 | ``` 56 | should be changed to this: 57 | ```rust 58 | # fn expensive() {} 59 | # let o: Option<u32> = None; 60 | let r = o.ok_or_else(|| expensive()); // evaluates `expensive()` only when needed 61 | ``` 62 | [**Example**](https://github.com/rust-lang/rust/pull/50051/commits/5070dea2366104fb0b5c344ce7f2a5cf8af176b0). 63 | 64 | [`Option::ok_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.ok_or 65 | [`Option::ok_or_else`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.ok_or_else 66 | 67 | There are similar lazy alternatives for [`Option::map_or`], [`Option::unwrap_or`], 68 | [`Result::or`], [`Result::map_or`], and [`Result::unwrap_or`]. 
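The same eager-versus-lazy distinction applies to `unwrap_or` and `unwrap_or_else`. The sketch below makes it observable by counting calls to a stand-in for an expensive function. All of the names here (`expensive_default`, `name_eager`, `name_lazy`) are hypothetical.

```rust
use std::cell::Cell;

/// Stands in for an expensive computation and records each call in `calls`.
/// (Hypothetical function, for illustration only.)
fn expensive_default(calls: &Cell<u32>) -> String {
    calls.set(calls.get() + 1);
    String::from("default")
}

/// Eager: the default is built even when `name` is `Some`.
fn name_eager(name: Option<String>, calls: &Cell<u32>) -> String {
    name.unwrap_or(expensive_default(calls))
}

/// Lazy: the closure runs only when `name` is `None`.
fn name_lazy(name: Option<String>, calls: &Cell<u32>) -> String {
    name.unwrap_or_else(|| expensive_default(calls))
}
```

Passing `Some(..)` to `name_eager` still increments the counter, because the argument is evaluated before `unwrap_or` is called. Passing `Some(..)` to `name_lazy` leaves the counter unchanged.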
69 | 70 | [`Option::map_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.map_or 71 | [`Option::unwrap_or`]: https://doc.rust-lang.org/std/option/enum.Option.html#method.unwrap_or 72 | [`Result::or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.or 73 | [`Result::map_or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.map_or 74 | [`Result::unwrap_or`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.unwrap_or 75 | 76 | ## `Rc`/`Arc` 77 | 78 | [`Rc::make_mut`]/[`Arc::make_mut`] provide clone-on-write semantics. They make 79 | a mutable reference to an `Rc`/`Arc`. If the refcount is greater than one, they 80 | will `clone` the inner value to ensure unique ownership; otherwise, they will 81 | modify the original value. They are not needed often, but they can be extremely 82 | useful on occasion. 83 | [**Example 1**](https://github.com/rust-lang/rust/pull/65198/commits/3832a634d3aa6a7c60448906e6656a22f7e35628), 84 | [**Example 2**](https://github.com/rust-lang/rust/pull/65198/commits/75e0078a1703448a19e25eac85daaa5a4e6e68ac). 85 | 86 | [`Rc::make_mut`]: https://doc.rust-lang.org/std/rc/struct.Rc.html#method.make_mut 87 | [`Arc::make_mut`]: https://doc.rust-lang.org/std/sync/struct.Arc.html#method.make_mut 88 | 89 | ## `Mutex`, `RwLock`, `Condvar`, and `Once` 90 | 91 | The [`parking_lot`] crate provides alternative implementations of these 92 | synchronization types. The APIs and semantics of the `parking_lot` types are 93 | similar but not identical to those of the equivalent types in the standard 94 | library. 95 | 96 | The `parking_lot` versions used to be reliably smaller, faster, and more 97 | flexible than those in the standard library, but the standard library versions 98 | have greatly improved on some platforms. So you should measure before switching 99 | to `parking_lot`. 
100 | 101 | [`parking_lot`]: https://crates.io/crates/parking_lot 102 | 103 | If you decide to universally use the `parking_lot` types it is easy to 104 | accidentally use the standard library equivalents in some places. You can [use 105 | Clippy] to avoid this problem. 106 | 107 | [use Clippy]: linting.md#disallowing-types 108 | -------------------------------------------------------------------------------- /src/title-page.md: -------------------------------------------------------------------------------- 1 | # The Rust Performance Book 2 | 3 | **First published in November 2020** 4 | 5 | **Written by Nicholas Nethercote and others** 6 | 7 | [Source code](https://github.com/nnethercote/perf-book) 8 | -------------------------------------------------------------------------------- /src/type-sizes.md: -------------------------------------------------------------------------------- 1 | # Type Sizes 2 | 3 | Shrinking oft-instantiated types can help performance. 4 | 5 | For example, if memory usage is high, a heap profiler like [DHAT] can identify 6 | the hot allocation points and the types involved. Shrinking these types can 7 | reduce peak memory usage, and possibly improve performance by reducing memory 8 | traffic and cache pressure. 9 | 10 | [DHAT]: https://www.valgrind.org/docs/manual/dh-manual.html 11 | 12 | Furthermore, Rust types that are larger than 128 bytes are copied with `memcpy` 13 | rather than inline code. If `memcpy` shows up in non-trivial amounts in 14 | profiles, DHAT's "copy profiling" mode will tell you exactly where the hot 15 | `memcpy` calls are and the types involved. Shrinking these types to 128 bytes 16 | or less can make the code faster by avoiding `memcpy` calls and reducing memory 17 | traffic. 18 | 19 | ## Measuring Type Sizes 20 | 21 | [`std::mem::size_of`] gives the size of a type, in bytes, but often you want to 22 | know the exact layout as well. 
For example, an enum might be surprisingly large 23 | due to a single outsized variant. 24 | 25 | [`std::mem::size_of`]: https://doc.rust-lang.org/std/mem/fn.size_of.html 26 | 27 | The `-Zprint-type-sizes` option does exactly this. It isn’t enabled on release 28 | versions of rustc, so you’ll need to use a nightly version of rustc. Here is 29 | one possible invocation via Cargo: 30 | ```text 31 | RUSTFLAGS=-Zprint-type-sizes cargo +nightly build --release 32 | ``` 33 | And here is a possible invocation of rustc: 34 | ```text 35 | rustc +nightly -Zprint-type-sizes input.rs 36 | ``` 37 | It will print out details of the size, layout, and alignment of all types in 38 | use. For example, for this type: 39 | ```rust 40 | enum E { 41 | A, 42 | B(i32), 43 | C(u64, u8, u64, u8), 44 | D(Vec<u32>), 45 | } 46 | ``` 47 | it prints the following, plus information about a few built-in types. 48 | ```text 49 | print-type-size type: `E`: 32 bytes, alignment: 8 bytes 50 | print-type-size discriminant: 1 bytes 51 | print-type-size variant `D`: 31 bytes 52 | print-type-size padding: 7 bytes 53 | print-type-size field `.0`: 24 bytes, alignment: 8 bytes 54 | print-type-size variant `C`: 23 bytes 55 | print-type-size field `.1`: 1 bytes 56 | print-type-size field `.3`: 1 bytes 57 | print-type-size padding: 5 bytes 58 | print-type-size field `.0`: 8 bytes, alignment: 8 bytes 59 | print-type-size field `.2`: 8 bytes 60 | print-type-size variant `B`: 7 bytes 61 | print-type-size padding: 3 bytes 62 | print-type-size field `.0`: 4 bytes, alignment: 4 bytes 63 | print-type-size variant `A`: 0 bytes 64 | ``` 65 | The output shows the following. 66 | - The size and alignment of the type. 67 | - For enums, the size of the discriminant. 68 | - For enums, the size of each variant (sorted from largest to smallest). 69 | - The size, alignment, and ordering of all fields. (Note that the compiler has 70 | reordered variant `C`'s fields to minimize the size of `E`.) 
71 | - The size and location of all padding. 72 | 73 | Alternatively, the [top-type-sizes] crate can be used to display the output in 74 | a more compact form. 75 | 76 | [top-type-sizes]: https://crates.io/crates/top-type-sizes 77 | 78 | Once you know the layout of a hot type, there are multiple ways to shrink it. 79 | 80 | ## Field Ordering 81 | 82 | The Rust compiler automatically sorts the fields in structs and enums to 83 | minimize their sizes (unless the `#[repr(C)]` attribute is specified), so you 84 | do not have to worry about field ordering. But there are other ways to minimize 85 | the size of hot types. 86 | 87 | ## Smaller Enums 88 | 89 | If an enum has an outsized variant, consider boxing one or more fields. For 90 | example, you could change this type: 91 | ```rust 92 | type LargeType = [u8; 100]; 93 | enum A { 94 | X, 95 | Y(i32), 96 | Z(i32, LargeType), 97 | } 98 | ``` 99 | to this: 100 | ```rust 101 | # type LargeType = [u8; 100]; 102 | enum A { 103 | X, 104 | Y(i32), 105 | Z(Box<(i32, LargeType)>), 106 | } 107 | ``` 108 | This reduces the type size at the cost of requiring an extra heap allocation 109 | for the `A::Z` variant. This is more likely to be a net performance win if the 110 | `A::Z` variant is relatively rare. The `Box` will also make `A::Z` slightly 111 | less ergonomic to use, especially in `match` patterns. 
112 | [**Example 1**](https://github.com/rust-lang/rust/pull/37445/commits/a920e355ea837a950b484b5791051337cd371f5d), 113 | [**Example 2**](https://github.com/rust-lang/rust/pull/55346/commits/38d9277a77e982e49df07725b62b21c423b6428e), 114 | [**Example 3**](https://github.com/rust-lang/rust/pull/64302/commits/b972ac818c98373b6d045956b049dc34932c41be), 115 | [**Example 4**](https://github.com/rust-lang/rust/pull/64374/commits/2fcd870711ce267c79408ec631f7eba8e0afcdf6), 116 | [**Example 5**](https://github.com/rust-lang/rust/pull/64394/commits/7f0637da5144c7435e88ea3805021882f077d50c), 117 | [**Example 6**](https://github.com/rust-lang/rust/pull/71942/commits/27ae2f0d60d9201133e1f9ec7a04c05c8e55e665). 118 | 119 | ## Smaller Integers 120 | 121 | It is often possible to shrink types by using smaller integer types. For 122 | example, while it is most natural to use `usize` for indices, it is often 123 | reasonable to store indices as `u32`, `u16`, or even `u8`, and then convert to 124 | `usize` at use points. 125 | [**Example 1**](https://github.com/rust-lang/rust/pull/49993/commits/4d34bfd00a57f8a8bdb60ec3f908c5d4256f8a9a), 126 | [**Example 2**](https://github.com/rust-lang/rust/pull/50981/commits/8d0fad5d3832c6c1f14542ea0be038274e454524). 127 | 128 | ## Boxed Slices 129 | 130 | Rust vectors contain three words: a length, a capacity, and a pointer. If you 131 | have a vector that is unlikely to be changed in the future, you can convert it 132 | to a *boxed slice* with [`Vec::into_boxed_slice`]. A boxed slice contains only 133 | two words, a length and a pointer. Any excess element capacity is dropped, 134 | which may cause a reallocation. 
135 | ```rust
136 | # use std::mem::{size_of, size_of_val};
137 | let v: Vec<u32> = vec![1, 2, 3];
138 | assert_eq!(size_of_val(&v), 3 * size_of::<usize>());
139 |
140 | let bs: Box<[u32]> = v.into_boxed_slice();
141 | assert_eq!(size_of_val(&bs), 2 * size_of::<usize>());
142 | ```
143 | Alternatively, a boxed slice can be constructed directly from an iterator with
144 | [`Iterator::collect`], avoiding the need for reallocation.
145 | ```rust
146 | let bs: Box<[u32]> = (1..3).collect();
147 | ```
148 | A boxed slice can be converted to a vector with [`slice::into_vec`] without any
149 | cloning or reallocation.
150 |
151 | [`Vec::into_boxed_slice`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.into_boxed_slice
152 | [`Iterator::collect`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.collect
153 | [`slice::into_vec`]: https://doc.rust-lang.org/std/primitive.slice.html#method.into_vec
154 |
155 | ## `ThinVec`
156 |
157 | An alternative to boxed slices is `ThinVec`, from the [`thin_vec`] crate. It is
158 | functionally equivalent to `Vec`, but stores the length and capacity in the
159 | same allocation as the elements (if there are any). This means that
160 | `size_of::<ThinVec<T>>` is only one word.
161 |
162 | `ThinVec` is a good choice within oft-instantiated types for vectors that are
163 | often empty. It can also be used to shrink the largest variant of an enum, if
164 | that variant contains a `Vec`.
165 |
166 | [`thin_vec`]: https://crates.io/crates/thin-vec
167 |
168 | ## Avoiding Regressions
169 |
170 | If a type is hot enough that its size can affect performance, it is a good idea
171 | to use a static assertion to ensure that it does not accidentally regress. The
172 | following example uses a macro from the [`static_assertions`] crate.
173 | ```rust,ignore
174 | // This type is used a lot. Make sure it doesn't unintentionally get bigger.
175 | #[cfg(target_arch = "x86_64")]
176 | static_assertions::assert_eq_size!(HotType, [u8; 64]);
177 | ```
178 | The `cfg` attribute is important, because type sizes can vary on different
179 | platforms. Restricting the assertion to `x86_64` (which is typically the most
180 | widely used platform) is likely to be good enough to prevent regressions in
181 | practice.
182 |
183 | [`static_assertions`]: https://crates.io/crates/static_assertions
184 |
--------------------------------------------------------------------------------
/src/wrapper-types.md:
--------------------------------------------------------------------------------
1 | # Wrapper Types
2 |
3 | Rust has a variety of "wrapper" types, such as [`RefCell`] and [`Mutex`], that
4 | provide special behavior for values. Accessing these values can take a
5 | non-trivial amount of time. If multiple such values are typically accessed
6 | together, it may be better to put them within a single wrapper.
7 |
8 | [`RefCell`]: https://doc.rust-lang.org/std/cell/struct.RefCell.html
9 | [`Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html
10 |
11 | For example, a struct like this:
12 | ```rust
13 | # use std::sync::{Arc, Mutex};
14 | struct S {
15 |     x: Arc<Mutex<u32>>,
16 |     y: Arc<Mutex<u32>>,
17 | }
18 | ```
19 | may be better represented like this:
20 | ```rust
21 | # use std::sync::{Arc, Mutex};
22 | struct S {
23 |     xy: Arc<Mutex<(u32, u32)>>,
24 | }
25 | ```
26 | Whether or not this helps performance will depend on the exact access patterns
27 | of the values.
28 | [**Example**](https://github.com/rust-lang/rust/pull/68694/commits/7426853ba255940b880f2e7f8026d60b94b42404).
29 |
--------------------------------------------------------------------------------