├── .gitignore ├── third_party └── mermaid │ ├── mermaid-init.js │ └── LICENSE ├── README.md ├── src ├── SUMMARY.md ├── processes.md ├── pets.rs ├── introduction.md ├── apis.md ├── code.md ├── codebase.md ├── signatures.md └── types.md ├── book.toml ├── .github └── workflows │ └── main.yml ├── CONTRIBUTING.md └── LICENSE /.gitignore: -------------------------------------------------------------------------------- 1 | book 2 | -------------------------------------------------------------------------------- /third_party/mermaid/mermaid-init.js: -------------------------------------------------------------------------------- 1 | mermaid.initialize({startOnLoad:true}); 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ***This FAQ can be found at [https://cppfaq.rs](https://cppfaq.rs).*** 2 | 3 | Build instructions: 4 | * `cargo install mdbook` 5 | * `cargo install mdbook-mermaid` 6 | * `cargo install mdbook-linkcheck` 7 | * `mdbook serve -o` 8 | * Occasionally, `mdbook test` 9 | -------------------------------------------------------------------------------- /src/SUMMARY.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | 3 | - [Introduction](./introduction.md) 4 | - [Questions about code in function bodies](./code.md) 5 | - [Questions about your function signatures](./signatures.md) 6 | - [Questions about your types](./types.md) 7 | - [Questions about your APIs](./apis.md) 8 | - [Questions about your whole codebase](./codebase.md) 9 | - [Questions about your processes](./processes.md) 10 | -------------------------------------------------------------------------------- /book.toml: -------------------------------------------------------------------------------- 1 | [book] 2 | authors = ["Adrian Taylor", "Martin Brænne"] 3 | language = "en" 4 | multilingual = false 5 | src = "src" 6 | title 
= "cppfaq.rs" 7 | [preprocessor] 8 | [preprocessor.mermaid] 9 | command = "mdbook-mermaid" 10 | 11 | [preprocessor.toc] 12 | command = "mdbook-toc" 13 | renderer = ["html"] 14 | 15 | [output] 16 | [output.html] 17 | additional-js = ["third_party/mermaid/mermaid.min.js", "third_party/mermaid/mermaid-init.js"] 18 | cname = "cppfaq.rs" 19 | git-repository-url = "https://github.com/google/rust-design-faq" 20 | 21 | [output.linkcheck] 22 | follow-web-links = true 23 | optional = true 24 | -------------------------------------------------------------------------------- /src/processes.md: -------------------------------------------------------------------------------- 1 | # Questions about your development processes 2 | 3 | ## How should I use tools differently from C++? 4 | 5 | * *Use `rustfmt` automatically everywhere.* While in C++ there are many 6 | different coding styles, the Rust community is in agreement (at least, 7 | they're in agreement that it's a good idea to be in agreement). That 8 | is codified in `rustfmt`. Use it, automatically, on every submission. 9 | * *Use `clippy` somewhere*. Its lints are useful. 10 | * *Use IDEs more liberally*. Even staunch vim-adherents (your author!) 11 | prefer to use an IDE with Rust, because it's simply invaluable to show 12 | type annotations. Type information is typically invisible in the language 13 | so in Rust you're more reliant on tooling assistance. 14 | * *Deny unsafe code* by default. (`#![forbid(unsafe_code)]`). 
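As a small illustration of the last point, here is a sketch of the lint in action. Crate-wide, the attribute is written `#![forbid(unsafe_code)]` at the top of `lib.rs` or `main.rs`; the sketch below uses the outer-attribute form on a single module so that it stays self-contained. The module and function names (`feeding`, `first_portion`) are invented for this example.

```rust
// Hypothetical module: `feeding` and `first_portion` are illustrative names,
// not part of any real crate.
#[forbid(unsafe_code)]
pub mod feeding {
    /// Returns the first portion size, or `None` if there are no portions.
    pub fn first_portion(portions: &[u32]) -> Option<u32> {
        // Safe, bounds-aware access: no `unsafe` needed.
        portions.first().copied()

        // Under `forbid(unsafe_code)` an unchecked read such as the following
        // is rejected at compile time:
        //     unsafe { Some(*portions.get_unchecked(0)) }
    }
}
```

Unlike `deny`, a `forbid` level cannot be relaxed by an `#[allow(unsafe_code)]` further down the module tree, which is what makes it a reasonable default.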
15 | -------------------------------------------------------------------------------- /.github/workflows/main.yml: -------------------------------------------------------------------------------- 1 | name: github pages 2 | 3 | on: 4 | push: 5 | branches: 6 | - main 7 | pull_request: 8 | branches: 9 | - main 10 | 11 | jobs: 12 | deploy: 13 | runs-on: ubuntu-20.04 14 | steps: 15 | - uses: actions/checkout@v2 16 | 17 | - name: Setup mdBook 18 | uses: peaceiris/actions-mdbook@v1 19 | with: 20 | mdbook-version: '0.4.10' 21 | # mdbook-version: 'latest' 22 | 23 | - name: Install mdbook-linkcheck 24 | run: cargo install mdbook-linkcheck 25 | 26 | - name: Install mdbook-mermaid 27 | run: cargo install mdbook-mermaid 28 | 29 | - name: Install mdbook-toc 30 | run: cargo install mdbook-toc 31 | 32 | - run: mdbook build 33 | 34 | - run: mdbook test 35 | 36 | - name: Deploy 37 | uses: peaceiris/actions-gh-pages@v3 38 | if: ${{ github.ref == 'refs/heads/main' }} 39 | with: 40 | github_token: ${{ secrets.GITHUB_TOKEN }} 41 | publish_dir: ./book/html 42 | cname: cppfaq.rs 43 | -------------------------------------------------------------------------------- /third_party/mermaid/LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2014 - 2021 Knut Sveidqvist 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # How to Contribute 2 | 3 | We'd love to accept your patches and contributions to this project. There are 4 | just a few small guidelines you need to follow. 5 | 6 | ## Contributor License Agreement 7 | 8 | Contributions to this project must be accompanied by a Contributor License 9 | Agreement. You (or your employer) retain the copyright to your contribution; 10 | this simply gives us permission to use and redistribute your contributions as 11 | part of the project. Head over to to see 12 | your current agreements on file or to sign a new one. 13 | 14 | You generally only need to submit a CLA once, so if you've already submitted one 15 | (even if it was for a different project), you probably don't need to do it 16 | again. 17 | 18 | ## Code Reviews 19 | 20 | All submissions, including submissions by project members, require review. We 21 | use GitHub pull requests for this purpose. Consult 22 | [GitHub Help](https://help.github.com/articles/about-pull-requests/) for more 23 | information on using pull requests. 24 | 25 | ## Community Guidelines 26 | 27 | This project follows [Google's Open Source Community 28 | Guidelines](https://opensource.google/conduct/). 
29 | -------------------------------------------------------------------------------- /src/pets.rs: -------------------------------------------------------------------------------- 1 | # // Copyright 2020 Google LLC 2 | # // 3 | # // Licensed under the Apache License, Version 2.0 (the "License"); 4 | # // you may not use this file except in compliance with the License. 5 | # // You may obtain a copy of the License at 6 | # // 7 | # // https://www.apache.org/licenses/LICENSE-2.0 8 | # // 9 | # // Unless required by applicable law or agreed to in writing, software 10 | # // distributed under the License is distributed on an "AS IS" BASIS, 11 | # // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # // See the License for the specific language governing permissions and 13 | # // limitations under the License. 14 | # 15 | # use std::collections::HashSet; 16 | # struct Animal { 17 | # kind: &'static str, 18 | # is_hungry: bool, 19 | # meal_needed: &'static str, 20 | # } 21 | # 22 | # static PETS: [Animal; 4] = [ 23 | # Animal { 24 | # kind: "Dog", 25 | # is_hungry: true, 26 | # meal_needed: "Kibble", 27 | # }, 28 | # Animal { 29 | # kind: "Python", 30 | # is_hungry: false, 31 | # meal_needed: "Cat", 32 | # }, 33 | # Animal { 34 | # kind: "Cat", 35 | # is_hungry: true, 36 | # meal_needed: "Kibble", 37 | # }, 38 | # Animal { 39 | # kind: "Lion", 40 | # is_hungry: false, 41 | # meal_needed: "Kibble", 42 | # }, 43 | # ]; 44 | # 45 | # static NEARBY_DUCK: Animal = Animal { 46 | # kind: "Duck", 47 | # is_hungry: true, 48 | # meal_needed: "pondweed", 49 | # }; 50 | -------------------------------------------------------------------------------- /src/introduction.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | So, you're coming from C++ and want to write Rust? Great! 4 | 5 | You have questions? We have answers. 
6 | 7 | This book is a collection of frequently asked questions for those arriving from existing C++ codebases. It guides you on how to adapt your C++ thinking to the new facilities available in Rust. It should help you if you're coming from other object-oriented languages such as Java too. 8 | 9 | Although it's structured as questions and answers, it can also be read front-to-back, to give you hints about how to adapt your C++/Java thinking to a more idiomatically Rusty approach. 10 | 11 | It does not aim to teach you Rust - there are [many better resources](https://www.rust-lang.org/learn). It doesn't aim to talk about Rust idioms _in general_ - [there are great existing guides for that](https://rust-unofficial.github.io/patterns/idioms/index.html). This guide is specifically about transitioning from some other traditionally OO language. If you're coming from such a language, you'll have questions about how to achieve the same outcomes in idiomatic Rust. That's what this guide is for. 12 | 13 | # Structure 14 | 15 | The guide starts with idioms at the small scale - answering questions about how you'd write a few lines of code - and moves towards ever larger patterns - answering questions about how you'd structure your whole codebase. 16 | 17 | # Contributors 18 | 19 | The following awesome people helped write the answers here, and they're sometimes quoted using the abbreviations given. 20 | 21 | Thanks to Adam Perry[(@\_\_anp\_\_)](https://twitter.com/__anp__) (AP), Alyssa Haroldsen [(@kupiakos)](https://twitter.com/kupiakos) (AH), Augie Fackler [(@durin42)](https://twitter.com/durin42) (AF), David Tolnay [(@davidtolnay)](https://twitter.com/davidtolnay) (DT), Łukasz Anforowicz (LA), Manish Goregaokar [(@ManishEarth)](https://twitter.com/ManishEarth) (MG), Mike Forster (MF), Miguel Young de la Sota [(@DrawsMiguel)](https://twitter.com/DrawsMiguel) (MY), and Tyler Mandry [(@tmandry)](https://twitter.com/tmandry) (TM). 
22 | 23 | Their views have been edited and collated by Adrian Taylor [(@adehohum)](https://twitter.com/adehohum), Chris Palmer, [danakj@chromium.org](mailto:danakj@chromium.org) and Martin Brænne. Any errors or misrepresentations are ours. 24 | 25 | Licensed under either of Apache License, Version 2.0 or MIT license at your option. 26 | -------------------------------------------------------------------------------- /src/apis.md: -------------------------------------------------------------------------------- 1 | # Questions about designing APIs for others 2 | 3 | 4 | 5 | See also the excellent [Rust API guidelines](https://rust-lang.github.io/api-guidelines/about.html). 6 | The document you're reading aims to provide extra hints which may be especially 7 | useful to folk coming from C++, but that's the canonical reference. 8 | 9 | ## When should my type implement `Default`? 10 | 11 | Whenever you'd provide a default constructor in C++. 12 | 13 | ## When should my type implement `From`, `Into` and `TryFrom`? 14 | 15 | You should think of these as equivalent to implicit conversions in C++. Just 16 | as with C++, if there are _multiple_ ways to convert from your thing to another 17 | thing, don't implement these, but if there's a single obvious conversion, do. 18 | 19 | Usually, don't implement `Into` but instead implement `From`. 20 | 21 | ## How should I expose constructors? 22 | 23 | See the previous two answers: where it's simple and obvious, use the standard 24 | traits to make your object behavior predictable. 25 | 26 | If you need to go beyond that, remember you've got a couple of extra toys in Rust: 27 | 28 | * A "constructor" could return a `Result` 29 | * Your constructors can have names, e.g. `Vec::with_capacity`, `Box::pin` 30 | 31 | ## When should my type implement `AsRef`? 32 | 33 | If you have a type which contains another type, provide `AsRef` especially 34 | so that people can clone the inner type. 
It's good practice to provide explicit 35 | versions as well (for example, `String` implements `AsRef<str>` but also 36 | provides `.as_str()`.) 37 | 38 | ## When should I implement `Copy`? 39 | 40 | > Anything that is integer-like or reference-like should be `Copy`; other things 41 | > shouldn’t. - MY 42 | 43 | > When it's efficient and when it’s an API contract you're willing to uphold. - AH 44 | 45 | Generally speaking, types which are plain-old-data can be `Copy`. Anything 46 | more nuanced with any type of state shouldn't be. 47 | 48 | ## Should I have `Arc` or `Rc` in my API? 49 | 50 | > It’s a code smell to have reference counts in your API design. You should hide 51 | > it. - TM. 52 | 53 | If you must, you will need to decide between `Rc` and `Arc` - see the next 54 | answer for some considerations. But, generally, `Arc` is better practice because 55 | it imposes fewer restrictions on your callers. Also, consider taking a look at the 56 | [`Archery` crate](https://docs.rs/archery/latest/archery/). 57 | 58 | ## Should my API be thread-safe? What does that mean? 59 | 60 | In C++, a thread-safe API usually means that you can expect your API's 61 | consumers to use objects from multiple threads. This is difficult to make safe 62 | and therefore substantial extra engineering is required to make an API 63 | thread-safe. 64 | 65 | In Rust, things differ: 66 | 67 | * it's more normal to do things across multiple threads; 68 | * you don't have to worry about your callers making mistakes here because 69 | the compiler won't let them; 70 | * you can often rely on `Send` rather than `Sync`. 71 | 72 | You certainly shouldn't be putting a `Mutex` around all your types. If your 73 | caller attempts to use the type from multiple threads, the compiler will 74 | simply stop them. It is the responsibility of the caller to use things 75 | safely.
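To make that division of responsibility concrete, here is a minimal sketch. `Counter` and `count_from_threads` are hypothetical names chosen for illustration. The library type contains no locking at all; the caller reaches for `Arc<Mutex<_>>` only because it wants to share one instance between threads.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical library type. It holds plain data, so the compiler
/// automatically treats it as `Send` and `Sync`; no locking is built in.
#[derive(Debug, Default)]
pub struct Counter {
    hits: u64,
}

impl Counter {
    /// Mutation needs `&mut self`, so unsynchronized sharing cannot compile.
    pub fn record(&mut self) {
        self.hits += 1;
    }

    pub fn hits(&self) -> u64 {
        self.hits
    }
}

/// Hypothetical caller code: the caller, not the library, decides that a
/// mutex is the right way to share one `Counter` between threads.
pub fn count_from_threads(threads: usize, per_thread: u64) -> u64 {
    let shared = Arc::new(Mutex::new(Counter::default()));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    shared.lock().unwrap().record();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Bind the result first so the mutex guard is dropped before returning.
    let total = shared.lock().unwrap().hits();
    total
}
```

Replacing `Arc` with `Rc` in the caller would fail to compile, because `Rc` is not `Send` and so cannot be moved into `thread::spawn`. That is the compiler stopping the caller, with no effort from the library.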
76 | 77 | > If the library has `Arc` or `Rc` in the APIs, it may be making choices about 78 | > how you should instantiate stuff, and that’s rude. - AF 79 | 80 | There's a reasonable chance that your API can be used in parallel threads 81 | by virtue of `Send` and `Sync` being automatically derived. But - you should 82 | think through the usage model for your API clients and ensure that's true. 83 | 84 | ```rust 85 | use std::cell::RefCell; 86 | use std::collections::VecDeque; 87 | use std::sync::Mutex; 88 | use std::thread; 89 | 90 | // Imagine this is your library, exposing this interface to library 91 | // consumers... 92 | mod pizza_api { 93 | 94 | use std::thread; 95 | use std::time::Duration; 96 | 97 | pub struct Pizza { 98 | // automatically 'Send' 99 | _anchovies: u32, 100 | _pepperoni: u32, 101 | } 102 | 103 | pub fn make_pizza() -> Pizza { 104 | println!("cooking..."); 105 | thread::sleep(Duration::from_millis(10)); 106 | Pizza { 107 | _anchovies: 0, // yuck 108 | _pepperoni: 32, 109 | } 110 | } 111 | 112 | pub fn eat_pizza(_pizza: Pizza) { 113 | println!("yum") 114 | } 115 | } 116 | 117 | // Absolutely no changes are required to the pizza library to let 118 | // it be usable from a multithreaded context 119 | fn main() { 120 | let pizza_queue = Mutex::new(RefCell::new(VecDeque::new())); 121 | thread::scope(|s| { 122 | s.spawn(|| { 123 | let mut pizzas_eaten = 0; 124 | while pizzas_eaten < 100 { 125 | if let Some(pizza) = pizza_queue.lock().unwrap().borrow_mut().pop_front() { 126 | pizza_api::eat_pizza(pizza); 127 | pizzas_eaten += 1; 128 | } 129 | } 130 | }); 131 | s.spawn(|| { 132 | for _ in 0..100 { 133 | let pizza = pizza_api::make_pizza(); 134 | pizza_queue.lock().unwrap().borrow_mut().push_back(pizza); 135 | } 136 | }); 137 | }); 138 | } 139 | ``` 140 | 141 | ## What should I `Derive` to make my code optimally usable? 
142 | 143 | The [official guidelines say to be eager](https://rust-lang.github.io/api-guidelines/interoperability.html#types-eagerly-implement-common-traits-c-common-traits). 144 | 145 | But don't overpromise: 146 | 147 | > Equality can suddenly become expensive later - don’t make types comparable 148 | > unless you intend people to be able to compare instances of the type. 149 | > Allowing people to pattern match on enums is usually better. - MY 150 | 151 | Note that [`syn` is a rare case](https://docs.rs/syn/latest/syn/) in that it 152 | has so many types, and is so extensively depended upon by the rest of the Rust 153 | ecosystem, that it avoids deriving the standard traits unless explicitly 154 | commanded to do so via a cargo feature. This is an unusual pattern and should 155 | not normally be followed. 156 | 157 | ## How should I think about API design, differently from C++? 158 | 159 | > Make the most of the fact that everything is immutable by default. Things 160 | > which are mutable should stick out. - AF 161 | 162 | > Think about things which should take self and return self. - AF 163 | 164 | Refactoring is less expensive in Rust than C++ due to compiler safeguards, but 165 | _rearchitecting_ is expensive in any language. Think about "one way doors" 166 | and "two way doors" in the design space: can you undo a change later? 167 | -------------------------------------------------------------------------------- /src/code.md: -------------------------------------------------------------------------------- 1 | # Questions about code in function bodies 2 | 3 | 4 | 5 | ## How can I avoid the performance penalty of bounds checks? 6 | 7 | Rust array and list accesses are all bounds checked. You may be worried about a performance penalty. How can you avoid that? 8 | 9 | > Contort yourself a little bit to use iterators. - MY 10 | 11 | Rust gives you choices around functional versus imperative style, but things often work better in a functional style. 
Specifically - if you've got something iterable, then there are probably functional methods to do what you want. 12 | 13 | For instance, suppose you need to work out what food to get at the petshop. Here's code that does this in an imperative style: 14 | 15 | ```rust 16 | {{#include pets.rs}} 17 | fn make_shopping_list_a() -> HashSet<&'static str> { 18 | let mut meals_needed = HashSet::new(); 19 | for n in 0..PETS.len() { // ugh 20 | if PETS[n].is_hungry { 21 | meals_needed.insert(PETS[n].meal_needed); 22 | } 23 | } 24 | meals_needed 25 | } 26 | ``` 27 | 28 | The loop index is verbose and error-prone. Let's get rid of it and loop over an iterator instead: 29 | 30 | ```rust 31 | {{#include pets.rs}} 32 | fn make_shopping_list_b() -> HashSet<&'static str> { 33 | let mut meals_needed = HashSet::new(); 34 | for animal in PETS.iter() { // better... 35 | if animal.is_hungry { 36 | meals_needed.insert(animal.meal_needed); 37 | } 38 | } 39 | meals_needed 40 | } 41 | ``` 42 | 43 | We're accessing the loop through an iterator, but we're still processing the elements inside a loop. It's often more idiomatic to replace the loop with a chain of iterators: 44 | 45 | ```rust 46 | {{#include pets.rs}} 47 | fn make_shopping_list_c() -> HashSet<&'static str> { 48 | PETS.iter() 49 | .filter(|animal| animal.is_hungry) 50 | .map(|animal| animal.meal_needed) 51 | .collect() // best... 52 | } 53 | ``` 54 | 55 | The obvious advantage of the third approach is that it's more concise, but less obviously: 56 | 57 | * The first solution may require Rust to do array bounds checks inside each iteration of the loop, making Rust potentially slower than C++. In this sort of simple example, it likely wouldn't, but functional pipelines simply don't require bounds checks. 58 | * The final container (a `HashSet` in this case) may be able to allocate roughly the right size at the outset, using the [size_hint](https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.size_hint) of a Rust iterator. 
59 | * If you use iterator-style code rather than imperative code, it's more likely the Rust compiler will be able to [auto-vectorize using SIMD instructions](https://medium.com/swlh/an-adventure-in-simd-b0e8db4ccca7). 60 | * There is no mutable state within the function. This makes it easier to verify that the code is correct and to avoid introducing bugs when changing it. In this simple example it may be obvious that calling the `HashSet::insert` is the only mutation to the set, but in more complex scenarios it is quite easy to lose the overview. 61 | * And as a new arrival from C++, you may find this hard to believe: For an experienced Rustacean it'll be more readable. 62 | 63 | Here are some more iterator techniques to help avoid materializing a collection: 64 | 65 | * You can [chain two iterators together](https://doc.rust-lang.org/std/iter/struct.Chain.html) to make a longer one. 66 | * If you need to iterate two lists, [zip them together](https://doc.rust-lang.org/std/iter/struct.Zip.html) to avoid bounds checks on either. 67 | * If you want to feed all your animals, and also feed a nearby duck, just chain the iterator to `std::iter::once`: 68 | 69 | ```rust 70 | # use std::collections::HashSet; 71 | # struct Animal { 72 | # kind: &'static str, 73 | # is_hungry: bool, 74 | # meal_needed: &'static str, 75 | # } 76 | # static PETS: [Animal; 0] = []; 77 | # static NEARBY_DUCK: Animal = Animal { 78 | # kind: "Duck", 79 | # is_hungry: true, 80 | # meal_needed: "pondweed", 81 | # }; 82 | fn make_shopping_list_d() -> HashSet<&'static str> { 83 | PETS.iter() 84 | .chain(std::iter::once(&NEARBY_DUCK)) 85 | .filter(|animal| animal.is_hungry) 86 | .map(|animal| animal.meal_needed) 87 | .collect() 88 | } 89 | ``` 90 | (Similarly, if you want to add one more item to the shopping list - maybe you're hungry, as well as your menagerie? - just add it after the `map`). 91 | * `Option` is iterable. 
92 | ```rust 93 | # use std::collections::HashSet; 94 | # struct Animal { 95 | # kind: &'static str, 96 | # is_hungry: bool, 97 | # meal_needed: &'static str, 98 | # } 99 | # static PETS: [Animal; 0] = []; 100 | # struct Pond; 101 | # static MY_POND: Pond = Pond; 102 | fn pond_inhabitant(pond: &Pond) -> Option<&Animal> { 103 | // ... 104 | # None 105 | } 106 | 107 | fn make_shopping_list_e() -> HashSet<&'static str> { 108 | PETS.iter() 109 | .chain(pond_inhabitant(&MY_POND)) 110 | .filter(|animal| animal.is_hungry) 111 | .map(|animal| animal.meal_needed) 112 | .collect() 113 | } 114 | ``` 115 | 116 | Here's a diagram showing how data flows in this iterator pipeline: 117 | 118 | ```mermaid 119 | flowchart LR 120 | %%{ init: { 'flowchart': { 'nodeSpacing': 40, 'rankSpacing': 15 } } }%% 121 | Pets 122 | Filter([filter by hunger]) 123 | Map([map to noms]) 124 | Meals 125 | uniqueify([uniqueify]) 126 | shopping[Shopping list] 127 | Pets ---> Filter 128 | Pond 129 | Pond ---> inhabitant 130 | inhabitant[Optional pond inhabitant] 131 | inhabitant ---> Map 132 | Filter ---> Map 133 | Map ---> Meals 134 | Meals ---> uniqueify 135 | uniqueify ---> shopping 136 | ``` 137 | 138 | * Here are other iterator APIs that will come in useful: 139 | * [cloned](https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.cloned) 140 | * [flatten](https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.flatten) 141 | 142 | C++20 recently introduced [ranges](https://en.cppreference.com/w/cpp/ranges), a feature that allows you to pipeline operations on a collection similar to the way Rust iterators do, so this style of programming is likely to become more common in C++ too. 143 | 144 | To summarize: While in C++ you tend to operate on collections by performing a series of operations on each individual item, in Rust you'll typically apply a pipeline of operations to the whole collection. 
Make this mental switch and your code will not just become more idiomatic but more efficient, too. 145 | 146 | ## Isn't it confusing to use the same variable name twice? 147 | 148 | In Rust, it's common to reuse the same name for multiple variables in a function. For a C++ programmer, this is weird, but there are two good reasons to do it: 149 | 150 | * You may no longer need to change a mutable variable after a certain point, and if your code is sufficiently complex you might want the compiler to guarantee this for you: 151 | 152 | ```rust 153 | # fn spot_ate_my_slippers() -> bool { 154 | # false 155 | # } 156 | # fn feed(_: &str) {} 157 | let mut good_boy = "Spot"; 158 | if spot_ate_my_slippers() { 159 | good_boy = "Rover"; 160 | } 161 | let good_boy = good_boy; // never going to change my dog again, who's a good boy 162 | feed(&good_boy); 163 | ``` 164 | 165 | * Another common pattern is to retain the same variable name as you gradually unwrap things to a simpler type: 166 | 167 | ```rust 168 | # let url = "http://foo.com:1234"; 169 | let port_number = url.split(":").skip(2).next().unwrap(); 170 | // hmm, maybe somebody else already wrote a better URL parser....? naah, probably not 171 | let port_number = port_number.parse::<u16>().unwrap(); 172 | ``` 173 | 174 | ## How can I avoid the performance penalty of `unwrap()`? 175 | 176 | C++ has no equivalent to Rust's `match`, so programmers coming from C++ often underuse it. 177 | 178 | A heuristic: if you find yourself `unwrap()`ing, _especially_ in an `if`/`else` statement, you should restructure your code to use a more sophisticated `match`.
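The shape this heuristic targets can be sketched on a map lookup. The function names below are invented for illustration; both functions return the same strings, but the first looks the key up twice and relies on an `unwrap()` that the second does not need.

```rust
use std::collections::HashMap;

/// Hypothetical "before" version: test with `is_some()`, then `unwrap()`.
pub fn meal_with_unwrap(meals: &HashMap<&str, &str>, pet: &str) -> String {
    if meals.get(pet).is_some() {
        // Second lookup, plus an `unwrap()` that repeats the check above.
        format!("{pet} eats {}", meals.get(pet).unwrap())
    } else {
        format!("{pet} is not one of ours")
    }
}

/// Hypothetical "after" version: one lookup, both cases handled by `match`.
pub fn meal_with_match(meals: &HashMap<&str, &str>, pet: &str) -> String {
    match meals.get(pet) {
        Some(meal) => format!("{pet} eats {meal}"),
        None => format!("{pet} is not one of ours"),
    }
}
```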
179 | 180 | For example, note the `unwrap()` in here (implying some runtime branch): 181 | 182 | ```rust 183 | # fn test_parse() -> Result<u64, std::num::ParseIntError> { 184 | # let s = "0x64a"; 185 | if s.starts_with("0x") { 186 | u64::from_str_radix(s.strip_prefix("0x").unwrap(), 16) 187 | } else { 188 | s.parse::<u64>() 189 | } 190 | # } 191 | ``` 192 | 193 | and no extra `unwrap()` here: 194 | 195 | ```rust 196 | # fn test_parse() -> Result<u64, std::num::ParseIntError> { 197 | # let s = "0x64a"; 198 | match s.strip_prefix("0x") { 199 | None => s.parse::<u64>(), 200 | Some(remainder) => u64::from_str_radix(remainder, 16), 201 | } 202 | # } 203 | ``` 204 | 205 | `if let` and `matches!` are just as good as `match` but sometimes a little more concise. `cargo clippy` will usually tell you if you're using a `match` which can be simplified to one of those other two constructions. 206 | 207 | ## How do I access variables from within a spawned thread? 208 | 209 | Use [`std::thread::scope`](https://doc.rust-lang.org/nightly/std/thread/fn.scope.html). 210 | 211 | ## When should I use runtime checks vs jumping through hoops to do static checks? 212 | 213 | Everyone learns Rust a different way, but it's said that some people reach a 214 | point of "trait mania" where they try to encode _too much_ via the type 215 | system, and get in a mess. So, in learning Rust, you will want to strike a 216 | balance between runtime checks (easy) and static compile-time checks (more 217 | efficient but requires deeper understanding.) 218 | 219 | > It’s very personal - some people learn better if they opt out of 220 | > language features, others not. - MG 221 | 222 | Some heuristics for how to keep things simple during the beginning of your 223 | Rust journey: 224 | 225 | * It's OK to start with lots of `.unwrap()`, cloning and `Arc`/`Rc`. 226 | * Start to use more advanced language features when you feel annoyed with 227 | the amount of boilerplate.
(As an expert, you'll switch to a different 228 | strategy which is to consider the virality of your choices through the 229 | codebase.) 230 | * Don't use traits until you have to. You might (for instance) need to use 231 | a trait to make some code unit testable, but overoptimizing for that too 232 | soon is a mistake. Some say that it's wise initially to avoid defining 233 | any new traits at all. 234 | * Try to keep types smaller. 235 | 236 | Specifically on reference counting, 237 | 238 | > If using Rc means you can avoid a lifetime parameter which is in half the 239 | > APIs in the project, that’s a very reasonable choice. If it avoids a single 240 | > lifetime somewhere, probably not a good idea. But measure before deciding. - MG 241 | 242 | If you want to bail out of the complexity of static checks, which runtime checks 243 | are OK? 244 | 245 | * `unwrap()` and `Option` are mostly fine. 246 | * `Arc` and `Rc` are also fine in most cases. 247 | * Extensive use of `clone()` is fine but will have a performance impact. 248 | * `Cell` is regarded as a code smell and suggests you don't understand your 249 | lifetimes - it should be used sparingly. 250 | * `unsafe` is definitely not OK. It's harder to write `unsafe` Rust than to write 251 | C or C++, because Rust has additional aliasing rules. If you're reaching for 252 | `unsafe` to work around the complexity of Rust's static type system, as a 253 | relative Rust beginner, please reconsider and look into the other options 254 | listed above. 255 | 256 | Doing lifetime magic - where "magic" means annotating a function or complex 257 | type with more than 1 lifetime, or other wizardry - is often an optimization 258 | that you can defer until later. In the beginning, and when writing small 259 | programs that you only intend to use a few times ('scripts'), copying is fine.
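To illustrate the trade-off described above, here is a sketch with invented type names (`OwnedPet`, `BorrowedPet`). The owned version copies its strings and needs no lifetime annotations. The borrowed version avoids the copies, at the price of a lifetime parameter that then appears in every signature mentioning the type.

```rust
/// Hypothetical "start simple" version: owns its data by copying the inputs.
#[derive(Debug, Clone, PartialEq)]
pub struct OwnedPet {
    kind: String,
    meal_needed: String,
}

impl OwnedPet {
    pub fn new(kind: &str, meal_needed: &str) -> Self {
        Self {
            kind: kind.to_owned(),
            meal_needed: meal_needed.to_owned(),
        }
    }

    pub fn describe(&self) -> String {
        format!("{} needs {}", self.kind, self.meal_needed)
    }
}

/// Hypothetical "optimize later" version: borrows instead of copying, so the
/// lifetime `'a` spreads to every API that stores or returns this type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BorrowedPet<'a> {
    kind: &'a str,
    meal_needed: &'a str,
}

impl<'a> BorrowedPet<'a> {
    pub fn new(kind: &'a str, meal_needed: &'a str) -> Self {
        Self { kind, meal_needed }
    }

    pub fn describe(&self) -> String {
        format!("{} needs {}", self.kind, self.meal_needed)
    }
}
```

Both behave identically from the caller's point of view, so starting with the owned form and switching once profiling or boilerplate justifies it is usually a low-risk path.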
260 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 
40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. 
Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. 
You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 
123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. 
In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. (Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. 
We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright [yyyy] [name of copyright owner] 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 194 | You may obtain a copy of the License at 195 | 196 | http://www.apache.org/licenses/LICENSE-2.0 197 | 198 | Unless required by applicable law or agreed to in writing, software 199 | distributed under the License is distributed on an "AS IS" BASIS, 200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 201 | See the License for the specific language governing permissions and 202 | limitations under the License. 203 | -------------------------------------------------------------------------------- /src/codebase.md: -------------------------------------------------------------------------------- 1 | # Questions about your whole codebase 2 | 3 | 4 | 5 | ## The C++ observer pattern is hard in Rust. What to do? 6 | 7 | The C++ observer pattern usually means that there are broadcasters sending messages to consumers: 8 | 9 | ```mermaid 10 | flowchart TB 11 | broadcaster_a[Broadcaster A] 12 | broadcaster_b[Broadcaster B] 13 | consumer_a[Consumer A] 14 | consumer_b[Consumer B] 15 | consumer_c[Consumer C] 16 | broadcaster_a --> consumer_a 17 | broadcaster_b --> consumer_a 18 | broadcaster_a --> consumer_b 19 | broadcaster_b --> consumer_b 20 | broadcaster_a --> consumer_c 21 | broadcaster_b --> consumer_c 22 | ``` 23 | 24 | The broadcasters maintain lists of consumers, and the consumers act in response to messages (often mutating their own state.) 25 | 26 | This doesn't work in Rust, because it requires the broadcasters to hold mutable references to the consumers. 27 | 28 | What do you do? 
29 | 30 | ### Option 1: make everything runtime-checked 31 | 32 | Each of your consumers could become an `Rc<RefCell<T>>` or, if you need thread-safety, an `Arc<RwLock<T>>`. 33 | 34 | The [`Rc`](https://doc.rust-lang.org/std/rc/struct.Rc.html) or [`Arc`](https://doc.rust-lang.org/std/sync/struct.Arc.html) allows broadcasters to share ownership of a consumer. The [`RefCell`](https://doc.rust-lang.org/std/cell/struct.RefCell.html) or [`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html) allows each broadcaster to acquire a mutable reference to a consumer when it needs to send a message. 35 | 36 | This example shows how, in Rust, you [may independently choose reference counting _or_ interior mutability](https://manishearth.github.io/blog/2015/05/27/wrapper-types-in-rust-choosing-your-guarantees/). In this case we need both. 37 | 38 | Just like typical reference counting in C++, `Rc` and `Arc` have the option to provide a weak pointer, so the lifetime of each consumer doesn't need to be extended unnecessarily. As an aside, it would be nice if Rust had an `Rc`-like type which enforces exactly one owner, and multiple weak ptrs. `Rc` could be wrapped quite easily to do this. 39 | 40 | Reference counting is frowned-upon in C++ because it's expensive. But, in Rust, not so much: 41 | 42 | * Few objects are reference counted; the majority of objects are owned statically. 43 | * Even when objects are reference counted, those counts are rarely incremented and decremented because you can (and do) pass around `&Rc<RefCell<T>>` most of the time. In C++, the "copy by default" mode means it's much more common to increment and decrement reference counts.
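As a rough illustration of Option 1, here is a minimal sketch of a broadcaster holding weak pointers to `Rc<RefCell<...>>` consumers. The `Consumer` and `Broadcaster` types and their methods are hypothetical names invented for this example, not part of any library.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical consumer: it mutates its own state on each message.
struct Consumer {
    received: Vec<String>,
}

impl Consumer {
    fn on_message(&mut self, msg: &str) {
        self.received.push(msg.to_string());
    }
}

// Hypothetical broadcaster: it keeps weak pointers, so it never
// extends the lifetime of a consumer.
struct Broadcaster {
    consumers: Vec<Weak<RefCell<Consumer>>>,
}

impl Broadcaster {
    fn subscribe(&mut self, consumer: &Rc<RefCell<Consumer>>) {
        self.consumers.push(Rc::downgrade(consumer));
    }

    fn broadcast(&mut self, msg: &str) {
        // Notify live consumers and forget the ones that have been dropped.
        self.consumers.retain(|weak| match weak.upgrade() {
            Some(consumer) => {
                // borrow_mut() is the runtime check: it panics if the
                // consumer is already borrowed elsewhere.
                consumer.borrow_mut().on_message(msg);
                true
            }
            None => false,
        });
    }
}

fn main() {
    let consumer = Rc::new(RefCell::new(Consumer { received: Vec::new() }));
    let mut broadcaster = Broadcaster { consumers: Vec::new() };
    broadcaster.subscribe(&consumer);
    broadcaster.broadcast("hello");
    assert_eq!(consumer.borrow().received.len(), 1);
}
```

Note that `subscribe` takes `&Rc<RefCell<Consumer>>`, so passing the consumer around does not touch the strong count; only `upgrade()` does, and only for the duration of one message.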
44 | 45 | In fact, the compile-time guarantees might cause you to do _less_ reference counting than C++: 46 | 47 | > In Servo there is a reference count but far fewer objects are reference counted than in the rest of Firefox, because you don’t need to be paranoid - MG 48 | 49 | However: Rust does [not prevent reference cycles](https://doc.rust-lang.org/book/ch15-06-reference-cycles.html), although they're only possible if you're using _both_ reference counting and interior mutability. 50 | 51 | ### Option 2: drive the objects from the code, not the other way round 52 | 53 | In C++, it's common to have all behavior within classes. Those classes _are_ the total behavior of the system, and so they must interact with one another. The observer pattern is common. 54 | 55 | ```mermaid 56 | flowchart TB 57 | broadcaster_a[Broadcaster A] 58 | consumer_a[Consumer A] 59 | consumer_b[Consumer B] 60 | broadcaster_a -- observer --> consumer_a 61 | broadcaster_a -- observer --> consumer_b 62 | ``` 63 | 64 | In Rust, it's more common to have some _external_ function which drives overall behavior. 65 | 66 | ```mermaid 67 | flowchart TB 68 | main(Main) 69 | broadcaster_a[Broadcaster A] 70 | consumer_a[Consumer A] 71 | consumer_b[Consumer B] 72 | main --1--> broadcaster_a 73 | broadcaster_a --2--> main 74 | main --3--> consumer_a 75 | main --4--> consumer_b 76 | ``` 77 | 78 | With this sort of design, it's relatively straightforward to take some output from one object and pass it into another object, with no need for the objects to interact at all. 79 | 80 | In the most extreme case, this becomes the [Entity-Component-System architecture](https://en.wikipedia.org/wiki/Entity_component_system) used in game design. 81 | 82 | > Game developers seem to have completely solved this problem - we can learn from them. - MY 83 | 84 | ### Option 3: use channels 85 | 86 | The observer pattern is a way to decouple large, single-threaded C++ codebases. 
But if you're trying to decouple a codebase in Rust, perhaps you should assume multi-threading by default? Rust has built-in [channels](https://doc.rust-lang.org/std/sync/mpsc/), and the [crossbeam](https://docs.rs/crossbeam/0.8.0/crossbeam/) crate provides multi-producer, multi-consumer channels. 87 | 88 | > I'm a Rustacean, we assume massively parallel unless told otherwise :) - MG 89 | 90 | ## That's all very well, but I have an existing C++ object broadcasting events. How exactly should I observe it? 91 | 92 | If your Rust object is a consumer of events from some pre-existing C++ producer, all the above options remain possible. 93 | 94 | * You can make your object reference counted and have C++ own such a reference (potentially a weak reference) 95 | * C++ can deliver the message into a general message bucket. An external function reads messages from that bucket and invokes the Rust object that should handle it. This means the reference counting doesn't need to extend to the Rust objects outside that boundary layer. 96 | * You can have a shim object which converts the C++ callback into some message and injects it into a channel-based world. 97 | 98 | ## Some of my C++ objects have shared mutable state. How can I make them safe in Rust? 99 | 100 | You're going to have to do something with [interior mutability](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html): either `RefCell` or its multithreaded equivalent, `RwLock`. 101 | 102 | You have three decisions to make: 103 | 104 | 1. Will only Rust code access _this particular instance_ of this object, or might C++ access it too? 105 | 2. If both C++ and Rust may access the object, how do you avoid conflicts? 106 | 3. How should Rust code react if the object is not available, because something else is using it? 107 | 108 | If only Rust code can use this particular instance of shared state, then simply wrap it in `RefCell` (single-threaded) or `RwLock` (multi-threaded). 
Build a wrapper type such that callers aren't able to access the object directly, but instead only via the lock type. 109 | 110 | If C++ also needs to access this particular instance of the shared state, it's more complex. There are presumably some invariants regarding use of this data in C++ - otherwise it would crash all the time. Perhaps the data can be used only from one thread, or perhaps it can only be used with a given mutex held. Your goal is to translate those invariants into an idiomatic Rust API that can be checked (ideally) at compile-time, and (failing that) at runtime. 111 | 112 | For example, imagine: 113 | 114 | ```cpp 115 | class SharedMutableGoat { 116 | public: 117 | void eat_grass(); // mutates tummy state 118 | }; 119 | 120 | std::mutex lock; 121 | SharedMutableGoat* billy; // only access when owning lock 122 | ``` 123 | 124 | Your idiomatic Rust wrapper might be: 125 | 126 | ```rust 127 | # mod ffi { 128 | # #[allow(non_camel_case_types)] 129 | # pub struct lock_guard; 130 | # pub fn claim_lock() -> lock_guard { lock_guard{} } 131 | # pub fn eat_grass() {} 132 | # pub fn release_lock(lock: &mut lock_guard) {} 133 | # } 134 | struct SharedMutableGoatLock { 135 | lock: ffi::lock_guard, // owns a std::lock_guard somehow 136 | }; 137 | 138 | // Claims the lock, returns a new SharedMutableGoatLock 139 | fn lock_shared_mutable_goat() -> SharedMutableGoatLock { 140 | SharedMutableGoatLock { lock: ffi::claim_lock() } 141 | } 142 | 143 | impl SharedMutableGoatLock { 144 | fn eat_grass(&mut self) { 145 | ffi::eat_grass(); // Acts on the global goat 146 | } 147 | } 148 | 149 | impl Drop for SharedMutableGoatLock { 150 | fn drop(&mut self) { 151 | ffi::release_lock(&mut self.lock); 152 | } 153 | } 154 | ``` 155 | 156 | Obviously, lots of permutations are possible, but the goal is to ensure that it's simply compile-time impossible to act on the global state unless appropriate preconditions are met. 
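For the simpler case where only Rust code touches the shared state, the wrapper-type advice might look something like the sketch below. `Tummy` and `SharedTummy` are hypothetical names chosen to match the goat example; the point is only that the lock is a private field, so callers cannot reach the data without going through it.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical shared state.
#[derive(Default)]
struct Tummy {
    grass_eaten: u32,
}

// The lock is a private field, so the only way to reach `Tummy`
// is through the methods below.
#[derive(Clone, Default)]
pub struct SharedTummy {
    inner: Arc<RwLock<Tummy>>,
}

impl SharedTummy {
    pub fn eat_grass(&self) {
        // write() blocks until exclusive access is available;
        // unwrap() panics if another thread panicked while holding the lock.
        self.inner.write().unwrap().grass_eaten += 1;
    }

    pub fn grass_eaten(&self) -> u32 {
        self.inner.read().unwrap().grass_eaten
    }
}

fn main() {
    let tummy = SharedTummy::default();
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let t = tummy.clone();
            thread::spawn(move || t.eat_grass())
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(tummy.grass_eaten(), 4);
}
```

In single-threaded code the same shape works with `Rc<RefCell<Tummy>>` in place of `Arc<RwLock<Tummy>>`.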
157 | 158 | The final decision is how to react if the object is not available. This decision can apply with C++ mutexes or with Rust locks (for example `RwLock`). As in C++, the two major options are: 159 | 160 | * Block until the object becomes available. 161 | * Try to lock, and if the object is not available, do something else. 162 | 163 | There can be a third option if you're using async Rust. If the data isn't available, you may be able to return to your event loop using an async version of the lock ([Tokio example](https://docs.rs/tokio/1.5.0/tokio/sync/struct.RwLock.html#method.read), [async_std example](https://docs.rs/async-std/1.9.0/async_std/sync/struct.RwLock.html)). 164 | 165 | ## How do I do a singleton? 166 | 167 | Use [OnceCell](https://doc.rust-lang.org/std/cell/struct.OnceCell.html). 168 | 169 | ## What's the best way to retrofit Rust's parallelism benefits to an existing codebase? 170 | 171 | When parallelizing an existing codebase, first check that all existing types are correctly [`Send`](https://doc.rust-lang.org/std/marker/trait.Send.html) and [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html). Generally, though, you should try to avoid implementing these yourself - instead use pre-existing wrapper types which enforce the correct contract (for example, [`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html)). 172 | 173 | After that: 174 | 175 | > If you can solve your problem by throwing Rayon at it, do. It’s magic - MG 176 | 177 | > If your task is CPU-bound, Rayon solves this handily. - MY 178 | 179 | [Rayon](https://docs.rs/rayon/1.5.0/rayon/) offers parallel constructs - for example parallel iterators - which can readily be retrofitted to an existing codebase. It also allows you to create and join tasks. Using Rayon can help _simplify_ your code and eliminate lots of manual scheduling logic. 
180 | 181 | If your tasks are IO-bound, then you may need to look into async Rust, but that's hard to pull into an existing codebase. 182 | 183 | ## What's the best way to architect a new codebase for parallelism? 184 | 185 | In brief, like in other languages, you have a choice of architectures: 186 | 187 | * Message-passing, using event loops which listen on a channel, receive `Send` data and pass it on. 188 | * More traditional multithreading using `Sync` data structures such as mutexes (and perhaps Rayon). 189 | 190 | > There's probably a bias towards message-passing, and that's probably well-informed by its extensibility. - MG 191 | 192 | ## I need a list of nodes which can refer to one another. How? 193 | 194 | You can't easily do self-referential data structures in Rust. The usual workaround is to [use an arena](https://manishearth.github.io/blog/2021/03/15/arenas-in-rust/) and replace references from one node to another with node IDs. 195 | 196 | An arena is typically a `Vec` (or similar), and the node IDs are a newtype wrapper around a simple integer index. 197 | 198 | Obviously, Rust doesn't check that your node IDs are valid. If you don't have proper references, what stops you from having stale IDs? 199 | 200 | Arenas are often purely additive, which means that you can add entries but not delete them ([example](https://github.com/Manishearth/elsa/blob/master/examples/mutable_arena.rs)). If you must have an arena which deletes things, then use generational IDs; see the [generational-arena](https://docs.rs/generational-arena/) crate and this [RustConf keynote](https://www.youtube.com/watch?v=aKLntZcp27M) for more details. 201 | 202 | If arenas still sound like a nasty workaround, consider that you might choose an arena anyway for other reasons: 203 | 204 | * All of the objects in the arena will be freed at the end of the arena's lifetime, instead of during their manipulation, which can give very low latency for some use-cases. 
[Bumpalo](https://docs.rs/bumpalo/3.6.1/bumpalo/) formalizes this. 205 | * The rest of your program might have real Rust references into the arena. You can give the arena a named lifetime (`'arena` for example), making the provenance of those references very clear. 206 | 207 | ## Should I have a few big crates or lots of small ones? 208 | 209 | In the past, it was recommended to have small crates to get optimal build time. 210 | Incremental builds generally make this unnecessary now. You should arrange your 211 | crates optimally for your semantic needs. 212 | 213 | ## What crates should everyone know about? 214 | 215 | | Crate | Description | 216 | |:--------------------------------------- |:---------------------------------- | 217 | | [rayon](https://docs.rs/rayon/) | parallelizing | 218 | | [serde](https://docs.rs/serde/) | serializing and deserializing | 219 | | [crossbeam](https://docs.rs/crossbeam/) | all sorts of parallelism tools | 220 | | [itertools](https://docs.rs/itertools/) | makes it slightly more pleasant to work with iterators. (For instance, if you want to join an iterator of strings, you can just go ahead and do that, without needing to collect the strings into a `Vec` first) | 221 | | [petgraph](https://docs.rs/petgraph/) | graph data structures | 222 | | [slotmap](https://docs.rs/slotmap/) | arena-like key-value map | 223 | | [nom](https://docs.rs/nom/) | parsing | 224 | | [clap](https://docs.rs/clap/) | command-line parsing | 225 | | [regex](https://docs.rs/regex/) | err, regular expressions | 226 | | [ring](https://docs.rs/ring/) | the leading crypto library | 227 | | [nalgebra](https://docs.rs/nalgebra/) | linear algebra | 228 | | [once_cell](https://docs.rs/once_cell/) | complex static data | 229 | 230 | ## How should I call C++ functions from Rust and vice versa? 231 | 232 | Use [cxx](https://cxx.rs). 233 | 234 | Oh, you want a justification? In that case, here's the history 235 | which brought us to this point. 
236 | 237 | From the beginning, Rust supported calling C functions using [`extern "C"`](https://doc.rust-lang.org/std/keyword.extern.html), 238 | [`#[repr(C)]`](https://doc.rust-lang.org/reference/type-layout.html#the-c-representation) 239 | and [`#[no_mangle]`](https://doc.rust-lang.org/reference/abi.html#the-no_mangle-attribute). 240 | Such callable C functions had to be declared manually in Rust: 241 | 242 | ```mermaid 243 | sequenceDiagram 244 | Rust-->>extern: unsafe Rust function call 245 | extern-->>C: call from Rust to C 246 | participant extern as Rust unsafe extern "C" fn 247 | participant C as Existing C function 248 | ``` 249 | 250 | [`bindgen`](https://rust-lang.github.io/rust-bindgen/) was invented 251 | to generate these declarations automatically from existing C/C++ header 252 | files. It has grown to understand an astonishingly wide variety of C++ 253 | constructs, but its generated bindings are still `unsafe` functions 254 | with lots of pointers involved. 255 | 256 | ```mermaid 257 | sequenceDiagram 258 | Rust-->>extern: unsafe Rust function call 259 | extern-->>C: call from Rust to C++ 260 | participant extern as Bindgen generated bindings 261 | participant C as Existing C++ function 262 | ``` 263 | 264 | Interacting with `bindgen`-generated bindings requires unsafe Rust; 265 | you will likely have to manually craft idiomatic safe Rust wrappers. 266 | This is time-consuming and error-prone. 267 | 268 | [cxx](https://cxx.rs) automates a lot of that process. Unlike `bindgen` 269 | it doesn't learn about functions from existing C++ headers. Instead, 270 | you specify cross-language interfaces in a Rust-like interface definition 271 | language (IDL) within your Rust file. cxx generates both C++ and Rust code 272 | from that IDL, marshaling data behind the scenes on both sides such that 273 | you can use standard language features in your code. 
For example, you'll 274 | find idiomatic Rust wrappers for [`std::string`](https://docs.rs/cxx/1.0.50/cxx/struct.CxxString.html) 275 | and [`std::unique_ptr`](https://docs.rs/cxx/1.0.50/cxx/struct.UniquePtr.html) 276 | and idiomatic C++ wrappers for [a Rust slice](https://cxx.rs/binding/slice.html). 277 | 278 | ```mermaid 279 | sequenceDiagram 280 | Rust-->>rsbindings: safe idiomatic Rust function call 281 | rsbindings-->>cxxbindings: hidden C ABI call using marshaled data 282 | cxxbindings-->>cpp: call to standard idiomatic C++ 283 | participant rsbindings as cxx-generated Rust code 284 | participant cxxbindings as cxx-generated C++ code 285 | participant cpp as C++ function using STL types 286 | ``` 287 | 288 | > In the bindgen case even more work goes into wrapping idiomatic C++ signatures into something bindgen compatible: unique ptrs to raw ptrs, Drop impls on the Rust side, translating string types ... etc. The typical real-world binding we've converted from bindgen to cxx in my codebase has been -500 lines (mostly unsafe code) +300 lines (mostly safe code; IDL included). - DT 289 | 290 | The greatest benefit is that cxx sufficiently understands C++ STL 291 | object ownership norms that the generated bindings can be used from 292 | safe Rust code. 293 | 294 | At present, there is no established solution which combines the idiomatic, safe 295 | interoperability offered by `cxx` with the automatic generation offered by 296 | `bindgen`. It's not clear whether this is even _possible_ but [several](https://github.com/google/autocxx) 297 | [projects](https://github.com/google/mosaic) are aiming in this direction. 298 | 299 | ## I'm getting a lot of binary bloat. 300 | 301 | In Rust you have a free choice between `impl Trait` and `dyn Trait`. See 302 | [this answer](signatures.md#When_should_I_take_or_return_dyn_Trait), too. `impl Trait` tends 303 | to be the default, and results in large binaries as much code can be duplicated. 
304 | If you have this problem, consider using `dyn Trait`. Other options include 305 | the 'thin template pattern' (an example is `serde_json` where the code to read 306 | from [a string and a slice](https://github.com/serde-rs/json/blob/master/src/read.rs#L172) 307 | would be duplicated entirely, but instead one delegates to the other and 308 | requests slightly different behavior.) 309 | -------------------------------------------------------------------------------- /src/signatures.md: -------------------------------------------------------------------------------- 1 | # Questions about your function signatures 2 | 3 | 4 | 5 | ## Should I return an iterator or a collection? 6 | 7 | > Pretty much always return an iterator. - AH 8 | 9 | We suggested you [use iterators a lot in your code](./code.md#how-can-i-avoid-the-performance-penalty-of-bounds-checks). Share the love! Give iterators to your callers too. 10 | 11 | If you *know* your caller will store the items you're returning in a concrete collection, such as a `Vec` or a `HashSet`, you may want to return that. In all other cases, return an iterator. 12 | 13 | Your caller might: 14 | * Collect the iterator into a `Vec` 15 | * Collect it into a `HashSet` or some other specialized container 16 | * Loop over the items 17 | * Filter them or otherwise completely ignore some 18 | 19 | Collecting the items into a vector will only turn out to be right in one of these cases. In the other cases, you're wasting memory and CPU time by building a concrete collection. 20 | 21 | This is weird for C++ programmers because C++ iterators don't usually have robust references into the underlying data. Even Java iterators are scary, throwing `ConcurrentModificationException`s when you least expect it. Rust prevents that, at compile time. If you _can_ return an iterator, you should.
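A minimal sketch of what returning an iterator can look like, and how one function then serves all of the callers listed above. The `Inventory` type and `names_starting_with` method are hypothetical, made up for this illustration.

```rust
use std::collections::HashSet;

// Hypothetical type owning some data.
struct Inventory {
    items: Vec<String>,
}

impl Inventory {
    // Returns a lazy iterator that borrows from `self`; nothing is
    // allocated until (and unless) the caller collects it.
    fn names_starting_with<'a>(
        &'a self,
        prefix: &'a str,
    ) -> impl Iterator<Item = &'a str> + 'a {
        self.items
            .iter()
            .map(String::as_str)
            .filter(move |name| name.starts_with(prefix))
    }
}

fn main() {
    let inv = Inventory {
        items: vec!["apple".into(), "avocado".into(), "banana".into()],
    };

    // The caller decides what to do with the items.
    let as_vec: Vec<&str> = inv.names_starting_with("a").collect();
    let as_set: HashSet<&str> = inv.names_starting_with("a").collect();
    let how_many = inv.names_starting_with("b").count();

    assert_eq!(as_vec, vec!["apple", "avocado"]);
    assert_eq!(as_set.len(), 2);
    assert_eq!(how_many, 1);
}
```

Because the iterator borrows `inv`, the compiler rejects any attempt to modify `inv.items` while the iterator is still in use.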
22 | 23 | ```mermaid 24 | flowchart LR 25 | subgraph Caller 26 | it_ref[reference to iterator] 27 | end 28 | subgraph it_outer[Iterator] 29 | it[Iterator] 30 | it_ref --reference--> it 31 | end 32 | subgraph data[Underlying data] 33 | dat[Underlying data] 34 | it --reference--> dat 35 | end 36 | ``` 37 | 38 | ## How flexible should my parameters be? 39 | 40 | Which of these is best? 41 | 42 | ```rust 43 | fn a(params: &[String]) { 44 | // ... 45 | } 46 | 47 | fn b(params: &[&str]) { 48 | // ... 49 | } 50 | 51 | fn c(params: &[impl AsRef<str>]) { 52 | // ... 53 | } 54 | ``` 55 | 56 | (You'll need to make an equivalent decision in other cases, e.g. `Path` versus `PathBuf` versus `AsRef<Path>`.) 57 | 58 | None of the options is clearly superior; for each option, there's a case it can't handle that the others can: 59 | 60 | ```rust 61 | # fn a(params: &[String]) { 62 | # } 63 | # fn b(params: &[&str]) { 64 | # } 65 | # fn c(params: &[impl AsRef<str>]) { 66 | # } 67 | fn main() { 68 | a(&[]); 69 | // a(&["hi"]); // doesn't work 70 | a(&vec![format!("hello")]); 71 | 72 | b(&[]); 73 | b(&["hi"]); 74 | // b(&vec![format!("hello")]); // doesn't work 75 | 76 | // c(&[]); // doesn't work 77 | c(&["hi"]); 78 | c(&vec![format!("hello")]); 79 | } 80 | ``` 81 | 82 | So you have a variety of interesting ways to _slightly_ annoy your callers under different circumstances. Which is best? 83 | 84 | `AsRef` has some advantages: if a caller has a `Vec<String>`, they can use that directly, which would be impossible with the other options. But if they want to pass an empty list, they'll have to explicitly specify the type (for instance `&Vec::<String>::new()`). 85 | 86 | > Not a huge fan of AsRef everywhere - it's just saving the caller typing. If you have lots of AsRef then nothing is object-safe. - MG 87 | 88 | TL;DR: choose the middle option, `&[&str]`.
If your caller happens to have a vector of `String`, it's relatively little work to get a slice of `&str`: 89 | 90 | ```rust 91 | # fn b(params: &[&str]) { 92 | # } 93 | 94 | fn main() { 95 | // Instead of b(&vec![format!("hello")]); 96 | let hellos = vec![format!("hello")]; 97 | b(&hellos.iter().map(String::as_str).collect::<Vec<_>>()); 98 | } 99 | ``` 100 | 101 | ## How do I overload constructors? 102 | 103 | You can't do this: 104 | 105 | ```rust 106 | # struct BirthdayCard {} 107 | impl BirthdayCard { 108 | fn new(name: &str) -> Self { 109 | # Self{} 110 | // ... 111 | } 112 | 113 | // Can't add more overloads: 114 | // 115 | // fn new(name: &str, age: i32) -> BirthdayCard { ... } 116 | // 117 | // fn new(name: &str, text: &str) -> BirthdayCard { ... } 118 | } 119 | ``` 120 | 121 | If you have a default constructor, and a few variants for other cases, you can simply write them as different static methods. An idiomatic way to do this is to write a `new()` constructor and then `with_foo()` constructors that apply the given "foo" when constructing. 122 | 123 | ```rust 124 | # struct Racoon {} 125 | impl Racoon { 126 | fn new() -> Self { 127 | # Self{} 128 | // ... 129 | } 130 | fn with_age(age: usize) -> Self { 131 | # Self{} 132 | // ... 133 | } 134 | } 135 | ``` 136 | 137 | If you have a bunch of constructors and no default, it may make sense to instead provide a set of `new_foo()` constructors. 138 | 139 | ```rust 140 | # struct Animal {} 141 | impl Animal { 142 | fn new_squirrel() -> Self { 143 | # Self{} 144 | // ... 145 | } 146 | fn new_badger() -> Self { 147 | # Self{} 148 | // ... 149 | } 150 | } 151 | ``` 152 | 153 | For a more complex situation, you may use [the builder pattern](https://rust-lang.github.io/api-guidelines/type-safety.html#builders-enable-construction-of-complex-values-c-builder). The builder has a set of methods which take `&mut self` and return `&mut Self`. Then add a `build()` that returns the final constructed object.
154 | 155 | ```rust 156 | struct BirthdayCard {} 157 | 158 | struct BirthdayCardBuilder {} 159 | impl BirthdayCardBuilder { 160 | fn new(name: &str) -> Self { 161 | # Self{} 162 | // ... 163 | } 164 | 165 | fn age(&mut self, age: i32) -> &mut Self { 166 | # self 167 | // ... 168 | } 169 | 170 | fn text(&mut self, text: &str) -> &mut Self { 171 | # self 172 | // ... 173 | } 174 | 175 | fn build(&mut self) -> BirthdayCard { BirthdayCard { /* ... */ } } 176 | } 177 | ``` 178 | 179 | You can then [chain these](https://rust-lang.github.io/api-guidelines/type-safety.html#non-consuming-builders-preferred) into short or long constructions, passing parameters as necessary: 180 | 181 | ```rust 182 | # struct BirthdayCard {} 183 | # 184 | # struct BirthdayCardBuilder {} 185 | # impl BirthdayCardBuilder { 186 | # fn new(name: &str) -> BirthdayCardBuilder { 187 | # Self{} 188 | # // ... 189 | # } 190 | # 191 | # fn age(&mut self, age: i32) -> &mut BirthdayCardBuilder { 192 | # self 193 | # // ... 194 | # } 195 | # 196 | # fn text(&mut self, text: &str) -> &mut BirthdayCardBuilder { 197 | # self 198 | # // ... 199 | # } 200 | # 201 | # fn build(&mut self) -> BirthdayCard { BirthdayCard { /* ... */ } } 202 | # } 203 | # 204 | fn main() { 205 | let card = BirthdayCardBuilder::new("Paul") 206 | .age(64) 207 | .text("Happy Valentine's Day!") 208 | .build(); 209 | } 210 | ``` 211 | 212 | Note another advantage of builders: Overloaded constructors often don't provide all possible combinations of parameters, whereas with the builder pattern, you can combine exactly the parameters you want. 213 | 214 | ## When must I use `#[must_use]`? 215 | 216 | > Use it on Results and mutex locks. - MG 217 | 218 | `#[must_use]` causes a compiler warning (the `unused_must_use` lint, which only becomes a hard error if you deny it) if the caller ignores the return value. 219 | 220 | Rust functions are often single-purpose. They either: 221 | 222 | * Return a value without any side effects; or 223 | * Do something (i.e. have side effects) and return nothing.

In neither case do you need to think about `#[must_use]`. (In the first case,
nobody would call your function unless they were going to use the result.)

`#[must_use]` is useful for those rarer functions which return a result _and_
have side effects. In most such cases, it's wise to specify `#[must_use]`, unless
the return value is truly optional (for example in
[`HashMap::insert`](https://doc.rust-lang.org/std/collections/struct.HashMap.html#method.insert)).

## When should I take parameters by value?

Move semantics are more common in Rust than in C++.

> In C++ moves tend to be an optimization, whereas in Rust they're a key semantic part of the program. - MY

To a first approximation, you should assume similar performance when passing
things by (moved) value or by reference. It's true that a move may turn out to
be a `memcpy`, but it's often optimized away.

> Express the ownership relationship in the type system, instead of trying to second-guess the compiler for efficiency. - AF

The moves are, of course, destructive - and unlike in C++, the compiler
enforces that you don't reuse a variable that has been moved.
Some C++ objects become toxic after they've been moved from; that's not a
risk in Rust.

So here's the heuristic: if a caller shouldn't be able to use an object again,
pass it via move semantics in order to consume it.

An extreme example: a UUID is supposed to be globally unique - it might cause a
logic error for a caller to retain knowledge of a UUID after passing it to a callee.

More generally, consume data enthusiastically to avoid logical errors during future
refactorings.
For instance, if some command-line options are overridden by a
runtime choice, consume those old options - then any future refactoring which
uses them after that point will give you a compile error. This pattern is
surprisingly effective at spotting errors in your assumptions.

## Should I ever take `self` by value?

Sometimes. If you've got a member function which destroys or transforms a thing,
it should take `self` by value. Examples:

* Closing a file and returning a result code.
* A builder-pattern object which spits out the thing it was building. ([Example](https://docs.rs/bindgen/0.59.0/bindgen/struct.Builder.html#method.generate)).

## How do I take a thing, and a reference to something within that thing?

For example, suppose you want to give all of your dogs to your friend, yet also
tell your friend which one of the dogs is the Best Boy or Girl.

```cpp
struct PetInformation {
    std::vector<Dog> dogs;
    Dog& BestBoy;
    Dog& BestGirl;
};

PetInformation GetPetInformation() {
    // ...
}
```

Generally this is an indication that your types or functions are not split up
in the right way:

> This is a decomposition problem. Once you’ve found the correct decomposition, everything
> else just works. The code almost writes itself. - AF

```rust
# struct Dog;
struct PetInformation(Vec<Dog>);

fn get_pet_information() -> PetInformation {
    // ...
#   PetInformation(Vec::new())
}

fn identify_best_boy(pet_information: &PetInformation) -> &Dog {
    // ...
#   pet_information.0.get(0).unwrap()
}
```

One use-case is when you want to act on some data, depending on its contents...
but you also want to do something with those contents that you previously
identified.

```rust
# struct Key;
struct Door { locked: bool }

struct Car {
    ignition: Option<Key>,
    door: Door,
}

fn steal_car(car: Car) {
    match car {
        Car {
            ignition: Some(ref key),
            door: Door { locked: false }
        } => drive_away_normally(car /* , key */),
        _ => break_in_and_hotwire(car)
    }
}

fn drive_away_normally(car: Car /* , key: &Key */) {
    // Annoying to have to repeat this code...
    let key = match car {
        Car {
            ignition: Some(ref key),
            ..
        } => key,
        _ => unreachable!()
    };
    turn_key(key);
    // ...
}

# fn turn_key(key: &Key) {}
# fn break_in_and_hotwire(car: Car) {}
```

If this repeated matching gets annoying, it's relatively easy
to extract it to a function.

```rust
# fn turn_key(key: &Key) {}
# fn break_in_and_hotwire(car: Car) {}
# struct Key;
# struct Door { locked: bool }
# struct Car {
#     ignition: Option<Key>,
#     door: Door,
# }

impl Car {
    fn get_usable_key(&self) -> Option<&Key> {
        match self {
            Car {
                ignition: Some(ref key),
                door: Door { locked: false }
            } => Some(key),
            _ => None,
        }
    }
}

fn steal_car(car: Car) {
    match car.get_usable_key() {
        None => break_in_and_hotwire(car),
        Some(_) => drive_away_normally(car),
    }
}

fn drive_away_normally(car: Car) {
    turn_key(car.get_usable_key().unwrap());
}
```

## When should I return `impl Trait`?

Your main consideration should be API stability. If your caller doesn't
_need_ to know the concrete implementation type, then don't tell it. That
gives you flexibility to change your implementation in future without breaking
compatibility.
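
As a minimal sketch of this idea (the function `even_numbers` is a hypothetical name invented for illustration, not part of any library), the signature below promises only "some iterator of `u32`". The body could later switch from a filtered range to, say, a `Vec`'s iterator without changing what callers are allowed to rely on:

```rust
/// Returns the even numbers below `limit`.
/// Callers learn only that they get an iterator of `u32`; the concrete
/// type (currently a filtered range) stays an implementation detail.
fn even_numbers(limit: u32) -> impl Iterator<Item = u32> {
    (0..limit).filter(|n| n % 2 == 0)
}

fn main() {
    // The caller can only use the `Iterator` interface.
    let evens: Vec<u32> = even_numbers(10).collect();
    println!("{:?}", evens);
}
```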

Note [Hyrum's Law](https://www.hyrumslaw.com/)!

Using `impl Trait` doesn't solve _all_ possible API stability concerns, because
even `impl Trait` leaks auto-traits such as `Send` and `Sync`.

## I miss function overloading! What do I do?

Use a trait to implement the behavior you used to have.

For example, in C++:

```cpp
class Dog {
public:
    void eat(Dogfood);
    void eat(DeliveryPerson);
};
```

In Rust you might express this as:

```rust
trait Edible {
}

struct Dog;

impl Dog {
    fn eat(&self, edible: impl Edible) {
        // ...
    }
}

struct Dogfood;
struct DeliveryPerson;

impl Edible for Dogfood {}
impl Edible for DeliveryPerson {}
```

This gives your caller all the convenience they want, though it may increase
work for you as the implementer.

## I miss operator overloading! What do I do?

Implement the standard traits instead (for example `PartialEq`, `Add`). This
has an equivalent effect: folks will be able to use your type in a standard
Rusty way without knowing too much special about your type.

## Should I return an error, or panic?

Panics should be used only for invariants, never for anything that you believe
might happen. That's especially true
[for libraries](https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell) - panicking
(or asserting) should be reserved for the 'top level' code driving
the application.
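
A rough sketch of the non-panicking alternative, under stated assumptions: `parse_age` and `ParseAgeError` are hypothetical names invented for this illustration, and the limit of 150 is arbitrary. The library-style function reports failure through `Result`, and only the top-level code decides what to do about it:

```rust
use std::fmt;

/// Hypothetical error type: one variant per failure the caller may handle.
#[derive(Debug, PartialEq)]
enum ParseAgeError {
    NotANumber,
    Unrealistic(u32),
}

impl fmt::Display for ParseAgeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseAgeError::NotANumber => write!(f, "input is not a number"),
            ParseAgeError::Unrealistic(age) => write!(f, "age {} is unrealistic", age),
        }
    }
}

impl std::error::Error for ParseAgeError {}

/// Library-style function: bad input is expected, so it is an `Err`, not a panic.
fn parse_age(input: &str) -> Result<u32, ParseAgeError> {
    let age: u32 = input
        .trim()
        .parse()
        .map_err(|_| ParseAgeError::NotANumber)?;
    if age > 150 {
        return Err(ParseAgeError::Unrealistic(age));
    }
    Ok(age)
}

fn main() {
    // The top-level code chooses how to react to a failure.
    match parse_age("64") {
        Ok(age) => println!("age is {}", age),
        Err(e) => eprintln!("bad input: {}", e),
    }
}
```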

> Libraries which panic are super-rude and I hate them - MY

Even in your own application code, panicking might not be wise:

> Panicking in application logic for recoverable errors makes it way harder to librarify some code - AP

If you really must have an API which can panic, add a `try_` equivalent too.

## What should my error type be?

[Rust's `Result` type](https://doc.rust-lang.org/std/result/) is parameterized
over an error type. What should you use?

For app code, consider [anyhow](https://docs.rs/anyhow/). For library code,
use your own `enum` of error conditions - you can use [thiserror](https://docs.rs/thiserror/)
to make this more pleasant.

## When should I take or return `dyn Trait`?

In either C++ or Rust, you can choose between monomorphization (that is, building
code multiple times for each permutation of parameter types) or dynamic dispatch (i.e.
looking up the correct implementation using vtables).

In C++ the syntax is completely different - templates vs virtual functions.
In Rust the syntax is almost identical - in some cases it's as simple as
exchanging the `impl` keyword with the `dyn` keyword.

Given this flexibility to switch strategies, which should you start with?

In both languages, monomorphization tends to result in a quicker program (partly
due to better inlining). It's arguably true that inlining is more important in
Rust, due to its functional nature and pervasive use of iterators. Whether or
not that's the reason, experienced Rustaceans usually start with `impl`:

> It's best practice to start with monomorphization and move to `dyn`... - MG

The main cost of monomorphization is larger binaries.
There are cases where
large amounts of code can end up being duplicated (the marvellous [serde](https://serde.rs/)
is one).

You _can_ choose to do things the other way round:

> ... it’s workable practice to start with `dyn` and then move to `impl` when you have problems. - MG

`dyn` can be awkward, and potentially expensive in different ways:

> One thing to note about pervasive `dyn` is that because it unsizes the types it wraps, you need to box it if you want to store it by value. You end up with a good bit more allocator pressure if you try to have `dyn` field types. - AP

## `<'a>`I seem to have lots of named lifetimes. Am `<'b>`I doing something wrong?

Some say that if you have a significant number of named lifetimes, you're
overcomplicating things.

There are some scenarios where multiple named lifetimes make perfect sense - for example
if you're dealing with an arena, or major phases of a process (the Rust compiler
has `'gcx` and `'tcx` lifetimes relating to the output of certain compile phases.)

But otherwise, it may be that you've got lifetimes because you're trying _too
hard_ to avoid a copy. You may be better off simply switching to runtime
checking (e.g. `Rc`, `Arc`) or even cloning.

Are named lifetimes even a "code smell"?

> My experience has been that the extent to which they're a smell varies a good bit based on the programmer's experience level, which has led me towards increased skepticism over time. Lots of people learning Rust have experienced the pain of first not wanting to `.clone()` something, immediately putting lifetimes everywhere, and then feeling the pain of lifetime subtyping and variance.
> I don't think they're nearly as odorous as unsafe, for example, but treating them as a bit of a smell does I think lead to code that's easier to read for a newcomer and to refactor around the stack. - AP

--------------------------------------------------------------------------------

/src/types.md:

--------------------------------------------------------------------------------

# Questions about your types

## My 'class' needs mutable references to other things to do its job. Other classes need mutable references to these things too. What do I do?

It's common in C++ to have a class that contains mutable references to other
objects; the class mutates those objects to do its work. Often, there
are several classes that all hold a mutable reference to the same object. Here
is a diagram that illustrates this:

```mermaid
flowchart LR
    subgraph Shared functionality
    important[Important Shared Object]
    end
    subgraph ObjectA
    methodA[Method]
    refa[Mutable Reference]-->important
    methodA-. Acts on shared object.->important
    end
    subgraph ObjectB
    refb[Mutable Reference]-->important
    methodB[Method]
    methodB-. Acts on shared object.->important
    end
    main --> ObjectA
    main --> ObjectB
    main-. Calls .-> methodA
    main-. Calls .-> methodB
```

In Rust, you can't have multiple mutable references to a shared object, so what
do you do?

First of all, consider moving behavior out of your types. (See
[the answer about the observer pattern](./codebase.md#the-c-observer-pattern-is-hard-in-rust-what-to-do) and especially
[the second option described there](./codebase.md#option-2-drive-the-objects-from-the-code-not-the-other-way-round).)

Even in Rust, though, it's still often the best choice to make complex behavior
part of the type within `impl` blocks.
You can still do that - but don't
_store_ references. Instead, pass them into each function call.

```mermaid
flowchart LR
    subgraph Shared functionality
    important[Important Shared Object]
    end
    subgraph ObjectA
    methodA[Method]
    methodA-. Acts on shared object.->important
    end
    subgraph ObjectB
    methodB[Method]
    methodB-. Acts on shared object.->important
    end
    main --> ObjectA
    main --> ObjectB
    main --> important
    main-. Passes reference to shared object.-> methodA
    main-. Passes reference to shared object.-> methodB
```

Instead of this:

```rust
# struct ImportantSharedObject;
# struct ObjectA<'a> {
#    important_shared_object: &'a mut ImportantSharedObject,
# }
# impl<'a> ObjectA<'a> {
#    fn new(important_shared_object: &'a mut ImportantSharedObject) -> Self {
#        Self {
#            important_shared_object
#        }
#    }
#    fn do_something(&mut self) {
#        // act on self.important_shared_object
#    }
# }
fn main() {
    let mut shared_thingy = ImportantSharedObject;
    let mut a = ObjectA::new(&mut shared_thingy);
    a.do_something(); // acts on shared_thingy
}
```

Do this:

```rust
# struct ImportantSharedObject;
# struct ObjectA;
# impl ObjectA {
#    fn new() -> Self {
#        Self
#    }
#    fn do_something(&mut self, important_shared_object: &mut ImportantSharedObject) {
#        // act on important_shared_object
#    }
# }
fn main() {
    let mut shared_thingy = ImportantSharedObject;
    let mut a = ObjectA::new();
    a.do_something(&mut shared_thingy); // acts on shared_thingy
}
```

(Happily this also gets rid of named lifetime parameters.)

If you have a hundred such shared objects, you probably don't want a
hundred function parameters.
So it's usual to bundle them up into
a context structure which can be passed into each function call:

```rust
# struct ImportantSharedObject;
# struct AnotherImportantObject;
struct Ctx<'a> {
    important_shared_object: &'a mut ImportantSharedObject,
    another_important_object: &'a mut AnotherImportantObject,
}

# struct ObjectA;
# impl ObjectA {
#    fn new() -> Self {
#        Self
#    }
#    fn do_something(&mut self, ctx: &mut Ctx) {
#        // act on ctx.important_shared_object and ctx.another_important_object
#    }
# }
fn main() {
    let mut shared_thingy = ImportantSharedObject;
    let mut another_thingy = AnotherImportantObject;
    let mut ctx = Ctx {
        important_shared_object: &mut shared_thingy,
        another_important_object: &mut another_thingy,
    };
    let mut a = ObjectA::new();
    a.do_something(&mut ctx); // acts on both the shared thingies
}
```

```mermaid
flowchart LR
    subgraph Shared functionality
    important[Important Shared Object]
    end
    subgraph Context
    refa[Mutable Reference]-->important
    end
    subgraph ObjectA
    objectA[Object A]
    methodA[Method]
    methodA-. Acts on shared object.->important
    end
    subgraph ObjectB
    objectB[Object B]
    methodB[Method]
    methodB-. Acts on shared object.->important
    end
    main --> objectA
    main --> objectB
    main --> Context
    main-. Passes context.-> methodA
    main-. Passes context.-> methodB
```

Even simpler: just put all the data directly into `Ctx`. But the key point
is that this context object is passed around into just about all function calls
rather than being stored anywhere, thus negating any borrowing/lifetime concerns.
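
A sketch of that "even simpler" variant, where `Ctx` owns the shared data outright and so needs no lifetime parameter. The fields `counter` and `log`, and the second type `ObjectB`, are assumptions added purely so the example has something observable to mutate:

```rust
// Hypothetical contents, invented for illustration.
struct ImportantSharedObject { counter: u32 }
struct AnotherImportantObject { log: Vec<String> }

// The context owns the shared data, so no references are stored anywhere.
struct Ctx {
    important_shared_object: ImportantSharedObject,
    another_important_object: AnotherImportantObject,
}

struct ObjectA;
struct ObjectB;

impl ObjectA {
    fn do_something(&mut self, ctx: &mut Ctx) {
        ctx.important_shared_object.counter += 1;
        ctx.another_important_object.log.push("A acted".to_string());
    }
}

impl ObjectB {
    fn do_something(&mut self, ctx: &mut Ctx) {
        ctx.important_shared_object.counter += 10;
        ctx.another_important_object.log.push("B acted".to_string());
    }
}

fn main() {
    let mut ctx = Ctx {
        important_shared_object: ImportantSharedObject { counter: 0 },
        another_important_object: AnotherImportantObject { log: Vec::new() },
    };
    let mut a = ObjectA;
    let mut b = ObjectB;
    // Each call borrows the context mutably only for its own duration.
    a.do_something(&mut ctx);
    b.do_something(&mut ctx);
    println!("{} {:?}", ctx.important_shared_object.counter, ctx.another_important_object.log);
}
```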

This pattern can be seen in [bindgen](https://github.com/rust-lang/rust-bindgen/blob/271eeb0782d34942267ceabcf5f1cf118f0f5842/src/ir/context.rs#L308),
for example.

> Split out borrowing concerns from the object concerns. - MG

To generalize this idea, try to avoid storing references to anything that might
need to be changed. Instead take those things as parameters. For instance
`petgraph` [takes the entire graph as context to a `Walker` object](https://docs.rs/petgraph/0.6.0/petgraph/visit/trait.Walker.html),
such that the graph can be changed while you're walking it.

## My type needs to store arbitrary user data. What do I do instead of `void *`?

Ideally, your type would know all possible types of user data that it could store.
You'd represent this as an `enum` with variant data for each possibility. This
would give complete compile-time type safety.

But sometimes code needs to store data for which it can't depend upon
the definition: perhaps it's defined by a totally different area of the
codebase, or belongs to clients. Such possibilities can't be enumerated in
advance. Until recently, the only real option in C++ was to use a `void *`
and have clients downcast to get their original type back. Modern C++ offers
a much better option, `std::any`; if you've come across that, Rust's equivalent
will seem very familiar.
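
Before turning to type erasure, here is a sketch of the closed-set `enum` approach mentioned at the start of this answer. All names (`UserData`, `Widget`, `describe`) and the particular variants are hypothetical; the point is only that `match` must handle every variant, so nothing can fail at runtime:

```rust
// Hypothetical closed set of user data that the type knows about.
enum UserData {
    Text(String),
    Count(u32),
    Position { x: f32, y: f32 },
}

struct Widget {
    user_data: UserData,
}

// The compiler checks that every variant is handled; no downcasting needed.
fn describe(widget: &Widget) -> String {
    match &widget.user_data {
        UserData::Text(s) => format!("text: {}", s),
        UserData::Count(n) => format!("count: {}", n),
        UserData::Position { x, y } => format!("position: ({}, {})", x, y),
    }
}

fn main() {
    let widget = Widget { user_data: UserData::Count(3) };
    println!("{}", describe(&widget));
}
```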

In Rust, the [`Any`](https://doc.rust-lang.org/std/any/trait.Any.html) trait
allows you to store _anything_ and retrieve it later in a type-safe fashion:

```rust
use std::any::Any;

struct MyTypeOfUserData(u8);

fn main() {
    let any_user_data: Box<dyn Any> = Box::new(MyTypeOfUserData(42));
    let stored_value = any_user_data.downcast_ref::<MyTypeOfUserData>().unwrap().0;
    println!("{}", stored_value);
}
```

If you want to be more prescriptive about what can be stored, you can define
a trait (let's call it `UserData`) and store a `Box<dyn UserData>`.
Your trait should have a method `fn as_any(&self) -> &dyn std::any::Any;`.
Each implementation can just return `self`.

Your caller can then do this:

```rust
trait UserData {
    fn as_any(&self) -> &dyn std::any::Any;
    // ...other trait methods which you wish to apply to any UserData...
}

struct MyTypeOfUserData(u8);

impl UserData for MyTypeOfUserData {
    fn as_any(&self) -> &dyn std::any::Any { self }
}

fn main() {
    // Store a generic Box<dyn UserData>
    let user_data: Box<dyn UserData> = Box::new(MyTypeOfUserData(42));
    // Get back to a specific type
    let stored_value = user_data.as_any().downcast_ref::<MyTypeOfUserData>().unwrap().0;
    println!("{}", stored_value);
}
```

Of course, enumerating all possible stored variants remains preferable such that the
compiler helps you to avoid runtime panics.

## When should I put my data in a `Box`?

In C++, you often need to box things for ownership reasons, whereas in Rust
it's typically just a performance trade-off.
It's arguably premature optimization
to use boxes unless your profiling shows a lot of memcpy of that particular
type (or, perhaps, the relevant [clippy lint](https://rust-lang.github.io/rust-clippy/v0.0.212/index.html#large_enum_variant)
informs you that you have a problem.)

> I never box things unless they're really big. - MG

Another heuristic is if part of your data structure is very rarely filled,
in which case you may wish to `Box` it to avoid incurring an overhead for all
other instances of the type.

```rust
# struct Humility; struct Talent; struct Ego;
struct Popstar {
    ego: Ego,
    talent: Talent,
    humility: Option<Box<Humility>>,
}
# fn main() {}
```

(This is one reason why people like using [anyhow](https://docs.rs/anyhow/latest/anyhow/)
for their errors; it means the failure case in their `Result` enum is only
a pointer wide.)

Of course, Rust may require you to use a box:

* if you need to `Pin` some data, typically for async Rust, or
* if you otherwise have an infinitely sized (recursive) data structure

but as usual, the compiler will explain very nicely.

## Should I have public fields or accessor methods?

The trade-offs are similar to C++ except that Rust's pattern-matching makes it
very convenient to match on fields, so within a realm of code that you own you
may bias towards having more public fields than you're used to. As with C++,
this can give you a future compatibility burden.

## When should I use a newtype wrapper?

The [newtype wrapper pattern](https://rust-unofficial.github.io/patterns/patterns/behavioural/newtype.html)
uses Rust's type system to enforce extra behavior without necessarily changing
the underlying representation.

```rust
# fn get_rocket_length() -> Inches { Inches(7) }
struct Inches(u32);
struct Centimeters(u32);

fn build_mars_orbiter() {
    let rocket_length: Inches = get_rocket_length();
    // mate_to_orbiter(rocket_length); // does not compile because this takes cm
}
```

Other examples that have been used:

* An IP address which is guaranteed not to be localhost;
* Non-zero numbers;
* IDs which are guaranteed to be unique

Such new types typically need a lot of boilerplate, especially to implement
the traits which users of your type would expect to find. On the other hand,
they allow you to use Rust's type system to statically prevent logic bugs.

A heuristic: if there are some invariants you'd be checking for at runtime,
see if you can use a newtype wrapper to do it statically instead. Although it
may be more code to start with, you'll [save the effort of finding and fixing
logic bugs later](code.md#When-should-I-use-runtime-checks-vs-jumping-through-hoops-to-do-static-checks).

## How else can I use Rust's type system to avoid high-level logic bugs?

Lots of ways:

### Zero-sized types.

Also known as "ZSTs". These are types which occupy literally zero bytes, and
so (generally) make no difference whatsoever to the code generated. But you
can use them in the type system to enforce invariants at compile-time with
no runtime check.

For example, they're often used as capability tokens - you can statically
prove that code exclusively has the right to do something.

```rust
pub trait ValidationStatus {}

mod validator {
    use self::super::{Bytecode, ValidationStatus};
    /// ZST marker to show that bytecode has been validated.
    // Private field ensures this can't be created outside this mod
    // but PhantomData means this is still zero-sized.
    pub struct BytecodeValidated(std::marker::PhantomData<()>);
    pub fn validate_bytecode<V: ValidationStatus>(code: Bytecode<V>) -> Bytecode<BytecodeValidated> {
        // Do expensive validation operation here...
        Bytecode {
            validated: BytecodeValidated(std::marker::PhantomData),
            code: code.code
        }
    }
    impl ValidationStatus for BytecodeValidated {}
}

struct BytecodeNotValidated;

impl ValidationStatus for BytecodeNotValidated {}

pub struct Bytecode<V: ValidationStatus> {
    validated: V,
    code: Vec<u8>,
}

fn run_bytecode(bytecode: &Bytecode<validator::BytecodeValidated>) {
    // Compiler PROVES you validated it before you can run it. There are no
    // runtime branches involved.
}

fn get_unvalidated_bytecode() -> Bytecode<BytecodeNotValidated> {
    // ...
#   Bytecode {
#       validated: BytecodeNotValidated,
#       code: Vec::new()
#   }
}

fn main() {
    let bytecode = get_unvalidated_bytecode();
    // run_bytecode(bytecode); // does not compile
    let bytecode = validator::validate_bytecode(bytecode);
    run_bytecode(&bytecode);
    run_bytecode(&bytecode);
}
```

ZSTs can also be used to demonstrate _exclusive_ access to some resource.

```rust
struct RobotArmAccessToken;

fn move_arm(token: &mut RobotArmAccessToken, x: u32, y: u32, z: u32) {
    // ...
}

fn attach_car_door(token: &mut RobotArmAccessToken) {
    move_arm(token, 3, 4, 6);
    move_arm(token, 5, 3, 6);
}

fn install_windscreen(token: &mut RobotArmAccessToken) {
    move_arm(token, 7, 8, 2);
    move_arm(token, 1, 2, 3);
}

fn main() {
    let mut token = RobotArmAccessToken; // ensure only one exists
    attach_car_door(&mut token);
    install_windscreen(&mut token);
}
```

(The type system would prevent these operations happening in parallel.)

### Marker traits

Indicate that a type meets certain invariants, so subsequent
users of that type don't need to check at runtime. A common example is to
indicate that a type is safe to serialize into some bytestream.

### Enums as state machines.

Each enum variant is a state and stores data associated with that state. There
simply is no possibility that the data can get out of sync with the state.

```rust
enum ElectionState {
    RaisingDonations { amount_raised: u32 },
    DoingTVInterviews { interviews_done: u16 },
    Voting { votes_for_me: u64, votes_for_opponent: u64 },
    Elected,
    NotElected,
}
```

A more heavyweight approach here is to define types for each state, and
allow valid state transitions by taking the previous state by-value and
returning the next state by-value.

```rust
struct Seed { water_available: u32 }
struct Growing { water_available: u32, sun_available: u32 }
struct Flowering;
struct Dead;

enum PlantState {
    Seed(Seed),
    Growing(Growing),
    Flowering(Flowering),
    Dead(Dead)
}

impl Seed {
    fn advance(self) -> PlantState {
        if self.water_available > 3 {
            PlantState::Growing(Growing { water_available: self.water_available, sun_available: 0 })
        } else {
            PlantState::Dead(Dead)
        }
    }
}

impl Growing {
    fn advance(self) -> PlantState {
        if self.water_available > 3 && self.sun_available > 3 {
            PlantState::Flowering(Flowering)
        } else {
            PlantState::Dead(Dead)
        }
    }
}

impl Flowering {
    fn advance(self) -> PlantState {
        PlantState::Dead(Dead)
    }
}

impl Dead {
    fn advance(self) -> PlantState {
        PlantState::Dead(Dead)
    }
}

impl PlantState {
    fn advance(self) -> Self {
        match self {
            Self::Seed(seed) => seed.advance(),
            Self::Growing(growing) => growing.advance(),
            Self::Flowering(flowering) => flowering.advance(),
            Self::Dead(dead) => dead.advance(),
        }
    }
}

// we should probably find a way to inject some sun and water into this
// state machine or things are not looking rosy
```

## What should I do instead of inheritance?

Use [composition](https://en.wikipedia.org/wiki/Composition_over_inheritance).
Sometimes this results in more boilerplate, but it avoids a raft of complexity.

Specifically, for example:

* you might include the "superclass" struct as a member of the subclass
  struct;
* you might use an enum with different variants for the different possible
  "subclasses".
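
Both options above can be sketched briefly. Every name here (`Animal`, `Dog`, `Shape` and their fields) is hypothetical and chosen only for illustration; this is one possible shape for such code, not a prescribed pattern:

```rust
// Option 1: the "superclass" is an ordinary member of the "subclass".
struct Animal {
    name: String,
    legs: u8,
}

impl Animal {
    fn describe(&self) -> String {
        format!("{} has {} legs", self.name, self.legs)
    }
}

struct Dog {
    animal: Animal, // composition instead of `class Dog : public Animal`
    favourite_toy: String,
}

impl Dog {
    // Explicit delegation replaces an inherited method.
    fn describe(&self) -> String {
        format!("{} and loves its {}", self.animal.describe(), self.favourite_toy)
    }
}

// Option 2: a closed set of "subclasses" becomes the variants of an enum.
enum Shape {
    Circle { radius: f64 },
    Rectangle { width: f64, height: f64 },
}

impl Shape {
    fn area(&self) -> f64 {
        match self {
            Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
            Shape::Rectangle { width, height } => width * height,
        }
    }
}

fn main() {
    let dog = Dog {
        animal: Animal { name: "Rex".to_string(), legs: 4 },
        favourite_toy: "ball".to_string(),
    };
    println!("{}", dog.describe());
    println!("{}", Shape::Rectangle { width: 2.0, height: 3.0 }.area());
    println!("{}", Shape::Circle { radius: 1.0 }.area());
}
```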

Usually the answer is obvious: it's unlikely that your Rust code is structured
in such a way that inheritance would be a good fit anyway.

> I've only missed inheritance when actually _implementing_ languages which
> themselves have inheritance. - MG

## I need a list of nodes which can refer to one another. How?

You can't easily do self-referential data structures in Rust. The usual
workaround is to [use an
arena](https://manishearth.github.io/blog/2021/03/15/arenas-in-rust/) and
replace references from one node to another with node IDs.

An arena is typically a `Vec` (or similar), and the node IDs are a newtype
wrapper around a simple integer index.

Obviously, Rust doesn't check that your node IDs are valid. If you don't have
proper references, what stops you from having stale IDs?

Arenas are often purely additive, which means that you can add entries but not
delete them
([example](https://github.com/Manishearth/elsa/blob/master/examples/mutable_arena.rs)).
If you must have an arena which deletes things, then use generational IDs; see
the [generational-arena](https://docs.rs/generational-arena/) crate and this
[RustConf keynote](https://www.youtube.com/watch?v=aKLntZcp27M) for more
details.

If arenas still sound like a nasty workaround, consider that you might choose
an arena anyway for other reasons:

* All of the objects in the arena will be freed at the end of the arena's
  lifetime, instead of during their manipulation, which can give very low
  latency for some use-cases. [Bumpalo](https://docs.rs/bumpalo/3.6.1/bumpalo/)
  formalizes this.
* The rest of your program might have real Rust references into the arena. You
  can give the arena a named lifetime (`'arena` for example), making the
  provenance of those references very clear.
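
A purely additive arena of the kind described in this answer might look like the following minimal sketch. `Graph`, `Node`, `NodeId` and their methods are hypothetical names for illustration; a real project would more likely use an existing crate:

```rust
/// Newtype wrapper around an index into the arena.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

struct Node {
    name: String,
    // Nodes refer to each other by ID rather than by reference.
    neighbours: Vec<NodeId>,
}

/// The arena: nodes are only ever appended, so an ID never goes stale.
#[derive(Default)]
struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    fn add_node(&mut self, name: &str) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node { name: name.to_string(), neighbours: Vec::new() });
        id
    }

    /// Links two nodes in both directions, which plain references
    /// could not express without shared mutable state.
    fn connect(&mut self, a: NodeId, b: NodeId) {
        self.nodes[a.0].neighbours.push(b);
        self.nodes[b.0].neighbours.push(a);
    }

    fn neighbour_names(&self, id: NodeId) -> Vec<&str> {
        self.nodes[id.0]
            .neighbours
            .iter()
            .map(|n| self.nodes[n.0].name.as_str())
            .collect()
    }
}

fn main() {
    let mut graph = Graph::default();
    let a = graph.add_node("a");
    let b = graph.add_node("b");
    let c = graph.add_node("c");
    graph.connect(a, b);
    graph.connect(a, c);
    println!("{:?}", graph.neighbour_names(a));
}
```

Because nothing is ever removed, indexing with a `NodeId` handed out by this particular `Graph` cannot go out of bounds; mixing IDs from two different graphs is still possible, which is one of the gaps that generational or branded IDs aim to close.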

## I'm having a miserable time making my data structure. Should I use unsafe?

Low-level data structures are hard in Rust, especially if they're
self-referential. Rust will make visible all sorts of risks of ownership and
shared mutable state which may not be visible in other languages, and
they're hard to solve in low-level data structure code.

Even something as simple as a doubly-linked list is notoriously hard; so much so
that there is a [book that teaches Rust based solely on linked lists](https://rust-unofficial.github.io/too-many-lists/).
As that (wonderful) book makes clear, you are often faced with a choice:

* [Use safe Rust, but shift compile-time checks to runtime](https://rust-unofficial.github.io/too-many-lists/fourth.html)
* [Use `unsafe`](https://rust-unofficial.github.io/too-many-lists/fifth.html) and
  take the same degree of care you'd take in C or C++. And, just like in C or C++,
  you'll introduce [security vulnerabilities in the unsafe code](https://www.cvedetails.com/vulnerability-list/vendor_id-19029/product_id-48677/Rust-lang-Rust.html).

If you're facing this decision... perhaps there's a third way.

You should almost always be using somebody else's tried-and-tested
data structure.

[petgraph](https://docs.rs/petgraph) and
[slotmap](https://docs.rs/slotmap) are great examples. Use someone else's crate
by default, and resort to writing your own only if you exhaust that option.

C++ makes it hard to pull in third-party dependencies, so it's culturally normal
to write new code. Rust makes it trivial to add dependencies, and so you will
need to do that, even if it feels surprising for a C++ programmer.

This ease of adding dependencies co-evolved with the
difficulty of making data structures. It's simply a part of programming in Rust.
You just can't separate the language and the ecosystem.

You might argue that this dependency on third-party crates is concerning
from a supply-chain security point of view. Your author would agree, but
it's just the way you do things in Rust. Stop creating your own data structures.

Then again:

> it’s equally miserable to implement performant, low-level data structures in
> C++; you’ll be specializing on lots of things like is_trivially_movable etc. - MY.

## I nevertheless have to write my own data structure. Should I use unsafe?

I'm sorry to hear that.

Some suggestions:

* Use `Rc`, `Weak` etc. until you really can't.
* Even if you can't use a pre-existing crate for the whole data structure,
  perhaps you can use a crate to avoid the `unsafe` bits (for example
  [rental](https://docs.rs/rental/latest/rental/))
* Bear in mind that refactoring Rust is generally safer than refactoring
  C++ (because the compiler will point out a higher proportion of your
  mistakes) so a wise strategy might be to start with a fully-safe, but slow,
  version, establish solid tests, and then [reach for unsafe](https://doc.rust-lang.org/nomicon/).

--------------------------------------------------------------------------------