├── .gitignore
├── third_party
│   └── mermaid
│       ├── mermaid-init.js
│       └── LICENSE
├── README.md
├── src
│   ├── SUMMARY.md
│   ├── processes.md
│   ├── pets.rs
│   ├── introduction.md
│   ├── apis.md
│   ├── code.md
│   ├── codebase.md
│   ├── signatures.md
│   └── types.md
├── book.toml
├── .github
│   └── workflows
│       └── main.yml
├── CONTRIBUTING.md
└── LICENSE
/.gitignore:
--------------------------------------------------------------------------------
1 | book
2 |
--------------------------------------------------------------------------------
/third_party/mermaid/mermaid-init.js:
--------------------------------------------------------------------------------
1 | mermaid.initialize({startOnLoad:true});
2 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ***This FAQ can be found at [https://cppfaq.rs](https://cppfaq.rs).***
2 |
3 | Build instructions:
4 | * `cargo install mdbook`
5 | * `cargo install mdbook-mermaid`
6 | * `cargo install mdbook-linkcheck`
7 | * `cargo install mdbook-toc`
8 | * `mdbook serve -o`
9 | * Occasionally, `mdbook test`
10 |
--------------------------------------------------------------------------------
/src/SUMMARY.md:
--------------------------------------------------------------------------------
1 | # Summary
2 |
3 | - [Introduction](./introduction.md)
4 | - [Questions about code in function bodies](./code.md)
5 | - [Questions about your function signatures](./signatures.md)
6 | - [Questions about your types](./types.md)
7 | - [Questions about your APIs](./apis.md)
8 | - [Questions about your whole codebase](./codebase.md)
9 | - [Questions about your processes](./processes.md)
10 |
--------------------------------------------------------------------------------
/book.toml:
--------------------------------------------------------------------------------
1 | [book]
2 | authors = ["Adrian Taylor", "Martin Brænne"]
3 | language = "en"
4 | multilingual = false
5 | src = "src"
6 | title = "cppfaq.rs"
7 | [preprocessor]
8 | [preprocessor.mermaid]
9 | command = "mdbook-mermaid"
10 |
11 | [preprocessor.toc]
12 | command = "mdbook-toc"
13 | renderer = ["html"]
14 |
15 | [output]
16 | [output.html]
17 | additional-js = ["third_party/mermaid/mermaid.min.js", "third_party/mermaid/mermaid-init.js"]
18 | cname = "cppfaq.rs"
19 | git-repository-url = "https://github.com/google/rust-design-faq"
20 |
21 | [output.linkcheck]
22 | follow-web-links = true
23 | optional = true
24 |
--------------------------------------------------------------------------------
/src/processes.md:
--------------------------------------------------------------------------------
1 | # Questions about your development processes
2 |
3 | ## How should I use tools differently from C++?
4 |
5 | * *Use `rustfmt` automatically everywhere.* While in C++ there are many
6 | different coding styles, the Rust community is in agreement (at least,
7 | they're in agreement that it's a good idea to be in agreement). That
8 | is codified in `rustfmt`. Use it, automatically, on every submission.
9 | * *Use `clippy` somewhere*. Its lints are useful.
10 | * *Use IDEs more liberally*. Even staunch vim-adherents (your author!)
11 | prefer to use an IDE with Rust, because it's simply invaluable to show
12 | type annotations. Type information is typically invisible in the language
13 | so in Rust you're more reliant on tooling assistance.
14 | * *Deny unsafe code* by default. (`#![forbid(unsafe_code)]`).
15 |
--------------------------------------------------------------------------------
/.github/workflows/main.yml:
--------------------------------------------------------------------------------
1 | name: github pages
2 |
3 | on:
4 |   push:
5 |     branches:
6 |       - main
7 |   pull_request:
8 |     branches:
9 |       - main
10 |
11 | jobs:
12 |   deploy:
13 |     runs-on: ubuntu-20.04
14 |     steps:
15 |       - uses: actions/checkout@v2
16 |
17 |       - name: Setup mdBook
18 |         uses: peaceiris/actions-mdbook@v1
19 |         with:
20 |           mdbook-version: '0.4.10'
21 |           # mdbook-version: 'latest'
22 |
23 |       - name: Install mdbook-linkcheck
24 |         run: cargo install mdbook-linkcheck
25 |
26 |       - name: Install mdbook-mermaid
27 |         run: cargo install mdbook-mermaid
28 |
29 |       - name: Install mdbook-toc
30 |         run: cargo install mdbook-toc
31 |
32 |       - run: mdbook build
33 |
34 |       - run: mdbook test
35 |
36 |       - name: Deploy
37 |         uses: peaceiris/actions-gh-pages@v3
38 |         if: ${{ github.ref == 'refs/heads/main' }}
39 |         with:
40 |           github_token: ${{ secrets.GITHUB_TOKEN }}
41 |           publish_dir: ./book/html
42 |           cname: cppfaq.rs
43 |
--------------------------------------------------------------------------------
/third_party/mermaid/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2014 - 2021 Knut Sveidqvist
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # How to Contribute
2 |
3 | We'd love to accept your patches and contributions to this project. There are
4 | just a few small guidelines you need to follow.
5 |
6 | ## Contributor License Agreement
7 |
8 | Contributions to this project must be accompanied by a Contributor License
9 | Agreement. You (or your employer) retain the copyright to your contribution;
10 | this simply gives us permission to use and redistribute your contributions as
11 | part of the project. Head over to <https://cla.developers.google.com/> to see
12 | your current agreements on file or to sign a new one.
13 |
14 | You generally only need to submit a CLA once, so if you've already submitted one
15 | (even if it was for a different project), you probably don't need to do it
16 | again.
17 |
18 | ## Code Reviews
19 |
20 | All submissions, including submissions by project members, require review. We
21 | use GitHub pull requests for this purpose. Consult
22 | [GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
23 | information on using pull requests.
24 |
25 | ## Community Guidelines
26 |
27 | This project follows [Google's Open Source Community
28 | Guidelines](https://opensource.google/conduct/).
29 |
--------------------------------------------------------------------------------
/src/pets.rs:
--------------------------------------------------------------------------------
1 | # // Copyright 2020 Google LLC
2 | # //
3 | # // Licensed under the Apache License, Version 2.0 (the "License");
4 | # // you may not use this file except in compliance with the License.
5 | # // You may obtain a copy of the License at
6 | # //
7 | # // https://www.apache.org/licenses/LICENSE-2.0
8 | # //
9 | # // Unless required by applicable law or agreed to in writing, software
10 | # // distributed under the License is distributed on an "AS IS" BASIS,
11 | # // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # // See the License for the specific language governing permissions and
13 | # // limitations under the License.
14 | #
15 | # use std::collections::HashSet;
16 | # struct Animal {
17 | # kind: &'static str,
18 | # is_hungry: bool,
19 | # meal_needed: &'static str,
20 | # }
21 | #
22 | # static PETS: [Animal; 4] = [
23 | # Animal {
24 | # kind: "Dog",
25 | # is_hungry: true,
26 | # meal_needed: "Kibble",
27 | # },
28 | # Animal {
29 | # kind: "Python",
30 | # is_hungry: false,
31 | # meal_needed: "Cat",
32 | # },
33 | # Animal {
34 | # kind: "Cat",
35 | # is_hungry: true,
36 | # meal_needed: "Kibble",
37 | # },
38 | # Animal {
39 | # kind: "Lion",
40 | # is_hungry: false,
41 | # meal_needed: "Kibble",
42 | # },
43 | # ];
44 | #
45 | # static NEARBY_DUCK: Animal = Animal {
46 | # kind: "Duck",
47 | # is_hungry: true,
48 | # meal_needed: "pondweed",
49 | # };
50 |
--------------------------------------------------------------------------------
/src/introduction.md:
--------------------------------------------------------------------------------
1 | # Introduction
2 |
3 | So, you're coming from C++ and want to write Rust? Great!
4 |
5 | You have questions? We have answers.
6 |
7 | This book is a collection of frequently asked questions for those arriving from existing C++ codebases. It guides you on how to adapt your C++ thinking to the new facilities available in Rust. It should help you if you're coming from other object-oriented languages such as Java too.
8 |
9 | Although it's structured as questions and answers, it can also be read front-to-back, to give you hints about how to adapt your C++/Java thinking to a more idiomatically Rusty approach.
10 |
11 | It does not aim to teach you Rust - there are [many better resources](https://www.rust-lang.org/learn). It doesn't aim to talk about Rust idioms _in general_ - [there are great existing guides for that](https://rust-unofficial.github.io/patterns/idioms/index.html). This guide is specifically about transitioning from some other traditionally OO language. If you're coming from such a language, you'll have questions about how to achieve the same outcomes in idiomatic Rust. That's what this guide is for.
12 |
13 | # Structure
14 |
15 | The guide starts with idioms at the small scale - answering questions about how you'd write a few lines of code - and moves towards ever larger patterns - answering questions about how you'd structure your whole codebase.
16 |
17 | # Contributors
18 |
19 | The following awesome people helped write the answers here, and they're sometimes quoted using the abbreviations given.
20 |
21 | Thanks to Adam Perry [(@\_\_anp\_\_)](https://twitter.com/__anp__) (AP), Alyssa Haroldsen [(@kupiakos)](https://twitter.com/kupiakos) (AH), Augie Fackler [(@durin42)](https://twitter.com/durin42) (AF), David Tolnay [(@davidtolnay)](https://twitter.com/davidtolnay) (DT), Łukasz Anforowicz (LA), Manish Goregaokar [(@ManishEarth)](https://twitter.com/ManishEarth) (MG), Mike Forster (MF), Miguel Young de la Sota [(@DrawsMiguel)](https://twitter.com/DrawsMiguel) (MY), and Tyler Mandry [(@tmandry)](https://twitter.com/tmandry) (TM).
22 |
23 | Their views have been edited and collated by Adrian Taylor [(@adehohum)](https://twitter.com/adehohum), Chris Palmer, [danakj@chromium.org](mailto:danakj@chromium.org) and Martin Brænne. Any errors or misrepresentations are ours.
24 |
25 | Licensed under either of Apache License, Version 2.0 or MIT license at your option.
26 |
--------------------------------------------------------------------------------
/src/apis.md:
--------------------------------------------------------------------------------
1 | # Questions about designing APIs for others
2 |
3 |
4 |
5 | See also the excellent [Rust API guidelines](https://rust-lang.github.io/api-guidelines/about.html).
6 | The document you're reading aims to provide extra hints which may be especially
7 | useful to folk coming from C++, but that's the canonical reference.
8 |
9 | ## When should my type implement `Default`?
10 |
11 | Whenever you'd provide a default constructor in C++.
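
As a minimal sketch (the `Counters` and `RetryPolicy` types are hypothetical, invented for this illustration), you can derive `Default` when each field's own default is what you want, and write it by hand otherwise:

```rust
use std::time::Duration;

// Hypothetical type: zero is a sensible default for every field,
// so the derive is enough.
#[derive(Debug, Default, PartialEq)]
pub struct Counters {
    pub hits: u64,
    pub misses: u64,
}

// Hypothetical type whose sensible defaults are not all zero,
// so `Default` is implemented by hand.
#[derive(Debug, PartialEq)]
pub struct RetryPolicy {
    pub max_attempts: u32,
    pub backoff: Duration,
}

impl Default for RetryPolicy {
    fn default() -> Self {
        RetryPolicy {
            max_attempts: 3,
            backoff: Duration::from_millis(100),
        }
    }
}

// Callers can override one field and take the rest from `Default`.
pub fn patient_policy() -> RetryPolicy {
    RetryPolicy {
        max_attempts: 10,
        ..Default::default()
    }
}
```

The struct update syntax in `patient_policy` is roughly what a C++ programmer might otherwise reach for default member initializers to achieve.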
12 |
13 | ## When should my type implement `From`, `Into` and `TryFrom`?
14 |
15 | You should think of these as equivalent to implicit conversions in C++. Just
16 | as with C++, if there are _multiple_ ways to convert from your thing to another
17 | thing, don't implement these, but if there's a single obvious conversion, do.
18 |
19 | Usually, don't implement `Into` but instead implement `From`.
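
Here is one way this might look, assuming two hypothetical newtypes (`Meters` and `Percent` are not from any real library). `From` covers the infallible conversion, `TryFrom` covers the fallible one, and `Into` comes for free:

```rust
use std::convert::TryFrom;

// Hypothetical newtypes for illustration.
#[derive(Debug, PartialEq)]
pub struct Meters(pub f64);

#[derive(Debug, PartialEq)]
pub struct Percent(pub u8);

// A single obvious, infallible conversion: implement `From`.
impl From<f64> for Meters {
    fn from(value: f64) -> Self {
        Meters(value)
    }
}

// A conversion that can fail: implement `TryFrom`.
impl TryFrom<u8> for Percent {
    type Error = String;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        if value <= 100 {
            Ok(Percent(value))
        } else {
            Err(format!("{} is not a valid percentage", value))
        }
    }
}

// `Into` is provided by the standard library's blanket impl over `From`,
// which is why you rarely need to implement it yourself.
pub fn length_of_pool() -> Meters {
    let raw: f64 = 25.0;
    raw.into()
}
```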
20 |
21 | ## How should I expose constructors?
22 |
23 | See the previous two answers: where it's simple and obvious, use the standard
24 | traits to make your object behavior predictable.
25 |
26 | If you need to go beyond that, remember you've got a couple of extra toys in Rust:
27 |
28 | * A "constructor" could return a `Result`
29 | * Your constructors can have names, e.g. `Vec::with_capacity`, `Box::pin`
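
Both ideas can be combined. The sketch below uses a hypothetical `Temperature` type (invented for this example) whose constructors are named after the unit they expect and which refuse impossible values:

```rust
// Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
pub struct Temperature {
    kelvin: f64,
}

impl Temperature {
    // A named constructor that can fail: nothing is colder than absolute zero.
    // NaN also fails the comparison and is rejected.
    pub fn from_kelvin(kelvin: f64) -> Result<Self, String> {
        if kelvin >= 0.0 {
            Ok(Temperature { kelvin })
        } else {
            Err(format!("{} K is below absolute zero", kelvin))
        }
    }

    // A second named constructor; the name says which unit is expected.
    pub fn from_celsius(celsius: f64) -> Result<Self, String> {
        Self::from_kelvin(celsius + 273.15)
    }

    pub fn kelvin(&self) -> f64 {
        self.kelvin
    }
}
```

A real API would probably use a dedicated error type rather than `String`; that choice is kept simple here.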
30 |
31 | ## When should my type implement `AsRef`?
32 |
33 | If you have a type which contains another type, provide `AsRef`, especially
34 | so that people can clone the inner type. It's good practice to provide explicit
35 | versions as well (for example, `String` implements `AsRef<str>` but also
36 | provides `.as_str()`).
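
A small sketch of that pattern, assuming a hypothetical `Username` wrapper (not a real library type) around a `String`:

```rust
// Hypothetical wrapper type for illustration.
pub struct Username(pub String);

impl Username {
    // The explicit accessor, mirroring `String::as_str`.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl AsRef<str> for Username {
    fn as_ref(&self) -> &str {
        &self.0
    }
}

// Generic code can now accept `&str`, `String` or `Username` alike.
pub fn shout(name: impl AsRef<str>) -> String {
    name.as_ref().to_uppercase()
}
```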
37 |
38 | ## When should I implement `Copy`?
39 |
40 | > Anything that is integer-like or reference-like should be `Copy`; other things
41 | > shouldn’t. - MY
42 |
43 | > When it's efficient and when it’s an API contract you're willing to uphold. - AH
44 |
45 | Generally speaking, types which are plain-old-data can be `Copy`. Anything
46 | more nuanced with any type of state shouldn't be.
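
For instance, a hypothetical `Point` type (made up for this sketch) is plain old data, so deriving `Copy` lets callers keep using a value after passing it by value:

```rust
// Hypothetical plain-old-data type: two integers and nothing else.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

// Takes `Point` by value. Because `Point` is `Copy`, the caller's
// value is copied rather than moved and stays usable afterwards.
pub fn mirrored(p: Point) -> Point {
    Point { x: -p.x, y: p.y }
}
```

Remember that removing `Copy` later is a breaking change for your callers, which is the API contract mentioned above.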
47 |
48 | ## Should I have `Arc` or `Rc` in my API?
49 |
50 | > It’s a code smell to have reference counts in your API design. You should hide
51 | > it. - TM.
52 |
53 | If you must, you will need to decide between `Rc` and `Arc` - see the next
54 | answer for some considerations. But, generally, `Arc` is better practice because
55 | it imposes fewer restrictions on your callers. Also, consider taking a look at the
56 | [`Archery` crate](https://docs.rs/archery/latest/archery/).
57 |
58 | ## Should my API be thread-safe? What does that mean?
59 |
60 | In C++, a thread-safe API usually means that you can expect your API's
61 | consumers to use objects from multiple threads. This is difficult to make safe
62 | and therefore substantial extra engineering is required to make an API
63 | thread-safe.
64 |
65 | In Rust, things differ:
66 |
67 | * it's more normal to do things across multiple threads;
68 | * you don't have to worry about your callers making mistakes here because
69 | the compiler won't let them;
70 | * you can often rely on `Send` rather than `Sync`.
71 |
72 | You certainly shouldn't be putting a `Mutex` around all your types. If your
73 | caller attempts to use the type from multiple threads, the compiler will
74 | simply stop them. It is the responsibility of the caller to use things
75 | safely.
76 |
77 | > If the library has `Arc` or `Rc` in the APIs, it may be making choices about
78 | > how you should instantiate stuff, and that’s rude. - AF
79 |
80 | There's a reasonable chance that your API can be used in parallel threads
81 | by virtue of `Send` and `Sync` being automatically derived. But - you should
82 | think through the usage model for your API clients and ensure that's true.
83 |
84 | ```rust
85 | use std::cell::RefCell;
86 | use std::collections::VecDeque;
87 | use std::sync::Mutex;
88 | use std::thread;
89 |
90 | // Imagine this is your library, exposing this interface to library
91 | // consumers...
92 | mod pizza_api {
93 |
94 |     use std::thread;
95 |     use std::time::Duration;
96 |
97 |     pub struct Pizza {
98 |         // automatically 'Send'
99 |         _anchovies: u32,
100 |         _pepperoni: u32,
101 |     }
102 |
103 |     pub fn make_pizza() -> Pizza {
104 |         println!("cooking...");
105 |         thread::sleep(Duration::from_millis(10));
106 |         Pizza {
107 |             _anchovies: 0, // yuck
108 |             _pepperoni: 32,
109 |         }
110 |     }
111 |
112 |     pub fn eat_pizza(_pizza: Pizza) {
113 |         println!("yum")
114 |     }
115 | }
116 |
117 | // Absolutely no changes are required to the pizza library to let
118 | // it be usable from a multithreaded context
119 | fn main() {
120 |     let pizza_queue = Mutex::new(RefCell::new(VecDeque::new()));
121 |     thread::scope(|s| {
122 |         s.spawn(|| {
123 |             let mut pizzas_eaten = 0;
124 |             while pizzas_eaten < 100 {
125 |                 if let Some(pizza) = pizza_queue.lock().unwrap().borrow_mut().pop_front() {
126 |                     pizza_api::eat_pizza(pizza);
127 |                     pizzas_eaten += 1;
128 |                 }
129 |             }
130 |         });
131 |         s.spawn(|| {
132 |             for _ in 0..100 {
133 |                 let pizza = pizza_api::make_pizza();
134 |                 pizza_queue.lock().unwrap().borrow_mut().push_back(pizza);
135 |             }
136 |         });
137 |     });
138 | }
139 | ```
140 |
141 | ## What should I `Derive` to make my code optimally usable?
142 |
143 | The [official guidelines say to be eager](https://rust-lang.github.io/api-guidelines/interoperability.html#types-eagerly-implement-common-traits-c-common-traits).
144 |
145 | But don't overpromise:
146 |
147 | > Equality can suddenly become expensive later - don’t make types comparable
148 | > unless you intend people to be able to compare instances of the type.
149 | > Allowing people to pattern match on enums is usually better. - MY
150 |
151 | Note that [`syn` is a rare case](https://docs.rs/syn/latest/syn/) in that it
152 | has so many types, and is so extensively depended upon by the rest of the Rust
153 | ecosystem, that it avoids deriving the standard traits unless explicitly
154 | commanded to do so via a cargo feature. This is an unusual pattern and should
155 | not normally be followed.
156 |
157 | ## How should I think about API design, differently from C++?
158 |
159 | > Make the most of the fact that everything is immutable by default. Things
160 | > which are mutable should stick out. - AF
161 |
162 | > Think about things which should take self and return self. - AF
163 |
164 | Refactoring is less expensive in Rust than C++ due to compiler safeguards, but
165 | _rearchitecting_ is expensive in any language. Think about "one way doors"
166 | and "two way doors" in the design space: can you undo a change later?
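
One common reading of "take self and return self" is a chainable builder. This is only a sketch, and the `PizzaOrder` type is hypothetical:

```rust
// Hypothetical builder-style type for illustration.
#[derive(Debug, Default, PartialEq)]
pub struct PizzaOrder {
    pub pepperoni: u32,
    pub extra_cheese: bool,
}

impl PizzaOrder {
    // Each method consumes the order and hands back the updated one, so
    // calls chain and the caller never needs a `mut` binding.
    pub fn pepperoni(mut self, slices: u32) -> Self {
        self.pepperoni = slices;
        self
    }

    pub fn extra_cheese(mut self) -> Self {
        self.extra_cheese = true;
        self
    }
}

pub fn usual_order() -> PizzaOrder {
    PizzaOrder::default().pepperoni(32).extra_cheese()
}
```

The mutation is confined to the builder methods, which fits the advice that mutable things should stick out.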
167 |
--------------------------------------------------------------------------------
/src/code.md:
--------------------------------------------------------------------------------
1 | # Questions about code in function bodies
2 |
3 |
4 |
5 | ## How can I avoid the performance penalty of bounds checks?
6 |
7 | Rust array and list accesses are all bounds checked. You may be worried about a performance penalty. How can you avoid that?
8 |
9 | > Contort yourself a little bit to use iterators. - MY
10 |
11 | Rust gives you choices around functional versus imperative style, but things often work better in a functional style. Specifically - if you've got something iterable, then there are probably functional methods to do what you want.
12 |
13 | For instance, suppose you need to work out what food to get at the petshop. Here's code that does this in an imperative style:
14 |
15 | ```rust
16 | {{#include pets.rs}}
17 | fn make_shopping_list_a() -> HashSet<&'static str> {
18 |     let mut meals_needed = HashSet::new();
19 |     for n in 0..PETS.len() { // ugh
20 |         if PETS[n].is_hungry {
21 |             meals_needed.insert(PETS[n].meal_needed);
22 |         }
23 |     }
24 |     meals_needed
25 | }
26 | ```
27 |
28 | The loop index is verbose and error-prone. Let's get rid of it and loop over an iterator instead:
29 |
30 | ```rust
31 | {{#include pets.rs}}
32 | fn make_shopping_list_b() -> HashSet<&'static str> {
33 |     let mut meals_needed = HashSet::new();
34 |     for animal in PETS.iter() { // better...
35 |         if animal.is_hungry {
36 |             meals_needed.insert(animal.meal_needed);
37 |         }
38 |     }
39 |     meals_needed
40 | }
41 | ```
42 |
43 | We're now accessing the array through an iterator, but we're still processing the elements inside a loop. It's often more idiomatic to replace the loop with a chain of iterators:
44 |
45 | ```rust
46 | {{#include pets.rs}}
47 | fn make_shopping_list_c() -> HashSet<&'static str> {
48 |     PETS.iter()
49 |         .filter(|animal| animal.is_hungry)
50 |         .map(|animal| animal.meal_needed)
51 |         .collect() // best...
52 | }
53 | ```
54 |
55 | The obvious advantage of the third approach is that it's more concise, but less obviously:
56 |
57 | * The first solution may require Rust to do array bounds checks inside each iteration of the loop, making Rust potentially slower than C++. In this sort of simple example, it likely wouldn't, but functional pipelines simply don't require bounds checks.
58 | * The final container (a `HashSet` in this case) may be able to allocate roughly the right size at the outset, using the [size_hint](https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.size_hint) of a Rust iterator.
59 | * If you use iterator-style code rather than imperative code, it's more likely the Rust compiler will be able to [auto-vectorize using SIMD instructions](https://medium.com/swlh/an-adventure-in-simd-b0e8db4ccca7).
60 | * There is no mutable state within the function. This makes it easier to verify that the code is correct and to avoid introducing bugs when changing it. In this simple example it may be obvious that calling `HashSet::insert` is the only mutation to the set, but in more complex scenarios it is quite easy to lose track.
61 | * And as a new arrival from C++, you may find this hard to believe: For an experienced Rustacean it'll be more readable.
62 |
63 | Here are some more iterator techniques to help avoid materializing a collection:
64 |
65 | * You can [chain two iterators together](https://doc.rust-lang.org/std/iter/struct.Chain.html) to make a longer one.
66 | * If you need to iterate two lists, [zip them together](https://doc.rust-lang.org/std/iter/struct.Zip.html) to avoid bounds checks on either.
67 | * If you want to feed all your animals, and also feed a nearby duck, just chain the iterator to `std::iter::once`:
68 |
69 | ```rust
70 | # use std::collections::HashSet;
71 | # struct Animal {
72 | # kind: &'static str,
73 | # is_hungry: bool,
74 | # meal_needed: &'static str,
75 | # }
76 | # static PETS: [Animal; 0] = [];
77 | # static NEARBY_DUCK: Animal = Animal {
78 | # kind: "Duck",
79 | # is_hungry: true,
80 | # meal_needed: "pondweed",
81 | # };
82 | fn make_shopping_list_d() -> HashSet<&'static str> {
83 |     PETS.iter()
84 |         .chain(std::iter::once(&NEARBY_DUCK))
85 |         .filter(|animal| animal.is_hungry)
86 |         .map(|animal| animal.meal_needed)
87 |         .collect()
88 | }
89 | ```
90 | (Similarly, if you want to add one more item to the shopping list - maybe you're hungry, as well as your menagerie? - just add it after the `map`).
91 | * `Option` is iterable.
92 | ```rust
93 | # use std::collections::HashSet;
94 | # struct Animal {
95 | # kind: &'static str,
96 | # is_hungry: bool,
97 | # meal_needed: &'static str,
98 | # }
99 | # static PETS: [Animal; 0] = [];
100 | # struct Pond;
101 | # static MY_POND: Pond = Pond;
102 | fn pond_inhabitant(pond: &Pond) -> Option<&Animal> {
103 | // ...
104 | # None
105 | }
106 |
107 | fn make_shopping_list_e() -> HashSet<&'static str> {
108 |     PETS.iter()
109 |         .chain(pond_inhabitant(&MY_POND))
110 |         .filter(|animal| animal.is_hungry)
111 |         .map(|animal| animal.meal_needed)
112 |         .collect()
113 | }
114 | ```
115 |
116 | Here's a diagram showing how data flows in this iterator pipeline:
117 |
118 | ```mermaid
119 | flowchart LR
120 | %%{ init: { 'flowchart': { 'nodeSpacing': 40, 'rankSpacing': 15 } } }%%
121 | Pets
122 | Filter([filter by hunger])
123 | Map([map to noms])
124 | Meals
125 | uniqueify([uniqueify])
126 | shopping[Shopping list]
127 | Pets ---> Filter
128 | Pond
129 | Pond ---> inhabitant
130 | inhabitant[Optional pond inhabitant]
131 | inhabitant ---> Map
132 | Filter ---> Map
133 | Map ---> Meals
134 | Meals ---> uniqueify
135 | uniqueify ---> shopping
136 | ```
137 |
138 | * Here are other iterator APIs that will come in useful:
139 | * [cloned](https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.cloned)
140 | * [flatten](https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.flatten)
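
To illustrate those two adapters together, here is a small sketch. The `treats_wanted` function and its input are hypothetical and not part of `pets.rs`:

```rust
use std::collections::HashSet;

// Hypothetical example: each pet may or may not have asked for a treat.
pub fn treats_wanted(requests: &[Option<&'static str>]) -> HashSet<&'static str> {
    requests
        .iter()
        .cloned() // turn each `&Option<&str>` into an owned `Option<&str>`
        .flatten() // `Option` is iterable, so this drops every `None`
        .collect()
}
```

As with the shopping list above, collecting into a `HashSet` removes duplicates without any explicit loop.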
141 |
142 | C++20 recently introduced [ranges](https://en.cppreference.com/w/cpp/ranges), a feature that allows you to pipeline operations on a collection similar to the way Rust iterators do, so this style of programming is likely to become more common in C++ too.
143 |
144 | To summarize: While in C++ you tend to operate on collections by performing a series of operations on each individual item, in Rust you'll typically apply a pipeline of operations to the whole collection. Make this mental switch and your code will not just become more idiomatic but more efficient, too.
145 |
146 | ## Isn't it confusing to use the same variable name twice?
147 |
148 | In Rust, it's common to reuse the same name for multiple variables in a function. For a C++ programmer, this is weird, but there are two good reasons to do it:
149 |
150 | * You may no longer need to change a mutable variable after a certain point, and if your code is sufficiently complex you might want the compiler to guarantee this for you:
151 |
152 | ```rust
153 | # fn spot_ate_my_slippers() -> bool {
154 | # false
155 | # }
156 | # fn feed(_: &str) {}
157 | let mut good_boy = "Spot";
158 | if spot_ate_my_slippers() {
159 |     good_boy = "Rover";
160 | }
161 | let good_boy = good_boy; // never going to change my dog again, who's a good boy
162 | feed(&good_boy);
163 | ```
164 |
165 | * Another common pattern is to retain the same variable name as you gradually unwrap things to a simpler type:
166 |
167 | ```rust
168 | # let url = "http://foo.com:1234";
169 | let port_number = url.split(":").skip(2).next().unwrap();
170 | // hmm, maybe somebody else already wrote a better URL parser....? naah, probably not
171 | let port_number = port_number.parse::<u16>().unwrap();
172 | ```
173 |
174 | ## How can I avoid the performance penalty of `unwrap()`?
175 |
176 | C++ has no equivalent to Rust's `match`, so programmers coming from C++ often underuse it.
177 |
178 | A heuristic: if you find yourself `unwrap()`ing, _especially_ in an `if`/`else` statement, you should restructure your code to use a more sophisticated `match`.
179 |
180 | For example, note the `unwrap()` in here (implying some runtime branch):
181 |
182 | ```rust
183 | # fn test_parse() -> Result<u64, std::num::ParseIntError> {
184 | # let s = "0x64a";
185 | if s.starts_with("0x") {
186 |     u64::from_str_radix(s.strip_prefix("0x").unwrap(), 16)
187 | } else {
188 |     s.parse::<u64>()
189 | }
190 | # }
191 | ```
192 |
193 | and no extra `unwrap()` here:
194 |
195 | ```rust
196 | # fn test_parse() -> Result<u64, std::num::ParseIntError> {
197 | # let s = "0x64a";
198 | match s.strip_prefix("0x") {
199 |     None => s.parse::<u64>(),
200 |     Some(remainder) => u64::from_str_radix(remainder, 16),
201 | }
202 | # }
203 | ```
204 |
205 | `if let` and `matches!` are just as good as `match` but sometimes a little more concise. `cargo clippy` will usually tell you if you're using a `match` which can be simplified to one of those other two constructions.
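
As a rough sketch of both constructions, using a hypothetical `Meal` enum invented for this example:

```rust
// Hypothetical enum for illustration.
pub enum Meal {
    Kibble { grams: u32 },
    Pondweed,
    Nothing,
}

// `matches!` answers a yes/no question about the shape of a value.
pub fn needs_shopping(meal: &Meal) -> bool {
    !matches!(meal, Meal::Nothing)
}

// `if let` handles the one interesting case and falls through otherwise.
pub fn grams_of_kibble(meal: &Meal) -> u32 {
    if let Meal::Kibble { grams } = meal {
        *grams
    } else {
        0
    }
}
```

Neither function needs an `unwrap()`, because the pattern both checks the variant and extracts its contents in one step.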
206 |
207 | ## How do I access variables from within a spawned thread?
208 |
209 | Use [`std::thread::scope`](https://doc.rust-lang.org/nightly/std/thread/fn.scope.html).
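
A minimal sketch of what that might look like; the `parallel_sum` function is a made-up example, and it assumes a toolchain where `std::thread::scope` is available (Rust 1.63 or later):

```rust
use std::thread;

// Sums the two halves of a slice on separate threads. The closures borrow
// `left` and `right` directly; no `Arc` or `'static` data is needed because
// `thread::scope` joins every spawned thread before it returns.
pub fn parallel_sum(values: &[u64]) -> u64 {
    let (left, right) = values.split_at(values.len() / 2);
    thread::scope(|s| {
        let left_handle = s.spawn(|| left.iter().sum::<u64>());
        let right_handle = s.spawn(|| right.iter().sum::<u64>());
        left_handle.join().unwrap() + right_handle.join().unwrap()
    })
}
```

With plain `std::thread::spawn` the same closures would be rejected, since they borrow data that is not `'static`.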
210 |
211 | ## When should I use runtime checks vs jumping through hoops to do static checks?
212 |
213 | Everyone learns Rust a different way, but it's said that some people reach a
214 | point of "trait mania" where they try to encode _too much_ via the type
215 | system, and get in a mess. So, in learning Rust, you will want to strike a
216 | balance between runtime checks (easy) and static compile-time checks (more
217 | efficient, but requiring deeper understanding).
218 |
219 | > It’s very personal - some people learn better if they opt out of
220 | > language features, others not. - MG
221 |
222 | Some heuristics for how to keep things simple during the beginning of your
223 | Rust journey:
224 |
225 | * It's OK to start with lots of `.unwrap()`, cloning and `Arc`/`Rc`.
226 | * Start to use more advanced language features when you feel annoyed with
227 | the amount of boilerplate. (As an expert, you'll switch to a different
228 | strategy which is to consider the virality of your choices through the
229 | codebase.)
230 | * Don't use traits until you have to. You might (for instance) need to use
231 | a trait to make some code unit testable, but overoptimizing for that too
232 | soon is a mistake. Some say that it's wise initially to avoid defining
233 | any new traits at all.
234 | * Try to keep types smaller.
235 |
236 | Specifically on reference counting,
237 |
238 | > If using Rc means you can avoid a lifetime parameter which is in half the
239 | > APIs in the project, that’s a very reasonable choice. If it avoids a single
240 | > lifetime somewhere, probably not a good idea. But measure before deciding. - MG
241 |
242 | If you want to bail out of the complexity of static checks, which runtime checks
243 | are OK?
244 |
245 | * `unwrap()` and `Option` are mostly fine.
246 | * `Arc` and `Rc` are also fine in most cases.
247 | * Extensive use of `clone()` is fine but will have a performance impact.
248 | * `Cell` is regarded as a code smell and suggests you don't understand your
249 | lifetimes - it should be used sparingly.
250 | * `unsafe` is definitely not OK. It's harder to write `unsafe` Rust than to write
251 | C or C++, because Rust has additional aliasing rules. If you're reaching for
252 | `unsafe` to work around the complexity of Rust's static type system, as a
253 | relative Rust beginner, please reconsider and look into the other options
254 | listed above.
255 |
256 | Doing lifetime magic — where "magic" means annotating a function or complex
257 | type with more than 1 lifetime, or other wizardry — is often an optimization
258 | that you can defer until later. In the beginning, and when writing small
259 | programs that you only intend to use a few times ('scripts'), copying is fine.
260 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 |
2 | Apache License
3 | Version 2.0, January 2004
4 | http://www.apache.org/licenses/
5 |
6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7 |
8 | 1. Definitions.
9 |
10 | "License" shall mean the terms and conditions for use, reproduction,
11 | and distribution as defined by Sections 1 through 9 of this document.
12 |
13 | "Licensor" shall mean the copyright owner or entity authorized by
14 | the copyright owner that is granting the License.
15 |
16 | "Legal Entity" shall mean the union of the acting entity and all
17 | other entities that control, are controlled by, or are under common
18 | control with that entity. For the purposes of this definition,
19 | "control" means (i) the power, direct or indirect, to cause the
20 | direction or management of such entity, whether by contract or
21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
22 | outstanding shares, or (iii) beneficial ownership of such entity.
23 |
24 | "You" (or "Your") shall mean an individual or Legal Entity
25 | exercising permissions granted by this License.
26 |
27 | "Source" form shall mean the preferred form for making modifications,
28 | including but not limited to software source code, documentation
29 | source, and configuration files.
30 |
31 | "Object" form shall mean any form resulting from mechanical
32 | transformation or translation of a Source form, including but
33 | not limited to compiled object code, generated documentation,
34 | and conversions to other media types.
35 |
36 | "Work" shall mean the work of authorship, whether in Source or
37 | Object form, made available under the License, as indicated by a
38 | copyright notice that is included in or attached to the work
39 | (an example is provided in the Appendix below).
40 |
41 | "Derivative Works" shall mean any work, whether in Source or Object
42 | form, that is based on (or derived from) the Work and for which the
43 | editorial revisions, annotations, elaborations, or other modifications
44 | represent, as a whole, an original work of authorship. For the purposes
45 | of this License, Derivative Works shall not include works that remain
46 | separable from, or merely link (or bind by name) to the interfaces of,
47 | the Work and Derivative Works thereof.
48 |
49 | "Contribution" shall mean any work of authorship, including
50 | the original version of the Work and any modifications or additions
51 | to that Work or Derivative Works thereof, that is intentionally
52 | submitted to Licensor for inclusion in the Work by the copyright owner
53 | or by an individual or Legal Entity authorized to submit on behalf of
54 | the copyright owner. For the purposes of this definition, "submitted"
55 | means any form of electronic, verbal, or written communication sent
56 | to the Licensor or its representatives, including but not limited to
57 | communication on electronic mailing lists, source code control systems,
58 | and issue tracking systems that are managed by, or on behalf of, the
59 | Licensor for the purpose of discussing and improving the Work, but
60 | excluding communication that is conspicuously marked or otherwise
61 | designated in writing by the copyright owner as "Not a Contribution."
62 |
63 | "Contributor" shall mean Licensor and any individual or Legal Entity
64 | on behalf of whom a Contribution has been received by Licensor and
65 | subsequently incorporated within the Work.
66 |
67 | 2. Grant of Copyright License. Subject to the terms and conditions of
68 | this License, each Contributor hereby grants to You a perpetual,
69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70 | copyright license to reproduce, prepare Derivative Works of,
71 | publicly display, publicly perform, sublicense, and distribute the
72 | Work and such Derivative Works in Source or Object form.
73 |
74 | 3. Grant of Patent License. Subject to the terms and conditions of
75 | this License, each Contributor hereby grants to You a perpetual,
76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77 | (except as stated in this section) patent license to make, have made,
78 | use, offer to sell, sell, import, and otherwise transfer the Work,
79 | where such license applies only to those patent claims licensable
80 | by such Contributor that are necessarily infringed by their
81 | Contribution(s) alone or by combination of their Contribution(s)
82 | with the Work to which such Contribution(s) was submitted. If You
83 | institute patent litigation against any entity (including a
84 | cross-claim or counterclaim in a lawsuit) alleging that the Work
85 | or a Contribution incorporated within the Work constitutes direct
86 | or contributory patent infringement, then any patent licenses
87 | granted to You under this License for that Work shall terminate
88 | as of the date such litigation is filed.
89 |
90 | 4. Redistribution. You may reproduce and distribute copies of the
91 | Work or Derivative Works thereof in any medium, with or without
92 | modifications, and in Source or Object form, provided that You
93 | meet the following conditions:
94 |
95 | (a) You must give any other recipients of the Work or
96 | Derivative Works a copy of this License; and
97 |
98 | (b) You must cause any modified files to carry prominent notices
99 | stating that You changed the files; and
100 |
101 | (c) You must retain, in the Source form of any Derivative Works
102 | that You distribute, all copyright, patent, trademark, and
103 | attribution notices from the Source form of the Work,
104 | excluding those notices that do not pertain to any part of
105 | the Derivative Works; and
106 |
107 | (d) If the Work includes a "NOTICE" text file as part of its
108 | distribution, then any Derivative Works that You distribute must
109 | include a readable copy of the attribution notices contained
110 | within such NOTICE file, excluding those notices that do not
111 | pertain to any part of the Derivative Works, in at least one
112 | of the following places: within a NOTICE text file distributed
113 | as part of the Derivative Works; within the Source form or
114 | documentation, if provided along with the Derivative Works; or,
115 | within a display generated by the Derivative Works, if and
116 | wherever such third-party notices normally appear. The contents
117 | of the NOTICE file are for informational purposes only and
118 | do not modify the License. You may add Your own attribution
119 | notices within Derivative Works that You distribute, alongside
120 | or as an addendum to the NOTICE text from the Work, provided
121 | that such additional attribution notices cannot be construed
122 | as modifying the License.
123 |
124 | You may add Your own copyright statement to Your modifications and
125 | may provide additional or different license terms and conditions
126 | for use, reproduction, or distribution of Your modifications, or
127 | for any such Derivative Works as a whole, provided Your use,
128 | reproduction, and distribution of the Work otherwise complies with
129 | the conditions stated in this License.
130 |
131 | 5. Submission of Contributions. Unless You explicitly state otherwise,
132 | any Contribution intentionally submitted for inclusion in the Work
133 | by You to the Licensor shall be under the terms and conditions of
134 | this License, without any additional terms or conditions.
135 | Notwithstanding the above, nothing herein shall supersede or modify
136 | the terms of any separate license agreement you may have executed
137 | with Licensor regarding such Contributions.
138 |
139 | 6. Trademarks. This License does not grant permission to use the trade
140 | names, trademarks, service marks, or product names of the Licensor,
141 | except as required for reasonable and customary use in describing the
142 | origin of the Work and reproducing the content of the NOTICE file.
143 |
144 | 7. Disclaimer of Warranty. Unless required by applicable law or
145 | agreed to in writing, Licensor provides the Work (and each
146 | Contributor provides its Contributions) on an "AS IS" BASIS,
147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148 | implied, including, without limitation, any warranties or conditions
149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150 | PARTICULAR PURPOSE. You are solely responsible for determining the
151 | appropriateness of using or redistributing the Work and assume any
152 | risks associated with Your exercise of permissions under this License.
153 |
154 | 8. Limitation of Liability. In no event and under no legal theory,
155 | whether in tort (including negligence), contract, or otherwise,
156 | unless required by applicable law (such as deliberate and grossly
157 | negligent acts) or agreed to in writing, shall any Contributor be
158 | liable to You for damages, including any direct, indirect, special,
159 | incidental, or consequential damages of any character arising as a
160 | result of this License or out of the use or inability to use the
161 | Work (including but not limited to damages for loss of goodwill,
162 | work stoppage, computer failure or malfunction, or any and all
163 | other commercial damages or losses), even if such Contributor
164 | has been advised of the possibility of such damages.
165 |
166 | 9. Accepting Warranty or Additional Liability. While redistributing
167 | the Work or Derivative Works thereof, You may choose to offer,
168 | and charge a fee for, acceptance of support, warranty, indemnity,
169 | or other liability obligations and/or rights consistent with this
170 | License. However, in accepting such obligations, You may act only
171 | on Your own behalf and on Your sole responsibility, not on behalf
172 | of any other Contributor, and only if You agree to indemnify,
173 | defend, and hold each Contributor harmless for any liability
174 | incurred by, or claims asserted against, such Contributor by reason
175 | of your accepting any such warranty or additional liability.
176 |
177 | END OF TERMS AND CONDITIONS
178 |
179 | APPENDIX: How to apply the Apache License to your work.
180 |
181 | To apply the Apache License to your work, attach the following
182 | boilerplate notice, with the fields enclosed by brackets "[]"
183 | replaced with your own identifying information. (Don't include
184 | the brackets!) The text should be enclosed in the appropriate
185 | comment syntax for the file format. We also recommend that a
186 | file or class name and description of purpose be included on the
187 | same "printed page" as the copyright notice for easier
188 | identification within third-party archives.
189 |
190 | Copyright [yyyy] [name of copyright owner]
191 |
192 | Licensed under the Apache License, Version 2.0 (the "License");
193 | you may not use this file except in compliance with the License.
194 | You may obtain a copy of the License at
195 |
196 | http://www.apache.org/licenses/LICENSE-2.0
197 |
198 | Unless required by applicable law or agreed to in writing, software
199 | distributed under the License is distributed on an "AS IS" BASIS,
200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201 | See the License for the specific language governing permissions and
202 | limitations under the License.
203 |
--------------------------------------------------------------------------------
/src/codebase.md:
--------------------------------------------------------------------------------
1 | # Questions about your whole codebase
2 |
3 |
4 |
5 | ## The C++ observer pattern is hard in Rust. What to do?
6 |
7 | The C++ observer pattern usually means that there are broadcasters sending messages to consumers:
8 |
9 | ```mermaid
10 | flowchart TB
11 | broadcaster_a[Broadcaster A]
12 | broadcaster_b[Broadcaster B]
13 | consumer_a[Consumer A]
14 | consumer_b[Consumer B]
15 | consumer_c[Consumer C]
16 | broadcaster_a --> consumer_a
17 | broadcaster_b --> consumer_a
18 | broadcaster_a --> consumer_b
19 | broadcaster_b --> consumer_b
20 | broadcaster_a --> consumer_c
21 | broadcaster_b --> consumer_c
22 | ```
23 |
24 | The broadcasters maintain lists of consumers, and the consumers act in response to messages (often mutating their own state).
25 |
26 | This doesn't translate directly to Rust: each broadcaster would need to hold a mutable reference to every consumer, and Rust allows only one mutable reference to a value at a time.
27 |
28 | What do you do?
29 |
30 | ### Option 1: make everything runtime-checked
31 |
32 | Each of your consumers could become an `Rc<RefCell<T>>` or, if you need thread-safety, an `Arc<RwLock<T>>`.
33 |
34 | The [`Rc`](https://doc.rust-lang.org/std/rc/struct.Rc.html) or [`Arc`](https://doc.rust-lang.org/std/sync/struct.Arc.html) allows broadcasters to share ownership of a consumer. The [`RefCell`](https://doc.rust-lang.org/std/cell/struct.RefCell.html) or [`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html) allows each broadcaster to acquire a mutable reference to a consumer when it needs to send a message.
35 |
36 | This example shows how, in Rust, you [may independently choose reference counting _or_ interior mutability](https://manishearth.github.io/blog/2015/05/27/wrapper-types-in-rust-choosing-your-guarantees/). In this case we need both.
37 |
38 | Just like typical reference counting in C++, `Rc` and `Arc` have the option to provide a weak pointer, so the lifetime of each consumer doesn't need to be extended unnecessarily. As an aside, it would be nice if Rust had an `Rc`-like type which enforces exactly one owner and multiple weak pointers. `Rc` could be wrapped quite easily to do this.
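
As a minimal sketch of this option (the `Consumer` and `Broadcaster` types and their methods are hypothetical, invented for illustration), a broadcaster might keep weak pointers to `Rc<RefCell<...>>` consumers and borrow each one mutably only while delivering a message:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical consumer: records every message it receives.
struct Consumer {
    received: Vec<String>,
}

impl Consumer {
    fn on_message(&mut self, msg: &str) {
        self.received.push(msg.to_string());
    }
}

// Hypothetical broadcaster: holds weak pointers, so it does not
// keep its consumers alive.
struct Broadcaster {
    consumers: Vec<Weak<RefCell<Consumer>>>,
}

impl Broadcaster {
    fn subscribe(&mut self, consumer: &Rc<RefCell<Consumer>>) {
        self.consumers.push(Rc::downgrade(consumer));
    }

    fn broadcast(&mut self, msg: &str) {
        // Deliver to consumers that are still alive; forget the rest.
        self.consumers.retain(|weak| match weak.upgrade() {
            Some(consumer) => {
                // Runtime-checked mutable borrow; panics if already borrowed.
                consumer.borrow_mut().on_message(msg);
                true
            }
            None => false,
        });
    }
}
```

The borrow is checked at runtime: if a consumer were already borrowed when `broadcast` ran, `borrow_mut()` would panic rather than fail to compile.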
39 |
40 | Reference counting is frowned upon in C++ because it's expensive. But, in Rust, not so much:
41 |
42 | * Few objects are reference counted; the majority of objects are owned statically.
43 | * Even when objects are reference counted, those counts are rarely incremented and decremented because you can (and do) pass around `&Rc<RefCell<T>>` most of the time. In C++, the "copy by default" mode means it's much more common to increment and decrement reference counts.
44 |
45 | In fact, the compile-time guarantees might cause you to do _less_ reference counting than C++:
46 |
47 | > In Servo there is a reference count but far fewer objects are reference counted than in the rest of Firefox, because you don’t need to be paranoid - MG
48 |
49 | However: Rust does [not prevent reference cycles](https://doc.rust-lang.org/book/ch15-06-reference-cycles.html), although they're only possible if you're using _both_ reference counting and interior mutability.
50 |
51 | ### Option 2: drive the objects from the code, not the other way round
52 |
53 | In C++, it's common to have all behavior within classes. Those classes _are_ the total behavior of the system, and so they must interact with one another. The observer pattern is common.
54 |
55 | ```mermaid
56 | flowchart TB
57 | broadcaster_a[Broadcaster A]
58 | consumer_a[Consumer A]
59 | consumer_b[Consumer B]
60 | broadcaster_a -- observer --> consumer_a
61 | broadcaster_a -- observer --> consumer_b
62 | ```
63 |
64 | In Rust, it's more common to have some _external_ function which drives overall behavior.
65 |
66 | ```mermaid
67 | flowchart TB
68 | main(Main)
69 | broadcaster_a[Broadcaster A]
70 | consumer_a[Consumer A]
71 | consumer_b[Consumer B]
72 | main --1--> broadcaster_a
73 | broadcaster_a --2--> main
74 | main --3--> consumer_a
75 | main --4--> consumer_b
76 | ```
77 |
78 | With this sort of design, it's relatively straightforward to take some output from one object and pass it into another object, with no need for the objects to interact at all.
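
A minimal sketch of this shape, using hypothetical `Broadcaster`, `ConsumerA` and `ConsumerB` types that know nothing about each other; only the external `run` function connects them:

```rust
use std::collections::VecDeque;

// Hypothetical broadcaster: produces messages, has no list of consumers.
struct Broadcaster {
    pending: VecDeque<i32>,
}

impl Broadcaster {
    fn next_message(&mut self) -> Option<i32> {
        self.pending.pop_front()
    }
}

// Hypothetical consumers: each one only mutates its own state.
#[derive(Default)]
struct ConsumerA {
    total: i32,
}

#[derive(Default)]
struct ConsumerB {
    seen: Vec<i32>,
}

// The external driver: asks the broadcaster for output, then hands it
// to each consumer in turn. Plain `&mut` borrows are enough.
fn run(broadcaster: &mut Broadcaster, a: &mut ConsumerA, b: &mut ConsumerB) {
    while let Some(message) = broadcaster.next_message() {
        a.total += message;
        b.seen.push(message);
    }
}
```

Because the driver holds each object only for the duration of a call, no reference counting or interior mutability is needed in this sketch.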
79 |
80 | In the most extreme case, this becomes the [Entity-Component-System architecture](https://en.wikipedia.org/wiki/Entity_component_system) used in game design.
81 |
82 | > Game developers seem to have completely solved this problem - we can learn from them. - MY
83 |
84 | ### Option 3: use channels
85 |
86 | The observer pattern is a way to decouple large, single-threaded C++ codebases. But if you're trying to decouple a codebase in Rust, perhaps you should assume multi-threading by default? Rust has built-in [channels](https://doc.rust-lang.org/std/sync/mpsc/), and the [crossbeam](https://docs.rs/crossbeam/0.8.0/crossbeam/) crate provides multi-producer, multi-consumer channels.
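
As a rough sketch using only the standard library's `std::sync::mpsc` channel (the `Event` type and `broadcast_over_channel` function are hypothetical names for illustration), two broadcaster threads can send to one consumer thread like this:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical message type sent from broadcasters to the consumer.
#[derive(Debug, PartialEq)]
enum Event {
    Reading(i32),
}

// Runs two broadcaster threads and one consumer thread, and returns
// everything the consumer received (in arrival order).
fn broadcast_over_channel() -> Vec<Event> {
    let (tx_a, rx) = mpsc::channel();
    let tx_b = tx_a.clone(); // each broadcaster gets its own sender

    let consumer = thread::spawn(move || {
        // Iteration ends once every sender has been dropped.
        rx.iter().collect::<Vec<Event>>()
    });

    let broadcaster_a = thread::spawn(move || tx_a.send(Event::Reading(20)).unwrap());
    let broadcaster_b = thread::spawn(move || tx_b.send(Event::Reading(22)).unwrap());

    broadcaster_a.join().unwrap();
    broadcaster_b.join().unwrap();
    consumer.join().unwrap()
}
```

The standard channel has a single receiver, so with several consumers you would give each its own channel or use a multi-consumer channel such as crossbeam's.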
87 |
88 | > I'm a Rustacean, we assume massively parallel unless told otherwise :) - MG
89 |
90 | ## That's all very well, but I have an existing C++ object broadcasting events. How exactly should I observe it?
91 |
92 | If your Rust object is a consumer of events from some pre-existing C++ producer, all the above options remain possible.
93 |
94 | * You can make your object reference counted and have C++ own such a reference (potentially a weak reference).
95 | * C++ can deliver the message into a general message bucket. An external function reads messages from that bucket and invokes the Rust object that should handle it. This means the reference counting doesn't need to extend to the Rust objects outside that boundary layer.
96 | * You can have a shim object which converts the C++ callback into some message and injects it into a channel-based world.
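
The shim approach in the last bullet might look roughly like the sketch below. Everything here is an assumption for illustration: the callback signature (`void (*)(void* context, int32_t event_code)`), the `make_shim` and `free_shim` helpers, and the idea that the C++ producer stores an opaque context pointer and passes it back on each event.

```rust
use std::ffi::c_void;
use std::sync::mpsc::{self, Receiver, Sender};

// Hypothetical callback that the C++ producer is assumed to invoke.
// It turns the callback into a message on a channel.
extern "C" fn on_cpp_event(context: *mut c_void, event_code: i32) {
    // SAFETY: `context` must be a pointer returned by `make_shim`
    // that has not yet been passed to `free_shim`.
    let tx = unsafe { &*(context as *const Sender<i32>) };
    // Ignore the error if the Rust-side receiver has gone away.
    let _ = tx.send(event_code);
}

// Creates the channel and returns the opaque context pointer to hand
// to C++, plus the receiving end for the Rust side.
fn make_shim() -> (*mut c_void, Receiver<i32>) {
    let (tx, rx) = mpsc::channel();
    (Box::into_raw(Box::new(tx)) as *mut c_void, rx)
}

// Releases the sender once C++ has promised not to call back again.
unsafe fn free_shim(context: *mut c_void) {
    // SAFETY: the caller guarantees `context` came from `make_shim`
    // and is not used afterwards.
    unsafe { drop(Box::from_raw(context as *mut Sender<i32>)) };
}
```

The rest of the Rust code then only ever sees the `Receiver`, so the unsafe boundary stays confined to these three functions.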
97 |
98 | ## Some of my C++ objects have shared mutable state. How can I make them safe in Rust?
99 |
100 | You're going to have to do something with [interior mutability](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html): either `RefCell` or its multithreaded equivalent, `RwLock`.
101 |
102 | You have three decisions to make:
103 |
104 | 1. Will only Rust code access _this particular instance_ of this object, or might C++ access it too?
105 | 2. If both C++ and Rust may access the object, how do you avoid conflicts?
106 | 3. How should Rust code react if the object is not available, because something else is using it?
107 |
108 | If only Rust code can use this particular instance of shared state, then simply wrap it in `RefCell` (single-threaded) or `RwLock` (multi-threaded). Build a wrapper type such that callers aren't able to access the object directly, but instead only via the lock type.
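
A minimal sketch of such a wrapper, assuming a hypothetical `Goat` type as the shared state. The `RwLock` is a private field, so callers can only reach the goat through methods that take the lock:

```rust
use std::sync::RwLock;

// Hypothetical shared state.
struct Goat {
    grass_eaten: u32,
}

// Wrapper type: the lock is private, so the goat cannot be reached
// without going through it.
pub struct SharedGoat {
    inner: RwLock<Goat>,
}

impl SharedGoat {
    pub fn new() -> Self {
        SharedGoat { inner: RwLock::new(Goat { grass_eaten: 0 }) }
    }

    // Takes `&self`, not `&mut self`: mutation happens behind the lock.
    pub fn eat_grass(&self) {
        let mut goat = self.inner.write().expect("goat lock poisoned");
        goat.grass_eaten += 1;
    }

    pub fn grass_eaten(&self) -> u32 {
        self.inner.read().expect("goat lock poisoned").grass_eaten
    }
}
```

For the single-threaded case the same shape should work with `RefCell` in place of `RwLock`, using `borrow_mut()` and `borrow()`.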
109 |
110 | If C++ also needs to access this particular instance of the shared state, it's more complex. There are presumably some invariants regarding use of this data in C++ - otherwise it would crash all the time. Perhaps the data can be used only from one thread, or perhaps it can only be used with a given mutex held. Your goal is to translate those invariants into an idiomatic Rust API that can be checked (ideally) at compile-time, and (failing that) at runtime.
111 |
112 | For example, imagine:
113 |
114 | ```cpp
115 | class SharedMutableGoat {
116 | public:
117 |     void eat_grass(); // mutates tummy state
118 | };
119 |
120 | std::mutex lock;
121 | SharedMutableGoat* billy; // only access when owning lock
122 | ```
123 |
124 | Your idiomatic Rust wrapper might be:
125 |
126 | ```rust
127 | # mod ffi {
128 | #     #[allow(non_camel_case_types)]
129 | #     pub struct lock_guard;
130 | #     pub fn claim_lock() -> lock_guard { lock_guard{} }
131 | #     pub fn eat_grass() {}
132 | #     pub fn release_lock(lock: &mut lock_guard) {}
133 | # }
134 | struct SharedMutableGoatLock {
135 |     lock: ffi::lock_guard, // owns a std::lock_guard<std::mutex> somehow
136 | }
137 |
138 | // Claims the lock, returns a new SharedMutableGoatLock
139 | fn lock_shared_mutable_goat() -> SharedMutableGoatLock {
140 |     SharedMutableGoatLock { lock: ffi::claim_lock() }
141 | }
142 |
143 | impl SharedMutableGoatLock {
144 |     fn eat_grass(&mut self) {
145 |         ffi::eat_grass(); // Acts on the global goat
146 |     }
147 | }
148 |
149 | impl Drop for SharedMutableGoatLock {
150 |     fn drop(&mut self) {
151 |         ffi::release_lock(&mut self.lock);
152 |     }
153 | }
154 | ```
155 |
156 | Obviously, lots of permutations are possible, but the goal is to ensure that it's simply compile-time impossible to act on the global state unless appropriate preconditions are met.
157 |
158 | The final decision is how to react if the object is not available. This decision can apply with C++ mutexes or with Rust locks (for example `RwLock`). As in C++, the two major options are:
159 |
160 | * Block until the object becomes available.
161 | * Try to lock, and if the object is not available, do something else.
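
With the standard library's `RwLock`, the second option could be sketched as follows (the `try_eat_grass` helper is a hypothetical name; `write()` would be the blocking equivalent):

```rust
use std::sync::{RwLock, TryLockError};

// Tries to update the counter without blocking. Returns `true` if the
// update happened, `false` if the lock was busy or poisoned.
fn try_eat_grass(grass_eaten: &RwLock<u32>) -> bool {
    match grass_eaten.try_write() {
        Ok(mut guard) => {
            *guard += 1;
            true
        }
        // Someone else holds the lock: do something else instead.
        Err(TryLockError::WouldBlock) => false,
        // A previous holder panicked; this sketch simply gives up.
        Err(TryLockError::Poisoned(_)) => false,
    }
}
```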
162 |
163 | There can be a third option if you're using async Rust. If the data isn't available, you may be able to return to your event loop using an async version of the lock ([Tokio example](https://docs.rs/tokio/1.5.0/tokio/sync/struct.RwLock.html#method.read), [async_std example](https://docs.rs/async-std/1.9.0/async_std/sync/struct.RwLock.html)).
164 |
165 | ## How do I do a singleton?
166 |
167 | Use [OnceCell](https://doc.rust-lang.org/std/cell/struct.OnceCell.html).
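
One caveat when sketching this: `std::cell::OnceCell` is not `Sync`, so it cannot be placed directly in a `static`. For a process-wide singleton, the thread-safe sibling `std::sync::OnceLock` (stable since Rust 1.70) fits. The `Config` type and `config()` accessor below are hypothetical:

```rust
use std::sync::OnceLock;

// Hypothetical singleton type.
struct Config {
    name: String,
}

// Returns the one shared instance, creating it on first use.
fn config() -> &'static Config {
    static INSTANCE: OnceLock<Config> = OnceLock::new();
    INSTANCE.get_or_init(|| Config {
        name: "default".to_string(),
    })
}
```

Every call to `config()` returns a reference to the same value, and the initializer closure runs at most once even if several threads race to call it.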
168 |
169 | ## What's the best way to retrofit Rust's parallelism benefits to an existing codebase?
170 |
171 | When parallelizing an existing codebase, first check that all existing types are correctly [`Send`](https://doc.rust-lang.org/std/marker/trait.Send.html) and [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html). Generally, though, you should try to avoid implementing these yourself - instead use pre-existing wrapper types which enforce the correct contract (for example, [`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html)).
172 |
173 | After that:
174 |
175 | > If you can solve your problem by throwing Rayon at it, do. It’s magic - MG
176 |
177 | > If your task is CPU-bound, Rayon solves this handily. - MY
178 |
179 | [Rayon](https://docs.rs/rayon/1.5.0/rayon/) offers parallel constructs - for example parallel iterators - which can readily be retrofitted to an existing codebase. It also allows you to create and join tasks. Using Rayon can help _simplify_ your code and eliminate lots of manual scheduling logic.
180 |
181 | If your tasks are IO-bound, then you may need to look into async Rust, but that's hard to pull into an existing codebase.
182 |
183 | ## What's the best way to architect a new codebase for parallelism?
184 |
185 | In brief, like in other languages, you have a choice of architectures:
186 |
187 | * Message-passing, using event loops which listen on a channel, receive `Send` data and pass it on.
188 | * More traditional multithreading using `Sync` data structures such as mutexes (and perhaps Rayon).
189 |
190 | > There's probably a bias towards message-passing, and that's probably well-informed by its extensibility. - MG
191 |
192 | ## I need a list of nodes which can refer to one another. How?
193 |
194 | You can't easily do self-referential data structures in Rust. The usual workaround is to [use an arena](https://manishearth.github.io/blog/2021/03/15/arenas-in-rust/) and replace references from one node to another with node IDs.
195 |
196 | An arena is typically a `Vec` (or similar), and the node IDs are a newtype wrapper around a simple integer index.
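
A minimal sketch of that idea, with hypothetical `NodeId`, `Node` and `Arena` types. Nodes refer to each other by ID, so the graph can contain cycles without any Rust references between nodes:

```rust
// Newtype wrapper around an index into the arena.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

struct Node {
    value: i32,
    neighbors: Vec<NodeId>,
}

// Purely additive arena: nodes can be added but never removed,
// so an ID handed out by `add` stays valid.
#[derive(Default)]
struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    fn add(&mut self, value: i32) -> NodeId {
        self.nodes.push(Node { value, neighbors: Vec::new() });
        NodeId(self.nodes.len() - 1)
    }

    // Links two nodes in both directions.
    fn link(&mut self, a: NodeId, b: NodeId) {
        self.nodes[a.0].neighbors.push(b);
        self.nodes[b.0].neighbors.push(a);
    }

    fn neighbor_values(&self, id: NodeId) -> Vec<i32> {
        self.nodes[id.0]
            .neighbors
            .iter()
            .map(|n| self.nodes[n.0].value)
            .collect()
    }
}
```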
197 |
198 | Obviously, Rust doesn't check that your node IDs are valid. If you don't have proper references, what stops you from having stale IDs?
199 |
200 | Arenas are often purely additive, which means that you can add entries but not delete them ([example](https://github.com/Manishearth/elsa/blob/master/examples/mutable_arena.rs)). If you must have an arena which deletes things, then use generational IDs; see the [generational-arena](https://docs.rs/generational-arena/) crate and this [RustConf keynote](https://www.youtube.com/watch?v=aKLntZcp27M) for more details.
201 |
202 | If arenas still sound like a nasty workaround, consider that you might choose an arena anyway for other reasons:
203 |
204 | * All of the objects in the arena will be freed at the end of the arena's lifetime, instead of during their manipulation, which can give very low latency for some use-cases. [Bumpalo](https://docs.rs/bumpalo/3.6.1/bumpalo/) formalizes this.
205 | * The rest of your program might have real Rust references into the arena. You can give the arena a named lifetime (`'arena` for example), making the provenance of those references very clear.
206 |
207 | ## Should I have a few big crates or lots of small ones?
208 |
209 | In the past, it was recommended to have small crates to get optimal build time.
210 | Incremental builds generally make this unnecessary now. You should arrange your
211 | crates optimally for your semantic needs.
212 |
213 | ## What crates should everyone know about?
214 |
215 | | Crate | Description |
216 | |:--------------------------------------- |:---------------------------------- |
217 | | [rayon](https://docs.rs/rayon/) | parallelizing |
218 | | [serde](https://docs.rs/serde/) | serializing and deserializing |
219 | | [crossbeam](https://docs.rs/crossbeam/) | all sorts of parallelism tools |
220 | | [itertools](https://docs.rs/itertools/) | makes it slightly more pleasant to work with iterators. (For instance, if you want to join an iterator of strings, you can just go ahead and do that, without needing to collect the strings into a `Vec` first) |
221 | | [petgraph](https://docs.rs/petgraph/) | graph data structures |
222 | | [slotmap](https://docs.rs/slotmap/) | arena-like key-value map |
223 | | [nom](https://docs.rs/nom/) | parsing |
224 | | [clap](https://docs.rs/clap/) | command-line parsing |
225 | | [regex](https://docs.rs/regex/) | err, regular expressions |
226 | | [ring](https://docs.rs/ring/) | the leading crypto library |
227 | | [nalgebra](https://docs.rs/nalgebra/) | linear algebra |
228 | | [once_cell](https://docs.rs/once_cell/) | complex static data |
229 |
230 | ## How should I call C++ functions from Rust and vice versa?
231 |
232 | Use [cxx](https://cxx.rs).
233 |
234 | Oh, you want a justification? In that case, here's the history
235 | which brought us to this point.
236 |
237 | From the beginning, Rust supported calling C functions using [`extern "C"`](https://doc.rust-lang.org/std/keyword.extern.html),
238 | [`#[repr(C)]`](https://doc.rust-lang.org/reference/type-layout.html#the-c-representation)
239 | and [`#[no_mangle]`](https://doc.rust-lang.org/reference/abi.html#the-no_mangle-attribute).
240 | Such callable C functions had to be declared manually in Rust:
241 |
242 | ```mermaid
243 | sequenceDiagram
244 | Rust-->>extern: unsafe Rust function call
245 | extern-->>C: call from Rust to C
246 | participant extern as Rust unsafe extern "C" fn
247 | participant C as Existing C function
248 | ```
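
For instance, a hand-written declaration of the C library's `strlen` might look like the sketch below. The `c_string_length` wrapper is a hypothetical name, and the `unsafe extern` spelling assumes Rust 1.82 or later (earlier toolchains write the block as plain `extern "C"`):

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Manual declaration of an existing C function. The compiler cannot
// check this signature against the C header, which is why it is unsafe.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

// Safe wrapper: `CStr` guarantees a valid, NUL-terminated string.
fn c_string_length(s: &CStr) -> usize {
    // SAFETY: `s.as_ptr()` points to a NUL-terminated buffer that
    // outlives the call.
    unsafe { strlen(s.as_ptr()) }
}
```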
249 |
250 | [`bindgen`](https://rust-lang.github.io/rust-bindgen/) was invented
251 | to generate these declarations automatically from existing C/C++ header
252 | files. It has grown to understand an astonishingly wide variety of C++
253 | constructs, but its generated bindings are still `unsafe` functions
254 | with lots of pointers involved.
255 |
256 | ```mermaid
257 | sequenceDiagram
258 | Rust-->>extern: unsafe Rust function call
259 | extern-->>C: call from Rust to C++
260 | participant extern as Bindgen generated bindings
261 | participant C as Existing C++ function
262 | ```
263 |
264 | Interacting with `bindgen`-generated bindings requires unsafe Rust;
265 | you will likely have to manually craft idiomatic safe Rust wrappers.
266 | This is time-consuming and error-prone.
267 |
268 | [cxx](https://cxx.rs) automates a lot of that process. Unlike `bindgen`
269 | it doesn't learn about functions from existing C++ headers. Instead,
270 | you specify cross-language interfaces in a Rust-like interface definition
271 | language (IDL) within your Rust file. cxx generates both C++ and Rust code
272 | from that IDL, marshaling data behind the scenes on both sides such that
273 | you can use standard language features in your code. For example, you'll
274 | find idiomatic Rust wrappers for [`std::string`](https://docs.rs/cxx/1.0.50/cxx/struct.CxxString.html)
275 | and [`std::unique_ptr`](https://docs.rs/cxx/1.0.50/cxx/struct.UniquePtr.html)
276 | and idiomatic C++ wrappers for [a Rust slice](https://cxx.rs/binding/slice.html).
277 |
278 | ```mermaid
279 | sequenceDiagram
280 | Rust-->>rsbindings: safe idiomatic Rust function call
281 | rsbindings-->>cxxbindings: hidden C ABI call using marshaled data
282 | cxxbindings-->>cpp: call to standard idiomatic C++
283 | participant rsbindings as cxx-generated Rust code
284 | participant cxxbindings as cxx-generated C++ code
285 | participant cpp as C++ function using STL types
286 | ```
287 |
288 | > In the bindgen case even more work goes into wrapping idiomatic C++ signatures into something bindgen compatible: unique ptrs to raw ptrs, Drop impls on the Rust side, translating string types ... etc. The typical real-world binding we've converted from bindgen to cxx in my codebase has been -500 lines (mostly unsafe code) +300 lines (mostly safe code; IDL included). - DT
289 |
290 | The greatest benefit is that cxx sufficiently understands C++ STL
291 | object ownership norms that the generated bindings can be used from
292 | safe Rust code.
293 |
294 | At present, there is no established solution which combines the idiomatic, safe
295 | interoperability offered by `cxx` with the automatic generation offered by
296 | `bindgen`. It's not clear whether this is even _possible_ but [several](https://github.com/google/autocxx)
297 | [projects](https://github.com/google/mosaic) are aiming in this direction.
298 |
299 | ## I'm getting a lot of binary bloat.
300 |
301 | In Rust you have a free choice between `impl Trait` and `dyn Trait`. See
302 | [this answer](signatures.md#When_should_I_take_or_return_dyn_Trait), too. `impl Trait` tends
303 | to be the default, and can result in large binaries because the code is duplicated for each concrete type.
304 | If you have this problem, consider using `dyn Trait`. Other options include
305 | the 'thin template pattern' (an example is `serde_json` where the code to read
306 | from [a string and a slice](https://github.com/serde-rs/json/blob/master/src/read.rs#L172)
307 | would be duplicated entirely, but instead one delegates to the other and
308 | requests slightly different behavior.)
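
As a small sketch of the trade-off (all three function names are hypothetical): the generic version is compiled once per concrete type, the `dyn` version is compiled once, and a thin generic wrapper keeps the convenient signature while delegating the real work to a single non-generic body.

```rust
use std::fmt::Display;

// Generic version: the whole body is compiled once per concrete type.
fn label_generic(item: impl Display) -> String {
    format!("[{}]", item)
}

// Trait-object version: compiled once, dispatched through a vtable.
fn label_dyn(item: &dyn Display) -> String {
    format!("[{}]", item)
}

// Thin wrapper: only this one-line shim is duplicated per type;
// the real work lives in the single `label_dyn` body.
fn label_thin(item: impl Display) -> String {
    label_dyn(&item)
}
```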
309 |
--------------------------------------------------------------------------------
/src/signatures.md:
--------------------------------------------------------------------------------
1 | # Questions about your function signatures
2 |
3 |
4 |
5 | ## Should I return an iterator or a collection?
6 |
7 | > Pretty much always return an iterator. - AH
8 |
9 | We suggested you [use iterators a lot in your code](./code.md#how-can-i-avoid-the-performance-penalty-of-bounds-checks). Share the love! Give iterators to your callers too.
10 |
11 | If you *know* your caller will store the items you're returning in a concrete collection, such as a `Vec` or a `HashSet`, you may want to return that. In all other cases, return an iterator.
12 |
13 | Your caller might:
14 | * Collect the iterator into a `Vec`
15 | * Collect it into a `HashSet` or some other specialized container
16 | * Loop over the items
17 | * Filter them or otherwise completely ignore some
18 |
19 | Collecting the items into a vector will only turn out to be right in one of these cases. In the other cases, you're wasting memory and CPU time by building a concrete collection.
20 |
21 | This is weird for C++ programmers because C++ iterators don't usually have robust references into the underlying data. Even Java iterators are scary, throwing `ConcurrentModificationException`s when you least expect it. Rust prevents that at compile time. If you _can_ return an iterator, you should.
22 |
23 | ```mermaid
24 | flowchart LR
25 | subgraph Caller
26 | it_ref[reference to iterator]
27 | end
28 | subgraph it_outer[Iterator]
29 | it[Iterator]
30 | it_ref --reference--> it
31 | end
32 | subgraph data[Underlying data]
33 | dat[Underlying data]
34 | it --reference--> dat
35 | end
36 | ```
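
A minimal sketch of returning an iterator, using a hypothetical `Zoo` type. The returned iterator borrows from `self`, so the compiler will not let the caller modify the zoo while the iterator is still in use:

```rust
// Hypothetical container type.
struct Zoo {
    animals: Vec<String>,
}

impl Zoo {
    // Returns a lazy iterator over the longer names; nothing is
    // collected or allocated until the caller decides to.
    fn long_names(&self) -> impl Iterator<Item = &str> + '_ {
        self.animals
            .iter()
            .map(String::as_str)
            .filter(|name| name.len() > 5)
    }
}
```

The caller can then loop over `zoo.long_names()`, count the items, or collect them into whichever container suits.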
37 |
38 | ## How flexible should my parameters be?
39 |
40 | Which of these is best?
41 |
42 | ```rust
43 | fn a(params: &[String]) {
44 |     // ...
45 | }
46 |
47 | fn b(params: &[&str]) {
48 |     // ...
49 | }
50 |
51 | fn c(params: &[impl AsRef<str>]) {
52 |     // ...
53 | }
54 | ```
55 |
56 | (You'll need to make an equivalent decision in other cases, e.g. `Path` versus `PathBuf` versus `AsRef<Path>`.)
57 |
58 | None of the options is clearly superior; for each option, there's a case it can't handle that the others can:
59 |
60 | ```rust
61 | # fn a(params: &[String]) {
62 | # }
63 | # fn b(params: &[&str]) {
64 | # }
65 | # fn c(params: &[impl AsRef<str>]) {
66 | # }
67 | fn main() {
68 |     a(&[]);
69 |     // a(&["hi"]); // doesn't work
70 |     a(&vec![format!("hello")]);
71 |
72 |     b(&[]);
73 |     b(&["hi"]);
74 |     // b(&vec![format!("hello")]); // doesn't work
75 |
76 |     // c(&[]); // doesn't work
77 |     c(&["hi"]);
78 |     c(&vec![format!("hello")]);
79 | }
80 | ```
81 |
82 | So you have a variety of interesting ways to _slightly_ annoy your callers under different circumstances. Which is best?
83 |
84 | `AsRef` has some advantages: a caller can pass either a `Vec<String>` or a slice of `&str` directly, which neither of the other options allows. But if they want to pass an empty list, they'll have to explicitly specify the type (for instance `&Vec::<String>::new()`).
85 |
86 | > Not a huge fan of AsRef everywhere - it's just saving the caller typing. If you have lots of AsRef then nothing is object-safe. - MG
87 |
88 | TL;DR: choose the middle option, `&[&str]`. If your caller happens to have a vector of `String`, it's relatively little work to get a slice of `&str`:
89 |
90 | ```rust
91 | # fn b(params: &[&str]) {
92 | # }
93 |
94 | fn main() {
95 |     // Instead of b(&vec![format!("hello")]);
96 |     let hellos = vec![format!("hello")];
97 |     b(&hellos.iter().map(String::as_str).collect::<Vec<_>>());
98 | }
99 | ```
100 |
101 | ## How do I overload constructors?
102 |
103 | You can't do this:
104 |
105 | ```rust
106 | # struct BirthdayCard {}
107 | impl BirthdayCard {
108 |     fn new(name: &str) -> Self {
109 | #         Self{}
110 |         // ...
111 |     }
112 |
113 |     // Can't add more overloads:
114 |     //
115 |     // fn new(name: &str, age: i32) -> BirthdayCard { ... }
116 |     //
117 |     // fn new(name: &str, text: &str) -> BirthdayCard { ... }
118 | }
119 | ```
120 |
121 | If you have a default constructor, and a few variants for other cases, you can simply write them as different static methods. An idiomatic way to do this is to write a `new()` constructor and then `with_foo()` constructors that apply the given "foo" when constructing.
122 |
123 | ```rust
124 | # struct Racoon {}
125 | impl Racoon {
126 | fn new() -> Self {
127 | # Self{}
128 | // ...
129 | }
130 | fn with_age(age: usize) -> Self {
131 | # Self{}
132 | // ...
133 | }
134 | }
135 | ```
136 |
137 | If you have a bunch of constructors and no default, it may make sense to instead provide a set of `new_foo()` constructors.
138 |
139 | ```rust
140 | # struct Animal {}
141 | impl Animal {
142 | fn new_squirrel() -> Self {
143 | # Self{}
144 | // ...
145 | }
146 | fn new_badger() -> Self {
147 | # Self{}
148 | // ...
149 | }
150 | }
151 | ```
152 |
153 | For a more complex situation, you may use [the builder pattern](https://rust-lang.github.io/api-guidelines/type-safety.html#builders-enable-construction-of-complex-values-c-builder). The builder has a set of methods which take `&mut self` and return `&mut Self`. Then add a `build()` that returns the final constructed object.
154 |
155 | ```rust
156 | struct BirthdayCard {}
157 |
158 | struct BirthdayCardBuilder {}
159 | impl BirthdayCardBuilder {
160 | fn new(name: &str) -> Self {
161 | # Self{}
162 | // ...
163 | }
164 |
165 | fn age(&mut self, age: i32) -> &mut Self {
166 | # self
167 | // ...
168 | }
169 |
170 | fn text(&mut self, text: &str) -> &mut Self {
171 | # self
172 | // ...
173 | }
174 |
175 | fn build(&mut self) -> BirthdayCard { BirthdayCard { /* ... */ } }
176 | }
177 | ```
178 |
179 | You can then [chain these](https://rust-lang.github.io/api-guidelines/type-safety.html#non-consuming-builders-preferred) into short or long constructions, passing parameters as necessary:
180 |
181 | ```rust
182 | # struct BirthdayCard {}
183 | #
184 | # struct BirthdayCardBuilder {}
185 | # impl BirthdayCardBuilder {
186 | # fn new(name: &str) -> BirthdayCardBuilder {
187 | # Self{}
188 | # // ...
189 | # }
190 | #
191 | # fn age(&mut self, age: i32) -> &mut BirthdayCardBuilder {
192 | # self
193 | # // ...
194 | # }
195 | #
196 | # fn text(&mut self, text: &str) -> &mut BirthdayCardBuilder {
197 | # self
198 | # // ...
199 | # }
200 | #
201 | # fn build(&mut self) -> BirthdayCard { BirthdayCard { /* ... */ } }
202 | # }
203 | #
204 | fn main() {
205 | let card = BirthdayCardBuilder::new("Paul")
206 | .age(64)
207 | .text("Happy Valentine's Day!")
208 | .build();
209 | }
210 | ```
211 |
212 | Note another advantage of builders: Overloaded constructors often don't provide all possible combinations of parameters, whereas with the builder pattern, you can combine exactly the parameters you want.
213 |
214 | ## When must I use `#[must_use]`?
215 |
216 | > Use it on Results and mutex locks. - MG
217 |
218 | `#[must_use]` makes the compiler emit a warning (the `unused_must_use` lint) if the caller ignores the return value.
219 |
220 | Rust functions are often single-purpose. They either:
221 |
222 | * Return a value without any side effects; or
223 | * Do something (i.e. have side effects) and return nothing.
224 |
225 | In neither case do you need to think about `#[must_use]`. (In the first case,
226 | nobody would call your function unless they were going to use the result.)
227 |
228 | `#[must_use]` is useful for those rarer functions which return a result _and_
229 | have side effects. In most such cases, it's wise to specify `#[must_use]`, unless
230 | the return value is truly optional (for example in
231 | [`HashMap::insert`](https://doc.rust-lang.org/std/collections/struct.HashMap.html#method.insert)).
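
As a minimal sketch of such a function, here is a hypothetical `Registry` type (the names are invented for illustration) whose `register` method both mutates state and returns something the caller ought to look at:

```rust
use std::collections::HashSet;

/// Hypothetical registry of names, used only for illustration.
struct Registry {
    names: HashSet<String>,
}

impl Registry {
    fn new() -> Self {
        Self { names: HashSet::new() }
    }

    /// Has a side effect (inserting) *and* returns information the caller
    /// should act on, so ignoring the result produces a compiler warning.
    #[must_use = "check whether the name was newly registered"]
    fn register(&mut self, name: &str) -> bool {
        // `HashSet::insert` returns false if the value was already present.
        self.names.insert(name.to_string())
    }
}
```

A caller who writes `registry.register("alice");` as a bare statement gets a warning; writing `let _ = registry.register("alice");` is the usual way to say "I really don't care".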
232 |
233 | ## When should I take parameters by value?
234 |
235 | Move semantics are more common in Rust than in C++.
236 |
237 | > In C++ moves tend to be an optimization, whereas in Rust they're a key semantic part of the program. - MY
238 |
239 | To a first approximation, you should assume similar performance when passing
240 | things by (moved) value or by reference. It's true that a move may turn out to
241 | be a `memcpy`, but it's often optimized away.
242 |
243 | > Express the ownership relationship in the type system, instead of trying to second-guess the compiler for efficiency. - AF
244 |
245 | The moves are, of course, destructive - and unlike in C++, the compiler
246 | enforces that you don't reuse a variable that has been moved.
247 | Some C++ objects become toxic after they've been moved from; that's not a
248 | risk in Rust.
249 |
250 | So here's the heuristic: if a caller shouldn't be able to use an object again,
251 | pass it via move semantics in order to consume it.
252 |
253 | An extreme example: a UUID is supposed to be globally unique - it might cause a
254 | logic error for a caller to retain knowledge of a UUID after passing it to a callee.
255 |
256 | More generally, consume data enthusiastically to avoid logical errors during future
257 | refactorings. For instance, if some command-line options are overridden by a
258 | runtime choice, consume those old options - then any future refactoring which
259 | uses them after that point will give you a compile error. This pattern is
260 | surprisingly effective at spotting errors in your assumptions.
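
The command-line example might look roughly like this. The type and function names are assumptions made up for this sketch, not taken from any real codebase:

```rust
/// Hypothetical options as parsed from the command line.
struct CommandLineOptions {
    verbose: bool,
    threads: usize,
}

/// Hypothetical options after runtime choices have been applied.
struct EffectiveOptions {
    verbose: bool,
    threads: usize,
}

/// Takes `cli` by value: once this has been called, the caller can no longer
/// read the stale command-line options by mistake.
fn apply_runtime_override(cli: CommandLineOptions, detected_cores: usize) -> EffectiveOptions {
    EffectiveOptions {
        verbose: cli.verbose,
        // Never use more threads than the machine actually has.
        threads: cli.threads.min(detected_cores),
    }
}
```

Any later code which tries to read `cli.threads` after the call fails to compile with a "use of moved value" error, which is exactly the nudge you want during a refactoring.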
261 |
262 | ## Should I ever take `self` by value?
263 |
264 | Sometimes. If you've got a member function which destroys or transforms a thing,
265 | it should take `self` by value. Examples:
266 |
267 | * Closing a file and returning a result code.
268 | * A builder-pattern object which spits out the thing it was building. ([Example](https://docs.rs/bindgen/0.59.0/bindgen/struct.Builder.html#method.generate)).
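
A small sketch of the first kind of method, using a hypothetical `Connection` type invented for this example:

```rust
/// Hypothetical connection type, used only for illustration.
struct Connection {
    bytes_sent: usize,
}

impl Connection {
    fn send(&mut self, data: &[u8]) {
        self.bytes_sent += data.len();
    }

    /// Consumes the connection and reports how much was sent. There is
    /// nothing left to call `send` on afterwards, and the compiler
    /// enforces that.
    fn close(self) -> usize {
        self.bytes_sent
    }
}
```

After `conn.close()`, any further `conn.send(...)` is a compile error rather than a runtime surprise.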
269 |
270 | ## How do I take a thing, and a reference to something within that thing?
271 |
272 | For example, suppose you want to give all of your dogs to your friend, yet also
273 | tell your friend which one of the dogs is the Best Boy or Girl.
274 |
275 | ```cpp
276 | struct PetInformation {
277 | std::vector<Dog> dogs;
278 | Dog& BestBoy;
279 | Dog& BestGirl;
280 | };
281 |
282 | PetInformation GetPetInformation() {
283 | // ...
284 | }
285 | ```
286 |
287 | Generally this is an indication that your types or functions are not broken
288 | down in the correct way:
290 |
291 | > This is a decomposition problem. Once you’ve found the correct decomposition, everything
292 | > else just works. The code almost writes itself. - AF
293 |
294 | ```rust
295 | # struct Dog;
296 | struct PetInformation(Vec<Dog>);
297 |
298 | fn get_pet_information() -> PetInformation {
299 | // ...
300 | # PetInformation(Vec::new())
301 | }
302 |
303 | fn identify_best_boy(pet_information: &PetInformation) -> &Dog {
304 | // ...
305 | # pet_information.0.get(0).unwrap()
306 | }
307 | ```
308 |
309 | One use-case is when you want to act on some data depending on its contents,
310 | but you also want to do something with the contents that you previously
311 | identified.
312 |
313 | ```rust
314 | # struct Key;
315 | struct Door { locked: bool }
316 |
317 | struct Car {
318 | ignition: Option<Key>,
319 | door: Door,
320 | }
321 |
322 | fn steal_car(car: Car) {
323 | match car {
324 | Car {
325 | ignition: Some(ref key),
326 | door: Door { locked: false }
327 | } => drive_away_normally(car /* , key */),
328 | _ => break_in_and_hotwire(car)
329 | }
330 | }
331 |
332 | fn drive_away_normally(car: Car /* , key: &Key */) {
333 | // Annoying to have to repeat this code...
334 | let key = match car {
335 | Car {
336 | ignition: Some(ref key),
337 | ..
338 | } => key,
339 | _ => unreachable!()
340 | };
341 | turn_key(key);
342 | // ...
343 | }
344 |
345 | # fn turn_key(key: &Key) {}
346 | # fn break_in_and_hotwire(car: Car) {}
347 | ```
348 |
349 | If this repeated matching gets annoying, it's relatively easy
350 | to extract it to a function.
351 |
352 | ```rust
353 | # fn turn_key(key: &Key) {}
354 | # fn break_in_and_hotwire(car: Car) {}
355 | # struct Key;
356 | # struct Door { locked: bool }
357 | # struct Car {
358 | # ignition: Option<Key>,
359 | # door: Door,
360 | # }
361 |
362 | impl Car {
363 | fn get_usable_key(&self) -> Option<&Key> {
364 | match self {
365 | Car {
366 | ignition: Some(ref key),
367 | door: Door { locked: false }
368 | } => Some(key),
369 | _ => None,
370 | }
371 | }
372 | }
373 |
374 | fn steal_car(car: Car) {
375 | match car.get_usable_key() {
376 | None => break_in_and_hotwire(car),
377 | Some(_) => drive_away_normally(car),
378 | }
379 | }
380 |
381 | fn drive_away_normally(car: Car) {
382 | turn_key(car.get_usable_key().unwrap());
383 | }
384 | ```
385 |
386 | ## When should I return `impl Trait`?
387 |
388 | Your main consideration should be API stability. If your caller doesn't
389 | _need_ to know the concrete implementation type, then don't tell it. That
390 | gives you flexibility to change your implementation in future without breaking
391 | compatibility.
392 |
393 | Note [Hyrum's Law](https://www.hyrumslaw.com/)!
394 |
395 | Using `impl Trait` doesn't solve _all_ possible API stability concerns, because
396 | even `impl Trait` leaks auto-traits such as `Send` and `Sync`.
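
For instance, here is a sketch of a function (hypothetical, written for this answer) which promises only "some iterator of `u32`":

```rust
/// Returns the even numbers below `limit`.
///
/// Callers learn only that this is an iterator of `u32`. The concrete type
/// (currently a `Filter` wrapped around a `Range`) can change later without
/// breaking them.
fn evens_below(limit: u32) -> impl Iterator<Item = u32> {
    (0..limit).filter(|n| n % 2 == 0)
}
```

If you later switch the body to, say, `(0..limit).step_by(2)`, the signature stays the same and callers don't need to change.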
397 |
398 | ## I miss function overloading! What do I do?
399 |
400 | Use a trait to implement the behavior you used to have.
401 |
402 | For example, in C++:
403 |
404 | ```cpp
405 | class Dog {
406 | public:
407 | void eat(Dogfood);
408 | void eat(DeliveryPerson);
409 | };
410 | ```
411 |
412 | In Rust you might express this as:
413 |
414 | ```rust
415 | trait Edible {
416 | }
417 |
418 | struct Dog;
419 |
420 | impl Dog {
421 | fn eat(edible: impl Edible) {
422 | // ...
423 | }
424 | }
425 |
426 | struct Dogfood;
427 | struct DeliveryPerson;
428 |
429 | impl Edible for Dogfood {}
430 | impl Edible for DeliveryPerson {}
431 | ```
432 |
433 | This gives your caller all the convenience they want, though may increase
434 | work for you as the implementer.
435 |
436 | ## I miss operator overloading! What do I do?
437 |
438 | Implement the standard traits instead (for example `PartialEq`, `Add`). This
439 | has equivalent effect in that folks will be able to use your type in a standard
440 | Rusty way without knowing too much special about your type.
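
As a brief sketch, with a hypothetical `Vec2` type: `PartialEq` can usually be derived, while `Add` is implemented by hand.

```rust
use std::ops::Add;

/// Hypothetical 2D vector type, used only for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Vec2 {
    x: i32,
    y: i32,
}

impl Add for Vec2 {
    type Output = Vec2;

    /// Enables `a + b` for two `Vec2` values.
    fn add(self, other: Vec2) -> Vec2 {
        Vec2 { x: self.x + other.x, y: self.y + other.y }
    }
}
```

With these in place, `Vec2 { x: 1, y: 2 } + Vec2 { x: 3, y: 4 }` and `==` comparisons work just as they would for a built-in numeric type.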
441 |
442 | ## Should I return an error, or panic?
443 |
444 | Panics should be used only for invariants, never for anything that you believe
445 | might happen. That's especially true [for libraries](https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell):
446 | panicking (or asserting) should be reserved for the 'top level' code driving
447 | the application.
448 |
449 | > Libraries which panic are super-rude and I hate them - MY
450 |
451 | Even in your own application code, panicking might not be wise:
452 |
453 | > Panicking in application logic for recoverable errors makes it way harder to librarify some code - AP
454 |
455 | If you really must have an API which can panic, add a `try_` equivalent too.
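
One possible shape for such a pair, sketched with a hypothetical `Slots` type: write the fallible version first and build the panicking convenience on top of it.

```rust
/// Hypothetical fixed-size container, used only for illustration.
struct Slots {
    values: [u8; 4],
}

impl Slots {
    /// Fallible version: returns `None` for an out-of-range index.
    fn try_get(&self, index: usize) -> Option<u8> {
        self.values.get(index).copied()
    }

    /// Convenience version which panics on an out-of-range index.
    fn get(&self, index: usize) -> u8 {
        self.try_get(index).expect("index out of range")
    }
}
```

Callers who can't rule out a bad index use `try_get`; callers who have already established the invariant can use `get`.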
456 |
457 | ## What should my error type be?
458 |
459 | [Rust's `Result` type](https://doc.rust-lang.org/std/result/) is parameterized
460 | over an error type. What should you use?
461 |
462 | For app code, consider [anyhow](https://docs.rs/anyhow/). For library code,
463 | use your own `enum` of error conditions - you can use [thiserror](https://docs.rs/thiserror/)
464 | to make this more pleasant.
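
To show what a library error `enum` involves, here is a hand-written sketch using only the standard library. The `ConfigError` type and `parse_port` function are invented for this example; `thiserror` exists to generate the `Display` and `Error` boilerplate for you.

```rust
use std::fmt;

/// Hypothetical error type for a configuration-parsing library.
#[derive(Debug, PartialEq)]
enum ConfigError {
    MissingKey(String),
    InvalidPort(String),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::MissingKey(key) => write!(f, "missing key: {}", key),
            ConfigError::InvalidPort(raw) => write!(f, "invalid port: {}", raw),
        }
    }
}

// Marks the type as a standard error so callers can box it or wrap it.
impl std::error::Error for ConfigError {}

fn parse_port(raw: Option<&str>) -> Result<u16, ConfigError> {
    let raw = raw.ok_or_else(|| ConfigError::MissingKey("port".to_string()))?;
    raw.parse::<u16>()
        .map_err(|_| ConfigError::InvalidPort(raw.to_string()))
}
```

Because each failure is its own variant, callers can `match` on exactly the conditions they know how to recover from.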
465 |
466 | ## When should I take or return `dyn Trait`?
467 |
468 | In either C++ or Rust, you can choose between monomorphization (that is, building
469 | code multiple times for each permutation of parameter types) or dynamic dispatch (i.e.
470 | looking up the correct implementation using vtables).
471 |
472 | In C++ the syntax is completely different - templates vs virtual functions.
473 | In Rust the syntax is almost identical - in some cases it's as simple as
474 | exchanging the `impl` keyword with the `dyn` keyword.
475 |
476 | Given this flexibility to switch strategies, which should you start with?
477 |
478 | In both languages, monomorphization tends to result in a quicker program (partly
479 | due to better inlining). It's arguably true that inlining is more important in
480 | Rust, due to its functional nature and pervasive use of iterators. Whether or
481 | not that's the reason, experienced Rustaceans usually start with `impl`:
482 |
483 | > It's best practice to start with monomorphization and move to `dyn`... - MG
484 |
485 | The main cost of monomorphization is larger binaries. There are cases where
486 | large amounts of code can end up being duplicated (the marvellous [serde](https://serde.rs/)
487 | is one).
488 |
489 | You _can_ choose to do things the other way round:
490 |
491 | > ... it’s workable practice to start with `dyn` and then move to `impl` when you have problems. - MG
492 |
493 | `dyn` can be awkward, and potentially expensive in different ways:
494 |
495 | > One thing to note about pervasive `dyn` is that because it unsizes the types it wraps, you need to box it if you want to store it by value. You end up with a good bit more allocator pressure if you try to have `dyn` field types. - AP
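
To make the comparison concrete, here is a sketch with a hypothetical `Shape` trait. The first two functions differ only in the `impl`/`dyn` keyword; the third shows the boxing needed to store mixed `dyn` values.

```rust
/// Hypothetical trait and types, used only for illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 { self.0 * self.0 }
}

impl Shape for Rect {
    fn area(&self) -> f64 { self.0 * self.1 }
}

/// Monomorphized: a separate copy is compiled for each concrete type used.
fn describe_static(shape: &impl Shape) -> String {
    format!("area {}", shape.area())
}

/// Dynamically dispatched: one copy, with `area` looked up via a vtable.
fn describe_dyn(shape: &dyn Shape) -> String {
    format!("area {}", shape.area())
}

/// Storing a mix of shapes by value requires boxing each `dyn Shape`.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|shape| shape.area()).sum()
}
```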
496 |
497 | ## `<'a>`I seem to have lots of named lifetimes. Am `<'b>`I doing something wrong?
498 |
499 | Some say that if you have a significant number of named lifetimes, you're
500 | overcomplicating things.
501 |
502 | There are some scenarios where multiple named lifetimes make perfect sense - for example
503 | if you're dealing with an arena, or major phases of a process (the Rust compiler
504 | has `'gcx` and `'tcx` lifetimes relating to the output of certain compile phases.)
505 |
506 | But otherwise, it may be that you've got lifetimes because you're trying _too
507 | hard_ to avoid a copy. You may be better off simply switching to runtime
508 | checking (e.g. `Rc`, `Arc`) or even cloning.
509 |
510 | Are named lifetimes even a "code smell"?
511 |
512 | > My experience has been that the extent to which they're a smell varies a good bit based on the programmer's experience level, which has led me towards increased skepticism over time. Lots of people learning Rust have experienced the pain of first not wanting to `.clone()` something, immediately putting lifetimes everywhere, and then feeling the pain of lifetime subtyping and variance. I don't think they're nearly as odorous as unsafe, for example, but treating them as a bit of a smell does I think lead to code that's easier to read for a newcomer and to refactor around the stack. - AP
513 |
--------------------------------------------------------------------------------
/src/types.md:
--------------------------------------------------------------------------------
1 | # Questions about your types
2 |
3 |
4 |
5 | ## My 'class' needs mutable references to other things to do its job. Other classes need mutable references to these things too. What do I do?
6 |
7 | It's common in C++ to have a class that contains mutable references to other
8 | objects; the class mutates those objects to do its work. Often, there
9 | are several classes that all hold a mutable reference to the same object. Here
10 | is a diagram that illustrates this:
11 |
12 | ```mermaid
13 | flowchart LR
14 | subgraph Shared functionality
15 | important[Important Shared Object]
16 | end
17 | subgraph ObjectA
18 | methodA[Method]
19 | refa[Mutable Reference]-->important
20 | methodA-. Acts on shared object.->important
21 | end
22 | subgraph ObjectB
23 | refb[Mutable Reference]-->important
24 | methodB[Method]
25 | methodB-. Acts on shared object.->important
26 | end
27 | main --> ObjectA
28 | main --> ObjectB
29 | main-. Calls .-> methodA
30 | main-. Calls .-> methodB
31 | ```
32 |
33 | In Rust, you can't have multiple mutable references to a shared object, so what
34 | do you do?
35 |
36 | First of all, consider moving behavior out of your types. (See
37 | [the answer about the observer pattern](./codebase.md#the-c-observer-pattern-is-hard-in-rust-what-to-do) and especially
38 | [the second option described there](./codebase.md#option-2-drive-the-objects-from-the-code-not-the-other-way-round).)
39 |
40 | Even in Rust, though, it's still often the best choice to make complex behavior
41 | part of the type within `impl` blocks. You can still do that - but don't
42 | _store_ references. Instead, pass them into each function call.
43 |
44 | ```mermaid
45 | flowchart LR
46 | subgraph Shared functionality
47 | important[Important Shared Object]
48 | end
49 | subgraph ObjectA
50 | methodA[Method]
51 | methodA-. Acts on shared object.->important
52 | end
53 | subgraph ObjectB
54 | methodB[Method]
55 | methodB-. Acts on shared object.->important
56 | end
57 | main --> ObjectA
58 | main --> ObjectB
59 | main --> important
60 | main-. Passes reference to shared object.-> methodA
61 | main-. Passes reference to shared object.-> methodB
62 | ```
63 |
64 | Instead of this:
65 |
66 | ```rust
67 | # struct ImportantSharedObject;
68 | # struct ObjectA<'a> {
69 | # important_shared_object: &'a mut ImportantSharedObject,
70 | # }
71 | # impl<'a> ObjectA<'a> {
72 | # fn new(important_shared_object: &'a mut ImportantSharedObject) -> Self {
73 | # Self {
74 | # important_shared_object
75 | # }
76 | # }
77 | # fn do_something(&mut self) {
78 | # // act on self.important_shared_object
79 | # }
80 | # }
81 | fn main() {
82 | let mut shared_thingy = ImportantSharedObject;
83 | let mut a = ObjectA::new(&mut shared_thingy);
84 | a.do_something(); // acts on shared_thingy
85 | }
86 | ```
87 |
88 | Do this:
89 |
90 | ```rust
91 | # struct ImportantSharedObject;
92 | # struct ObjectA;
93 | # impl ObjectA {
94 | # fn new() -> Self {
95 | # Self
96 | # }
97 | # fn do_something(&mut self, important_shared_object: &mut ImportantSharedObject) {
98 | # // act on important_shared_object
99 | # }
100 | # }
101 | fn main() {
102 | let mut shared_thingy = ImportantSharedObject;
103 | let mut a = ObjectA::new();
104 | a.do_something(&mut shared_thingy); // acts on shared_thingy
105 | }
106 | ```
107 |
108 | (Happily this also gets rid of named lifetime parameters.)
109 |
110 | If you have a hundred such shared objects, you probably don't want a
111 | hundred function parameters. So it's usual to bundle them up into
112 | a context structure which can be passed into each function call:
113 |
114 | ```rust
115 | # struct ImportantSharedObject;
116 | # struct AnotherImportantObject;
117 | struct Ctx<'a> {
118 | important_shared_object: &'a mut ImportantSharedObject,
119 | another_important_object: &'a mut AnotherImportantObject,
120 | }
121 |
122 | # struct ObjectA;
123 | # impl ObjectA {
124 | # fn new() -> Self {
125 | # Self
126 | # }
127 | # fn do_something(&mut self, ctx: &mut Ctx) {
128 | # // act on ctx.important_shared_object and ctx.another_important_object
129 | # }
130 | # }
131 | fn main() {
132 | let mut shared_thingy = ImportantSharedObject;
133 | let mut another_thingy = AnotherImportantObject;
134 | let mut ctx = Ctx {
135 | important_shared_object: &mut shared_thingy,
136 | another_important_object: &mut another_thingy,
137 | };
138 | let mut a = ObjectA::new();
139 | a.do_something(&mut ctx); // acts on both the shared thingies
140 | }
141 | ```
142 |
143 | ```mermaid
144 | flowchart LR
145 | subgraph Shared functionality
146 | important[Important Shared Object]
147 | end
148 | subgraph Context
149 | refa[Mutable Reference]-->important
150 | end
151 | subgraph ObjectA
152 | objectA[Object A]
153 | methodA[Method]
154 | methodA-. Acts on shared object.->important
155 | end
156 | subgraph ObjectB
157 | objectB[Object B]
158 | methodB[Method]
159 | methodB-. Acts on shared object.->important
160 | end
161 | main --> objectA
162 | main --> objectB
163 | main --> Context
164 | main-. Passes context.-> methodA
165 | main-. Passes context.-> methodB
166 | ```
167 |
168 | Even simpler: just put all the data directly into `Ctx`. But the key point
169 | is that this context object is passed around into just about all function calls
170 | rather than being stored anywhere, thus negating any borrowing/lifetime concerns.
171 |
172 | This pattern can be seen in [bindgen](https://github.com/rust-lang/rust-bindgen/blob/271eeb0782d34942267ceabcf5f1cf118f0f5842/src/ir/context.rs#L308),
173 | for example.
174 |
175 | > Split out borrowing concerns from the object concerns. - MG
176 |
177 | To generalize this idea, try to avoid storing references to anything that might
178 | need to be changed. Instead take those things as parameters. For instance
179 | `petgraph` [takes the entire graph as context to a `Walker` object](https://docs.rs/petgraph/0.6.0/petgraph/visit/trait.Walker.html),
180 | such that the graph can be changed while you're walking it.
181 |
182 | ## My type needs to store arbitrary user data. What do I do instead of `void *`?
183 |
184 | Ideally, your type would know all possible types of user data that it could store.
185 | You'd represent this as an `enum` with variant data for each possibility. This
186 | would give complete compile-time type safety.
187 |
188 | But sometimes code needs to store data for which it can't depend upon
189 | the definition: perhaps it's defined by a totally different area of the
190 | codebase, or belongs to clients. Such possibilities can't be enumerated in
191 | advance. Until recently, the only real option in C++ was to use a `void *`
192 | and have clients downcast to get their original type back. Modern C++ offers
193 | a much better option, `std::any`; if you've come across that, Rust's equivalent
194 | will seem very familiar.
195 |
196 | In Rust, the [`Any`](https://doc.rust-lang.org/std/any/trait.Any.html) type
197 | allows you to store _anything_ and retrieve it later in a type-safe fashion:
198 |
199 | ```rust
200 | use std::any::Any;
201 |
202 | struct MyTypeOfUserData(u8);
203 |
204 | fn main() {
205 | let any_user_data: Box<dyn Any> = Box::new(MyTypeOfUserData(42));
206 | let stored_value = any_user_data.downcast_ref::<MyTypeOfUserData>().unwrap().0;
207 | println!("{}", stored_value);
208 | }
209 | ```
210 |
211 | If you want to be more prescriptive about what can be stored, you can define
212 | a trait (let's call it `UserData`) and store a `Box<dyn UserData>`.
213 | Your trait should have a method `fn as_any(&self) -> &dyn std::any::Any;`.
214 | Each implementation can just return `self`.
215 |
216 | Your caller can then do this:
217 |
218 | ```rust
219 | trait UserData {
220 | fn as_any(&self) -> &dyn std::any::Any;
221 | // ...other trait methods which you wish to apply to any UserData...
222 | }
223 |
224 | struct MyTypeOfUserData(u8);
225 |
226 | impl UserData for MyTypeOfUserData {
227 | fn as_any(&self) -> &dyn std::any::Any { self }
228 | }
229 |
230 | fn main() {
231 | // Store a generic Box<dyn UserData>
232 | let user_data: Box<dyn UserData> = Box::new(MyTypeOfUserData(42));
233 | // Get back to a specific type
234 | let stored_value = user_data.as_any().downcast_ref::<MyTypeOfUserData>().unwrap().0;
235 | println!("{}", stored_value);
236 | }
237 | ```
238 |
239 | Of course, enumerating all possible stored variants remains preferable such that the
240 | compiler helps you to avoid runtime panics.
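
For comparison, a sketch of the enumerated approach, with a hypothetical `KnownUserData` type invented for this example:

```rust
/// Hypothetical closed set of user data kinds.
enum KnownUserData {
    Age(u8),
    Nickname(String),
}

fn describe(data: &KnownUserData) -> String {
    // The compiler insists that every variant is handled; there is no
    // downcast here which could fail at runtime.
    match data {
        KnownUserData::Age(years) => format!("{} years old", years),
        KnownUserData::Nickname(name) => format!("known as {}", name),
    }
}
```

Adding a new variant later turns every non-exhaustive `match` into a compile error, pointing you at each place that needs updating.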
241 |
242 | ## When should I put my data in a `Box`?
243 |
244 | In C++, you often need to box things for ownership reasons, whereas in Rust
245 | it's typically just a performance trade-off. It's arguably premature optimization
246 | to use boxes unless your profiling shows a lot of memcpy of that particular
247 | type (or, perhaps, the relevant [clippy lint](https://rust-lang.github.io/rust-clippy/v0.0.212/index.html#large_enum_variant)
248 | informs you that you have a problem.)
249 |
250 | > I never box things unless they're really big. - MG
251 |
252 | Another heuristic is if part of your data structure is very rarely filled,
253 | in which case you may wish to `Box` it to avoid incurring an overhead for all
254 | other instances of the type.
255 |
256 | ```rust
257 | # struct Humility; struct Talent; struct Ego;
258 | struct Popstar {
259 | ego: Ego,
260 | talent: Talent,
261 | humility: Option<Box<Humility>>,
262 | }
263 | # fn main() {}
264 | ```
265 |
266 | (This is one reason why people like using [anyhow](https://docs.rs/anyhow/latest/anyhow/)
267 | for their errors; it means the failure case in their `Result` enum is only
268 | a pointer wide.)
269 |
270 | Of course, Rust may require you to use a box:
271 |
272 | * if you need to `Pin` some data, typically for async Rust, or
273 | * if you have a recursive type, which would otherwise be infinitely sized
274 |
275 | but as usual, the compiler will explain very nicely.
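
The classic instance of the second case is a recursive type. A minimal sketch (the `List` type here is just an illustration):

```rust
/// A singly linked list. Without the `Box`, a `List` would directly contain
/// another `List` and so have no finite size; the compiler rejects that.
enum List {
    Cons(i32, Box<List>),
    Nil,
}

fn sum(list: &List) -> i32 {
    match list {
        // `rest` is a `&Box<List>`, which coerces to the `&List` we need.
        List::Cons(value, rest) => value + sum(rest),
        List::Nil => 0,
    }
}
```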
276 |
277 | ## Should I have public fields or accessor methods?
278 |
279 | The trade-offs are similar to C++ except that Rust's pattern-matching makes it
280 | very convenient to match on fields, so within a realm of code that you own you
281 | may bias towards having more public fields than you're used to. As with C++,
282 | this can give you a future compatibility burden.
283 |
284 | ## When should I use a newtype wrapper?
285 |
286 | The [newtype wrapper pattern](https://rust-unofficial.github.io/patterns/patterns/behavioural/newtype.html)
287 | uses Rust's type system to enforce extra behavior without necessarily changing
288 | the underlying representation.
289 |
290 | ```rust
291 | # fn get_rocket_length() -> Inches { Inches(7) }
292 | struct Inches(u32);
293 | struct Centimeters(u32);
294 |
295 | fn build_mars_orbiter() {
296 | let rocket_length: Inches = get_rocket_length();
297 | // mate_to_orbiter(rocket_length); // does not compile because this takes cm
298 | }
299 | ```
300 |
301 | Other examples that have been used:
302 | * An IP address which is guaranteed not to be localhost;
303 | * Non-zero numbers;
304 | * IDs which are guaranteed to be unique
305 |
306 | Such new types typically need a lot of boilerplate, especially to implement
307 | the traits which users of your type would expect to find. On the other hand,
308 | they allow you to use Rust's type system to statically prevent logic bugs.
309 |
310 | A heuristic: if there are some invariants you'd be checking for at runtime,
311 | see if you can use a newtype wrapper to do it statically instead. Although it
312 | may be more code to start with, you'll [save the effort of finding and fixing
313 | logic bugs later](code.md#When-should-I-use-runtime-checks-vs-jumping-through-hoops-to-do-static-checks).
314 |
315 | ## How else can I use Rust's type system to avoid high-level logic bugs?
316 |
317 | Lots of ways:
318 |
319 | ### Zero-sized types.
320 |
321 | Also known as "ZSTs". These are types which occupy literally zero bytes, and
322 | so (generally) make no difference whatsoever to the code generated. But you
323 | can use them in the type system to enforce invariants at compile-time with
324 | no runtime check.
325 |
326 | For example, they're often used as capability tokens - you can statically
327 | prove that code exclusively has the right to do something.
328 |
329 | ```rust
330 | pub trait ValidationStatus {}
331 |
332 | mod validator {
333 | use self::super::{Bytecode, ValidationStatus};
334 | /// ZST marker to show that bytecode has been validated.
335 | // Private field ensures this can't be created outside this mod
336 | // but PhantomData means this is still zero-sized.
337 | pub struct BytecodeValidated(std::marker::PhantomData<()>);
338 | pub fn validate_bytecode<V: ValidationStatus>(code: Bytecode<V>) -> Bytecode<BytecodeValidated> {
339 | // Do expensive validation operation here...
340 | Bytecode {
341 | validated: BytecodeValidated(std::marker::PhantomData),
342 | code: code.code
343 | }
344 | }
345 | impl ValidationStatus for BytecodeValidated {}
346 | }
347 |
348 | struct BytecodeNotValidated;
349 |
350 | impl ValidationStatus for BytecodeNotValidated {}
351 |
352 | pub struct Bytecode<V: ValidationStatus> {
353 | validated: V,
354 | code: Vec<u8>,
355 | }
356 |
357 | fn run_bytecode(bytecode: &Bytecode<validator::BytecodeValidated>) {
358 | // Compiler PROVES you validated it before you can run it. There are no
359 | // runtime branches involved.
360 | }
361 |
362 | fn get_unvalidated_bytecode() -> Bytecode<BytecodeNotValidated> {
363 | // ...
364 | # Bytecode {
365 | # validated: BytecodeNotValidated,
366 | # code: Vec::new()
367 | # }
368 | }
369 |
370 | fn main() {
371 | let bytecode = get_unvalidated_bytecode();
372 | // run_bytecode(bytecode); // does not compile
373 | let bytecode = validator::validate_bytecode(bytecode);
374 | run_bytecode(&bytecode);
375 | run_bytecode(&bytecode);
376 | }
377 | ```
378 |
379 | ZSTs can also be used to demonstrate _exclusive_ access to some resource.
380 |
381 | ```rust
382 | struct RobotArmAccessToken;
383 |
384 | fn move_arm(token: &mut RobotArmAccessToken, x: u32, y: u32, z: u32) {
385 | // ...
386 | }
387 |
388 | fn attach_car_door(token: &mut RobotArmAccessToken) {
389 | move_arm(token, 3, 4, 6);
390 | move_arm(token, 5, 3, 6);
391 | }
392 |
393 | fn install_windscreen(token: &mut RobotArmAccessToken) {
394 | move_arm(token, 7, 8, 2);
395 | move_arm(token, 1, 2, 3);
396 | }
397 |
398 | fn main() {
399 | let mut token = RobotArmAccessToken; // ensure only one exists
400 | attach_car_door(&mut token);
401 | install_windscreen(&mut token);
402 | }
403 | ```
404 |
405 | (The type system would prevent these operations happening in parallel.)
406 |
407 | ### Marker traits
408 |
409 | Indicate that a type meets certain invariants, so subsequent
410 | users of that type don't need to check at runtime. A common example is to
411 | indicate that a type is safe to serialize into some bytestream.
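
A rough sketch of the serialization case follows. The `SafeToSerialize` trait and both types are hypothetical names invented for this example:

```rust
/// Marker trait: it has no methods, and implementing it is a promise that
/// the type may safely be written to a bytestream.
trait SafeToSerialize {}

struct PublicId([u8; 4]);
struct SecretKey([u8; 4]);

impl AsRef<[u8]> for PublicId {
    fn as_ref(&self) -> &[u8] { &self.0 }
}

impl AsRef<[u8]> for SecretKey {
    fn as_ref(&self) -> &[u8] { &self.0 }
}

// Opt in only the types which uphold the invariant. `SecretKey` is
// deliberately left out.
impl SafeToSerialize for PublicId {}

/// Accepts only types carrying the marker; no runtime check is needed.
fn serialize<T: SafeToSerialize + AsRef<[u8]>>(value: &T, out: &mut Vec<u8>) {
    out.extend_from_slice(value.as_ref());
}
```

Calling `serialize` with a `SecretKey` doesn't compile, because the marker trait bound isn't satisfied.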
412 |
413 | ### Enums as state machines.
414 |
415 | Each enum variant is a state and stores data associated with that state. There
416 | simply is no possibility that the data can get out of sync with the state.
417 |
418 | ```rust
419 | enum ElectionState {
420 | RaisingDonations { amount_raised: u32 },
421 | DoingTVInterviews { interviews_done: u16 },
422 | Voting { votes_for_me: u64, votes_for_opponent: u64 },
423 | Elected,
424 | NotElected,
425 | }
426 | ```
427 |
428 | A more heavyweight approach here is to define types for each state, and
429 | allow valid state transitions by taking the previous state by-value and
430 | returning the next state by-value.
431 |
432 | ```rust
433 | struct Seed { water_available: u32 }
434 | struct Growing { water_available: u32, sun_available: u32 }
435 | struct Flowering;
436 | struct Dead;
437 |
438 | enum PlantState {
439 | Seed(Seed),
440 | Growing(Growing),
441 | Flowering(Flowering),
442 | Dead(Dead)
443 | }
444 |
445 | impl Seed {
446 | fn advance(self) -> PlantState {
447 | if self.water_available > 3 {
448 | PlantState::Growing(Growing { water_available: self.water_available, sun_available: 0 })
449 | } else {
450 | PlantState::Dead(Dead)
451 | }
452 | }
453 | }
454 |
455 | impl Growing {
456 | fn advance(self) -> PlantState {
457 | if self.water_available > 3 && self.sun_available > 3 {
458 | PlantState::Flowering(Flowering)
459 | } else {
460 | PlantState::Dead(Dead)
461 | }
462 | }
463 | }
464 |
465 | impl Flowering {
466 | fn advance(self) -> PlantState {
467 | PlantState::Dead(Dead)
468 | }
469 | }
470 |
471 | impl Dead {
472 | fn advance(self) -> PlantState {
473 | PlantState::Dead(Dead)
474 | }
475 | }
476 |
477 | impl PlantState {
478 | fn advance(self) -> Self {
479 | match self {
480 | Self::Seed(seed) => seed.advance(),
481 | Self::Growing(growing) => growing.advance(),
482 | Self::Flowering(flowering) => flowering.advance(),
483 | Self::Dead(dead) => dead.advance(),
484 | }
485 | }
486 | }
487 |
488 | // we should probably find a way to inject some sun and water into this
489 | // state machine or things are not looking rosy
490 | ```
491 |
492 | ## What should I do instead of inheritance?
493 |
494 | Use [composition](https://en.wikipedia.org/wiki/Composition_over_inheritance).
495 | Sometimes this results in more boilerplate, but it avoids a raft of complexity.
496 |
497 | Specifically, for example:
498 | * you might include the "superclass" struct as a member of the subclass
499 | struct;
500 | * you might use an enum with different variants for the different possible
501 | "subclasses".
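
Here is a rough sketch of both options. All of the type and method names (`Animal`, `Dog`, `Pet` and so on) are made up for illustration.

```rust
// Hypothetical "superclass".
struct Animal { name: String, legs: u32 }

impl Animal {
    fn describe(&self) -> String {
        format!("{} has {} legs", self.name, self.legs)
    }
}

// Option 1: embed the "superclass" as a member and forward calls explicitly.
struct Dog { animal: Animal, good_boy: bool }

impl Dog {
    fn describe(&self) -> String {
        let suffix = if self.good_boy { " and is a good boy" } else { "" };
        format!("{}{}", self.animal.describe(), suffix)
    }
}

// Option 2: a single enum with one variant per "subclass".
enum Pet {
    Dog { name: String },
    Fish { name: String },
}

impl Pet {
    fn name(&self) -> &str {
        match self {
            Pet::Dog { name } | Pet::Fish { name } => name,
        }
    }

    // Behaviour that would have been a virtual method becomes a `match`.
    fn legs(&self) -> u32 {
        match self {
            Pet::Dog { .. } => 4,
            Pet::Fish { .. } => 0,
        }
    }
}
```

The forwarding in `Dog::describe` is the boilerplate mentioned above. The enum version works best when the set of "subclasses" is closed and known up front.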
502 |
503 | Usually the answer is obvious: it's unlikely that your Rust code is structured
504 | in such a way that inheritance would be a good fit anyway.
505 |
506 | > I've only missed inheritance when actually _implementing_ languages which
507 | > themselves have inheritance. - MG
508 |
509 | ## I need a list of nodes which can refer to one another. How?
510 |
511 | You can't easily do self-referential data structures in Rust. The usual
512 | workaround is to [use an
513 | arena](https://manishearth.github.io/blog/2021/03/15/arenas-in-rust/) and
514 | replace references from one node to another with node IDs.
515 |
516 | An arena is typically a `Vec` (or similar), and the node IDs are a newtype
517 | wrapper around a simple integer index.
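
As a minimal sketch of that shape, assuming a purely additive arena and hypothetical names (`NodeId`, `Node`, `Graph`):

```rust
// A newtype around an index into the arena.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

struct Node {
    label: String,
    // Links to other nodes are stored as IDs rather than references.
    neighbours: Vec<NodeId>,
}

// The arena: it owns every node, and nodes are never removed.
#[derive(Default)]
struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    fn add_node(&mut self, label: &str) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(Node { label: label.to_string(), neighbours: Vec::new() });
        id
    }

    fn add_edge(&mut self, from: NodeId, to: NodeId) {
        self.nodes[from.0].neighbours.push(to);
    }

    fn neighbour_labels(&self, id: NodeId) -> Vec<&str> {
        self.nodes[id.0]
            .neighbours
            .iter()
            .map(|n| self.nodes[n.0].label.as_str())
            .collect()
    }
}
```

Cycles and even self-edges are no trouble here, because a `NodeId` is just a number and the borrow checker only ever sees one owner: the `Vec`.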
518 |
519 | Rust can't check at compile time that your node IDs are valid. If you don't
520 | have proper references, what stops you from holding stale IDs?
521 |
522 | Arenas are often purely additive, which means that you can add entries but not
523 | delete them
524 | ([example](https://github.com/Manishearth/elsa/blob/master/examples/mutable_arena.rs)).
525 | If you must have an arena which deletes things, then use generational IDs; see
526 | the [generational-arena](https://docs.rs/generational-arena/) crate and this
527 | [RustConf keynote](https://www.youtube.com/watch?v=aKLntZcp27M) for more
528 | details.
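
To give a feel for how generational IDs catch stale references, here is a hand-rolled sketch. It is not the API of the `generational-arena` crate; `GenId`, `Slot` and `GenArena` are hypothetical names, and the linear search for a free slot is chosen for brevity rather than speed.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct GenId { index: usize, generation: u32 }

struct Slot<T> { generation: u32, value: Option<T> }

struct GenArena<T> { slots: Vec<Slot<T>> }

impl<T> GenArena<T> {
    fn new() -> Self {
        GenArena { slots: Vec::new() }
    }

    fn insert(&mut self, value: T) -> GenId {
        // Reuse a vacant slot if there is one, otherwise grow the arena.
        if let Some(index) = self.slots.iter().position(|slot| slot.value.is_none()) {
            let slot = &mut self.slots[index];
            slot.value = Some(value);
            return GenId { index, generation: slot.generation };
        }
        self.slots.push(Slot { generation: 0, value: Some(value) });
        GenId { index: self.slots.len() - 1, generation: 0 }
    }

    fn remove(&mut self, id: GenId) -> Option<T> {
        let slot = self.slots.get_mut(id.index)?;
        if slot.generation != id.generation {
            return None;
        }
        let value = slot.value.take()?;
        // Bump the generation so that outstanding IDs for this slot go stale.
        slot.generation += 1;
        Some(value)
    }

    fn get(&self, id: GenId) -> Option<&T> {
        let slot = self.slots.get(id.index)?;
        if slot.generation == id.generation { slot.value.as_ref() } else { None }
    }
}
```

A stale ID still points at a valid index, but its generation no longer matches, so `get` returns `None` instead of silently handing back whatever now lives in that slot.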
529 |
530 | If arenas still sound like a nasty workaround, consider that you might choose
531 | an arena anyway for other reasons:
532 |
533 | * All of the objects in the arena will be freed at the end of the arena's
534 | lifetime, instead of during their manipulation, which can give very low
535 | latency for some use-cases. [Bumpalo](https://docs.rs/bumpalo/3.6.1/bumpalo/)
536 | formalizes this.
537 | * The rest of your program might have real Rust references into the arena. You
538 | can give the arena a named lifetime (`'arena` for example), making the
539 | provenance of those references very clear.
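
As a small illustration of the named-lifetime point, here is a sketch in which the "arena" is just a slice of nodes that outlives its users. `Node`, `Selection`, `select_heavy` and `demo_labels` are all hypothetical.

```rust
struct Node { label: String, weight: u32 }

// Holds real references into the arena. The `'arena` lifetime in the type
// says exactly where those references come from.
struct Selection<'arena> {
    picked: Vec<&'arena Node>,
}

fn select_heavy<'arena>(arena: &'arena [Node], min_weight: u32) -> Selection<'arena> {
    Selection {
        picked: arena.iter().filter(|n| n.weight >= min_weight).collect(),
    }
}

fn demo_labels() -> Vec<String> {
    let arena = vec![
        Node { label: "oak".to_string(), weight: 9 },
        Node { label: "fern".to_string(), weight: 1 },
        Node { label: "elm".to_string(), weight: 6 },
    ];
    let selection = select_heavy(&arena, 5);
    // `selection` cannot outlive `arena`; the compiler enforces that.
    selection.picked.iter().map(|n| n.label.clone()).collect()
}
```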
540 |
541 | ## I'm having a miserable time making my data structure. Should I use unsafe?
542 |
543 | Low-level data structures are hard in Rust, especially if they're
544 | self-referential. Rust will make visible all sorts of risks of ownership and
545 | shared mutable state which may not be visible in other languages, and
546 | they're hard to solve in low-level data structure code.
547 |
548 | Even something as simple as a doubly-linked list is notoriously hard; so much so
549 | that there is a [book that teaches Rust based solely on linked lists](https://rust-unofficial.github.io/too-many-lists/).
550 | As that (wonderful) book makes clear, you are often faced with a choice:
551 |
552 | * [Use safe Rust, but shift compile-time checks to runtime](https://rust-unofficial.github.io/too-many-lists/fourth.html)
553 | * [Use `unsafe`](https://rust-unofficial.github.io/too-many-lists/fifth.html) and
554 | take the same degree of care you'd take in C or C++. And, just like in C or C++,
555 | you'll introduce [security vulnerabilities in the unsafe code](https://www.cvedetails.com/vulnerability-list/vendor_id-19029/product_id-48677/Rust-lang-Rust.html).
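
To make the first option concrete, here is a tiny sketch of what "shifting checks to runtime" looks like with `Rc<RefCell<...>>` from the standard library. The function name is hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Returns (did the overlapping borrow fail?, did a later borrow succeed?).
fn overlapping_mutable_borrows() -> (bool, bool) {
    let shared = Rc::new(RefCell::new(vec![1, 2, 3]));
    let alias = Rc::clone(&shared);

    let first = shared.borrow_mut();
    // With plain `&mut` references the compiler would reject a second mutable
    // borrow. With `RefCell` this compiles, and the conflict is only detected
    // while the program is running.
    let failed_while_borrowed = alias.try_borrow_mut().is_err();
    drop(first);

    // Once the first borrow is released, borrowing works again.
    let ok_after_release = alias.try_borrow_mut().is_ok();
    (failed_while_borrowed, ok_after_release)
}
```

Using `borrow_mut` instead of `try_borrow_mut` for the second borrow would panic at that point, which is the price of this approach: mistakes surface in testing or production rather than at compile time.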
556 |
557 | If you're facing this decision... perhaps there's a third way.
558 |
559 | You should almost always be using somebody else's tried-and-tested
560 | data structure.
561 |
562 | [petgraph](https://docs.rs/petgraph) and
563 | [slotmap](https://docs.rs/slotmap) are great examples. Use someone else's crate
564 | by default, and resort to writing your own only if you exhaust that option.
565 |
566 | C++ makes it hard to pull in third-party dependencies, so it's culturally normal
567 | to write new code. Rust makes it trivial to add dependencies, and so you will
568 | need to add them, even if that feels surprising to a C++ programmer.
569 |
570 | This ease of adding dependencies co-evolved with the
571 | difficulty of making data structures. It's simply a part of programming in Rust.
572 | You just can't separate the language and the ecosystem.
573 |
574 | You might argue that this dependency on third-party crates is concerning
575 | from a supply-chain security point of view. Your author would agree, but
576 | it's just the way you do things in Rust. Stop creating your own data structures.
577 |
578 | Then again:
579 |
580 | > it’s equally miserable to implement performant, low-level data structures in
581 | > C++; you’ll be specializing on lots of things like is_trivially_movable etc. - MY.
582 |
583 | ## I nevertheless have to write my own data structure. Should I use unsafe?
584 |
585 | I'm sorry to hear that.
586 |
587 | Some suggestions:
588 |
589 | * Use `Rc`, `Weak`, etc. until you really can't.
590 | * Even if you can't use a pre-existing crate for the whole data structure,
591 | perhaps you can use a crate to avoid the `unsafe` bits (for example
592 | [rental](https://docs.rs/rental/latest/rental/), though bear in mind that it is no longer maintained).
593 | * Bear in mind that refactoring Rust is generally safer than refactoring
594 | C++ (because the compiler will point out a higher proportion of your
595 | mistakes) so a wise strategy might be to start with a fully-safe, but slow,
596 | version, establish solid tests, and then [reach for unsafe](https://doc.rust-lang.org/nomicon/).
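
As a sketch of the first suggestion, here is a tree with parent pointers built entirely from `Rc`, `Weak` and `RefCell`, with no `unsafe`. The `TreeNode` type and helper functions are hypothetical.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct TreeNode {
    name: String,
    // Weak back-pointer: it does not keep the parent alive, so parent and
    // child do not form a reference cycle that would leak.
    parent: RefCell<Weak<TreeNode>>,
    // Strong pointers: a parent owns its children.
    children: RefCell<Vec<Rc<TreeNode>>>,
}

fn new_node(name: &str) -> Rc<TreeNode> {
    Rc::new(TreeNode {
        name: name.to_string(),
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    })
}

fn add_child(parent: &Rc<TreeNode>, child: Rc<TreeNode>) {
    *child.parent.borrow_mut() = Rc::downgrade(parent);
    parent.children.borrow_mut().push(child);
}

fn parent_name(node: &TreeNode) -> Option<String> {
    // `upgrade` returns `None` if the parent has already been dropped.
    node.parent.borrow().upgrade().map(|p| p.name.clone())
}
```

This pays for reference counting and runtime borrow checks, but it is a reasonable fully-safe baseline to write tests against before deciding whether `unsafe` is really needed.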
597 |
598 |
--------------------------------------------------------------------------------