- Feature Name: (fill me in with a unique ident, my_awesome_feature)
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Extend Rust's borrow system to support **non-lexical lifetimes** --
these are lifetimes that are based on the control-flow graph, rather
than lexical scopes. The RFC describes in detail how to infer these
new, more flexible regions, and also describes how to adjust our error
messages. The RFC also describes a few other extensions to the borrow
checker, the total effect of which is to eliminate many common cases
where small, function-local code modifications would be required to pass the
borrow check. (The appendix describes some of the remaining
borrow-checker limitations that are not addressed by this RFC.)

# Motivation
[motivation]: #motivation

## What is a lifetime?

The basic idea of the borrow checker is that values may not be mutated
or moved while they are borrowed, but how do we know whether a value
is borrowed? The idea is quite simple: whenever you create a borrow,
the compiler assigns the resulting reference a **lifetime**. This
lifetime corresponds to the span of the code where the reference may
be used. The compiler will infer this lifetime to be the smallest
lifetime that it can have that still encompasses all the uses of the
reference.

Note that Rust uses the term lifetime in a very particular way. In
everyday speech, the word lifetime can be used in two distinct -- but
similar -- ways:

1. The lifetime of a **reference**, corresponding to the span of time in
   which that reference is **used**.
2. The lifetime of a **value**, corresponding to the span of time
   before that value gets **freed** (or, put another way, before the
   destructor for the value runs).

This second span of time, which describes how long a value is valid,
is very important. To distinguish the two, we refer to that
second span of time as the value's **scope**. Naturally, lifetimes and
scopes are linked to one another. Specifically, if you make a
reference to a value, the lifetime of that reference cannot outlive
the scope of that value. Otherwise, your reference would be pointing
into freed memory.

To better see the distinction between lifetime and scope, let's
consider a simple example. In this example, the vector `data` is
borrowed (mutably) and the resulting reference is passed to a function
`capitalize`. Since `capitalize` does not return the reference back,
the *lifetime* of this borrow will be confined to just that call. The
*scope* of `data`, in contrast, is much larger, and corresponds to a
suffix of the fn body, stretching from the `let` until the end of the
enclosing scope.

```rust
fn foo() {
    let mut data = vec!['a', 'b', 'c']; // --+ 'scope
    capitalize(&mut data[..]);          //   |
//  ^~~~~~~~~~~~~~~~~~~~~~~~~ 'lifetime  //   |
    data.push('d');                     //   |
    data.push('e');                     //   |
    data.push('f');                     //   |
} // <---------------------------------------+

fn capitalize(data: &mut [char]) {
    // do something
}
```

This example also demonstrates something else. Lifetimes in Rust today
are quite a bit more flexible than scopes (if not as flexible as we
might like, hence this RFC):

- A scope generally corresponds to some block (or, more specifically,
  a *suffix* of a block that stretches from the `let` until the end of
  the enclosing block) \[[1](#temporaries)\].
- A lifetime, in contrast, can also span an individual expression, as
  this example demonstrates. The lifetime of the borrow in the example
  is confined to just the call to `capitalize`, and doesn't extend
  into the rest of the block. This is why the calls to `data.push`
  that come below are legal.

So long as a reference is only used within one statement, today's
lifetimes are typically adequate. Problems arise, however, when you have
a reference that spans multiple statements. In that case, the compiler
requires the lifetime to be the innermost expression (which is often a
block) that encloses both statements, and that is typically much
bigger than is really necessary or desired. Let's look at some example
problem cases. Later on, we'll see how non-lexical lifetimes fix these
cases.

## Problem case #1: references assigned into a variable

One common problem case is when a reference is assigned into a
variable. Consider this trivial variation of the previous example,
where the `&mut data[..]` slice is not passed directly to
`capitalize`, but is instead stored into a local variable:

```rust
fn bar() {
    let mut data = vec!['a', 'b', 'c'];
    let slice = &mut data[..]; // <-+ 'lifetime
    capitalize(slice);         //   |
    data.push('d'); // ERROR!  //   |
    data.push('e'); // ERROR!  //   |
    data.push('f'); // ERROR!  //   |
} // <------------------------------+
```

The way that the compiler currently works, assigning a reference into
a variable means that its lifetime must be as large as the entire
scope of that variable. In this case, that means the lifetime is now
extended all the way until the end of the block. This in turn means
that the calls to `data.push` are now in error, because they occur
during the lifetime of `slice`. It's logical, but it's annoying.
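
Because `capitalize` is left unspecified above, the sketch below fills it in with a hypothetical body that upper-cases each ASCII letter, so that problem case #1 becomes a complete function. It assumes a borrow checker that implements the non-lexical lifetimes proposed in this RFC: `slice` is not used after the call to `capitalize`, so its borrow ends there and the `data.push` calls are accepted. The lexical borrow checker described above rejects the same code.

```rust
/// Hypothetical body for `capitalize`: upper-case every ASCII letter in place.
fn capitalize(data: &mut [char]) {
    for c in data.iter_mut() {
        *c = c.to_ascii_uppercase();
    }
}

/// Problem case #1 as a complete function; it returns `data` so the
/// effect of each step can be checked.
fn bar() -> Vec<char> {
    let mut data = vec!['a', 'b', 'c'];
    let slice = &mut data[..];
    capitalize(slice);
    // Last use of `slice`: with non-lexical lifetimes the mutable borrow of
    // `data` ends here instead of at the end of the block.
    data.push('d');
    data.push('e');
    data.push('f');
    data
}
```

Calling `bar()` yields `['A', 'B', 'C', 'd', 'e', 'f']`: only the characters present when `capitalize` ran are upper-cased.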

In this particular case, you could resolve the problem by putting
`slice` into its own block:

```rust
fn bar() {
    let mut data = vec!['a', 'b', 'c'];
    {
        let slice = &mut data[..]; // <-+ 'lifetime
        capitalize(slice);         //   |
    } // <------------------------------+
    data.push('d'); // OK
    data.push('e'); // OK
    data.push('f'); // OK
}
```

Since we introduced a new block, the scope of `slice` is now smaller,
and hence the resulting lifetime is smaller. Introducing a
block like this is kind of artificial and also not an entirely obvious
solution.

## Problem case #2: conditional control flow

Another common problem case is when references are used in only one
given match arm (or, more generally, one control-flow path). This most
commonly arises around maps. Consider this function, which, given some
`key`, processes the value found in `map[key]` if it exists, or else
inserts a default value:

```rust
fn process_or_default() {
    let mut map = ...;
    let key = ...;
    match map.get_mut(&key) { // -------------+ 'lifetime
        Some(value) => process(value),     // |
        None => {                          // |
            map.insert(key, V::default()); // |
            //  ^~~~~~ ERROR.              // |
        }                                  // |
    } // <------------------------------------+
}
```

This code will not compile today. The reason is that the `map` is
borrowed as part of the call to `get_mut`, and that borrow must
encompass not only the call to `get_mut`, but also the `Some` branch
of the match. The innermost expression that encloses both of these
expressions is the match itself (as depicted above), and hence the
borrow is considered to extend until the end of the
match.
Unfortunately, the match encloses not only the `Some` branch,
but also the `None` branch, and hence when we go to insert into the
map in the `None` branch, we get an error that the `map` is still
borrowed.

This *particular* example is relatively easy to work around. In many cases,
one can move the code for `None` out from the `match` like so:

```rust
fn process_or_default1() {
    let mut map = ...;
    let key = ...;
    match map.get_mut(&key) { // -------------+ 'lifetime
        Some(value) => {                   // |
            process(value);                // |
            return;                        // |
        }                                  // |
        None => {                          // |
        }                                  // |
    } // <------------------------------------+
    map.insert(key, V::default());
}
```

When the code is adjusted this way, the call to `map.insert` is not
part of the match, and hence it is not part of the borrow. While this
works, it is unfortunate to require these sorts of
manipulations, just as it was when we introduced an artificial block
in the previous example.

## Problem case #3: conditional control flow across functions

While we were able to work around problem case #2 in a relatively
simple, if irritating, fashion, there are other variations of
conditional control flow that cannot be so easily resolved. This is
particularly true when you are returning a reference out of a
function.
Consider the following function, which returns the value for
a key if it exists, and inserts a new value otherwise (for the
purposes of this section, assume that the `entry` API for maps does
not exist):

```rust
use std::collections::HashMap;
use std::hash::Hash;

fn get_default<'r, K: Hash + Eq + Copy, V: Default>(
    map: &'r mut HashMap<K, V>,
    key: K,
) -> &'r mut V {
    match map.get_mut(&key) { // -------------+ 'r
        Some(value) => value,              // |
        None => {                          // |
            map.insert(key, V::default()); // |
            //  ^~~~~~ ERROR               // |
            map.get_mut(&key).unwrap()     // |
        }                                  // |
    }                                      // |
}                                          // v
```

At first glance, this code appears quite similar to the code we saw
before, and indeed, just as before, it will not compile. However,
the lifetimes at play are quite different. The reason is that, in the
`Some` branch, the value is being **returned out** to the caller.
Since `value` is a reference into the map, this implies that the `map`
will remain borrowed **until some point in the caller** (the point
`'r`, to be exact). To get a better intuition for what this lifetime
parameter `'r` represents, consider some hypothetical caller of
`get_default`: the lifetime `'r` then represents the span of code in
which that caller will use the resulting reference:

```rust
fn caller() {
    let mut map = HashMap::new();
    ...
    {
        let v = get_default(&mut map, key); // -+ 'r
          // +-- get_default() -----------+ //  |
          // | match map.get_mut(&key) {  | //  |
          // |   Some(value) => value,    | //  |
          // |   None => {                | //  |
          // |     ..                     | //  |
          // |   }                        | //  |
          // +----------------------------+ //  |
        process(v);                         //  |
    } // <--------------------------------------+
    ...
}
```

If we attempt the same workaround for this case that we tried
in the previous example, we will find that it does not work:

```rust
fn get_default1<'r, K: Hash + Eq + Copy, V: Default>(
    map: &'r mut HashMap<K, V>,
    key: K,
) -> &'r mut V {
    match map.get_mut(&key) { // -------------+ 'r
        Some(value) => return value,       // |
        None => { }                        // |
    }                                      // |
    map.insert(key, V::default());         // |
    //  ^~~~~~ ERROR (still)                  |
    map.get_mut(&key).unwrap()             // |
}                                          // v
```

Whereas before the lifetime of `value` was confined to the match, this
new lifetime extends out into the caller, and therefore the borrow
does not end just because we exited the match. Hence it is still in
scope when we attempt to call `insert` after the match.

The workaround for this problem is a bit more involved. It relies on
the fact that the borrow checker uses the precise control-flow of the
function to determine which borrows are in scope.

```rust
fn get_default2<'r, K: Hash + Eq + Copy, V: Default>(
    map: &'r mut HashMap<K, V>,
    key: K,
) -> &'r mut V {
    if map.contains_key(&key) {
    // ^~~~~~~~~~~~~~~~~~~~~ 'n
        return match map.get_mut(&key) { // + 'r
            Some(value) => value,        // |
            None => unreachable!()       // |
        };                               // v
    }

    // At this point, `map.get_mut` was never
    // called! (As opposed to having been called,
    // but its result no longer being in use.)
    map.insert(key, V::default()); // OK now.
    map.get_mut(&key).unwrap()
}
```

What has changed here is that we moved the call to `map.get_mut`
inside of an `if`, and we have set things up so that the if body
unconditionally returns.
What this means is that a borrow begins at
the point of `get_mut`, and that borrow lasts until the point `'r` in
the caller, but the borrow checker can see that this borrow *will not
have even started* outside of the `if`. It does not consider the
borrow in scope at the point where we call `map.insert`.

This workaround is more troublesome than the others, because the
resulting code is actually less efficient at runtime, since it must do
multiple lookups.

It's worth noting that Rust's hashmaps include an `entry` API that
one could use to implement this function today. The resulting code is
both nicer to read and more efficient even than the original version,
since it avoids extra lookups on the "not present" path as well:

```rust
fn get_default3<'r, K: Hash + Eq, V: Default>(
    map: &'r mut HashMap<K, V>,
    key: K,
) -> &'r mut V {
    map.entry(key)
       .or_insert_with(|| V::default())
}
```

Regardless, the problem exists for other data structures besides
`HashMap`, so it would be nice if the original code passed the borrow
checker, even if in practice using the `entry` API would be
preferable. (Interestingly, the limitation of the borrow checker here
was one of the motivations for developing the `entry` API in the first
place!)

## Problem case #4: mutating `&mut` references

The current borrow checker forbids reassigning an `&mut` variable `x`
when the referent (`*x`) has been borrowed. This most commonly arises
when writing a loop that progressively "walks down" a data structure.
Consider this function, which converts a linked list `&mut List<T>`
into a `Vec<&mut T>`:

```rust
struct List<T> {
    value: T,
    next: Option<Box<List<T>>>,
}

fn to_refs<T>(mut list: &mut List<T>) -> Vec<&mut T> {
    let mut result = vec![];
    loop {
        result.push(&mut list.value);
        if let Some(n) = list.next.as_mut() {
            list = n;
        } else {
            return result;
        }
    }
}
```

If we attempt to compile this, we get an error (actually, we get
multiple errors):

```
error[E0506]: cannot assign to `list` because it is borrowed
  --> /Users/nmatsakis/tmp/x.rs:11:13
   |
9  |         result.push(&mut list.value);
   |                          ---------- borrow of `list` occurs here
10 |         if let Some(n) = list.next.as_mut() {
11 |             list = n;
   |             ^^^^^^^^ assignment to borrowed `list` occurs here
```

Specifically, what's gone wrong is that we borrowed `list.value` (or,
more explicitly, `(*list).value`). The current borrow checker enforces
the rule that when you borrow a path, you cannot assign to that path
or any prefix of that path. In this case, that means you cannot assign to any
of the following:

- `(*list).value`
- `*list`
- `list`

As a result, the `list = n` assignment is forbidden. These rules make
sense in some cases (for example, if `list` were of type `List<T>`,
and not `&mut List<T>`, then overwriting `list` would also overwrite
`list.value`), but not in the case where we cross a mutable reference.

As described in [Issue #10520][10520], there exist various workarounds
for this problem.
One trick is to move the `&mut` reference into a
temporary variable that you won't have to modify:

```rust
fn to_refs<T>(mut list: &mut List<T>) -> Vec<&mut T> {
    let mut result = vec![];
    loop {
        let list1 = list;
        result.push(&mut list1.value);
        if let Some(n) = list1.next.as_mut() {
            list = n;
        } else {
            return result;
        }
    }
}
```

When you frame the program this way, the borrow checker sees that
`(*list1).value` is borrowed (not `list`). This does not prevent us
from later assigning to `list`.

Clearly this workaround is annoying. The problem here, it turns out,
is not specific to non-lexical lifetimes per se. Rather, it is that
the rules which the borrow checker enforces when a path is borrowed
are too strict and do not account for the indirection inherent in a
borrowed reference. This RFC proposes a tweak to address that.

## The rough outline of our solution

This RFC proposes a more flexible model for lifetimes. Whereas
previously lifetimes were based on the abstract syntax tree, we now
propose lifetimes that are defined via the control-flow graph. More
specifically, lifetimes will be derived based on the [MIR][MIR-details]
used internally in the compiler.

[MIR-details]: https://blog.rust-lang.org/2016/04/19/MIR.html

Intuitively, in the new proposal, the lifetime of a reference lasts
only for those portions of the function in which the reference may
later be used (where the reference is **live**, in compiler
speak). This can range from a few sequential statements (as in problem
case #1) to something more complex, such as covering one arm in a
match but not the others (problem case #2).
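
To illustrate a lifetime that covers one match arm but not the other, here is problem case #2 with the elided pieces filled in by hypothetical choices: keys are `String`, values are `i32`, and `process` simply increments the value. Under liveness-based lifetimes, the `&mut i32` returned by `get_mut` is live only in the `Some` arm, so `map` is no longer borrowed when the `None` arm calls `insert`. This is a sketch under those assumptions, not code taken from the prototype.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `process`: increment the stored value.
fn process(value: &mut i32) {
    *value += 1;
}

/// Problem case #2 with concrete types. The reference produced by
/// `get_mut` is only used in the `Some` arm, so with non-lexical
/// lifetimes the borrow of `map` does not extend into the `None` arm.
fn process_or_default(map: &mut HashMap<String, i32>, key: String) {
    match map.get_mut(&key) {
        Some(value) => process(value),
        None => {
            map.insert(key, i32::default());
        }
    }
}
```

Calling `process_or_default` twice with the same key first inserts `0` and then increments it to `1`.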

However, in order to successfully type the full range of examples that
we would like, we have to go a bit further than just changing
lifetimes to a portion of the control-flow graph. **We also have to
take location into account when doing subtyping checks**. This is in
contrast to how the compiler works today, where subtyping relations
are "absolute". That is, in the current compiler, the type `&'a ()` is
a subtype of the type `&'b ()` whenever `'a` outlives `'b` (`'a: 'b`),
which means that `'a` corresponds to a bigger portion of the function.
Under this proposal, subtyping can instead be established **at a
particular point P**. In that case, the lifetime `'a` must only
outlive those portions of `'b` that are reachable from P.

The ideas in this RFC have been implemented in
[prototype form][proto]. This prototype includes a simplified
control-flow graph that allows one to create the various kinds of
region constraints that can arise and implements the region inference
algorithm which then solves those constraints.

[proto]: https://github.com/nikomatsakis/nll

# Detailed design
[design]: #detailed-design

## Layering the design

We describe the design in "layers":

1. Initially, we will describe a basic design focused on control-flow
   within one function.
2. Next, we extend the control-flow graph to better handle infinite loops.
3. Next, we extend the design to handle dropck, and specifically the
   `#[may_dangle]` attribute introduced by RFC 1327.
4. Next, we will extend the design to consider named lifetime parameters,
   like those in problem case 3.
5. Finally, we give a brief description of the borrow checker.

## Layer 0: Definitions

Before we can describe the design, we have to define the terms that we
will be using.
The RFC is defined in terms of a simplified version of
MIR, eliding various details that don't introduce fundamental
complexity.

**Lvalues**. A MIR "lvalue" is a path that leads to a memory location.
The full MIR Lvalues are defined [via a Rust enum][lvaluecode] and
contain a number of knobs, most of which are not relevant for this RFC.
We will present a simplified form of lvalues for now:

```
LV = x       // local variable
   | LV.f    // field access
   | *LV     // deref
```

The precedence of `*` is low, so `*a.b.c` will deref `a.b.c`; to deref
just `a`, one would write `(*a).b.c`.

**Prefixes.** We say that the prefixes of an lvalue are all the
lvalues you get by stripping away fields and derefs. The prefixes
of `*a.b` would be `*a.b`, `a.b`, and `a`.

[lvaluecode]: https://github.com/rust-lang/rust/blob/bf0a9e0b4d3a4dd09717960840798e2933ec7568/src/librustc/mir/mod.rs#L839-L851

**Control-flow graph.** MIR is organized into a
[control-flow graph][cfg] rather than an abstract syntax tree. It is
created in the compiler by transforming the "HIR" (high-level IR). The
MIR CFG consists of a set of [basic blocks][bbdata]. Each basic block
has a series of [statements][stmt] and a
[terminator][term]. Statements that concern us in this RFC fall into
three categories:

- assignments like `x = y`; the RHS of such an assignment is called an
  [rvalue][]. There are no compound rvalues, and hence each statement
  is a discrete action that executes instantaneously. For example, the
  Rust expression `a = b + c + d` would be compiled into two MIR
  instructions, like `tmp0 = b + c; a = tmp0 + d;`.
- `drop(lvalue)` deallocates an lvalue, if there is a value in it; in the
  limit, this requires runtime checks (a MIR pass called elaborate drops
  performs this transformation).
- `StorageDead(x)` deallocates the stack storage for `x`. These are used by LLVM to allow
  stack-allocated values to use the same stack slot (if their live storage ranges are disjoint).
  [Ralf Jung's recent blog post has more details.][rjung-sd]

[rjung-sd]: https://www.ralfj.de/blog/2017/06/06/MIR-semantics.html
[rvalue]: https://github.com/rust-lang/rust/blob/bf0a9e0b4d3a4dd09717960840798e2933ec7568/src/librustc/mir/mod.rs#L1037-L1071
[bbdata]: https://github.com/rust-lang/rust/blob/bf0a9e0b4d3a4dd09717960840798e2933ec7568/src/librustc/mir/mod.rs#L443-L463
[stmt]: https://github.com/rust-lang/rust/blob/bf0a9e0b4d3a4dd09717960840798e2933ec7568/src/librustc/mir/mod.rs#L774-L814
[term]: https://github.com/rust-lang/rust/blob/bf0a9e0b4d3a4dd09717960840798e2933ec7568/src/librustc/mir/mod.rs#L465-L552
[cfg]: https://en.wikipedia.org/wiki/Control_flow_graph

## Layer 1: Control-flow within a function

### Running Example

We will explain the design with reference to a running example, called
**Example 4**. After presenting the design, we will apply it to the three
problem cases, as well as a number of other interesting examples.

```rust
let mut foo: T = ...;
let mut bar: T = ...;
let mut p: &T;

p = &foo;
// (0)
if condition {
    print(*p);
    // (1)
    p = &bar;
    // (2)
}
// (3)
print(*p);
// (4)
```

The key point of this example is that the variable `foo` should only
be considered borrowed at points 0 and 3, but not point 1. `bar`,
in contrast, should be considered borrowed at points 2 and 3. Neither
of them need to be considered borrowed at point 4, as the reference `p`
is not used there.

We can convert this example into the control-flow graph that follows.
Recall that a control-flow graph in MIR consists of basic blocks
containing a list of discrete statements and a trailing terminator:

```
// let mut foo: T;
// let mut bar: T;
// let mut p: &T;

A
[ p = &foo     ]
[ if condition ] ----\ (true)
       |             |
       |     B       v
       |     [ print(*p)     ]
       |     [ ...           ]
       |     [ p = &bar      ]
       |     [ ...           ]
       |     [ goto C        ]
       |             |
       +-------------/
       |
C      v
[ print(*p)    ]
[ return       ]
```

We will use a notation like `Block/Index` to refer to a specific
statement or terminator in the control-flow graph. `A/0` and `B/4`
refer to `p = &foo` and `goto C`, respectively.

### What is a lifetime and how does it interact with the borrow checker

To start with, we will consider lifetimes as a **set of points in the
control-flow graph**; later in the RFC we will extend the domain of
these sets to include "skolemized" lifetimes, which correspond to
named lifetime parameters declared on a function. If a lifetime
contains the point P, that implies that references with that lifetime
are valid on entry to P. Lifetimes appear in various places in the MIR
representation:

- The types of variables (and temporaries, etc) may contain lifetimes.
- Every borrow expression has a designated lifetime.

We can extend our example 4 to include explicit lifetime names. There
are three lifetimes that result. We will call them `'p`, `'foo`, and
`'bar`:

```rust
let mut foo: T = ...;
let mut bar: T = ...;
let mut p: &'p T;
//          --
p = &'foo foo;
//   ----
if condition {
    print(*p);
    p = &'bar bar;
    //   ----
}
print(*p);
```

As you can see, the lifetime `'p` is part of the type of the variable
`p`.
It indicates the portions of the control-flow graph where `p` can
safely be dereferenced. The lifetimes `'foo` and `'bar` are different:
they refer to the lifetimes for which `foo` and `bar` are borrowed,
respectively.

Lifetimes attached to a borrow expression, like `'foo` and `'bar`, are
important to the borrow checker. Those correspond to the portions of
the control-flow graph in which the borrow checker will enforce its
restrictions. In this case, since both borrows are shared borrows
(`&`), the borrow checker will prevent `foo` from being modified
during `'foo` and it will prevent `bar` from being modified during
`'bar`. If these had been mutable borrows (`&mut`), the borrow checker
would have prevented **all** access to `foo` and `bar` during those
lifetimes.

There are many valid choices one could make for `'foo` and `'bar`.
This RFC, however, describes an inference algorithm that aims to pick
the **minimal** lifetimes for each borrow which could possibly work.
This corresponds to imposing the fewest restrictions we can.

In the case of example 4, therefore, we wish our algorithm to compute
that `'foo` is `{A/1, B/0, C/0}`, which notably excludes the points B/1
through B/4. `'bar` should be inferred to the set `{B/3, B/4,
C/0}`. The lifetime `'p` will be the union of `'foo` and `'bar`, since
it contains all the points where the variable `p` is valid.

### Lifetime inference constraints

The inference algorithm works by analyzing the MIR and creating a
series of **constraints**.
These constraints obey the following
grammar:

```
// A constraint set C:
C = true
  | C, (L1: L2) @ P    // Lifetime L1 outlives Lifetime L2 at point P

// A lifetime L:
L = 'a
  | {P}
```

Here the terminal `P` represents a point in the control-flow graph,
and the notation `'a` refers to some named lifetime inference variable
(e.g., `'p`, `'foo` or `'bar`).

Once the constraints are created, the **inference algorithm** solves
the constraints. This is done via fixed-point iteration: each
lifetime variable begins as an empty set and we iterate over the
constraints, repeatedly growing the lifetimes until they are big enough
to satisfy all constraints.

(If you'd like to compare this to the prototype code, the file
[`regionck.rs`] is responsible for creating the constraints, and
[`infer.rs`] is responsible for solving them.)

[`regionck.rs`]: https://github.com/nikomatsakis/nll/blob/master/nll/src/regionck.rs
[`infer.rs`]: https://github.com/nikomatsakis/nll/blob/master/nll/src/infer.rs

### Liveness

One key ingredient to understanding how NLL should work is
understanding **liveness**. The term "liveness" derives from compiler
analysis, but it's fairly intuitive. We say that **a variable is live
if the current value that it holds may be used later**. This is very
important to Example 4:

```rust
let mut foo: T = ...;
let mut bar: T = ...;
let mut p: &'p T = &foo;
// `p` is live here: its value may be used on the next line.
if condition {
    // `p` is live here: its value will be used on the next line.
    print(*p);
    // `p` is DEAD here: its value will not be used.
    p = &bar;
    // `p` is live here: its value will be used later.
}
// `p` is live here: its value may be used on the next line.
print(*p);
// `p` is DEAD here: its value will not be used.
```

Here you see a variable `p` that is assigned in the beginning of the
program, and then maybe re-assigned during the `if`. The key point is
that `p` becomes **dead** (not live) in the span before it is
reassigned. This is true even though the variable `p` will be used
again, because the **value** that is in `p` will not be used.

Traditional compilers compute liveness based on variables, but we wish
to compute liveness for **lifetimes**. We can extend a variable-based
analysis to lifetimes by saying that a lifetime L is live at a point P
if there is some variable `p` which is live at P, and L appears in the
type of `p`. (Later on, when we cover the dropck, we will use a more
selective notion of liveness for lifetimes in which *some* of the
lifetimes in a variable's type may be live while others are not.) So,
in our running example, the lifetime `'p` would be live at precisely
the same points that `p` is live. The lifetimes `'foo` and `'bar` have
no points where they are (directly) live, since they do not appear in
the types of any variables.

This does not mean, however, that these lifetimes are irrelevant: as
shown below, subtyping constraints introduced by subsequent
analyses will eventually require `'foo` and `'bar` to *outlive*
`'p`.

#### Liveness-based constraints for lifetimes

The first set of constraints that we generate is derived from
liveness. Specifically, if a lifetime L is live at the point P,
then we will introduce a constraint like:

    (L: {P}) @ P

(As we'll see later when we cover solving constraints, this constraint
effectively just inserts `P` into the set for `L`.
In fact, the
prototype doesn't bother to materialize such constraints, instead just
immediately inserting `P` into `L`.)

For our running example, this means that we would introduce the following
liveness constraints:

    ('p: {A/1}) @ A/1
    ('p: {B/0}) @ B/0
    ('p: {B/3}) @ B/3
    ('p: {B/4}) @ B/4
    ('p: {C/0}) @ C/0

### Subtyping

Whenever references are copied from one location to another, the Rust
subtyping rules require that the lifetime of the source reference
**outlives** the lifetime of the target location. As discussed
earlier, in this RFC, we extend the notion of subtyping to be
**location-aware**, meaning that we take into account the point where
the value is being copied.

For example, at the point A/0, our running example contains a borrow
expression `p = &'foo foo`. In this case, the borrow expression will
produce a reference of type `&'foo T`, where `T` is the type of
`foo`. This value is then assigned to `p`, which has the type `&'p T`.
Therefore, we wish to require that `&'foo T` be a subtype of `&'p T`.
Moreover, this relation needs to hold at the point A/1 -- the
**successor** of the point A/0 where the assignment occurs (this is
because the new value of `p` is first visible in A/1).
We write that subtyping constraint as follows:

    (&'foo T <: &'p T) @ A/1

The standard Rust subtyping rules (two examples of which are given
below) can then "break down" this subtyping rule into the lifetime
constraints we need for inference:

    (T_a <: T_b) @ P
    ('a: 'b) @ P      // <-- a constraint for our inference algorithm
    ------------------------
    (&'a T_a <: &'b T_b) @ P

    (T_a <: T_b) @ P
    (T_b <: T_a) @ P  // (&mut T is invariant)
    ('a: 'b) @ P      // <-- another constraint
    ------------------------
    (&'a mut T_a <: &'b mut T_b) @ P

In the case of our running example, we generate the following subtyping
constraints:

    (&'foo T <: &'p T) @ A/1
    (&'bar T <: &'p T) @ B/3

These can be converted into the following lifetime constraints:

    ('foo: 'p) @ A/1
    ('bar: 'p) @ B/3

### Reborrow constraints

There is one final source of constraints. It frequently happens that we
have a borrow expression that "reborrows" the referent of an
existing reference:

    let x: &'x i32 = ...;
    let y: &'y i32 = &*x;

In such cases, there is a connection between the lifetime `'y` of the
borrow and the lifetime `'x` of the original reference. In particular,
`'x` must outlive `'y` (`'x: 'y`). In simple cases like this, the
relationship is the same regardless of whether the original reference
`x` is a shared (`&`) or mutable (`&mut`) reference. However, in more
complex cases that involve multiple dereferences, the treatment is
different.

**Supporting prefixes.** To define the reborrow constraints, we first
introduce the idea of supporting prefixes -- this definition will be
useful in a few places.
The *supporting prefixes* for an lvalue are
formed by stripping away fields and derefs, except that we stop once
we reach (and include) the deref of a shared reference. Intuitively, shared
references are different because they are `Copy` -- and hence one
could always copy the shared reference into a temporary and get an
equivalent path. Here are some examples of supporting prefixes:

```
let r: (&(i32, i64), (f32, f64));

// The path (*r.0).1 has type `i64` and supporting prefixes:
// - (*r.0).1
// - *r.0

// The path r.1.0 has type `f32` and supporting prefixes:
// - r.1.0
// - r.1
// - r

let m: (&mut (i32, i64), (f32, f64));

// The path (*m.0).1 has type `i64` and supporting prefixes:
// - (*m.0).1
// - *m.0
// - m.0
// - m
```

**Reborrow constraints.** Consider the case where we have a borrow
(shared or mutable) of some lvalue `lv_b` for the lifetime `'b`:

    lv_l = &'b lv_b      // or:
    lv_l = &'b mut lv_b

In that case, we compute the supporting prefixes of `lv_b`, and find
every deref lvalue `*lv` in the set where `lv` is a reference with
lifetime `'a`. We then add a constraint `('a: 'b) @ P`, where `P` is
the point following the borrow (that's the point where the borrow
takes effect).

Let's look at some examples. In each case, we will link to the
corresponding test from the prototype implementation.

[**Example 1.**][bck-rvwbi] To see why this rule is needed, let's
first consider a simple example involving a single reference:

[bck-rvwbi]: https://github.com/nikomatsakis/nll/blob/master/test/borrowck-read-variable-while-borrowed-indirect.nll

```rust
let mut foo: i32 = 22;
let r_a: &'a mut i32 = &'a mut foo;
let r_b: &'b mut i32 = &'b mut *r_a;
...
use(r_b);
```

In this case, the supporting prefixes of `*r_a` are `*r_a` and `r_a`
(because `r_a` is a mutable reference, we recurse). Only one of those,
`*r_a`, is a deref lvalue, and the reference `r_a` being dereferenced
has the lifetime `'a`. We would add the constraint that `'a: 'b`,
thus ensuring that `foo` is considered borrowed so long as `r_b` is in
use. Without this constraint, the lifetime `'a` would end after the
second borrow, and hence `foo` would be considered unborrowed, even
though `*r_b` could still be used to access `foo`.

[**Example 2.**][bck-wvare] Consider now a case with a double indirection:

[bck-wvare]: https://github.com/nikomatsakis/nll/blob/master/test/borrowck-write-variable-after-ref-extracted.nll

```rust
let mut foo: i32 = 22;
let mut r_a: &'a i32 = &'a foo;
let r_b: &'b &'a i32 = &'b r_a;
let r_c: &'c i32 = &'c **r_b;
// What is considered borrowed here?
use(r_c);
```

Just as before, it is important that, so long as `r_c` is in use,
`foo` is considered borrowed. However, what about the variable `r_a`:
should *it* be considered borrowed? The answer is no: once `r_c` is
initialized, the value of `r_a` is no longer important, and it would
be fine to (for example) overwrite `r_a` with a new value, even as
`foo` is still considered borrowed. This result falls out from our
reborrowing rules: the only supporting prefix of `**r_b` is `**r_b` itself.
We do not add any more paths because this path is already a
dereference of `*r_b`, and `*r_b` has (shared reference) type `&'a i32`.
Therefore, we would add one reborrow constraint: that `'a: 'c`.
This constraint ensures that as long as `r_c` is in use, the borrow of
`foo` remains in force, but the borrow of `r_a` (which has the
lifetime `'b`) can expire.
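
The supporting-prefix rule, and the reborrow constraints derived from it, are mechanical enough to sketch in a few lines of Rust. What follows is a minimal illustration, not the prototype's code. Lvalues are modeled by a hypothetical `Lvalue` enum, and the helpers `var`, `field`, and `deref` are likewise hypothetical. Each `Deref` node is assumed to carry the kind (`&` or `&mut`) and the lifetime of the reference being dereferenced; a real checker would read these off the type. The constraint point `P` is left implicit.

```rust
/// Whether the dereferenced reference is shared (`&`) or mutable (`&mut`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum RefKind {
    Shared,
    Mut,
}

/// A toy lvalue (hypothetical encoding). `Deref(base, kind, lt)` means `*base`,
/// where `base` is a reference of the given kind with lifetime `lt`.
#[derive(Clone, Debug, PartialEq)]
enum Lvalue {
    Var(&'static str),
    Field(Box<Lvalue>, usize),
    Deref(Box<Lvalue>, RefKind, &'static str),
}

fn var(name: &'static str) -> Lvalue {
    Lvalue::Var(name)
}

fn field(base: Lvalue, index: usize) -> Lvalue {
    Lvalue::Field(Box::new(base), index)
}

fn deref(base: Lvalue, kind: RefKind, lifetime: &'static str) -> Lvalue {
    Lvalue::Deref(Box::new(base), kind, lifetime)
}

/// Strip fields and derefs one at a time. A deref of a shared reference is
/// itself a supporting prefix, but we do not look past it.
fn supporting_prefixes(lv: &Lvalue) -> Vec<Lvalue> {
    let mut prefixes = Vec::new();
    let mut current = Some(lv.clone());
    while let Some(prefix) = current {
        current = match &prefix {
            Lvalue::Var(_) => None,
            Lvalue::Field(base, _) => Some((**base).clone()),
            Lvalue::Deref(_, RefKind::Shared, _) => None,
            Lvalue::Deref(base, RefKind::Mut, _) => Some((**base).clone()),
        };
        prefixes.push(prefix);
    }
    prefixes
}

/// For a borrow `&'b lv_b` (or `&'b mut lv_b`): every deref `*lv` among the
/// supporting prefixes of `lv_b`, where `lv` has lifetime `'a`, yields `'a: 'b`.
/// Constraints are returned as `(longer, shorter)` pairs.
fn reborrow_constraints(lv_b: &Lvalue, b: &'static str) -> Vec<(&'static str, &'static str)> {
    supporting_prefixes(lv_b)
        .iter()
        .filter_map(|prefix| match prefix {
            Lvalue::Deref(_, _, a) => Some((*a, b)),
            _ => None,
        })
        .collect()
}
```

For example, encode `**q` from the next example as `deref(deref(var("q"), RefKind::Mut, "'q"), RefKind::Mut, "'p")`. Calling `reborrow_constraints` on it with `"'r"` should yield both `'p: 'r` and `'q: 'r`. By contrast, the shared double indirection of Example 2 yields only `'a: 'c`.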

[**Example 3.**][bck-rrwrmb] The previous example showed how a borrow
of a shared reference can expire once it has been dereferenced. With
mutable references, however, this is not safe. Consider the following example:

[bck-rrwrmb]: https://github.com/nikomatsakis/nll/blob/master/test/borrowck-read-ref-while-referent-mutably-borrowed.nll

```rust
let mut foo = Foo { ... };
let mut p: &'p mut Foo = &mut foo;
let q: &'q mut &'p mut Foo = &mut p;
let r: &'r mut Foo = &mut **q;
use(*p); // <-- This line should result in an ERROR
use(r);
```

The key point here is that we create a reference `r` by reborrowing
`**q`; `r` is then later used in the final line of the program. This
use of `r` must extend the lifetime of the borrows used to create
*both* `p` *and* `q`. Otherwise, one could access (and mutate) the
same memory through both `*r` and `*p`. (In fact, rustc did have a
soundness bug much like this one in its early days.)

Because dereferencing a mutable reference does not stop the supporting
prefixes from being enumerated, the supporting prefixes of `**q` are
`**q`, `*q`, and `q`. Therefore, we add two reborrow constraints:
`'q: 'r` and `'p: 'r`, and hence both borrows are indeed considered in
scope at the line in question.

As an alternate way of looking at the previous example, consider it
like this. To create the mutable reference `p`, we get a "lock" on
`foo` (that lasts so long as `p` is in use). We then take a lock on
the mutable reference `p` to create `q`; this lock must last for as
long as `q` is in use. When we create `r` by borrowing `**q`, that is
the last direct use of `q` -- so you might think we can release the
lock on `p`, since `q` is no longer in (direct) use.
However, that
would be unsound, since then `r` and `*p` could both be used to access
the same memory. The key is to recognize that `r` represents an
indirect use of `q` (and `q` in turn is an indirect use of `p`), and
hence so long as `r` is in use, `p` and `q` must also be considered "in
use" (and hence their "locks" still enforced).

### Solving constraints

Once the constraints are created, the **inference algorithm** solves
them. This is done via fixed-point iteration: each
lifetime variable begins as an empty set, and we iterate over the
constraints, repeatedly growing the lifetimes until they are big enough
to satisfy all constraints.

The meaning of a constraint like `('a: 'b) @ P` is that, starting from
the point P, the lifetime `'a` must include all points in `'b` that
are reachable from the point P. The implementation
[does a depth-first search starting from P][dfs]; the search stops if
we exit the lifetime `'b`. Otherwise, for each point we find, we add
it to `'a`.

In our example, the full set of constraints is:

    ('foo: 'p) @ A/1
    ('bar: 'p) @ B/3
    ('p: {A/1}) @ A/1
    ('p: {B/0}) @ B/0
    ('p: {B/3}) @ B/3
    ('p: {B/4}) @ B/4
    ('p: {C/0}) @ C/0

Solving these constraints results in the following lifetimes,
which are precisely the answers we expected:

    'p   = {A/1, B/0, B/3, B/4, C/0}
    'foo = {A/1, B/0, C/0}
    'bar = {B/3, B/4, C/0}

[dfs]: https://github.com/nikomatsakis/nll/blob/1cff361c9aeb6f553b528078866f5717f1872dad/nll/src/infer.rs#L71-L113

### Intuition for why this algorithm is correct

For the algorithm to be correct, there is a critical invariant that we
must maintain.
Consider some path H that is borrowed with lifetime L
at a point P to create a reference R; this reference R (or some
copy/move of it) is then later dereferenced at some point Q.

We must ensure that the reference has not been invalidated: this means
that the memory which was borrowed must not have been freed by the
time we reach Q. If the reference R is a shared reference (`&T`), then
the memory must also not have been written (modulo `UnsafeCell`). If
the reference R is a mutable reference (`&mut T`), then the memory
must not have been accessed at all, except through the reference R.
**To guarantee these properties, we must prevent actions that might
affect the borrowed memory for all of the points between P (the
borrow) and Q (the use).**

This means that L must include at least all the points between P and
Q. There are two cases to consider. First, the case where the access
at point Q occurs through the same reference R that was created by
the borrow:

    R = &H; // point P
    ...
    use(R); // point Q

In this case, the variable R will be **live** on all the points
between P and Q. The liveness-based rules suffice for this case:
specifically, because the type of R includes the lifetime L, we know
that L must include all the points between P and Q, since R is live
there.

The second case is when the memory referenced by R is accessed, but
through an alias (or move):

    R = &H;  // point P
    R2 = R;  // last use of R, point A
    ...
    use(R2); // point Q

In this case, the liveness rules alone do not suffice. The problem is
that the `R2 = R` assignment may well be the last use of R, and so the
**variable** R is dead at this point.
However, the *value* in R will
still be dereferenced later (through R2), and hence we want the
lifetime L to include those points. This is where the **subtyping
constraints** come into play: the type of R2 includes a lifetime L2,
and the assignment `R2 = R` will establish an outlives constraint
`(L: L2) @ A` between L and L2. Moreover, this new variable R2 must be
live between the assignment and the ultimate use (that is, along the
path A...Q). Putting these two facts together, we see that L will
ultimately include the points from P to A (because of the liveness of
R) and the points from A to Q (because the subtyping requirement
propagates the liveness of R2).

Note that it is possible for these lifetimes to have gaps. This can occur
when the same variable is used and overwritten multiple times:

    let R: &L i32;
    let R2: &L2 i32;

    R = &H1;  // point P1
    R2 = R;   // point A1
    use(R2);  // point Q1
    ...
    R2 = &H2; // point P2
    use(R2);  // point Q2

In this example, the liveness constraints on R2 will ensure that L2
(the lifetime in its type) includes Q1 and Q2 (because R2 is live at
those two points), but not the "..." nor the points P1 or P2. Note
that the subtyping relationship (`(L: L2) @ A1`) at A1 here ensures
that L also includes Q1, but doesn't require that L includes Q2 (even
though L2 has point Q2). This is because the value in R2 at Q2 cannot
have come from the assignment at A1; if it could have, then
either R2 would have to be live between A1 and Q2 or else there would
be a subtyping constraint.

### Other examples

Let us work through some more examples. We begin with problem cases #1
and #2 (problem case #3 will be covered after we cover named lifetimes
in a later section).
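
Before working through them, it may help to have the fixed-point procedure from "Solving constraints" in executable form, so that each solution can be checked mechanically. This is a sketch under stated assumptions, not the prototype's `infer.rs`. Points are encoded as `(block, index)` pairs. The CFG is a hypothetical successor map built by `Cfg::from_edges`. Every constraint is simply re-applied until no lifetime grows.

```rust
use std::collections::{BTreeSet, HashMap};

/// A CFG point such as A/1, encoded as (block name, statement index).
type Point = (&'static str, usize);

/// Hypothetical stand-in for the MIR CFG: successor edges between points.
struct Cfg {
    succ: HashMap<Point, Vec<Point>>,
}

impl Cfg {
    fn from_edges(edges: &[(Point, Point)]) -> Cfg {
        let mut succ: HashMap<Point, Vec<Point>> = HashMap::new();
        for &(from, to) in edges {
            succ.entry(from).or_default().push(to);
        }
        Cfg { succ }
    }
}

enum Constraint {
    /// `(L: {P}) @ P`: a liveness constraint.
    Live(&'static str, Point),
    /// `('a: 'b) @ P`: an outlives constraint.
    Outlives(&'static str, &'static str, Point),
}

/// Start every lifetime as the empty set and grow it until all constraints hold.
fn solve(cfg: &Cfg, constraints: &[Constraint]) -> HashMap<&'static str, BTreeSet<Point>> {
    let mut regions: HashMap<&'static str, BTreeSet<Point>> = HashMap::new();
    let mut changed = true;
    while changed {
        changed = false;
        for constraint in constraints {
            match constraint {
                Constraint::Live(l, p) => {
                    changed |= regions.entry(*l).or_default().insert(*p);
                }
                Constraint::Outlives(a, b, p) => {
                    // Depth-first search from P that stops as soon as it leaves 'b;
                    // every point reached inside 'b is added to 'a.
                    let b_points = regions.get(b).cloned().unwrap_or_default();
                    let mut visited = BTreeSet::new();
                    let mut stack = vec![*p];
                    while let Some(q) = stack.pop() {
                        if !b_points.contains(&q) || !visited.insert(q) {
                            continue;
                        }
                        changed |= regions.entry(*a).or_default().insert(q);
                        if let Some(next) = cfg.succ.get(&q) {
                            stack.extend(next.iter().copied());
                        }
                    }
                }
            }
        }
    }
    regions
}
```

As a usage example, take the running example's CFG. We assume the CFG drawn earlier in this RFC has the edges A/0 -> A/1, A/1 -> B/0 and C/0, B/0 -> ... -> B/4 -> C/0, and C/0 -> C/1. Feeding in the seven constraints listed under "Solving constraints" then reproduces `'p = {A/1, B/0, B/3, B/4, C/0}`, `'foo = {A/1, B/0, C/0}`, and `'bar = {B/3, B/4, C/0}`. Re-running every constraint until nothing changes is simple but naive. A worklist that revisits only the constraints mentioning a lifetime that just grew would be the obvious refinement.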

#### Problem case #1.

Translated into MIR, the example will look roughly as follows:

```rust
let mut data: Vec<char>;
let slice: &'slice mut [char];
START {
    data = ...;
    slice = &'borrow mut data;
    capitalize(slice);
    data.push('d');
    data.push('e');
    data.push('f');
}
```

The constraints generated will be as follows:

    ('slice: {START/2}) @ START/2
    ('borrow: 'slice) @ START/2

Both `'slice` and `'borrow` will therefore be inferred to be `{START/2}`, and
hence the accesses to `data` in START/3 and the following statements
are permitted.

#### Problem case #2.

Translated into MIR, the example will look roughly as follows (some
irrelevant details are elided). Note that the `match` statement is
translated into a SWITCH, which tests the variant, and a "downcast",
which lets us extract the contents out from the `Some` variant (this
operation is specific to MIR and has no Rust equivalent, other than as
part of a match).

```
let map: HashMap<K, V>;
let key: K;
let tmp0: &'tmp0 mut HashMap<K, V>;
let tmp1: &K;
let tmp2: Option<&'tmp2 mut V>;
let value: &'value mut V;

START {
    /*0*/ map = ...;
    /*1*/ key = ...;
    /*2*/ tmp0 = &'map mut map;
    /*3*/ tmp1 = &key;
    /*4*/ tmp2 = HashMap::get_mut(tmp0, tmp1);
    /*5*/ SWITCH tmp2 { None => NONE, Some => SOME }
}

NONE {
    /*0*/ ...
    /*1*/ goto EXIT;
}

SOME {
    /*0*/ value = tmp2.downcast.0;
    /*1*/ process(value);
    /*2*/ goto EXIT;
}

EXIT {
}
```

The following liveness constraints are generated:

    ('tmp0: {START/3}) @ START/3
    ('tmp0: {START/4}) @ START/4
    ('tmp2: {START/5}) @ START/5
    ('tmp2: {SOME/0}) @ SOME/0
    ('value: {SOME/1}) @ SOME/1

(`tmp2` is live at START/5 because the SWITCH reads it. Without that
point in `'tmp2`, the search for `('tmp0: 'tmp2) @ START/5` would stop
immediately, and the borrow of `map` would not extend into the `Some` arm.)

The following subtyping-based constraints are generated:

    ('map: 'tmp0) @ START/3
    ('tmp0: 'tmp2) @ START/5
    ('tmp2: 'value) @ SOME/1

Ultimately, the lifetime we are most interested in is `'map`,
which indicates the duration for which `map` is borrowed. If we solve
the constraints above, we will get:

    'map   == {START/3, START/4, START/5, SOME/0, SOME/1}
    'tmp0  == {START/3, START/4, START/5, SOME/0, SOME/1}
    'tmp2  == {START/5, SOME/0, SOME/1}
    'value == {SOME/1}

These results indicate that `map` **can** be mutated in the `None`
arm; `map` could also be mutated in the `Some` arm, but only after
`process()` is called (i.e., starting at SOME/2). This is the desired
result.

#### Example 4, invariant

It's worth looking at a variant of our running example ("Example 4").
This is the same pattern as before, but instead of using `&'a T`
references, we use values of type `Foo<'a>`, which is **invariant** with
respect to `'a`. This means that the `'a` lifetime in a `Foo<'a>`
value cannot be approximated (i.e., you can't make it shorter, as you
can with a normal reference). Usually invariance arises because of
mutability (e.g., `Foo<'a>` might have a field of type `Cell<&'a ()>`).
The key point here is that invariance actually makes **no
difference at all** to the outcome. This is true because of
location-based subtyping.

```rust
let mut foo: T = ...;
let mut bar: T = ...;
let p: Foo<'p>;

p = Foo::new(&foo);
if condition {
    print(*p);
    p = Foo::new(&bar);
}
print(*p);
```

Effectively, we wind up with the same constraints as before, but where
we only had `'foo: 'p`/`'bar: 'p` constraints before (due to subtyping), we now
also have `'p: 'foo` and `'p: 'bar` constraints:

    ('foo: 'p) @ A/1
    ('p: 'foo) @ A/1
    ('bar: 'p) @ B/3
    ('p: 'bar) @ B/3
    ('p: {A/1}) @ A/1
    ('p: {B/0}) @ B/0
    ('p: {B/3}) @ B/3
    ('p: {B/4}) @ B/4
    ('p: {C/0}) @ C/0

The key point is that the new constraints don't affect the final answer:
they were already satisfied by the previous solution.

#### vec-push-ref

In previous iterations of this proposal, the location-aware subtyping
rules were replaced with transformations such as SSA form. The
vec-push-ref example demonstrates the value of location-aware
subtyping in contrast to these approaches.

```rust
let foo: i32;
let mut vec: Vec<&'vec i32>;
let p: &'p i32;

foo = ...;
vec = Vec::new();
p = &'foo foo;
if true {
    vec.push(p);
} else {
    // Key point: `foo` not borrowed here.
    use(vec);
}
```

This can be converted to control-flow graph form:

```
block START {
    vec = Vec::new();
    p = &'foo foo;
    goto B C;
}

block B {
    vec.push(p);
    goto EXIT;
}

block C {
    // Key point: `foo` not borrowed here
    use(vec);
    goto EXIT;
}

block EXIT {
}
```

Here the relations from liveness are:

    ('vec: {START/1}) @ START/1
    ('vec: {START/2}) @ START/2
    ('vec: {B/0}) @ B/0
    ('vec: {C/0}) @ C/0
    ('p: {START/2}) @ START/2
    ('p: {B/0}) @ B/0

Meanwhile, the call to `vec.push(p)` and the borrow `p = &'foo foo`
establish these subtyping relations:

    ('p: 'vec) @ B/1
    ('foo: 'p) @ START/2

The solution is:

    'vec = {START/1, START/2, B/0, C/0}
    'p   = {START/2, B/0}
    'foo = {START/2, B/0}

What makes this example interesting is that **the lifetime `'vec` must
include both halves of the `if`** -- because it is used in both branches
-- but `'vec` only becomes "entangled" with the lifetime `'p` on one
path. Thus even though `'p` has to outlive `'vec`, `'p` never winds up
including the "else" branch thanks to location-aware subtyping.

## Layer 2: Avoiding infinite loops

The previous design was described in terms of the "pure" MIR
control-flow graph. However, using the raw graph has some undesirable
properties around infinite loops. In such cases, the graph has no
exit, which undermines the traditional definition of reverse analyses
like liveness. To address this, when we build the control-flow graph
for our functions, we will augment it with additional edges -- in
particular, for every infinite loop (`loop { }`), we will add false
"unwind" edges.
This ensures that the control-flow graph has a final
exit node (the successor of the RETURN and RESUME nodes) that
postdominates all other nodes in the graph.

If we did not add such edges, the analysis would also accept a number of surprising
programs. For example, it would be possible to borrow local variables
with `'static` lifetime, so long as the function never returned:

```rust
fn main() {
    let x: usize = 22;
    let y: &'static usize = &x;
    loop { }
}
```

This would work because (as covered in detail under the borrow check
section) the `StorageDead(x)` instruction would never be reachable,
and hence a borrow of any lifetime would be acceptable. This further leads to
other surprising programs that still type-check, such as this example which
uses an (incorrect, but declared as unsafe) API for spawning threads:

```rust
let scope = Scope::new();
let mut foo = 22;

unsafe {
    // dtor joins the thread
    let _guard = scope.spawn(&mut foo);
    loop {
        foo += 1;
    }
    // drop of `_guard` joins the thread
}
```

Without the unwind edges, this code would pass the borrowck, since the
drop of `_guard` (and `StorageDead` instruction) is not reachable, and
hence `_guard` is not considered live (after all, its destructor will
indeed never run). However, this would permit the `foo` variable to be
modified both during the infinite loop and by the thread launched by
`scope.spawn()`, which was given access to an `&mut foo` reference
(albeit one with a theoretically short lifetime).

With the false unwind edge, the compiler essentially always assumes
that a destructor *may* run, since every scope may theoretically
execute.
This extends the `&mut foo` borrow given to `scope.spawn()`
to cover the body of the loop, resulting in a borrowck error.

## Layer 3: Accommodating dropck

MIR includes an action that corresponds to "dropping" a variable:

    DROP(variable)

Note that while MIR supports general drops of any lvalue, at the point
where this analysis is running, we are always dropping entire
variables at a time. This operation executes the destructor for
`variable`, effectively "de-initializing" the memory in which the
value resides (if the variable -- or parts of the variable -- have
already been dropped, then drop has no effect; this is not relevant to
the current analysis).

Interestingly, in many cases dropping a value does not require that the
lifetimes in the dropped value be valid. After all, dropping a
reference of type `&'a T` or `&'a mut T` is defined as a no-op, so it
does not matter if the reference points at valid memory. In cases like
this, we say that the lifetime `'a` **may dangle**. This is inspired by the C
term "dangling pointer", which means a pointer to freed or invalid
memory.

However, if that same reference is stored in the field of a struct
which implements the `Drop` trait, then the struct may, during its
destructor, access the referenced value, so it's very important that
the reference be valid in that case. Put another way, if you have a
value `v` of type `Foo<'a>` that implements `Drop`, then `'a`
typically **cannot dangle** when `v` is dropped (just as `'a` would
not be allowed to dangle for any other operation).

More generally, RFC 1327 defined specific rules for which lifetimes in
a type may dangle during drop and which may not.
We integrate those
rules into our liveness analysis as follows: the MIR instruction
`DROP(variable)` is not treated like other MIR instructions when it
comes to liveness. Conceptually, we run two distinct liveness analyses
(in practice, the prototype uses two bits per variable):

1. The first, which we've already seen, indicates when a variable's
   current value may be **used** in the future. This corresponds to
   "non-drop" uses of the variable in the MIR. Whenever a variable is live by this definition,
   all of the lifetimes in its type are live.
2. The second, which we are adding now, indicates when a variable's
   current value may be **dropped** in the future. This corresponds to
   "drop" uses of the variable in the MIR. Whenever a variable is live
   in *this* sense, all of the lifetimes in its type **except those
   marked as may-dangle** are live.

Permitting lifetimes to dangle during drop is very important! In fact,
it is essential to even the most basic non-lexical lifetime examples,
such as Problem Case #1. After all, if we translate Problem Case #1
into MIR, we see that the reference `slice` will wind up being dropped
at the end of the block:

```rust
let mut data: Vec<char>;
let slice: &'slice mut [char];
START {
    ...
    slice = &'borrow mut data;
    capitalize(slice);
    data.push('d');
    data.push('e');
    data.push('f');
    DROP(slice);
    DROP(data);
}
```

This poses no problem for our analysis, however, because `'slice` "may
dangle" during the drop, and hence is not considered live.

## Layer 4: Named lifetimes

Until now, we've only considered lifetimes that are confined to the
extent of a function.
Often, we want to reason about
lifetimes that begin or end after the current function has ended. More
subtly, we sometimes want lifetimes that, along some paths, begin and
end in the current function, but along other paths extend
into the caller. Consider Problem Case #3 (the corresponding test case
in the prototype is the [get-default] test):

[get-default]: https://github.com/nikomatsakis/nll/blob/master/test/get-default.nll

```rust
fn get_default<'r, K: Hash + Eq + Copy, V: Default>(
    map: &'r mut HashMap<K, V>,
    key: K,
) -> &'r mut V {
    match map.get_mut(&key) { // -------------+ 'r
        Some(value) => value,              // |
        None => {                          // |
            map.insert(key, V::default()); // |
            //  ^~~~~~ ERROR               // |
            map.get_mut(&key).unwrap()     // |
        }                                  // |
    }                                      // |
}                                          // v
```

When we translate this into MIR, we get something like the following
(this is "pseudo-MIR"):

```
block START {
    m1 = &'m1 mut *map; // temporary created for `map.get_mut()` call
    v = Map::get_mut(m1, &key);
    switch v { SOME NONE };
}

block SOME {
    return = v.as.0; // assign to return value slot
    goto END;
}

block NONE {
    Map::insert(&mut *map, key, ...);
    m2 = &'m2 mut *map; // temporary created for `map.get_mut()` call
    v = Map::get_mut(m2, &key);
    return = ... // "unwrap" of `v`
    goto END;
}

block END {
    return;
}
```

The key to this example is that the first borrow of `map`, with the
lifetime `'m1`, must extend to the end of `'r`, but only if we
branch to SOME. Otherwise, it should end once we enter the NONE block.

To accommodate cases like this, we will extend the notion of a region
so that it includes not only points in the control-flow graph, but
also a (possibly empty) set of "end regions" for various
named lifetimes. We denote these as `end('r)` for some named region
`'r`. The region `end('r)` can be understood semantically as referring
to some portion of the caller's control-flow graph (actually, they
could extend beyond the end of the caller, into the caller's caller,
and so forth, but that doesn't concern us). This new region might then
be denoted as the following (in pseudocode form):

```rust
struct Region {
    points: Set<Point>,
    end_regions: Set<NamedLifetime>,
}
```

With this representation, when a type mentions a named lifetime, such as `'r`,
that lifetime can be represented by a region that includes:

- the entire CFG,
- and, the end region for that named lifetime (`end('r)`).

Furthermore, we can **elaborate** the set to include `end('x)` for
every named lifetime `'x` such that `'r: 'x`. This is because, if
`'r: 'x`, then we know that `'r` does not end until `'x` has already
ended.

Finally, we must adjust our definition of subtyping to accommodate
this amended definition of a region, which we do as follows. When we have
an outlives relation

    ('b: 'a) @ P

where the end point of the CFG is reachable from P without leaving
`'a`, the existing inference algorithm would simply add the end point
to `'b` and stop. The new algorithm would also add any end regions
that are included in `'a` to `'b` at that time. (Expressed less
operationally, `'b` only outlives `'a` if it also includes the
end regions that `'a` includes, presuming that the end point of the
CFG is reachable from P.)
We require the end point of the CFG to be reachable because
otherwise the data never escapes
the current function, and hence `end('r)` is not reachable (since
`end('r)` only covers the code in callers that executes *after* the
return).

NB: This part of the prototype is partially
implemented. [Issue #12](https://github.com/nikomatsakis/nll/issues/12)
describes the current status and links to the in-progress PRs.

## Layer 5: How the borrow check works

For the most part, the focus of this RFC is on the structure of
lifetimes, but it's worth talking a bit about how to integrate
these non-lexical lifetimes into the borrow checker. In particular,
along the way, we'd like to fix two shortcomings of the borrow checker:

**First, support nested method calls like `vec.push(vec.len())`.**
Here, the plan is to continue with the `mut2` borrow solution proposed
in [RFC 2025]. This RFC does not (yet) propose one of the type-based
solutions described in RFC 2025, such as "borrowing for the future" or
`Ref2`. The reasons why are discussed in the Alternatives section. For
simplicity, this description of the borrow checker ignores
[RFC 2025]. The extensions described here are fairly orthogonal to the
changes proposed in [RFC 2025], which in effect cause the start of a
borrow to be delayed.

**Second, permit variables containing mutable references to be
modified, even if their referent is borrowed.** This refers to the
"Problem Case #4" described in the introduction; we wish to accept the
original program.

### Borrow checker phase 1: computing loans in scope

The first phase of the borrow checker computes, at each point in
the CFG, the set of in-scope **loans**.
A "loan" is represented as a tuple
`('a, shared|uniq|mut, lvalue)` indicating:

1. the lifetime `'a` for which the value was borrowed;
2. whether this was a shared, unique, or mutable loan;
   - "unique" loans are exactly like mutable loans, but they do not permit
     mutation of their referents. They are used only in closure desugarings
     and are not part of Rust's surface syntax.
3. the lvalue that was borrowed (e.g., `x` or `(*x).foo`).

The set of in-scope loans at each point is found via a fixed-point
dataflow computation. We create a loan tuple from each borrow rvalue
in the MIR (that is, every assignment statement like `tmp = &'a b.c.d`),
giving each tuple a unique index `i`. We can then represent
the set of loans that are in scope at a particular point using a
bit-set and do a standard forward data-flow propagation.

For a statement at point P in the graph, we define the "transfer
function" -- that is, which loans it brings into or out of scope -- as
follows:

- any loans whose region does not include P are killed;
- if this is a borrow statement, the corresponding loan is generated;
- if this is an assignment `lv = <rvalue>`, then any loan whose borrowed
  path has `lv` as a prefix is killed.

The last point bears some elaboration. This rule is what allows us to
support cases like the one in Problem Case #4:

```rust
let mut list: &mut List<T> = ...;
let v = &mut (*list).value;
list = ...; // <-- assignment
```

At the point of the marked assignment, the loan of `(*list).value` is
in scope, but it does not have to be considered in scope
afterwards. This is because the variable `list` now holds a fresh
value, and that new value has not yet been borrowed (or else we could
not have produced it).
Specifically, whenever we see an assignment `lv = <rvalue>` in MIR,
we can clear all loans where the borrowed path
`lv_loan` has `lv` as a prefix. (In our example, the assignment is to
`list`, and the loan path `(*list).value` has `list` as a prefix.)

**NB.** In this phase, when there is an assignment, we always clear
all loans that applied to the overwritten path; however, in some cases
the **assignment itself** may be illegal due to those very loans. In
our example, this would be the case if the type of `list` had been
`List<T>` and not `&mut List<T>`. In such cases, errors will be
reported by the next portion of the borrowck, described in the next
section.

### Borrow checker phase 2: reporting errors

At this point, we have computed which loans are in scope at each
point. Next, we traverse the MIR and identify actions that are illegal
given the loans in scope. Rather than go through every kind of MIR statement,
we can break things down into two kinds of actions that can be performed:

- Accessing an lvalue, which we categorize along two axes (shallow vs deep, read vs write)
- Dropping an lvalue

For each of these kinds of actions, we will specify below the rules
that determine when they are legal, given the set of loans L in scope
at the start of the action. The second phase of the borrow check
therefore consists of iterating over each statement in the MIR and
checking, given the in-scope loans, whether the actions it performs
are legal. Translating MIR statements into actions is mostly
straightforward:

- A `StorageDead` statement counts as a **shallow write**.
- An assignment statement `LV = RV` is a **shallow write** to `LV`;
- and, within the rvalue `RV`:
    - Each lvalue operand is either a **deep read** or a **deep write** action, depending
      on whether or not the type of the lvalue implements `Copy`.
        - Note that moves count as "deep writes".
    - A shared borrow `&LV` counts as a **deep read**.
    - A mutable borrow `&mut LV` counts as a **deep write**.

There are a few interesting cases to keep in mind:

- MIR models discriminants more precisely. They should be
  thought of as a distinct *field* when it comes to borrows.
- In the compiler today, `Box` is still "built-in" to MIR. This RFC
  ignores that possibility and instead acts as though borrowed
  references (`&` and `&mut`) and raw pointers (`*const` and `*mut`)
  were the only sorts of pointers. It should be straightforward to
  extend the text here to cover `Box`, though some questions arise
  around the handling of drop (see the section on drops for details).

**Accessing an lvalue LV.** When accessing an lvalue LV, there are two
axes to consider:

- The access can be SHALLOW or DEEP:
    - A *shallow* access means that the immediate fields reached at LV
      are accessed, but references or pointers found within are not
      dereferenced. Right now, the only shallow accesses are writes: an
      assignment like `x = ...` is a **shallow write** of `x`, as is a
      `StorageDead` of `x`.
    - A *deep* access means that all data reachable through a given lvalue
      may be invalidated or accessed by this action.
- The access can be a READ or WRITE:
    - A *read* means that the existing data may be read, but will not be changed.
    - A *write* means that the data may be mutated to new values or
      otherwise invalidated (for example, it could be de-initialized, as
      in a move operation).
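As a rough illustration of the translation from MIR statements to actions, the following Rust sketch models the two axes as enums and maps each construct to the action it performs on its lvalue. `MirItem` and its variants are hypothetical stand-ins invented for this example; they are not rustc's actual MIR data structures.

```rust
/// Whether an access touches only the immediate fields of an lvalue
/// (shallow) or everything reachable through it (deep).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Depth {
    Shallow,
    Deep,
}

/// Whether an access only reads data or may mutate/invalidate it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Mode {
    Read,
    Write,
}

/// Hypothetical, simplified stand-ins for the MIR constructs listed above
/// (not rustc's real MIR types).
#[derive(Clone, Copy, Debug)]
enum MirItem {
    /// `StorageDead(x)`: the storage for `x` is freed.
    StorageDead,
    /// The destination `LV` of an assignment `LV = RV`.
    AssignTarget,
    /// An lvalue operand inside `RV`; `is_copy` records whether its type is `Copy`.
    Operand { is_copy: bool },
    /// A shared borrow `&LV` inside `RV`.
    SharedBorrow,
    /// A mutable borrow `&mut LV` inside `RV`.
    MutBorrow,
}

/// Returns the (depth, mode) of the action that `item` performs on its lvalue.
fn action(item: MirItem) -> (Depth, Mode) {
    match item {
        MirItem::StorageDead | MirItem::AssignTarget => (Depth::Shallow, Mode::Write),
        // Copying an operand only reads it...
        MirItem::Operand { is_copy: true } => (Depth::Deep, Mode::Read),
        // ...but moving it counts as a deep write.
        MirItem::Operand { is_copy: false } => (Depth::Deep, Mode::Write),
        MirItem::SharedBorrow => (Depth::Deep, Mode::Read),
        MirItem::MutBorrow => (Depth::Deep, Mode::Write),
    }
}
```

Keeping depth and mode as separate enums mirrors the two independent axes: selecting the relevant borrows depends only on depth, while deciding whether a relevant borrow conflicts depends only on mode.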

"Deep" accesses are often deep because they create and release an
alias, in which case the "deep" qualifier reflects what might happen
through that alias. For example, if you have `let x = &mut y`, that is
considered a **deep write** of `y`, even though the **actual borrow**
doesn't do anything at all: we create a mutable alias `x` that can be
used to mutate anything reachable from `y`. A move `let x = y` is
similar: it writes to the shallow content of `y`, but then -- via the
new name `x` -- we can access all other content accessible through
`y`.

The pseudocode for deciding when an access is legal looks like this:

```
fn access_legal(lvalue, is_shallow, is_read) {
    let relevant_borrows = select_relevant_borrows(lvalue, is_shallow);

    for borrow in relevant_borrows {
        // shared borrows like `&x` still permit reads from `x` (but not writes)
        if is_read && borrow.is_read { continue; }

        // otherwise, report an error, because we have an access
        // that conflicts with an in-scope borrow
        report_error();
    }
}
```

As you can see, it works in two steps. First, we enumerate a set of
in-scope borrows that are relevant to `lvalue` -- this set is affected
by whether this is a "shallow" or "deep" action, as will be described
shortly. Then, for each such borrow, we check if it conflicts with the
action (i.e., if at least one of them is potentially writing), and,
if so, we report an error.
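A runnable version of the conflict check in the pseudocode might look as follows. It is a sketch that assumes the relevant loans have already been selected, and it keeps only the kind of each loan (`LoanKind`, a name invented here) rather than the full `('a, shared|uniq|mut, lvalue)` tuple. Following the pseudocode, only a read under a shared loan is allowed; unique and mutable loans conflict with every access.

```rust
/// The kind of an in-scope loan, as in the tuple `('a, shared|uniq|mut, lvalue)`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LoanKind {
    Shared,
    Unique,
    Mut,
}

/// Returns the indices of the loans in `relevant` that conflict with an
/// access; an empty result means the access is legal. `relevant` is assumed
/// to be the output of the "select relevant borrows" step.
fn conflicting_loans(is_read: bool, relevant: &[LoanKind]) -> Vec<usize> {
    relevant
        .iter()
        .enumerate()
        // Shared loans like `&x` still permit reads from `x` (but not writes);
        // every other combination has at least one potential writer.
        .filter(|&(_, &kind)| !(is_read && kind == LoanKind::Shared))
        .map(|(i, _)| i)
        .collect()
}
```

Returning every conflicting index, rather than stopping at the first, matches the pseudocode's loop, which reports an error for each conflicting borrow.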

For **shallow** accesses to the path `lvalue`, we consider borrows relevant
if they meet any of the following criteria:

- there is a loan for the path `lvalue`;
    - so: writing a path like `a.b.c` is illegal if `a.b.c` is borrowed
- there is a loan for some prefix of the path `lvalue`;
    - so: writing a path like `a.b.c` is illegal if `a` or `a.b` is borrowed
- `lvalue` is a **shallow prefix** of the loan path
    - shallow prefixes are found by stripping away fields, but stop at
      any dereference
    - so: writing a path like `a` is illegal if `a.b` is borrowed
    - but: writing `a` is legal if `*a` is borrowed, whether or not `a`
      is a shared or mutable reference

For **deep** accesses to the path `lvalue`, we consider borrows relevant
if they meet any of the following criteria:

- there is a loan for the path `lvalue`;
    - so: reading a path like `a.b.c` is illegal if `a.b.c` is mutably borrowed
- there is a loan for some prefix of the path `lvalue`;
    - so: reading a path like `a.b.c` is illegal if `a` or `a.b` is mutably borrowed
- `lvalue` is a **supporting prefix** of the loan path
    - supporting prefixes were defined earlier
    - so: reading a path like `a` is illegal if `a.b` is mutably
      borrowed, but -- in contrast with shallow accesses -- reading `a` is also
      illegal if `*a` is mutably borrowed

**Dropping an lvalue LV.** Dropping an lvalue can be treated as a DEEP
WRITE, like a move, but this is overly conservative. The rules here
are under active development; see
[#40](https://github.com/nikomatsakis/nll-rfc/issues/40).
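The relevance criteria for shallow and deep accesses can be made concrete with a small sketch. Here an lvalue is modeled, hypothetically, as a base variable plus a list of field and deref projections, and each deref records whether it goes through a shared reference. The supporting-prefix rule used here (strip fields and derefs, but stop at the deref of a shared reference) is our reading of the earlier definition and should be checked against it. Loan kinds are deliberately ignored: whether a relevant loan actually conflicts is the separate check in `access_legal`.

```rust
/// One projection step in an lvalue path.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Proj {
    /// A field access such as `.b`.
    Field(&'static str),
    /// A dereference `*`; `shared` is true when the pointer is a `&T`.
    Deref { shared: bool },
}

/// An lvalue such as `(*a).b`: a base variable plus projections applied in order.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Path {
    base: &'static str,
    projs: Vec<Proj>,
}

impl Path {
    fn new(base: &'static str, projs: &[Proj]) -> Path {
        Path { base, projs: projs.to_vec() }
    }

    /// The path made of the base and the first `len` projections.
    fn truncated(&self, len: usize) -> Path {
        Path { base: self.base, projs: self.projs[..len].to_vec() }
    }

    /// True if `self` equals `other` or is a prefix of it (e.g. `a` for `a.b.c`).
    fn is_prefix_of(&self, other: &Path) -> bool {
        self.base == other.base && other.projs.starts_with(&self.projs)
    }

    /// Shallow prefixes: strip away fields, but stop at any dereference.
    fn shallow_prefixes(&self) -> Vec<Path> {
        let mut out = vec![self.clone()];
        let mut len = self.projs.len();
        while len > 0 && matches!(self.projs[len - 1], Proj::Field(_)) {
            len -= 1;
            out.push(self.truncated(len));
        }
        out
    }

    /// Supporting prefixes (assumed definition): strip away fields and
    /// dereferences, but stop at the dereference of a shared reference.
    fn supporting_prefixes(&self) -> Vec<Path> {
        let mut out = vec![self.clone()];
        let mut len = self.projs.len();
        while len > 0 && !matches!(self.projs[len - 1], Proj::Deref { shared: true }) {
            len -= 1;
            out.push(self.truncated(len));
        }
        out
    }
}

/// Is a loan of `loan` relevant to a shallow (`deep == false`) or deep access of `access`?
fn is_relevant(access: &Path, loan: &Path, deep: bool) -> bool {
    // A loan for the accessed path itself, or for some prefix of it.
    if loan.is_prefix_of(access) {
        return true;
    }
    // Otherwise, `access` must be a shallow/supporting prefix of the loan path.
    let prefixes = if deep { loan.supporting_prefixes() } else { loan.shallow_prefixes() };
    prefixes.contains(access)
}
```

For example, a loan of `*a` where `a: &mut T` is relevant to a deep read of `a` (its supporting prefixes are `*a` and `a`) but not to a shallow write of `a` (its only shallow prefix is `*a`), matching the examples in the two lists.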

# How We Teach This
[how-we-teach-this]: #how-we-teach-this

## Terminology

In this RFC, I've opted to continue using the term "lifetime" to refer
to the portion of the program in which a reference is in active use
(or, alternatively, to the "duration of a borrow"). As the intro to
the RFC makes clear, this terminology somewhat conflicts with an
alternative usage, in which lifetime refers to the dynamic extent of a
value (what we call the "scope"). I think that -- if we were starting
over -- it might have been preferable to find an alternative term that
is more specific. However, it would be rather difficult to try and
change the term "lifetime" at this point, and hence this RFC does not
attempt to do so. To avoid confusion, however, it seems best if the error
messages resulting from the region and borrow checks avoid the term
lifetime where possible, or use qualification to make the meaning more
clear.

## Leveraging intuition: framing errors in terms of points

Part of the reason that Rust currently uses lexical scopes to
determine lifetimes is that it was thought that they would be simpler
for users to reason about. Time and experience have not borne this
hypothesis out: for many users, the fact that borrows are
"artificially" extended to the end of the block is more surprising
than not. Furthermore, most users have a pretty intuitive
understanding of control flow (which makes sense: you have to, in
order to understand what your program will do).

We therefore propose to leverage this intuition when explaining borrow
and lifetime errors. To the extent possible, we will try to explain
all errors in terms of three points:

- The point where the borrow occurred (B).
- The point where the resulting reference is used (U).
- An intervening point that might have invalidated the reference (A).

We should select the three points such that B can reach A and A can reach
U. In general, the approach is to describe the errors in "narrative" form:

- First, the value is borrowed.
- Next, the action occurs, invalidating the reference.
- Finally, the next use occurs, after the reference has been invalidated.

This approach is similar to what we do today, but we often neglect to
mention this third point, where the next use occurs. Note that the
"point of error" remains the *second* action -- that is, the error,
conceptually, is to perform an invalidating action in between two uses
of the reference (rather than, say, to use the reference after an
invalidating action). This actually reflects the definition of
undefined behavior more accurately (that is, performing an illegal
write is what causes undefined behavior, but the write is illegal
because of the later use).

To see the difference, consider this erroneous program:

```rust
fn main() {
    let mut i = 3;
    let x = &i;
    i += 1;
    println!("{}", x);
}
```

Currently, we emit the following error:

```
error[E0506]: cannot assign to `i` because it is borrowed
 --> <anon>:4:5
  |
3 |     let x = &i;
  |              - borrow of `i` occurs here
4 |     i += 1;
  |     ^^^^^^ assignment to borrowed `i` occurs here
```

Here, the points B and A are highlighted, but not the point of use
U. Moreover, the "blame" is placed on the assignment.
Under this RFC,
we would display the error as follows:

```
error[E0506]: cannot write to `i` while borrowed
 --> <anon>:4:5
  |
3 |     let x = &i;
  |              - (shared) borrow of `i` occurs here
4 |     i += 1;
  |     ^^^^^^ write to `i` occurs here, while borrow is still active
5 |     println!("{}", x);
  |                    - borrow is later used here
```

Another example, this time using a `match`:

```rust
fn main() {
    let mut x = Some(3);
    match &mut x {
        Some(i) => {
            x = None;
            *i += 1;
        }
        None => {
            x = Some(0); // OK
        }
    }
}
```

The error might be:

```
error[E0506]: cannot write to `x` while borrowed
 --> <anon>:5:13
  |
3 |     match &mut x {
  |           ------ (mutable) borrow of `x` occurs here
4 |         Some(i) => {
5 |             x = None;
  |             ^^^^^^^^ write to `x` occurs here, while borrow is still active
6 |             *i += 1;
  |             -- borrow is later used here
  |
```

(Note that the assignment in the `None` arm is not an error, since the
borrow is never used again.)

## Some special cases

In some cases, the three points are not all visible in the user's
syntax, and they may need careful treatment.

### Drop as last use

There are times when the last use of a variable will in fact be its
destructor. Consider an example like this:

```rust
struct Foo<'a> { field: &'a u32 }
impl<'a> Drop for Foo<'a> { fn drop(&mut self) { /* ... */ } }

fn main() {
    let mut x = 22;
    let y = Foo { field: &x };
    x += 1;
}
```

This code would be legal, but for the destructor on `y`, which will
implicitly execute at the end of the enclosing scope.
The error
message might be shown as follows:

```
error[E0506]: cannot write to `x` while borrowed
 --> <anon>:7:5
  |
6 |     let y = Foo { field: &x };
  |                          -- borrow of `x` occurs here
7 |     x += 1;
  |     ^ write to `x` occurs here, while borrow is still active
8 | }
  | - borrow is later used here, when `y` is dropped
```

### Method calls

Method calls are another such case:

```rust
fn main() {
    let mut x = vec![1];
    x.push(x.pop().unwrap());
}
```

We propose the following error for this sort of scenario:

```
error[E0506]: cannot write to `x` while borrowed
 --> <anon>:3:12
  |
3 |     x.push(x.pop().unwrap());
  |     - ---- ^^^^^^^^^^^^^^^^
  |     | |    write to `x` occurs here, while borrow is still in active use
  |     | borrow is later used here, during the call
  |     `x` borrowed here
```

If you are not using a method, the error would look slightly different,
but be similar in concept:

```
error[E0506]: cannot assign to `x` because it is borrowed
 --> <anon>:3:23
  |
3 |     Vec::push(&mut x, x.pop().unwrap());
  |     --------- ------  ^^^^^^^^^^^^^^^^
  |     |         |       write to `x` occurs here, while borrow is still in active use
  |     |         `x` borrowed here
  |     borrow is later used here, during the call
```

We can detect this scenario in MIR readily enough by checking when the
point of use turns out to be a "call" terminator. We'll have to tweak
the spans to get everything to look correct, but that is easy enough.
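The special cases above differ mainly in how point U is described. A sketch of that choice follows, assuming a hypothetical `UsePoint` classification of the MIR location where the borrow is later used. The label strings are taken from the example messages in this section; the type and function names are invented for illustration.

```rust
/// Hypothetical classification of the point U where a borrow is later used.
enum UsePoint<'a> {
    /// An ordinary statement, such as `println!("{}", x)`.
    Statement,
    /// A MIR "call" terminator, as in `x.push(x.pop().unwrap())`.
    CallTerminator,
    /// The implicit destructor call for the named variable.
    Drop { var: &'a str },
}

/// Chooses the label shown at point U in a three-point error message.
fn use_label(point: &UsePoint<'_>) -> String {
    match point {
        UsePoint::Statement => "borrow is later used here".to_string(),
        UsePoint::CallTerminator => "borrow is later used here, during the call".to_string(),
        UsePoint::Drop { var } => format!("borrow is later used here, when `{}` is dropped", var),
    }
}
```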

### Closures

As today, when the initial borrow is part of constructing a closure,
we wish to highlight not only the point where the closure is
constructed, but the point *within* the closure where the variable in
question is used.

## Borrowing a variable for longer than its scope

Consider this example:

```rust
let p;
{
    let x = 3;
    p = &x;
}
println!("{}", p);
```

In this example, the reference `p` refers to `x` with a lifetime that
exceeds the scope of `x`. In short, that portion of the stack will be
popped with `p` still in active use. In today's compiler, this is
detected by the borrow checker through a special check that computes
the "maximal scope" of the path being borrowed (`x`, here). This makes
sense in the existing system since lifetimes and scopes are expressed
in the same units (portions of the AST). In the newer, non-lexical
formulation, this error would be detected somewhat differently. As
described earlier, we would see that a `StorageDead` instruction frees
the slot for `x` while `p` is still in use. We can thus present the
error in the same "three-point style":

```
error[E0506]: variable goes out of scope while still borrowed
 --> <anon>:4:1
  |
3 |     p = &x;
  |          - `x` borrowed here
4 | }
  | ^ `x` goes out of scope here, while borrow is still in active use
5 | println!("{}", p);
  |                - borrow used here, after invalidation
```

## Errors during inference

The remaining set of lifetime-related errors comes about primarily due
to the interaction with function signatures.
For example:

```rust
impl Foo {
    fn foo(&self, y: &u8) -> &u8 {
        y // error: the elided return lifetime is tied to `&self`, not to `y`
    }
}
```

We already have work-in-progress on presenting these sorts of errors
in a better way (see [issue 42516][] for numerous examples and
details), all of which should be applicable here. In short, the name
of the game is to identify patterns and suggest changes to improve the
function signature to match the body (or at least diagnose the problem
more clearly).

[issue 42516]: https://github.com/rust-lang/rust/issues/42516

Whenever possible, we should leverage points in the control-flow and
try to explain errors in "narrative" form.

# Drawbacks
[drawbacks]: #drawbacks

There are very few drawbacks to this proposal. The primary one is that
the **rules** for the system become more complex. However, this
permits us to accept many more programs, and so we expect
that **using Rust** will feel simpler. Moreover, experience has shown
that -- for many users -- the current scheme of tying reference
lifetimes to lexical scoping is confusing and surprising.

# Alternatives
[alternatives]: #alternatives

### Alternative formulations of NLL

During the runup to this RFC, a number of alternate schemes and
approaches to describing NLL were tried and discarded.

**RFC 396.** [RFC 396][] defined lifetimes to be a "prefix" of the
dominator tree -- roughly speaking, a single-entry, multiple-exit
region of the control-flow graph. Unlike our system, this definition
did not permit gaps or holes in a lifetime. Ensuring continuous lifetimes was
meant to guarantee soundness; in this RFC, we use the liveness
constraints to achieve a similar effect.
This more flexible setup
allows us to handle cases like Problem Case #3, which RFC 396 would
not have accepted. RFC 396 also did not cover dropck and a number of
other complications.

**SSA or SSI transformation.** Rather than incorporating the "current location" into
the subtype check, we also considered formulations that first applied
an SSA transformation to the input program, and then gave each of the resulting
variables a distinct type. This does allow some examples to type-check that
wouldn't otherwise, but it is not flexible enough for the `vec-push-ref`
example covered earlier.

Using SSA also introduces other complications. Among other things,
Rust permits variables and temporaries to be borrowed and mutated
indirectly (e.g., via `&mut`). If we were to apply SSA to MIR in a
naive fashion, then, it would ignore these assignments when creating
numberings. For example:

```rust
let mut x = 1;      // x0, has value 1
let p = &mut x;     // p0
*p += 1;
use(x);             // uses `x0`, but it now has value 2
```

Here, the value of `x0` changed due to a write from `p`. Thus this is
not a true SSA form. Normally, SSA transformations handle this by
making local variables like `x` and `p` be pointers into stack slots,
and then lifting those stack slots into locals when safe. MIR was
intentionally not designed in SSA form precisely to avoid the need for
such contortions (we can leave that to the optimizing backend).

**Type per program point.** Going further than SSA, one can
accommodate `vec-push-ref` through a scheme that gives each variable a
distinct type at each point in the CFG (similar to what Ericson2314
describes in the [stateful MIR for Rust][smr]) and applies
transformations to the lifetimes on every edge.
During the rustc
design sprint, the compiler team also enumerated such a design. The
author believes this RFC to be a roughly equivalent analysis, but with
an alternative, more familiar formulation that still uses one type per
variable (rather than one type per variable per point).

There are several advantages to the design described in this RFC. For one
thing, it involves far fewer inference variables (if each variable has
many types, each of those types needs distinct inference variables at
each point) and far fewer constraints (we don't need constraints just
for connecting the type of a variable between distinct points). It is
also a more natural fit for the surface language, in which variables
have a single type.

### Different "lifetime roles"

In the discussion about nested method calls ([RFC 2025], and the
discussions that led up to it), there were various proposals that were
aimed at accepting the naive desugaring of a call like `vec.push(vec.len())`:

```rust
let tmp0 = &mut vec;
let tmp1 = vec.len(); // does a shared borrow of vec
Vec::push(tmp0, tmp1);
```

The alternatives to RFC 2025 were focused on augmenting the type of
references to have distinct "roles" -- the most prominent such
proposal was `Ref2<'r, 'w>`, in which mutable references change to
have two distinct lifetimes, a "read" lifetime (`'r`) and a "write"
lifetime (`'w`), where read encompasses the entire span of the
reference, but write only contains those points where writes are
occurring. This RFC does not attempt to change the approach to nested
method calls, rather continuing with the RFC 2025 approach (which
affects only the borrowck handling).
However, if we did wish to adopt
a `Ref2`-style approach in the future, it could be done backwards
compatibly, but it would require modifying some rules, such as the liveness
requirements. Currently, if a variable `x` is live at
some point P, then all lifetimes in the type of `x` must contain P --
but in the `Ref2` approach, only the read lifetime would have to
contain P. This implies that lifetimes are treated differently
depending on their "role". It seems like a good idea to isolate such a
change into a distinct RFC.

# Unresolved questions
[unresolved]: #unresolved-questions

None at present.

# Appendix: What this proposal will not fix

It is worth discussing a few kinds of borrow check errors that the
current RFC will **not** eliminate. These are generally errors that
cross procedural boundaries in some form or another.

**Closure desugaring.** The first kind of error has to do with the
closure desugaring. Right now, closures always capture local
variables, even if the closure only uses some sub-path of the variable
internally:

```rust
let get_len = || self.vec.len(); // borrows `self`, not `self.vec`
self.vec2.push(...); // error: self is borrowed
use(get_len); // the closure is used later, so the borrow of `self` is still live
```

This was discussed on [an internals thread][tc]. It is possible to fix
this [by making the closure desugaring smarter][cc].
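One workaround is to borrow the field into a local before creating the closure, so that the closure captures only that local rather than `self`. The sketch below assumes a hypothetical `Widget` type whose `vec` and `vec2` fields mirror the snippet above. (The Rust 2021 edition later made closures capture disjoint fields, which also accepts the original snippet; the local binding works on every edition.)

```rust
/// Hypothetical type mirroring the `self.vec` / `self.vec2` snippet above.
struct Widget {
    vec: Vec<u32>,
    vec2: Vec<u32>,
}

impl Widget {
    /// Builds a closure that reads `vec`, pushes onto `vec2`, then calls the closure.
    fn push_while_reading(&mut self) -> usize {
        // Borrow just the field into a local first...
        let v = &self.vec;
        // ...so the closure captures `v` (a `&Vec<u32>`), not `self`.
        let get_len = || v.len();
        // `self.vec2` is disjoint from `self.vec`, so this mutable borrow is
        // accepted even though `get_len` is used afterwards.
        self.vec2.push(7);
        get_len()
    }
}
```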

[tc]: https://internals.rust-lang.org/t/borrow-the-full-stable-name-in-closures-for-ergonomics/5387
[cc]: https://internals.rust-lang.org/t/borrow-the-full-stable-name-in-closures-for-ergonomics/5387/11?u=nikomatsakis

**Disjoint fields across functions.** Another kind of error is when
you have one method that only uses a field `a` and another that only
uses some field `b`; right now, you can't express that, and hence
these two methods cannot be used "in parallel" with one another:

```rust
impl Foo {
    fn get_a(&self) -> &A { &self.a }
    fn inc_b(&mut self) { self.b.value += 1; }
    fn bar(&mut self) {
        let a = self.get_a();
        self.inc_b(); // Error: self is already borrowed
        use(a);
    }
}
```

The fix for this is to refactor so as to expose the fact that the methods
operate on disjoint data. For example, one can factor out the methods into
methods on the fields themselves:

```rust
fn bar(&mut self) {
    let a = self.a.get();
    self.b.inc();
    use(a);
}
```

This way, when looking at `bar()` alone, we see borrows of `self.a`
and `self.b`, rather than two borrows of `self`. Another technique is
to introduce "free functions" (e.g., `get(&self.a)` and `inc(&mut self.b)`)
that expose more clearly which fields are operated upon, or
to inline the method bodies. This is a non-trivial bit of design and
is out of scope for this RFC. See
[this comment on an internals thread][cpb] for further thoughts.

[cpb]: https://internals.rust-lang.org/t/partially-borrowed-moved-struct-types/5392/2

**Self-referential structs.** The final limitation we are not fixing
yet is the inability to have "self-referential structs".
That is, you
cannot have a struct that stores, within itself, an arena and pointers
into that arena, and then move that struct around. This comes up in a
number of settings. There are various workarounds: sometimes you can
use a vector with indices, for example, or
[the `owning_ref` crate](https://crates.io/crates/owning_ref). The
latter, when combined with [associated type constructors][ATC], might
be an adequate solution for some use cases, actually (it's basically
a way of modeling "existential lifetimes" in library code). For the
case of futures especially, [the `?Move` RFC][?Move] proposes another
lightweight and interesting approach.

[?Move]: https://github.com/rust-lang/rfcs/pull/1858

# Endnotes

<a name="temporaries"></a>

**1.** Scopes always correspond to blocks with one exception: the
scope of a temporary value is sometimes the enclosing
statement.

[RFC 396]: https://github.com/rust-lang/rfcs/pull/396
[RFC 2025]: https://github.com/rust-lang/rfcs/pull/2025
[smr]: https://github.com/Ericson2314/a-stateful-mir-for-rust
[10520]: https://github.com/rust-lang/rust/issues/10520
[ATC]: https://github.com/rust-lang/rfcs/pull/1598

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# nll-rfc
Non-lexical lifetimes RFC.

--------------------------------------------------------------------------------