├── 0000-nonlexical-lifetimes.md
└── README.md


/0000-nonlexical-lifetimes.md:
--------------------------------------------------------------------------------
   1 | - Feature Name: (fill me in with a unique ident, my_awesome_feature)
   2 | - Start Date: (fill me in with today's date, YYYY-MM-DD)
   3 | - RFC PR: (leave this empty)
   4 | - Rust Issue: (leave this empty)
   5 | 
   6 | # Summary
   7 | [summary]: #summary
   8 | 
   9 | Extend Rust's borrow system to support **non-lexical lifetimes** --
  10 | these are lifetimes that are based on the control-flow graph, rather
  11 | than lexical scopes. The RFC describes in detail how to infer these
  12 | new, more flexible regions, and also describes how to adjust our error
  13 | messages. The RFC also describes a few other extensions to the borrow
  14 | checker, the total effect of which is to eliminate many common cases
  15 | where small, function-local code modifications would be required to pass the
  16 | borrow check. (The appendix describes some of the remaining
  17 | borrow-checker limitations that are not addressed by this RFC.)
  18 | 
  19 | # Motivation
  20 | [motivation]: #motivation
  21 | 
  22 | ## What is a lifetime?
  23 | 
  24 | The basic idea of the borrow checker is that values may not be mutated
  25 | or moved while they are borrowed, but how do we know whether a value
  26 | is borrowed? The idea is quite simple: whenever you create a borrow,
  27 | the compiler assigns the resulting reference a **lifetime**. This
  28 | lifetime corresponds to the span of the code where the reference may
  29 | be used. The compiler will infer this lifetime to be the smallest
  30 | lifetime that it can have that still encompasses all the uses of the
  31 | reference.
  32 | 
  33 | Note that Rust uses the term lifetime in a very particular way.  In
  34 | everyday speech, the word lifetime can be used in two distinct -- but
  35 | similar -- ways:
  36 | 
  37 | 1. The lifetime of a **reference**, corresponding to the span of time in
  38 |    which that reference is **used**.
  39 | 2. The lifetime of a **value**, corresponding to the span of time
  40 |    before that value gets **freed** (or, put another way, before the
  41 |    destructor for the value runs).
  42 | 
  43 | This second span of time, which describes how long a value is valid,
  44 | is very important. To distinguish the two, we refer to that
  45 | second span of time as the value's **scope**. Naturally, lifetimes and
  46 | scopes are linked to one another. Specifically, if you make a
  47 | reference to a value, the lifetime of that reference cannot outlive
  48 | the scope of that value. Otherwise, your reference would be pointing
  49 | into freed memory.
  50 | 
  51 | To better see the distinction between lifetime and scope, let's
  52 | consider a simple example. In this example, the vector `data` is
  53 | borrowed (mutably) and the resulting reference is passed to a function
  54 | `capitalize`. Since `capitalize` does not return the reference back,
  55 | the *lifetime* of this borrow will be confined to just that call. The
  56 | *scope* of `data`, in contrast, is much larger, and corresponds to a
  57 | suffix of the fn body, stretching from the `let` until the end of the
  58 | enclosing scope.
  59 | 
  60 | ```rust
  61 | fn foo() {
  62 |     let mut data = vec!['a', 'b', 'c']; // --+ 'scope
  63 |     capitalize(&mut data[..]);          //   |
  64 | //  ^~~~~~~~~~~~~~~~~~~~~~~~~ 'lifetime //   |
  65 |     data.push('d');                     //   |
  66 |     data.push('e');                     //   |
  67 |     data.push('f');                     //   |
  68 | } // <---------------------------------------+
  69 | 
  70 | fn capitalize(data: &mut [char]) {
  71 |     // do something
  72 | }
  73 | ```
  74 | 
  75 | This example also demonstrates something else. Lifetimes in Rust today
  76 | are quite a bit more flexible than scopes (if not as flexible as we
  77 | might like, hence this RFC):
  78 | 
  79 | - A scope generally corresponds to some block (or, more specifically,
  80 |   a *suffix* of a block that stretches from the `let` until the end of
  81 |   the enclosing block) \[[1](#temporaries)\].
  82 | - A lifetime, in contrast, can also span an individual expression, as
  83 |   this example demonstrates. The lifetime of the borrow in the example
  84 |   is confined to just the call to `capitalize`, and doesn't extend
  85 |   into the rest of the block. This is why the calls to `data.push`
  86 |   that come below are legal.
  87 | 
  88 | So long as a reference is only used within one statement, today's
  89 | lifetimes are typically adequate. Problems arise, however, when you have
  90 | a reference that spans multiple statements. In that case, the compiler
  91 | requires the lifetime to be the innermost expression (which is often a
  92 | block) that encloses both statements, and that is typically much
  93 | bigger than is really necessary or desired. Let's look at some example
  94 | problem cases. Later on, we'll see how non-lexical lifetimes fix these
  95 | cases.
  96 | 
  97 | ## Problem case #1: references assigned into a variable
  98 | 
  99 | One common problem case is when a reference is assigned into a
 100 | variable. Consider this trivial variation of the previous example,
 101 | where the `&mut data[..]` slice is not passed directly to
 102 | `capitalize`, but is instead stored into a local variable:
 103 | 
 104 | ```rust
 105 | fn bar() {
 106 |     let mut data = vec!['a', 'b', 'c'];
 107 |     let slice = &mut data[..]; // <-+ 'lifetime
 108 |     capitalize(slice);         //   |
 109 |     data.push('d'); // ERROR!  //   |
 110 |     data.push('e'); // ERROR!  //   |
 111 |     data.push('f'); // ERROR!  //   |
 112 | } // <------------------------------+
 113 | ```
 114 | 
 115 | The way that the compiler currently works, assigning a reference into
 116 | a variable means that its lifetime must be as large as the entire
 117 | scope of that variable. In this case, that means the lifetime is now
 118 | extended all the way until the end of the block. This in turn means
 119 | that the calls to `data.push` are now in error, because they occur
 120 | during the lifetime of `slice`. It's logical, but it's annoying.
 121 | 
 122 | In this particular case, you could resolve the problem by putting
 123 | `slice` into its own block:
 124 | 
 125 | ```rust
 126 | fn bar() {
 127 |     let mut data = vec!['a', 'b', 'c'];
 128 |     {
 129 |         let slice = &mut data[..]; // <-+ 'lifetime
 130 |         capitalize(slice);         //   |
 131 |     } // <------------------------------+
 132 |     data.push('d'); // OK
 133 |     data.push('e'); // OK
 134 |     data.push('f'); // OK
 135 | }
 136 | ```
 137 | 
 138 | Since we introduced a new block, the scope of `slice` is now smaller,
 139 | and hence the resulting lifetime is smaller. Introducing a
 140 | block like this is kind of artificial and also not an entirely obvious
 141 | solution.
 142 | 
 143 | ## Problem case #2: conditional control flow
 144 | 
 145 | Another common problem case is when references are used in only one
 146 | given match arm (or, more generally, one control-flow path). This most
 147 | commonly arises around maps. Consider this function, which, given some
 148 | `key`, processes the value found in `map[key]` if it exists, or else
 149 | inserts a default value:
 150 | 
 151 | ```rust
 152 | fn process_or_default() {
 153 |     let mut map = ...;
 154 |     let key = ...;
 155 |     match map.get_mut(&key) { // -------------+ 'lifetime
 156 |         Some(value) => process(value),     // |
 157 |         None => {                          // |
 158 |             map.insert(key, V::default()); // |
 159 |             //  ^~~~~~ ERROR.              // |
 160 |         }                                  // |
 161 |     } // <------------------------------------+
 162 | }
 163 | ```
 164 | 
 165 | This code will not compile today. The reason is that the `map` is
 166 | borrowed as part of the call to `get_mut`, and that borrow must
 167 | encompass not only the call to `get_mut`, but also the `Some` branch
 168 | of the match. The innermost expression that encloses both of these
 169 | expressions is the match itself (as depicted above), and hence the
 170 | borrow is considered to extend until the end of the
 171 | match. Unfortunately, the match encloses not only the `Some` branch,
 172 | but also the `None` branch, and hence when we go to insert into the
 173 | map in the `None` branch, we get an error that the `map` is still
 174 | borrowed.
 175 | 
 176 | This *particular* example is relatively easy to work around. In many cases,
 177 | one can move the code for `None` out from the `match` like so:
 178 | 
 179 | ```rust
 180 | fn process_or_default1() {
 181 |     let mut map = ...;
 182 |     let key = ...;
 183 |     match map.get_mut(&key) { // -------------+ 'lifetime
 184 |         Some(value) => {                   // |
 185 |             process(value);                // |
 186 |             return;                        // |
 187 |         }                                  // |
 188 |         None => {                          // |
 189 |         }                                  // |
 190 |     } // <------------------------------------+
 191 |     map.insert(key, V::default());
 192 | }
 193 | ```
 194 | 
 195 | When the code is adjusted this way, the call to `map.insert` is not
 196 | part of the match, and hence it is not part of the borrow.  While this
 197 | works, it is unfortunate to require these sorts of
 198 | manipulations, just as it was when we introduced an artificial block
 199 | in the previous example.
 200 | 
 201 | ## Problem case #3: conditional control flow across functions
 202 | 
 203 | While we were able to work around problem case #2 in a relatively
 204 | simple, if irritating, fashion, there are other variations of
 205 | conditional control flow that cannot be so easily resolved. This is
 206 | particularly true when you are returning a reference out of a
 207 | function. Consider the following function, which returns the value for
 208 | a key if it exists, and inserts a new value otherwise (for the
 209 | purposes of this section, assume that the `entry` API for maps does
 210 | not exist):
 211 | 
 212 | ```rust
 213 | fn get_default<'r,K,V>(map: &'r mut HashMap<K,V>,
 214 |                        key: K) -> &'r mut V
 215 |     where K: Clone + Eq + Hash, V: Default {
 216 |     match map.get_mut(&key) { // ---------------------+ 'r
 217 |         Some(value) => value,                      // |
 218 |         None => {                                  // |
 219 |             map.insert(key.clone(), V::default()); // |
 220 |             //  ^~~~~~ ERROR                       // |
 221 |             map.get_mut(&key).unwrap()             // |
 222 |         }                                          // |
 223 |     }                                              // |
 224 | }                                                  // v
 225 | ```
 226 | 
 227 | At first glance, this code appears quite similar to the code we saw
 228 | before, and indeed, just as before, it will not compile. However,
 229 | the lifetimes at play are quite different. The reason is that, in the
 230 | `Some` branch, the value is being **returned out** to the caller.
 231 | Since `value` is a reference into the map, this implies that the `map`
 232 | will remain borrowed **until some point in the caller** (the point
 233 | `'r`, to be exact). To get a better intuition for what this lifetime
 234 | parameter `'r` represents, consider some hypothetical caller of
 235 | `get_default`: the lifetime `'r` then represents the span of code in
 236 | which that caller will use the resulting reference:
 237 | 
 238 | ```rust
 239 | fn caller() {
 240 |     let mut map = HashMap::new();
 241 |     ...
 242 |     {
 243 |         let v = get_default(&mut map, key); // -+ 'r
 244 |           // +-- get_default() -----------+ //  |
 245 |           // | match map.get_mut(&key) {  | //  |
 246 |           // |   Some(value) => value,    | //  |
 247 |           // |   None => {                | //  |
 248 |           // |     ..                     | //  |
 249 |           // |   }                        | //  |
 250 |           // +----------------------------+ //  |
 251 |         process(v);                         //  |
 252 |     } // <--------------------------------------+
 253 |     ...
 254 | }
 255 | ```
 256 | 
 257 | If we attempt the same workaround for this case that we tried
 258 | in the previous example, we will find that it does not work:
 259 | 
 260 | ```rust
 261 | fn get_default1<'r,K,V>(map: &'r mut HashMap<K,V>,
 262 |                         key: K) -> &'r mut V
 263 |     where K: Clone + Eq + Hash, V: Default {
 264 |     match map.get_mut(&key) { // -------------+ 'r
 265 |         Some(value) => return value,       // |
 266 |         None => { }                        // |
 267 |     }                                      // |
 268 |     map.insert(key.clone(), V::default()); // |
 269 |     //  ^~~~~~ ERROR (still)               // |
 270 |     map.get_mut(&key).unwrap()             // |
 271 | }                                          // v
 272 | ```
 273 | 
 274 | Whereas before the lifetime of `value` was confined to the match, this
 275 | new lifetime extends out into the caller, and therefore the borrow
 276 | does not end just because we exited the match. Hence it is still in
 277 | scope when we attempt to call `insert` after the match.
 278 | 
 279 | The workaround for this problem is a bit more involved. It relies on
 280 | the fact that the borrow checker uses the precise control-flow of the
 281 | function to determine which borrows are in scope.
 282 | 
 283 | ```rust
 284 | fn get_default2<'r,K,V>(map: &'r mut HashMap<K,V>,
 285 |                         key: K) -> &'r mut V
 286 |     where K: Clone + Eq + Hash, V: Default {
 287 |     if map.contains_key(&key) {
 288 |     // ^~~~~~~~~~~~~~~~~~~~~ 'n
 289 |         return match map.get_mut(&key) { // + 'r
 290 |             Some(value) => value,        // |
 291 |             None => unreachable!()       // |
 292 |         };                               // v
 293 |     }
 294 | 
 295 |     // At this point, `map.get_mut` was never
 296 |     // called! (As opposed to having been called,
 297 |     // but its result no longer being in use.)
 298 |     map.insert(key.clone(), V::default()); // OK now.
 299 |     map.get_mut(&key).unwrap()
 300 | }
 301 | ```
 302 | 
 303 | What has changed here is that we moved the call to `map.get_mut`
 304 | inside of an `if`, and we have set things up so that the if body
 305 | unconditionally returns. What this means is that a borrow begins at
 306 | the point of `get_mut`, and that borrow lasts until the point `'r` in
 307 | the caller, but the borrow checker can see that this borrow *will not
 308 | have even started* outside of the `if`. It does not consider the
 309 | borrow in scope at the point where we call `map.insert`.
 310 | 
 311 | This workaround is more troublesome than the others, because the
 312 | resulting code is actually less efficient at runtime, since it must do
 313 | multiple lookups.
 314 | 
 315 | It's worth noting that Rust's hashmaps include an `entry` API that
 316 | one could use to implement this function today. The resulting code is
 317 | both nicer to read and more efficient even than the original version,
 318 | since it avoids extra lookups on the "not present" path as well:
 319 | 
 320 | ```rust
 321 | fn get_default3<'r,K,V>(map: &'r mut HashMap<K,V>,
 322 |                         key: K) -> &'r mut V
 323 |     where K: Eq + Hash, V: Default {
 324 |     map.entry(key)
 325 |        .or_insert_with(|| V::default())
 326 | }
 327 | ```
 328 | 
 329 | Regardless, the problem exists for other data structures besides
 330 | `HashMap`, so it would be nice if the original code passed the borrow
 331 | checker, even if in practice using the `entry` API would be
 332 | preferable. (Interestingly, the limitation of the borrow checker here
 333 | was one of the motivations for developing the `entry` API in the first
 334 | place!)
 335 | 
 336 | ## Problem case #4: mutating `&mut` references
 337 | 
 338 | The current borrow checker forbids reassigning an `&mut` variable `x`
 339 | when the referent (`*x`) has been borrowed. This most commonly arises
 340 | when writing a loop that progressively "walks down" a data structure.
 341 | Consider this function, which converts a linked list `&mut List<T>`
 342 | into a `Vec<&mut T>`:
 343 | 
 344 | ```rust
 345 | struct List<T> {
 346 |     value: T,
 347 |     next: Option<Box<List<T>>>,
 348 | }
 349 | 
 350 | fn to_refs<T>(mut list: &mut List<T>) -> Vec<&mut T> {
 351 |     let mut result = vec![];
 352 |     loop {
 353 |         result.push(&mut list.value);
 354 |         if let Some(n) = list.next.as_mut() {
 355 |             list = n;
 356 |         } else {
 357 |             return result;
 358 |         }
 359 |     }
 360 | }
 361 | ```
 362 | 
 363 | If we attempt to compile this, we get an error (actually, we get
 364 | multiple errors):
 365 | 
 366 | ```
 367 | error[E0506]: cannot assign to `list` because it is borrowed
 368 |   --> /Users/nmatsakis/tmp/x.rs:11:13
 369 |    |
 370 | 9  |         result.push(&mut list.value);
 371 |    |                          ---------- borrow of `list` occurs here
 372 | 10 |         if let Some(n) = list.next.as_mut() {
 373 | 11 |             list = n;
 374 |    |             ^^^^^^^^ assignment to borrowed `list` occurs here
 375 | ```
 376 | 
 377 | Specifically, what's gone wrong is that we borrowed `list.value` (or,
 378 | more explicitly, `(*list).value`). The current borrow checker enforces
 379 | the rule that when you borrow a path, you cannot assign to that path
 380 | or any prefix of that path. In this case, that means you cannot assign to any
 381 | of the following:
 382 | 
 383 | - `(*list).value`
 384 | - `*list`
 385 | - `list`
 386 | 
 387 | As a result, the `list = n` assignment is forbidden. These rules make
 388 | sense in some cases (for example, if `list` were of type `List<T>`,
 389 | and not `&mut List<T>`, then overwriting `list` would also overwrite
 390 | `list.value`), but not in the case where we cross a mutable reference.
 391 | 
 392 | As described in [Issue #10520][10520], there exist various workarounds
 393 | for this problem. One trick is to move the `&mut` reference into a
 394 | temporary variable that you won't have to modify:
 395 | 
 396 | ```rust
 397 | fn to_refs<T>(mut list: &mut List<T>) -> Vec<&mut T> {
 398 |     let mut result = vec![];
 399 |     loop {
 400 |         let list1 = list;
 401 |         result.push(&mut list1.value);
 402 |         if let Some(n) = list1.next.as_mut() {
 403 |             list = n;
 404 |         } else {
 405 |             return result;
 406 |         }
 407 |     }
 408 | }
 409 | ```
 410 | 
 411 | When you frame the program this way, the borrow checker sees that
 412 | `(*list1).value` is borrowed (not `list`). This does not prevent us
 413 | from later assigning to `list`.
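
To confirm that the workaround behaves as intended at runtime, the sketch below runs it on a three-element list. It repeats the `List` type and the workaround version of `to_refs`, writing the reassignment as `list = n`, which relies on deref coercion from `&mut Box<List<T>>` to `&mut List<T>`. The `from_vec` and `doubled_values` functions are hypothetical helpers added only for this illustration.

```rust
struct List<T> {
    value: T,
    next: Option<Box<List<T>>>,
}

// The workaround: move the `&mut` into `list1`, so the borrows are of
// `(*list1).value` and `(*list1).next` rather than of `list` itself.
fn to_refs<T>(mut list: &mut List<T>) -> Vec<&mut T> {
    let mut result = vec![];
    loop {
        let list1 = list;
        result.push(&mut list1.value);
        if let Some(n) = list1.next.as_mut() {
            // Deref coercion: `&mut Box<List<T>>` -> `&mut List<T>`.
            list = n;
        } else {
            return result;
        }
    }
}

// Hypothetical helper: build `v[0] -> v[1] -> ...` from a non-empty Vec.
fn from_vec<T>(mut values: Vec<T>) -> List<T> {
    let last = values.pop().expect("need at least one element");
    let mut list = List { value: last, next: None };
    while let Some(v) = values.pop() {
        list = List { value: v, next: Some(Box::new(list)) };
    }
    list
}

// Double every element through the references returned by `to_refs`,
// then read the list back from front to back.
fn doubled_values() -> Vec<i32> {
    let mut list = from_vec(vec![1, 2, 3]);
    for r in to_refs(&mut list) {
        *r *= 2;
    }
    let mut values = vec![];
    let mut cur = Some(&list);
    while let Some(node) = cur {
        values.push(node.value);
        cur = node.next.as_deref();
    }
    values
}

fn main() {
    println!("{:?}", doubled_values()); // [2, 4, 6]
}
```

Because `to_refs` returns one `&mut` per node, all of those references can coexist in the result vector. The borrow checker accepts this only because each one points at a distinct `value` field.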
 414 | 
 415 | Clearly this workaround is annoying. The problem here, it turns out,
 416 | is not specific to non-lexical lifetimes per se. Rather, it is that
 417 | the rules which the borrow checker enforces when a path is borrowed
 418 | are too strict and do not account for the indirection inherent in a
 419 | borrowed reference. This RFC proposes a tweak to address that.
 420 | 
 421 | ## The rough outline of our solution
 422 | 
 423 | This RFC proposes a more flexible model for lifetimes. Whereas
 424 | previously lifetimes were based on the abstract syntax tree, we now
 425 | propose lifetimes that are defined via the control-flow graph. More
 426 | specifically, lifetimes will be derived based on the [MIR][MIR-details]
 427 | used internally in the compiler.
 428 | 
 429 | [MIR-details]: https://blog.rust-lang.org/2016/04/19/MIR.html
 430 | 
 431 | Intuitively, in the new proposal, the lifetime of a reference lasts
 432 | only for those portions of the function in which the reference may
 433 | later be used (where the reference is **live**, in compiler
 434 | speak). This can range from a few sequential statements (as in problem
 435 | case #1) to something more complex, such as covering one arm in a
 436 | match but not the others (problem case #2).
 437 | 
 438 | However, in order to successfully type-check the full range of examples that
 439 | we would like, we have to go a bit further than just changing
 440 | lifetimes to a portion of the control-flow graph. **We also have to
 441 | take location into account when doing subtyping checks**. This is in
 442 | contrast to how the compiler works today, where subtyping relations
 443 | are "absolute". That is, in the current compiler, the type `&'a ()` is
 444 | a subtype of the type `&'b ()` whenever `'a` outlives `'b` (`'a: 'b`),
 445 | which means that `'a` corresponds to a bigger portion of the function.
 446 | Under this proposal, subtyping can instead be established **at a
 447 | particular point P**. In that case, the lifetime `'a` must only
 448 | outlive those portions of `'b` that are reachable from P.
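
A small sketch makes the difference between the two notions concrete. Here lifetimes are sets of point indices over a hypothetical straight-line graph `0 -> 1 -> 2 -> 3`, with `'a = {2, 3}` and `'b = {0, 1, 2, 3}`. `outlives_absolute` encodes the current rule. `outlives_at` encodes one plausible reading of the location-sensitive rule: it searches forward from P, continues only through points of `'b`, and requires every point it visits to be in `'a`. The function names and the encoding are illustrative assumptions, not compiler APIs.

```rust
use std::collections::BTreeSet;

type Point = usize;
type Lifetime = BTreeSet<Point>;

// Build a lifetime from a list of points.
fn set(points: &[Point]) -> Lifetime {
    points.iter().copied().collect()
}

// "Absolute" outlives: 'a: 'b holds iff 'a contains every point of 'b.
fn outlives_absolute(a: &Lifetime, b: &Lifetime) -> bool {
    b.is_subset(a)
}

// Location-sensitive outlives, ('a: 'b) @ p: every point of 'b reachable
// from p (walking only through points of 'b) must also be in 'a.
fn outlives_at(a: &Lifetime, b: &Lifetime, p: Point, succ: &dyn Fn(Point) -> Vec<Point>) -> bool {
    let mut stack = vec![p];
    let mut seen = Lifetime::new();
    while let Some(q) = stack.pop() {
        if !b.contains(&q) || !seen.insert(q) {
            continue; // outside 'b, or already visited
        }
        if !a.contains(&q) {
            return false;
        }
        stack.extend(succ(q));
    }
    true
}

fn main() {
    // A straight-line CFG: 0 -> 1 -> 2 -> 3.
    let succ = |q: Point| if q < 3 { vec![q + 1] } else { vec![] };
    let a = set(&[2, 3]);
    let b = set(&[0, 1, 2, 3]);
    println!("{}", outlives_absolute(&a, &b)); // false
    println!("{}", outlives_at(&a, &b, 2, &succ)); // true
    println!("{}", outlives_at(&a, &b, 1, &succ)); // false
}
```

Under the absolute rule, `'a: 'b` fails because `'a` lacks points 0 and 1. At P = 2 the constraint holds, because only points 2 and 3 of `'b` are reachable from there.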
 449 | 
 450 | The ideas in this RFC have been implemented in
 451 | [prototype form][proto]. This prototype includes a simplified
 452 | control-flow graph that allows one to create the various kinds of
 453 | region constraints that can arise and implements the region inference
 454 | algorithm which then solves those constraints.
 455 | 
 456 | [proto]: https://github.com/nikomatsakis/nll
 457 | 
 458 | # Detailed design
 459 | [design]: #detailed-design
 460 | 
 461 | ## Layering the design
 462 | 
 463 | We describe the design in "layers":
 464 | 
 465 | 1. Initially, we will describe a basic design focused on control-flow
 466 |    within one function.
 467 | 2. Next, we extend the control-flow graph to better handle infinite loops.
 468 | 3. Next, we extend the design to handle dropck, and specifically the
 469 |    `#[may_dangle]` attribute introduced by RFC 1327.
 470 | 4. Next, we will extend the design to consider named lifetime parameters,
 471 |    like those in problem case #3.
 472 | 5. Finally, we give a brief description of the borrow checker.
 473 | 
 474 | ## Layer 0: Definitions
 475 | 
 476 | Before we can describe the design, we have to define the terms that we
 477 | will be using. The RFC is defined in terms of a simplified version of
 478 | MIR, eliding various details that don't introduce fundamental
 479 | complexity.
 480 | 
 481 | **Lvalues**. A MIR "lvalue" is a path that leads to a memory location.
 482 | The full MIR Lvalues are defined [via a Rust enum][lvaluecode] and
 483 | contain a number of knobs, most of which are not relevant for this RFC.
 484 | We will present a simplified form of lvalues for now:
 485 | 
 486 | ```
 487 | LV = x       // local variable
 488 |    | LV.f    // field access
 489 |    | *LV     // deref
 490 | ```
 491 | 
 492 | The precedence of `*` is low, so `*a.b.c` will deref `a.b.c`; to deref
 493 | just `a`, one would write `(*a).b.c`.
 494 | 
 495 | **Prefixes.** We say that the prefixes of an lvalue are all the
 496 | lvalues you get by stripping away fields and derefs. The prefixes
 497 | of `*a.b` would be `*a.b`, `a.b`, and `a`.
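
The lvalue grammar and the definition of prefixes translate directly into a small Rust enum. The following is a sketch for illustration only. The `Lvalue` type here is hypothetical and much simpler than the compiler's MIR type. `prefixes` repeatedly strips the outermost field access or deref. `Display` follows the precedence convention above, adding parentheses when a field is taken of a deref.

```rust
use std::fmt;

// The simplified lvalue grammar: LV = x | LV.f | *LV
#[derive(Clone, Debug, PartialEq)]
enum Lvalue {
    Var(String),
    Field(Box<Lvalue>, String),
    Deref(Box<Lvalue>),
}

// Hypothetical convenience constructors.
fn var(x: &str) -> Lvalue { Lvalue::Var(x.to_string()) }
fn field(base: Lvalue, f: &str) -> Lvalue { Lvalue::Field(Box::new(base), f.to_string()) }
fn deref(base: Lvalue) -> Lvalue { Lvalue::Deref(Box::new(base)) }

impl fmt::Display for Lvalue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Lvalue::Var(x) => write!(f, "{}", x),
            // `*` has low precedence, so a field of a deref needs parentheses.
            Lvalue::Field(base, name) => match **base {
                Lvalue::Deref(_) => write!(f, "({}).{}", base, name),
                _ => write!(f, "{}.{}", base, name),
            },
            Lvalue::Deref(base) => write!(f, "*{}", base),
        }
    }
}

// Prefixes: the lvalue itself, then every lvalue obtained by repeatedly
// stripping the outermost field access or deref.
fn prefixes(lv: &Lvalue) -> Vec<Lvalue> {
    let mut out = vec![lv.clone()];
    let mut cur = lv;
    while let Lvalue::Field(base, _) | Lvalue::Deref(base) = cur {
        out.push((**base).clone());
        cur = &**base;
    }
    out
}

fn main() {
    // `*a.b` means `*(a.b)`.
    let lv = deref(field(var("a"), "b"));
    for p in prefixes(&lv) {
        println!("{}", p); // prints `*a.b`, then `a.b`, then `a`
    }
    println!("{}", field(field(deref(var("a")), "b"), "c")); // (*a).b.c
}
```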
 498 | 
 499 | [lvaluecode]: https://github.com/rust-lang/rust/blob/bf0a9e0b4d3a4dd09717960840798e2933ec7568/src/librustc/mir/mod.rs#L839-L851
 500 | 
 501 | **Control-flow graph.** MIR is organized into a
 502 | [control-flow graph][cfg] rather than an abstract syntax tree. It is
 503 | created in the compiler by transforming the "HIR" (high-level IR). The
 504 | MIR CFG consists of a set of [basic blocks][bbdata]. Each basic block
 505 | has a series of [statements][stmt] and a
 506 | [terminator][term]. Statements that concern us in this RFC fall into
 507 | three categories:
 508 | 
 509 | - assignments like `x = y`; the RHS of such an assignment is called an
 510 |   [rvalue][]. There are no compound rvalues, and hence each statement
 511 |   is a discrete action that executes instantaneously. For example, the
 512 |   Rust expression `a = b + c + d` would be compiled into two MIR
 513 |   instructions, like `tmp0 = b + c; a = tmp0 + d;`.
 514 | - `drop(lvalue)` deallocates an lvalue, if there is a value in it; in the
 515 |   general case, this requires runtime checks (a MIR pass called "elaborate
 516 |   drops" performs this transformation).
 517 | - `StorageDead(x)` deallocates the stack storage for `x`. These are used by LLVM to allow
 518 |   stack-allocated values to use the same stack slot (if their live storage ranges are disjoint).
 519 |   [Ralf Jung's recent blog post has more details.][rjung-sd]
 520 | 
 521 | [rjung-sd]: https://www.ralfj.de/blog/2017/06/06/MIR-semantics.html
 522 | [rvalue]: https://github.com/rust-lang/rust/blob/bf0a9e0b4d3a4dd09717960840798e2933ec7568/src/librustc/mir/mod.rs#L1037-L1071
 523 | [bbdata]: https://github.com/rust-lang/rust/blob/bf0a9e0b4d3a4dd09717960840798e2933ec7568/src/librustc/mir/mod.rs#L443-L463
 524 | [stmt]: https://github.com/rust-lang/rust/blob/bf0a9e0b4d3a4dd09717960840798e2933ec7568/src/librustc/mir/mod.rs#L774-L814
 525 | [term]: https://github.com/rust-lang/rust/blob/bf0a9e0b4d3a4dd09717960840798e2933ec7568/src/librustc/mir/mod.rs#L465-L552
 526 | [cfg]: https://en.wikipedia.org/wiki/Control_flow_graph
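
Because rvalues are never compound, a nested expression must be flattened into a sequence of single-operation statements, each writing to a fresh temporary. The sketch below shows this for a left-associated sum such as `a = b + c + d`. The `Stmt` type and the `lower_sum` and `render` functions are hypothetical simplifications for illustration. They are not part of the compiler's MIR construction.

```rust
// A MIR-like assignment `target = lhs + rhs` whose operands are plain
// variables or temporaries: there are no compound rvalues.
#[derive(Debug, PartialEq)]
struct Stmt {
    target: String,
    lhs: String,
    rhs: String,
}

// Flatten `target = x0 + x1 + ... + xn` (left-associative) into one
// statement per addition, introducing temporaries `tmp0`, `tmp1`, ...
fn lower_sum(target: &str, operands: &[&str]) -> Vec<Stmt> {
    assert!(operands.len() >= 2, "need at least two operands");
    let last = operands.len() - 2; // index of the final addition
    let mut stmts = Vec::new();
    let mut acc = operands[0].to_string();
    for (i, rhs) in operands[1..].iter().enumerate() {
        // Only the final addition writes to `target`.
        let dest = if i == last { target.to_string() } else { format!("tmp{}", i) };
        stmts.push(Stmt { target: dest.clone(), lhs: acc, rhs: rhs.to_string() });
        acc = dest;
    }
    stmts
}

fn render(stmts: &[Stmt]) -> String {
    stmts
        .iter()
        .map(|s| format!("{} = {} + {};", s.target, s.lhs, s.rhs))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // `a = b + c + d` becomes two instructions.
    println!("{}", render(&lower_sum("a", &["b", "c", "d"])));
    // prints: tmp0 = b + c; a = tmp0 + d;
}
```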
 527 | 
 528 | ## Layer 1: Control-flow within a function
 529 | 
 530 | ### Running Example
 531 | 
 532 | We will explain the design with reference to a running example, called
 533 | **Example 4**. After presenting the design, we will apply it to the three
 534 | problem cases, as well as a number of other interesting examples.
 535 | 
 536 | ```rust
 537 | let mut foo: T = ...;
 538 | let mut bar: T = ...;
 539 | let mut p: &T;
 540 | 
 541 | p = &foo;
 542 | // (0)
 543 | if condition {
 544 |     print(*p);
 545 |     // (1)
 546 |     p = &bar;
 547 |     // (2)
 548 | }
 549 | // (3)
 550 | print(*p);
 551 | // (4)
 552 | ```
 553 | 
 554 | The key point of this example is that the variable `foo` should only
 555 | be considered borrowed at points 0 and 3, but not point 1. `bar`,
 556 | in contrast, should be considered borrowed at points 2 and 3. Neither
 557 | of them needs to be considered borrowed at point 4, as the reference `p`
 558 | is not used there.
 559 | 
 560 | We can convert this example into the control-flow graph that follows.
 561 | Recall that a control-flow graph in MIR consists of basic blocks
 562 | containing a list of discrete statements and a trailing terminator:
 563 | 
 564 | ```
 565 | // let mut foo: T;
 566 | // let mut bar: T;
 567 | // let mut p: &T;
 568 | 
 569 | A
 570 | [ p = &foo     ]
 571 | [ if condition ] ----\ (true)
 572 |        |             |
 573 |        |     B       v
 574 |        |     [ print(*p)     ]
 575 |        |     [ ...           ]
 576 |        |     [ p = &bar      ]
 577 |        |     [ ...           ]
 578 |        |     [ goto C        ]
 579 |        |             |
 580 |        +-------------/
 581 |        |
 582 | C      v
 583 | [ print(*p)    ]
 584 | [ return       ]
 585 | ```
 586 | 
 587 | We will use a notation like `Block/Index` to refer to a specific
 588 | statement or terminator in the control-flow graph. `A/0` and `B/4`
 589 | refer to `p = &foo` and `goto C`, respectively.
 590 | 
 591 | ### What is a lifetime and how does it interact with the borrow checker
 592 | 
 593 | To start with, we will consider lifetimes as a **set of points in the
 594 | control-flow graph**; later in the RFC we will extend the domain of
 595 | these sets to include "skolemized" lifetimes, which correspond to
 596 | named lifetime parameters declared on a function. If a lifetime
 597 | contains the point P, that implies that references with that lifetime
 598 | are valid on entry to P. Lifetimes appear in various places in the MIR
 599 | representation:
 600 | 
 601 | - The types of variables (and temporaries, etc.) may contain lifetimes.
 602 | - Every borrow expression has a designated lifetime.
 603 | 
 604 | We can extend Example 4 to include explicit lifetime names. There
 605 | are three lifetimes that result. We will call them `'p`, `'foo`, and
 606 | `'bar`:
 607 | 
 608 | ```rust
 609 | let mut foo: T = ...;
 610 | let mut bar: T = ...;
 611 | let mut p: &'p T;
 612 | //          --
 613 | p = &'foo foo;
 614 | //   ----
 615 | if condition {
 616 |     print(*p);
 617 |     p = &'bar bar;
 618 |     //   ----
 619 | }
 620 | print(*p);
 621 | ```
 622 | 
 623 | As you can see, the lifetime `'p` is part of the type of the variable
 624 | `p`. It indicates the portions of the control-flow graph where `p` can
 625 | safely be dereferenced. The lifetimes `'foo` and `'bar` are different:
 626 | they refer to the lifetimes for which `foo` and `bar` are borrowed,
 627 | respectively.
 628 | 
 629 | Lifetimes attached to a borrow expression, like `'foo` and `'bar`, are
 630 | important to the borrow checker. Those correspond to the portions of
 631 | the control-flow graph in which the borrow checker will enforce its
 632 | restrictions. In this case, since both borrows are shared borrows
 633 | (`&`), the borrow checker will prevent `foo` from being modified
 634 | during `'foo` and it will prevent `bar` from being modified during
 635 | `'bar`. If these had been mutable borrows (`&mut`), the borrow checker
 636 | would have prevented **all** access to `foo` and `bar` during those
 637 | lifetimes.
 638 | 
 639 | There are many valid choices one could make for `'foo` and `'bar`.
 640 | This RFC however describes an inference algorithm that aims to pick
 641 | the **minimal** lifetimes for each borrow which could possibly work.
 642 | This corresponds to imposing the fewest restrictions we can.
 643 | 
 644 | In the case of example 4, therefore, we wish our algorithm to compute
 645 | that `'foo` is `{A/1, B/0, C/0}`, which notably excludes the points B/1
 646 | through B/4. `'bar` should be inferred to the set `{B/3, B/4,
 647 | C/0}`. The lifetime `'p` will be the union of `'foo` and `'bar`, since
 648 | it contains all the points where the variable `p` is valid.
 649 | 
 650 | ### Lifetime inference constraints
 651 | 
 652 | The inference algorithm works by analyzing the MIR and creating a
 653 | series of **constraints**. These constraints obey the following
 654 | grammar:
 655 | 
 656 | ```
 657 | // A constraint set C:
 658 | C = true
 659 |   | C, (L1: L2) @ P    // Lifetime L1 outlives Lifetime L2 at point P
 660 | 
 661 | // A lifetime L:
 662 | L = 'a
 663 |   | {P}
 664 | ```
 665 | 
 666 | Here the terminal `P` represents a point in the control-flow graph,
 667 | and the notation `'a` refers to some named lifetime inference variable
 668 | (e.g., `'p`, `'foo` or `'bar`).
 669 | 
 670 | Once the constraints are created, the **inference algorithm** solves
 671 | the constraints. This is done via fixed-point iteration: each
 672 | lifetime variable begins as an empty set and we iterate over the
 673 | constraints, repeatedly growing the lifetimes until they are big enough
 674 | to satisfy all constraints.
 675 | 
 676 | (If you'd like to compare this to the prototype code, the file
 677 | [`regionck.rs`] is responsible for creating the constraints, and
 678 | [`infer.rs`] is responsible for solving them.)
 679 | 
 680 | [`regionck.rs`]: https://github.com/nikomatsakis/nll/blob/master/nll/src/regionck.rs
 681 | [`infer.rs`]: https://github.com/nikomatsakis/nll/blob/master/nll/src/infer.rs
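
The fixed-point iteration can be sketched on Example 4. In this sketch, which is not the prototype's code, a lifetime is a set of `Block/Index` points. Each constraint `(L1: L2) @ P` is applied by searching forward from P and adding to `L1` every point of `L2` that the search reaches, stopping wherever it leaves `L2`. The constraint list is our own reading of the example and should be treated as an assumption. It contains one `('p: {Q}) @ Q` constraint for each point Q where `p` is live (A/1, B/0, B/3, B/4 and C/0, per the liveness discussion below), plus `('foo: 'p) @ A/1` and `('bar: 'p) @ B/3` for the two assignments, each taking effect at the point just after the assignment.

```rust
use std::collections::BTreeSet;

// A point `Block/Index` in the Example 4 control-flow graph.
type Point = (char, usize);
type Lifetime = BTreeSet<Point>;

// Indices of the three lifetime inference variables.
const P: usize = 0;
const FOO: usize = 1;
const BAR: usize = 2;

// Successor points in Example 4 (blocks A, B and C).
fn successors(p: Point) -> Vec<Point> {
    match p {
        ('A', 0) => vec![('A', 1)],
        ('A', 1) => vec![('B', 0), ('C', 0)], // `if condition`
        ('B', i) if i < 4 => vec![('B', i + 1)],
        ('B', 4) => vec![('C', 0)], // `goto C`
        ('C', 0) => vec![('C', 1)],
        _ => vec![], // `return`
    }
}

// The right-hand side of a constraint: a variable or a literal `{P}`.
enum Region {
    Var(usize),
    Point(Point),
}

// `(sup: sub) @ at`: `sup` outlives `sub` at point `at`.
struct Constraint {
    sup: usize,
    sub: Region,
    at: Point,
}

fn solve(num_vars: usize, constraints: &[Constraint]) -> Vec<Lifetime> {
    // Every variable starts empty and only ever grows.
    let mut values = vec![Lifetime::new(); num_vars];
    let mut changed = true;
    while changed {
        changed = false;
        for c in constraints {
            let sub: Lifetime = match &c.sub {
                Region::Var(v) => values[*v].clone(),
                Region::Point(p) => std::iter::once(*p).collect(),
            };
            // Depth-first search from `at`, stopping when we leave `sub`.
            let mut stack = vec![c.at];
            let mut seen = BTreeSet::new();
            while let Some(q) = stack.pop() {
                if !sub.contains(&q) || !seen.insert(q) {
                    continue;
                }
                changed |= values[c.sup].insert(q);
                stack.extend(successors(q));
            }
        }
    }
    values
}

fn example4() -> Vec<Lifetime> {
    // Liveness: `p` is live at these points, so `('p: {Q}) @ Q` for each.
    let live_p = [('A', 1), ('B', 0), ('B', 3), ('B', 4), ('C', 0)];
    let mut cs: Vec<Constraint> = live_p
        .iter()
        .map(|&q| Constraint { sup: P, sub: Region::Point(q), at: q })
        .collect();
    // `p = &'foo foo` at A/0 and `p = &'bar bar` at B/2.
    cs.push(Constraint { sup: FOO, sub: Region::Var(P), at: ('A', 1) });
    cs.push(Constraint { sup: BAR, sub: Region::Var(P), at: ('B', 3) });
    solve(3, &cs)
}

fn show(l: &Lifetime) -> String {
    let pts: Vec<String> = l.iter().map(|(b, i)| format!("{}/{}", b, i)).collect();
    format!("{{{}}}", pts.join(", "))
}

fn main() {
    let v = example4();
    println!("'p   = {}", show(&v[P])); // {A/1, B/0, B/3, B/4, C/0}
    println!("'foo = {}", show(&v[FOO])); // {A/1, B/0, C/0}
    println!("'bar = {}", show(&v[BAR])); // {B/3, B/4, C/0}
}
```

With these inputs, the first pass produces every change and the second pass confirms the fixed point. The result reproduces the sets given earlier for `'foo`, `'bar` and `'p`. In particular, `'foo` excludes B/1 through B/4, because the search from A/1 stops at B/1, which is not in `'p`.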
 682 | 
 683 | ### Liveness
 684 | 
 685 | One key ingredient to understanding how NLL should work is
 686 | understanding **liveness**. The term "liveness" derives from compiler
 687 | analysis, but it's fairly intuitive. We say that **a variable is live
 688 | if the current value that it holds may be used later**. This is very
 689 | important to Example 4:
 690 | 
 691 | ```rust
 692 | let mut foo: T = ...;
 693 | let mut bar: T = ...;
 694 | let p: &'p T = &foo;
 695 | // `p` is live here: its value may be used on the next line.
 696 | if condition {
 697 |     // `p` is live here: its value will be used on the next line.
 698 |     print(*p);
 699 |     // `p` is DEAD here: its value will not be used.
 700 |     p = &bar;
 701 |     // `p` is live here: its value will be used later.
 702 | }
 703 | // `p` is live here: its value may be used on the next line.
 704 | print(*p);
 705 | // `p` is DEAD here: its value will not be used.
 706 | ```
 707 | 
 708 | Here you see a variable `p` that is assigned in the beginning of the
 709 | program, and then maybe re-assigned during the `if`. The key point is
 710 | that `p` becomes **dead** (not live) in the span before it is
 711 | reassigned.  This is true even though the variable `p` will be used
 712 | again, because the **value** that is in `p` will not be used.
 713 | 
 714 | Traditional compiler compute liveness based on variables, but we wish
 715 | to compute liveness for **lifetimes**. We can extend a variable-based
 716 | analysis to lifetimes by saying that a lifetime L is live at a point P
 717 | if there is some variable `p` which is live at P, and L appears in the
 718 | type of `p`. (Later on, when we cover the dropck, we will use a more
 719 | selective notion of liveness for lifetimes in which *some* of the
 720 | lifetimes in a variable's type may be live while others are not.) So,
 721 | in our running example, the lifetime `'p` would be live at precisely
 722 | the same points that `p` is live. The lifetimes `'foo` and `'bar` have
 723 | no points where they are (directly) live, since they do not appear in
 724 | the types of any variables.
 725 | 
 726 |  * However, this does not mean these lifetimes are irrelevant; as
 727 |    shown below, subtyping constraints introduced by subsequent
 728 |    analyses will eventually require `'foo` and `'bar` to *outlive*
 729 |    `'p`.
 730 | 
 731 | #### Liveness-based constraints for lifetimes
 732 | 
 733 | The first set of constraints that we generate are derived from
 734 | liveness. Specifically, if a lifetime L is live at the point P,
 735 | then we will introduce a constraint like:
 736 | 
 737 |     (L: {P}) @ P
 738 | 
 739 | (As we'll see later when we cover solving constraints, this constraint
 740 | effectively just inserts `P` into the set for `L`. In fact, the
 741 | prototype doesn't bother to materialize such constraints, instead just
 742 | immediately inserting `P` into `L`.)
 743 | 
 744 | For our running example, this means that we would introduce the following
 745 | liveness constraints:
 746 | 
 747 |     ('p: {A/1}) @ A/1
 748 |     ('p: {B/0}) @ B/0
 749 |     ('p: {B/3}) @ B/3
 750 |     ('p: {B/4}) @ B/4
 751 |     ('p: {C/0}) @ C/0
 752 | 
 753 | ### Subtyping
 754 | 
 755 | Whenever references are copied from one location to another, the Rust
 756 | subtyping rules require that the lifetime of the source reference
 757 | **outlives** the lifetime of the target location. As discussed
 758 | earlier, in this RFC, we extend the notion of subtyping to be
 759 | **location-aware**, meaning that we take into account the point where
 760 | the value is being copied.
 761 | 
 762 | For example, at the point A/0, our running example contains a borrow
 763 | expression `p = &'foo foo`. In this case, the borrow expression will
 764 | produce a reference of type `&'foo T`, where `T` is the type of
 765 | `foo`. This value is then assigned to `p`, which has the type `&'p T`.
 766 | Therefore, we wish to require that `&'foo T` be a subtype of `&'p T`.
 767 | Moreover, this relation needs to hold at the point A/1 -- the
 768 | **successor** of the point A/0 where the assignment occurs (this is
 769 | because the new value of `p` is first visible in A/1). We write that
 770 | subtyping constraint as follows:
 771 | 
 772 |     (&'foo T <: &'p T) @ A/1
 773 | 
 774 | The standard Rust subtyping rules (two examples of which are given
 775 | below) can then "break down" this subtyping rule into the lifetime
 776 | constraints we need for inference:
 777 | 
 778 |     (T_a <: T_b) @ P
 779 |     ('a: 'b) @ P      // <-- a constraint for our inference algorithm
 780 |     ------------------------
 781 |     (&'a T_a <: &'b T_b) @ P
 782 | 
 783 |     (T_a <: T_b) @ P
 784 |     (T_b <: T_a) @ P  // (&mut T is invariant)
 785 |     ('a: 'b) @ P      // <-- another constraint
 786 |     ------------------------
 787 |     (&'a mut T_a <: &'b mut T_b) @ P
 788 | 
 789 | In the case of our running example, we generate the following subtyping
 790 | constraints:
 791 | 
 792 |     (&'foo T <: &'p T) @ A/1
 793 |     (&'bar T <: &'p T) @ B/3
 794 | 
 795 | These can be converted into the following lifetime constraints:
 796 | 
 797 |     ('foo: 'p) @ A/1
 798 |     ('bar: 'p) @ B/3
 799 | 
 800 | ### Reborrow constraints
 801 | 
 802 | There is one final source of constraints. It frequently happens that we
 803 | have a borrow expression that "reborrows" the referent of an
 804 | existing reference:
 805 | 
 806 |     let x: &'x i32 = ...;
 807 |     let y: &'y i32 = &*x;
 808 | 
 809 | In such cases, there is a connection between the lifetime `'y` of the
 810 | borrow and the lifetime `'x` of the original reference. In particular,
 811 | `'x` must outlive `'y` (`'x: 'y`). In simple cases like this, the
 812 | relationship is the same regardless of whether the original reference
 813 | `x` is a shared (`&`) or mutable (`&mut`) reference. However, in more
 814 | complex cases that involve multiple dereferences, the treatment is
 815 | different.
 816 | 
 817 | **Supporting prefixes.** To define the reborrow constraints, we first
 818 | introduce the idea of supporting prefixes -- this definition will be
 819 | useful in a few places. The *supporting prefixes* for an lvalue are
 820 | formed by stripping away fields and derefs, except that we stop when
 821 | we reach the deref of a shared reference. Inituitively, shared
 822 | references are different because they are `Copy` -- and hence one
 823 | could always copy the shared reference into a temporary and get an
 824 | equivalent path. Here are some examples of supporting prefixes:
 825 | 
 826 | ```
 827 | let r: (&(i32, i64), (f32, f64));
 828 | 
 829 | // The path (*r.0).1 has type `i64` and supporting prefixes:
 830 | // - (*r.0).1
 831 | // - *r.0
 832 | 
 833 | // The path r.1.0 has type `f32` and supporting prefixes:
 834 | // - r.1.0
 835 | // - r.1
 836 | // - r
 837 | 
 838 | let m: (&mut (i32, i64), (f32, f64));
 839 | 
 840 | // The path (*m.0).1 has type `i64` and supporting prefixes:
 841 | // - (*m.0).1
 842 | // - *m.0
 843 | // - m.0
 844 | // - m
 845 | ```
 846 | 
 847 | **Reborrow constraints.** Consider the case where we have a borrow
 848 | (shared or mutable) of some lvalue `lv_b` for the lifetime `'b`:
 849 | 
 850 |     lv_l = &'b lv_b      // or:
 851 |     lv_l = &'b mut lv_b
 852 | 
 853 | In that case, we compute the supporting prefixes of `lv_b`, and find
 854 | every deref lvalue `*lv` in the set where `lv` is a reference with
 855 | lifetime `'a`. We then add a constraint `('a: 'b) @ P`, where `P` is
 856 | the point following the borrow (that's the point where the borrow
 857 | takes effect).
 858 | 
 859 | Let's look at some examples. In each case, we will link to the
 860 | corresponding test from the prototype implementation.
 861 | 
 862 | [**Example 1.**][bck-rvwbi] To see why this rule is needed, let's
 863 | first consider a simple example involving a single reference:
 864 | 
 865 | [bck-rvwbi]: https://github.com/nikomatsakis/nll/blob/master/test/borrowck-read-variable-while-borrowed-indirect.nll
 866 | 
 867 | ```rust
 868 | let mut foo: i32     = 22;
 869 | let r_a: &'a mut i32 = &'a mut foo;
 870 | let r_b: &'b mut i32 = &'b mut *r_a;
 871 | ...
 872 | use(r_b);
 873 | ```
 874 | 
 875 | In this case, the supporting prefixes of `*r_a` are `*r_a` and `r_a`
 876 | (because `r_a` is a mutable reference, we recurse). Only one of those,
 877 | `*r_a`, is a deref lvalue, and the reference `r_a` being dereferenced
 878 | has the lifetime `'a`. We would add the constraint that `'a: 'b`,
 879 | thus ensuring that `foo` is considered borrowed so long as `r_b` is in
 880 | use. Without this constraint, the lifetime `'a` would end after the
 881 | second borrow, and hence `foo` would be considered unborrowed, even
 882 | though `*r_b` could still be used to access `foo`.
 883 | 
 884 | [**Example 2.**][bck-wvare] Consider now a case with a double indirection:
 885 | 
 886 | [bck-wvare]: https://github.com/nikomatsakis/nll/blob/master/test/borrowck-write-variable-after-ref-extracted.nll
 887 | 
 888 | ```rust
 889 | let mut foo: i32     = 22;
 890 | let mut r_a: &'a i32 = &'a foo;
 891 | let r_b: &'b &'a i32 = &'b r_a;
 892 | let r_c: &'c i32     = &'c **r_b;
 893 | // What is considered borrowed here?
 894 | use(r_c);
 895 | ```
 896 | 
 897 | Just as before, it is important that, so long as `r_c` is in use,
 898 | `foo` is considered borrowed. However, what about the variable `r_a`:
 899 | should *it* considered borrowed? The answer is no: once `r_c` is
 900 | initialized, the value of `r_a` is no longer important, and it would
 901 | be fine to (for example) overwrite `r_a` with a new value, even as
 902 | `foo` is still considered borrowed. This result falls out from our
 903 | reborrowing rules: the supporting paths of `**r_b` is just `**r_b`.
 904 | We do not add any more paths because this path is already a
 905 | dereference of `*r_b`, and `*r_b` has (shared reference) type `&'a
 906 | i32`. Therefore, we would add one reborrow constraint: that `'a: 'c`.
 907 | This constraint ensures that as long as `r_c` is in use, the borrow of
 908 | `foo` remains in force, but the borrow of `r_a` (which has the
 909 | lifetime `'b`) can expire.
 910 | 
 911 | [**Example 3.**][bck-rrwrmb] The previous example showed how a borrow
 912 | of a shared reference can expire once it has been dereferenced. With
 913 | mutable references, however, this is not safe. Consider the following example:
 914 | 
 915 | [bck-rrwrmb]: https://github.com/nikomatsakis/nll/blob/master/test/borrowck-read-ref-while-referent-mutably-borrowed.nll
 916 | 
 917 | ```rust
 918 | let foo = Foo { ... };
 919 | let p: &'p mut Foo = &mut foo;
 920 | let q: &'q mut &'p mut Foo = &mut p;
 921 | let r: &'r mut Foo = &mut **q;
 922 | use(*p); // <-- This line should result in an ERROR
 923 | use(r);
 924 | ```
 925 | 
 926 | The key point here is that we create a reference `r` by reborrowing
 927 | `**q`; `r` is then later used in the final line of the program. This
 928 | use of `r` must extend the lifetime of the borrows used to create
 929 | *both* `p` *and* `q`. Otherwise, one could access (and mutate) the
 930 | same memory through both `*r` and `*p`. (In fact, the real rustc did
 931 | in its early days have a soundness bug much like this one.)
 932 | 
 933 | Because dereferencing a mutable reference does not stop the supporting
 934 | prefixes from being enumerated, the supporting prefixes of `**q` are
 935 | `**q`, `*q`, and `q`. Therefore, we add two reborrow constraints: `'q:
 936 | 'r` and `'p: 'r`, and hence both borrows are indeed considered in
 937 | scope at the line in question.
 938 | 
 939 | As an alternate way of looking at the previous example, consider it
 940 | like this. To create the mutable reference `p`, we get a "lock" on
 941 | `foo` (that lasts so long as `p` is in use). We then take a lock on
 942 | the mutable reference `p` to create `q`; this lock must last for as
 943 | long as `q` is in use. When we create `r` by borrowing `**q`, that is
 944 | the last direct use of `q` -- so you might think we can release the
 945 | lock on `p`, since `q` is no longer in (direct) use. However, that
 946 | would be unsound, since then `r` and `*p` could both be used to access
 947 | the same memory. The key is to recognize that `r` represents an
 948 | indirect use of `q` (and `q` in turn is an indirect use of `p`), and
 949 | hence so long as `r` is in use, `p` and `q` must also be considered "in
 950 | use" (and hence their "locks" still enforced).
 951 | 
 952 | ### Solving constraints
 953 | 
 954 | Once the constraints are created, the **inference algorithm** solves
 955 | the constraints. This is done via fixed-point iteration: each
 956 | lifetime variable begins as an empty set and we iterate over the
 957 | constaints, repeatedly growing the lifetimes until they are big enough
 958 | to satisfy all constraints.
 959 | 
 960 | The meaning of a constraint like `('a: 'b) @ P` is that, starting from
 961 | the point P, the lifetime `'a` must include all points in `'b` that
 962 | are reachable from the point P. The implementation
 963 | [does a depth-first search starting from P][dfs]; the search stops if
 964 | we exit the lifetime `'b`. Otherwise, for each point we find, we add
 965 | it to `'a`.
 966 | 
 967 | In our example, the full set of constraints is:
 968 | 
 969 |     ('foo: 'p) @ A/1
 970 |     ('bar: 'p) @ B/3
 971 |     ('p: {A/1}) @ A/1
 972 |     ('p: {B/0}) @ B/0
 973 |     ('p: {B/3}) @ B/3
 974 |     ('p: {B/4}) @ B/4
 975 |     ('p: {C/0}) @ C/0
 976 | 
 977 | Solving these constraints results in the following lifetimes,
 978 | which are precisely the answers we expected:
 979 | 
 980 |     'p   = {A/1, B/0, B/3, B/4, C/0}
 981 |     'foo = {A/1, B/0, C/0}
 982 |     'bar = {B/3, B/4, C/0}
 983 | 
 984 | [dfs]: https://github.com/nikomatsakis/nll/blob/1cff361c9aeb6f553b528078866f5717f1872dad/nll/src/infer.rs#L71-L113
 985 | 
 986 | ### Intuition for why this algorithm is correct
 987 | 
 988 | For the algorithm to be correct, there is a critical invariant that we
 989 | must maintain. Consider some path H that is borrowed with lifetime L
 990 | at a point P to create a reference R; this reference R (or some
 991 | copy/move of it) is then later dereferenced at some point Q.
 992 | 
 993 | We must ensure that the reference has not been invalidated: this means
 994 | that the memory which was borrowed must not have been freed by the
 995 | time we reach Q. If the reference R is a shared reference (`&T`), then
 996 | the memory must also not have been written (modulo `UnsafeCell`). If
 997 | the reference R is a mutable reference (`&mut T`), then the memory
 998 | must not have been accessed at all, except through the reference R.
 999 | **To guarantee these properties, we must prevent actions that might
1000 | affect the borrowed memory for all of the points between P (the
1001 | borrow) and Q (the use).**
1002 | 
1003 | This means that L must at least include all the points between P and
1004 | Q. There are two cases to consider. First, the case where the access
1005 | at point Q occurs through the same reference R that was created by
1006 | the borrow:
1007 | 
1008 |     R = &H; // point P
1009 |     ...
1010 |     use(R); // point Q
1011 | 
1012 | In this case, the variable R will be **live** on all the points
1013 | between P and Q. The liveness-based rules suffice for this case:
1014 | specifically, because the type of R includes the lifetime L, we know
1015 | that L must include all the points between P and Q, since R is live
1016 | there.
1017 | 
1018 | The second case is when the memory referenced by R is accessed, but
1019 | through an alias (or move):
1020 | 
1021 |     R = &H;  // point P
1022 |     R2 = R;  // last use of R, point A
1023 |     ...
1024 |     use(R2); // point Q
1025 | 
1026 | In this case, the liveness rules alone do not suffice. The problem is
1027 | that the `R2 = R` assignment may well be the last use of R, and so the
1028 | **variable** R is dead at this point. However, the *value* in R will
1029 | still be dereferenced later (through R2), and hence we want the
1030 | lifetime L to include those points. This is where the **subtyping
1031 | constraints** come into play: the type of R2 includes a lifetime L2,
1032 | and the assignment `R2 = R` will establish an outlives constraint `(L:
1033 | L2) @ A` between L and L2. Moreover, this new variable R2 must be
1034 | live between the assignment and the ultimate use (that is, along the
1035 | path A...Q). Putting these two facts together, we see that L will
1036 | ultimately include the points from P to A (because of the liveness of
1037 | R) and the points from A to Q (because the subtyping requirement
1038 | propagates the liveness of R2).
1039 | 
1040 | Note that it is possible for these lifetimes to have gaps. This can occur
1041 | when the same variable is used and overwritten multiple times:
1042 | 
1043 |     let R: &L i32;
1044 |     let R2: &L2 i32;
1045 | 
1046 |     R = &H1; // point P1
1047 |     R2 = R;  // point A1
1048 |     use(R2); // point Q1
1049 |     ...
1050 |     R2 = &H2; // point P2
1051 |     use(R2);  // point Q2
1052 | 
1053 | In this example, the liveness constraints on R2 will ensure that L2
1054 | (the lifetime in its type) includes Q1 and Q2 (because R2 is live at
1055 | those two points), but not the "..." nor the points P1 or P2. Note
1056 | that the subtyping relationship (`(L: L2) @ A1)`) at A1 here ensures
1057 | that L also includes Q1, but doesn't require that L includes Q2 (even
1058 | though L2 has point Q2). This is because the value in R2 at Q2 cannot
1059 | have come from the assignment at A1; if it could have done, then
1060 | either R2 would have to be live between A1 and Q2 or else there would
1061 | be a subtyping constraint.
1062 | 
1063 | ### Other examples
1064 | 
1065 | Let us work through some more examples. We begin with problem cases #1
1066 | and #2 (problem case #3 will be covered after we cover named lifetimes
1067 | in a later section).
1068 | 
1069 | #### Problem case #1.
1070 | 
1071 | Translated into MIR, the example will look roughly as follows:
1072 | 
1073 | ```rust
let mut data: Vec<char>;
let slice: &'slice mut [char];
START {
    data = ...;
    slice = &'borrow mut data[..];
1079 |     capitalize(slice);
1080 |     data.push('d');
1081 |     data.push('e');
1082 |     data.push('f');
1083 | }
1084 | ```
1085 | 
1086 | The constraints generated will be as follows:
1087 | 
1088 |     ('slice: {START/2}) @ START/2
1089 |     ('borrow: 'slice) @ START/2
1090 | 
1091 | Both `'slice` and `'borrow` will therefore be inferred to START/2, and
1092 | hence the accesses to `data` in START/3 and the following statements
1093 | are permitted.
1094 | 
1095 | #### Problem case #2.
1096 | 
1097 | Translated into MIR, the example will look roughly as follows (some
1098 | irrelevant details are elided). Note that the `match` statement is
1099 | translated into a SWITCH, which tests the variant, and a "downcast",
1100 | which lets us extract the contents out from the `Some` variant (this
1101 | operation is specific to MIR and has no Rust equivalent, other than as
1102 | part of a match).
1103 | 
1104 | ```
1105 | let map: HashMap<K,V>;
1106 | let key: K;
1107 | let tmp0: &'tmp0 mut HashMap<K,V>;
1108 | let tmp1: &K;
1109 | let tmp2: Option<&'tmp2 mut V>;
1110 | let value: &'value mut V;
1111 | 
1112 | START {
1113 | /*0*/ map = ...;
1114 | /*1*/ key = ...;
1115 | /*2*/ tmp0 = &'map mut map;
1116 | /*3*/ tmp1 = &key;
1117 | /*4*/ tmp2 = HashMap::get_mut(tmp0, tmp1);
1118 | /*5*/ SWITCH tmp2 { None => NONE, Some => SOME }
1119 | }
1120 | 
1121 | NONE {
1122 | /*0*/ ...
1123 | /*1*/ goto EXIT;
1124 | }
1125 | 
1126 | SOME {
1127 | /*0*/ value = tmp2.downcast<Some>.0;
1128 | /*1*/ process(value);
1129 | /*2*/ goto EXIT;
1130 | }
1131 | 
1132 | EXIT {
1133 | }
1134 | ```
1135 | 
1136 | The following liveness constraints are generated:
1137 | 
1138 |     ('tmp0: {START/3}) @ START/3
1139 |     ('tmp0: {START/4}) @ START/4
1140 |     ('tmp2: {SOME/0}) @ SOME/0
1141 |     ('value: {SOME/1}) @ SOME/1
1142 | 
1143 | The following subtyping-based constraints are generated:
1144 | 
1145 |     ('map: 'tmp0) @ START/3
1146 |     ('tmp0: 'tmp2) @ START/5
1147 |     ('tmp2: 'value) @ SOME/1
1148 | 
1149 | Ultimately, the lifetime we are most interested in is `'map`,
1150 | which indicates the duration for which `map` is borrowed. If we solve
1151 | the constraints above, we will get:
1152 | 
1153 |     'map == {START/3, START/4, SOME/0, SOME/1}
1154 |     'tmp0 == {START/3, START/4, SOME/0, SOME/1}
1155 |     'tmp2 == {SOME/0, SOME/1}
1156 |     'value == {SOME/1}
1157 | 
1158 | These results indicate that `map` **can** be mutated in the `None`
1159 | arm; `map` could also be mutated in the `Some` arm, but only after
1160 | `process()` is called (i.e., starting at SOME/2). This is the desired
1161 | result.
1162 | 
1163 | #### Example 4, invariant
1164 | 
1165 | It's worth looking at a variant of our running example ("Example 4").
1166 | This is the same pattern as before, but instead of using `&'a T`
1167 | references, we use `Foo<'a>` references, which are **invariant** with
1168 | respect to `'a`.  This means that the `'a` lifetime in a `Foo<'a>`
1169 | value cannot be approximated (i.e., you can't make it shorter, as you
1170 | can with a normal reference). Usually invariance arises because of
1171 | mutability (e.g., `Foo<'a>` might have a field of type `Cell<&'a
1172 | ()>`). The key point here is that invariance actually makes **no
1173 | difference at all** the outcome. This is true because of
1174 | location-based subtyping.
1175 | 
1176 | ```rust
1177 | let mut foo: T = ...;
1178 | let mut bar: T = ...;
1179 | let p: Foo<'a>;
1180 | 
1181 | p = Foo::new(&foo);
1182 | if condition {
1183 |     print(*p);
1184 |     p = Foo::new(&bar);
1185 | }
1186 | print(*p);
1187 | ```
1188 | 
1189 | Effectively, we wind up with the same constraints as before, but where
1190 | we only had `'foo: 'p`/`'bar: 'p` constraints before (due to subtyping), we now
1191 | also have `'p: 'foo` and `'p: 'bar` constraints:
1192 | 
1193 |     ('foo: 'p) @ A/1
1194 |     ('p: 'foo) @ A/1
1195 |     ('bar: 'p) @ B/3
1196 |     ('p: 'bar) @ B/3
1197 |     ('p: {A/1}) @ A/1
1198 |     ('p: {B/0}) @ B/0
1199 |     ('p: {B/3}) @ B/3
1200 |     ('p: {B/4}) @ B/4
1201 |     ('p: {C/0}) @ C/0
1202 | 
1203 | The key point is that the new constraints don't affect the final answer:
1204 | the new constraints were already satisfied with the older answer.
1205 | 
1206 | #### vec-push-ref
1207 | 
1208 | In previous iterations of this proposal, the location-aware subtyping
1209 | rules were replaced with transformations such as SSA form. The
1210 | vec-push-ref example demonstrates the value of location-aware
1211 | subtyping in contrast to these approaches.
1212 | 
1213 | ```rust
1214 | let foo: i32;
let mut vec: Vec<&'vec i32>;
1216 | let p: &'p i32;
1217 | 
1218 | foo = ...;
1219 | vec = Vec::new();
1220 | p = &'foo foo;
1221 | if true {
1222 |     vec.push(p);
1223 | } else {
1224 |     // Key point: `foo` not borrowed here.
1225 |     use(vec);
1226 | }
1227 | ```
1228 | 
1229 | This can be converted to control-flow graph form:
1230 | 
1231 | ```
1232 | block START {
1233 |     v = Vec::new();
1234 |     p = &'foo foo;
1235 |     goto B C;
1236 | }
1237 | 
1238 | block B {
1239 |     vec.push(p);
1240 |     goto EXIT;
1241 | }
1242 | 
1243 | block C {
1244 |     // Key point: `foo` not borrowed here
1245 |     use(vec);
1246 |     goto EXIT;
1247 | }
1248 | 
1249 | block EXIT {
1250 | }
1251 | ```
1252 | 
1253 | Here the relations from liveness are:
1254 | 
1255 |     ('vec: {START/1}) @ START/1
1256 |     ('vec: {START/2}) @ START/2
1257 |     ('vec: {B/0}) @ B/0
1258 |     ('vec: {C/0}) @ C/0
1259 |     ('p: {START/2}) @ START/2
1260 |     ('p: {B/0}) @ B/0
1261 | 
1262 | Meanwhile, the call to `vec.push(p)` establishes this subtyping
1263 | relation:
1264 | 
1265 |     ('p: 'vec) @ B/1
1266 |     ('foo: 'p) @ START/2
1267 | 
1268 | The solution is:
1269 | 
1270 |     'vec = {START/1, START/2, B/0, C/0}
1271 |     'p = {START/2, B/0}
1272 |     'foo = {START/2, B/0}
1273 | 
1274 | What makes this example interesting is that **the lifetime `'vec` must
1275 | include both halves of the `if`** -- because it is used in both branches
1276 | -- but `'vec` only becomes "entangled" with the lifetime `'p` on one
1277 | path. Thus even though `'vec` has to outlive `'p`, `'p` never winds up
1278 | including the "else" branch thanks to location-aware subtyping.
1279 | 
1280 | ## Layer 2: Avoiding infinite loops
1281 | 
1282 | The previous design was described in terms of the "pure" MIR
1283 | control-flow graph. However, using the raw graph has some undesirable
1284 | properties around infinite loops. In such cases, the graph has no
1285 | exit, which undermines the traditional definition of reverse analyses
1286 | like liveness. To address this, when we build the control-flow graph
1287 | for our functions, we will augment it with additional edges -- in
1288 | particular, for every infinite loop (`loop { }`), we will add false
1289 | "unwind" edges. This ensures that the control-flow graph has a final
1290 | exit node (the success of the RETURN and RESUME nodes) that
1291 | postdominates all other nodes in the graph.
1292 | 
1293 | If we did not add such edges, the result would also allow a number of surprising
1294 | programs to type-check. For example, it would be possible to borrow local variables
1295 | with `'static` lifetime, so long as the function never returned:
1296 | 
1297 | ```rust
1298 | fn main() {
1299 |     let x: usize;
1300 |     let y: &'static x = &x;
1301 |     loop { }
1302 | }
1303 | ```
1304 | 
1305 | This would work because (as covered in detail under the borrow check
1306 | section) the `StorageDead(x)` instruction would never be reachable,
1307 | and hence any lifetime of borrow would be acceptable. This further leads to
1308 | other surprising programs that still type-check, such as this example which
1309 | uses an (incorrect, but declared as unsafe) API for spawning threads:
1310 | 
1311 | ```rust
1312 | let scope = Scope::new();
1313 | let mut foo = 22;
1314 | 
1315 | unsafe {
1316 |     // dtor joins the thread
1317 |     let _guard = scope.spawn(&mut foo);
1318 |     loop {
1319 |         foo += 1;
1320 |     }
1321 |     // drop of `_guard` joins the thread
1322 | }
1323 | ```
1324 | 
1325 | Without the unwind edges, this code would pass the borrowck, since the
1326 | drop of `_guard` (and `StorageDead` instruction) is not reachable, and
1327 | hence `_guard` is not considered live (after all, its destructor will
1328 | indeed never run). However, this would permit the `foo` variable to be
1329 | modified both during the infinite loop and by the thread launched by
1330 | `scope.spawn()`, which was given access to an `&mut foo` reference
1331 | (albeit one with a theoretically short lifetime).
1332 | 
1333 | With the false unwind edge, the compiler essentially always assumes
1334 | that a destructor *may* run, since every scope may theoretically
1335 | execute. This extends the `&mut foo` borrow given to `scope.spawn()`
1336 | to cover the body of the loop, resulting in a borrowck error.
1337 | 
1338 | ## Layer 3: Accommodating dropck
1339 | 
1340 | MIR includes an action that corresponds to "dropping" a variable:
1341 | 
1342 |     DROP(variable)
1343 | 
1344 | Note that while MIR supports general drops of any lvalue, at the point
1345 | where this analysis is running, we are always dropping entire
1346 | variables at a time. This operation executes the destructor for
1347 | `variable`, effectively "de-initializing" the memory in which the
1348 | value resides (if the variable -- or parts of the variable -- have
1349 | already been dropped, then drop has no effect; this is not relevant to
1350 | the current analysis).
1351 | 
1352 | Interestingly, in many cases dropping a value does not require that the
1353 | lifetimes in the dropped value be valid. After all, dropping a
1354 | reference of type `&'a T` or `&'a mut T` is defined as a no-op, so it
1355 | does not matter if the reference points at valid memory. In cases like
1356 | this, we say that the lifetime `'a` **may dangle**. This is inspired by the C
1357 | term "dangling pointer" which means a pointer to freed or invalid
1358 | memory.
1359 | 
1360 | However, if that same reference is stored in the field of a struct
1361 | which implements the `Drop` trait, when the struct may, during its
1362 | destructor, access the referenced value, so it's very important that
1363 | the reference be valid in that case. Put another way, if you have a
1364 | value `v` of type `Foo<'a>` that implements `Drop`, then `'a`
1365 | typically **cannot dangle** when `v` is dropped (just as `'a` would
1366 | not be allowed to dangle for any other operation).
1367 | 
1368 | More generally, RFC 1327 defined specific rules for which lifetimes in
1369 | a type may dangle during drop and which may not. We integrate those
1370 | rules into our liveness analysis as follows: the MIR instruction
1371 | `DROP(variable)` is not treated like other MIR instructions when it
1372 | comes to liveness. In a sense, conceptually we run two distinct liveness analyses (in practice, the prototype
1373 | uses two bits per variable):
1374 | 
1375 | 1. The first, which we've already seen, indicates when a variable's
1376 |    current value may be **used** in the future. This corresponds to
1377 |    "non-drop" uses of the variable in the MIR. Whenever a variable is live by this definition,
1378 |    all of the lifetimes in its type are live.
1379 | 2. The second, which we are adding now, indicates when a variable's
1380 |    current value may be **dropped** in the future. This corresponds to
1381 |    "drop" uses of the variable in the MIR. Whenever a variable is live
1382 |    in *this* sense, all of the lifetimes in its type **except those
1383 |    marked as may-dangle** are live.
1384 | 
1385 | Permitting lifetimes to dangle during drop is very important! In fact,
1386 | it is essential to even the most basic non-lexical lifetime examples,
1387 | such as Problem Case #1.  After all, if we translate Problem Case #1
1388 | into MIR, we see that the reference `slice` will wind up being dropped
1389 | at the end of the block:
1390 | 
1391 | ```rust
let mut data: Vec<char>;
let slice: &'slice mut [char];
START {
    ...
    slice = &'borrow mut data[..];
1397 |     capitalize(slice);
1398 |     data.push('d');
1399 |     data.push('e');
1400 |     data.push('f');
1401 |     DROP(slice);
1402 |     DROP(data);
1403 | }
1404 | ```
1405 | 
1406 | This poses no problem for our analysis, however, because `'slice` "may
1407 | dangle" during the drop, and hence is not considered live.
1408 | 
1409 | ## Layer 4: Named lifetimes
1410 | 
1411 | Until now, we've only considered lifetimes that are confined to the
1412 | extent of a function. Often, we want to reason about
1413 | lifetimes that begin or end after the current function has ended. More
1414 | subtly, we sometimes want to have lifetimes that sometimes begin and
1415 | end in the current function, but which may (along some paths) extend
1416 | into the caller. Consider Problem Case #3 (the corresponding test case
1417 | in the prototype is the [get-default] test):
1418 | 
1419 | [get-default]: https://github.com/nikomatsakis/nll/blob/master/test/get-default.nll
1420 | 
1421 | ```rust
1422 | fn get_default<'r,K,V:Default>(map: &'r mut HashMap<K,V>,
1423 |                                key: K)
1424 |                                -> &'r mut V {
1425 |     match map.get_mut(&key) { // -------------+ 'r
1426 |         Some(value) => value,              // |
1427 |         None => {                          // |
1428 |             map.insert(key, V::default()); // |
1429 |             //  ^~~~~~ ERROR               // |
1430 |             map.get_mut(&key).unwrap()     // |
1431 |         }                                  // |
1432 |     }                                      // |
1433 | }                                          // v
1434 | ```
1435 | 
1436 | When we translate this into MIR, we get something like the following
1437 | (this is "pseudo-MIR"):
1438 | 
1439 | ```
1440 | block START {
1441 |   m1 = &'m1 mut *map;  // temporary created for `map.get_mut()` call
1442 |   v = Map::get_mut(m1, &key);
1443 |   switch v { SOME NONE };
1444 | }
1445 | 
1446 | block SOME {
1447 |   return = v.as<Some>.0; // assign to return value slot
1448 |   goto END;
1449 | }
1450 | 
1451 | block NONE {
1452 |   Map::insert(&mut *map, key, ...);
1453 |   m2 = &'m2 mut *map;  // temporary created for `map.get_mut()` call
1454 |   v = Map::get_mut(m2, &key);
1455 |   return = ... // "unwrap" of `v`
1456 |   goto END;
1457 | }
1458 | 
1459 | block END {
1460 |   return;
1461 | }
1462 | ```
1463 | 
1464 | The key to this example is that the first borrow of `map`, with the
1465 | lifetime `'m1`, must extend to the end of `'r`, but only if we
1466 | branch to SOME. Otherwise, it should end once we enter the NONE block.
1467 | 
1468 | To accommodate cases like this, we will extend the notion of a region
1469 | so that it includes not only points in the control-flow graph, but
1470 | also includes a (possibly empty) set of "end regions" for various
1471 | named lifetimes.  We denote these as `end('r)` for some named region
1472 | `'r`. The region `end('r)` can be understood semantically as referring
1473 | to some portion of the caller's control-flow graph (actually, they
1474 | could extend beyond the end of the caller, into the caller's caller,
1475 | and so forth, but that doesn't concern us). This new region might then
1476 | be represented as follows (in pseudocode form):
1477 | 
1478 | ```rust
1479 | struct Region {
1480 |   points: Set<Point>,
1481 |   end_regions: Set<NamedLifetime>,
1482 | }
1483 | ```
1484 | 
1485 | In this case, when a type mentions a named lifetime, such as `'r`, that
1486 | can be represented by a region that includes:
1487 | 
1488 | - the entire CFG, and
1489 | - the end region for that named lifetime (`end('r)`).
1490 | 
1491 | Furthermore, we can **elaborate** the set to include `end('x)` for
1492 | every named lifetime `'x` such that `'r: 'x`. This is because, if `'r:
1493 | 'x`, then we know that `'r` doesn't end until `'x` has already
1494 | ended.
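The region representation and the elaboration step can be sketched as follows. This is illustrative only: encoding points as `(block, statement)` pairs, naming lifetimes by strings, and passing the declared `'r: 'x` relations as a map are all assumptions of the sketch. Since `'r: 'y` and `'y: 'x` together imply `'r: 'x`, the sketch follows the relations transitively.

```rust
use std::collections::{BTreeSet, HashMap};

/// A CFG point, assumed here to be (basic block, statement index).
type Point = (usize, usize);
/// A named lifetime such as `'r`, identified by its name in this sketch.
type NamedLifetime = &'static str;

#[derive(Debug, Clone, Default, PartialEq)]
struct Region {
    points: BTreeSet<Point>,
    end_regions: BTreeSet<NamedLifetime>,
}

/// The region for a named lifetime `r`: every CFG point, plus `end(r)`,
/// elaborated with `end(x)` for each `x` such that `r: x`.
fn named_region(
    r: NamedLifetime,
    all_points: &[Point],
    outlives: &HashMap<NamedLifetime, Vec<NamedLifetime>>,
) -> Region {
    let mut end_regions = BTreeSet::new();
    let mut stack = vec![r];
    while let Some(l) = stack.pop() {
        // `l: x` means `l` cannot end until `x` has ended, so include `end(x)`.
        if end_regions.insert(l) {
            if let Some(shorter) = outlives.get(&l) {
                stack.extend(shorter.iter().copied());
            }
        }
    }
    Region {
        points: all_points.iter().copied().collect(),
        end_regions,
    }
}
```

For example, with `'a: 'b` and `'b: 'c`, the region for `'a` includes `end('a)`, `end('b)`, and `end('c)`.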
1495 | 
1496 | Finally, we must adjust our definition of subtyping to accommodate
1497 | this amended definition of a region, which we do as follows. When we have
1498 | an outlives relation 
1499 | 
1500 |     'b: 'a @ P
1501 |     
1502 | where the end point of the CFG is reachable from P without leaving
1503 | `'a`, the existing inference algorithm would simply add the end-point
1504 | to `'b` and stop. The new algorithm would also add any end regions
1505 | that are included in `'a` to `'b` at that time. (Expressed less
1506 | operationally, `'b` only outlives `'a` if it also includes the
1507 | end-regions that `'a` includes, presuming that the end point of the
1508 | CFG is reachable from P.) We require the end point of the CFG to be
1509 | reachable because otherwise the data never escapes
1510 | the current function, and hence `end('r)` is not reachable (since
1511 | `end('r)` only covers the code in callers that executes *after* the
1512 | return).
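Here is a sketch of how the amended rule might fit into the depth-first propagation of an outlives constraint `'b: 'a @ P`. The integer point ids, the successor map, and the `add_outlives` name are hypothetical. The walk stays inside `'a` and adds every point it reaches to `'b`. It copies `'a`'s end regions into `'b` only if it reaches the CFG end point.

```rust
use std::collections::{BTreeSet, HashMap};

type Point = usize;
type NamedLifetime = &'static str;

#[derive(Debug, Clone, Default)]
struct Region {
    points: BTreeSet<Point>,
    end_regions: BTreeSet<NamedLifetime>,
}

/// Apply `'b: 'a @ p` (hypothetical helper). Returns true if `b` grew.
fn add_outlives(
    b: &mut Region,
    a: &Region,
    p: Point,
    successors: &HashMap<Point, Vec<Point>>,
    end_point: Point,
) -> bool {
    let before = (b.points.len(), b.end_regions.len());
    let mut visited = BTreeSet::new();
    let mut stack = vec![p];
    while let Some(q) = stack.pop() {
        // Stop at points outside `'a`, and at points already explored.
        if !a.points.contains(&q) || !visited.insert(q) {
            continue;
        }
        b.points.insert(q);
        // Reaching the CFG end point means the data may escape to the
        // caller, so `'b` must also include `'a`'s end regions.
        if q == end_point {
            b.end_regions.extend(a.end_regions.iter().copied());
        }
        if let Some(next) = successors.get(&q) {
            stack.extend(next.iter().copied());
        }
    }
    before != (b.points.len(), b.end_regions.len())
}
```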
1513 | 
1514 | NB: This part of the prototype is partially
1515 | implemented. [Issue #12](https://github.com/nikomatsakis/nll/issues/12)
1516 | describes the current status and links to the in-progress PRs.
1517 | 
1518 | ## Layer 5: How the borrow check works
1519 | 
1520 | For the most part, the focus of this RFC is on the structure of
1521 | lifetimes, but it's worth talking a bit about how to integrate
1522 | these non-lexical lifetimes into the borrow checker. In particular,
1523 | along the way, we'd like to fix two shortcomings of the borrow checker:
1524 | 
1525 | **First, support nested method calls like `vec.push(vec.len())`.**
1526 | Here, the plan is to continue with the `mut2` borrow solution proposed
1527 | in [RFC 2025]. This RFC does not (yet) propose one of the type-based
1528 | solutions described in RFC 2025, such as "borrowing for the future" or
1529 | `Ref2`. The reasons why are discussed in the Alternatives section. For
1530 | simplicity, this description of the borrow checker ignores
1531 | [RFC 2025]. The extensions described here are fairly orthogonal to the
1532 | changes proposed in [RFC 2025], which in effect cause the start of a
1533 | borrow to be delayed.
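For reference, the nested call that motivates [RFC 2025] compiles in today's compiler. The compiler implements that RFC's delayed activation of mutable borrows under the name "two-phase borrows". The `lengths` function is only an illustration:

```rust
fn lengths(n: usize) -> Vec<usize> {
    let mut vec = Vec::new();
    for _ in 0..n {
        // The `&mut vec` for `push` is taken first but only activated
        // after `vec.len()` (a shared borrow) has been evaluated.
        vec.push(vec.len());
    }
    vec
}
```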
1534 | 
1535 | **Second, permit variables containing mutable references to be
1536 | modified, even if their referent is borrowed.** This refers to the
1537 | "Problem Case #4" described in the introduction; we wish to accept the
1538 | original program.
1539 | 
1540 | ### Borrow checker phase 1: computing loans in scope
1541 | 
1542 | The first phase of the borrow checker computes, at each point in
1543 | the CFG, the set of in-scope **loans**. A "loan" is represented as a tuple
1544 | `('a, shared|uniq|mut, lvalue)` indicating:
1545 | 
1546 | 1. the lifetime `'a` for which the value was borrowed;
1547 | 2. whether this was a shared, unique, or mutable loan;
1548 |     - "unique" loans are exactly like mutable loans, but they do not permit
1549 |       mutation of their referents. They are used only in closure desugarings
1550 |       and are not part of Rust's surface syntax.
1551 | 3. the lvalue that was borrowed (e.g., `x` or `(*x).foo`).
1552 | 
1553 | The set of in-scope loans at each point is found via a fixed-point
1554 | dataflow computation. We create a loan tuple from each borrow rvalue
1555 | in the MIR (that is, every assignment statement like `tmp = &'a
1556 | b.c.d`), giving each tuple a unique index `i`. We can then represent
1557 | the set of loans that are in scope at a particular point using a
1558 | bit-set and do a standard forward data-flow propagation.
1559 | 
1560 | For a statement at point P in the graph, we define the "transfer
1561 | function" -- that is, which loans it brings into or out of scope -- as
1562 | follows:
1563 | 
1564 | - any loans whose region does not include P are killed;
1565 | - if this is a borrow statement, the corresponding loan is generated;
1566 | - if this is an assignment `lv = <rvalue>`, then any loan for some path
1567 |   that has `lv` as a prefix is killed.
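The loan computation can be sketched as a small forward dataflow analysis. Everything here is a simplified, hypothetical model rather than the compiler's data structures:

- lvalues are a local plus a list of projections;
- each CFG point holds exactly one statement;
- a `Vec<bool>` stands in for the bit-set.

At join points, the incoming loan sets are merged by union, and the analysis iterates until nothing changes.

```rust
use std::collections::BTreeSet;

/// One projection step in a (simplified) lvalue path.
#[derive(Clone, Debug, PartialEq)]
enum Proj {
    Deref,
    Field(&'static str),
}

/// An lvalue such as `(*list).value`: a base local plus projections.
#[derive(Clone, Debug, PartialEq)]
struct Path {
    local: &'static str,
    projs: Vec<Proj>,
}

impl Path {
    /// True if `self` is a prefix of `other` (including equality).
    fn is_prefix_of(&self, other: &Path) -> bool {
        self.local == other.local && other.projs.starts_with(&self.projs)
    }
}

/// A loan: its region (as a set of point ids) and the borrowed path.
struct Loan {
    region: BTreeSet<usize>,
    path: Path,
}

/// One statement per CFG point, reduced to what the transfer function needs.
enum Stmt {
    Borrow { loan: usize }, // `tmp = &'a path`, creating loan `loan`
    Assign { lhs: Path },   // `lhs = <rvalue>`
    Other,
}

/// Transfer function for the statement at `point`.
fn transfer(bits: &mut [bool], loans: &[Loan], point: usize, stmt: &Stmt) {
    // Kill loans whose region does not include this point.
    for (i, loan) in loans.iter().enumerate() {
        if !loan.region.contains(&point) {
            bits[i] = false;
        }
    }
    match stmt {
        // Generate the loan created by a borrow.
        Stmt::Borrow { loan } => bits[*loan] = true,
        // Kill loans of any path that has `lhs` as a prefix.
        Stmt::Assign { lhs } => {
            for (i, loan) in loans.iter().enumerate() {
                if lhs.is_prefix_of(&loan.path) {
                    bits[i] = false;
                }
            }
        }
        Stmt::Other => {}
    }
}

/// Fixed-point iteration: `entry[p]` is the set of loans in scope on entry to `p`.
fn loans_in_scope(stmts: &[Stmt], succs: &[Vec<usize>], loans: &[Loan]) -> Vec<Vec<bool>> {
    let mut entry = vec![vec![false; loans.len()]; stmts.len()];
    let mut changed = true;
    while changed {
        changed = false;
        for p in 0..stmts.len() {
            let mut exit = entry[p].clone();
            transfer(&mut exit, loans, p, &stmts[p]);
            // Union the exit set into each successor's entry set.
            for &s in &succs[p] {
                for i in 0..loans.len() {
                    if exit[i] && !entry[s][i] {
                        entry[s][i] = true;
                        changed = true;
                    }
                }
            }
        }
    }
    entry
}
```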
1568 | 
1569 | The last point bears some elaboration. This rule is what allows us to
1570 | support cases like the one in Problem Case #4:
1571 | 
1572 | ```rust
1573 | let mut list: &mut List<T> = ...;
1574 | let v = &mut (*list).value;
1575 | list = ...; // <-- assignment
1576 | ```
1577 | 
1578 | At the point of the marked assignment, the loan of `(*list).value` is
1579 | in-scope, but it does not have to be considered in-scope
1580 | afterwards. This is because the variable `list` now holds a fresh
1581 | value, and that new value has not yet been borrowed (or else we could
1582 | not have produced it). Specifically, whenever we see an assignment `lv
1583 | = <rvalue>` in MIR, we can clear all loans where the borrowed path
1584 | `lv_loan` has `lv` as a prefix. (In our example, the assignment is to
1585 | `list`, and the loan path `(*list).value` has `list` as a prefix.)
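Here is Problem Case #4 in full, with an assumed `List<T>` definition. Each iteration borrows `(*list).value` and then overwrites `list`. Per the transfer function above, the assignment kills every loan whose path has `list` as a prefix, so the next iteration can borrow through the new value of `list`. This compiles with the NLL borrow checker.

```rust
/// Singly linked list (assumed definition).
struct List<T> {
    value: T,
    next: Option<Box<List<T>>>,
}

/// Collect a mutable reference to every element of the list.
fn to_refs<T>(mut list: &mut List<T>) -> Vec<&mut T> {
    let mut result = vec![];
    loop {
        result.push(&mut list.value);
        if let Some(n) = list.next.as_mut() {
            // Overwriting `list` ends the loans on `(*list).value`.
            list = n;
        } else {
            return result;
        }
    }
}
```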
1586 | 
1587 | **NB.** In this phase, when there is an assignment, we always clear
1588 | all loans that applied to the overwritten path; however, in some cases
1589 | the **assignment itself** may be illegal due to those very loans. In
1590 | our example, this would be the case if the type of `list` had been
1591 | `List<T>` and not `&mut List<T>`.  In such cases, errors will be
1592 | reported by the second phase of the borrow checker, described in the
1593 | next section.
1594 | 
1595 | ### Borrow checker phase 2: reporting errors
1596 | 
1597 | At this point, we have computed which loans are in scope at each
1598 | point. Next, we traverse the MIR and identify actions that are illegal
1599 | given the loans in scope. Rather than go through every kind of MIR statement,
1600 | we can break things down into two kinds of actions that can be performed:
1601 | 
1602 | - Accessing an lvalue, which we categorize along two axes (shallow vs deep, read vs write)
1603 | - Dropping an lvalue
1604 | 
1605 | For each of these kinds of actions, we will specify below the rules
1606 | that determine when they are legal, given the set of loans L in scope
1607 | at the start of the action. The second phase of the borrow check
1608 | therefore consists of iterating over each statement in the MIR and
1609 | checking, given the in-scope loans, whether the actions it performs
1610 | are legal. Translating MIR statements into actions is mostly
1611 | straightforward:
1612 | 
1613 | - A `StorageDead` statement counts as a **shallow write**.
1614 | - An assignment statement `LV = RV` is a **shallow write** to `LV`;
1615 | - and, within the rvalue `RV`:
1616 |   - Each lvalue operand is either a **deep read** or a **deep write** action, depending
1617 |     on whether or not the type of the lvalue implements `Copy`.
1618 |     - Note that moves count as "deep writes".
1619 |   - A shared borrow `&LV` counts as a **deep read**.
1620 |   - A mutable borrow `&mut LV` counts as a **deep write**.
1621 |   
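The statement-to-action translation above can be sketched as a function that maps a simplified statement to the accesses it performs. The `Statement` and `Rvalue` types are hypothetical stand-ins for MIR: lvalues are plain strings, and `is_copy` records whether the operand's type implements `Copy`. Only the cases listed above are modeled. The sketch lists the rvalue's accesses before the shallow write to the destination.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Depth { Shallow, Deep }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind { Read, Write }

/// An action: which lvalue is accessed, how deeply, and whether it writes.
type Action = (&'static str, Depth, Kind);

/// Hypothetical, simplified rvalues (lvalues are plain names here).
enum Rvalue {
    /// Copy or move of an lvalue; `is_copy` says whether its type is `Copy`.
    Use { lvalue: &'static str, is_copy: bool },
    SharedBorrow(&'static str), // `&LV`
    MutBorrow(&'static str),    // `&mut LV`
}

enum Statement {
    StorageDead(&'static str),
    Assign(&'static str, Rvalue), // `LV = RV`
}

fn actions(stmt: &Statement) -> Vec<Action> {
    match stmt {
        // `StorageDead` is a shallow write.
        Statement::StorageDead(lv) => vec![(*lv, Depth::Shallow, Kind::Write)],
        Statement::Assign(lv, rv) => {
            let rv_action = match rv {
                // Copies are deep reads; moves are deep writes.
                Rvalue::Use { lvalue, is_copy } => {
                    let kind = if *is_copy { Kind::Read } else { Kind::Write };
                    (*lvalue, Depth::Deep, kind)
                }
                Rvalue::SharedBorrow(b) => (*b, Depth::Deep, Kind::Read),
                Rvalue::MutBorrow(b) => (*b, Depth::Deep, Kind::Write),
            };
            // The assignment itself is a shallow write to `LV`.
            vec![rv_action, (*lv, Depth::Shallow, Kind::Write)]
        }
    }
}
```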
1622 | There are a few interesting cases to keep in mind:
1623 | 
1624 | - MIR models discriminants more precisely. They should be
1625 |   thought of as a distinct *field* when it comes to borrows.
1626 | - In the compiler today, `Box` is still "built-in" to MIR. This RFC
1627 |   ignores that possibility and instead acts as though borrowed
1628 |   references (`&` and `&mut`) and raw pointers (`*const` and `*mut`)
1629 |   were the only sorts of pointers.  It should be straightforward to
1630 |   extend the text here to cover `Box`, though some questions arise
1631 |   around the handling of drop (see the section on drops for details).
1632 | 
1633 | **Accessing an lvalue LV.** When accessing an lvalue LV, there are two
1634 | axes to consider:
1635 | 
1636 | - The access can be SHALLOW or DEEP:
1637 |   - A *shallow* access means that the immediate fields reached at LV
1638 |     are accessed, but references or pointers found within are not
1639 |     dereferenced. Right now, the only access that is shallow is an
1640 |     assignment like `x = ...`, which would be a **shallow write** of
1641 |     `x`.
1642 |   - A *deep* access means that all data reachable through a given lvalue
1643 |     may be invalidated or accessed by this action.
1644 | - The access can be a READ or WRITE:
1645 |   - A *read* means that the existing data may be read, but will not be changed.
1646 |   - A *write* means that the data may be mutated to new values or
1647 |     otherwise invalidated (for example, it could be de-initialized, as
1648 |     in a move operation).
1649 | 
1650 | "Deep" accesses are often deep because they create and release an
1651 | alias, in which case the "deep" qualifier reflects what might happen
1652 | through that alias. For example, if you have `let x = &mut y`, that is
1653 | considered a **deep write** of `y`, even though the **actual borrow**
1654 | doesn't do anything at all: we create a mutable alias `x` that can be
1655 | used to mutate anything reachable from `y`. A move `let x = y` is
1656 | similar: it writes to the shallow content of `y`, but then -- via the
1657 | new name `x` -- we can access all other content accessible through
1658 | `y`.
1659 | 
1660 | The pseudocode for deciding when an access is legal looks like this:
1661 | 
1662 | ```
1663 | fn access_legal(lvalue, is_shallow, is_read) {
1664 |     let relevant_borrows = select_relevant_borrows(lvalue, is_shallow);
1665 | 
1666 |     for borrow in relevant_borrows {
1667 |         // shared borrows like `&x` still permit reads from `x` (but not writes)
1668 |         if is_read && borrow.is_read { continue; }
1669 |         
1670 |         // otherwise, report an error, because we have an access
1671 |         // that conflicts with an in-scope borrow
1672 |         report_error();
1673 |     }
1674 | }
1675 | ```
1676 | 
1677 | As you can see, it works in two steps. First, we enumerate a set of
1678 | in-scope borrows that are relevant to `lvalue` -- this set is affected
1679 | by whether this is a "shallow" or "deep" action, as will be described
1680 | shortly. Then, for each such borrow, we check if it conflicts with the
1681 | action (i.e., if at least one of them is potentially writing), and,
1682 | if so, we report an error.
1683 | 
1684 | For **shallow** accesses to the path `lvalue`, we consider borrows relevant
1685 | if they meet one of the following criteria:
1686 | 
1687 | - there is a loan for the path `lvalue`;
1688 |   - so: writing a path like `a.b.c` is illegal if `a.b.c` is borrowed
1689 | - there is a loan for some prefix of the path `lvalue`;
1690 |   - so: writing a path like `a.b.c` is illegal if `a` or `a.b` is borrowed
1691 | - `lvalue` is a **shallow prefix** of the loan path
1692 |   - shallow prefixes are found by stripping away fields, but stop at
1693 |     any dereference
1694 |   - so: writing a path like `a` is illegal if `a.b` is borrowed
1695 |   - but: writing `a` is legal if `*a` is borrowed, whether or not `a`
1696 |     is a shared or mutable reference
1697 | 
1698 | For **deep** accesses to the path `lvalue`, we consider borrows relevant
1699 | if they meet one of the following criteria:
1700 | 
1701 | - there is a loan for the path `lvalue`;
1702 |   - so: reading a path like `a.b.c` is illegal if `a.b.c` is mutably borrowed
1703 | - there is a loan for some prefix of the path `lvalue`;
1704 |   - so: reading a path like `a.b.c` is illegal if `a` or `a.b` is mutably borrowed
1705 | - `lvalue` is a **supporting prefix** of the loan path
1706 |   - supporting prefixes were defined earlier
1707 |   - so: reading a path like `a` is illegal if `a.b` is mutably
1708 |     borrowed, but -- in contrast with shallow accesses -- reading `a` is also
1709 |     illegal if `*a` is mutably borrowed
1710 |     
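A small sketch can make the relevance rules and `access_legal` concrete. The sketch assumes a hypothetical path type in which each `Deref` records whether it dereferences a shared reference. That flag is needed because supporting prefixes (as defined earlier in this RFC) strip fields and derefs but stop at the deref of a shared reference, while shallow prefixes stop at any deref. Instead of reporting errors, `conflicts` returns the indices of the conflicting loans.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Proj {
    Field(&'static str),
    /// Dereference; `shared` is true when the pointer is a `&T`.
    Deref { shared: bool },
}

#[derive(Clone, Debug, PartialEq)]
struct Path {
    local: &'static str,
    projs: Vec<Proj>,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum LoanKind {
    Shared,
    Mut,
}

struct Loan {
    kind: LoanKind,
    path: Path,
}

/// If `prefix` is a prefix of `path`, return the projections that follow it.
fn suffix<'p>(prefix: &Path, path: &'p Path) -> Option<&'p [Proj]> {
    if prefix.local == path.local && path.projs.starts_with(&prefix.projs) {
        Some(&path.projs[prefix.projs.len()..])
    } else {
        None
    }
}

/// Is an in-scope `loan` relevant to a shallow or deep access of `lvalue`?
fn is_relevant(loan: &Loan, lvalue: &Path, shallow: bool) -> bool {
    // A loan of `lvalue` itself, or of some prefix of `lvalue`.
    if suffix(&loan.path, lvalue).is_some() {
        return true;
    }
    // Otherwise `lvalue` must be a shallow (resp. supporting) prefix of the loan path.
    match suffix(lvalue, &loan.path) {
        None => false,
        // Shallow prefixes strip fields only, stopping at any deref.
        Some(rest) if shallow => rest.iter().all(|p| matches!(p, Proj::Field(_))),
        // Supporting prefixes strip fields and derefs, except derefs of `&T`.
        Some(rest) => rest.iter().all(|p| !matches!(p, Proj::Deref { shared: true })),
    }
}

/// Indices of in-scope loans that conflict with the access (empty = legal).
fn conflicts(loans: &[Loan], lvalue: &Path, shallow: bool, is_read: bool) -> Vec<usize> {
    loans
        .iter()
        .enumerate()
        .filter(|(_, loan)| is_relevant(loan, lvalue, shallow))
        // Shared borrows like `&x` still permit reads of `x`.
        .filter(|(_, loan)| !(is_read && loan.kind == LoanKind::Shared))
        .map(|(i, _)| i)
        .collect()
}
```

The test cases mirror the bullet examples above. Writing `a` conflicts with a loan of `a.b` but not with a loan of `*a`. Reading `a` deeply does conflict with a mutable loan of `*a`.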
1711 | **Dropping an lvalue LV.** Dropping an lvalue can be treated as a DEEP
1712 | WRITE, like a move, but this is overly conservative. The rules here
1713 | are under active development; see
1714 | [#40](https://github.com/nikomatsakis/nll-rfc/issues/40).
1715 | 
1716 | # How We Teach This
1717 | [how-we-teach-this]: #how-we-teach-this
1718 | 
1719 | ## Terminology
1720 | 
1721 | In this RFC, I've opted to continue using the term "lifetime" to refer
1722 | to the portion of the program in which a reference is in active use
1723 | (or, alternatively, to the "duration of a borrow"). As the intro to
1724 | the RFC makes clear, this terminology somewhat conflicts with an
1725 | alternative usage, in which lifetime refers to the dynamic extent of a
1726 | value (what we call the "scope"). I think that -- if we were starting
1727 | over -- it might have been preferable to find an alternative term that
1728 | is more specific. However, it would be rather difficult to try and
1729 | change the term "lifetime" at this point, and hence this RFC does not
1730 | attempt to do so. To avoid confusion, however, it seems best if the error
1731 | messages resulting from the region and borrow checks avoid the term
1732 | lifetime where possible, or use qualification to make the meaning more
1733 | clear.
1734 | 
1735 | ## Leveraging intuition: framing errors in terms of points
1736 | 
1737 | Part of the reason that Rust currently uses lexical scopes to
1738 | determine lifetimes is that it was thought that they would be simpler
1739 | for users to reason about. Time and experience have not borne this
1740 | hypothesis out: for many users, the fact that borrows are
1741 | "artificially" extended to the end of the block is more surprising
1742 | than not. Furthermore, most users have a pretty intuitive
1743 | understanding of control flow (which makes sense: you have to, in
1744 | order to understand what your program will do).
1745 | 
1746 | We therefore propose to leverage this intuition when explaining borrow
1747 | and lifetime errors. To the extent possible, we will try to explain
1748 | all errors in terms of three points:
1749 | 
1750 | - The point where the borrow occurred (B).
1751 | - The point where the resulting reference is used (U).
1752 | - An intervening point that might have invalidated the reference (A).
1753 | 
1754 | We should select three points such that B can reach A and A can reach
1755 | U. In general, the approach is to describe the errors in "narrative" form:
1756 | 
1757 | - First, the value is borrowed.
1758 | - Next, the action occurs, invalidating the reference.
1759 | - Finally, the next use occurs, after the reference has been invalidated.
1760 | 
1761 | This approach is similar to what we do today, but we often neglect to
1762 | mention this third point, where the next use occurs. Note that the
1763 | "point of error" remains the *second* action -- that is, the error,
1764 | conceptually, is to perform an invalidating action in between two uses
1765 | of the reference (rather than, say, to use the reference after an
1766 | invalidating action). This actually reflects the definition of
1767 | undefined behavior more accurately (that is, performing an illegal
1768 | write is what causes undefined behavior, but the write is illegal
1769 | because of the later use).
1770 | 
1771 | To see the difference, consider this erroneous program:
1772 | 
1773 | ```rust
1774 | fn main() {
1775 |     let mut i = 3;
1776 |     let x = &i;
1777 |     i += 1;
1778 |     println!("{}", x);
1779 | }
1780 | ```
1781 | 
1782 | Currently, we emit the following error:
1783 | 
1784 | ```
1785 | error[E0506]: cannot assign to `i` because it is borrowed
1786 |  --> <anon>:4:5
1787 |    |
1788 |  3 |     let x = &i;
1789 |    |              - borrow of `i` occurs here
1790 |  4 |     i += 1;
1791 |    |     ^^^^^^ assignment to borrowed `i` occurs here
1792 | ```
1793 | 
1794 | Here, the points B and A are highlighted, but not the point of use
1795 | U. Moreover, the "blame" is placed on the assignment. Under this RFC,
1796 | we would display the error as follows:
1797 | 
1798 | ```
1799 | error[E0506]: cannot write to `i` while borrowed
1800 |  --> <anon>:4:5
1801 |    |
1802 |  3 |     let x = &i;
1803 |    |              - (shared) borrow of `i` occurs here
1804 |  4 |     i += 1;
1805 |    |     ^^^^^^ write to `i` occurs here, while borrow is still active
1806 |  5 |     println!("{}", x);
1807 |    |                    - borrow is later used here
1808 | ```
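Conversely, if the use U comes before the invalidating action A, there is nothing to report. Below, the erroneous program is reordered. `format!` stands in for `println!` so the result can be checked, and `reordered` is an illustrative name:

```rust
fn reordered() -> (i32, String) {
    let mut i = 3;
    let x = &i;
    let shown = format!("{}", x); // U: last use of the borrow
    i += 1;                       // A: the borrow is no longer live here
    (i, shown)
}
```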
1809 | 
1810 | Another example, this time using a `match`:
1811 | 
1812 | ```rust
1813 | fn main() {
1814 |     let mut x = Some(3);
1815 |     match &mut x {
1816 |         Some(i) => {
1817 |             x = None;
1818 |             *i += 1;
1819 |         }
1820 |         None => {
1821 |             x = Some(0); // OK
1822 |         }
1823 |     }
1824 | }
1825 | ```
1826 | 
1827 | The error might be:
1828 | 
1829 | ```
1830 | error[E0506]: cannot write to `x` while borrowed
1831 |  --> <anon>:5:13
1832 |    |
1833 |  3 |     match &mut x {
1834 |    |           ------ (mutable) borrow of `x` occurs here
1835 |  4 |         Some(i) => {
1836 |  5 |              x = None;
1837 |    |              ^^^^^^^^ write to `x` occurs here, while borrow is still active
1838 |  6 |              *i += 1;
1839 |    |              -- borrow is later used here
1840 |    |
1841 | ```
1842 | 
1843 | (Note that the assignment in the `None` arm is not an error, since the
1844 | borrow is never used again.)
1845 | 
1846 | ## Some special cases
1847 | 
1848 | In some cases, the three points are not all visible in the user's
1849 | syntax, and these cases may need careful treatment.
1850 | 
1851 | ### Drop as last use
1852 | 
1853 | There are times when the last use of a variable will in fact be its
1854 | destructor. Consider an example like this:
1855 | 
1856 | ```rust
1857 | struct Foo<'a> { field: &'a u32 }
1858 | impl<'a> Drop for Foo<'a> { fn drop(&mut self) {} }
1859 | 
1860 | fn main() {
1861 |     let mut x = 22;
1862 |     let y = Foo { field: &x };
1863 |     x += 1;
1864 | }
1865 | ```
1866 | 
1867 | This code would be legal, but for the destructor on `y`, which will
1868 | implicitly execute at the end of the enclosing scope. The error
1869 | message might be shown as follows:
1870 | 
1871 | ```
1872 | error[E0506]: cannot write to `x` while borrowed
1873 |  --> <anon>:7:5
1874 |    |
1875 |  6 |     let y = Foo { field: &x };
1876 |    |                          -- borrow of `x` occurs here
1877 |  7 |     x += 1;
1878 |    |     ^^^^^^ write to `x` occurs here, while borrow is still active
1879 |  8 | }
1880 |    | - borrow is later used here, when `y` is dropped
1881 | ```
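One way to resolve this error is to end the borrow before the write by dropping `y` explicitly. Its destructor (the last use) then runs before `x` is modified. The sketch below assumes a trivial destructor, and `drop_early` is a hypothetical name.

```rust
struct Foo<'a> {
    field: &'a u32,
}

impl<'a> Drop for Foo<'a> {
    fn drop(&mut self) {
        // A destructor may read through `field`, which is why running it
        // counts as a use of the borrow.
        let _seen: u32 = *self.field;
    }
}

fn drop_early() -> u32 {
    let mut x = 22;
    let y = Foo { field: &x };
    drop(y); // the last use of the borrow now happens here
    x += 1;  // so this write no longer conflicts
    x
}
```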
1882 | 
1883 | ### Method calls
1884 | 
1885 | One such case involves method calls:
1886 | 
1887 | ```rust
1888 | fn main() {
1889 |     let mut x = vec![1];
1890 |     x.push(x.pop().unwrap());
1891 | }
1892 | ```
1893 | 
1894 | We propose the following error for this sort of scenario:
1895 | 
1896 | ```
1897 | error[E0506]: cannot write to `x` while borrowed
1898 |  --> <anon>:3:12
1899 |    |
1900 |  3 |     x.push(x.pop().unwrap());
1901 |    |     - ---- ^^^^^^^^^^^^^^^^
1902 |    |     | |    write to `x` occurs here, while borrow is still in active use
1903 |    |     | borrow is later used here, during the call
1904 |    |     `x` borrowed here
1905 | ```
1906 | 
1907 | If you are not using a method, the error would look slightly different,
1908 | but be similar in concept:
1909 | 
1910 | ```
1911 | error[E0506]: cannot write to `x` while borrowed
1912 |  --> <anon>:3:23
1913 |    |
1914 |  3 |     Vec::push(&mut x, x.pop().unwrap());
1915 |    |     --------- ------  ^^^^^^^^^^^^^^^^
1916 |    |     |         |       write to `x` occurs here, while borrow is still in active use
1917 |    |     |         `x` borrowed here
1918 |    |     borrow is later used here, during the call
1919 | ```
1920 | 
1921 | We can detect this scenario in MIR readily enough by checking when the
1922 | point of use turns out to be a "call" terminator. We'll have to tweak
1923 | the spans to get everything to look correct, but that is easy enough.
1924 | 
1925 | ### Closures
1926 | 
1927 | As today, when the initial borrow is part of constructing a closure,
1928 | we wish to highlight not only the point where the closure is
1929 | constructed, but the point *within* the closure where the variable in
1930 | question is used.
1931 | 
1932 | ## Borrowing a variable for longer than its scope
1933 | 
1934 | Consider this example:
1935 | 
1936 | ```rust
1937 | let p;
1938 | {
1939 |     let x = 3;
1940 |     p = &x;
1941 | }
1942 | println!("{}", p);
1943 | ```
1944 | 
1945 | In this example, the reference `p` refers to `x` with a lifetime that
1946 | exceeds the scope of `x`. In short, that portion of the stack will be
1947 | popped with `p` still in active use. In today's compiler, this is
1948 | detected in the borrow checker by a special check that computes
1949 | the "maximal scope" of the path being borrowed (`x`, here). This makes
1950 | sense in the existing system since lifetimes and scopes are expressed
1951 | in the same units (portions of the AST).  In the newer, non-lexical
1952 | formulation, this error would be detected somewhat differently. As
1953 | described earlier, we would see that a `StorageDead` instruction frees
1954 | the slot for `x` while `p` is still in use. We can thus present the
1955 | error in the same "three-point style":
1956 | 
1957 | ```
1958 | error[E0506]: variable goes out of scope while still borrowed
1959 |  --> <anon>:5:1
1960 |    |
1961 |  4 |     p = &x;
1962 |    |          - `x` borrowed here
1963 |  5 | }
1964 |    | ^ `x` goes out of scope here, while borrow is still in active use
1965 |  6 | println!("{}", p);
1966 |    |                - borrow used here, after invalidation
1967 | ```
1968 | 
1969 | ## Errors during inference
1970 | 
1971 | The remaining lifetime-related errors arise primarily from the
1972 | interaction with function signatures. For example:
1973 | 
1974 | ```rust
1975 | impl Foo {
1976 |     fn foo(&self, y: &u8) -> &u8 {
1977 |         y
1978 |     }
1979 | }
1980 | ```
1981 | 
1982 | We already have work-in-progress on presenting these sorts of errors
1983 | in a better way (see [issue 42516][] for numerous examples and
1984 | details), all of which should be applicable here. In short, the name
1985 | of the game is to identify patterns and suggest changes to improve the
1986 | function signature to match the body (or at least diagnose the problem
1987 | more clearly).
1988 | 
1989 | [issue 42516]: https://github.com/rust-lang/rust/issues/42516
1990 | 
1991 | Whenever possible, we should leverage points in the control-flow and
1992 | try to explain errors in "narrative" form.
1993 | 
1994 | # Drawbacks
1995 | [drawbacks]: #drawbacks
1996 | 
1997 | There are very few drawbacks to this proposal. The primary one is that
1998 | the **rules** for the system become more complex. However, this
1999 | permits us to accept many more programs, and so we expect
2000 | that **using Rust** will feel simpler. Moreover, experience has shown
2001 | that -- for many users -- the current scheme of tying reference
2002 | lifetimes to lexical scoping is confusing and surprising.
2003 | 
2004 | # Alternatives
2005 | [alternatives]: #alternatives
2006 | 
2007 | ### Alternative formulations of NLL
2008 | 
2009 | During the runup to this RFC, a number of alternate schemes and
2010 | approaches to describing NLL were tried and discarded.
2011 | 
2012 | **RFC 396.** [RFC 396][] defined lifetimes to be a "prefix" of the
2013 | dominator tree -- roughly speaking, a single-entry, multiple-exit
2014 | region of the control-flow graph. Unlike our system, this definition
2015 | did not permit gaps or holes in a lifetime. Ensuring continuous lifetimes was
2016 | meant to guarantee soundness; in this RFC, we use the liveness
2017 | constraints to achieve a similar effect. This more flexible setup
2018 | allows us to handle cases like Problem Case #3, which RFC 396 would
2019 | not have accepted. RFC 396 also did not cover dropck and a number of
2020 | other complications.
2021 | 
2022 | **SSA or SSI transformation.** Rather than incorporating the "current location" into
2023 | the subtype check, we also considered formulations that first applied
2024 | an SSA transformation to the input program, and then gave each of those
2025 | variables a distinct type. This does allow some examples to type-check that
2026 | wouldn't otherwise, but it is not flexible enough for the `vec-push-ref`
2027 | example covered earlier.
2028 | 
2029 | Using SSA also introduces other complications. Among other things,
2030 | Rust permits variables and temporaries to be borrowed and mutated
2031 | indirectly (e.g., via `&mut`).  If we were to apply SSA to MIR in a
2032 | naive fashion, then, it would ignore these assignments when creating
2033 | numberings. For example:
2034 | 
2035 | ```rust
2036 | let mut x = 1;      // x0, has value 1
2037 | let mut p = &mut x; // p0
2038 | *p += 1;
2039 | use(x);             // uses `x0`, but it now has value 2
2040 | ```
2041 | 
2042 | Here, the value of `x0` changed due to a write from `p`. Thus this is
2043 | not a true SSA form. Normally, SSA transformations achieve this by
2044 | making local variables like `x` and `p` be pointers into stack slots,
2045 | and then lifting those stack slots into locals when safe. MIR was
2046 | intentionally not designed in SSA form precisely to avoid the need for
2047 | such contortions (we can leave that to the optimizing backend).
2048 | 
2049 | **Type per program point.** Going further than SSA, one can
2050 | accommodate `vec-push-ref` through a scheme that gives each variable a
2051 | distinct type at each point in the CFG (similar to what Ericson2314
2052 | describes in the [stateful MIR for Rust][smr]) and applies
2053 | transformations to the lifetimes on every edge. During the rustc
2054 | design sprint, the compiler team also enumerated such a design. The
2055 | author believes this RFC to be a roughly equivalent analysis, but with
2056 | an alternative, more familiar formulation that still uses one type per
2057 | variable (rather than one type per variable per point).
2058 | 
2059 | There are several advantages to the design enumerated here. For one
2060 | thing, it involves far fewer inference variables (if each variable has
2061 | many types, each of those types needs distinct inference variables at
2062 | each point) and far fewer constraints (we don't need constraints just
2063 | for connecting the type of a variable between distinct points). It is
2064 | also a more natural fit for the surface language, in which variables
2065 | have a single type.
2066 | 
2067 | ### Different "lifetime roles"
2068 | 
2069 | In the discussion about nested method calls ([RFC 2025], and the
2070 | discussions that led up to it), there were various proposals that were
2071 | aimed at accepting the naive desugaring of a call like `vec.push(vec.len())`:
2072 | 
2073 | ```rust
2074 | let tmp0 = &mut vec;
2075 | let tmp1 = vec.len(); // does a shared borrow of vec
2076 | Vec::push(tmp0, tmp1);
2077 | ```
2078 | 
2079 | The alternatives to RFC 2025 were focused on augmenting the type of
2080 | references to have distinct "roles" -- the most prominent such
2081 | proposal was `Ref2<'r, 'w>`, in which mutable references change to
2082 | have two distinct lifetimes, a "read" lifetime (`'r`) and a "write"
2083 | lifetime (`'w`), where read encompasses the entire span of the
2084 | reference, but write only contains those points where writes are
2085 | occurring. This RFC does not attempt to change the approach to nested
2086 | method calls, instead continuing with the RFC 2025 approach (which
2087 | affects only the borrowck handling). However, if we did wish to adopt
2088 | a `Ref2`-style approach in the future, it could be done backwards
2089 | compatibly, but it would require modifying (for example) the liveness
2090 | requirements. For example, currently, if a variable `x` is live at
2091 | some point P, then all lifetimes in the type of `x` must contain P --
2092 | but in the `Ref2` approach, only the read lifetime would have to
2093 | contain P. This implies that lifetimes are treated differently
2094 | depending on their "role". It seems like a good idea to isolate such a
2095 | change into a distinct RFC.
2096 | 
2097 | # Unresolved questions
2098 | [unresolved]: #unresolved-questions
2099 | 
2100 | None at present.
2101 | 
2102 | # Appendix: What this proposal will not fix
2103 | 
2104 | It is worth discussing a few kinds of borrow check errors that the
2105 | current RFC will **not** eliminate. These are generally errors that
2106 | cross procedural boundaries in some form or another.
2107 | 
2108 | **Closure desugaring.** The first kind of error has to do with the
2109 | closure desugaring. Right now, closures always capture local
2110 | variables, even if the closure only uses some sub-path of the variable
2111 | internally:
2112 | 
2113 | ```rust
2114 | let get_len = || self.vec.len(); // borrows `self`, not `self.vec`
2115 | self.vec2.push(...); // error: self is borrowed
2116 | ```
2117 | 
2118 | This was discussed on [an internals thread][tc]. It is possible to fix
2119 | this [by making the closure desugaring smarter][cc].
2120 | 
2121 | [tc]: https://internals.rust-lang.org/t/borrow-the-full-stable-name-in-closures-for-ergonomics/5387
2122 | [cc]: https://internals.rust-lang.org/t/borrow-the-full-stable-name-in-closures-for-ergonomics/5387/11?u=nikomatsakis
2123 | 
2124 | **Disjoint fields across functions.** Another kind of error arises when
2125 | one method only uses a field `a` and another only uses a field `b`.
2126 | Right now, a method signature cannot express that, and hence these two
2127 | methods cannot be used "in parallel" with one another:
2128 | 
2129 | ```rust
2130 | impl Foo {
2131 |     fn get_a(&self) -> &A { &self.a }
2132 |     fn inc_b(&mut self) { self.b.value += 1; }
2133 |     fn bar(&mut self) {
2134 |         let a = self.get_a();
2135 |         self.inc_b(); // Error: self is already borrowed
2136 |         consume(a); // hypothetical function that reads `a`
2137 |     }
2138 | }
2139 | ```
2140 | 
2141 | The workaround is to refactor so as to expose the fact that the methods
2142 | operate on disjoint data. For example, one can move the methods onto
2143 | the types of the fields themselves:
2144 | 
2145 | ```rust
2146 | fn bar(&mut self) {
2147 |     let a = self.a.get();
2148 |     self.b.inc();
2149 |     consume(a); // hypothetical function that reads `a`
2150 | }
2151 | ```
2152 | 
2153 | This way, when looking at `bar()` alone, we see borrows of `self.a`
2154 | and `self.b`, rather than two borrows of `self`. Another technique is
2155 | to introduce "free functions" (e.g., `get(&self.a)` and
2156 | `inc(&mut self.b)`) that expose more clearly which fields are
2157 | operated upon, or to inline the method bodies. Expressing such
2158 | disjointness in method signatures is a non-trivial bit of design and
2159 | is out of scope for this RFC; see [this comment on an internals thread][cpb] for further thoughts.
2160 | 
2161 | [cpb]: https://internals.rust-lang.org/t/partially-borrowed-moved-struct-types/5392/2
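Both refactorings can be sketched in a single compilable example. The field types `A` and `B`, their contents, and the method names `bar_with_methods` and `bar_with_free_fns` are hypothetical stand-ins for the elided definitions above. The names `get` and `inc` follow the text.

```rust
// Hypothetical field types standing in for `A` and `B` above.
struct A { name: String }
struct B { value: u32 }

struct Foo { a: A, b: B }

// Technique 1: methods on the field types themselves.
impl A {
    fn get(&self) -> &str { &self.name }
}
impl B {
    fn inc(&mut self) { self.value += 1; }
}

// Technique 2: free functions that take only the field they need.
fn get(a: &A) -> &str { &a.name }
fn inc(b: &mut B) { b.value += 1; }

impl Foo {
    fn bar_with_methods(&mut self) -> usize {
        let a = self.a.get(); // shared borrow of `self.a` only
        self.b.inc();         // mutable borrow of `self.b` only
        a.len()               // `a` is still live; accepted because the fields are disjoint
    }

    fn bar_with_free_fns(&mut self) -> usize {
        let a = get(&self.a); // shared borrow of `self.a` only
        inc(&mut self.b);     // mutable borrow of `self.b` only
        a.len()
    }
}
```

In both versions, the borrow checker sees the field paths `self.a` and `self.b` directly inside `bar_*`, which is what lets it treat the two borrows as disjoint.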
2162 | 
2163 | **Self-referential structs.** The final limitation we are not fixing
2164 | yet is the inability to have "self-referential structs". That is, you
2165 | cannot have a struct that stores, within itself, an arena and pointers
2166 | into that arena, and then move that struct around. This comes up in a
2167 | number of settings. There are various workarounds: sometimes you can
2168 | use a vector with indices, for example, or
2169 | [the `owning_ref` crate](https://crates.io/crates/owning_ref). The
2170 | latter, when combined with [associated type constructors][ATC], might
2171 | be an adequate solution for some use cases (it is essentially
2172 | a way of modeling "existential lifetimes" in library code). For the
2173 | case of futures especially, [the `?Move` RFC][?Move] proposes another
2174 | lightweight and interesting approach.
2175 | 
2176 | [?Move]: https://github.com/rust-lang/rfcs/pull/1858
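The "vector with indices" workaround can be sketched as follows. Nodes refer to each other by their position in a `Vec` owned by the same struct rather than by reference, so the struct holds no borrows of itself and can be moved freely. The `Graph` and `Node` types and the `build_list` function are hypothetical. One trade-off of this design is that indices give up some static checking: a stale index is a logic error, not a compile error.

```rust
/// Hypothetical node type: `next` is an index into `Graph::nodes`,
/// not a reference, so no lifetime ties it to the containing struct.
struct Node {
    value: u32,
    next: Option<usize>,
}

/// Hypothetical owner of the "arena" (a plain `Vec`) plus its links.
struct Graph {
    nodes: Vec<Node>,
    head: Option<usize>,
}

impl Graph {
    fn new() -> Self {
        Graph { nodes: Vec::new(), head: None }
    }

    /// Adds a node at the front of the list and returns its index.
    fn push_front(&mut self, value: u32) -> usize {
        let idx = self.nodes.len();
        self.nodes.push(Node { value, next: self.head });
        self.head = Some(idx);
        idx
    }

    /// Walks the list from `head`, following indices.
    fn values(&self) -> Vec<u32> {
        let mut out = Vec::new();
        let mut cur = self.head;
        while let Some(i) = cur {
            out.push(self.nodes[i].value);
            cur = self.nodes[i].next;
        }
        out
    }
}

fn build_list() -> Graph {
    let mut g = Graph::new();
    g.push_front(1);
    g.push_front(2);
    g.push_front(3);
    g // moving `g` out is fine: its internal "pointers" are plain indices
}
```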
2177 | 
2178 | # Endnotes
2179 | 
2180 | <a name="temporaries"></a>
2181 | 
2182 | **1.** Scopes always correspond to blocks with one exception: the
2183 | scope of a temporary value is sometimes the enclosing
2184 | statement.
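To illustrate the exception, the sketch below records when values are dropped. A temporary created inside a `let` statement is dropped at the end of that statement, while a named local lives until the end of its block. The `Noisy` type and the `drop_order` function are hypothetical, and the shared log (`Rc<RefCell<Vec<String>>>`) exists only to observe drop order.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical type that logs its own destruction.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Noisy {
    fn len(&self) -> usize {
        self.name.len()
    }
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn drop_order() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        // The temporary `Noisy` only lives until the end of this `let`
        // statement, not until the end of the enclosing block.
        let n = Noisy { name: "temp", log: Rc::clone(&log) }.len();
        log.borrow_mut().push(format!("after statement, n = {}", n));

        // A named local, by contrast, lives until the end of its block.
        let _kept = Noisy { name: "local", log: Rc::clone(&log) };
        log.borrow_mut().push("end of block".to_string());
    } // `_kept` is dropped here.
    let entries = log.borrow().clone();
    entries
}
```

In the log, `"drop temp"` appears before the next statement runs, and `"drop local"` appears only after `"end of block"`.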
2185 | 
2186 | [RFC 396]: https://github.com/rust-lang/rfcs/pull/396
2187 | [RFC 2025]: https://github.com/rust-lang/rfcs/pull/2025
2188 | [smr]: https://github.com/Ericson2314/a-stateful-mir-for-rust
2189 | [10520]: https://github.com/rust-lang/rust/issues/10520
2190 | [ATC]: https://github.com/rust-lang/rfcs/pull/1598
2191 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # nll-rfc
2 | Non-lexical lifetimes RFC.
3 | 


--------------------------------------------------------------------------------