# Different approach for implementing a borrow checker
Today I introduce a method for implementing a borrow checker, similar to Rust's, that is more convenient for developers to use.
The idea is to associate references with the actual variables they refer to instead of with scopes.
I'll call these reference annotations.
For instance:

    reference: &'a i32, // a reference to a variable called a whose type is i32

Note that `'a` may not be the real name of the variable but a denoted name, or may even stand for multiple variables!
Also keep in mind that while the syntax may look too verbose at some points, most of the time the developer will not have to type annotations, and when they have to, it is mostly the basic ones.

Before you continue reading, note that while the syntax here is similar to Rust's, these concepts are not limited to a particular syntax, and I am not proposing to add anything to Rust!

## Who refers to whom
Since the compiler can see and control all reference creation in a block of code, it can determine which reference refers to which variable:

    struct Struct {
        field1: i32,
        field2: i32,
    }

    fn main() {
        let i1 = 0; // i1 is created
        let mut r = &i1; // r points to i1
        println!("{}", *r); // ok: r is valid because i1 is valid
        {
            let i2 = 0; // i2 is created
            r = &i2; // r points to i2
            println!("{}", *r); // ok: r is valid because i2 is valid
            // i2 is dropped here leaving r dangling
        }
        // println!("{}", *r); // error: r is invalid
        r = &i1; // r points to i1 again
        println!("{}", *r); // ok: r is valid because i1 is valid

        let s = Struct{ field1: 0, field2: 1 };
        r = &s.field1; // r points to s.field1
        println!("{}", *r); // ok: r is valid because s.field1 is valid
        r = &s.field2; // r points to s.field2
        println!("{}", *r); // ok: r is valid because s.field2 is valid
        drop(s); // s is dropped with all its fields leaving r dangling
        // println!("{}", *r); // error: r is invalid
    }

The compiler did all the work, and we didn't have to write any annotations. But what if we want to call another function and deal with references in parameters and return values?

## Functions
The compiler has two options for borrow checking when function calls are involved:
- The compiler can step into the body of the function and continue analysing as if it were an inline block of code. This is currently used for lambdas, but not for ordinary functions, since compile time would explode.
- The compiler can determine what to do based on the function signature. So, we must encode the set of rules for the compiler to follow in the function signature.

Now how to do this?
Suppose we want to write a function that takes a reference pointing to some variable and returns a reference to the same variable. We may write this:

    fn same_ref<'a>(r: &'a i32) -> &'a i32 {
        r
    }

Let's see how to interpret this:
- The function `same_ref<'a>` names a variable `'a` that resides outside the function body, but where exactly that variable lives does not matter inside the function
- The parameter `r: &'a i32` is a reference to the variable `'a` whose type is `i32`
- The return type `&'a i32` is a reference to the same variable `'a`
- Inside the function body the compiler will make sure that the returned value `r` is pointing to `'a` as promised by the function signature

All the caller needs in order to use the function is its signature `fn same_ref<'a>(r: &'a i32) -> &'a i32`, so we can write:

    let i1 = 0; // i1 is created
    // i1 is 'a and the resulting reference points to 'a which is i1!
    let r = same_ref(&i1); // r points to i1

Let's write a function that takes two references and returns either of them:

    fn max_of<'a, 'b>(r1: &'a i32, r2: &'b i32) -> &'(a | b) i32 {
        if *r1 >= *r2 {
            r1
        } else {
            r2
        }
    }

The function is more verbose than the previous one, but it isn't hard to understand:
- The function `max_of<'a, 'b>` names variables `'a` and `'b` that reside outside the function body. Both `'a` and `'b` can point to the same variable or to different variables.
More on this later.
- The parameter `r1: &'a i32` points to `'a` and `r2: &'b i32` points to `'b`
- The returned reference `&'(a | b) i32` may point to either `'a` or `'b`
- Inside the function body the compiler will make sure that the returned value `r1` or `r2` is pointing to either `'a` or `'b` as promised by the function signature

Here is how this function may be used:

    let i1 = 0; // i1 is created
    {
        let i2 = 1; // i2 is created
        // i1 is 'a and i2 is 'b, the resulting reference points to either 'a or 'b so either i1 or i2!
        let r = max_of(&i1, &i2); // r points to either i1 or i2
        // i2 is dropped here leaving r dangling
    }

Wait! Can one reference `r` point to two variables? Not really. But since the compiler does not know which of the two variables the reference is pointing to, it will be conservative and assume it may point to either of them. This is why I said at the beginning that one annotation `'a` may refer to a set of variables, not only one.
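
For comparison, here is a minimal sketch of how today's Rust expresses a function like `max_of`. Rust has no `'(a | b)` annotation, so both parameters share a single lifetime `'a` and the result is treated as borrowing from both arguments, which is the same conservative assumption described above. The `main` function and its values are only illustrative.

```rust
// Today's Rust: one lifetime covers both inputs, so the returned
// reference is assumed to borrow from both `r1` and `r2`.
fn max_of<'a>(r1: &'a i32, r2: &'a i32) -> &'a i32 {
    if *r1 >= *r2 {
        r1
    } else {
        r2
    }
}

fn main() {
    let i1 = 0;
    let i2 = 1;
    // `r` keeps both `i1` and `i2` borrowed for as long as it is used.
    let r = max_of(&i1, &i2);
    println!("{}", *r);
}
```

The difference from the scheme in this document is that Rust tracks how long the result may live, while the annotations here track which variables it may point to.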

The good news is that most references will be tied to only one variable. When a reference is tied to many variables, the set is limited and the compiler knows the set of variables the reference may point to, but it can't deduce which one is actually pointed to.

### Out references
There is another way a function can return references: a kind of out reference, where you pass the function a reference to a reference, or a reference to a structure that has a reference field, and the function makes this inner reference point somewhere else.
Consider the following function:

    fn move_a_to_b<'a, 'b>(r1: &mut &'a i32, r2: &mut &'b i32) {
        // the inner reference of r1 points to 'a
        // the inner reference of r2 points to 'b
        *r1 = *r2; // the inner reference of r1 points to the inner of r2 thus points to 'b now
    }

Notice how we didn't add annotations to `r1` and `r2` themselves, because we don't care what they refer to. We are only interested in the inner references, which point to `'a` and `'b` respectively.
The caller will call the function like this:

    let i1 = 0; // i1 is created
    let i2 = 1; // i2 is created
    let mut r1 = &i1; // r1 points to i1
    let mut r2 = &i2; // r2 points to i2
    move_a_to_b(&mut r1, &mut r2); // error: how can the compiler know which variable is pointed to by each reference?

If you try to compile this with the Rust compiler you will get:
```
error: lifetime may not live long enough
    *r1 = *r2; // the inner reference of r1 points to the inner of r2 thus points to 'b now
  | ^^^^^^^^^ assignment requires that `'b` must outlive `'a`
  = help: consider adding the following bound: `'b: 'a`
```
The Rust compiler tries to figure out the lifetime of references, but it can't prove that `'b` outlives `'a`, so it complains at the assignment line and asks to add a bound `'b: 'a`.
On the other hand, the introduced borrow checker will let the function compile fine, since it is not the responsibility of the function to deduce the lifetime of its parameters. Instead, the function should tell the compiler how it deals with references, and since its signature does not say anything about this, the compiler will complain at the call site that it can't match references to variables anymore.
The question now is how to encode this operation in the function signature.
Rust already supports two methods: bounds in the annotation declarations and bounds in a `where` clause.

Here I'll take another approach. I'll say a function may have preconditions and/or postconditions, both described using a `where` clause followed by a first block for preconditions (may be empty) and an optional second block for postconditions.
The full syntax is:

    where { preconditions } { postconditions }

Again, note that the syntax used here is for explanation only. I don't propose any syntax for any programming language, and a good programming language designer will most likely find a nicer one.

Let's keep the preconditions block empty for now and go to postconditions.
To say that after the function returns `r1` will be pointing to `'b`, add this to the postconditions:

    r1: &mut &'b i32

The function becomes:

    fn move_a_to_b<'a, 'b>(r1: &mut &'a i32, r2: &mut &'b i32) where {} { r1: &mut &'b i32 } {
        // the inner reference of r1 points to 'a
        // the inner reference of r2 points to 'b
        *r1 = *r2; // the inner reference of r1 points to the inner of r2 thus points to 'b now
    }

The compiler can now use the function signature to compile this code:

    let i1 = 0; // i1 is created
    let i2 = 1; // i2 is created
    let mut r1 = &i1; // r1 points to i1
    let mut r2 = &i2; // r2 points to i2
    // i1 is 'a and i2 is 'b
    move_a_to_b(&mut r1, &mut r2);
    // r1 points now to i2 ('b), and r2 is unchanged

If the function does not change `r1` on all paths, it may use the postcondition `r1: &mut &'(a | b) i32` instead, and the compiler will assume that `r1` points to either of them after the function returns.

The syntax got too verbose, right? Yes, but you don't need to define each annotation, and dealing with out references like this is rare, unlike return types.
For example, you can omit `'a` from the previous function, like any reference annotation you don't use in return types and don't assign to another reference in the parameters.
There are also rules that let the compiler insert the annotations for you in the general cases, similar to Rust's rules.

## Structs
A structure generally contains a number of fields that are constructed together in a specified order and dropped together in a specified order. Each member of a struct can be referenced individually; a reference to one field must not be mistaken for a reference to another field and is not the same as a reference to the struct itself! This point is important and differs from the rules of Rust's borrow checker.

    struct Struct {
        field1: i32,
        field2: i32,
    }
    let s = Struct{ field1: 0, field2: 1 };
    let r1 = &s.field1; // r1 points to s.field1
    println!("{}", *r1); // ok: r1 is valid because s.field1 is valid
    let r2 = &s.field2; // r2 points to s.field2
    println!("{}", *r2); // ok: r2 is valid because s.field2 is valid

Rust already differentiates between direct field borrows as above, but the situation is different when returning fields from a function:

    fn get_field1<'a>(s: &'a Struct) -> &'a i32 {
        &s.field1
    }

This can't be done in the introduced borrow checker because:
- The compiler needs to know which object the reference points to, and the function doesn't tell which field of the struct is returned by reference
- In the function signature we named a variable `'a` with type `Struct` in the parameter, but it appeared in the return type with type `i32`!

So, a new syntax is required to make this code work.
Recall that `r: &'a i32` means that reference `r` points to variable `'a` of type `i32`, and `a` may be the real name of the variable or a name introduced by a function, struct, impl, ...
In order to tie a reference to a field of a struct, the annotation is the real name of the field, just as you use it:

    fn get_field1(s: &Struct) -> &'s.field1 i32 {
        &s.field1
    }

Or by field index:

    fn get_field1(s: &Struct) -> &'s.0 i32 {
        &s.field1
    }

For associated methods, which start with a `self` parameter, the field name or index may be used directly:

    impl Struct {
        fn get_field1(&self) -> &'self.0 i32 {
            &self.field1
        }

        fn get_field2(&self) -> &'1 i32 {
            &self.field2
        }
    }

Nested field annotations are written in the same manner, for example: `&'s.0.1`
Why does one need to write the type of the field when the field path is already given in the annotation?
Because the function may return a reference to a `dyn` trait that is implemented by the field. In other cases the type may be omitted, but it is not a big deal anyway.

### Reference fields in structs
A struct may contain a reference as a field, so there must be a syntax to tie this reference to the variable it points to. The Rust syntax can do this with minor changes:

    // 'a and 'b name two sets of variables which may or may not be the same
    struct TwoRefs<'a, 'b> {
        r1: &'a i32, // a is i32
        r2: &'b i32, // b is i32
        r3: &'a i32, // r3 is always pointing to the same variable pointed to by r1
        //r4: &'a f64, // error: a can't be both i32 and f64
        r5: &'a dyn Trait, // ok: a may be referenced as Trait since Trait is implemented for i32
        r6: &'(a | b) i32, // r6 points to either 'a or 'b, assume it points to both
    }

Inside the struct's `impl` block, when a function takes a reference parameter annotated with one of the struct's variable sets `'a` or `'b`, the referenced variable is added to that set, because the function may not assign the reference on all paths. If the variable set must be replaced instead of extended, one may use `''a` instead of `'a`, and this will be enforced in the function body:

    impl<'a, 'b> TwoRefs<'a, 'b> {
        // the resulting struct points to 'a and 'b
        fn new(r1: &'a i32, r2: &'b i32) -> Self {
            Self {
                r1: r1,
                r2: r2,
                r3: r1,
                r5: r1,
                r6: r2, // or r6: r1
            }
        }

        // add 'a variable referenced by r to the struct's 'a set
        fn add_to_a(&mut self, r: &'a i32) {
            if *self.r1 > 0 {
                self.r1 = r;
            }
            self.r5 = r;
        }

        // set the 'b set to ''b referenced by r
        fn set_b(&mut self, r: &''b i32) {
            // must set all references that may point to 'b to r
            self.r2 = r;
            self.r6 = r;
        }
    }

    fn get_a<'a>(t: &TwoRefs<'a, '_>) -> &'a i32 {
        //t.r6 // error: r6 points to
        // either 'a or 'b but return must point to 'a
        //t.r3 // ok
        t.r1
    }

    fn main() {
        let i1 = 0;
        let i2 = 1;
        let mut t = TwoRefs::new(&i1, &i2);
        // t.r1 and t.r3 and t.r5 and t.r6 point to i1
        // t.r2 and t.r6 point to i2

        let i3 = 2;
        t.add_to_a(&i3); // t.r1 and t.r3 and t.r5 and t.r6 point to both i1 and i3

        let i4 = 3;
        t.set_b(&i4); // t.r2 and t.r6 point to i4, t.r6 no longer points to i3

        let r = get_a(&t); // r points to both i1 and i3
    }

## Multiple mutable references
This is a known limitation in Rust: it does not support multiple `mut` references to the same object, or mixed `mut` and non-`mut` references. Instead you need to use `RefCell` and `Cell` and let the safety checks be done at runtime. However, you can't just plug in `RefCell` and go, since these wrappers don't compose well with other types. Either you wrap all your types in cells, or you will find yourself always moving your variables into and out of a cell.
Another caveat is that cells can be mutated without being declared `mut`, so this goes beyond mutable by default to having no `const` in the language!
Not to mention that some entire architectures use mutably shared data extensively, like retained GUI and MVC (Model View Controller) programming. This explains why Rust still does not have a usable retained GUI library and why each developer comes up with a new architecture that comes nowhere near the proven and established ones.

But why did Rust go this way? The problem is that some wrapper types like `enum`, `Box`, `Vec` and other pointers and containers may give you a reference to a value that is allocated on the heap, and if the container, for example, clears its storage, the returned reference will be invalid! The same applies to iterators. Among `c++` developers this is known as iterator invalidation and use after move.
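
To make the runtime checks mentioned above concrete, here is a small sketch in today's Rust using `std::cell::RefCell`. It shows a second mutable borrow being refused at runtime, and a value being mutated through a binding that is not declared `mut`. The helper name `second_mut_borrow_is_refused` is made up for this illustration.

```rust
use std::cell::RefCell;

// Hypothetical helper: returns true if a second mutable borrow is
// refused while a first mutable borrow is still alive.
fn second_mut_borrow_is_refused(cell: &RefCell<Vec<i32>>) -> bool {
    let _first = cell.borrow_mut(); // first mutable borrow, held until return
    let refused = cell.try_borrow_mut().is_err(); // checked at runtime
    refused
}

fn main() {
    // `shared` is not declared `mut`, yet its contents can be mutated.
    let shared = RefCell::new(vec![1, 2, 3]);
    shared.borrow_mut().push(4);

    assert!(second_mut_borrow_is_refused(&shared));

    // The first borrow has been released, so borrowing works again.
    assert_eq!(shared.borrow().len(), 4);
}
```

Calling `borrow_mut` instead of `try_borrow_mut` for the second borrow would panic at runtime, which is the cost of moving the check out of the compiler.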

Look at this struct, similar to `Box`, introduced in the Rust docs:

    use std::mem::{align_of, size_of};
    use std::ops::{Deref, DerefMut};
    use std::ptr;

    struct Carton<T>(ptr::NonNull<T>);

    impl<T> Carton<T> {
        pub fn new(value: T) -> Self {
            // Allocate enough memory on the heap to store one T.
            assert_ne!(size_of::<T>(), 0, "Zero-sized types are out of the scope of this example");
            let mut memptr: *mut T = ptr::null_mut();
            unsafe {
                let ret = libc::posix_memalign(
                    (&mut memptr as *mut *mut T).cast(),
                    align_of::<T>(),
                    size_of::<T>()
                );
                assert_eq!(ret, 0, "Failed to allocate or invalid alignment");
            };

            // NonNull is just a wrapper that enforces that the pointer isn't null.
            let ptr = {
                // Safety: memptr is dereferenceable because we created it from a
                // reference and have exclusive access.
                ptr::NonNull::new(memptr)
                    .expect("Guaranteed non-null if posix_memalign returns 0")
            };

            // Move value from the stack to the location we allocated on the heap.
            unsafe {
                // Safety: If non-null, posix_memalign gives us a ptr that is valid
                // for writes and properly aligned.
                ptr.as_ptr().write(value);
            }

            Self(ptr)
        }
    }

    impl<T> Deref for Carton<T> {
        type Target = T;

        fn deref(&self) -> &Self::Target {
            unsafe {
                // Safety: The pointer is aligned, initialized, and dereferenceable
                // by the logic in [`Self::new`]. We require readers to borrow the
                // Carton, and the lifetime of the return value is elided to the
                // lifetime of the input. This means the borrow checker will
                // enforce that no one can mutate the contents of the Carton until
                // the reference returned is dropped.
                self.0.as_ref()
            }
        }
    }

    impl<T> DerefMut for Carton<T> {
        fn deref_mut(&mut self) -> &mut Self::Target {
            unsafe {
                // Safety: The pointer is aligned, initialized, and dereferenceable
                // by the logic in [`Self::new`].
                // We require writers to mutably
                // borrow the Carton, and the lifetime of the return value is
                // elided to the lifetime of the input. This means the borrow
                // checker will enforce that no one else can access the contents
                // of the Carton until the mutable reference returned is dropped.
                self.0.as_mut()
            }
        }
    }

    impl<T> Drop for Carton<T> {
        fn drop(&mut self) {
            unsafe {
                libc::free(self.0.as_ptr().cast());
            }
        }
    }

Imagine that the compiler allowed you to get multiple `mut` references to `Carton` and write the following code:

    let mut c = Carton::new(0); // c is Carton holding i32 value allocated on the heap
    let r: &mut i32 = &mut c; // r is &mut i32 borrowing c
    c = Carton::new(0); // assign c to a new Carton, the old storage is freed leaving r dangling
    println!("{}", *r); // error: use after free, r is dangling!

The code clearly contains a bug, and Rust prevents it by forbidding the use of `c` while `r` is still in use.

How can this be worked around?
Let's introduce a new special type called `PhantomMut`. This is not the best name for it, but it suffices for now.
This type, like Rust's phantom types, does not occupy storage in the containing `struct`. The special thing about `PhantomMut` is that it is considered dropped after each mutation of its struct, and a new `PhantomMut` is created, so references to this type get invalidated each time the struct is mutated.
How does this help us build a `Box`?

    struct Carton<T> {
        p: ptr::NonNull<T>,
        pub _0: PhantomMut<T>,
    }

    impl<T> Deref for Carton<T> {
        type Target = T;

        // remember that references to fields are not references to the struct
        fn deref(&self) -> &'_0 Self::Target {
            unsafe {
                // unsafe allows converting reference with unknown variable to &'_0
                self.p.as_ref()
            }
        }
    }

    impl<T> DerefMut for Carton<T> {
        // after this method all previous references to '_0 are invalid
        fn deref_mut(&mut self) -> &'_0 mut Self::Target {
            unsafe {
                // unsafe allows converting reference with unknown variable to &'_0
                self.p.as_mut()
            }
        }
    }

    impl<T> Drop for Carton<T> {
        // after this method all previous references to '_0 are invalid
        fn drop(&mut self) {
            unsafe {
                libc::free(self.p.as_ptr().cast());
            }
        }
    }

Let's use it:

    let mut c = Carton::new(0); // c is Carton holding i32 value allocated on the heap
    let mut r: &mut i32 = &mut c; // r is &mut i32 borrowing c._0
    c = Carton::new(0); // assign c to a new Carton, drop is called, c._0 is reset, the old storage is freed and r is invalid
    //println!("{}", *r); // error: use after free, r is dangling!
    r = &mut c; // r is valid again

The compiler can now allow a mutable reference to `c` and to `c._0` at the same time, and will detect when the reference to `c._0` becomes invalid.
But you will notice that the example above is somewhat useless: if we dereference `Carton` multiple times, the previous references are invalidated even though the pointed-to value is still valid! So, we can't get more than one `mut` reference to the value of the box stored on the heap.
To overcome this, we will use a new postcondition to tell the compiler not to reset `c._0` after the call to `fn deref_mut(&mut self) -> &'_0 mut Self::Target`.

    impl<T> DerefMut for Carton<T> {
        // the postcondition `_0: const` keeps previous references to '_0 valid
        fn deref_mut(&mut self) -> &'_0 mut Self::Target where {} { _0: const } {
            unsafe {
                // unsafe allows converting reference with unknown variable to &'_0
                self.p.as_mut()
            }
        }
    }

We can now call `deref_mut` multiple times and all the references pointing to `c._0` will still be valid. They are invalidated only on drop.

The same applies to `Vec`: `index` and `index_mut` will return a reference pointing to `vec._0`, and operations like `push`, `pop`, `clear`... will invalidate the references and iterators to the data allocated by the vector.

## Enums (tagged unions)
In Rust, tagged unions are baked into the language under the name `enum` (not to be confused with the `c`/`c++` `enum`).
If Rust's current borrow checker allowed multiple `mut` references to an `enum`, references could be left dangling with no way to detect it, and use after free bugs could arise.
I took this example from a blog on the web:

    enum StringOrInt {
        Str(String),
        Int(i64)
    }

    let mut x = Str("Hi!".to_string()); // Create an instance of the `Str` variant with associated string "Hi!"
    let y = &mut x; // Create a mutable alias to x

    if let Str(ref insides) = x { // If x is a `Str`, assign its inner data to the variable `insides`
        *y = Int(1); // Set `*y` to `Int(1)`, therefore setting `x` to `Int(1)` too
        println!("x says: {}", insides); // Uh oh!
    }

By making all `enum` types have a public field `_0: PhantomMut<T>`, `_1: PhantomMut<T>`, ...
for each type `T` in the `enum`, all references referring to a type in the `enum` will point to the corresponding field. When the `enum` is dropped or assigned, all public fields are reset, invalidating references, except the one associated with the type being set:

    enum StringOrInt {
        Str(String),
        Int(i64),
        // pub _0: PhantomMut<String>,
        // pub _1: PhantomMut<i64>,
    }

    let mut x = Str("Str".to_string()); // Create an instance of the `Str` variant
    let y = &mut x; // Create a mutable alias to x

    if let Str(ref insides) = x { // insides is a field inside Str variant which is &'x._0 String
        *y = Int(1); // Set `*y` to `Int(1)`, therefore setting `x` to `Int(1)` too, therefore x._0 is reset leaving &'x._0 String dangling, and all its fields are invalid
        //println!("x says: {}", insides); // error: insides is invalid after x._0 was reset
    }

## Arrays
Arrays can be thought of as a fixed-size `Vec` in which all elements are constructed together in a specified order and dropped together in a specified order.
An array of type `[T;N]` will have a public field `_0: PhantomMut<T>` which all references to elements of the array are tied to, and when the array is dropped all references to its elements are invalidated.

### Swap implementation with multiple mutable aliases
Rust has a swap function that does a bitwise swap of two variables of the same type:

    fn swap<T>(x: &mut T, y: &mut T)

The usual implementation is to use `memcpy` to copy the contents of `x` to a temporary place, then copy the contents of `y` to `x`, and finally restore the old contents of `x` from the temporary into `y`. However, an implementation may do some optimizations based on the assumption that `x` and `y` do not alias. With multiple mutable aliases allowed, how can a function assert that two mutable references in its parameters don't alias?
Using function preconditions:

    fn swap<'a, 'b, T>(x: &'a mut T, y: &'b mut T) where { 'a != 'b } { /* function body */ }

Without this precondition the compiler can't deduce that `'a` and `'b` do not alias inside the function body, so it will be conservative, assume that all references of the same type passed as parameters may point to one variable, and base its analysis on this. This leads us to the next section, but before going there: how can we swap two parameters if for some reason we couldn't provide a precondition requiring that they don't alias? Specialize the swap function:

    fn swap<T>(x: &mut T, y: &mut T) { /* assume x and y refer to the same memory and check if address of x equals that of y first */ }

In fact, this is used in `c++` assignment operators nearly everywhere, because mutable references may alias each other. The situation with a borrow checker is better, since the compiler can easily pick the best specialization based on the variables it is passing to the function.

### Assume all references in parameter are invalid after one reference of the same type is invalid, unless proved the opposite
If we have this function:

    fn mutate_carton(c: &mut Carton<i32>, i: &mut i32) {
        *c = Carton::new(0); // drop and assign c to a new Carton, reset c._0 and invalidate any pointer to c._0
        *i = 1; // what if i was referring to c._0 ? a potential use after free bug!
    }

and it is used like this:

    let mut c = Carton::new(0);
    mutate_carton(&mut c, &mut c);

then a subtle bug may appear depending on what input is given to the function. The current Rust borrow checker doesn't allow this code to begin with. But with multiple mutable aliases allowed, this code may compile and cause a use after free bug.
To mitigate this problem, when the compiler invalidates a reference of type `T` in the parameters of a function, it will invalidate all references in the parameters with the same type `T`, or with a trait implemented by `T`, unless the compiler can prove (with the aid of a precondition) that a particular reference does not alias the other references. So, to require that `i` does not point to `c._0` we may write:

    fn mutate_carton<'a>(c: &mut Carton<i32>, i: &'a mut i32) where { 'a != 'c._0 } {
        *c = Carton::new(0); // drop and assign c to a new Carton, reset c._0 and invalidate any pointer to c._0
        *i = 1; // i does not point to c._0 so it is still valid
    }

The following code will not compile because it violates the function's preconditions:

    let mut c = Carton::new(0);
    mutate_carton(&mut c, &mut c); // error: the second argument points to c._0

How does the compiler know it is invalidating references to `i32` when `c._0` is reset? It's because `c._0` has the type `PhantomMut<i32>`. This is why `PhantomMut` is generic, and why the compiler must know the type of each reference annotation in a struct: to know which references to invalidate and which to keep.
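
The alias preconditions in this section and in the swap discussion (`'a != 'c._0`, `'a != 'b`) are meant to be checked at compile time. Today's Rust has no such annotation, so as a rough runtime analogue, the following sketch uses raw pointers and compares addresses first, like the specialized `swap` described earlier. The name `swap_may_alias` and its safety contract are assumptions made for this illustration, not part of any library.

```rust
use std::ptr;

/// Hypothetical helper: swaps the values behind two raw pointers
/// that are allowed to alias each other.
///
/// # Safety
/// Both pointers must be valid for reads and writes and properly
/// aligned, and they must either be equal or not overlap at all.
unsafe fn swap_may_alias<T>(x: *mut T, y: *mut T) {
    if x == y {
        return; // both pointers name the same object: nothing to swap
    }
    // The pointers are distinct, so by the safety contract they do not
    // overlap and the non-overlapping fast path may be used.
    unsafe { ptr::swap_nonoverlapping(x, y, 1) };
}

fn main() {
    let mut a = 1;
    let mut b = 2;
    let pa: *mut i32 = &mut a;
    let pb: *mut i32 = &mut b;
    unsafe {
        swap_may_alias(pa, pb); // distinct objects are swapped
        swap_may_alias(pa, pa); // aliasing pointers are left untouched
    }
    assert_eq!((a, b), (2, 1));
}
```

With a compile-time precondition the address comparison would disappear, since the compiler could pick the aliasing or the non-aliasing version at each call site.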

## Static variables and references
All static variables are accessible by all threads and are either immutable, and thus can be read by multiple threads, or `mut` and `Sync`, where `Sync` means the value can be safely mutated from multiple threads (like a mutex or an atomic type).
Static variables also last from before the main function begins until after it returns, so they are always valid to access once the main function has been entered.
Generally speaking, Rust does not allow doing advanced things with static variables in a safe manner the way `c++` does, so we will not talk about order of initialization and related problems.
From a function's point of view, if a reference points to a static variable, as in `r: &'static i32`, it is not very helpful to know which static variable is actually referenced, and if a function returns a static reference the caller does not need to know which static variable is referenced.

For generic programming, a `Static` trait would be more useful than relying on `'static` to require that a type is static. This trait would be auto-implemented for types and structs that do not contain references, for references to static variables, and for structs that depend only on static variables. This should make it easier to specialize generic functions for `Static` variables, but implementing this trait for references and for structs that depend only on static variables is tricky! A reference that points to a static variable now might not point to one later, and a struct that depends on a static variable might not depend on it later, which would leave the borrow checker with the burden of implementing and un-implementing a trait for a type. This is clearly not the job of the borrow checker, and it clarifies why Rust's specialization has not reached stabilization yet!

## What about threads?
As far as I know, there exist two kinds of safe thread types:
- Detached threads (start and forget), which require their functor to be static, and since static variables are `Sync` or immutable, detached threads should just work as they did before
- Scoped threads, where you spawn threads and block the current thread until they have finished executing, so spawned threads can refer to data on the local thread's stack. Since multiple mutable references can alias, a data race may occur, right?

Examine this example from the Rust docs:

    let mut a = vec![1, 2, 3];
    let mut x = 0;

    thread::scope(|s| {
        s.spawn(|| {
            println!("hello from the first scoped thread");
            // We can borrow `a` here.
            dbg!(&a);
        });
        s.spawn(|| {
            println!("hello from the second scoped thread");
            // We can even mutably borrow `x` here,
            // because no other threads are using it.
            x += a[0] + a[2];
        });
        println!("hello from the main thread");
    });

    // After the scope, we can modify and access our variables again:
    a.push(4);
    assert_eq!(x, a.len());

Notes:
- threads start eagerly with `s.spawn`
- thread functors are required to be `Send`
- spawned threads can capture multiple immutable references
- only one `mut` reference to an object is allowed inside the scope block

This clearly won't work out of the box with multiple `mut` references allowed.
But it can be made to work if:
- threads are started lazily inside `thread::scope`, after all threads have been scheduled for spawning using `s.spawn`
- the functor of the thread must be `Send` and `Sync`, meaning that all references must either be immutable or safe to mutate from multiple threads

Pros: The current thread may participate in executing the spawned threads instead of just blocking, saving the cost of launching a new thread.
Cons: No thread can capture a mutable reference to a local variable unless it is `Sync`, so borrowing `x` as above won't work. This is not very important, because it is more appropriate to mutate data on the current thread than to send a `mut` reference to another thread and then just block and wait for it to finish.

## Things I didn't address or forgot
There are many aspects I didn't address, didn't know of, or otherwise forgot to mention, so I won't be surprised if any of the methods introduced here are wrong or make no sense. Any feedback will be welcome!