├── .github └── workflows │ └── deploy.yml ├── .gitignore ├── LICENSE-APACHE ├── LICENSE-MIT ├── README.md ├── active_discussion ├── TEMPLATE.md ├── aliasing.md ├── crypto_concerns.md ├── crypto_concerns │ ├── constant_time_code.md │ └── keeping_secrets.md ├── layout.md ├── stable_addresses.md ├── storage_liveness.md ├── uninhabited_types.md ├── uninitialized_memory.md ├── unions.md └── validity.md ├── meeting-notes ├── 20180308.md ├── 20180913.md ├── 20181011.md ├── 20181025.md └── 20190307.md ├── reference ├── .gitignore ├── book.toml └── src │ ├── SUMMARY.md │ ├── glossary.md │ ├── introduction.md │ ├── layout │ ├── arrays-and-slices.md │ ├── enums.md │ ├── function-pointers.md │ ├── packed-simd-vectors.md │ ├── pointers.md │ ├── scalars.md │ ├── structs-and-tuples.md │ └── unions.md │ ├── optimizations │ └── return_value_optimization.md │ └── validity │ ├── function-pointers.md │ └── unions.md ├── resources ├── deliberate-ub.md └── llvm-assumptions.md ├── triagebot.toml └── wip ├── memory-interface.md ├── stacked-borrows.md └── value-domain.md /.github/workflows/deploy.yml: -------------------------------------------------------------------------------- 1 | name: Deploy 2 | on: 3 | push: 4 | branches: [ master ] 5 | pull_request: 6 | branches: [ master ] 7 | 8 | jobs: 9 | build: 10 | # we run this part also in PRs, to ensure that building the book works 11 | runs-on: ubuntu-latest 12 | steps: 13 | - uses: actions/checkout@v3 14 | with: 15 | fetch-depth: 0 16 | - name: Install mdbook 17 | run: | 18 | mkdir mdbook 19 | curl -Lf https://github.com/rust-lang/mdBook/releases/download/v0.4.34/mdbook-v0.4.34-x86_64-unknown-linux-gnu.tar.gz | tar -xz --directory=./mdbook 20 | echo `pwd`/mdbook >> $GITHUB_PATH 21 | - name: Generate Book 22 | run: | 23 | cd reference 24 | mdbook build 25 | mdbook test 26 | - name: Upload Artifact 27 | uses: actions/upload-pages-artifact@v3 28 | with: 29 | path: reference/book 30 | 31 | deploy: 32 | needs: build 33 | 34 | permissions: 35 
| pages: write 36 | id-token: write 37 | 38 | environment: 39 | name: github-pages 40 | url: ${{ steps.deployment.outputs.page_url }} 41 | 42 | # only do this part on an actual push 43 | if: github.event_name == 'push' 44 | 45 | runs-on: ubuntu-latest 46 | steps: 47 | - id: deployment 48 | uses: actions/deploy-pages@v4 49 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *~ 2 | -------------------------------------------------------------------------------- /LICENSE-APACHE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 
29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 
61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /LICENSE-MIT: -------------------------------------------------------------------------------- 1 | Permission is hereby granted, free of charge, to any 2 | person obtaining a copy of this software and associated 3 | documentation files (the "Software"), to deal in the 4 | Software without restriction, including without 5 | limitation the rights to use, copy, modify, merge, 6 | publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software 8 | is furnished to do so, subject to the following 9 | conditions: 10 | 11 | The above copyright notice and this permission notice 12 | shall be included in all copies or substantial portions 13 | of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF 16 | ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED 17 | TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 18 | PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT 19 | SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY 20 | CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR 22 | IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 23 | DEALINGS IN THE SOFTWARE. 24 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | UCG - Rust's Unsafe Code Guidelines 3 | === 4 | 5 | The purpose of this repository is to collect and discuss all sorts of questions that come up when writing unsafe code. 6 | It is primarily used by the [opsem team](https://github.com/rust-lang/opsem-team/) to track open questions around the operational semantics, but we also track some "non-opsem" questions that fall into T-lang or T-type's purview, if they are highly relevant to unsafe code authors. 7 | 8 | The [Unsafe Code Guidelines Reference "book"][ucg_book] is a past effort to systematize a consensus on some of these questions. 9 | Most of it has been archived, but the [glossary](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html) is still a useful resource. 10 | 11 | Current consensus is documented in [t-opsem FCPs](https://github.com/rust-lang/opsem-team/blob/main/fcps.md) and the [Rust Language Reference]. 12 | 13 | [ucg_book]: https://rust-lang.github.io/unsafe-code-guidelines 14 | [Rust Language Reference]: https://doc.rust-lang.org/reference/index.html 15 | 16 | ## See also 17 | 18 | The [Rustonomicon] is a draft document discussing unsafe code. It is intended to 19 | be brought into agreement with the content here. It represents an organized 20 | effort to explain how to write Rust code, rather than a reference. 
21 | 22 | [Rustonomicon]: https://doc.rust-lang.org/nightly/nomicon/ 23 | 24 | ## Code of Conduct and licensing 25 | 26 | All interactions on this repository (whether on issues, PRs, or 27 | elsewhere) are governed by the [Rust Code of 28 | Conduct](CODE_OF_CONDUCT.md). 29 | 30 | Further, all content on this repository is subject to the standard 31 | [Rust](LICENSE-MIT) [licensing](LICENSE-APACHE). 32 | -------------------------------------------------------------------------------- /active_discussion/TEMPLATE.md: -------------------------------------------------------------------------------- 1 | # Name of your discussion area 2 | 3 | - Discussion lead(s): XXX 4 | 5 | ## Introduction 6 | 7 | In this section, give background information and links regarding the area 8 | you would like to discuss. Your audience should be people familiar with Rust 9 | but not necessarily experts in the domain. 10 | 11 | ## Goals 12 | 13 | Describe the goals of this discussion: specific questions or 14 | categories of questions you would like answered. 15 | 16 | ## Some interesting examples and questions 17 | 18 | Highlight some specific examples and things to give people a more 19 | concrete idea, as well as to help seed discussion. 20 | 21 | ## Active threads 22 | 23 | Describe the active threads you imagine for this section and some 24 | notes on their content. As discussion proceeds, this area will be kept 25 | up to date with rough summaries of conclusions, links to posts, and so 26 | forth. 27 | -------------------------------------------------------------------------------- /active_discussion/aliasing.md: -------------------------------------------------------------------------------- 1 | # Aliasing and memory model 2 | 3 | ## Brief Summary 4 | 5 | Rust's borrow checker enforces some particular aliasing 6 | requirements. For example, an `&u32` reference is always guaranteed to 7 | be valid (dereferenceable) and immutable while it is in active 8 | use. 
Similarly, an `&mut` reference cannot alias any other active 9 | reference. It is less clear how those invariants translate to unsafe 10 | code that is using raw pointers. 11 | 12 | ## See also 13 | 14 | - ACA model https://github.com/nikomatsakis/rust-memory-model/issues/26 15 | - Capability-based model https://github.com/nikomatsakis/rust-memory-model/issues/28 16 | - A formal C memory model supporting integer-pointer casts https://github.com/nikomatsakis/rust-memory-model/issues/30 17 | - Promising semantics for relaxed-memory concurrency https://github.com/nikomatsakis/rust-memory-model/issues/32 18 | - Tootsie Pop model https://github.com/nikomatsakis/rust-memory-model/issues/21 19 | -------------------------------------------------------------------------------- /active_discussion/crypto_concerns.md: -------------------------------------------------------------------------------- 1 | # Cryptographic concerns 2 | 3 | There are a number of concerns that arise when writing cryptographic 4 | code. This chapter summarizes some of them. 5 | -------------------------------------------------------------------------------- /active_discussion/crypto_concerns/constant_time_code.md: -------------------------------------------------------------------------------- 1 | # Constant time code 2 | 3 | It is often important to be able to turn off optimizations and 4 | guarantee that code executes in constant time to prevent side-channel 5 | attacks based on timing. 6 | -------------------------------------------------------------------------------- /active_discussion/crypto_concerns/keeping_secrets.md: -------------------------------------------------------------------------------- 1 | # Keeping secrets 2 | 3 | When storing cryptographic keys, crypto code wants to be sure that the 4 | compiler will not insert loads or stores that were not present in the 5 | source. 
Moreover, it wants to be able to zero memory and know that no 6 | bits from that memory "escape" into registers etc. 7 | 8 | ## See also 9 | 10 | - https://internals.rust-lang.org/t/volatile-and-sensitive-memory/3188 11 | - https://github.com/nikomatsakis/rust-memory-model/issues/16 12 | -------------------------------------------------------------------------------- /active_discussion/layout.md: -------------------------------------------------------------------------------- 1 | # Data structure layout 2 | 3 | ## Introduction 4 | 5 | This discussion is meant to focus on the following things: 6 | 7 | - What guarantees does Rust make regarding the layout of data structures? 8 | - What guarantees does Rust make regarding ABI compatibility? 9 | 10 | NB. Oftentimes, choices of layout will only be possible if we can 11 | guarantee various invariants -- this is particularly true when 12 | optimizing the layout of `Option` or other enums. However, designing 13 | those invariants is left for a future discussion -- here, we should 14 | document/describe what we currently do and/or aim to support. 15 | 16 | ### Layout of data structures 17 | 18 | In general, Rust makes few guarantees about the memory layout of your 19 | structures. For example, by default, the compiler has the freedom to 20 | rearrange the field order of your structures for more efficiency (as 21 | of this writing, we try to minimize the overall size of your 22 | structure, but this is the sort of detail that can easily change). For 23 | safe code, of course, any rearrangements "just work" transparently. 24 | 25 | If, however, you need to write unsafe code, you may wish to have a 26 | fixed data structure layout. In that case, there are ways to specify 27 | and control how an individual struct will be laid out -- notably with 28 | `#[repr]` annotations. 
One purpose of this section, then, is to lay out 29 | what sorts of guarantees we offer when it comes to layout, and also 30 | what effect the various `#[repr]` annotations have. 31 | 32 | ### ABI compatibility 33 | 34 | When one either calls a foreign function or is called by one, extra 35 | care is needed to ensure that all the ABI details line up. ABI compatibility 36 | is related to data structure layout but -- in some cases -- can add another 37 | layer of complexity. For example, consider a struct with one field, like this one: 38 | 39 | ```rust 40 | #[repr(C)] 41 | struct Foo { field: u32 } 42 | ``` 43 | 44 | The memory layout of `Foo` is identical to a `u32`. But in many ABIs, 45 | the struct type `Foo` is treated differently at the point of a 46 | function call than a `u32` would be. Eliminating these gaps is the 47 | goal of the `#[repr(transparent)]` annotation introduced in [RFC 48 | 1758]. For built-in types, such as `&T` and so forth, it is important 49 | for us to specify how they are treated at the point of a function 50 | call. 51 | 52 | ## Goals 53 | 54 | - Document current behavior of compiler. 55 | - Indicate which behavior is "permitted" for compiler and which 56 | aspects are things that unsafe code can rely upon. 57 | - Include the effect of `#[repr]` annotations. 58 | - Uncover the sorts of layout optimizations we may wish to do in the 59 | future. 60 | 61 | ## Some interesting examples and questions 62 | 63 | - `&T` where `T: Sized` 64 | - This is **guaranteed** to be a non-null pointer 65 | - `Option<&T>` where `T: Sized` 66 | - This is **guaranteed** to be a nullable pointer 67 | - `Option` 68 | - Can this be assumed to be a non-null pointer? 69 | - `usize` 70 | - Platform dependent size, but guaranteed to be able to store a pointer? 71 | - Also an array length? 72 | - Uninitialized bits -- for which types are uninitialized bits valid? 73 | - If you have `struct A { .. }` and `struct B { .. 
}` with no 74 | `#[repr]` annotations, and they have the same field types, can we 75 | say that they will have the same layout? 76 | - or do we have the freedom to rearrange the types of `A` but not 77 | `B`, e.g. based on PGO results 78 | - What about different instantiations of the same struct? (`Vec` 79 | vs `Vec`) 80 | - Rust currently says that no single value may be larger than `isize::MAX` bytes 81 | - is this good? can it be changed? does it matter *here* anyway? 82 | 83 | ## Active threads 84 | 85 | To start, we will create threads for each major category of types 86 | (with a few suggested focus points): 87 | 88 | - Integers and floating points 89 | - What about signaling NaN etc? ([Seems like a 90 | non-issue](https://github.com/rust-lang/rust/issues/40470#issuecomment-343803381), 91 | but it'd be good to resummarize the details). 92 | - is `usize` the native size of a pointer? [the max of various other considerations](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212702266)? 93 | what are edge cases here? 94 | - Rust currently states that the maximum size of any single value must fit in an `isize` 95 | - Can we say a bit more about why? (e.g., [ensuring that "pointer diff" is representable](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212703192)) 96 | - Booleans 97 | - Prior discussions ([#46156][], [#46176][]) documented bool as a single 98 | byte that is either 0 or 1. 99 | - Enums 100 | - See dedicated thread about "niches" and `Option`-style layout optimization 101 | below. 102 | - Define: C-like enum 103 | - Can a C-like enum ever have an invalid discriminant? (Presumably not) 104 | - Empty enums and the `!` type 105 | - [RFC 2195][] defined the layout of `#[repr(C)]` enums with payloads. 106 | - [RFC 2363][] offers a proposal to permit specifying discriminants. 107 | - Structs 108 | - Do we ever say *anything* about how a `#[repr(Rust)]` struct is laid out 109 | (and/or treated by the ABI)? 
110 | - e.g., what about different structs with same definition 111 | - across executions of the same program? 112 | - For example, [rkruppe 113 | writes](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212776247) 114 | that we might "want to guarantee (some subset of) newtype 115 | unpacking and relegate `#[repr(transparent)]` to being the way 116 | to guarantee to other crates that a type with private fields is 117 | and will remain a newtype?" 118 | - Tuples 119 | - Are these effectively anonymous structs? 120 | - Unions 121 | - Can we ever say anything about the initialized contents of a union? 122 | - Is `#[repr(C)]` meaningful on a union? 123 | - When (if ever) do we guarantee that all fields have the same address? 124 | - Fn pointers (`fn()`, `extern "C" fn()`) 125 | - When is transmuting from one `fn` type to another allowed? 126 | - Can you transmute from a `fn` to `usize` or raw pointer? 127 | - In theory this is platform dependent, and C certainly draws a 128 | distinction between `void*` and a function pointer, but are 129 | there any modern and/or realistic platforms where it is an 130 | issue? 131 | - Is `Option` guaranteed to be a pointer (possibly null)? 132 | - References `&T` and `&mut T` 133 | - Out of scope: aliasing rules 134 | - Always aligned, non-null 135 | - When using the C ABI, these map to the C pointer types, presumably 136 | - Raw pointers 137 | - Effectively same as integers? 138 | - Is `ptr::null` etc guaranteed to be equal to `0_usize`? 
139 | - C does guarantee that `0` when cast to a pointer is NULL 140 | - Layout knobs: 141 | - Custom alignment ([RFC 1358]) 142 | - Packed ([RFC 1240] talks about some safety issues) 143 | 144 | [#46156]: https://github.com/rust-lang/rust/pull/46156 145 | [#46176]: https://github.com/rust-lang/rust/pull/46176 146 | [RFC 2363]: https://github.com/rust-lang/rfcs/pull/2363 147 | [RFC 2195]: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html 148 | [RFC 1358]: https://rust-lang.github.io/rfcs/1358-repr-align.html 149 | [RFC 1240]: https://rust-lang.github.io/rfcs/1240-repr-packed-unsafe-ref.html 150 | [RFC 1758]: https://rust-lang.github.io/rfcs/1758-repr-transparent.html 151 | -------------------------------------------------------------------------------- /active_discussion/stable_addresses.md: -------------------------------------------------------------------------------- 1 | # Stable addresses 2 | 3 | Clearly, if you have a `&T` reference, the actual pointer address of 4 | that memory must remain valid while **that reference** is in active 5 | use. But how stable are the memory addresses of local variables in 6 | between borrows? Consider: 7 | 8 | ```rust 9 | let x = 22; 10 | foo(&x); 11 | foo(&x); 12 | 13 | fn foo(y: &usize) { .. } 14 | ``` 15 | 16 | Is `foo` guaranteed to be given the same pointer each time? Note that 17 | safe code can observe the pointer value by doing `y as *const usize as 18 | usize`. If, however, the answer is no, that is helpful to the compiler 19 | since it can spill `x` only while it is borrowed but otherwise simply 20 | store `x` in a register. 
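
The observation described above can be sketched in safe code. This is a minimal illustration, not part of the original discussion: the helper name `addr_of` is an assumption introduced here, and `foo` from the snippet above is replaced by a function that returns the observed address so the two calls can be compared.

```rust
/// Hypothetical helper: returns the address behind `y` as an integer.
/// Only safe casts are used, matching `y as *const usize as usize` above.
fn addr_of(y: &usize) -> usize {
    y as *const usize as usize
}

fn main() {
    let x: usize = 22;
    let first = addr_of(&x);
    let second = addr_of(&x);
    // In practice the two addresses match today, but whether that is
    // *guaranteed* is exactly the open question of this discussion.
    println!(
        "first = {:#x}, second = {:#x}, same = {}",
        first,
        second,
        first == second
    );
}
```

What a reference does guarantee regardless of the answer is that each observed address is non-null and aligned for `usize`.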
21 | 22 | **Range of possible answers:** 23 | 24 | - local variables have stable addresses (de facto true today) 25 | - local variables have stable addresses while borrowed, but may change between borrows (would be nice) 26 | -------------------------------------------------------------------------------- /active_discussion/storage_liveness.md: -------------------------------------------------------------------------------- 1 | # Storage liveness 2 | 3 | If you move out from a variable, can you still use the underlying stack space? 4 | 5 | ```rust 6 | { 7 | let mut x: Vec = ....; 8 | let p: *mut Vec = &mut x; 9 | drop(x); // compiler can see `x` is uninitialized 10 | 11 | // what happens if you use `p` here? 12 | } // StorageDead(x) 13 | ``` 14 | -------------------------------------------------------------------------------- /active_discussion/uninhabited_types.md: -------------------------------------------------------------------------------- 1 | # Uninhabited types like `!` and exhaustiveness 2 | 3 | TBD 4 | -------------------------------------------------------------------------------- /active_discussion/uninitialized_memory.md: -------------------------------------------------------------------------------- 1 | # Uninitialized memory 2 | 3 | TBD 4 | -------------------------------------------------------------------------------- /active_discussion/unions.md: -------------------------------------------------------------------------------- 1 | # Unions 2 | 3 | TBD 4 | -------------------------------------------------------------------------------- /active_discussion/validity.md: -------------------------------------------------------------------------------- 1 | # Data type validity requirements 2 | 3 | This discussion is meant to focus on the question: Which invariants derived from 4 | types are there that the compiler expects to be *always* maintained, and 5 | (equivalently) that unsafe code must *always* uphold (or else cause undefined 6 | behavior)? 
This is what is called "validity invariant" in 7 | [Ralf's blog post](https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html), 8 | but we might also decide to change that name. 9 | 10 | ### Interactions and constraints 11 | 12 | Choices of invariants interact, in particular, with layout optimizations: For 13 | example, the fact that `Option<&T>` is pointer-sized relies on the fact that the 14 | validity invariant for `&T` rules out `0x0`, and hence we can use that value as 15 | signaling the `None` case. 16 | 17 | Moreover, the invariants are constrained by attributes that we emit when 18 | generating LLVM IR. For example, we emit `aligned` attributes pretty much any 19 | time we can, which means it is probably a good idea to say that valid references 20 | must be aligned. 21 | 22 | Finally, another consideration to take into account is that ruling out certain 23 | behavior can be great for bug finding. For example, if arithmetic overflow is 24 | defined to have two's-complement-behavior, then bug finding tools can no longer 25 | use overflow as an indication of a software bug. (This is a real problem with 26 | unsigned integer arithmetic in C/C++.) 27 | 28 | ### Possible bit patterns 29 | 30 | The validity invariant of a type is, basically, a set of bit patterns that is 31 | allowed to occur at that type. ("Basically" because the invariant may also be 32 | allowed to depend on memory.) To discuss this properly, we need to first agree 33 | on what "bit patterns" even are. It is not enough to just consider sequences of 34 | 0 and 1, because we also need to take uninitialized data into account. For the 35 | purpose of this discussion, I think it is sufficient to consider every bit as 36 | being either 0, 1 or uninitialized. 37 | [That is not always sufficient](https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html), 38 | but I think we can mostly ignore the extra complications introduced by pointer 39 | values. 
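
The three-state view of bits can be made concrete with a small model. This is only a sketch under stated assumptions: the names `Bit`, `Byte`, `byte_from_u8` and `valid_bool` are hypothetical and introduced here for illustration, pointer-related complications are ignored as suggested above, and the rule used for `bool` (a single byte that is either 0 or 1) is the one documented in the prior discussions referenced in `layout.md`.

```rust
/// One abstract bit: 0, 1, or uninitialized (the third state discussed above).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Bit {
    Zero,
    One,
    Uninit,
}

/// An abstract byte, least significant bit first.
type Byte = [Bit; 8];

/// Builds a fully initialized abstract byte from a concrete `u8`.
fn byte_from_u8(v: u8) -> Byte {
    let mut bits = [Bit::Zero; 8];
    for (i, bit) in bits.iter_mut().enumerate() {
        if (v >> i) & 1 == 1 {
            *bit = Bit::One;
        }
    }
    bits
}

/// Hypothetical validity predicate for `bool`: the byte must be fully
/// initialized and equal to 0 or 1. Any `Uninit` bit makes it invalid.
fn valid_bool(b: Byte) -> bool {
    b == byte_from_u8(0) || b == byte_from_u8(1)
}

fn main() {
    // A byte whose low bits encode 1 but whose top bit is uninitialized.
    let mut partly_uninit = byte_from_u8(1);
    partly_uninit[7] = Bit::Uninit;

    println!("0x01 valid as bool: {}", valid_bool(byte_from_u8(1)));
    println!("0x02 valid as bool: {}", valid_bool(byte_from_u8(2)));
    println!("partly uninit valid as bool: {}", valid_bool(partly_uninit));
}
```

In this model a validity invariant is simply a predicate over abstract bytes, which makes it easy to state the open question for other types: for example, whether a predicate for integers should accept bytes containing `Uninit`.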
40 | 41 | In terms of comparing with C, the "uninitialized" bit corresponds to what C 42 | calls "indeterminate" data. In particular, it is allowed to be a "trap 43 | representation". Also, observing the same indeterminate data multiple times is 44 | allowed to yield different results. That's why I am proposing we treat it as a 45 | third state a bit can be in. 46 | 47 | In terms of LLVM, the "uninitialized" bit corresponds to `poison`. It is *not* 48 | the same as `undef`! See 49 | [this paper](https://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf) for some 50 | more material on the topic. 51 | 52 | ### Extent of "always" 53 | 54 | One point we will have to figure out is what exactly "always" means. Thinking 55 | in terms of a semantics for MIR, data most probably needs to be valid any time 56 | it is copied, which primarily happens when executing assignment statements (the 57 | other cases are passing of function arguments and return values). However, it 58 | is less clear whether merely creating a place without accessing the data inside 59 | (such as in `&*x`) should require the data to be valid. 60 | 61 | The entire discussion here is only about validity invariants that have to hold 62 | when the compiler considers a variable initialized. For example, `let b: bool;` 63 | is completely okay to not be initialized because the compiler knows about that; 64 | `let b: bool = mem::uninitialized();` however copies uninitialized data at type 65 | `bool` and hence violates `bool`'s validity invariant. 66 | 67 | ## Goals 68 | 69 | * For every primitive type, determine which assumptions (if any) the compiler 70 | makes about values *not* occurring at that type (serving as a lower bound for 71 | what to declare invalid), and determine which popular patterns in unsafe code 72 | might create "interesting" values of this type that safe code cannot create on 73 | its own (serving as an upper bound for how much we want to declare invalid). 
74 | Both of these bounds are soft, but informative. 75 | * Based on that, map out a design space of invariants that seem reasonable. 76 | * Determine when exactly the validity invariant is assumed to hold. 77 | 78 | ## Active threads 79 | 80 | To start, we will create threads for each major category of types. 81 | 82 | * Integers and floating point types 83 | * Do we allow values that contain uninitialized bits? If yes, what are the 84 | rules for arithmetic and logical operations involving uninitialized bits, 85 | e.g. in cases like `x * 0`? There is also some interaction with bug finding 86 | here: tools can only flag uninitialized data at integer type as a bug if we 87 | do not allow that to happen in unsafe code. 88 | 89 | * Raw pointers 90 | * Do we allow values that contain uninitialized bits? 91 | * Are there any requirements on the metadata? 92 | 93 | * References 94 | 95 | I propose splitting this discussion into three pieces. 96 | 97 | * Bit-level properties 98 | * Presumably, references must be non-NULL. 99 | * They probably also must be aligned, but is that required every time a 100 | reference is taken? Also see the [ongoing discussion in RFC 2582][RFC2582]. 101 | * Can there ever be uninitialized bits in a reference? 102 | * Memory-related properties 103 | * Do references have to be dereferencable? What exactly does that even 104 | mean? We have a design choice here to make this *not* part of validity, 105 | but part of the aliasing model instead (and in fact, that is what miri 106 | currently implements). In terms of Stacked Borrows, the operation that 107 | asserts references being dereferencable is `Retag`, and not the validity 108 | check happening at the assignment. That helps to keep validity 109 | independent of the state of memory. 110 | * Does `&[mut] T` have to point to data that is valid at `T`? 
This interacts 111 | with the question of whether `&*x` is allowed when `x` is a well-aligned 112 | non-null dereferencable pointer that points to invalid data. 113 | * Size-related properties 114 | * For references to unsized types, does validity require the metadata to 115 | make sense? Valid metadata is required to even define a notion of 116 | "dereferencable", because we have to specify the extent of memory that is 117 | dereferencable (i.e., we have to specify how many bytes are 118 | dereferencable). 119 | 120 | On the other hand, this makes validity depend on memory, at least for 121 | vtables. However, vtables are somewhat special memory: They are never 122 | deallocated and never mutated. So while determining the size depends on 123 | memory, we know for sure that the computed size cannot ever change for a 124 | given reference. 125 | 126 | All of this gets much, much more complicated with custom DSTs. For 127 | example, for a C-style string pointer, does validity require there to be a 128 | 0-terminator? Should checking for validity, and/or retagging, really 129 | execute arbitrary user-defined code to determine the extent of memory 130 | covered by this reference? 131 | 132 | * Out of scope: aliasing rules 133 | 134 | * Function pointers 135 | * Presumably, these must be non-NULL. Anything else? Can there ever be 136 | uninitialized bits? 137 | 138 | * Booleans 139 | * Is there anything to say besides: A `bool` must be `0x0` or `0x1`? Do we 140 | allow the remaining bits to be uninitialized? 141 | 142 | * Unions 143 | * Do we make any restrictions here, or are unions just "bags of bits" that may 144 | contain anything? That would mean we can do no layout optimizations. 145 | 146 | * Enums 147 | * Is there anything to say besides: The discriminant must be valid, and all 148 | fields of the active variant must be valid at their respective types? 149 | * The padding between fields can be anything, including uninitialized. 
150 | 151 | * Structs, tuples, arrays and all other aggregates (closures, ...) 152 | * Is there anything to say besides: All fields must be valid at their 153 | respective types? 154 | * The padding between fields can be anything, including uninitialized. It was 155 | [recently determined][generators-maybe-uninit] that generators behave 156 | differently from other aggregates here. Are we okay with that? Should we push 157 | for generator fields to reflect this in their types? 158 | 159 | * `ManuallyDrop` 160 | * `ManuallyDrop` might be special in terms of the validity invariant. 161 | Probably it requires its data to be bitstring-valid, but does a 162 | `ManuallyDrop<&T>` have to be dereferencable? 163 | 164 | [RFC2582]: https://github.com/rust-lang/rfcs/pull/2582 165 | [generators-maybe-uninit]: https://github.com/rust-lang/rust/pull/56100 166 | 167 | -------------------------------------------------------------------------------- /meeting-notes/20180308.md: -------------------------------------------------------------------------------- 1 | # Meeting 0: 30 Aug 2018 2 | 3 | Woooo--the Unsafe Code Guidelines working group is *officially* rebooted! See the [zulip log](https://rust-lang.zulipchat.com/#narrow/stream/136281-wg-unsafe-code-guidelines/topic/meeting.202018-08-30) for all of the gory details. 4 | 5 | The meetings are intended to be mostly administrative and *not* focus on technical details, but rather evaluate the WG progress and see where we should focus our efforts. 6 | 7 | ## Takeaways 8 | * We'll be moving meetings to (roughly) every 2 weeks on Thursdays at 1515 UTC (ping @nikomatsakis if you want an official calendar invite) 9 | * Our very first **active discussion** will be...Data structure representation layout! 
10 | * The "validity invariant" discussion will be left as future work 11 | * Let's make a [glossary][glossary] to get everyone on the same terminology page 12 | 13 | [glossary]: https://github.com/rust-rfcs/unsafe-code-guidelines/blob/master/reference/src/introduction.md 14 | 15 | ## What's next 16 | 17 | Go discuss! 18 | * Define "invalid ranges" for values we require currently and those we want to guarantee 19 | * Which Rust types have defined binary representations 20 | * If/when can you reinterpret a type? 21 | 22 | Active topics are divided into [issues][active]. Keeping the discussions centralized in these issues should help us keep the discussion open and asynchronous. We're happy to reevaluate this approach--reach out to @avadacatavra or @nikomatsakis in [Zulip][zulip] if you have comments/concerns. 23 | 24 | ## Goals 25 | 26 | * Reach a consensus on some representations and work towards a writeup on them (hopefully) 27 | * Identify any areas where there's a less obvious answer and what the options are 28 | 29 | [active]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues?q=is%3Aopen+is%3Aissue+label%3A%22active+discussion+topic%22 30 | [zulip]: https://rust-lang.zulipchat.com/#narrow/stream/136281-wg-unsafe-code-guidelines 31 | 32 | See you in the GH discussion! 33 | -------------------------------------------------------------------------------- /meeting-notes/20180913.md: -------------------------------------------------------------------------------- 1 | # Meeting 1: 13 Sept 2018 2 | 3 | The active discussion is still Data Representation. 
It seems like we’re ready for writeups for the following issues: 4 | 5 | - [structs](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11) and [tuples](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12): niko 6 | - [references and pointers](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/16): avadacatavra 7 | - [function pointers](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/14): nicole mazzuca 8 | - [packed/align](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/17): **** 9 | 10 | We’re still looking for a consensus/further discussion on: 11 | 12 | - [enums](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/10) 13 | - [unions](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/13) 14 | - [integers/floating points](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/9) 15 | 16 | If you’re interested in working on a writeup, please comment on the issue. I’ll be adding some tags to help us organize this. See zulip for the [full log](https://rust-lang.zulipchat.com/#narrow/stream/136281-wg-unsafe-code-guidelines/topic/meeting.202018-09-13). 17 | 18 | ## How is the format working out? 19 | Overall, it seems like the github discussions are productive. However, there’s the question of how we produce concrete summaries/writeups/more permanent artifacts (aka how are we actually writing this reference book?). 20 | 21 | **Who/what/why/when/how of writeups** 22 | 23 | We should designate people responsible for writeups earlier. This is also a place where we should reach out to members of the community who might be interested/involved in topics elsewhere, but not necessarily aware of the discussions occurring here. 24 | 25 | The writeups should reflect whatever consensus was reached in the GH issue—someone will summarize the discussion and open a PR on the mdbook in the repo. We can then iterate on the PR with comments/reviews/suggestions. 
If needed, we can always merge a starting point PR (for documentation and revisitation) or close a PR if it looks like the topic needs more discussion. 26 | 27 | So, what do I mean by **consensus**? We don’t need to agree on what the answer is, but we should agree on what the questions are. Some things to think about: 28 | 29 | - how is this topic currently handled? is this appropriate? 30 | - what are different approaches and tradeoffs? 31 | - what do people need to keep in mind when dealing with this topic? 32 | - if there isn’t necessarily a right way, is there a wrong way? 33 | 34 | We’ll see how this writeup process goes and document it once something seems like it works. 35 | 36 | ## What’s next? 37 | 38 | - Oops! We forgot to add a [license](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/20). 39 | - Start writing up the Data Representation chapter 40 | - Do we want to say anything about other representations (e.g. `Box`, slices, fat pointers…) 41 | - Should we have an ABI section 42 | - Keep discussing topics that aren’t ready for writeups 43 | - Figure out how to do writeups more effectively 44 | 45 | -------------------------------------------------------------------------------- /meeting-notes/20181011.md: -------------------------------------------------------------------------------- 1 | # Meeting 11 Oct 2018 2 | 3 | Zulip thread: [meeting 2018-10-11](https://rust-lang.zulipchat.com/#narrow/stream/136281-wg-unsafe-code-guidelines/subject/meeting.202018-10-11) 4 | 5 | The main discussion was about how to finish up with the structs+tuples 6 | PR ([#31]). We opted to pull out several "unresolved questions" into 7 | sub-issues and call for a "final comment period" on the PR. This means 8 | we will merge the PR next meeting, presuming nothing arises. 9 | 10 | [#31]: https://github.com/rust-rfcs/unsafe-code-guidelines/pull/31 11 | 12 | We also reviewed the assignments for writing summaries of other 13 | issues, which hasn't really happened yet. 
It seems like it'd be good 14 | to get some "fresh blood" to take a look. For reference, here is a 15 | list of other issues. For all of them, there is at least some 16 | tentative consensus on certain issues, and so it would be great to 17 | have people volunteer to write up a PR (similar to [#31]) that 18 | documents what is known and pulls out important unresolved questions. 19 | You can find the list of issues by searching for [the writeup-needed 20 | label](https://github.com/rust-rfcs/unsafe-code-guidelines/issues?utf8=%E2%9C%93&q=is%3Aopen+is%3Aissue+label%3A%22active+discussion+topic%22+label%3Awriteup-needed): 21 | 22 | - References and raw pointers (#16) 23 | - Fn pointers (#14) 24 | - Unions (#13) 25 | - Enums (#10) 26 | - Integers and floating points (#9) 27 | 28 | -------------------------------------------------------------------------------- /meeting-notes/20181025.md: -------------------------------------------------------------------------------- 1 | # Meeting 4: 25 Oct 2018 2 | 3 | It’s that time again! UNSAFE CODE UPDATE TIME! You can view the entire discussion [here](https://rust-lang.zulipchat.com/#narrow/stream/136281-wg-unsafe-code-guidelines/topic/meeting.202018-10-25). We’re transitioning to a new active discussion topic: **Validity Invariants**. Over the next two weeks, we’ll be wrapping up the current discussion (data representation) and starting to move on to new topics. 4 | 5 | ## Summary 6 | 7 | Thank you to everyone who participated in the data representation discussions and a *huge* thank you to Niko and Ralf for their writeups on structs/tuples and unions (and ubsan for volunteering for function pointers). 
Here’s the section outline: 8 | 9 | - Introduction to data representation 10 | - Integers and Floating Points 11 | - Structs and Tuples 12 | - Unions 13 | - Enums — [HELP NEEDED](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/10) 14 | - References and Pointers — [HELP NEEDED](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/16) 15 | - Function Pointers 16 | 17 | While we're talking about writeups, I want to make a note about how to handle disagreement/bikeshedding/etc. With the work we're doing, I totally expect some of that to happen. Instead of getting too deep in the weeds/bikeshed, I'd prefer to document the discussion and different points (which @nikomatsakis did a great job in the structs/tuples writeup) so that we can continue to make progress and potentially revisit the issues at a later date. 18 | 19 | That said, we don’t want to leave dangling issues. Lingering points from previous discussions that might require a “circle back” at a later point should be noted with a FIXME/TODO/gh label. 20 | 21 | Unfortunately, neither Niko nor I will be at RustFest Rome, but if people are interested, you should organize a face-to-face! I will be organizing a face-to-face in Orlando, and hopefully at a Rust gathering next year. 22 | 23 | Daylight savings time is upon us—you can keep track of timezones via the calendar invite (ping @nikomatsakis). If you’d like to participate in meetings synchronously and the current time doesn’t work for you, bring it up [here](https://rust-lang.zulipchat.com/#narrow/stream/136281-wg-unsafe-code-guidelines/topic/meeting.20time.20and.20DST). 24 | 25 | We’re also [TWITTER OFFICIAL](https://twitter.com/rust_unsafe) 🦄 26 | 27 | ## What's next? 
28 | - Finish writeups for *data representation* 29 | - Start discussions for *validity invariants* 30 | 31 | -------------------------------------------------------------------------------- /meeting-notes/20190307.md: -------------------------------------------------------------------------------- 1 | # 07 March 2019 2 | 3 | [Zulip thread.](https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/meeting-2019-03-07) 4 | 5 | The discussion focused on the status of the validity invariant 6 | discussion. We talked about [where discussion was getting stuck, and 7 | how we might reframe things to make progress][link1]. In particular, 8 | we discussed the idea of [trying to make a "group summary 9 | comment"][link2] that documents the trade-offs found so far, and in 10 | particular thought about doing it via a [Dropbox Paper 11 | document][link3]. **We could use help here!** 12 | 13 | We also discussed [getting the repo working with GH pages][link4] and the [need to review the 14 | array layout PR][link5]. 
15 | 16 | 17 | [link1]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/meeting-2019-03-07/near/160213135 18 | [link2]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/meeting-2019-03-07/near/160213850 19 | [link3]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/meeting-2019-03-07/near/160214503 20 | [link4]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/meeting-2019-03-07/near/160214959 21 | [link5]: https://rust-lang.zulipchat.com/#narrow/stream/136281-t-lang.2Fwg-unsafe-code-guidelines/topic/meeting-2019-03-07/near/160215250 22 | -------------------------------------------------------------------------------- /reference/.gitignore: -------------------------------------------------------------------------------- 1 | book 2 | -------------------------------------------------------------------------------- /reference/book.toml: -------------------------------------------------------------------------------- 1 | [book] 2 | authors = ["The Rust Project Developers"] 3 | multilingual = false 4 | src = "src" 5 | title = "Unsafe Code Guidelines Reference" 6 | 7 | [output.html] 8 | git-repository-url = "https://github.com/rust-lang/unsafe-code-guidelines" 9 | -------------------------------------------------------------------------------- /reference/src/SUMMARY.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | 3 | - [Introduction](./introduction.md) 4 | - [Glossary](./glossary.md) 5 | - [Data layout](./layout.md) 6 | - [Structs and tuples](./layout/structs-and-tuples.md) 7 | - [Scalars](./layout/scalars.md) 8 | - [Enums](./layout/enums.md) 9 | - [Unions](./layout/unions.md) 10 | - [Pointers](./layout/pointers.md) 11 | - [Function pointers](./layout/function-pointers.md) 12 | - [Arrays and Slices](./layout/arrays-and-slices.md) 13 | 
- [Packed SIMD vectors](./layout/packed-simd-vectors.md) 14 | - [Validity](./validity.md) 15 | - [Unions](./validity/unions.md) 16 | - [Function Pointers](./validity/function-pointers.md) 17 | - [Optimizations](./optimizations.md) 18 | - [Return value optimization](./optimizations/return_value_optimization.md) 19 | -------------------------------------------------------------------------------- /reference/src/glossary.md: -------------------------------------------------------------------------------- 1 | ## Glossary 2 | 3 | ### ABI (of a type) 4 | [abi]: #abi-of-a-type 5 | 6 | The *function call ABI* or short *ABI* of a type defines how it is passed *by-value* across a function boundary. 7 | Possible ABIs include passing the value directly in zero or more registers, or passing it indirectly as a pointer to the actual data. 8 | The space of all possible ABIs is huge and extremely target-dependent. 9 | Rust therefore generally does not define the ABI of any type precisely; it only defines when two types are *ABI-compatible*, 10 | which means that it is legal to call a function declared with an argument or return type `T` using a declaration or function pointer with argument or return type `U`. 11 | 12 | Note that ABI compatibility is stricter than layout compatibility. 13 | For instance `#[repr(C)] struct S(i32)` is (guaranteed to be) layout-compatible with `i32`, but it is *not* ABI-compatible. 14 | 15 | ### Abstract Byte 16 | [abstract byte]: #abstract-byte 17 | 18 | The *byte* is the smallest unit of storage in Rust. 19 | [Memory allocations][allocation] are thought of as storing a list of bytes, and at the lowest level each load returns a list of bytes and each store takes a list of bytes and puts it into memory. 20 | (The [representation relation] then defines how to convert between those lists of bytes and higher-level values such as mathematical integers or pointers.) 
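
As a deliberately simplified sketch of converting between a value and a list of bytes, the following treats bytes as plain `u8` values and only handles `u32`. It relies on the standard `to_ne_bytes` and `from_ne_bytes` methods; the function names `encode_u32` and `decode_u32` are hypothetical and chosen only for illustration.

```rust
/// Encodes a `u32` as the list of bytes a store would put into memory,
/// using the target's native byte order.
pub fn encode_u32(value: u32) -> [u8; 4] {
    value.to_ne_bytes()
}

/// Decodes a list of four bytes, as returned by a load, back into a `u32`.
pub fn decode_u32(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}
```

For every `u32` value `v`, `decode_u32(encode_u32(v))` returns `v` again. This sketch does not model uninitialized memory or pointers.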
21 | 22 | However, a *byte* in the Rust Abstract Machine is more complicated than just an integer in `0..256` -- think of it as there being some extra "shadow state" that is relevant for the Abstract Machine execution (in particular, for whether this execution has UB), but that disappears when compiling the program to assembly. 23 | That's why we call it *abstract byte*, to distinguish it from the physical machine byte in `0..256`. 24 | 25 | The most obvious "shadow state" is tracking whether memory is initialized. 26 | See [this blog post](https://www.ralfj.de/blog/2019/07/14/uninit.html) for details, but the gist of it is that bytes in memory are more like `Option<u8>` where `None` indicates that this byte is uninitialized. 27 | Operations like `copy` work on that representation, so if you copy from some uninitialized memory into initialized memory, the target memory becomes "de-initialized". 28 | Another piece of shadow state is [pointer provenance][provenance]: the Abstract Machine tracks the "origin" of each pointer value to enforce the rule that a pointer used to access some memory is "based on" the original pointer produced when that memory got allocated. 29 | This provenance must be preserved when the pointer is stored to memory and loaded again later, which implies that abstract bytes must be able to carry provenance. 30 | 31 | Without committing to the exact shape of provenance in Rust, we can therefore say that an (abstract) byte in the Rust Abstract Machine looks as follows: 32 | 33 | ```rust 34 | pub enum AbstractByte<Provenance> { 35 | /// An uninitialized byte. 36 | Uninit, 37 | /// An initialized byte with a value in `0..256`, 38 | /// optionally with some provenance (if it is encoding a pointer). 39 | Init(u8, Option<Provenance>), 40 | } 41 | ``` 42 | 43 | ### Aliasing 44 | 45 | *Aliasing* occurs when one pointer or reference points to a "span" of memory 46 | that overlaps with the span of another pointer or reference. A span of memory is
A span of memory is 47 | similar to how a slice works: there's a base byte address as well as a length in 48 | bytes. 49 | 50 | **Note**: a full aliasing model for Rust, defining when aliasing is allowed 51 | and when not, has not yet been defined. The purpose of this definition is to 52 | define when aliasing *happens*, not when it is *allowed*. The most developed 53 | potential aliasing model so far is [Stacked Borrows][stacked-borrows]. 54 | 55 | Consider the following example: 56 | 57 | ```rust 58 | fn main() { 59 | let u: u64 = 7_u64; 60 | let r: &u64 = &u; 61 | let s: &[u8] = unsafe { 62 | core::slice::from_raw_parts(&u as *const u64 as *const u8, 8) 63 | }; 64 | let (head, tail) = s.split_first().unwrap(); 65 | } 66 | ``` 67 | 68 | In this case, both `r` and `s` alias each other, since they both point to all of 69 | the bytes of `u`. 70 | 71 | However, `head` and `tail` do not alias each other: `head` points to the first 72 | byte of `u` and `tail` points to the other seven bytes of `u` after it. Both `head` 73 | and `tail` alias `s`, any overlap is sufficient to count as an alias. 74 | 75 | The span of a pointer or reference is the size of the value being pointed to or referenced. 76 | Depending on the type, you can determine the size as follows: 77 | 78 | * For a type `T` that is [`Sized`](https://doc.rust-lang.org/core/marker/trait.Sized.html) 79 | The span length of a pointer or reference to `T` is found with `size_of::()`. 80 | * When `T` is not `Sized` the story is a little tricker: 81 | * If you have a reference `r` you can use `size_of_val(r)` to determine the 82 | span of the reference. 83 | * If you have a pointer `p` you must unsafely convert that to a reference before 84 | you can use `size_of_val`. There is not currently a safe way to determine the 85 | span of a pointer to an unsized type. 86 | 87 | The [Data layout](./layout.md) chapter also has more information on the sizes of different types. 
88 | 89 | One interesting side effect of these rules is that references and pointers to 90 | Zero Sized Types _never_ alias each other, because their span length is always 0 91 | bytes. 92 | 93 | It is also important to know that LLVM IR has a `noalias` attribute that works 94 | somewhat differently from this definition. However, that's considered a low 95 | level detail of a particular Rust implementation. When programming Rust, the 96 | Abstract Rust Machine is intended to operate according to the definition here. 97 | 98 | ### Allocation 99 | [allocation]: #allocation 100 | 101 | An *allocation* is a chunk of memory that is addressable from Rust. 102 | Allocations are created for objects on the heap, for stack-allocated variables, for globals (statics and consts), but also for objects that do not have Rust-inspectable data such as functions and vtables. 103 | An allocation has a contiguous range of [memory addresses][memory address] that it covers, and it can generally only be deallocated all at once. 104 | (Though in the future, we might allow allocations with holes, and we might allow growing/shrinking an allocation.) 105 | This range can be empty, but even empty allocations have a *base address* that they are located at. 106 | The base address of an allocation is not necessarily unique; but if two distinct allocations have the same base address then at least one of them must be empty. 107 | 108 | Pointer arithmetic is generally only possible within an allocation: 109 | [provenance][provenance] ensures that each pointer "remembers" which allocation it points to, 110 | and accesses are only permitted if the address is in range of the allocation associated with the pointer. 111 | 112 | Data inside an allocation is stored as [abstract bytes][abstract byte]; 113 | in particular, allocations do not track which type the data inside them has. 
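
As a minimal sketch of pointer arithmetic that stays inside a single allocation, the following function walks a slice through a raw pointer. The name `sum_via_raw` is hypothetical; the sketch only uses `as_ptr` and `add` from the standard library and relies on the slice's backing storage being one allocation.

```rust
/// Sums the elements of a slice using raw pointer arithmetic that never
/// leaves the allocation the slice points into.
pub fn sum_via_raw(data: &[u32]) -> u32 {
    let base = data.as_ptr();
    let mut total = 0u32;
    for i in 0..data.len() {
        // SAFETY: `i < data.len()`, so `base.add(i)` stays in bounds of the
        // allocation `base` was derived from and points to an initialized `u32`.
        total = total.wrapping_add(unsafe { *base.add(i) });
    }
    total
}
```

Every pointer computed here is derived from `base` and stays within the range of the slice, so each access is in range of the allocation associated with the pointer.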
114 | 115 | ### Interior mutability 116 | 117 | *Interior Mutation* means mutating memory where there also exists a live shared reference pointing to the same memory; or mutating memory through a pointer derived from a shared reference. 118 | "live" here means a value that will be "used again" later. 119 | "derived from" means that the pointer was obtained by casting a shared reference and potentially adding an offset. 120 | This is not yet precisely defined, which will be fixed as part of developing a precise aliasing model. 121 | 122 | Finding live shared references propagates recursively through references, but not through raw pointers. 123 | So, for example, if data immediately pointed to by a `&T` or `& &mut T` is mutated, that's interior mutability. 124 | If data immediately pointed to by a `*const T` or `&*const T` is mutated, that's *not* interior mutability. 125 | 126 | *Interior mutability* refers to the ability to perform interior mutation without causing UB. 127 | All interior mutation in Rust has to happen inside an [`UnsafeCell`](https://doc.rust-lang.org/core/cell/struct.UnsafeCell.html), so all data structures that have interior mutability must (directly or indirectly) use `UnsafeCell` for this purpose. 128 | 129 | ### Layout 130 | [layout]: #layout 131 | 132 | The *layout* of a type defines its size and alignment as well as the offsets of its subobjects (e.g. fields of structs/unions/enums/... or elements of arrays, and the discriminant of enums). 133 | 134 | Note that layout does not capture everything that there is to say about how a type is represented on the machine; it notably does not include [ABI][abi] or [Niches][niche]. 135 | 136 | Note: Originally, *layout* and *representation* were treated as synonyms, and Rust language features like the `#[repr]` attribute reflect this. 137 | In this document, *layout* and [*representation*][representation relation] are not synonyms. 
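
The components of a layout (size, alignment, field offsets) can be observed for a `#[repr(C)]` struct, whose field order is fixed. This is only a sketch: the type `Pair` and the function `offset_of_b` are hypothetical, and the concrete numbers depend on the target.

```rust
use std::mem::{align_of, size_of};
use std::ptr::addr_of;

/// A struct with a fixed field order, so that its offsets are predictable.
#[repr(C)]
pub struct Pair {
    pub a: u8,
    pub b: u32,
}

/// Byte offset of field `b` inside `Pair`, measured on an actual value.
pub fn offset_of_b() -> usize {
    let p = Pair { a: 0, b: 0 };
    let base = addr_of!(p) as usize;
    let field = addr_of!(p.b) as usize;
    field - base
}

/// Size and alignment of `Pair` as reported by the compiler.
pub fn size_and_align() -> (usize, usize) {
    (size_of::<Pair>(), align_of::<Pair>())
}
```

On a target where `u32` has alignment 4, one would expect `offset_of_b()` to be 4 and `size_and_align()` to be `(8, 4)`.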
138 | 139 | ### Memory Address 140 | [memory address]: #memory-address 141 | 142 | A *memory address* is an integer value that identifies where in the process' memory some data is stored. 143 | This will typically be a virtual address, if the Rust process runs as a regular user-space program. 144 | It can also be a physical address for bare-level / kernel code. Rust doesn't really care either way, the point is: 145 | it's an address as understood by the CPU, it's what the load/store instructions need to identify where in memory to perform the load/store. 146 | 147 | Note that a pointer in Rust is *not* just a memory address. 148 | A pointer value consists of a memory address and [provenance][provenance]. 149 | 150 | ### Niche 151 | [niche]: #niche 152 | 153 | The *niche* of a type determines invalid bit-patterns that will be used by layout optimizations. 154 | 155 | For example, `&mut T` has at least one niche, the "all zeros" bit-pattern. This 156 | niche is used by layout optimizations like ["`enum` discriminant 157 | elision"](layout/enums.html#discriminant-elision-on-option-like-enums) to 158 | guarantee that `Option<&mut T>` has the same size as `&mut T`. 159 | 160 | While all niches are invalid bit-patterns, not all invalid bit-patterns are 161 | niches. For example, the "all bits uninitialized" is an invalid bit-pattern for 162 | `&mut T`, but this bit-pattern cannot be used by layout optimizations, and is not a 163 | niche. 164 | 165 | ### Padding 166 | [padding]: #padding 167 | 168 | *Padding* (of a type `T`) refers to the space that the compiler leaves between fields of a struct or enum variant to satisfy alignment requirements, and before/after variants of a union or enum to make all variants equally sized. 
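
A rough way to see padding is to compare the size of a struct with the sum of its field sizes. The type `Padded` and the function `padding_bytes` below are hypothetical, and the exact count is target-dependent.

```rust
use std::mem::size_of;

/// With `#[repr(C)]`, fields stay in declaration order, so the compiler has to
/// insert padding before `b` and after `c` whenever `u32` is more aligned than `u8`.
#[repr(C)]
pub struct Padded {
    pub a: u8,
    pub b: u32,
    pub c: u8,
}

/// Number of bytes of `Padded` that do not belong to any field.
pub fn padding_bytes() -> usize {
    size_of::<Padded>() - (size_of::<u8>() + size_of::<u32>() + size_of::<u8>())
}
```

On a target where `u32` has alignment 4, `Padded` occupies 12 bytes, of which 6 are padding: 3 between `a` and `b`, and 3 after `c`.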
169 | 170 | Padding can be thought of as the type containing secret fields of type `[Pad; N]` for some hypothetical type `Pad` (of size 1) with the following properties: 171 | * `Pad` is valid for any byte, i.e., it has the same validity invariant as `MaybeUninit<u8>`. 172 | * Copying `Pad` ignores the source byte, and writes *any* value to the target byte. Or, equivalently (in terms of Abstract Machine behavior), copying `Pad` marks the target byte as uninitialized. 173 | 174 | Note that padding is a property of the *type* and not the memory: reading from the padding of an `&Foo` (by casting to a byte reference) may produce initialized values if the `&Foo` is pointing to memory that was initialized (for example, if it was originally a byte buffer initialized to `0`), but the moment you perform a typed copy out of that reference you will have uninitialized padding bytes in the copy. 175 | 176 | 177 | We can also define padding in terms of the [representation relation]: 178 | A byte at index `i` is a padding byte for type `T` if, 179 | for all values `v` and lists of bytes `b` such that `v` and `b` are related at `T` (let's write this `Vrel_T(v, b)`), 180 | changing `b` at index `i` to any other byte yields a `b'` such that `v` and `b'` are related (`Vrel_T(v, b')`). 181 | In other words, the byte at index `i` is entirely ignored by `Vrel_T` (the value relation for `T`), and two lists of bytes that only differ in padding bytes relate to the same value(s), if any. 182 | 183 | This definition works fine for product types (structs, tuples, arrays, ...). 184 | The desired notion of "padding byte" for enums and unions is still unclear. 185 | 186 | ### Place 187 | [place]: #place 188 | 189 | A *place* (called "lvalue" in C and "glvalue" in C++) is the result of computing a [*place expression*][place-value-expr]. 
190 | A place is basically a pointer (pointing to some location in memory, potentially carrying [provenance]), but might contain more information such as size or alignment (the details will have to be determined as the Rust Abstract Machine gets specified more precisely). 191 | A place has a type, indicating the type of [values][value] that it stores. 192 | 193 | The key operations on a place are: 194 | * Storing a [value] of the same type in it (when it is used on the left-hand side of an assignment). 195 | * Loading a [value] of the same type from it (through the place-to-value coercion). 196 | * Converting between a place (of type `T`) and a pointer value (of type `&T`, `&mut T`, `*const T` or `*mut T`) using the `&` and `*` operators. 197 | This is also the only way a place can be "stored": by converting it to a value first. 198 | 199 | ### Pointer Provenance 200 | [provenance]: #pointer-provenance 201 | 202 | The *provenance* of a pointer is used to distinguish pointers that point to the same [memory address] (i.e., pointers that, when cast to `usize`, will compare equal). 203 | Provenance is extra state that only exists in the Rust Abstract Machine; it is needed to specify program behavior but not present any more when the program runs on real hardware. 204 | In other words, pointers that only differ in their provenance can *not* be distinguished any more in the final binary (but provenance can influence how the compiler translates the program). 205 | 206 | The exact form of provenance in Rust is unclear. 207 | It is also unclear whether provenance applies to more than just pointers, i.e., one could imagine integers having provenance as well (so that pointer provenance can be preserved when pointers are cast to an integer and back). 208 | In the following, we give some examples of what provenance *could* look like.
209 | 210 | *Using provenance to track originating allocation.* 211 | For example, we have to distinguish pointers to the same location if they originated from different [allocations][allocation]. 212 | Cross-allocation pointer arithmetic [does not lead to usable pointers](https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset), so the Rust Abstract Machine *somehow* has to remember the original allocation to which a pointer pointed. 213 | It could use provenance to achieve this: 214 | 215 | ```rust 216 | // Let's assume the two allocations here have base addresses 0x100 and 0x200. 217 | // We write pointer provenance as `@N` where `N` is some kind of ID uniquely 218 | // identifying the allocation. 219 | let raw1 = Box::into_raw(Box::new(13u8)); 220 | let raw2 = Box::into_raw(Box::new(42u8)); 221 | let raw2_wrong = raw1.wrapping_add(raw2.wrapping_sub(raw1 as usize) as usize); 222 | // These pointers now have the following values: 223 | // raw1 points to address 0x100 and has provenance @1. 224 | // raw2 points to address 0x200 and has provenance @2. 225 | // raw2_wrong points to address 0x200 and has provenance @1. 226 | // In other words, raw2 and raw2_wrong have the same *address*... 227 | assert_eq!(raw2 as usize, raw2_wrong as usize); 228 | // ...but it would be UB to dereference raw2_wrong, as it has the wrong *provenance*: 229 | // it points to address 0x200, which is in allocation @2, but the pointer 230 | // has provenance @1. 231 | ``` 232 | 233 | This kind of provenance also exists in C/C++, but Rust is more permissive by (a) providing a [way to do pointer arithmetic across allocation boundaries without causing immediate UB](https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset) (though, as we have seen, the resulting pointer still cannot be used for locations outside the allocation it originates from), and (b) by allowing pointers to always be compared safely, even if their provenance differs.
234 | For some more information, see [this document proposing a more precise definition of provenance for C](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf). 235 | 236 | *Using provenance for Rust's aliasing rules.* 237 | Another example of pointer provenance is the "tag" from [Stacked Borrows][stacked-borrows]. 238 | For some more information, see [this blog post](https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html). 239 | 240 | ### Representation (relation) 241 | [representation relation]: #representation-relation 242 | 243 | A *representation* of a [value] is a list of [(abstract) bytes][abstract byte] that is used to store or "represent" that value in memory. 244 | 245 | We also sometimes speak of the *representation of a type*; this should more correctly be called the *representation relation* as it relates values of this type to lists of bytes that represent this value. 246 | The term "relation" here is used in the mathematical sense: the representation relation is a predicate that, given a value and a list of bytes, says whether this value is represented by that list of bytes (`val -> list byte -> Prop`). 247 | 248 | The relation should be functional for a fixed list of bytes (i.e., every list of bytes has at most one associated representation). 249 | It is partial in both directions: not all values have a representation (e.g. the mathematical integer `300` has no representation at type `u8`), and not all lists of bytes correspond to a value of a specific type (e.g. lists of the wrong size correspond to no value, and the list consisting of the single byte `0x10` corresponds to no value of type `bool`). 250 | For a fixed value, there can be many representations (e.g., when considering type `#[repr(C)] Pair(u8, u16)`, the second byte is a [padding byte][padding] so changing it does not affect the value represented by a list of bytes). 
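The partiality in both directions can be sketched in code. The two helper functions below are hypothetical illustrations, and they simplify heavily: plain `u8` stands in for abstract bytes (so uninitialized bytes and provenance are ignored), and `u32` stands in for mathematical integers.

```rust
/// Byte-list-to-value direction at type `bool`: returns the value that the
/// given list of bytes represents, if any.
pub fn decode_bool(bytes: &[u8]) -> Option<bool> {
    match bytes {
        [0x00] => Some(false),
        [0x01] => Some(true),
        // Wrong length, or a single byte that represents no `bool`.
        _ => None,
    }
}

/// Value-to-byte-list direction at type `u8`: returns a representation of
/// the given integer, if it has one at this type.
pub fn encode_u8(value: u32) -> Option<[u8; 1]> {
    if value <= 255 {
        Some([value as u8])
    } else {
        // E.g. 300 has no representation at type `u8`.
        None
    }
}
```

Both functions return `Option` because neither direction is total: `decode_bool(&[0x10])` and `encode_u8(300)` both yield `None`.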
251 | 252 | See the [value domain][value-domain] for an example of how values and representation relations can be made more precise. 253 | 254 | ### Soundness (of code / of a library) 255 | [soundness]: #soundness-of-code--of-a-library 256 | 257 | *Soundness* is a type system concept (actually originating from the study of logics) and means that the type system is "correct" in the sense that well-typed programs actually have the desired properties. 258 | For Rust, this means well-typed programs cannot cause [Undefined Behavior][ub]. 259 | This promise only extends to safe code however; for `unsafe` code, it is up to the programmer to uphold this contract. 260 | 261 | Accordingly, we say that a library (or an individual function) is *sound* if it is impossible for safe code to cause Undefined Behavior using its public API. 262 | Conversely, the library/function is *unsound* if safe code *can* cause Undefined Behavior. 263 | 264 | ### Undefined Behavior 265 | [ub]: #undefined-behavior 266 | 267 | *Undefined Behavior* is a concept of the contract between the Rust programmer and the compiler: 268 | The programmer promises that the code exhibits no undefined behavior. 269 | In return, the compiler promises to compile the code in a way that the final program does on the real hardware what the source program does according to the Rust Abstract Machine. 270 | If it turns out the program *does* have undefined behavior, the contract is void, and the program produced by the compiler is essentially garbage (in particular, it is not bound by any specification; the program does not even have to be well-formed executable code). 271 | 272 | In Rust, the [Nomicon](https://doc.rust-lang.org/nomicon/what-unsafe-does.html) and the [Reference](https://doc.rust-lang.org/reference/behavior-considered-undefined.html) both have a list of behavior that the language considers undefined.
273 | Rust promises that safe code cannot cause Undefined Behavior---the compiler and authors of unsafe code take the burden of this contract on themselves. 274 | For unsafe code, however, the burden is still on the programmer. 275 | 276 | Also see: [Soundness][soundness]. 277 | 278 | ### Validity and safety invariant 279 | 280 | The *validity invariant* is an invariant that all data must uphold any time it is accessed or copied in a typed manner. 281 | This invariant is known to the compiler and exploited by optimizations such as improved enum layout or eliding in-bounds checks. 282 | 283 | In terms of MIR statements, "accessed or copied" means whenever an assignment statement is executed. 284 | That statement has a type (LHS and RHS must have the same type), and the data being assigned must be valid at that type. 285 | Moreover, arguments passed to a function must be valid at the type given in the callee signature, and the return value of a function must be valid at the type given in the caller signature. 286 | OPEN QUESTION: Are there more cases where data must be valid? 287 | 288 | In terms of code, some data computed by `TERM` is valid at type `T` if and only if the following program does not have UB: 289 | ```rust,ignore 290 | fn main() { unsafe { 291 | let t: T = std::mem::transmute(TERM); 292 | } } 293 | ``` 294 | 295 | The *safety* invariant is an invariant that safe code may assume all data to uphold. 296 | This invariant is used to justify which operations safe code can perform. 297 | The safety invariant can be temporarily violated by unsafe code, but must always be upheld when interfacing with unknown safe code. 298 | It is not relevant when arguing whether some *program* has UB, but it is relevant when arguing whether some code safely encapsulates its unsafety -- in other words, it is relevant when arguing whether some *library* is [sound][soundness].
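As a rough sketch of how a library relies on a safety invariant, consider a hypothetical wrapper type `NonEmpty` (the type and its methods are assumptions made up for illustration). Any `Vec<u8>` is *valid* inside it, but only a non-empty one is *safe*, and the unsafe code in `first` depends on that.

```rust
/// Hypothetical library type. Safety invariant: the inner vector is never
/// empty. The field is private, so safe code outside this module cannot
/// break the invariant.
pub struct NonEmpty(Vec<u8>);

impl NonEmpty {
    /// The only constructor; it establishes the safety invariant.
    pub fn new(v: Vec<u8>) -> Option<NonEmpty> {
        if v.is_empty() {
            None
        } else {
            Some(NonEmpty(v))
        }
    }

    /// Sound to expose to safe code because every `NonEmpty` reachable from
    /// safe code satisfies the invariant.
    pub fn first(&self) -> u8 {
        // SAFETY: index 0 is in bounds because the vector is non-empty.
        unsafe { *self.0.get_unchecked(0) }
    }
}
```

If this library also exposed a safe function that emptied the inner vector, no single program would necessarily have UB, but the library would be unsound: safe code could then reach the unchecked access with an empty vector.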
299 | 300 | In terms of code, some data computed by `TERM` (possibly constructed from some `arguments` that can be *assumed* to satisfy the safety invariant) is safe at type `T` if and only if the following library function can be safely exposed to arbitrary (safe) code as part of the public library interface: 301 | ```rust,ignore 302 | pub fn make_something(arguments: U) -> T { unsafe { 303 | std::mem::transmute(TERM) 304 | } } 305 | ``` 306 | 307 | One example of valid-but-unsafe data is a `&str` or `String` that's not well-formed UTF-8: the compiler will not run its own optimizations that would cause any trouble here, so unsafe code may temporarily violate the invariant that strings are `UTF-8`. 308 | However, functions on `&str`/`String` may assume the string to be `UTF-8`, meaning they may cause UB if the string is *not* `UTF-8`. 309 | This means that unsafe code violating the UTF-8 invariant must not perform string operations (it may operate on the data as a byte slice though), or else it risks UB. 310 | Moreover, such unsafe code must not return a non-UTF-8 string to the "outside" of its safe abstraction boundary, because that would mean safe code could cause UB by doing `bad_function().chars().count()`. 311 | 312 | To summarize: *Data must always be valid, but it only must be safe in safe code.* 313 | For some more information, see [this blog post](https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html). 314 | 315 | ### Value 316 | [value]: #value 317 | 318 | A *value* (called "value of the expression" or "rvalue" in C and "prvalue" in C++) is what gets stored in a [place], and also the result of computing a [*value expression*][place-value-expr]. 319 | A value has a type, and it denotes the abstract mathematical concept that is represented by data in our programs. 320 | 321 | For example, a value of type `u8` is a mathematical integer in the range `0..256`.
322 | Values can be (according to their type) turned into a list of [(abstract) bytes][abstract byte], which is called a [representation][representation relation] of the value. 323 | Values are ephemeral; they arise during the computation of an instruction but are only ever persisted in memory through their representation. 324 | (This is comparable to how run-time data in a program is ephemeral and is only ever persisted in serialized form.) 325 | 326 | ### Zero-sized type / ZST 327 | 328 | Types with zero size are called zero-sized types, which is abbreviated as "ZST". 329 | This document also uses the "1-ZST" abbreviation, which stands for "one-aligned 330 | zero-sized type", to refer to zero-sized types with an alignment requirement of 1. 331 | 332 | For example, `()` is a "1-ZST" but `[u16; 0]` is not because it has an alignment 333 | requirement of 2. 334 | 335 | [stacked-borrows]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md 336 | [value-domain]: https://github.com/rust-lang/unsafe-code-guidelines/tree/master/wip/value-domain.md 337 | [place-value-expr]: https://doc.rust-lang.org/reference/expressions.html#place-expressions-and-value-expressions 338 | -------------------------------------------------------------------------------- /reference/src/introduction.md: -------------------------------------------------------------------------------- 1 | ## Rust's Unsafe Code Guidelines Reference 2 | 3 | This document is a past effort by the [UCG WG][ucg_wg] to provide a "guide" for 4 | writing unsafe code that "recommends" what kinds of things unsafe code can and 5 | cannot do, and that documents which guarantees unsafe code may rely on. It is 6 | largely abandoned right now. However, the [glossary](glossary.md) is actively 7 | maintained. 8 | 9 | Unless stated otherwise, the information in the guide is mostly a 10 | "recommendation" and still subject to change. 
11 | 12 | [ucg_wg]: https://github.com/rust-lang/unsafe-code-guidelines 13 | -------------------------------------------------------------------------------- /reference/src/layout/arrays-and-slices.md: -------------------------------------------------------------------------------- 1 | # Layout of Rust array types and slices 2 | 3 | **This page has been archived** 4 | 5 | It did not actually reflect current layout guarantees and caused frequent confusion. 6 | 7 | The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/arrays-and-slices.md). 8 | -------------------------------------------------------------------------------- /reference/src/layout/enums.md: -------------------------------------------------------------------------------- 1 | # Layout of Rust `enum` types 2 | 3 | **This page has been archived** 4 | 5 | It did not actually reflect current layout guarantees and caused frequent confusion. 6 | 7 | The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/enums.md). 8 | -------------------------------------------------------------------------------- /reference/src/layout/function-pointers.md: -------------------------------------------------------------------------------- 1 | # Representation of Function Pointers 2 | 3 | **This page has been archived** 4 | 5 | It did not actually reflect current layout guarantees and caused frequent confusion. 6 | 7 | The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/function-pointers.md). 
8 | -------------------------------------------------------------------------------- /reference/src/layout/packed-simd-vectors.md: -------------------------------------------------------------------------------- 1 | # Layout of packed SIMD vectors 2 | 3 | **This page has been archived** 4 | 5 | It did not actually reflect current layout guarantees and caused frequent confusion. 6 | 7 | The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/packed-simd-vectors.md). 8 | -------------------------------------------------------------------------------- /reference/src/layout/pointers.md: -------------------------------------------------------------------------------- 1 | # Layout of reference and pointer types 2 | 3 | **This page has been archived** 4 | 5 | It did not actually reflect current layout guarantees and caused frequent confusion. 6 | 7 | The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/pointers.md). 8 | -------------------------------------------------------------------------------- /reference/src/layout/scalars.md: -------------------------------------------------------------------------------- 1 | # Layout of scalar types 2 | 3 | **This page has been archived** 4 | 5 | It did not actually reflect current layout guarantees and caused frequent confusion. 6 | 7 | The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/scalars.md). 
8 | -------------------------------------------------------------------------------- /reference/src/layout/structs-and-tuples.md: -------------------------------------------------------------------------------- 1 | # Layout of structs and tuples 2 | 3 | **This page has been archived** 4 | 5 | It did not actually reflect current layout guarantees and caused frequent confusion. 6 | 7 | The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/structs-and-tuples.md). 8 | -------------------------------------------------------------------------------- /reference/src/layout/unions.md: -------------------------------------------------------------------------------- 1 | # Layout of unions 2 | 3 | **This page has been archived** 4 | 5 | It did not actually reflect current layout guarantees and caused frequent confusion. 6 | 7 | The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/layout/unions.md). 8 | -------------------------------------------------------------------------------- /reference/src/optimizations/return_value_optimization.md: -------------------------------------------------------------------------------- 1 | **This page has been archived** 2 | 3 | It did not actually reflect current language guarantees and caused frequent confusion. 4 | 5 | The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/optimizations/return_value_optimization.md). 
6 | -------------------------------------------------------------------------------- /reference/src/validity/function-pointers.md: -------------------------------------------------------------------------------- 1 | # Validity of function pointers 2 | 3 | **This page has been archived** 4 | 5 | It did not actually reflect current language guarantees and caused frequent confusion. 6 | 7 | The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/validity/function-pointers.md). 8 | -------------------------------------------------------------------------------- /reference/src/validity/unions.md: -------------------------------------------------------------------------------- 1 | # Validity of unions 2 | 3 | **This page has been archived** 4 | 5 | It did not actually reflect current language guarantees and caused frequent confusion. 6 | 7 | The old content can be accessed [on GitHub](https://github.com/rust-lang/unsafe-code-guidelines/blob/c138499c1de03b908dfe719a41193c84f8146883/reference/src/validity/unions.md). 8 | -------------------------------------------------------------------------------- /resources/deliberate-ub.md: -------------------------------------------------------------------------------- 1 | When a crate explicitly acknowledges that what it does is UB, but prefers keeping that code over UB-free alternatives (or there are no UB-free alternatives), that is always a concerning sign. 2 | We should evaluate whether there truly is some use-case here that is not currently achievable in well-defined Rust, and work with crate authors on providing a UB-free solution. 3 | 4 | ## Known cases of deliberate UB 5 | 6 | ### Cases related to concurrency 7 | 8 | * crossbeam's `AtomicCell` "fallback path" uses a SeqLock, which [is well-known to not be compatible with the C++ memory model](https://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf). 
9 | Specifically the problem is [this non-atomic volatile read](https://github.com/crossbeam-rs/crossbeam/blob/5d07fe43540d7f21517a51813acd9332744e90cb/crossbeam-utils/src/atomic/atomic_cell.rs#L980) which can cause data races and hence UB. 10 | This would be fine if we either (a) adopted LLVM's handling of memory races (then the problematic read would merely return `undef` instead of UB due to a data race), or (b) added [bytewise atomic memcpy](https://github.com/rust-lang/rfcs/pull/3301) and used that instead of the non-atomic volatile load. 11 | This is currently *not* UB in the LLVM IR we generate, due to LLVM's different handling of read-write races and because the code carefully uses `MaybeUninit`. 12 | * crossbeam's `AtomicCell` "fast path" uses the standard library `Atomic*` types to do atomic reads and writes of [*any type* that has the right size](https://github.com/crossbeam-rs/crossbeam/blob/master/crossbeam-utils/src/atomic/atomic_cell.rs#L928-L932). 13 | However, doing an `AtomicU32` read (returning a `u32`) on a `(u16, u8)` is unsound because the padding byte can be uninitialized. 14 | (Also, pointer provenance is lost, so `AtomicCell<*const T>` does not behave the way people expect.) 15 | To fix this we need to be able to perform atomic loads at type `MaybeUninit`. 16 | The LLVM IR we generate here contains `noundef` so this UB is not just a specification artifact.
17 | Furthermore, `compare_exchange` will compare padding bytes, which leads to UB. 18 | It is not clear how to best specify a useful `compare_exchange` that can work on padding bytes, 19 | see the [discussion here](https://github.com/rust-lang/unsafe-code-guidelines/issues/449).
20 | The alternative is to not use the "fast path" for problematic types (and fall back to the SeqLock), but that requires some way to query at `const`-time whether the type contains padding (or provenance). 21 | (Or of course one can use inline assembly, but it would be better if that was not required. This is in fact what crossbeam now does, via [atomic-maybe-uninit](https://github.com/taiki-e/atomic-maybe-uninit).) 22 | * crossbeam's deque uses [volatile accesses that really should be atomic instead](https://github.com/crossbeam-rs/crossbeam/blob/5a154def002304814d50f3c7658bd30eb46b2fad/crossbeam-deque/src/deque.rs#L70-L88). 23 | They cannot use atomic accesses as those are not possible for arbitrary `T`. 24 | This would be resolved by bytewise atomic memcpy. 25 | 26 | ### Cases related to aliasing 27 | 28 | * `yoke` and similar crates relying on "stable deref" properties cause various forms of aliasing trouble (such as [having `Box` that alias with things](https://github.com/unicode-org/icu4x/issues/2095), or [having references in function arguments that get deallocated while the function runs](https://github.com/unicode-org/icu4x/issues/3696)). 29 | This also violates the LLVM assumptions that come with `noalias` and `dereferenceable`. 30 | This could be fixed by [`MaybeDangling`](https://github.com/rust-lang/rfcs/pull/3336). 31 | * The entire `async fn` ecosystem and every hand-implemented self-referential generator or future is unsound since the self-reference aliases the `&mut` reference to the full generator/future. 32 | This is currently hackfixed by making `Unpin` meaningful for UB; a proper solution would be to add something like [`UnsafeAliased`](https://github.com/rust-lang/rfcs/pull/3467).
33 | * Stacked Borrows forbids a bunch of things that might be considered too restrictive (and that go well beyond LLVM `noalias`): 34 | strict subobject provenance [rules out the `&Header` pattern](https://github.com/rust-lang/unsafe-code-guidelines/issues/256) and also affects [raw pointers derived from references](https://github.com/rust-lang/unsafe-code-guidelines/issues/134); 35 | eager assertion of uniqueness makes [even read-only functions such as `as_mut_ptr` dangerous when they take `&mut`](https://github.com/rust-lang/unsafe-code-guidelines/issues/133); 36 | `&UnsafeCell` surprisingly [requires read-write memory even when it is never written](https://github.com/rust-lang/unsafe-code-guidelines/issues/303). 37 | There is a bunch of code out there that violates these rules one way or another. 38 | All of these are resolved by [Tree Borrows](https://perso.crans.org/vanille/treebor/), though [some subtleties around `as_mut_ptr` do remain](https://github.com/rust-lang/unsafe-code-guidelines/issues/450). 39 | 40 | ### Other cases 41 | 42 | * `gxhash` wants to do a vector-sized load that may go out-of-bounds, and didn't find a better solution than causing UB with an OOB load and then masking off the extra bytes. 43 | See [here](https://github.com/ogxd/gxhash/issues/82) for some discussion and [here](https://github.com/ogxd/gxhash/blob/9eb19b021ff94a7b37beb5f479880d07e029b933/src/gxhash/platform/mod.rs#L18) for the relevant code. 44 | The same [also happens in `compiler-builtins`](https://github.com/rust-lang/compiler-builtins/issues/559). 45 | 46 | ## Former cases of deliberate UB that have at least a work-in-progress solution to them 47 | 48 | * Various `offset_of` implementations caused UB by using `mem::uninitialized()`, or they used `&(*base).field` or `addr_of!((*base).field)` to project a dummy pointer to the field which is UB due to out-of-bounds pointer arithmetic.
49 | The `memoffset` crate has a sound implementation that however causes stack allocations which the compiler must optimize away. 50 | This will be fixed properly by the native [`offset_of!` macro](https://github.com/rust-lang/rfcs/pull/3308), which is [currently in nightly](https://github.com/rust-lang/rust/issues/106655). 51 | * It used to be common to unwind out of `extern "C"` functions which is UB, see [this discussion](https://internals.rust-lang.org/t/unwinding-through-ffi-after-rust-1-33/9521). 52 | This is fixed by `extern "C-unwind"`, which has been stable since Rust 1.71. 53 | -------------------------------------------------------------------------------- /resources/llvm-assumptions.md: -------------------------------------------------------------------------------- 1 | Some of the things we want people to do with Rust can currently not be expressed in LLVM in a way that is fully backed by the LLVM LangRef. 2 | Let's collect a list of those cases here. 3 | 4 | ## List of LLVM assumptions not backed by the spec 5 | 6 | - To implement `ptr.addr()`, we assume that a pointer-to-int transmute yields the address. 7 | The LangRef is quiet about this (as it is about almost everything related to provenance). 8 | Alive says that this yields poison. 9 | - To implement the desired semantics for `MaybeUninit<$int>` we need a type of arbitrary size that can hold arbitrary data -- including provenance. 10 | LLVM currently has no such type, the only type that is fully guaranteed to support provenance is `ptr` and that has a fixed size. 11 | [LLVM issue](https://github.com/llvm/llvm-project/issues/142141) 12 | - This one is not about current Rust but about possible future extensions: 13 | when LLVM returns `poison` for some operation, we can *not* say that this corresponds to `uninit` in Rust. We *must* declare this immediate UB.
14 | The reason for this is that LLVM does not really support `poison` being stored in memory; Rust's `uninit` can therefore only correspond to LLVM's `undef`. 15 | -------------------------------------------------------------------------------- /triagebot.toml: -------------------------------------------------------------------------------- 1 | [relabel] 2 | allow-unauthenticated = [ 3 | "A-*", 4 | "C-*", 5 | "S-*", 6 | "T-*", 7 | ] 8 | 9 | # Gives us the commands 'ready', 'author', 'blocked' 10 | [shortcut] 11 | -------------------------------------------------------------------------------- /wip/memory-interface.md: -------------------------------------------------------------------------------- 1 | This file [moved](https://github.com/RalfJung/minirust/blob/master/spec/mem/interface.md) 2 | -------------------------------------------------------------------------------- /wip/stacked-borrows.md: -------------------------------------------------------------------------------- 1 | # Stacked Borrows 2 | 3 | **Note:** This document is not normative nor endorsed by the UCG WG. It is maintained by @RalfJung to reflect what is currently implemented in [Miri]. 4 | 5 | This is not a guide! 6 | It is more of a reference. 7 | For more background, see the [paper on Stacked Borrows](https://plv.mpi-sws.org/rustbelt/stacked-borrows/). 8 | 9 | Stacked Borrows is also the subject of the following blog-posts: 10 | 11 | * [Stacked Borrows 0.1](https://www.ralfj.de/blog/2018/08/07/stacked-borrows.html) is the initial idea of what Stacked Borrows might look like before anything got implemented. This post is interesting for some of the historical context it gives, but is largely superseded by the next post. 12 | * [Stacked Borrows 1.0](https://www.ralfj.de/blog/2018/11/16/stacked-borrows-implementation.html) is the first version that got implemented. This post is a self-contained, improved introduction to Stacked Borrows. 
13 | * [Stacked Borrows 1.1](https://www.ralfj.de/blog/2018/12/26/stacked-borrows-barriers.html) extends Stacked Borrows 1 with partial support for two-phase borrows and explains the idea of "barriers". 14 | * We took some notes when [discussing Stacked Borrows 1.1 at the 2019 Rust All-Hands][all-hands]. 15 | * [Stacked Borrows 2.0](https://www.ralfj.de/blog/2019/04/30/stacked-borrows-2.html) is a re-design of Stacked Borrows 1 that maintains the original core ideas, but changes the mechanism to support more precise tracking of shared references. 16 | * [Stacked Borrows 2.1](https://www.ralfj.de/blog/2019/05/21/stacked-borrows-2.1.html) slightly tweaks the rules for read and write accesses and describes a high-level way of thinking about the new shape of the "stack" in Stacked Borrows 2. 17 | 18 | Changes from the latest post (2.1) to the paper: 19 | 20 | * Retags are "shallow" instead of recursively looking for references inside compound types. 21 | * Reborrowing of a shared reference, when searching for `UnsafeCell`, no longer reads enum discriminants. It treats enums like unions now. 22 | 23 | Changes since publication of the paper: 24 | 25 | * Items with `SharedReadWrite` permission are not protected even with `FnEntry` retagging. 26 | * Removal of `Untagged`. 27 | * Addition of "weak protectors" to justify `noalias` on `Box` parameters. 28 | 29 | [Miri]: https://github.com/solson/miri/ 30 | [all-hands]: https://paper.dropbox.com/doc/Topic-Stacked-borrows--AXAkoFfUGViWL_PaSryqKK~hAg-2q57v4UM7cIkxCq9PQc22 31 | 32 | ## Extra state 33 | 34 | Stacked Borrows adds some extra state to the Rust abstract machine. 35 | Every pointer value has a *tag* (in addition to the location in memory that the pointer points to), and every memory location carries a *stack* (in addition to the byte of data stored at that location). 36 | Moreover, there is a per-call-frame `CallId` as well as some global tracking state.
37 | 38 | ```rust 39 | /// PtrId: uniquely identifying a pointer. 40 | // `nat` is the type of mathematical natural numbers, meaning we don't want to think about overflow. 41 | // NOTE: Miri just uses `NonZeroU64` which, realistically, will not overflow because we only ever increment by 1. 42 | type PtrId = nat; 43 | /// For historic reasons we often call pointer IDs "tags". 44 | type Tag = PtrId; 45 | 46 | /// CallId: uniquely identifying a stack frame. 47 | type CallId = nat; 48 | 49 | /// Indicates which permission is granted (by this item to some pointers) 50 | pub enum Permission { 51 | /// Grants unique mutable access. 52 | Unique, 53 | /// Grants shared mutable access. 54 | SharedReadWrite, 55 | /// Grants shared read-only access. 56 | SharedReadOnly, 57 | /// Grants no access, but separates two groups of SharedReadWrite so they are not 58 | /// all considered mutually compatible. 59 | Disabled, 60 | } 61 | 62 | /// The flavor of the protector. 63 | enum ProtectorKind { 64 | /// Protected against aliasing violations from other pointers. 65 | /// 66 | /// Items protected like this cause UB when they are invalidated, *but* the pointer itself may 67 | /// still be used to issue a deallocation. 68 | /// 69 | /// This is required for LLVM IR pointers that are `noalias` but *not* `dereferenceable`. 70 | WeakProtector, 71 | 72 | /// Protected against any kind of invalidation. 73 | /// 74 | /// Items protected like this cause UB when they are invalidated or the memory is deallocated. 75 | /// This is strictly stronger protection than `WeakProtector`. 76 | /// 77 | /// This is required for LLVM IR pointers that are `dereferenceable` (and also allows `noalias`). 78 | StrongProtector, 79 | } 80 | 81 | /// An item on the per-location stack, controlling which pointers may access this location and how. 82 | pub struct Item { 83 | /// The permission this item grants. 84 | perm: Permission, 85 | /// The pointers the permission is granted to. 
86 | tag: Tag, 87 | /// An optional protector, ensuring the item cannot get popped until `CallId` is over. 88 | protector: Option<(ProtectorKind, CallId)>, 89 | } 90 | 91 | /// Per-location stack of borrow items. 92 | pub struct Stack { 93 | /// Used *mostly* as a stack; never empty. 94 | /// Invariants: 95 | /// * Above a `SharedReadOnly` there can only be more `SharedReadOnly`. 96 | /// * No tag occurs in the stack more than once. 97 | borrows: Vec<Item>, 98 | } 99 | 100 | /// Extra per-call-frame state: the ID of this function call. 101 | pub struct FrameExtra { 102 | id: CallId, 103 | } 104 | 105 | /// Extra global state: the next `PtrId`, as well as the next `CallId`. 106 | /// Both are just monotonically increasing counters, ensuring they are unique. 107 | /// Also see the code for `new_ptr` and `new_call` below. 108 | pub struct Tracking { 109 | next_ptr_id: PtrId, 110 | next_call_id: CallId, 111 | } 112 | ``` 113 | 114 | The tag and the stack exist separately, i.e., when a pointer is stored in memory, then we both have a tag stored as part of this pointer value (remember, [bytes are more than `u8`](https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html)), and every byte occupied by the pointer has a stack regulating access to this location. 115 | Also, these two do not interact, i.e., when loading a pointer from memory, we just load the tag that was stored as part of this pointer. 116 | The stack of a location, and the tag of a pointer stored at some location, do not have any effect on each other. 117 | 118 | ## Retag statement 119 | 120 | Stacked Borrows introduces a new operation (a new MIR statement) on the Rust abstract machine.
121 | *Retagging* operates on a place (Rust's equivalent of a C lvalue; see [the glossary](../reference/src/glossary.md)), and it also carries a flag indicating the kind of retag that is being performed: 122 | 123 | ```rust 124 | pub enum RetagKind { 125 | /// The initial retag when entering a function 126 | FnEntry, 127 | /// Retag preparing for a two-phase borrow 128 | TwoPhase, 129 | /// Retagging raw pointers 130 | Raw, 131 | /// A "normal" retag 132 | Default, 133 | } 134 | ``` 135 | 136 | `Retag` is inserted into the MIR for the following situations: 137 | 138 | * A retag happens after every assignment MIR statement where the assigned type may be of reference or box type. 139 | This is usually a `Default` retag. However, if the RHS of this assignment is a `Ref` which allows two-phase borrows, then this is a `TwoPhase` retag. 140 | 141 | Currently, if the LHS of the assignment involves a `Deref`, no `Retag` is inserted. 142 | That's just a limitation of the current implementation: after executing this assignment, evaluating the place (the LHS) again could yield a different location in memory, which means we would retag the wrong thing. 143 | Proper retagging here requires either a copy through a temporary, or making retagging an integral part of the semantics of assignment. 144 | 145 | * A `Raw` retag happens after every assignment where the RHS is a cast from a reference to a raw pointer. 146 | 147 | * A `FnEntry` retag happens in the first basic block of every function, retagging each argument. 148 | 149 | * A `Default` retag happens on the return value of every function that gets called (i.e., this is the first statement in the basic block that the call will return to). 150 | 151 | * The automatically generated drop shims (generated as the body of `ptr::real_drop_in_place`) perform a `Raw` retag of their argument because they use it as a raw pointer.
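To make the insertion rules in the list above more concrete, the following sketch annotates an ordinary function with comments marking where `Retag` statements would be expected. The function `retag_sites` is a made-up example, and the comments are an informal reading of the rules, not actual MIR output.

```rust
/// Hypothetical example function; the comments mark where the rules above
/// would place `Retag` statements in the corresponding MIR.
fn retag_sites(x: &mut i32, v: &mut Vec<usize>) -> i32 {
    // Function entry: `FnEntry` retag of the reference arguments `x` and `v`.

    // Assignment of reference type: `Default` retag of `y`.
    let y = &mut *x;

    // Cast from a reference to a raw pointer: `Raw` retag of `raw`.
    let raw = y as *mut i32;

    // The implicit `&mut *v` receiver allows two-phase borrows, so the
    // temporary holding it gets a `TwoPhase` retag. `v.len()` may still
    // read `v` before the mutable borrow is activated.
    v.push(v.len());

    // `as_mut_slice` returns a reference: `Default` retag of the return
    // value in the block that the call returns to.
    let s = v.as_mut_slice();
    s.reverse();

    // SAFETY: `raw` was derived from `y`, which points to a live `i32`,
    // and neither `x` nor `y` has been used since the cast.
    unsafe { *raw += 1 };
    *x
}
```

Calling `retag_sites(&mut n, &mut v)` with `n == 41` and `v == vec![7]` returns `42` and leaves `v == vec![1, 7]`; the point of the example is the position of the retags, not the computation itself.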
152 | 153 | ## Operational semantics 154 | 155 | ### Generating IDs 156 | 157 | Whenever we need to generate a new pointer or function call ID, that means we effectively call one of the following methods: 158 | 159 | ```rust 160 | impl Tracking { 161 | fn new_ptr(&mut self) -> PtrId { 162 | let val = self.next_ptr_id; 163 | self.next_ptr_id += 1; 164 | val 165 | } 166 | 167 | fn new_call(&mut self) -> CallId { 168 | let id = self.next_call_id; 169 | self.next_call_id += 1; 170 | id 171 | } 172 | } 173 | ``` 174 | 175 | These methods will never return the same value twice. 176 | 177 | ### Tracking function calls 178 | 179 | To attach metadata to a particular function call, we assign a fresh ID to every stack frame (so this distinguishes multiple calls to the same function). 180 | In other words, the per-stack-frame `CallId` is initialized by `Tracking::new_call`. 181 | 182 | We say that a `CallId` is *active* if the call stack contains a stack frame with that ID. 183 | 184 | **Note**: Miri uses a slightly more complex system to track the set of active `CallId`; that is just an optimization to avoid having to scan the call stack all the time. 185 | 186 | ### Preliminaries for items 187 | 188 | For brevity, we will write `(tag: perm)` to represent `Item { tag, perm, protector: None }`, and `(tag: perm; kind, call)` to represent `Item { tag, perm, protector: Some((kind, call)) }`. 189 | 190 | The following defines whether a permission grants a particular kind of memory access to a pointer with the right tag: 191 | `Unique` and `SharedReadWrite` grant all accesses, `SharedReadOnly` grants only read access. 192 | 193 | ```rust 194 | pub enum AccessKind { 195 | Read, 196 | Write, 197 | } 198 | 199 | /// This defines for a given permission, whether it permits the given kind of access. 200 | fn grants(self: Permission, access: AccessKind) -> bool { 201 | // Disabled grants nothing. Otherwise, all items grant read access, and except for SharedReadOnly they grant write access.
202 | self != Permission::Disabled && (access == AccessKind::Read || self != Permission::SharedReadOnly) 203 | } 204 | ``` 205 | 206 | Based on this, we define the *granting* item in a stack (for a given tag and access) to be the topmost item that grants the given access to this tag: 207 | 208 | ```rust 209 | /// Find the item granting the given kind of access to the given tag, and return where that item is in the stack. 210 | fn find_granting(self: &Stack, access: AccessKind, tag: Tag) -> Option<usize> { 211 | self.borrows.iter() 212 | .enumerate() // we also need to know *where* in the stack 213 | .rev() // search top-to-bottom 214 | // Return the index of the first item that grants access. 215 | .find_map(|(idx, item)| 216 | if item.perm.grants(access) && tag == item.tag { 217 | Some(idx) 218 | } else { 219 | None 220 | } 221 | ) 222 | } 223 | ``` 224 | 225 | In general, the structure of the stack looks as follows: 226 | On the top, we might have a bunch of `SharedReadOnly` items. Below that, we have "blocks" consisting of either a single `Unique` item, or a bunch of consecutive `SharedReadWrite`. 227 | `Disabled` items serve to separate two blocks of `SharedReadWrite` that would otherwise be considered one block. 228 | Using any item within a block is equivalent to using any other item in that same block. 229 | 230 | ### Allocating memory 231 | 232 | When allocating memory, we have to initialize the `Stack` associated with the new locations, and we have to choose a `Tag` for the initial pointer to this memory. 233 | 234 | - Stack memory is handled by an environment (which is part of the information carried in a stack frame of the Rust abstract machine) that maps each local variable to a place. 235 | A place is a pointer together with some other data that is not relevant here -- the key point is that a place, just like every other pointer, carries a tag.
236 | When the local variable becomes live and its backing memory gets allocated, we generate a new pointer ID `id` by calling `Tracking::new_ptr` and use `id` as tag for the place of this local variable. 237 | We also initialize the stack of all the memory locations in this new memory allocation with `Stack { borrows: vec![(id: Unique)] }`. 238 | - For heap allocations, we pick a fresh `id` at allocation time. The stack of each freshly allocated memory location is `Stack { borrows: vec![(id: SharedReadWrite)] }`, and the initial pointer to that memory has tag `id`. 239 | - For global allocations (`static`, environment and program argument data, ...), we pick a fresh `id` associated with the global, and each time a pointer to the global is created, it gets tagged `id`. 240 | The stacks in that memory are initialized with `Stack { borrows: vec![(id: SharedReadWrite)] }`. 241 | 242 | ### Accessing memory 243 | 244 | On every memory access (reads and writes -- see below for deallocation), 245 | we perform the following extra operation for every location that gets accessed (i.e., for a 4-byte access, this happens for each of the 4 bytes): 246 | 247 | 1. Find the granting item. If there is none, this is UB. 248 | 2. Check if this is a read access or a write access. 249 | - For write accesses, pop all *blocks* above the one containing the granting item. That is, remove all items above the granting one, except if the granting item is a `SharedReadWrite` in which case the consecutive `SharedReadWrite` above it are kept (but everything beyond is popped). If any of the popped items is protected (weakly or strongly) with a `CallId` of an active call, we have UB. 250 | - For read accesses, disable all `Unique` items above the granting one: change their permission to `Disabled`. This means they cannot be used any more. We do not remove them from the stack to avoid merging two blocks of `SharedReadWrite`. 
If any disabled item is protected (weakly or strongly) with a `CallId` of an active call, we have UB. 251 | 252 | ### Reborrowing 253 | 254 | Adding new permissions to the stack happens by reborrowing pointers. 255 | 256 | **Granting a pointer permission to a location.** 257 | To grant new permissions to a location, we need a parent tag (the tag of the pointer from which the new pointer is derived), and an `Item` for the newly created pointer that should be added to the stack (this indicates both which pointer is granted access and what the permission is). 258 | As a Rust signature, this would be: 259 | ```rust 260 | fn grant( 261 | self: &mut Stacks, 262 | derived_from: Tag, 263 | new: Item, 264 | ) 265 | ``` 266 | We proceed as follows: 267 | 268 | 1. We consider this operation as corresponding to a write access if `new.perm.grants(AccessKind::Write)`, and to a read access otherwise. 269 | 2. Find the granting item for this access and the parent tag. If there is none, this is UB. 270 | 3. Check if we are adding a `SharedReadWrite`. 271 | - If yes, add the new item on top of the current block. 272 | - If no, perform the actions of the corresponding access. 273 | Then push the new item to the top of the stack. 274 | 275 | **Reborrowing a pointer.** 276 | To reborrow a pointer, we are given: 277 | - a (typed) place, i.e., a location in memory, a tag and the type of the data we expect there (from which we can compute the size); 278 | - which kind of reference/pointer this is (`Unique`, `Shared` or a raw pointer which might be mutable or not); 279 | - a `new_tag: Tag` for the reborrowed pointer; 280 | - whether this reborrow needs to be protected, and if yes, how (weak or strong protection). 281 | 282 | The type of the place and the kind of reference/pointer together give the full type of the reference/pointer (or as much of it as we need). 283 | As a Rust signature, this would be: 284 | ```rust 285 | pub enum RefKind { 286 | /// `&mut` and `Box`.
287 | Unique { two_phase: bool }, 288 | /// `&` with or without interior mutability. 289 | Shared, 290 | /// `*mut`/`*const` (raw pointers). 291 | Raw { mutable: bool }, 292 | } 293 | 294 | fn reborrow( 295 | self: &mut MiriInterpContext, 296 | place: MPlaceTy, 297 | kind: RefKind, 298 | new_tag: Tag, 299 | protect: Option<ProtectorKind>, 300 | ) 301 | ``` 302 | 303 | We will grant `new_tag` permission for all the locations covered by this place, by calling `grant` for each location. 304 | The parent tag (`derived_from`) is given by the place. 305 | The interesting question is which permission to use for the new item: 306 | - For non-two-phase `Unique`, the permission is `Unique`. 307 | - For mutable raw pointers and two-phase `Unique`, the permission is `SharedReadWrite`. 308 | - For `Shared` and immutable raw pointers, the permission is different for locations inside of and outside of `UnsafeCell`. 309 | Inside `UnsafeCell`, it is `SharedReadWrite`; outside it is `SharedReadOnly`. 310 | - The `UnsafeCell` detection is entirely static: it recurses through structs, 311 | tuples and the like, but when hitting an `enum` or `union` or so, it treats 312 | the entire field as an `UnsafeCell` unless its type is frozen. This avoids 313 | hard-to-analyze recursive behavior caused by Stacked Borrows itself doing 314 | memory accesses that are subject to Stacked Borrows rules. 315 | - For immutable raw pointers, the rules are the same as for `Shared`. 316 | 317 | If the reborrow is protected and we are not inside an `UnsafeCell` behind a `Shared` or an immutable raw pointer, 318 | the new item will have its protector set to the `CallId` of the current function call (i.e., of the topmost frame in the call stack). 319 | Otherwise the new item will not have a protector.
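The access rule and the three `grant` steps described earlier can be tried out on the stack of a single location. The following is only a simplified sketch under several assumptions: protectors are left out entirely, tags are plain `u64` values, undefined behavior is reported as an `Err`, and the helper `block_end` is a name introduced here for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Permission { Unique, SharedReadWrite, SharedReadOnly, Disabled }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AccessKind { Read, Write }

/// Simplification: tags are plain integers and items carry no protector.
type Tag = u64;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Item { perm: Permission, tag: Tag }

impl Permission {
    fn grants(self, access: AccessKind) -> bool {
        self != Permission::Disabled
            && (access == AccessKind::Read || self != Permission::SharedReadOnly)
    }
}

struct Stack { borrows: Vec<Item> }

impl Stack {
    /// Index of the topmost item granting `access` to `tag`.
    fn find_granting(&self, access: AccessKind, tag: Tag) -> Option<usize> {
        self.borrows.iter().rposition(|item| item.perm.grants(access) && item.tag == tag)
    }

    /// Hypothetical helper: one past the last index of the block containing `idx`.
    fn block_end(&self, idx: usize) -> usize {
        let mut end = idx + 1;
        if self.borrows[idx].perm == Permission::SharedReadWrite {
            // Consecutive `SharedReadWrite` items form a single block.
            while end < self.borrows.len()
                && self.borrows[end].perm == Permission::SharedReadWrite
            {
                end += 1;
            }
        }
        end
    }

    /// A read or write through a pointer with the given tag; `Err` stands for UB.
    fn access(&mut self, access: AccessKind, tag: Tag) -> Result<(), &'static str> {
        let idx = self.find_granting(access, tag).ok_or("no granting item")?;
        match access {
            // Writes pop all blocks above the granting item's block.
            AccessKind::Write => {
                let end = self.block_end(idx);
                self.borrows.truncate(end);
            }
            // Reads disable every `Unique` above the granting item.
            AccessKind::Read => {
                for item in &mut self.borrows[idx + 1..] {
                    if item.perm == Permission::Unique {
                        item.perm = Permission::Disabled;
                    }
                }
            }
        }
        Ok(())
    }

    /// Add `new` to the stack, derived from the pointer tagged `derived_from`.
    fn grant(&mut self, derived_from: Tag, new: Item) -> Result<(), &'static str> {
        let access = if new.perm.grants(AccessKind::Write) {
            AccessKind::Write
        } else {
            AccessKind::Read
        };
        let idx = self.find_granting(access, derived_from).ok_or("no granting item")?;
        if new.perm == Permission::SharedReadWrite {
            // Goes on top of the granting item's block; nothing is popped.
            let end = self.block_end(idx);
            self.borrows.insert(end, new);
        } else {
            self.access(access, derived_from)?;
            self.borrows.push(new);
        }
        Ok(())
    }
}
```

For instance, starting from `[(0: Unique)]`, granting `(1: Unique)` from tag 0 and then `(2: SharedReadWrite)` from tag 1 yields a three-item stack. A read through tag 0 then turns item 1 into `Disabled`, while tag 2 can still be used for writes; a later write through tag 0 pops everything above the bottom item.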
320 | 321 | So, basically, for every location, we call `grant` like this: 322 | ```rust 323 | let (perm, protect) = match ref_kind { 324 | RefKind::Unique { two_phase: false } => 325 | (Permission::Unique, protect), 326 | RefKind::Raw { mutable: true } | 327 | RefKind::Unique { two_phase: true } => 328 | (Permission::SharedReadWrite, protect), 329 | RefKind::Raw { mutable: false } | 330 | RefKind::Shared => 331 | if inside_unsafe_cell { (Permission::SharedReadWrite, /* do not protect */ None) } 332 | else { (Permission::SharedReadOnly, protect) } 333 | }; 334 | let protector = protect.map(|kind| (kind, current_call_id())); 335 | 336 | location.stack.grant( 337 | place.tag, 338 | Item { tag: new_tag, perm, protector } 339 | ); 340 | ``` 341 | 342 | ### Retagging 343 | 344 | When executing `Retag(kind, place)`, we check if `place` holds a reference (`&[mut] _`) or box (`Box<_>`), and if `kind == Raw` we also check each raw pointer (`*[const,mut] _`). 345 | For those we perform the following steps: 346 | 347 | 1. We compute a fresh tag: `Tracking::new_ptr()`. 348 | 2. We determine if and how we will want to protect the items we are going to generate: 349 | If `kind == FnEntry`, then a protector will be added; for references, we use a `StrongProtector`, for box a `WeakProtector`. 350 | (This means for both of them there is UB if the pointer gets invalidated while the call is active; and for references, additionally there is UB if the memory the pointer points to gets deallocated in any way -- even if the pointer itself is used for that deallocation.) 351 | For other `kind`, no protector is added. 352 | 3. We perform reborrowing of the memory this pointer points to with the new tag and indicating whether we want protection, treating boxes as `RefKind::Unique { two_phase: false }`. 353 | 354 | We do not recurse into fields of structs or other compound types, only "bare" references/... get retagged.
355 | 356 | **Note**: Miri offers a flag, `-Zmiri-retag-fields`, that changes this behavior to also recurse into compound types to search for references to retag. 357 | We never recurse through a pointer indirection. 358 | 359 | ### Deallocating memory 360 | 361 | Memory deallocation first acts like a write access through the pointer used for deallocation. 362 | After that is done, we additionally check all *strong* protectors remaining in the stack: if any of them is still active, we have undefined behavior. 363 | (Weak protectors do not matter here.) 364 | 365 | ## Adjustments to libstd 366 | 367 | libstd needed/needs some patches to comply with this model. These provide a good opportunity to review if we are okay with the requirements that Stacked Borrows places onto unsafe code. For an up-to-date list of **"violations of Stacked Borrows"**, please refer to [this list maintained in the Miri repo](https://github.com/rust-lang/miri#bugs-found-by-miri). 368 | 369 | * [`VecDeque` creating overlapping mutable references](https://github.com/rust-lang/rust/pull/56161) 370 | * [Futures turning a shared reference into a mutable one](https://github.com/rust-lang/rust/pull/56319) 371 | * [`str` turning a shared reference into a mutable one](https://github.com/rust-lang/rust/pull/58200) 372 | * [`BTreeMap` creating mutable references that overlap with shared references](https://github.com/rust-lang/rust/pull/58431) 373 | * [`LinkedList` creating overlapping mutable references](https://github.com/rust-lang/rust/pull/60072) 374 | * [`VecDeque` invalidates a protected shared reference](https://github.com/rust-lang/rust/issues/60076) 375 | 376 | ## Biggest conceptual issues 377 | 378 | The two biggest conceptual issues with this model are the following: 379 | 380 | - Raw pointer casts generate fresh tags. 381 | This is a problem because it means we need to detect the *transitions* from references to raw pointers, which is not always easy. 
382 | (In contrast, for all other retags we can just retag whenever we see a reference, no matter where it comes from.) 383 | It is also frequently surprising to programmers, e.g. when `addr_of_mut!(local)` is invalidated by direct writes to `local`. 384 | Finally it leads to the raw pointer type at the moment of transition being significant, which again defies the usual intuition and general goal of raw pointers that their type is not semantically relevant. 385 | - On reads we do not follow a proper stack discipline. 386 | Instead, we just disable all `Unique` above the item that granted the read access. 387 | This is obviously ugly, but more importantly it means that the first issue cannot be easily fixed: 388 | if raw pointer casts just retained the original tag, then a raw pointer derived from `&mut` would become invalidated when the `&mut` becomes invalidated, and that just breaks way too much code. 389 | (Currently, the raw pointer instead becomes a `SharedReadWrite` on top of the `Unique`, so the `Unique` can be invalidated while the raw pointer remains usable.) 390 | 391 | The best known alternative for the second point is to go the Tree Borrows route of *freezing* all `Unique` (and their children) above a read-granting item. 392 | This basically means we would be popping such `Unique` (and everything above them) *only for writes* but not for reads---much nicer than the current situation. 393 | However, this breaks *tons* of code that looks like this: 394 | ```rust 395 | ptr::copy_nonoverlapping(src.as_ptr(), dest.as_mut_ptr(), dest.len()); 396 | ``` 397 | Here `dest` is a slice. 398 | The issue is that calling `dest.len()` *after* `dest.as_mut_ptr()` does a shared-read-only reborrow of `dest`, which freezes the raw pointer returned by `as_mut_ptr` and thus makes later writes to it UB. 399 | 400 | It's not clear how this could be fixed without going all the way to [trees](https://perso.crans.org/vanille/treebor/). 
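Independently of how the model itself might be changed, code following the pattern above can stay clear of the problem by reading the length before creating the raw pointer, so that no shared reborrow of `dest` happens between `as_mut_ptr` and the write. A minimal sketch, where `copy_into` is a hypothetical helper written only for this illustration:

```rust
/// Hypothetical helper: copies `src` into `dest`, reading `dest.len()`
/// *before* the raw pointer to `dest` is created.
fn copy_into(src: &[u8], dest: &mut [u8]) {
    assert_eq!(src.len(), dest.len());
    // The shared reborrow of `dest` for `len()` happens first ...
    let len = dest.len();
    // ... so nothing reborrows `dest` between creating the raw pointer
    // and writing through it.
    let dest_ptr = dest.as_mut_ptr();
    // SAFETY: both pointers are valid for `len` bytes, and a `&[u8]`
    // cannot overlap a live `&mut [u8]`.
    unsafe { std::ptr::copy_nonoverlapping(src.as_ptr(), dest_ptr, len) };
}
```

This only reorders the operations; it does not help with existing code that already uses the original argument order.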
401 | -------------------------------------------------------------------------------- /wip/value-domain.md: -------------------------------------------------------------------------------- 1 | This file [moved](https://github.com/RalfJung/minirust/blob/master/spec/lang/values.md). 2 | --------------------------------------------------------------------------------