├── CODE_OF_CONDUCT.md ├── LICENSE ├── ORG_CODE_OF_CONDUCT.md ├── README.md ├── accepted ├── async-imports.md ├── benchmark-infrastructure.md ├── benchmark-suite-milestones.dot ├── benchmark-suite-milestones.png ├── benchmark-suite.md ├── cfi-improvements-with-pauth-and-bti.md ├── cranelift-backend-transition.md ├── cranelift-dynamic-vector.md ├── cranelift-egraph.md ├── cranelift-isel-isle-peepmatic.md ├── cranelift-roadmap-2022.md ├── cranelift-roadmap-2023.md ├── exception-handling.md ├── isle-extended-patterns.md ├── new-api.md ├── pulley.md ├── remove-old-cranelift-backend.md ├── rfc-process.md ├── shared-host-functions.md ├── tail-calls.md ├── vulnerability-response-runbook.md ├── wasm-gc-milestones.dot ├── wasm-gc-milestones.png ├── wasm-gc-type-hierarchy.png ├── wasm-gc.md ├── wasmtime-baseline-compilation.md ├── wasmtime-debugging.md ├── wasmtime-instance-allocator.md ├── wasmtime-lts.md ├── wasmtime-one-dot-oh.md └── what-is-considered-a-security-bug.md ├── template-draft.md └── template-full.md /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Contributor Covenant Code of Conduct 2 | 3 | *Note*: this Code of Conduct pertains to individuals' behavior. Please also see the [Organizational Code of Conduct][OCoC]. 4 | 5 | ## Our Pledge 6 | 7 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. 
8 | 9 | ## Our Standards 10 | 11 | Examples of behavior that contributes to creating a positive environment include: 12 | 13 | * Using welcoming and inclusive language 14 | * Being respectful of differing viewpoints and experiences 15 | * Gracefully accepting constructive criticism 16 | * Focusing on what is best for the community 17 | * Showing empathy towards other community members 18 | 19 | Examples of unacceptable behavior by participants include: 20 | 21 | * The use of sexualized language or imagery and unwelcome sexual attention or advances 22 | * Trolling, insulting/derogatory comments, and personal or political attacks 23 | * Public or private harassment 24 | * Publishing others' private information, such as a physical or electronic address, without explicit permission 25 | * Other conduct which could reasonably be considered inappropriate in a professional setting 26 | 27 | ## Our Responsibilities 28 | 29 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. 30 | 31 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. 32 | 33 | ## Scope 34 | 35 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. 
36 | 37 | ## Enforcement 38 | 39 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the Bytecode Alliance CoC team at [report@bytecodealliance.org](mailto:report@bytecodealliance.org). The CoC team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The CoC team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. 40 | 41 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the Bytecode Alliance's leadership. 42 | 43 | ## Attribution 44 | 45 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version] 46 | 47 | [OCoC]: https://github.com/bytecodealliance/wasmtime/blob/main/ORG_CODE_OF_CONDUCT.md 48 | [homepage]: https://www.contributor-covenant.org 49 | [version]: https://www.contributor-covenant.org/version/1/4/ 50 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. 
For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /ORG_CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # Bytecode Alliance Organizational Code of Conduct (OCoC) 2 | 3 | *Note*: this Code of Conduct pertains to organizations' behavior. Please also see the [Individual Code of Conduct](CODE_OF_CONDUCT.md). 4 | 5 | ## Preamble 6 | 7 | The Bytecode Alliance (BA) welcomes involvement from organizations, 8 | including commercial organizations. This document is an 9 | *organizational* code of conduct, intended particularly to provide 10 | guidance to commercial organizations. It is distinct from the 11 | [Individual Code of Conduct (ICoC)](CODE_OF_CONDUCT.md), and does not 12 | replace the ICoC. This OCoC applies to any group of people acting in 13 | concert as a BA member or as a participant in BA activities, whether 14 | or not that group is formally incorporated in some jurisdiction. 
15 | 16 | The code of conduct described below is not a set of rigid rules, and 17 | we did not write it to encompass every conceivable scenario that might 18 | arise. For example, it is theoretically possible there would be times 19 | when asserting patents is in the best interest of the BA community as 20 | a whole. In such instances, consult with the BA, strive for 21 | consensus, and interpret these rules with an intent that is generous 22 | to the community the BA serves. 23 | 24 | While we may revise these guidelines from time to time based on 25 | real-world experience, overall they are based on a simple principle: 26 | 27 | *Bytecode Alliance members should observe the distinction between 28 | public community functions and private functions — especially 29 | commercial ones — and should ensure that the latter support, or at 30 | least do not harm, the former.* 31 | 32 | ## Guidelines 33 | 34 | * **Do not cause confusion about Wasm standards or interoperability.** 35 | 36 | Having an interoperable WebAssembly core is a high priority for 37 | the BA, and members should strive to preserve that core. It is fine 38 | to develop additional non-standard features or APIs, but they 39 | should always be clearly distinguished from the core interoperable 40 | Wasm. 41 | 42 | Treat the WebAssembly name and any BA-associated names with 43 | respect, and follow BA trademark and branding guidelines. If you 44 | distribute a customized version of software originally produced by 45 | the BA, or if you build a product or service using BA-derived 46 | software, use names that clearly distinguish your work from the 47 | original. (You should still provide proper attribution to the 48 | original, of course, wherever such attribution would normally be 49 | given.) 
50 | 51 | Further, do not use the WebAssembly name or BA-associated names in 52 | other public namespaces in ways that could cause confusion, e.g., 53 | in company names, names of commercial service offerings, domain 54 | names, publicly-visible social media accounts or online service 55 | accounts, etc. It may sometimes be reasonable, however, to 56 | register such a name in a new namespace and then immediately donate 57 | control of that account to the BA, because that would help the project 58 | maintain its identity. 59 | 60 | For further guidance, see the BA Trademark and Branding Policy 61 | [TODO: create policy, then insert link]. 62 | 63 | * **Do not restrict contributors.** If your company requires 64 | employees or contractors to sign non-compete agreements, those 65 | agreements must not prevent people from participating in the BA or 66 | contributing to related projects. 67 | 68 | This does not mean that all non-compete agreements are incompatible 69 | with this code of conduct. For example, a company may restrict an 70 | employee's ability to solicit the company's customers. However, an 71 | agreement must not block any form of technical or social 72 | participation in BA activities, including but not limited to the 73 | implementation of particular features. 74 | 75 | The accumulation of experience and expertise in individual persons, 76 | who are ultimately free to direct their energy and attention as 77 | they decide, is one of the most important drivers of progress in 78 | open source projects. A company that limits this freedom may hinder 79 | the success of the BA's efforts. 80 | 81 | * **Do not use patents as offensive weapons.** If any BA participant 82 | prevents the adoption or development of BA technologies by 83 | asserting its patents, that undermines the purpose of the 84 | coalition. The collaboration fostered by the BA cannot include 85 | members who act to undermine its work. 
86 | 87 | * **Practice responsible disclosure** for security vulnerabilities. 88 | Use designated, non-public reporting channels to disclose technical 89 | vulnerabilities, and give the project a reasonable period to 90 | respond, remediate, and patch. [TODO: optionally include the 91 | security vulnerability reporting URL here.] 92 | 93 | Vulnerability reporters may patch their company's own offerings, as 94 | long as that patching does not significantly delay the reporting of 95 | the vulnerability. Vulnerability information should never be used 96 | for unilateral commercial advantage. Vendors may legitimately 97 | compete on the speed and reliability with which they deploy 98 | security fixes, but withholding vulnerability information damages 99 | everyone in the long run by risking harm to the BA project's 100 | reputation and to the security of all users. 101 | 102 | * **Respect the letter and spirit of open source practice.** While 103 | there is not space to list here all possible aspects of standard 104 | open source practice, some examples will help show what we mean: 105 | 106 | * Abide by all applicable open source license terms. Do not engage 107 | in copyright violation or misattribution of any kind. 108 | 109 | * Do not claim others' ideas or designs as your own. 110 | 111 | * When others engage in publicly visible work (e.g., an upcoming 112 | demo that is coordinated in a public issue tracker), do not 113 | unilaterally announce early releases or early demonstrations of 114 | that work ahead of their schedule in order to secure private 115 | advantage (such as marketplace advantage) for yourself. 116 | 117 | The BA reserves the right to determine what constitutes good open 118 | source practices and to take action as it deems appropriate to 119 | encourage, and if necessary enforce, such practices. 
120 | 
121 | ## Enforcement
122 | 
123 | Instances of organizational behavior in violation of the OCoC may
124 | be reported by contacting the Bytecode Alliance CoC team at
125 | [report@bytecodealliance.org](mailto:report@bytecodealliance.org). The
126 | CoC team will review and investigate all complaints, and will respond
127 | in a way that it deems appropriate to the circumstances. The CoC team
128 | is obligated to maintain confidentiality with regard to the reporter of
129 | an incident. Further details of specific enforcement policies may be
130 | posted separately.
131 | 
132 | When the BA deems an organization in violation of this OCoC, the BA
133 | will, at its sole discretion, determine what action to take. The BA
134 | will decide what type, degree, and duration of corrective action is
135 | needed, if any, before a violating organization can be considered for
136 | membership (if it was not already a member) or can have its membership
137 | reinstated (if it was a member and the BA canceled its membership due
138 | to the violation).
139 | 
140 | In practice, the BA's first approach will be to start a conversation,
141 | with punitive enforcement used only as a last resort. Violations
142 | often turn out to be unintentional and swiftly correctable with all
143 | parties acting in good faith.
144 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Bytecode Alliance RFCs
2 | 
3 | This repository is the home of the RFC (request for comments) process for Bytecode Alliance projects. RFCs are a tool for getting feedback on design and implementation ideas and for consensus-building among stakeholders.
4 | 
5 | ## What is an RFC?
6 | 
7 | An RFC is a markdown file laying out a problem and a proposed solution.
To support getting feedback early on, RFCs can come in [draft](template-draft.md) or [full](template-full.md) forms (see the linked templates for details). Draft RFCs should be opened as [draft PRs](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests#draft-pull-requests). In either case, discussion happens by opening a pull request to place the RFC's markdown file into the `accepted` directory. 8 | 9 | ## When is an RFC needed? 10 | 11 | Many changes to Bytecode Alliance projects can and should happen through every-day GitHub processes: issues and pull requests. An RFC is warranted when: 12 | 13 | * The work involves changes that will significantly affect stakeholders or project contributors. Each project may provide more specific guidance. Examples include: 14 | * Major architectural changes 15 | * Major new features 16 | * Simple changes that have significant downstream impact 17 | * Changes that could affect guarantees or level of support, e.g. removing support for a target platform 18 | * Changes that could affect mission alignment, e.g. by changing properties of the security model 19 | * The work is substantial and you want to get early feedback on your approach. 20 | 21 | ## Workflow 22 | 23 | ### Creating and discussing an RFC 24 | 25 | * The RFC process begins by submitting a (possibly draft) pull request, using one of the two templates available in the repository root. The pull request should propose to add a single markdown file into the `accepted` subdirectory, following the template format, and with a descriptive name. 26 | 27 | * The pull request is tagged with a **project label** designating the Bytecode Alliance project it targets. 28 | 29 | * Once an RFC PR is open, stakeholders and project contributors will discuss it together with the author, raising any points of concern, exploring tradeoffs, and honing the design. 
30 | 31 | ### Making a decision: merge or close 32 | 33 | TBD, see [the first RFC](https://github.com/bytecodealliance/rfcs/pull/1). -------------------------------------------------------------------------------- /accepted/async-imports.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | 3 | [summary]: #summary 4 | 5 | This RFC proposes adding native support to the `wasmtime` crate for imported 6 | functions which are `async`. Additionally it provides APIs to call wasm 7 | functions as an `async` function. 8 | 9 | # Motivation 10 | [motivation]: #motivation 11 | 12 | WebAssembly today firmly lives in the "synchronous" world of computation. 13 | Current stable WebAssembly does not have any support for stack-switching or any 14 | standard form of handling asynchronous programs (e.g. something with an event 15 | loop). There are eventual proposals for doing so, but they're expected to be 16 | somewhat far out. 17 | 18 | Rust, however, has an asynchronous ecosystem built around the `Future` trait 19 | in the standard library. This means that if you have an asynchronous host 20 | function you'd like to expose to WebAssembly, it's very difficult to do so today 21 | with Wasmtime. 22 | 23 | We have also seen a number of points of interest in Wasmtime supporting 24 | something akin to stack switching, frequently to enable async functions with 25 | wasmtime: 26 | 27 | * The [`wasmtime-async`](https://crates.io/crates/wasmtime-async) crate. 28 | * [Lucet](https://github.com/bytecodealliance/lucet) supports async 29 | imports/calls. 30 | * [Requests to run lots of wasm on a 31 | server](https://github.com/bytecodealliance/wasmtime/issues/642) 32 | * [Lunatic](https://github.com/lunatic-lang/lunatic) is an example project using 33 | Wasmtime which has stack switching under the hood. 
34 | 35 | A frequent use case of Wasmtime is to power server-side WebAssembly execution 36 | where the server may be executing many WebAssembly modules simultaneously. It's 37 | expected that many of these modules will be waiting on I/O or host 38 | functionality, and having a thread-per-module will be too costly in terms of 39 | synchronization and resources. 40 | 41 | The goal of this RFC is to specifically target the `async` use case in Rust. 42 | While the implementation will involve stack-switching and some communication 43 | between the two stacks, it does not expose an API to users which allows 44 | manually doing low-level operations like stack switching, yielding, and 45 | resumption. 46 | 47 | # Proposal 48 | [proposal]: #proposal 49 | 50 | This RFC proposes adding `async` support natively to the `wasmtime` crate 51 | itself. While it's perhaps possible to build this externally (as shown by 52 | `wasmtime-async`) it's a requested enough feature that adding core support 53 | should increase its visibility and help it feel more well supported. 54 | 55 | When adding support for the `wasmtime` crate, however, this proposal also aims 56 | to not disturb the functionality that Wasmtime has today. Not all users of 57 | Wasmtime need to handle `async` code and/or switch stacks. It should still be 58 | possible, for example if necessary, to build Wasmtime without `async` support. 59 | This effectively means that this proposal isn't changing existing APIs in 60 | `wasmtime`, instead augmenting existing ones. Additionally, however, we want to 61 | avoid "split the world" scenarios as much as we can. For example `Memory` 62 | doesn't have much reason to be different in async and non-async worlds, and same 63 | with `Module`. 64 | 65 | ### Changes to `Store` 66 | 67 | With these goals in mind, the first change to Wasmtime is a new 68 | constructor for `Store`: 69 | 70 | ```rust 71 | impl Store { 72 | pub fn new_async(engine: &Engine) -> Store { 73 | // ... 
74 | }
75 | }
76 | ```
77 | 
78 | This `new_async` method sets a runtime flag within `Store` intending to indicate
79 | that it is to be used with asynchronous methods defined below. The goal of this
80 | method is to prevent mistakes, not to enable any sort of internal functionality
81 | at this time. This'll be a bit more clear below as we cover the changes to
82 | `Func`.
83 | 
84 | ### Creating a `Func`
85 | 
86 | The first location where dealing with `async` functions on the host comes into
87 | play is when you're creating a `Func` from host-defined functionality. The existing
88 | `Func::new` and `Func::wrap` constructors will continue to work in both async
89 | and non-async `Store` objects. In addition to these constructors, however, new
90 | constructors which take asynchronous functions will be added:
91 | 
92 | ```rust
93 | impl Func {
94 |     pub fn new_async<'a, T, R>(
95 |         store: &Store,
96 |         ty: FuncType,
97 |         state: T,
98 |         func: fn(Caller<'a>, &'a T, &'a [Val], &'a mut [Val]) -> R,
99 |     ) -> Func
100 |     where
101 |         R: Future<Output = Result<(), Trap>> + 'a,
102 |     {
103 |         // ...
104 |     }
105 | 
106 |     pub fn wrap_async<T>(store: &Store, state: T, func: impl IntoFuncAsync<T>) -> Func {
107 |         // ...
108 |     }
109 | }
110 | 
111 | impl<'a, T, A, Fut, R> IntoFuncAsync<T> for fn(&'a T, A) -> Fut
112 | where
113 |     Fut: Future<Output = R> + 'a,
114 |     A: WasmTy + 'a,
115 |     R: WasmRet, // includes `T`, (), and `Result<T, Trap>`
116 | {
117 |     // ...
118 | }
119 | 
120 | impl<'a, T, A, Fut, R> IntoFuncAsync<T> for fn(Caller<'a>, &'a T, A) -> Fut
121 | where
122 |     Fut: Future<Output = R> + 'a,
123 |     A: WasmTy + 'a,
124 |     R: WasmRet, // includes `T`, (), and `Result<T, Trap>`
125 | {
126 |     // ...
127 | }
128 | ```
129 | 
130 | These are intended to mirror the `new` and `wrap` functions, except they're
131 | bound to functions returning futures instead. These signatures are a bit
132 | wonky, however, and worth going into more detail of.
133 | 
134 | First, these functions will panic if the `Store` passed in is not an async
135 | store.
This is hoped to prevent a footgun where you hook up an asynchronous 136 | function into a context that's expected to be called synchronously. There's no 137 | real way for us to make this work, so we want the programming error to get 138 | reported ASAP. 139 | 140 | Semantically, what these instances of `Func` are expected to do is that when 141 | they're invoked they'll be responsible for `poll`-ing the `Future` returned. 142 | When the future is "pending" we'll stack-switch back to the original future that 143 | called us (to be described later). When the future is ready we'll return the 144 | value back to wasm. This way the wasm sees a "blocking" call to an imported 145 | function, when in fact we just suspended the instance temporarily. 146 | 147 | The weirdest part about these signatures, however, is the fact that you're 148 | passing in an explicit state (`T`) and working with a function pointer (`fn`) 149 | instead of some sort of closure (unlike `new` and `wrap`, which use `Fn`). The 150 | reason for this is effectively that [async fn in traits is 151 | hard](http://smallcultfollowing.com/babysteps/blog/2019/10/26/async-fn-in-traits-are-hard/). 152 | More concretely, what we *want* to write is: 153 | 154 | ```rust 155 | pub fn new_async( 156 | store: &Store, 157 | ty: FuncType, 158 | func: impl AsyncFn(Caller<'_>, &[Val], &mut [Val]) -> Result<(), Trap>, 159 | ) -> Func 160 | { 161 | // ... 162 | } 163 | ``` 164 | 165 | but the `AsyncFn` trait does not exist today, nor does Rust have the support to 166 | make it exist. An alternative signature is perhaps: 167 | 168 | ```rust 169 | pub fn new_async<'a, R>( 170 | store: &Store, 171 | ty: FuncType, 172 | func: impl Fn(Caller<'a>, &'a [Val], &'a mut [Val]) -> R, 173 | ) -> Func 174 | where 175 | R: Future<Output = Result<(), Trap>> + 'a, 176 | { 177 | // ... 178 | } 179 | ``` 180 | 181 | but this does not allow the output future to borrow the captured environment of 182 | the closure, which is expected to be a common operation.
This exposes one of the 183 | key weaknesses of async-fn-in-traits, which is that there's no way right now to 184 | tie the lifetime of `&self` in the function call to the output future. While 185 | perhaps solvable in the future, there are other problems to tackle as well, and 186 | we'd like to ship something in the meantime! 187 | 188 | Coming back to our original signature: 189 | 190 | ```rust 191 | pub fn new_async<'a, T, R>( 192 | store: &Store, 193 | ty: FuncType, 194 | state: T, 195 | func: fn(Caller<'a>, &'a T, &'a [Val], &'a mut [Val]) -> R, 196 | ) -> Func 197 | where 198 | R: Future<Output = Result<(), Trap>> + 'a, 199 | { 200 | // ... 201 | } 202 | ``` 203 | 204 | By using a function pointer we are able to connect the lifetime of the input 205 | state and all the arguments to the output future, meaning it's allowed to borrow 206 | any of its inputs, as you'd expect in `async` Rust. Example usage of this might 207 | look like: 208 | 209 | ```rust 210 | struct MyState; 211 | 212 | impl MyState { 213 | async fn foo(&self) -> i32 { 214 | // ... 215 | } 216 | } 217 | 218 | let state: MyState = ..; 219 | let func = Func::wrap_async(&store, state, |s| async move { s.foo().await }); 220 | ``` 221 | 222 | This is still not as ergonomic as `Func::wrap`, but it's hoped that it's about 223 | the closest that we can get for now. 224 | 225 | ### Calling a `Func` 226 | 227 | After we've created a slew of `Func` instances backed by asynchronous functions, 228 | at some point we need to actually call wasm code! This is currently done 229 | with `Func::call`, `Func::get`, and `Instance::new`. Each of these will 230 | get an async counterpart. 231 | 232 | First, however, each entry point for calling a function will panic if it's done 233 | from the wrong store. For example, if `Func::call` is used on an async store, 234 | then `Func::call` will panic. Similarly, if you use `Func::get2` or 235 | `Instance::new` on an async store the function will panic.
Like above with 236 | creating a `Func`, the goal is to signal a programmer error ASAP instead of 237 | deferring it to "this panics only if called synchronously and happens to call an 238 | async import". 239 | 240 | Calling a function is expected to be a pretty straightforward addition: 241 | 242 | ```rust 243 | impl Func { 244 | pub async fn call_async(&self, params: &[Val]) -> Result<Box<[Val]>, Trap> { 245 | // ... 246 | } 247 | } 248 | ``` 249 | 250 | and the same goes for instantiation: 251 | 252 | ```rust 253 | impl Instance { 254 | pub async fn new_async(store: &Store, module: &Module, imports: &[Extern]) -> Result<Instance, Trap> { 255 | // ... 256 | } 257 | } 258 | ``` 259 | 260 | The `Func::get*` suite of methods is a bit more complicated but intended to 261 | still be relatively straightforward: 262 | 263 | ```rust 264 | impl Func { 265 | pub fn get1_async<A, R>(&self) -> Result<impl Fn(A) -> FuncCall<R>, Trap> 266 | where 267 | A: WasmTy, 268 | R: WasmTy, 269 | { 270 | // ... 271 | } 272 | } 273 | 274 | pub struct FuncCall<R> { 275 | // ... 276 | } 277 | 278 | impl<R> Future for FuncCall<R> { 279 | type Output = Result<R, Trap>; 280 | // ... 281 | } 282 | ``` 283 | 284 | These will all *panic* (not return an error) if called within a non-async store. 285 | 286 | In all of these cases the futures returned, when polled, will execute 287 | WebAssembly code. This is more fully described in the next section. A returned future 288 | is only pending if an import is called whose own future winds up being 289 | pending. 290 | 291 | ### Execution trace of async wasm 292 | 293 | Here we'll discuss a (hopefully) complete picture of what executing WebAssembly 294 | asynchronously looks like. Some details will be glossed over here and there, but 295 | it's intended that all the pieces above can be connected. 296 | 297 | 1. The first step is that the embedder will invoke some WebAssembly 298 | asynchronously.
This is done via one of the three entry points: 299 | `Func::call_async`, the `Func::get*_async` family (and then calling the returned closure), or 300 | `Instance::new_async`. 301 | 302 | 2. The entry point's future, let's call it future A, is eventually polled. As 303 | this is the first time A is polled, we need to set up a new native stack, 304 | let's call it stack 2 (and our current stack, stack 1), for execution that can 305 | be suspended. The future A will allocate stack 2 from the `Store` (probably 306 | through some sort of configurable trait) and then initialize stack 2 so that, 307 | when started, it will call the appropriate wasm function. The exact details 308 | of the switch will be left to the implementation, but the idea is that some 309 | assembly code will be involved. 310 | 311 | 3. Some native trampoline code initially executes on stack 2, which does 312 | any setup necessary and then jumps to the native wasm code, still executing 313 | on stack 2. 314 | 315 | 4. WebAssembly executes for a while on stack 2, doing its thing. 316 | Note that any synchronous host functions are no 317 | different in this scenario, they'll just execute to completion when called. 318 | 319 | 5. Eventually an imported function is called which was defined with an 320 | asynchronous Rust function. This first invokes the asynchronous function, 321 | which returns a future, called future B. Using the polling context we had 322 | originally in step 2 when polling A, we poll future B. 323 | 324 | 6. Future B is not ready. This triggers a transition back to stack 1. Future A's 325 | `poll` method receives information through a side-channel (probably in 326 | `Store` or something like that) about the status of what happened on stack 2. 327 | It sees that future B is not ready, so future A says it is not ready. 328 | 329 | 7. When the runtime deems appropriate, we are then polled again (still on stack 330 | 1).
Future A's `poll` method runs again and sees that this is the second time 331 | it's been polled. It switches to stack 2 (stored internally) and resumes execution. 332 | 333 | 8. On stack 2 we re-poll future B, which was pinned to stack 2 and as a result 334 | hasn't moved. If B isn't ready we restart from step 6, but let's say it's ready. 335 | This means we return from the native trampoline back to the wasm code. 336 | 337 | 9. Wasm code continues executing on stack 2. If it calls more asynchronous 338 | imports we resume from step 5. Otherwise let's say the wasm returns. 339 | 340 | 10. After the wasm returns we are returned to the original native trampoline 341 | on stack 2. We squirrel away the return values and then switch to stack 1. 342 | Back on stack 1 we see that execution finished. After deallocating the stack 343 | we then return that the future is done. 344 | 345 | Note that this is all somewhat slow, so the initial implementation probably won't 346 | be heavily optimized in the way the `Func::get*` specialized trampolines are. It'll 347 | likely use the `Func::call`-style trampolines to transmit a variable number of 348 | arguments through the stack to avoid having lots of different entry points. 349 | 350 | ### Cancellation 351 | 352 | In the Rust asynchronous world, if you want to cancel a future you simply `drop` 353 | it, effectively deallocating its resources. In the trace above we have three 354 | distinct drop points. If we drop the future between steps 1/2, before we ever 355 | polled, then no stack was ever allocated, so it's easy to deallocate. Dropping 356 | after step 10, after the future is done, is also easy because there is no stack. 357 | Dropping between steps 6/7, however, is the interesting part. 358 | 359 | If the future is dropped while wasm is suspended, then we need to clean up the 360 | stack. To do this we need to unwind all frames on the stack and run destructors; 361 | we can't simply deallocate the stack back to the original allocator.
The current 362 | thinking of how to do this is that we'll induce traps. 363 | 364 | When (in the above example) future A is dropped between steps 6/7, the following 365 | will happen: 366 | 367 | * In the destructor of future A we'll resume execution on stack 2. 368 | 369 | * Execution on stack 2 will receive the signal that it's been cancelled. This 370 | will drop future B and then return a trap from the host call, simulating 371 | that the imported function returned a trap. 372 | 373 | * Execution of the wasm will then proceed as if a trap had otherwise happened. 374 | 375 | * Eventually we'll reach the original native trampoline code on stack 2, which 376 | will switch back to stack 1 indicating that a trap happened. 377 | 378 | * Back on stack 1, inside future A's destructor, we'll discard the result of the 379 | execution, a trap, and deallocate the stack now that there are no more frames 380 | on it. 381 | 382 | The trickiest part about this is that if wasm/native code are interleaved on the 383 | stack then native code can "catch" the "you've been cancelled" trap and resume 384 | execution as usual, possibly taking a long time to return. This is, however, 385 | also true of "you've been interrupted" traps, which can likewise be caught. It's 386 | expected that native code mostly just forwards traps along, so this won't be much 387 | of a problem (despite it being a possibility). If this issue is urgent, 388 | however, we could consider adding a mechanism to `Store` which "poisons" wasm 389 | execution or similar, preventing reentry into any wasm module and always returning 390 | a trap if wasm is called. This way, while we're running A's destructor, it will 391 | prevent any further entry into wasm. 392 | 393 | # Rationale and alternatives 394 | [rationale-and-alternatives]: #rationale-and-alternatives 395 | 396 | One of the main restrictions of this proposal is that the only API surface area 397 | added to `wasmtime` is a few `async`-aware functions.
This intentionally leaves 398 | out low-level API knobs for yielding/resumption/stack switching. The hope is 399 | that almost all consumers will be able to get by with only using `async`. 400 | Designing support for the highest-priority use case also hopefully helps keep 401 | us flexible as future wasm proposals get fleshed out and may want to get added 402 | to wasmtime as well. Ideally we wouldn't accidentally box ourselves into a 403 | particular implementation that makes it difficult to add support for a future wasm 404 | proposal. 405 | 406 | Another goal of this proposal was to integrate well into the `wasmtime` crate's 407 | existing API, but it still unfortunately has a "split world" problem with 408 | calling and creating functions. It's unknown whether this will make 409 | wasmtime-as-a-library difficult to use if contexts aren't able to easily reason 410 | about whether async should be used or not. It's hoped, however, that given the 411 | purpose of Wasmtime, this is not a heavily shared dependency amongst many 412 | crates but rather a component of a larger application which is able to make 413 | these sorts of policy decisions. 414 | 415 | # Open questions 416 | [open-questions]: #open-questions 417 | 418 | - None of the futures returned by `wasmtime` will be `Send`. This is due to the fact 419 | that nothing relating to a `Store` is `Send`. How does this impact expected 420 | use cases of running `wasmtime` with asynchronous imports? Is this too 421 | restrictive? If too restrictive, how is this expected to be solved? 422 | -------------------------------------------------------------------------------- /accepted/benchmark-infrastructure.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | [summary]: #summary 3 | 4 | This RFC proposes setting up a benchmarking and performance-tracking 5 | infrastructure to track Cranelift and wasmtime performance over time.
We would 6 | like to (i) establish some server infrastructure, (ii) build automation 7 | integrated with our GitHub repos, and (iii) provide a web-based frontend to 8 | show historical and current performance data. 9 | 10 | # Motivation 11 | [motivation]: #motivation 12 | 13 | As we develop the Cranelift compiler and its machine backends, and as it is 14 | used in more production settings, its performance becomes increasingly 15 | important. In many settings, a few percentage points change in compilation time 16 | or generated-code performance is quite a big deal. In addition, experience in 17 | many compiler projects over the years has shown that small performance deltas 18 | have a way of accumulating over time; if there is no systematic tracking, this 19 | gradual drift can add up to sizeable performance regressions. 20 | 21 | It's thus important both to track historical trends, in order to avoid 22 | regressions and (more positively) to measure our progress; and to measure the 23 | effect of individual changes, when significant enough, to determine objectively 24 | whether optimizations or design choices are good ideas. 25 | 26 | Independently of *what* benchmarks we measure, we need an *infrastructure* to 27 | measure, record, and visualize statistics. This RFC proposes building such an 28 | infrastructure. 29 | 30 | # Proposal sketch 31 | [proposal]: #proposal 32 | 33 | ## Step 1: Establish physical infrastructure 34 | 35 | First, we need to establish a set of machines to run benchmarks and host 36 | results. The GitHub-hosted CI infrastructure we currently use for testing will 37 | likely not be adequate, because (i) runtime of benchmarks will be longer than 38 | we would like for CI actions, (ii) we don't want to tie benchmark runs to CI 39 | runs, necessarily, and (iii) CI-runner environments are unpredictable and 40 | unreliable environments for accurate, low-noise benchmark measurements. 
41 | 42 | We already have access to one AArch64 server-grade host for interactive 43 | development of the Cranelift ARM backend, generously [provided by 44 | ARM](https://github.com/WorksOnArm/cluster/issues/204), and have the ability to 45 | run benchmarking on it. We have also procured access to a dedicated x86-64 46 | machine. These two machines would allow us to benchmark Linux/x86-64 and 47 | Linux/aarch64. Another worthwhile target would be Windows/x86-64, which is 48 | interesting because of differing ABIs and some VM details, but we have not yet 49 | decided to commit resources (an additional machine) or develop specific runner 50 | infrastructure for this. 51 | 52 | Relevant Bytecode Alliance-affiliated engineers would then be granted access to 53 | these machines and maintain a benchmark-runner infrastructure as described 54 | below. 55 | 56 | ## Step 2: Build and host a metrics-visualization web UI 57 | 58 | We will need a frontend to present the benchmarking data that this 59 | infrastructure collects. Johnnie Birch and Andrew Brown of Intel have developed 60 | a prototype web-based frontend that records and plots performance of benchmark 61 | suites over time; this is the most likely candidate. 62 | 63 | We hope to initially work out a static-file / client-side solution, which would 64 | allow us maximal flexibility in hosting configuration. For example, the 65 | benchmarking infrastructure could upload the latest data to a Git repository, 66 | to be served by GitHub Pages. We do not anticipate the need for a full 67 | database-based backend, and if data becomes large enough then we can segment 68 | the static data (JSON) files and download selectively. 69 | 70 | Initially, we can populate the data for this UI with manually-initiated 71 | benchmarking runs on the benchmarking machines. 
72 | 73 | ## Step 3: Set up GitHub-integrated benchmark runner automation 74 | 75 | Once we have the machines set up, a UI established, and the ability to run 76 | benchmarks and upload results, we should develop a daemon that runs 77 | continuously on benchmark-runner machines and monitors our GitHub repositories. 78 | When a request for a benchmark run is initiated, the runner daemon should start 79 | a run; when the run is complete, it should either post the results on the 80 | relevant GitHub PR conversation, or include the results in its web UI, or both, 81 | as appropriate. 82 | 83 | The largest design question will be how this automation ensures security: 84 | because the runner will download a git commit and execute arbitrary code from 85 | that commit, we *cannot* simply execute this on every PR or every anonymous 86 | request. Rather, we will likely have a bot that includes an allow-list of 87 | trusted project members, and wait for a special GitHub comment from such a 88 | person to kick off a run. 89 | 90 | If any of the above becomes impractical or difficult, an intermediate 91 | design-point could involve a web interface that allows approved users 92 | (authenticated in some way) to request a run on a given git commit hash; this 93 | would not require any GitHub API integration. We expect, though, given prior 94 | art in GitHub-integrated CI-like bots (such as the Rust project's bors bot), 95 | that the integration should not present unforeseen problems. 96 | 97 | Note that this runner infrastructure, at the macro level (repo checkout and 98 | command invocation), is largely orthogonal to the benchmark suite and its 99 | specific harness. We will likely want to design a simple in-repo configuration 100 | file consumed by the runner daemon that specifies what to run and in what 101 | environment (e.g., a Dockerfile-specified container). 
102 | 103 | # Open questions 104 | [open-questions]: #open-questions 105 | 106 | ## Security 107 | 108 | We need to ensure that, given the potential for arbitrary execution of code 109 | from a PR, the necessary safeguards are in place to gate on approvals. 110 | Potentially we should also sandbox in other ways; for example wrap the runner 111 | daemon in a chroot or container with limited capabilities, and cap its resource 112 | limits. 113 | 114 | ## Availability Expectations 115 | 116 | We should set expectations that the benchmarking service may occasionally 117 | become unavailable due to hiccups in operational details: for example, a 118 | build/benchmarking server might go down or run out of disk space, or a PR might 119 | break a benchmark run and not be caught by CI. At least two questions arise: 120 | how or if such a situation blocks work from being done, and what resources 121 | (time / engineering) we apply to minimize it. 122 | 123 | Given that not every PR will be performance-sensitive and require benchmarking, 124 | such downtime should not impact progress in most cases, so it is likely to work 125 | well enough to start with a "best effort" policy. In other words, repairing any 126 | benchmark-infrastructure breakage should be done, but is lower-priority than 127 | other work, so as to not impose too much of a burden or require an "on-call" 128 | system. If a performance-sensitive PR is blocked on a need for benchmarking 129 | results, then we can expedite work to bring it back online. Otherwise, the 130 | ability to request runs on arbitrary commits should allow us to fill in a gap 131 | after-the-fact and reconstruct a full performance history on the `main` branch 132 | even if we experience some outages. 
133 | -------------------------------------------------------------------------------- /accepted/benchmark-suite-milestones.dot: -------------------------------------------------------------------------------- 1 | digraph { 2 | rankdir = "LR"; 3 | 4 | { 5 | rank = same; 6 | initial_runner [label = "Initial Runner"]; 7 | initial_candidates [label = "Initial Benchmark Program Candidates"]; 8 | initial_analysis [label = "Initial Results Analysis w/ Significance Testing"]; 9 | } 10 | 11 | { 12 | rank = same; 13 | mvp [shape = "rectangle", label = "MVP"]; 14 | } 15 | initial_runner -> mvp; 16 | initial_candidates -> mvp; 17 | initial_analysis -> mvp; 18 | 19 | { 20 | rank = same; 21 | shuffling_allocator [label = "Shuffling Allocator"]; 22 | stack_padding [label = "Cranelift Stack Padding"]; 23 | random_env_vars [label = "Randomizing Environment Variables"]; 24 | shuffling_linker [label = "Shuffling Linker"]; 25 | random_exec_order [label = "Randomizing Execution Order"]; 26 | } 27 | initial_runner -> random_env_vars; 28 | initial_runner -> random_exec_order; 29 | 30 | shuffling_allocator -> measurement_bias_mitigated; 31 | stack_padding -> measurement_bias_mitigated; 32 | random_env_vars -> measurement_bias_mitigated; 33 | shuffling_linker -> measurement_bias_mitigated; 34 | random_exec_order -> measurement_bias_mitigated; 35 | 36 | { 37 | // rank = same; 38 | // sig_test [label = "Tests of Signficance"]; 39 | effect_size [label = "Effect Size Confidence Intervals"]; 40 | } 41 | // initial_analysis -> sig_test; 42 | // sig_test -> effect_size; 43 | initial_analysis -> effect_size; 44 | 45 | { 46 | rank = same; 47 | more_candidates [label = "More Benchmark Candidates"]; 48 | metrics_instrumenter [label = "Candidate Metrics Instrumentation"]; 49 | pca_analysis [label = "Principal Component Analysis"]; 50 | } 51 | initial_candidates -> more_candidates; 52 | 53 | more_candidates -> representative_and_diverse_corpus; 54 | metrics_instrumenter -> 
representative_and_diverse_corpus; 55 | pca_analysis -> representative_and_diverse_corpus; 56 | 57 | effect_size -> sound_analysis; 58 | 59 | { 60 | rank = same; 61 | github_action [label = "GitHub Action Bot for Testing PRs"]; 62 | regression_tester [label = "Automatic Regression Detector"]; 63 | } 64 | initial_runner -> github_action; 65 | initial_runner -> regression_tester; 66 | initial_analysis -> regression_tester; 67 | 68 | github_action -> dx; 69 | regression_tester -> dx; 70 | 71 | { 72 | rank = same; 73 | measurement_bias_mitigated [shape = "rectangle", label = "Measurement Bias Mitigated"]; 74 | representative_and_diverse_corpus [shape = "rectangle", label = "Representative and Diverse Corpus"]; 75 | sound_analysis [shape = "rectangle", label = "Statistically Sound and Rigorous Analysis"]; 76 | dx [shape = "rectangle", label = "Excellent Developer Experience"]; 77 | } 78 | 79 | { 80 | rank = same; 81 | benchmark_suite_complete [shape = "rectangle", label = "Benchmark Suite Complete"]; 82 | } 83 | 84 | measurement_bias_mitigated -> benchmark_suite_complete; 85 | dx -> benchmark_suite_complete; 86 | sound_analysis -> benchmark_suite_complete; 87 | representative_and_diverse_corpus -> benchmark_suite_complete; 88 | } 89 | -------------------------------------------------------------------------------- /accepted/benchmark-suite-milestones.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bytecodealliance/rfcs/3ad1a42255b66619efd0e86e9dffda7639ace51d/accepted/benchmark-suite-milestones.png -------------------------------------------------------------------------------- /accepted/cfi-improvements-with-pauth-and-bti.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | [summary]: #summary 3 | 4 | This RFC proposes to improve control flow integrity for compiled WebAssembly code by utilizing two 5 | technologies from the Arm instruction set 
architecture - Pointer Authentication and Branch Target 6 | Identification. 7 | 8 | # Motivation 9 | [motivation]: #motivation 10 | 11 | The [security model of WebAssembly][wasm-security] ensures that Wasm modules execute in a sandboxed 12 | environment isolated from the host runtime. One aspect of that model is that it provides implicit 13 | control flow integrity (CFI) by forcing all function call targets to specify a valid entry in the 14 | function index space, by using a protected call stack that is not affected by buffer overflows in 15 | the module heap, and so on. As a result, in some Wasm applications the runtime is able to execute 16 | untrusted code safely. However, the burden of ensuring that the security properties are upheld is 17 | placed on the compiler to a large extent. 18 | 19 | On the other hand, a further aspect of the WebAssembly design is efficient execution (close to 20 | native speed), which leads to a natural tendency towards sophisticated optimizing compilers. 21 | Unfortunately, the additional complexity increases the risk of implementation problems and in 22 | particular compromises of the security properties. For example, Cranelift has been affected by 23 | issues such as [CVE-2021-32629][cve] that could make it possible to access the protected call stack 24 | or memory that is private to the host runtime. 25 | 26 | We are trying to tackle the challenge of ensuring compiler correctness with initiatives such as 27 | expanding fuzzing and making it possible to apply formal verification to at least some parts of the 28 | compilation process. However, it is also reasonable to consider a defense in depth strategy and to 29 | evaluate mitigations for potential future issues. 30 | 31 | Finally, Wasmtime can be used as a library and in particular embedded into an application that is 32 | implemented in languages that lack some of the hardening provided by Rust such as C and C++. 
In that 33 | case the compiled WebAssembly code could provide convenient instruction sequences for attacks that 34 | subvert normal control flow and that originate from the embedder's code, even if Cranelift and 35 | Wasmtime themselves lack any defects. 36 | 37 | [cve]: https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-hpqh-2wqx-7qp5 38 | [wasm-security]: https://webassembly.org/docs/security 39 | 40 | # Proposal 41 | [proposal]: #proposal 42 | 43 | Currently this proposal focuses on the AArch64 execution environment. 44 | 45 | ## Background 46 | 47 | The Pointer Authentication (PAuth) extension to the Arm architecture protects function returns, i.e. 48 | provides back-edge CFI. It is described in section D5.1.5 of 49 | [the Arm Architecture Reference Manual][arm-arm]. Some of the PAuth operations act as `NOP` 50 | instructions when executed by a processor that does not support the extension. Furthermore, a code 51 | generator can use either one of two keys (A and B) for the pointer authentication instructions; the 52 | architecture does not impose any restrictions on any of them, leaving that to the software 53 | environment. 54 | 55 | The Branch Target Identification (BTI) extension protects other kinds of indirect branches, that is 56 | provides forward-edge CFI and is described in section D5.4.4. Whether BTI applies to an executable 57 | memory page or not is controlled by a dedicated page attribute. Note that the `BTI` "landing pad" 58 | for indirect branches acts as a `NOP` instruction when the extension is not active (e.g. for 59 | processors that do not support BTI). 60 | 61 | Both extensions are applicable only to the AArch64 execution state and are optional, so the usage of 62 | each CFI technique will be controlled by dedicated settings. 
Wasmtime embedders need to consider a 63 | subtlety - the setting values may happen to be located in memory that could be potentially 64 | accessible to an attacker, so the latter could disable the use of PAuth and BTI in subsequent code 65 | generation. Mitigating this issue is outside the scope of this proposal. 66 | 67 | The article [*Code reuse attacks: The compiler story*][code-reuse-attacks] and the whitepaper 68 | [*Pointer Authentication on ARMv8.3*][qualcomm-pauth] provide an introduction to the technologies. 69 | 70 | In the Intel® 64 architecture [the Control-Flow Enforcement Technology (CET)][intel-cet] provides 71 | similar capabilities. 72 | 73 | [arm-arm]: https://developer.arm.com/documentation/ddi0487/gb/?lang=en 74 | [code-reuse-attacks]: https://community.arm.com/arm-community-blogs/b/tools-software-ides-blog/posts/code-reuse-attacks-the-compiler-story 75 | [intel-cet]: https://www.intel.com/content/www/us/en/developer/articles/technical/technical-look-control-flow-enforcement-technology.html 76 | [qualcomm-pauth]: https://www.qualcomm.com/documents/whitepaper-pointer-authentication-armv83 77 | 78 | ## Improved back-edge CFI with PAuth 79 | 80 | Assuming that the A key is used, the proposed implementation will add the `PACIASP` instruction to 81 | the beginning of every function compiled by Cranelift and will replace the final return with either 82 | the `RETAA` instruction or a combination of `AUTIASP` and `RET`. 83 | 84 | In environments that use the DWARF format for unwinding the implementation will be modified to apply 85 | the `DW_CFA_AARCH64_negate_ra_state` operation or an equivalent immediately after the `PACIASP` 86 | instruction. 87 | 88 | Those steps will be skipped for simple leaf functions that do not construct frame records on the 89 | stack. 
90 | 91 | As a concrete example, consider the following function: 92 | 93 | ```plain 94 | function %f() { 95 | fn0 = %g() 96 | 97 | block0: 98 | call fn0() 99 | return 100 | } 101 | ``` 102 | 103 | Without the proposal it will result in the generation of: 104 | 105 | ```plain 106 | stp fp, lr, [sp, #-16]! 107 | mov fp, sp 108 | ldr x0, 1f 109 | b 2f 110 | 1: 111 | .byte 0x00, 0x00, 0x00, 0x00 112 | .byte 0x00, 0x00, 0x00, 0x00 113 | 2: 114 | blr x0 115 | ldp fp, lr, [sp], #16 116 | ret 117 | ``` 118 | 119 | And with the proposal: 120 | 121 | ```plain 122 | paciasp 123 | stp fp, lr, [sp, #-16]! 124 | mov fp, sp 125 | ldr x0, 1f 126 | b 2f 127 | 1: 128 | .byte 0x00, 0x00, 0x00, 0x00 129 | .byte 0x00, 0x00, 0x00, 0x00 130 | 2: 131 | blr x0 132 | ldp fp, lr, [sp], #16 133 | retaa 134 | ``` 135 | 136 | Associated AArch64-specific Cranelift settings - the default values are always `false`: 137 | * `has_pauth` - specifies whether the target environment supports PAuth 138 | * `sign_return_address` - the main setting controlling whether the back-edge CFI implementation is 139 | used; results in the generation of operations that act as `NOP` instructions unless `has_pauth` is 140 | also enabled 141 | * `sign_return_address_all` - specifies that all function return addresses will be authenticated, 142 | including the previously mentioned cases that do not need it in principle 143 | * `sign_return_address_with_bkey` - changes the generated instructions to use the B key; note that 144 | this is enforced for any Apple ABI, irrespective of the value of this setting 145 | 146 | ## Enhanced forward-edge CFI with BTI 147 | 148 | The proposed implementation will add the `BTI j` instruction to the beginning of every basic block 149 | that is the target of an indirect branch and that is not a function prologue.
Note that in the 150 | AArch64 backend, generated function calls always target function prologues, and indirect branches that 151 | do not act like function calls appear only in the implementation of the `br_table` IR operation. 152 | On the other hand, function prologues will begin with the `BTI c` instruction, keeping in mind that 153 | Cranelift does not have any special handling of tail calls. If PAuth is used at the same time, then 154 | the initial `PACIASP`/`PACIBSP` operation will act as a landing pad instead. 155 | 156 | There is only one associated AArch64-specific Cranelift setting, `use_bti`, which is `false` by 157 | default. Wasmtime will set the respective memory protection attribute for all executable pages if 158 | the WebAssembly module has been compiled with that setting enabled; similarly for the Cranelift JIT. 159 | 160 | ## CFI improvements to code that is not compiled by Cranelift 161 | 162 | Currently the code that is not compiled by Cranelift is in assembly, C, C++, or Rust. 163 | 164 | Improving CFI for compiled C, C++, and Rust code with the same technologies is outside the scope of 165 | this proposal, but in general it should be achievable by passing the appropriate parameters to the 166 | respective compiler. 167 | 168 | Functions implemented in assembly will get similar treatment to generated code, i.e. they will 169 | start with the `PACIASP` instruction (and any unwinding directives), assuming that the A key is 170 | used. However, the regular return will be preserved and instead will be preceded by the `AUTIASP` 171 | instruction. The reason is that both `AUTIASP` and `PACIASP` act as `NOP` instructions when executed 172 | by a processor that does not support PAuth, thus making the assembly code generic. Functions that do 173 | not need the pointer authentication operations will start with the `BTI c` instruction instead.
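The landing-pad placement rules described in the preceding sections can be summarized in a small sketch. The types and names here are hypothetical, not Cranelift's actual implementation:

```rust
/// Kinds of basic block, from the perspective of BTI landing-pad placement.
#[derive(Debug, PartialEq)]
enum BlockKind {
    FunctionPrologue, // reached by function calls
    BrTableTarget,    // reached by br_table's indirect branch
    Other,            // never the target of an indirect branch
}

/// What, if anything, is emitted at the start of a block under BTI.
fn landing_pad(kind: BlockKind, pauth: bool) -> Option<&'static str> {
    match kind {
        // With PAuth enabled, the initial PACIASP/PACIBSP already acts
        // as the landing pad, so no separate BTI instruction is needed.
        BlockKind::FunctionPrologue if pauth => Some("paciasp"),
        BlockKind::FunctionPrologue => Some("bti c"),
        BlockKind::BrTableTarget => Some("bti j"),
        BlockKind::Other => None,
    }
}
```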
174 | 175 | One potential problem in the interaction between code that is compiled by Cranelift and code that is 176 | not is that only one side might have the CFI enhancements. However, this proposal does not have any 177 | ABI implications, so Rust code in the Wasmtime implementation that does not use PAuth and BTI, for 178 | example, would be able to call functions compiled by Cranelift without any issues and vice versa. 179 | The reason is that it is the responsibility of the callee to ensure that PAuth is used correctly, 180 | while everything is transparent to the caller. As for BTI, if an executable memory page does not 181 | have the respective attribute set, then the extension does not have any effect, except for 182 | introducing extra `NOP` instructions, irrespective of how the code has been reached (e.g. via a 183 | branch from a page with BTI protections enabled); similarly for branches out of the unprotected 184 | page. The major exception that is relevant to Wasmtime is unwinding, but there should be no issues 185 | as long as the abovementioned DWARF operation is used and the system unwinder is recent. 186 | 187 | Future work that is beyond what this proposal presents may introduce further hardening that 188 | necessitates ABI changes, e.g. by being based on 189 | [the proposed PAuth ABI extension to ELF][pauth-abi] or something similar. 190 | 191 | [pauth-abi]: https://github.com/ARM-software/abi-aa/blob/2021Q3/pauthabielf64/pauthabielf64.rst 192 | 193 | ### Fiber implementation in Wasmtime 194 | 195 | The fiber implementation in Wasmtime consists of a significant amount of assembly code that will 196 | receive the treatment described in the previous section, as an initial implementation. However, the 197 | fiber switching code saves the values of all callee-saved registers on the stack, i.e. memory that 198 | is potentially accessible to an adversary. 
Some of those values could be code addresses that would 199 | be used by indirect branches, so a complete CFI implementation will verify the integrity of the 200 | saved state with the `PACGA` instruction. 201 | 202 | # Rationale and alternatives 203 | [rationale-and-alternatives]: #rationale-and-alternatives 204 | 205 | Since the existing implementation already uses the standard back-edge CFI techniques that are 206 | preferred in the absence of special hardware support (i.e. a separate protected stack that is not 207 | used for buffers that could be accessed out of bounds), the alternative is not to implement the 208 | proposal, so the rationale is based mainly on the overhead being insignificant. In terms of code 209 | size the impact of the back-edge CFI improvements is 1 or 2 additional instructions per function. 210 | 211 | The [Clang CFI design][clang-cfi-design] provides an idea for an alternative implementation of the 212 | forward-edge CFI mechanism that is enabled by BTI. It involves instrumenting every indirect branch 213 | to check if its destination is permitted. While the overhead of this approach can be reduced by 214 | using efficient data structures for the destination address lookup and optionally limiting the 215 | checks only to indirect function calls, it is still significantly larger than the worst-case BTI 216 | overhead of one instruction per basic block per function. On the other hand, it does not require any 217 | special hardware support, so it could be applied to all supported platforms. 218 | 219 | [clang-cfi-design]: https://clang.llvm.org/docs/ControlFlowIntegrityDesign.html 220 | 221 | # Open questions 222 | [open-questions]: #open-questions 223 | 224 | - What is the performance overhead of the proposal? 
225 | -------------------------------------------------------------------------------- /accepted/cranelift-backend-transition.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | [summary]: #summary 3 | 4 | This RFC proposes to transition the default x86-64 compiler backend in 5 | Cranelift's `codegen` crate, and also in Wasmtime when using the 6 | Cranelift backend, to the "new", "MachInst", or "VCode" backend that 7 | has been developed over the past year. Given that it has achieved 8 | sufficient feature-completeness to pass all relevant tests, has 9 | undergone testing in CI and via fuzzing since late 2020, and has 10 | recently passed a security audit, we believe it is time to transition 11 | the default setting. This RFC will propose a set of criteria that we 12 | should ensure are met before we actually make the switch, and a set of 13 | steps to take once we do. Note that this RFC does *not* propose 14 | removing the old backend; it should remain an option that is 15 | selectable both by a runtime "variant" flag, and by a build-time Cargo 16 | feature flag that makes it the default variant again. 17 | 18 | # Motivation 19 | [motivation]: #motivation 20 | 21 | For the past year, we have been developing a [new backend 22 | infrastructure](https://github.com/bytecodealliance/wasmtime/tree/main/cranelift/codegen/src/machinst/) 23 | for Cranelift. Although this infrastructure was initially brought-up 24 | in tandem with our aarch64 backend, we soon began work on a new x86-64 25 | backend in the framework as well, since our long-term plan was to 26 | deprecate and eventually remove the old Cranelift backend design and 27 | related infrastructure. 28 | 29 | For the past year, the current ("legacy" or "old") x86-64 backend has 30 | remained the default for a user of the `cranelift-codegen` crate, or 31 | `wasmtime`, who does not set any non-default options. 
However, we have 32 | ensured that the new backend runs on CI alongside the old one, and it 33 | is available under a feature flag today. 34 | 35 | Work on this new backend has progressed to the point that it is now 36 | stable, generally has good performance, and seems to be generally 37 | easier to work with and maintain. Furthermore, we have gained 38 | confidence in its security stance after a security-focused evaluation. 39 | 40 | Given the increasing confidence in the new x86-64 Cranelift backend, 41 | several significant users of the compiler have switched to the new x86 42 | backend by default. In particular, Lucet ([PR 43 | #646](bytecodealliance/lucet#646)) and rustc\_codegen\_cranelift ([PR 44 | #1127](bjorn3/rustc_codegen_cranelift#/1127) and [PR 45 | #1140](bjorn3/rustc_codegen_cranelift#1140)) have already switched to using the new backend by default (Lucet) or as the only option (cg\_clif). 46 | 47 | Given this setting, we believe it is time to consider switching the 48 | default backend for the `cranelift-codegen` crate, hence for all users 49 | by default; and also, at the same time, for Wasmtime. 50 | 51 | This discussion began in summer 2020 in issue 52 | [#1936](bytecodealliance/wasmtime#1936); this RFC extends @bnjbvr's 53 | initial proposal and excellent fact-finding work in that issue in 54 | order to get final sign-off and finish the transition. 55 | 56 | # Proposal 57 | [proposal]: #proposal 58 | 59 | At a high level, we propose (i) evaluating criteria by which we decide 60 | to make the transition; (ii) obtaining the appropriate sign-offs; and 61 | (iii) transitioning in several steps in a way that is designed to 62 | preserve stability and leave options if anything goes wrong. 63 | 64 | ## Transition Criteria 65 | 66 | 1. *Feature-completeness*: All needed functionality must be present and 67 | all tests must pass. 68 | 69 | - Status: we are largely there already. 
In 70 | bytecodealliance/wasmtime#2718, a trial/draft PR, the default was 71 | switched and all tests that are still applicable (i.e., not testing 72 | specific details of the old backend) are passing. Only 73 | bytecodealliance/wasmtime#2710 needs to land for the new backend to 74 | be fully feature-complete. 75 | 76 | - Note: the new backend does not fully support Wasm-SIMD. However, 77 | the old backend no longer does, either, after the proposal's 78 | recent evolution. Hence, we are willing to accept this 79 | incompleteness because it is not a regression overall. 80 | 81 | 2. *Performance*: The new backend has acceptable compilation speed and 82 | generates acceptably good code. 83 | 84 | - Earlier evaluations showed that we were trending this way. No 85 | recent comprehensive comparison has been done, however. This RFC 86 | proposes to use the 87 | [Sightglass](https://github.com/bytecodealliance/sightglass/) 88 | benchmark suite in order to evaluate both dimensions. 89 | 90 | - Compilation time: we should expect as good or better compilation 91 | time on most benchmarks. We have seen some cases where the register 92 | allocator in the new backend can be slower on very large 93 | inputs. This RFC proposes to balance such slowdowns against the 94 | other benefits of the transition and to cautiously accept a small 95 | budget for some such cases, in tandem with a priority effort to 96 | understand the slowdown and address it soon. 97 | 98 | - Open question: how much degradation should we accept? 99 | 100 | - Runtime: we should expect as good or better code generated on most 101 | benchmarks. 102 | 103 | - Open question: how much degradation should we accept? 104 | 105 | 3. 
*Compatibility*: there should be no known or open blocking issues 106 | with the new backend when used by any significant user; in other 107 | words, we should not break anyone, and if the transition would do 108 | so, we should work with these stakeholder(s) first to work around 109 | the issue. 110 | 111 | - Status: cg\_clif and Lucet have already transitioned to the new 112 | backend. Wasmtime is compatible. Firefox uses Cranelift only on 113 | aarch64, which is already using the new backend framework. 114 | 115 | - Open question: are there other projects with which we should 116 | consult? 117 | 118 | 4. *Clean fuzzing record*: we should transition our fuzzers to use the 119 | new backend exclusively, and wait to ensure that no issues arise. We 120 | are currently fuzzing the new backend in Wasmtime differentially 121 | against an interpreter (`wasmi`), and in the past we have fuzzed it 122 | differentially against the old backend when embedded in 123 | Lucet. However, we have other fuzz targets as well that drive the 124 | compiler in different ways. 125 | 126 | ## Transition Steps 127 | 128 | 1. Switch the fuzzers to use the new backend. Add the appropriate 129 | feature flags to `fuzz/Cargo.toml`; wait and verify that `oss-fuzz` 130 | is running the new builds and that no issues arise. 131 | 132 | - *Open question*: how long should we wait? 133 | 134 | 2. Make a final Wasmtime/Cranelift release with the old backend as 135 | default, to provide the latest possible "new features on old 136 | backend" snapshot. 137 | 138 | 3. Land bytecodealliance/wasmtime#2718, which switches the defaults 139 | and makes the old backend a non-default option. 140 | 141 | - All users of `cranelift-codegen` and `wasmtime` get the new backend by default. 142 | 143 | - Both backends are built into the crate. Any user can 144 | programmatically ask for `BackendVariant::Legacy` when 145 | instantiating the compiler backend. 
146 | 147 | - The filetest infrastructure accepts variants: `target x86_64 148 | legacy` vs. `target x86_64 machinst` in test files, and 149 | instantiates the appropriate backend. 150 | 151 | - By depending on the codegen or wasmtime crate with the 152 | `old-x86-backend` Cargo feature enabled, the old backend becomes 153 | the default again when no backend variant is specified. 154 | 155 | - The old backend continues to be tested on CI in a separate job, 156 | just as the new backend is tested today. 157 | 158 | 4. Actively monitor users and ensure that no breakage occurs. Address 159 | any issues by fixing bugs, implementing any functionality we have 160 | missed, or helping users to select the old backend, if appropriate. 161 | 162 | 5. Make several releases with the new backend as default. Keep the old 163 | backend alive with CI; it will be "fully supported" during this 164 | time, but its use will be deprecated. 165 | 166 | 6. At some future point, when we are confident that no significant 167 | use-cases or dependencies remain, we can remove the old 168 | backend. This will be a separate RFC process with its own 169 | consensus-gathering. 170 | 171 | # Rationale and alternatives 172 | [rationale-and-alternatives]: #rationale-and-alternatives 173 | 174 | This transition's rationale largely derives from the rationale for the 175 | new backend in general: the new design is simpler, more maintainable, 176 | generally more performant, and has a more trustworthy security 177 | stance. Given these benefits, if there are no blocking issues, it 178 | makes sense to switch the default. 179 | 180 | Furthermore, there is a continuing cost to maintaining support for the 181 | old backend. Its presence in the codebase adds complications because 182 | many data-structures and code paths have to account for both 183 | cases. 
When we add features that require backend support, we have to 184 | evaluate a tricky tradeoff question: do we spend time adding the 185 | feature to the old backend as well, given that at some future point it 186 | will be removed? If such work is necessary to support a feature in the 187 | *default* build today, then extra work is required that will 188 | eventually no longer be useful. 189 | 190 | Because of this cost and the tradeoffs of spending more effort on the 191 | old backend, it is already stagnating slightly in terms of feature 192 | support. For example, its SIMD support is not up-to-date with respect 193 | to the latest Wasm SIMD proposal. As other Wasm proposals advance, it 194 | will likely fall further behind and time spent updating it will be 195 | harder and harder to justify. 196 | 197 | The major alternative is simply to do nothing: retain the old backend as 198 | default, and the new backend as a non-default option. This is the status quo and 199 | has the least risk; individual embedders can always enable the new backend if 200 | desired. However, it has the significant downside that it implies ongoing 201 | support for the old backend; in the steady state, this is twice the maintenance 202 | and possibly feature-implementation work for one platform (x86-64), and so we 203 | do not consider this to be a serious option *unless* the new backend shows 204 | serious defects that prevent switching. 205 | 206 | In other words, given the direction in which effort is being, and will be, spent, 207 | and given the effort and maintenance burden of various options in the future, 208 | it seems inevitable that the switch will occur at some point. The 209 | question to answer is whether the new backend is ready *yet* to be the default, 210 | or whether we need to wait longer. 211 | 212 | # Open questions 213 | [open-questions]: #open-questions 214 | 215 | 1.
How much compile-time and runtime performance degradation should we 216 | accept, if any, before moving forward with this transition? 217 | 218 | 2. Which other projects or users should we consult to ensure that we 219 | will not cause unnecessary breakage? 220 | 221 | 3. How long should we allow the new backend to run on all fuzz targets 222 | (not just the new-backend-specific one that already exists) before 223 | moving forward? 224 | 225 | 4. How long should we retain the old backend before starting to 226 | consider its removal? 227 | 228 | 5. Should we audit the old backend's code or tests to see if we are 229 | missing any significant functionality or test coverage that should 230 | be moved over? 231 | -------------------------------------------------------------------------------- /accepted/cranelift-dynamic-vector.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | 3 | This RFC proposes a way to handle flexible vector types as specified at: https://github.com/WebAssembly/flexible-vectors 4 | 5 | [summary]: #summary 6 | 7 | The proposal is to introduce new dynamically-sized vector types into Cranelift that: 8 | - Enable dynamic vector type creation using existing fixed vector types and a dynamic scaling factor. 9 | - The dynamic types, 'dt', express a vector with a lane type and shape, but a target-defined scaling factor. 10 | - Space has been allocated in ir::Type for concrete definitions of these new types. 11 | - The dynamic scaling factor is a global value which is defined by the target. 12 | - We currently only support scaling factors which are compile-time constants. 13 | 14 | # Motivation 15 | [motivation]: #motivation 16 | 17 | Flexible vectors are likely coming to WebAssembly, so we should support the current spec. This is the current path forward to support vectors that are wider than 128-bits.
18 | Rust is also currently starting to use LLVM's ScalableVectorType and, as a target backend, Cranelift could support those directly with a dynamic vector type. 19 | 20 | # Proposal 21 | [proposal]: #proposal 22 | 23 | The following proposal includes changes to the type system, adding specific entities for dynamic stack slots as well as specific instructions that take those entities as operands. 24 | 25 | We can add dynamic vector types by modifying existing structs to hold an extra bit of information to represent the dynamic nature. 26 | 27 | ## Type System 28 | - The new types do not report themselves as vectors, so ty.is\_vector() = false, but are explicitly reported via ty.is\_dynamic\_vector(). 29 | - is\_vector is also renamed to is\_sized\_vector to avoid ambiguity. 30 | - The TypeSet, ValueTypeSet and TypeSetBuilder structs gain an extra NumSet to specify a minimum number of dynamic lanes. 31 | - At the encoding level, space has been taken from the special types range to allow for the new types. Special types now occupy 0x01-0x2f and everything else has moved to fill the space, with dynamic types occupying the end of the range 0x80-0xff. 32 | - These changes allow the usual polymorphic vector operations to be automatically built for the new set of dynamic types. 33 | 34 | ## IR Entities and Function Changes 35 | 36 | A new global value is introduced `dyn_scale` which is parameterized by a base vector type. This global value can then be used to create dynamic types, such as `dt0 = i32x4*gv0`. 37 | 38 | DynamicTypes are created and held like other IR entities, with the function holding a PrimaryMap\. The DynamicTypeData holds the base vector type along with the GlobalValue which is the scaling factor. 
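As a toy model of the entities just described (simplified, hypothetical types, not the real Cranelift definitions), a dynamic type is simply a base vector type paired with the global value holding its scaling factor:

```rust
/// Reference to a global value, e.g. gv0 = dyn_scale for some base type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct GlobalValue(u32);

/// Simplified stand-in for DynamicTypeData: a base (fixed) vector type
/// plus the global value that scales it, e.g. dt0 = i32x4 * gv0.
#[derive(Debug, Clone, PartialEq)]
struct DynamicTypeData {
    base_vector: &'static str, // e.g. "i32x4" (16 bytes)
    dyn_scale: GlobalValue,
}

/// With a compile-time-constant scale, the concrete size in bytes is
/// simply the base vector size multiplied by the scale.
fn concrete_byte_size(base_bytes: u32, scale: u32) -> u32 {
    base_bytes * scale
}
```

For example, an `i32x4` base scaled by a constant factor of 2 occupies 32 bytes.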
39 | 40 | A new entity is added to the IR, the DynamicStackSlot, and the Function will hold these separately from the existing StackSlot entities, also providing two APIs to create them: 41 | - create\_sized\_stack\_slot 42 | - create\_dynamic\_stack\_slot 43 | 44 | Keeping two vectors enables us to continue to use each entity's index as its slot identifier, and allows us to place the entities in different positions in the frame. It also enables the backends to easily track which slots are dynamic, and which are not. DynamicStackSlots are defined differently from existing StackSlots as they are defined with a DynamicType instead of a size, e.g. `dss0 = explicit_dynamic_slot dt0` 45 | 46 | ## Instructions 47 | Three new instructions are added at the IR level to use the new DynamicStackSlot entity: 48 | - DynamicStackAddr 49 | - DynamicStackLoad 50 | - DynamicStackStore 51 | 52 | The primary difference between these operations and their existing counterparts is that they only take a DynamicStackSlot operand, without a byte offset. 53 | 54 | DynamicVectorScale is another instruction introduced, and this enables the materialization of a `dyn_scale` value when used by `globalvalue`. 55 | 56 | ExtractVector is also introduced, currently just for testing; it takes a dynamic vector value and an immediate value as a sub-vector index. This allows us to return a fixed-width value from a test function. 57 | 58 | ## ABI Layer Changes 59 | 60 | - The method stack_stackslot_addr is renamed to sized\_stackslot\_addr. 61 | - The method dynamic\_stackslots\_addr is introduced. 62 | 63 | A key challenge to supporting these new types is that the register allocator expects to be given a constant value for the size of a spill slot, for a given register class. So, the current expectation is that the backends will continue to provide a fixed number, potentially larger than they currently do.
A new method, `vector_scale`, is added to the TargetIsa trait; it returns the largest number of bytes for a given dynamic IR type. This is used by the ABI layer to cache the sizes of all the used dynamic types, the largest of which is used for the spillslot size. The size returned by the Isa is also used to calculate the dynamic stackslot offsets, just as is done for the existing stack slots. This means that the frame layout changes are minimal, just with the dynamic slots appended after the fixed size slots. 64 | 65 | ``` 66 | //! ```plain 67 | //! (high address) 68 | //! 69 | //! +---------------------------+ 70 | //! | ... | 71 | //! | stack args | 72 | //! | (accessed via FP) | 73 | //! +---------------------------+ 74 | //! SP at function entry -----> | return address | 75 | //! +---------------------------+ 76 | //! | ... | 77 | //! | clobbered callee-saves | 78 | //! unwind-frame base ----> | (pushed by prologue) | 79 | //! +---------------------------+ 80 | //! FP after prologue --------> | FP (pushed by prologue) | 81 | //! +---------------------------+ 82 | //! | spill slots | 83 | //! | (accessed via nominal SP) | 84 | //! | ... | 85 | //! | stack slots | 86 | //! | dynamic stack slots | 87 | //! | (accessed via nominal SP) | 88 | //! nominal SP ---------------> | (alloc'd by prologue) | 89 | //! (SP at end of prologue) +---------------------------+ 90 | //! | [alignment as needed] | 91 | //! | ... | 92 | //! | args for call | 93 | //! SP before making a call --> | (pushed at callsite) | 94 | //! +---------------------------+ 95 | //! 96 | //! (low address) 97 | //! ``` 98 | ``` 99 | 100 | # Rationale and alternatives 101 | [rationale-and-alternatives]: #rationale-and-alternatives 102 | 103 | The main change here is the introduction of dynamically created types, using an existing vector type as a base and a scaling factor represented by a global value.
Using a global value fits with clif IR in that we have a value which is not allowed to change during the execution of the function. The alternative is to add types which have an implicit scaling factor, which could make verification more complicated, or impossible. 104 | 105 | The new vector types also aren't comparable to the existing vector types, which means that no existing paths can accidentally try to treat them as such. 106 | 107 | It's possible that the hardware vector length will be fixed, so one alternative would be to generate IR with fixed widths using information from the backend. The one advantage is that we'd not have to add dynamic types in the IR at all. But there are two main disadvantages with this approach: 108 | - In an ahead-of-time setting, Cranelift would not be able to take advantage of larger vectors where the width is implementation defined. It would be possible to target architecture extensions such as Intel's AVX2 and AVX-512, which have static sizes, but not for architectures like Arm's SVE. 109 | - It is also currently undecided whether the flexible vector specification will include operations to set the vector length during program execution, so we shouldn't design out this possibility. 110 | 111 | This doesn't mean that a backend can't select a fixed width during code generation, if desired. The current simd-128 implementations would be able to map the dynamic types directly to their current operations and we could also add a legalization layer for backends which only want to support simd-128, or another fixed size. 112 | 113 | # Open questions 114 | [open-questions]: #open-questions 115 | 116 | - What behaviour would the interpreter have? I would expect it to default to the existing simd-128 semantics. 117 | - Testing is also an issue: is it reasonable to assume that functions under (run)test neither take nor return dynamic vectors? If so, how should the result values be defined and checked against?
I have currently implemented an instruction, extract\_vector, which takes a dynamic vector and an immediate which provides an index to a 128-bit sub-vector. Together with passing scalars as function parameters and splatting them into dynamic vectors, it allows simple testing of lane-wise operations. 118 | -------------------------------------------------------------------------------- /accepted/cranelift-roadmap-2023.md: -------------------------------------------------------------------------------- 1 | # Cranelift Roadmap for 2023 2 | 3 | Following the tradition of start-of-year roadmaps (for 4 | [2021](https://github.com/bytecodealliance/rfcs/pull/8) and 5 | [2022](https://github.com/bytecodealliance/rfcs/pull/18)), this RFC 6 | outlines a collection of ideas and projects that we have on the table 7 | for 2023. The ideas range in certainty from direct, concrete tasks 8 | that follow up on previous work and which we definitely plan to 9 | complete; to projects that we haven't started yet, or have only 10 | started planning, but will definitely try to complete; to some ideas 11 | that are more speculative, but would be very interesting or valuable 12 | if we found a way to tackle them. In the past, we have actually 13 | achieved a good amount of our roadmap (end-of-year posts for 14 | [2021](https://bytecodealliance.org/articles/cranelift-progress-2021) 15 | and 16 | [2022](https://bytecodealliance.org/articles/cranelift-progress-2022)), 17 | and while we hope to be similarly ambitious this year, there are no 18 | guarantees! This document instead should be seen as a collective 19 | braindump of our beliefs regarding productive directions for future 20 | work. 
21 | 22 | Following work in 2022, we have largely completed "core refactorings" 23 | of the compiler: we have our new register allocator, backend, and 24 | mid-end frameworks in place, and we have taken on quite a few 25 | mid-sized projects throughout the stack to achieve some very nice 26 | speedups and correctness improvements, and fill in functionality 27 | gaps. 28 | 29 | Given that, the theme in 2023 should largely be to carry through the 30 | efforts we have to the point of *polished completeness*, and fill in 31 | many of the "nice-to-have" gaps, like up-to-date and thorough 32 | documentation and examples. There are also some good opportunities for 33 | in-depth projects on new features that we are currently missing, but 34 | these are likely to be more scoped and less cross-cutting than the 35 | "core refactorings" above. In other words, the compiler is maturing 36 | (and this is a good thing!); let's work toward furthering that! 37 | 38 | ## Compiler Performance: regalloc2 and SSA 39 | 40 | The first compiler performance-related project we plan to tackle, 41 | early in 2023, is the continuation of our 42 | [regalloc2](https://github.com/bytecodealliance/regalloc2) 43 | compile-time optimization work leveraging 44 | [SSA](https://en.wikipedia.org/wiki/Static_single-assignment_form) 45 | invariants at the input to the register allocator. 46 | 47 | The last phase was completed mostly by a long-running series of 48 | changes to eliminate all non-SSA input (multiple definitions of one 49 | virtual register, or use of special "pinned" virtual registers that 50 | correspond to fixed physical registers), which we have been [checking 51 | in debug 52 | builds](https://github.com/bytecodealliance/wasmtime/pull/5354) and 53 | fuzzing with now for almost two months. 54 | 55 | Now that the input to the allocator is pure SSA, we can greatly 56 | simplify its design, unlocking potentially large compile-time 57 | improvements. 
In brief, without SSA invariants, we can only keep *one 58 | copy* of a virtual register around, because it may be mutated 59 | throughout its lifetime. This one-copy rule complicates the allocator 60 | in many ways: it forces unnecessary "liverange splits" and awkward 61 | constraint rewrites. For example, when a spilled value is used, the 62 | liverange in the spillslot ends, the value moves to a register for one 63 | instruction, then it moves back to another liverange (in either the 64 | same or a different spillslot). If the two spillslot allocations are 65 | the same, and the value-use was a read (use rather than def), the 66 | "redundant move eliminator" can eliminate the store back to the 67 | spillslot. But this represents extra work in several ways: another 68 | liverange to process, and the move-eliminator logic altogether. It 69 | would be better to have a copy in the register and retain one long 70 | spillslot allocation that we only have to process once. With SSA, 71 | inserting moves is easier: we know that the value does not change once 72 | defined, so we can copy from any existing liverange. 73 | 74 | We anticipate that this could result in significant speedups: a 75 | majority of compile-time is usually in regalloc, most of the 76 | allocator's time is spent in the main worklist processing loop that 77 | allocates one bundle (of liveranges) at a time, and when we compare 78 | stats, we see that regalloc2 sometimes has 2-4x as many liveranges as 79 | IonMonkey's allocator on the same Wasm module, due to the above 80 | restrictions. Reducing that number should produce large swings. 81 | 82 | ## Generated-Code Quality (Regular Sprints?) 83 | 84 | Next, we hope to perform an additional pass of generic "generated code 85 | quality" work, driven by profiling and examination of hot code in 86 | benchmarks. 87 | 88 | In the past, we have periodically examined "hot blocks" and, by 89 | reading the disassembly, seen things we could fix. 
Because we don't 90 | always have a regression test for every "good" outcome, *new* 91 | opportunities of this form sometimes appear. When optimizing 92 | generated code with the egraphs framework, for example, I noticed that 93 | some changes we had made in condition (bool) representation resulted 94 | in reification of bools rather than use of the implicit processor 95 | flags; I fixed that 96 | [here](https://github.com/bytecodealliance/wasmtime/pull/5391). A 97 | "compare; set-conditional; compare; branch" sequence is obvious in 98 | disassemblies when compared to the more standard (and expected) 99 | "compare; branch" sequence at the end of a block, but we just hadn't 100 | been checking regularly, and didn't have a test for this (we do now 101 | though!). 102 | 103 | As a lesson from that, we've observed that we should probably make 104 | regular time for "code quality sprints": take some benchmarks, such as 105 | those in Sightglass, profile them, and observe the hottest basic 106 | blocks' code. If we don't see anything obviously wrong or ripe for 107 | improvement, all the better; perhaps we can use the opportunity to 108 | write some new regression tests for the code translations that *did* 109 | work well. But it's likely that if we look again, after some time, 110 | we'll find a handful of small speedups. Individually these can be 111 | niche, but they add up, and a mature compiler requires persistence in 112 | this kind of work to achieve consistently good code! 113 | 114 | ## Correctness via Formal Verification 115 | 116 | We plan to continue with our formal-verification collaboration 117 | in 2023.
This collaboration with several academics (@avanhatt, 118 | @mpardesh, @mlfbrown, @asampson) builds on ISLE's 119 | formal-verification focus: by writing our compiler's rewrite rules in 120 | a DSL, we enable automated analysis by translating the rules into 121 | other forms (for example, SMT clauses that link semantic descriptions 122 | of CLIF and of a machine ISA). The work will be described in more 123 | detail elsewhere, when ready, but is already achieving some noteworthy 124 | milestones: our collaborators' system was recently able to 125 | [independently find a bug in a 16-bit cls 126 | lowering](https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/ISLE.20verification.20found.20previously.20reported.20bug/near/320792613) 127 | that had previously been discovered and fixed. This is exciting, and 128 | within the Cranelift project we plan to do whatever is needed to 129 | facilitate a full verification of our backends (starting with their 130 | aarch64 focus) and integrate their work into our tree. 131 | 132 | The initial verification project has been scoped to ensure 133 | feasibility: backend rules only (not mid-end), for aarch64 only, CLIF 134 | opcodes used by Wasm lowering only, and integer only. In due time, we 135 | hope that we can relax each of these. Our collaborators have looked 136 | into some x86-64 semantics that might be used to verify our x64 137 | backend as well. Verifying floating-point lowerings is in some cases 138 | (when it's not a trivial `fadd`-lowers-to-`fadd` rule) an open 139 | research problem, but this would be great to have. And, lastly, one of 138 | the reasons for our use of ISLE in the egraphs mid-end optimization 141 | framework was to enable eventual verification of CLIF-to-CLIF rewrite 142 | rules as well. 
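To give a concrete flavor of what gets verified, here is a small, hypothetical ISLE-style rewrite of the kind such a verifier would check (illustrative only, not a rule verbatim from the tree), claiming that adding a zero constant is a no-op:

```lisp
;; Hypothetical, simplified mid-end rule: x + 0 ==> x
(rule (simplify (iadd ty x (iconst ty (u64_from_imm64 0))))
      x)
```

The verification approach translates both sides of such a rule into SMT bitvector terms and asks a solver whether any input can distinguish them; an `unsat` answer means no counterexample exists at that bit width:

```
; Sketch of the corresponding SMT-LIB obligation for a 64-bit type:
(declare-const x (_ BitVec 64))
(assert (distinct (bvadd x (_ bv0 64)) x))
(check-sat) ; unsat: the rewrite preserves semantics
```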
143 | 144 | ## Tail Calls 145 | 146 | We have [begun planning support for tail 147 | calls](https://github.com/bytecodealliance/rfcs/pull/29), and plan to 148 | implement this fully in the first part of 2023. A "tail call" is a 149 | call from one function to another in "tail position": that is, the 150 | last thing that a function does before returning, such that the return 151 | value of the callee becomes the return value of the caller. In this 152 | case, there is nothing left to do in the caller, so we can actually 153 | remove the stackframe entirely before *jumping* to the callee. Once we 154 | have tail-call support, we have a generic way of implementing 155 | arbitrary control flow between functions. As a result, tail calls are 156 | the foundational form of control flow in functional languages such as 157 | [Scheme](https://en.wikipedia.org/wiki/Scheme_(programming_language)), 158 | and can be used as a very convenient building block by a compiler when 159 | *targeting* a language as well. There is a [Wasm proposal for tail 160 | calls](https://github.com/webassembly/tail-call) that recently 161 | advanced to "stage 4" (the stage just before full acceptance into the 162 | standard) so there is pressure from Wasmtime to implement tail-call 163 | support in Cranelift. Once we have it, it will undoubtedly see use in 164 | many different ways. 165 | 166 | ## Debugging 167 | 168 | We have long had *partial* support for debugging in Cranelift: in 169 | particular, the compiler understands "debug value labels" applied to 170 | CLIF values, and provides metadata in return with the compiled machine 171 | code to indicate where in the machine state these values reside. The 172 | embedder can use this to implement higher-level debug observability: 173 | for example, Wasmtime translates Wasm-DWARF debug info to native DWARF 174 | so that a debugger attached to the Wasmtime process can step through 175 | and view local variables and memory state at the Wasm level. 
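To make the mechanism concrete, here is a rough sketch of the shape of that metadata, using hypothetical types (the actual Cranelift/Wasmtime types differ): the compiler hands back ranges of machine-code offsets over which a labeled value can be found in a particular register or stack slot, and the embedder queries that table when the debugger stops at some program counter.

```rust
// Hypothetical sketch of debug value-location metadata; the real types
// in Cranelift/Wasmtime differ, but the shape of the query is the same.
#[derive(Debug, PartialEq)]
enum ValueLoc {
    Reg(u8),        // value lives in machine register N
    StackSlot(i32), // value lives at this offset from the frame base
}

struct ValueLabelRange {
    label: u32, // the CLIF debug value label
    start: u32, // half-open range of machine-code offsets [start, end)
    end: u32,
    loc: ValueLoc,
}

/// Where does `label` live when execution is stopped at code offset `offset`?
fn lookup(ranges: &[ValueLabelRange], label: u32, offset: u32) -> Option<&ValueLoc> {
    ranges
        .iter()
        .find(|r| r.label == label && r.start <= offset && offset < r.end)
        .map(|r| &r.loc)
}

fn main() {
    let ranges = [
        // Label 0 starts in register 3, then is spilled at offset 16.
        ValueLabelRange { label: 0, start: 0, end: 16, loc: ValueLoc::Reg(3) },
        ValueLabelRange { label: 0, start: 16, end: 40, loc: ValueLoc::StackSlot(-8) },
    ];
    assert_eq!(lookup(&ranges, 0, 4), Some(&ValueLoc::Reg(3)));
    assert_eq!(lookup(&ranges, 0, 20), Some(&ValueLoc::StackSlot(-8)));
    assert_eq!(lookup(&ranges, 0, 64), None); // value not observable here
    println!("ok");
}
```

A debugger integration consults such a table at each stop to render the value of a source-level variable; translating Wasm-DWARF to native DWARF then amounts to re-emitting these locations in DWARF's own encoding.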
176 | 177 | While this is nice to have, it has some longstanding issues, and no 178 | one has been able to dedicate the time to give this part of the 179 | codebase the attention it deserves. For example, [this failing 180 | assert](https://github.com/bytecodealliance/wasmtime/issues/4669) is 181 | reported semi-regularly; while the actual failure is in Wasmtime DWARF 182 | code, it's unclear where the root cause lies. At the Cranelift level, 183 | we sometimes [lose observability of 184 | locals](https://github.com/bytecodealliance/wasmtime/issues/3884) and 185 | [globals](https://github.com/bytecodealliance/wasmtime/issues/3439) 186 | more often than is strictly necessary; more engineering effort here would help. 187 | 188 | We have a general idea of what is needed to make the native-DWARF 189 | debugging experience more robust (e.g., [this 190 | refactor](https://github.com/bytecodealliance/wasmtime/issues/5537) on 191 | the Wasmtime side, and whatever fixes it may need on the Cranelift 192 | side). Beyond that, we have a number of ideas for debugging support at 193 | different semantic levels, and hope to form a [Debugging 194 | SIG](https://github.com/bytecodealliance/governance/pull/26) to 195 | develop these ideas further. 196 | 197 | ## Exception-Handling 198 | 199 | We have wanted to implement [exception-handling 200 | support](https://github.com/bytecodealliance/wasmtime/issues/2049) for 201 | some time. At the Cranelift level, this requires changes to how the 202 | compiler reasons about control flow (to account for exceptional edges) 203 | and mechanisms to produce the metadata needed by the unwinder as it 204 | searches for handlers (catch-points). Such support would be 205 | immediately useful to multiple embedding use-cases: the [Wasm 206 | exception-handling 207 | proposal](https://github.com/WebAssembly/exception-handling) will 208 | require it, and so will proper unwind support in cg\_clif. 
Any other 209 | language implementation built on Cranelift that requires zero-cost 210 | exception unwinding could make use of this feature as well. 211 | 212 | ## Super-optimization (Achieving the Peepmatic Dream) 213 | 214 | Now that we've 215 | [implemented](https://github.com/bytecodealliance/wasmtime/pull/5382) 216 | and [turned on by 217 | default](https://github.com/bytecodealliance/wasmtime/pull/5587) the 218 | egraph-based mid-end optimization framework, which allows us to 219 | optimize expressions in the CLIF (program IR) by writing simple 220 | rewrite rules in ISLE, we have several significant plans for using it 221 | more fully. 222 | 223 | One large project we've been hoping to return to for a while is the 224 | use of 225 | [superoptimization](https://en.wikipedia.org/wiki/Superoptimization) 226 | to generate a large body of fruitful optimization/rewrite rules ahead 227 | of time. The general idea here is that a compiler embodies domain 228 | knowledge about the "best" way to achieve certain computations; we 229 | should be able to precompute this domain knowledge by starting with 230 | definitions of our building blocks (instructions or operators) and 231 | doing a brute-force search. It doesn't matter too much how much time 232 | this takes, because we only do it once, offline, to generate rules, 233 | and then we check them in as part of the Cranelift source code. 234 | 235 | We had previously worked toward this idea with 236 | [Peepmatic](https://github.com/bytecodealliance/wasmtime/tree/dba74024aa412f284871375db292c1bf9079d769/cranelift/peepmatic), 237 | and so in a very real way, the idea here is to "pursue the Peepmatic 238 | dream". We believe we can reuse much of the infrastructure (e.g., 239 | tools to feed Cranelift program fragments into 240 | [Souper](https://github.com/google/souper), a superoptimizer) and are 241 | also interested to see what scaling challenges a large body of rules 242 | may bring to the mid-end framework. 
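As a toy illustration of the brute-force idea (this shows only the principle, not Souper's or Peepmatic's actual machinery), one can take a specification, enumerate a space of cheaper candidate computations, and keep those that agree with the specification on many test inputs as *candidate* rewrite rules, to be proven correct afterwards:

```rust
// Toy superoptimizer sketch: search a tiny candidate space for cheaper
// equivalents of `spec` (here, x * 2). Testing on samples only *filters*
// candidates; a real pipeline would then prove survivors with a solver.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Shl1,     // x << 1
    Add1,     // x + 1
    Neg,      // -x (wrapping)
    Identity, // x
}

fn eval(op: Op, x: u64) -> u64 {
    match op {
        Op::Shl1 => x << 1,
        Op::Add1 => x.wrapping_add(1),
        Op::Neg => x.wrapping_neg(),
        Op::Identity => x,
    }
}

/// The "expensive" computation we want a cheaper form of.
fn spec(x: u64) -> u64 {
    x.wrapping_mul(2)
}

/// Return every candidate that matches `spec` on all sample inputs.
fn find_equivalent(samples: &[u64]) -> Vec<Op> {
    let candidates = [Op::Shl1, Op::Add1, Op::Neg, Op::Identity];
    candidates
        .iter()
        .copied()
        .filter(|&op| samples.iter().all(|&x| eval(op, x) == spec(x)))
        .collect()
}

fn main() {
    let samples = [0, 1, 2, 3, 7, 0xdead_beef, u64::MAX];
    let rules = find_equivalent(&samples);
    // Only `x << 1` survives: a candidate rule `x * 2 ==> x << 1`.
    assert_eq!(rules, vec![Op::Shl1]);
    println!("candidate rules: {:?}", rules);
}
```

A real search would enumerate expression trees up to some size and cost budget, and surviving candidates would be verified with an SMT solver before being committed as checked-in rules.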
243 | 244 | ## Compiler Stats Framework 245 | 246 | We would like to build a "statistics framework" for the compiler that 247 | aggregates data about the compilation process in order to help with 248 | debugging and performance tuning, and in general to improve compiler 249 | observability. The basic idea is to instrument the various passes and 250 | analyses of the compiler with updates to stats-counters for particular 251 | "events" or cases -- for example, every time a redundant instruction is 252 | removed by GVN, or a rewrite rule fires, or a register is spilled -- 253 | as well as measures of various data-structure and code sizes (number 254 | of basic-blocks, instructions, register liveranges, etc). 255 | 256 | Such data is very useful if it helps to narrow the potential causes of 257 | an issue or of surprising performance. For example, when investigating 258 | the compile-time impact of the register allocator, a very high counter 259 | for "liverange splits" might strongly hint at what is going 260 | wrong. Especially if there is a correlation between some statistic and 261 | runtime or compile time, we might be able to quickly zero in on 262 | performance bugs or places we could improve. 263 | 264 | Objective measures like event-counts (e.g., number of register spills) 265 | are also *extremely* useful for showing the effect of compiler 266 | changes. They are in some sense the opposite of an "end-to-end" 267 | measure like compile time or runtime. That is, these objective stats 268 | are *deterministic* (identical every time for a given input), 269 | *transparent* (if spill-count increases by one, we can actually 270 | examine the one new spill) and *specific* (if spills increase, we know 271 | the issue is with register allocation, or something that affects it 272 | like register pressure). 
In contrast, end-to-end measures are *subject 273 | to noise*, *opaque* and *often not actionable* (if measurements show 274 | we got 0.1% slower, why and how do we fix it?). 275 | 276 | We already have ad-hoc forms of this in the [egraph 277 | framework](https://github.com/bytecodealliance/wasmtime/blob/20a216923b5d6bc129935ad56de1ca9ccea38949/cranelift/codegen/src/egraph.rs#L593-L612) 278 | and 279 | [regalloc2](https://github.com/bytecodealliance/regalloc2/blob/376294e828bd252aed717b6864fdcf9d637f23c8/src/ion/data_structures.rs#L658-L693). They 280 | are not passed to the user in any systematic way, but rather just 281 | printed to the log output as trace-level messages. We could start by 282 | building a central framework that collects these numbers and then 283 | provides them programmatically per compiled function. We could then 284 | build *tooling* that allows printing stats for every compiled function 285 | in a module (Wasm or CLIF), selecting functions with particularly 286 | abnormal stats (e.g., "find me the function with the most register 287 | spills / largest stack frame"), and *diffing* stats between runs. 288 | 289 | ## Support the Development of Winch (Baseline Compiler) 290 | 291 | The [Winch](https://github.com/bytecodealliance/rfcs/pull/28) baseline 292 | compiler project aims to provide an alternate backend (replacing 293 | Cranelift) for the Wasmtime engine. While not within the scope of 294 | Cranelift (hence this RFC) directly, the project *is* reusing pieces 295 | of Cranelift. In particular, the back end of Cranelift -- the 296 | `MachInst` struct for each ISA, its `emit` method, and the 297 | `MachBuffer` -- form a reasonably usable assembler library, and Winch 298 | has chosen to build on top of these abstractions to share effort with 299 | us. 
This is mutually beneficial: not only does the baseline compiler 300 | project get a large head-start, but the work has helped to clarify a 301 | layering separation in Cranelift and will eventually drive more 302 | refactoring to cleanly split this layer off. In the fullness of time, 303 | this may be generally usable as a machine-code emission library. In 304 | this RFC, we propose simply to dedicate the time to abstracting out, 305 | exporting, or otherwise providing the pieces that Winch needs to 306 | succeed. 307 | 308 | ## Miscellaneous Refactors and Improvements 309 | 310 | ### Legalizations in Mid-end Framework 311 | 312 | A longstanding, slightly annoying aspect of our reworked backend 313 | framework is that while *most* of the instruction selection and 314 | lowering happens in a single pass while we generate VCode, there are 315 | still some in-place mutations that happen beforehand. The 316 | [legalizer](https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/codegen/src/legalizer/mod.rs) 317 | performs this rewrite with handwritten pattern-matching code -- 318 | perhaps the last place this kind of code still exists in Cranelift. We 319 | should migrate these to the mid-end framework. This should not be too 320 | complex, though it does require some thought about what to do in 321 | non-optimizing mode (when the egraph pass doesn't run): perhaps there 322 | can be a lightweight egraph-pass mode for "necessary" 323 | rewrites. Building this would then give us another tool to use to 324 | factor out common logic from backends: instead of duplicating it per backend, "narrowing" (e.g., 325 | from i128 to i64 values) can be done by one machine-independent set of 326 | rules. 327 | 328 | ### Other Fuzzing and Translation Validation? 329 | 330 | In addition to the big-hammer "formal verification" project described 331 | above, we should continue to plan and think of ways to fuzz and 332 | validate correctness. 
For example, we have some ideas on a kind of 333 | [translation validation for ISLE rule 334 | compilation](https://github.com/bytecodealliance/wasmtime/pull/5435#pullrequestreview-1228022447), 335 | to ensure that `islec` generates Rust code that faithfully implements 336 | our validated rules (though in that specific case the idea may need a 337 | little more thought to get to a practical effort/feasibility/benefit 338 | tradeoff point). When we eventually [update and improve debugging 339 | support](https://github.com/bytecodealliance/wasmtime/issues/5537) 340 | (see above), one way of validating our handling of DWARF debug info is 341 | to differentially fuzz against another debugger step-by-step. And so 342 | on: there are likely many ways that we could build custom 343 | oracles/checkers and validate compiler-specific properties, making 344 | good use of the fuzzing time that we have. 345 | 346 | ## Documentation 347 | 348 | Finally, we should make a concerted effort in 2023 to improve our 349 | project documentation, in at least three different ways. 350 | 351 | ### Cranelift-Producer Examples (Toy Language) 352 | 353 | We currently have fairly limited information available to new 354 | Cranelift embedders. Ideally, the codegen library should be easy to 355 | pick up and use, with examples to follow showing how to get started, 356 | generate common kinds of control flow, memory access, calls to and 357 | from host code, and the like. It may be easiest to show all of this in 358 | the context of a toy language example implementation. Regardless of 359 | the format, we owe our potential users better than the raw Rust API 360 | documentation and barebones examples we have today. 
This carries 361 | benefits back to the project, too, indirectly: making Cranelift easier 362 | to pick up and adopt will lead to a greater number of users, which 363 | provides us a more diverse and useful stream of feedback and a larger 364 | pool of engineers who have a vested interest in seeing the compiler 365 | improve. 366 | 367 | ### Cranelift Porting (How to Write a Backend) 368 | 369 | We have had various inquiries over the past several years regarding 370 | how to port Cranelift to target a new architecture. Our canonical 371 | answer so far has been to refer to the other backends. This is only 372 | barely adequate! We did receive a [very welcome contribution of our 373 | RISC-V 374 | backend](https://github.com/bytecodealliance/wasmtime/pull/4271), and 375 | it is to the credit of the developer of that PR that it was built 376 | without good documentation on our part. However, it would be far 377 | better if we could provide a "How to Write a Backend" document. Note, 378 | also, that while the *direct* audience of such a document is fairly 379 | small, its indirect effect as a forcing-function for proper 380 | documentation of backend architecture will also be quite helpful. 381 | 382 | ### Documentation Update Pass 383 | 384 | Finally, we have a wide variety of documentation written over the 385 | lifetime of the Cranelift project, and not all of it is up-to-date 386 | with respect to the latest compiler design. We have been fairly 387 | accepting of large compiler changes: a new backend framework, changes 388 | to IR types (bools, flags, endianness) and instructions (branches), 389 | the way optimizations are written, etc. While our current state is an 390 | excellent starting-point, we should make a *full pass* over all 391 | documents (standalone files and doc-comments) to ensure they are 392 | up-to-date, accurate, and complete. 
393 | 394 | ## Acknowledgments 395 | 396 | This RFC contains ideas from discussions with Jamey Sharp, Trevor 397 | Elliott, Nick Fitzgerald, Andrew Brown, and others. Thanks to all of 398 | the contributors for the regular, very stimulating and thoughtful 399 | project meetings and the resulting inspiration as well. 400 | -------------------------------------------------------------------------------- /accepted/pulley.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | [summary]: #summary 3 | 4 | Introduce Pulley — a portable, optimizing interpreter — to Wasmtime. 5 | 6 | # Motivation 7 | [motivation]: #motivation 8 | 9 | 10 | 11 | 12 | At present, if you want to use Wasmtime, you are limited to the ISAs that 13 | Cranelift targets.[^or-winch] That means your options are currently `aarch64`, 14 | `riscv64`, `s390x`, and `x86-64`. If you want to use Wasmtime on a 32-bit 15 | architecture, for example, you're currently out of luck. Your only option would 16 | be to add a new Cranelift backend for the ISA, which is a fairly involved task, 17 | especially for someone who isn't already familiar with Cranelift. 18 | 19 | [^or-winch]: Or targets where Winch has a backend, but currently this is a 20 | strict subset of the targets that Cranelift supports. 21 | 22 | **Adding a portable interpreter to Wasmtime means that you will be able to take 23 | Wasmtime anywhere that you can compile Rust.** However, you will not have the 24 | near-native execution speeds you can expect from an optimizing Wasm compiler 25 | targeting native code. Therefore, **a secondary goal is execution speed**, to 26 | minimize that performance gap as much as possible. Finally, the introduction of 27 | an interpreter **should not change Wasmtime's public API, nor radically impact 28 | existing Wasmtime internals**. 
Ideally, this is just a new option on 29 | `wasmtime::Config` next to Cranelift and Winch, and the rest of the public API 30 | otherwise remains as it is today. And ideally, other than the introduction of 31 | the interpreter itself, the rest of Wasmtime's internals are unaffected. 32 | 33 | ## Non-Goals 34 | 35 | It is also worth mentioning some common goals for interpreters that are 36 | explicitly *not* goals for this effort, or at least are not things that we would 37 | sacrifice our main goals for: 38 | 39 | * **Being a simple, obviously-correct, reference interpreter:** Wasm already has 40 | a fantastic reference interpreter written in a simple and direct style that 41 | closely matches the language specification and can be used, for example, as an 42 | authoritative differential fuzzing oracle. Furthermore, Cranelift already has 43 | an interpreter for its intermediate representation for use with testing and 44 | fuzzing. Taking this implementation approach for Pulley would be redundant and 45 | would also limit us with regards to our secondary goal of execution speed. 46 | 47 | That said, we will, of course, go to great lengths to ensure safety and 48 | correctness, but will rely primarily on our existing, extensive fuzzing and 49 | testing infrastructure for these properties rather than on naivety of 50 | implementation. 51 | 52 | * **Minimizing the duration between first receiving a Wasm binary to the moment 53 | execution begins:** To achieve the lowest-latency start up, an [in-place 54 | interpreter][wizard] is required. However, Wasm, as a bytecode format, is not 55 | conducive to efficient interpretation. It is a stack-based format, which 56 | allows for compact encoding (useful when transferring Wasm binaries over the 57 | wire) but also requires more opcodes to perform the same operation than a 58 | register-based format does. 
And unfortunately, the expensive part of an 59 | interpreter is its opcode-switch loop, and doing less work per opcode 60 | magnifies that loop's relative cost. Therefore, to advance our 61 | secondary goal of execution performance, we will not make Pulley an in-place 62 | interpreter that prioritizes fast start up. 63 | 64 | However, once the Wasm module is pre-processed, we can expect the same [~5 65 | microsecond instantiation times that Wasmtime provides for pre-compiled Wasm 66 | modules][fast-instantiation]. The Wasm can be pre-processed offline and 67 | ahead-of-time and then sent to the device that will actually execute the Wasm 68 | module, after which it can enjoy both low-latency instantiation as well as 69 | improved execution throughput. 70 | 71 | * **Dynamically "tiering up" from the interpreter to the compiler:** As a follow 72 | up to the last non-goal, because this interpreter is not designed for 73 | minimizing start up times, it is also not intended to be used as an initial 74 | tier for starting Wasm execution quickly while waiting for our optimizing 75 | compiler to finish compiling the Wasm in the background, and then dynamically 76 | switching execution from the interpreter to the compiled code, once it's 77 | ready. 78 | 79 | * **Being the very fastest interpreter in the world:** Finally, we will not aim 80 | to create the world's very fastest interpreter; doing so involves 81 | [implementing the interpreter's opcode-switch loop in hand-written 82 | assembly][luajit-comment]. Writing assembly directly conflicts with our 83 | primary motivation: portability. That said, you can [achieve codegen that is 84 | very close to that ideal, hand-written assembly via tail 85 | calls][musttail]. Unfortunately, the Rust language has not stabilized its 86 | `become` feature for guaranteed tail calls. 
Therefore, until Rust stabilizes 87 | that feature, we are limited to relying on LLVM to recognize our opcode-switch 88 | loop's code pattern and clean it up into something similar to what it would 89 | have produced if we had used tail calls. Some initial exploratory 90 | investigation has shown that LLVM is at least capable of doing that. We can 91 | also experiment with the combination of macros and cargo features to choose 92 | between the unstable `become` tail calls or the stable `loop { match opcode { 93 | .. } }` at compile time without changing our interpreter's source. 94 | 95 | # Proposal 96 | [proposal]: #proposal 97 | 98 | 99 | 100 | 101 | 102 | 103 | As Wasm interpreters start trying to improve execution speed, and after they've 104 | already optimized the opcode-switch loop as much as they can, they tend to move 105 | away from Wasm's stack-based bytecode and adopt an internal, register-based 106 | bytecode. When given a Wasm program, they translate it into this internal 107 | bytecode before actually interpreting it. Once this translation step exists, 108 | they start adding peephole optimizations and a simple register allocator to 109 | generate better internal bytecode. They start adding "super-instructions" 110 | (sometimes called "macro ops") that perform the work of multiple Wasm 111 | instructions in a single internal instruction, amortizing the expensive 112 | per-opcode decoding and branch misprediction overheads that plague 113 | interpreters. Although this pipeline still ends with an interpreter loop, the 114 | front half of it looks very much like an optimizing compiler. 115 | 116 | The Wasmtime project already leverages an optimizing compiler: 117 | Cranelift. 
Cranelift has sophisticated mid-end optimizations (the e-graphs 118 | mid-end and its associated rewrites), a robust register allocator (`regalloc2`), 119 | and an excellent DSL for matching multiple input instructions and lowering them 120 | to a single, complex output instruction (ISLE). It has global value numbering, 121 | loop-invariant code motion, redundant load elimination, store-to-load 122 | forwarding, dead-code elimination, and more. What was present in that optimized 123 | interpreter's pipeline, but which Wasmtime and Cranelift are missing, is a 124 | portable internal bytecode that can be interpreted on any architecture that the 125 | Rust compiler can target. 126 | 127 | **This RFC proposes that we define a new, low-level bytecode designed for fast 128 | interpretation, add a Cranelift backend to target this low-level bytecode, and 129 | finally implement a portable interpreter for that bytecode.** This lets us 130 | leverage all our existing Cranelift optimizations — not reimplementing a 131 | subset of them in a custom, ad-hoc, Wasm-to-bytecode translator — while 132 | still ultimately resulting in a portable, internal bytecode for 133 | interpretation. We reuse shared foundations, minimizing maintenance burden, and 134 | end up with something that is both portable and which should produce 135 | high-quality bytecode for interpretation. 136 | 137 | What follows are some general, incomplete, and sometimes-conflicting principles 138 | we should try and follow when designing the bytecode format and its interpreter: 139 | 140 | * The bytecode should be simple and fast to decode in software. For example, we 141 | should avoid overly-complicated bitpacking, and only reach for that kind of 142 | thing when benchmarks and profiles show it to be of benefit. 143 | 144 | * The interpreter should be able to avoid materializing `enum Instruction { 145 | .. }` values, and instead decode immediates and operands as needed in each 146 | opcode handler. 
147 | 148 | * Because we aren't materializing `enum Instruction { .. }` values, we don't 149 | have to worry about unused padding or one large instruction inflating all 150 | small instructions, and so we can lean into a variably-sized encoding where 151 | some instructions are encoded with a single byte and others with many. This 152 | helps us keep the bytecode compact and cache-efficient. 153 | 154 | * We should lean into defining super-instructions. ISLE's pattern matching makes 155 | finding and taking advantage of opportunities to emit super-instructions easy, 156 | and the more we do in each turn of the interpreter loop the less we are 157 | impacted by its overhead. 158 | 159 | * We should not define the bytecode such that multiple branches are required to 160 | handle a single instruction. For example, we should *not* copy how many of 161 | Cranelift's `MachInst`s are defined where there are nested enum types: 162 | 163 | ```rust 164 | enum Inst { 165 | AluOp { 166 | opcode: AluOpcode, 167 | .. 168 | }, 169 | Load { amode: AddrMode, .. }, 170 | .. 171 | } 172 | 173 | enum AluOpcode { Add, Sub, And, .. } 174 | 175 | enum AddrMode { 176 | RegOffset { base: Reg, offset: i32 }, 177 | RegShifted { base: Reg, index: Reg, shift: u8 }, 178 | .. 179 | } 180 | ``` 181 | 182 | This would require branching on an `Inst` to discover that it is an 183 | `Inst::AluOp` and then branching again on the `AluOpcode` variant to find an 184 | `AluOpcode::Add`, or similarly branching from an unknown `Inst` to an 185 | `Inst::Load` and then again from an unknown `AddrMode` to an 186 | `AddrMode::RegOffset`. Branches are expensive in interpreters since many Wasm 187 | program locations map to the same native code location inside the core of the 188 | interpreter, obfuscating patterns from the branch predictor. 189 | 190 | Instead we should do the moral equivalent of the following: 191 | 192 | ```rust 193 | enum Inst { 194 | AluAdd { .. }, 195 | AluSub { .. }, 196 | AluAnd { .. 
}, 197 | LoadRegOffset { base: Reg, offset: i32 }, 198 | LoadRegShifted { base: Reg, index: Reg, shift: u8 }, 199 | .. 200 | } 201 | ``` 202 | 203 | With this approach, each opcode handler is branch-free and the only branches 204 | are from an unknown opcode to its handler. 205 | 206 | * We should structure the innermost interpreter opcode-switch loop such that we 207 | can coerce LLVM to produce code like the following pseudo-asm for each opcode 208 | handler: 209 | 210 | ```python 211 | handler: 212 | # Upon entry, the PC is pointing at this handler's opcode. 213 | # 214 | # First, decode immediates and operands, advancing the interpreter's PC as 215 | # we do so. 216 | operand = load pc+1 217 | immediate = load pc+2 218 | 219 | # Next, perform the actual operation. 220 | do_opcode operand, immediate 221 | 222 | # Finally, advance the pc, read the next opcode, and branch to its handler. 223 | pc = add pc, 3 224 | next_opcode = load pc 225 | next_handler = handler_table[next_opcode] 226 | jump next_handler 227 | ``` 228 | 229 | This results in a minimal number of branches and gives the branch predictor a 230 | tiny bit more context to make its predictions with, since each handler has its 231 | own copy of the decode-next-opcode-and-jump-to-the-next-handler sequence. 232 | 233 | We should ideally avoid each handler branching back to the top of a loop which 234 | then branches again on the PC's current opcode, as this introduces multiple 235 | branches and obfuscates patterns from the branch predictor. 236 | 237 | Of course, this is going to be on a best-effort basis, largely relying on 238 | playing nice with LLVM's optimizer, since we aren't writing assembly by hand 239 | and can't use tail calls yet. 240 | 241 | * We should implement a disassembler for the bytecode format so that we can 242 | integrate the new backend with Cranelift's existing filetests. Ideally this 243 | disassembler is automatically generated from the bytecode definition. 
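To tie several of these principles together, here is a minimal sketch in stable Rust with hypothetical opcodes (not Pulley's actual format): a variably-sized encoding, operands decoded inline in each handler with no intermediate `enum Instruction` value, and a flat opcode space so each instruction needs only one dispatch branch. This is the `loop { match opcode { .. } }` form that, as noted above, we rely on LLVM to optimize:

```rust
use std::convert::TryInto;

// Hypothetical bytecode: 1-byte opcodes followed by operands.
//   HALT:              no operands
//   ADD  dst, a, b:    three 1-byte register indices
//   ADDI dst, a, imm:  two 1-byte registers + 4-byte LE immediate
const HALT: u8 = 0;
const ADD: u8 = 1;
const ADDI: u8 = 2;

fn run(code: &[u8], regs: &mut [u64; 16]) {
    let mut pc = 0;
    loop {
        match code[pc] {
            HALT => return,
            ADD => {
                // Operands are decoded straight out of the byte stream;
                // no `enum Instruction` value is ever materialized.
                let (dst, a, b) = (code[pc + 1], code[pc + 2], code[pc + 3]);
                regs[dst as usize] = regs[a as usize].wrapping_add(regs[b as usize]);
                pc += 4;
            }
            ADDI => {
                let (dst, a) = (code[pc + 1], code[pc + 2]);
                let imm = u32::from_le_bytes(code[pc + 3..pc + 7].try_into().unwrap());
                regs[dst as usize] = regs[a as usize].wrapping_add(imm as u64);
                pc += 7; // variably-sized: this instruction is 7 bytes
            }
            op => panic!("bad opcode {}", op),
        }
    }
}

fn main() {
    let mut regs = [0u64; 16];
    // r1 = r0 + 5; r2 = r1 + r1; halt
    let code = [ADDI, 1, 0, 5, 0, 0, 0, ADD, 2, 1, 1, HALT];
    run(&code, &mut regs);
    assert_eq!(regs[2], 10);
    println!("r2 = {}", regs[2]);
}
```

With guaranteed tail calls, each arm would instead end by decoding the next opcode and jumping directly to its handler, as in the pseudo-asm sketch above.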
244 | 245 | # Rationale and alternatives 246 | [rationale-and-alternatives]: #rationale-and-alternatives 247 | 248 | 249 | 250 | 251 | * Instead of creating our own bytecode format, we could reuse an existing 252 | format, or even interpret a real ISA. SpiderMonkey, for example, has an 253 | ARM32 interpreter that it uses for testing. However, if we do not define our 254 | own bytecode, then we lose flexibility. For example, we cannot define our own 255 | super-instructions for common sequences of operations that we recognize. Nor 256 | can we define the bytecode's encoding and make sure that the overhead of 257 | decoding instructions in software is as low as possible. 258 | 259 | * We could lean into adding Winch backends for new ISAs, instead of adding an 260 | interpreter, as a simpler per-target alternative to writing new Cranelift 261 | backends. However, Winch and Cranelift share a fair amount of code 262 | for e.g. encoding machine instructions, and it isn't clear that adding a new 263 | Winch backend is actually any easier than adding a new Cranelift backend. It 264 | is also still work that will need to be done on a per-target basis. In 265 | contrast, adding an interpreter gives us support for everything that `rustc` 266 | can target with a fixed amount of effort, and without any new per-target 267 | implementation costs on our end. 268 | 269 | * Instead of using a side table mapping each opcode to its associated handler, 270 | we could have the handler's function pointer itself be the opcode and jump 271 | directly to the next instruction's handler, avoiding an indirection. However, 272 | this inflates the bytecode size (an opcode is a pointer-sized value instead of 273 | a single byte, possibly longer for rarer operations due to a variably-sized 274 | encoding) which adds additional cache pressure. Additionally, it makes 275 | serializing the bytecode for use in another process difficult, since the 276 | opcode changes with ASLR. 
This latter hurdle is surmountable: we could define 277 | relocations for each instruction in the bytecode, but this would add 278 | complexity and slow down start-up times, since many relocations would need to 279 | be resolved when loading a bytecode module. All those relocations would force 280 | the whole module to be mapped into memory as well. 281 | 282 | This is the approach that JavaScriptCore's Low-Level Interpreter (LLInt) 283 | previously took, but [they ultimately abandoned it][llint-new-bytecode] when 284 | they started caching bytecode to disk across sessions and desired a 285 | more-compact bytecode format. 286 | 287 | # Open questions 288 | [open-questions]: #open-questions 289 | 290 | 291 | 292 | 293 | 294 | 295 | * The exact bits of the bytecode, what super-instructions we ultimately define, 296 | and the structure of the interpreter all have a bunch of open questions that 297 | can really only be resolved through implementation and experimentation. I 298 | expect that we will handle these as they come, and don't think we need to get 299 | into their specifics before this RFC is merged. 300 | 301 | * What is "Pulley" a backronym for? The best I can come up with is "Portable, 302 | Universal, Low-Level Execution strategY".[^y] That "y" really makes things 303 | difficult... 304 | 305 | This should definitely be resolved before this RFC merges. 306 | 307 | [^y]: Thanks to Chris Fallin for the "strategY" suggestion.
308 | 309 | [wizard]: https://arxiv.org/abs/2205.01183 310 | [fast-instantiation]: https://bytecodealliance.org/articles/wasmtime-10-performance#wasm-module-instantiation 311 | [luajit-comment]: https://www.reddit.com/r/programming/comments/badl2/luajit_2_beta_3_is_out_support_both_x32_x64/c0lrus0/ 312 | [musttail]: https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html 313 | [llint-new-bytecode]: https://webkit.org/blog/9329/a-new-bytecode-format-for-javascriptcore/ 314 | -------------------------------------------------------------------------------- /accepted/remove-old-cranelift-backend.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | [summary]: #summary 3 | 4 | This RFC proposes to remove the old x86-64 backend from Cranelift, and 5 | subsequently clean up various bits of the IR, other compiler data 6 | structures, and metaprogramming / codegen infrastructure that are only 7 | used by old-style backends. The RFC represents two main decision 8 | points: (i) whether it is now time to remove the old backend, and (ii) 9 | what else in the codebase we will now be able to remove as a 10 | consequence of this action. 11 | 12 | # Motivation 13 | [motivation]: #motivation 14 | 15 | In RFC #10, we proposed switching the default Cranelift backend to the 16 | new implementation, based on the `MachInst` framework. This new 17 | framework is also in use by the aarch64 backend, which was natively 18 | developed for the new APIs, and the recently-added s390x backend as 19 | well. 20 | 21 | To our knowledge, no significant consumers of Cranelift or Wasmtime 22 | are still relying on the old backend, and it has been included as a 23 | non-default option for two releases (0.27.x and 0.28.0) now. All 24 | ongoing development effort is targeted to the new backends. 25 | 26 | Alongside this, retaining the old backend as an option has several 27 | ongoing costs.
First, because we have decided thus far to ensure it 28 | continues to pass tests, it imposes additional requirements on any new 29 | features we might add. While we have sometimes adopted a more 30 | fine-grained attitude of "this won't work on the old backend, and 31 | that's OK" (see: newer Wasm instructions), it is still a decision that 32 | has to be made in each case. 33 | 34 | Second, and more significantly, the old backend is the sole factor 35 | that keeps a number of other pieces of infrastructure alive. The core 36 | IR data structures contain some abstractions that are relevant only 37 | for the old backend, and this occasionally causes confusion. For 38 | example, someone looking for information on stackslots, or regalloc 39 | results, or code-layout information, will not find the information on 40 | the CLIF after compilation is done: the new backends produce the 41 | information in a separate IR (VCode). There is thus a cognitive 42 | overhead involved in maintaining "old and deprecated" vs. "new and 43 | supported" status in one's head for bits of the compiler, and a 44 | significant source of confusion for newcomers in particular. 45 | 46 | # Proposal 47 | [proposal]: #proposal 48 | 49 | We thus propose to (i) remove the old x86 backend, making the x64 50 | backend based on the `MachInst` framework the only supported option 51 | for x86-64 in the future; and (ii) then perform "dead-code 52 | elimination" as far as it will take us. 53 | 54 | ## Part 1: Deciding to Remove the Old Backend 55 | 56 | The main question here is whether it is now the proper time to remove 57 | the backend. In #10, we suggested maintaining its functionality while 58 | "mak[ing] several releases with the new backend as default". We have 59 | now done so for two releases (0.27.x and 0.28). We can also directly 60 | consider several known users of Cranelift: 61 | 62 | * Wasmtime: transitioned to new backend in 63 | bytecodealliance/wasmtime#2718.
Feature flag to continue to use old 64 | backend. 65 | 66 | * Lucet: transitioned to new backend in 67 | bytecodealliance/lucet#646. Feature flag to continue to use old 68 | backend. 69 | 70 | * cg\_clif: transitioned to new backend in 71 | bjorn3/rustc_codegen_cranelift#1127 and removed ability to use old 72 | backend. 73 | 74 | * Firefox/SpiderMonkey: most up-to-date integration (Baldrdash) used 75 | new backend only. 76 | 77 | * VeriWasm: updated to support new x64 backend in PlSysSec/veriwasm#2. 78 | 79 | Question 1: Are there any other known use-cases that remain on the old 80 | backend? 81 | 82 | Question 2: Is there any functionality in the old backend that we have 83 | not yet adequately replicated in the new backend? 84 | 85 | Question 3: given the above, is it acceptable to remove the old 86 | backend? 87 | 88 | This RFC proposes answering "yes" to Question 3 above, contingent on 89 | receiving no answers to Question 1 or Question 2 that would change our 90 | path. 91 | 92 | ## Part 2: Logistics 93 | 94 | There are several steps that we can take, in order, to remove the old 95 | backend and then carry out some clean-up work afterward. 96 | 97 | Much of this work, especially the work to snip out the backend itself 98 | and replace legalizations where needed, has already been drafted by 99 | @bjorn3 in bytecodealliance/wasmtime#3009 (thanks!). This RFC's goal 100 | is to gain consensus on a process around merging this work, and 101 | outline the steps to carry it through with the appropriate cleanup 102 | afterward. 103 | 104 | 1. Remove the `BackendVariant::Legacy` enum option. This is an 105 | API-breaking change that will force embedders who were explicitly 106 | selecting the old backend to see that the old backend is no longer 107 | available. 108 | 109 | 2. Remove the `old-x86-backend` Cargo build flag. 110 | 111 | 3. Remove the x86 backend itself: recipes and encodings in 112 | `cranelift/codegen/isa/x86/`. 113 | 114 | 4. 
Remove any remaining backend-specific CDSL / meta-crate code except 115 | for that which remains necessary. We believe this should include at 116 | least register definitions and platform-specific legalizations. We 117 | will need to replace some of the legalizations that the new backend 118 | relies on with handwritten versions in the `simple_legalize` 119 | framework. 120 | 121 | 5. Remove old code in the rest of the compiler. 122 | 123 | - Support for generating unwind info from the old backend's 124 | compilation result. 125 | - Support for generating debuginfo from the old backend's 126 | compilation result. 127 | - Compiler components only used when compiling with the old 128 | backend: 129 | - The register allocator. 130 | - The ABI legalization code. 131 | - The branch relaxation and binary emission pipeline. 132 | - Compiler data structures that are no longer used: 133 | - `encodings`, `locations`, `entry_diversions`, `offsets`, 134 | `jt_offsets`, prologue and epilogue info, etc., on 135 | `ir::Function`. 136 | - Any code that still generates/maintains/uses any of the above 137 | can and should be removed as well. 138 | 139 | 6. Begin to consider how the pipeline could be simplified in other 140 | ways now that some constraints are gone. 141 | - CodeSink: return machine code in some format more similar to the 142 | `MachBuffer`'s output, i.e., a single monolithic buffer rather 143 | than the `put1`/`put2`/... fine-grained API calling into the 144 | embedder repeatedly? 145 | 146 | # Rationale and alternatives 147 | [rationale-and-alternatives]: #rationale-and-alternatives 148 | 149 | As described above, this is the end of a long journey of refactoring 150 | and transition to a cleaner design; there is no reason to keep the old 151 | backend around once we've migrated all use-cases away from it and are 152 | no longer spending effort maintaining it. 153 | 154 | # Open questions 155 | [open-questions]: #open-questions 156 | 157 | 1. 
Are there any significant users of the old backend that we have missed? 158 | 159 | 2. Is there any functionality in the old backend that we have not yet 160 | adequately replicated in the new backend? 161 | -------------------------------------------------------------------------------- /accepted/rfc-process.md: -------------------------------------------------------------------------------- 1 | # RFC process for the Bytecode Alliance 2 | 3 | # Summary 4 | 5 | As the Bytecode Alliance (BA) grows, we need more formalized ways of communicating about and reaching consensus on major changes to core projects. This document proposes to adapt ideas from Rust’s [RFC](https://github.com/rust-lang/rfcs/) and [MCP](https://forge.rust-lang.org/compiler/mcp.html) processes to the Bytecode Alliance context. 6 | 7 | # Motivation 8 | 9 | There are two primary motivations for creating an RFC process for the Bytecode Alliance: 10 | 11 | * **Coordination with stakeholders**. Core BA projects have a growing set of stakeholders using and contributing to foundational projects, often from varied organizations or teams. An RFC process makes it easier to communicate _possible major changes_ that stakeholders may care about, and gives them a chance to weigh in. 12 | 13 | * **Coordination within a project**. As the BA grows, we hope and expect that projects will be actively developed by multiple organizations, rather than just by a “home” organization. While day-to-day activity can be handled through issues, pull requests, and regular meetings, having a dedicated RFC venue makes it easier to separate out discussions with far-ranging consequences that all project developers may have an interest in. 
14 | 15 | # Proposal 16 | 17 | The design of this RFC process draws ideas from Rust’s [Request for Comment](https://github.com/rust-lang/rfcs/) (RFC) and [Major Change Proposal](https://forge.rust-lang.org/compiler/mcp.html) (MCP) processes, adapting to the BA and trying to keep things lightweight. 18 | 19 | ## Stakeholders 20 | 21 | Each core BA project has a formal set of **stakeholders**. These are individuals, organized into groups by project and/or member organization. Stakeholders are not necessarily members of the BA. Formally, stakeholder review is required for RFCs to be accepted, and stakeholders can likewise block the process from proceeding. 22 | 23 | The process for determining core BA projects and their stakeholder set will ultimately be defined by the Technical Steering Committee, once it is in place. Until then, the current BA Steering Committee will be responsible for creating a provisional stakeholder arrangement, as well as deciding whether to accept this RFC. 24 | 25 | ## Structure and workflow 26 | 27 | ### Creating and discussing an RFC 28 | 29 | * We have a dedicated bytecodealliance/rfcs repo that houses _all_ RFCs for core BA projects, much like Rust’s rfcs repo that is shared between all Rust teams. 30 | * The rfcs repo will be structured similarly to the one in Rust: 31 | * A template markdown file laying out the format of RFCs, like [Rust’s](https://github.com/rust-lang/rfcs/blob/master/0000-template.md) but simplified. 32 | * A subdirectory holding the text of all accepted RFCs, like [Rust’s](https://github.com/rust-lang/rfcs/tree/master/text). 33 | * New RFCs are submitted via pull request, and both technical and process discussion happens via the comment thread. 34 | * The RFC is tagged with **project labels**, corresponding to the BA project(s) it targets. This tagging informs tooling about the relevant stakeholder set. 
35 | 36 | ### Making a decision: merge or close 37 | 38 | * When discussion has stabilized around the main points of contention, any stakeholder can make a **motion to finalize**, using a special comment syntax understood by tooling. This motion comes with a disposition: merge or close. 39 | * N.B.: an RFC may be closed for reasons of timing or other project management concerns. The project team should make clear under what conditions, if any, a similar RFC would be reconsidered. 40 | * In response to the motion to finalize, a bot will post a special comment with a **stakeholder checklist**. 41 | * This list includes the GitHub handle for each individual stakeholder, organized into stakeholder groups. 42 | * The individual who filed the motion to finalize is automatically checked off. 43 | * Once _any_ stakeholder from a _different_ group has signed off, the RFC will move into a 10 calendar day **final comment period** (FCP), long enough to ensure that other stakeholders have at least a full business week to respond. 44 | * During FCP, any stakeholder can raise an **objection** using a syntax understood by the bot. Doing so aborts FCP and labels the RFC as `blocked-by-stakeholder` until the objection is formally resolved (again using a special comment syntax). 45 | * Finally, the RFC is automatically merged/closed if either: 46 | * The FCP elapses without any objections. 47 | * A stakeholder from _each_ group has signed off, short-cutting the waiting period. 48 | 49 | ## What goes into an RFC? 50 | 51 | In general, an RFC is a single markdown file with a number of required sections. Many RFCs should begin as _drafts_, to encourage early discussion about the approach and only a sketch of a proposal. Full RFCs contain a fleshed-out proposal and more discussion around rationale and alternatives. 52 | 53 | Template files for both draft and full RFCs will be available in the repository root.
54 | 55 | ### Draft RFCs 56 | 57 | It is encouraged to use the RFC process to discuss ideas early in the design phase, before a _full_ proposal is ready. Such RFCs should be marked as [a draft PR](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests#draft-pull-requests), and contain a markdown file with the following sections: 58 | 59 | * **Motivation**. What problem are you ultimately hoping to solve? 60 | * **Proposal sketch**. At least a short sketch of a possible approach. Beginning discussion without _any_ proposal tends to be unproductive. 61 | * **Open questions**. This section is especially important for draft RFCs: it’s where you highlight the key issues you are hoping that discussion will address. 62 | 63 | RFCs cannot be merged in draft form. Before any motion to merge, the draft should be revised to include all required RFC sections, and the PR should be changed to a standard GitHub PR. 64 | 65 | ### Full RFCs 66 | 67 | Full RFCs are markdown files containing the following sections (which will be laid out in a template file): 68 | 69 | * **Summary**. A ~one paragraph overview of the RFC. 70 | * **Motivation**. What problem does the RFC solve? 71 | * **Proposal**. The meat of the RFC. 72 | * **Rationale and alternatives**. A discussion of tradeoffs: why was the proposal chosen, rather than alternatives? 73 | * **Open questions**. Often an RFC is initially created with a broad proposal but some gaps that need community input to fill in. 74 | 75 | ## What requires an RFC? 76 | 77 | When should you open an RFC, rather than just writing code and opening a traditional PR? 78 | 79 | * When the work involves changes that will significantly affect stakeholders or project contributors. Each project may provide more specific guidance. 
Examples include: 80 | * Major architectural changes 81 | * Major new features 82 | * Simple changes that have significant downstream impact 83 | * Changes that could affect guarantees or level of support, e.g. removing or adding support for a target platform 84 | * Changes that could affect mission alignment, e.g. by changing properties of the security model 85 | * When the work is substantial and you want to get early feedback on your approach. 86 | 87 | ## Tooling 88 | 89 | The proposal above assumes the presence of tooling to manage the GitHub workflow around RFCs. The intent is to use Rust's [rfcbot](https://github.com/rust-lang/rfcbot-rs). It will need some modifications to support the proposed workflow; these can hopefully land upstream so we can share the tool with the Rust community, but otherwise the BA will maintain its own fork. 90 | 91 | ## Approval of this RFC and future process-related RFCs 92 | 93 | This RFC, and any future RFCs targeting similar process-related questions, will go before the Bytecode Alliance Steering Committee (eventually replaced by the Technical Steering Committee, once that group has been formed). 94 | 95 | # Rationale and alternatives 96 | 97 | ## Stakeholder and FCP approach 98 | 99 | This proposal tries to strike a good balance between being able to move quickly, and making sure stakeholder consent is represented. 100 | 101 | The core thinking is that getting active signoff from at least one person from a different stakeholder group, together with the original motion to finalize, is sufficient evidence that an RFC has received external vetting. The additional 10 day period gives all stakeholders the opportunity to review the proposal; if there are any concerns, or even just a desire to take more time to review, any stakeholder can file an objection. And on the other hand, proposals can move even more quickly with signoff from each group. 
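The signoff rule described above can be made concrete with a small sketch. The data shapes and the exact encoding of the rule here are invented for illustration; they are not part of the proposal or of any real tooling.

```rust
// Sketch of the finalize rule: an RFC finalizes immediately once every
// stakeholder group has signed off, or once the 10 calendar day FCP has
// elapsed with external sign-off and no outstanding objection.
// Data shapes are invented for illustration.
fn rfc_finalizes(
    all_groups: &[&str],         // every stakeholder group
    signed_off: &[&str],         // groups with at least one individual sign-off
    fcp_days_elapsed: u32,       // days since FCP began
    objection_outstanding: bool, // an unresolved stakeholder objection
) -> bool {
    if objection_outstanding {
        // An objection labels the RFC `blocked-by-stakeholder` until resolved.
        return false;
    }
    // Sign-off from each group short-cuts the waiting period...
    let every_group = all_groups.iter().all(|g| signed_off.contains(g));
    // ...otherwise external sign-off (the mover's group plus at least one
    // other) and a full FCP are both required.
    let external_vetting = signed_off.len() >= 2;
    every_group || (external_vetting && fcp_days_elapsed >= 10)
}
```

Under this encoding, two of three groups signing off finalizes the RFC only after day 10, while all three signing off finalizes it immediately.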
102 | 103 | In total, this setup allows proposals to move more quickly than either Rust’s RFC or MCP processes, due to the less stringent review requirements and ability to end FCP with enough signoff. At the same time, the default FCP waiting period and objection facility should provide stakeholders with enough notice and tools to slow things down when needed. 104 | 105 | ## Organizations vs individuals 106 | 107 | Motion through the FCP process is gated at the stakeholder group level, but the actual check-off process is tied to individuals. There are reasons for both: 108 | 109 | * **Gated by group**. The goal of the signoff process is to provide a “stakeholder consent” check before finalizing an RFC. We view a signoff from _any_ member of an organization or project as a fair representation of that organization’s or project’s interests, and expect individuals to act accordingly. 110 | * **Individual sign-off**. Despite being gated by groups, we call out individuals by GitHub handle for review. Doing so helps avoid diffusion of responsibility: if we instead merely had a checkbox per organization, it could easily create a situation where no particular individual felt “on duty” to review. In addition, tracking which individual approved the review provides an important process record in case that individual fails to represent their group's interests. 111 | 112 | ## Comparison to Rust’s RFC and MCP processes 113 | 114 | The Rust RFC process has successfully governed the development of the language from before 1.0, and covers an enormous range of subprojects, including the language design, tooling, the compiler, documentation, and even the Rust web site. It is a proven model and variations of the process have been adopted by several other large projects. Its workflow is simple and fits entirely within GitHub, which is already the central point of coordination within the BA.
And given the central role of Rust within the BA, it’s a model that members are likely to already be familiar with. 115 | 116 | That said, a major difference in design is the notion of **stakeholders** and how they impact the decision-making process. Here we borrow some thinking from Rust’s lightweight MCP process, allowing a decision to go forward after getting buy-in from just one stakeholder within a different group -- but still requiring a waiting period to do so. A further innovation is that getting sign off from within _all_ groups immediately concludes FCP. That was not possible in the Rust community, where the set of stakeholders is unbounded. 117 | 118 | A common complaint about the Rust RFC process is that it is, in some ways, a victim of its own success: RFC comment threads can quickly become overwhelming. While this may eventually become an issue for the BA as well, we have some additional recourse in this proposal: we have a clear stakeholder model which will allow us to prioritize concerns from stakeholders, and take action when individuals who are not formal stakeholders overwhelm comment threads. 119 | 120 | We could instead follow Rust’s MCP model and use Zulip streams for RFC discussion. However, that introduces a more complex workflow (spanning multiple systems) and leaves a less permanent and less accessible record of discussion. 121 | 122 | # Open questions 123 | 124 | * As written, an individual stakeholder can block acceptance of an RFC indefinitely. In the Rust community, such an arrangement has sometimes caused problems, leading to a secondary process that kicks in after a waiting period, and requires an _additional_ stakeholder to approve continued blocking. Should we consider building in such a mechanism now, or add it to the process only if it becomes necessary later? 
125 | -------------------------------------------------------------------------------- /accepted/shared-host-functions.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | 3 | [summary]: #summary 4 | 5 | This RFC proposes adding support in the Wasmtime API for sharing host function definitions between multiple stores. 6 | 7 | # Motivation 8 | [motivation]: #motivation 9 | 10 | Wasmtime is currently well-suited for long-lived module instantiations, where a singular `Instance` is created for a 11 | WebAssembly program and the program is run to completion. 12 | 13 | In this model, the time spent defining host functions for the instance is negligible because the setup is only performed once 14 | and the instance is, potentially, expected to run for a long time. 15 | 16 | However, this model is not ideal for services that create short-lived instantiations for each service request, where the time 17 | spent creating host functions to import at instantiation-time can be considerable for the repeated creation of many instances. 18 | 19 | As an instance is expected to live only for the duration of the request, each instance would be associated with its own `Store` 20 | that is dropped when the request completes. 21 | 22 | Given the nature of `Store` isolation in Wasmtime, this means host function definitions must be recreated for each and every 23 | request handled by the service. 24 | 25 | If instead host functions could be defined for a `Config` (in addition to `Store`), then the repeated effort to define the 26 | host functions would be eliminated and the time it takes to create an instance to handle a request would therefore be 27 | noticeably reduced. 28 | 29 | # Proposal 30 | [proposal]: #proposal 31 | 32 | As `Func` is inherently tied to a `Store`, it cannot be used to represent a function associated with a `Config`.
33 | 34 | This RFC proposes not having an analogous type to represent these functions; instead, users will define the function via 35 | methods on `Config`, but use `Store` (or a `Linker` associated with a `Store`) to get the `Func` representing the function. 36 | 37 | ## Overview example 38 | 39 | A simple example that demonstrates defining a host function in a `Config`: 40 | 41 | ```rust 42 | let mut config = Config::default(); 43 | 44 | config.wrap_host_func("", "hello", |caller: Caller, ptr: i32, len: i32| { 45 | let mem = caller.get_export("memory").unwrap().into_memory().unwrap(); 46 | println!( 47 | "Hello {}!", 48 | std::str::from_utf8(unsafe { &mem.data_unchecked()[ptr as usize..(ptr + len) as usize] }).unwrap() 49 | ); 50 | }); 51 | 52 | let engine = Engine::new(&config); 53 | let module = Module::new( 54 | &engine, 55 | r#" 56 | (module 57 | (import "" "hello" (func $hello (param i32 i32))) 58 | (memory (export "memory") 1) 59 | (func (export "run") 60 | i32.const 0 61 | i32.const 5 62 | call $hello 63 | ) 64 | (data (i32.const 0) "world") 65 | ) 66 | "#, 67 | )?; 68 | 69 | let store = Store::new(module.engine()); 70 | let linker = Linker::new(&store); 71 | 72 | let instance = linker.instantiate(&module)?; 73 | let run = instance 74 | .get_export("run") 75 | .unwrap() 76 | .into_func() 77 | .unwrap() 78 | .get0::<()>(); 79 | 80 | run()?; 81 | ``` 82 | 83 | ## Changes to `wasmtime::Config` 84 | 85 | This RFC proposes that the following methods be added to `Config`: 86 | 87 | ```rust 88 | /// Defines a host function for the [`Config`] from the given callback. 89 | /// 90 | /// Use [`Store::get_host_func`] to get a [`Func`] representing the function. 91 | /// 92 | /// Note that the implementation of `func` must adhere to the `ty` 93 | /// signature given; errors or traps may occur if it does not respect the 94 | /// `ty` signature. 95 | /// 96 | /// Additionally note that this is quite a dynamic function since signatures 97 | /// are not statically known.
For performance reasons, it's recommended 98 | /// to use [`Config::wrap_host_func`] if you can because with statically known 99 | /// signatures the engine can optimize the implementation much more. 100 | /// 101 | /// The callback must be `Send` and `Sync` as it is shared between all engines created 102 | /// from the `Config`. For more relaxed bounds, use [`Func::new`] to define the function. 103 | pub fn define_host_func( 104 | &mut self, 105 | module: &str, 106 | name: &str, 107 | ty: FuncType, 108 | func: impl Fn(Caller<'_>, &[Val], &mut [Val]) -> Result<(), Trap> + Send + Sync + 'static, 109 | ); 110 | 111 | /// Defines a host function for the [`Config`] from the given Rust closure. 112 | /// 113 | /// Use [`Store::get_host_func`] to get a [`Func`] representing the function. 114 | /// 115 | /// See [`Func::wrap`] for information about accepted parameter and result types for the closure. 116 | /// 117 | /// The closure must be `Send` and `Sync` as it is shared between all engines created 118 | /// from the `Config`. For more relaxed bounds, use [`Func::wrap`] to wrap the closure. 119 | pub fn wrap_host_func( 120 | &mut self, 121 | module: &str, 122 | name: &str, 123 | func: impl IntoFunc + Send + Sync, 124 | ); 125 | ``` 126 | 127 | Similar to `Func::new`, `define_host_func` will define a host function that can be used for any Wasm function type. 128 | 129 | Similar to `Func::wrap`, `wrap_host_func` will generically accept different `Fn` signatures to determine the WebAssembly type of 130 | the function. 131 | 132 | These methods will internally create an `InstanceHandle` to represent the host function. 133 | 134 | However, the instance handle will not be owned by a store (as is the case with `Func`); instead, the `Config` will own the associated instance handles and deallocate them when dropped.
135 | 136 | Note: the `IntoFunc` trait is documented as internal to Wasmtime and will need to be extended to implement this feature; 137 | therefore that will not be considered a breaking change. 138 | 139 | ## Changes to `wasmtime::Store` 140 | 141 | To use host functions defined on a `Config` as imports for module instantiation or as `funcref` values, a `Func` representation is 142 | required. 143 | 144 | This proposal calls for adding the following methods to `Store`: 145 | 146 | ```rust 147 | /// Gets a host function from the [`crate::Config`] associated with this [`Store`]. 148 | /// 149 | /// Returns `None` if the given host function is not defined. 150 | pub fn get_host_func(&self, module: &str, name: &str) -> Option<Func>; 151 | 152 | /// Gets a context value from the store. 153 | /// 154 | /// Returns a reference to the context value if present. 155 | pub fn get<T>(&self) -> Option<&T>; 156 | 157 | /// Sets a context value into the store. 158 | /// 159 | /// Returns the given value as an error if an existing value is already set. 160 | pub fn set<T>(&self, value: T) -> Result<(), T>; 161 | ``` 162 | 163 | `get_host_func` will register the function's instance handle with the `Store`, but as a *borrowed* handle that it is 164 | not responsible for deallocating. As `Store` keeps a reference to its associated `Engine` (and therefore `Config`), this ownership model should be 165 | sufficient to prevent `funcref` values or imports of shared host functions from outliving the function instance itself. 166 | 167 | For host functions that require contextual data (e.g. WASI), the `set` method will be used for associating context with 168 | a `Store` and it can later be retrieved via `caller.store().get()`. 169 | 170 | ## Changes to `wasmtime::Linker` 171 | 172 | There will be no changes to the API surface of `Linker` to support the changes proposed in this RFC.
173 | 174 | However, the `Linker` implementation will be changed such that it will fall back to calling `Store::get_host_func` when 175 | resolving imports for `Linker::instantiate` and `Linker::module`. 176 | 177 | This should allow for overriding shared host function imports in the `Linker` if an import of the same name has already been defined 178 | in the associated `Config`. 179 | 180 | ## Changes to `wasmtime_wasi::Wasi` 181 | 182 | `Wasi` is a type generated from `wasmtime-wiggle`. 183 | 184 | This proposal adds the following methods to the generated `Wasi` type: 185 | 186 | ```rust 187 | /// Adds the WASI host functions to the given [`wasmtime::Config`]. 188 | /// 189 | /// Host functions added to the config expect [`Wasi::set_context`] to be called. 190 | /// 191 | /// WASI host functions will trap if the context is not set in the calling [`wasmtime::Store`]. 192 | pub fn add_to_config(config: &mut Config); 193 | 194 | /// Sets the context in the given store. 195 | /// 196 | /// This method must be called on a store that imports a WASI host function when using `Wasi::add_to_config`. 197 | /// 198 | /// Returns the given context as an error if a context has already been set in the store. 199 | pub fn set_context(store: &Store, ctx: WasiCtx) -> Result<(), WasiCtx>; 200 | ``` 201 | 202 | `add_to_config` will add the various WASI functions to the given `Config`. 203 | 204 | The function implementations will expect that `set_context` has been called for the `Store` prior to any invocations. 205 | 206 | If the context is not set, the WASI function implementations will trap.
207 | 208 | ### WASI shared host function example 209 | 210 | ```rust 211 | let mut config = Config::default(); 212 | Wasi::add_to_config(&mut config); 213 | 214 | let engine = Engine::new(&config); 215 | 216 | let module = Module::new( 217 | &engine, 218 | r#" 219 | (module 220 | (import "wasi_snapshot_preview1" "fd_write" (func $fd_write (param i32 i32 i32 i32) (result i32))) 221 | (memory (export "memory") 1) 222 | (func (export "run") 223 | i32.const 1 224 | i32.const 0 225 | i32.const 1 226 | i32.const 12 227 | call $fd_write 228 | drop 229 | ) 230 | (data (i32.const 0) "\0C\00\00\00\0D\00\00\00\00\00\00\00Hello world!\n") 231 | ) 232 | "#, 233 | )?; 234 | 235 | let store = Store::new(module.engine()); 236 | 237 | // Set the WasiCtx in the store 238 | // Without this, the call to `fd_write` will trap 239 | assert!(Wasi::set_context(&store, WasiCtxBuilder::new().build()?).is_ok()); 240 | 241 | let linker = Linker::new(&store); 242 | let instance = linker.instantiate(&module)?; 243 | 244 | let run = instance 245 | .get_export("run") 246 | .unwrap() 247 | .into_func() 248 | .unwrap() 249 | .get0::<()>(); 250 | 251 | run()?; 252 | ``` 253 | 254 | # Rationale and alternatives 255 | [rationale-and-alternatives]: #rationale-and-alternatives 256 | 257 | No other designs have been considered. 258 | 259 | # Open questions 260 | [open-questions]: #open-questions 261 | 262 | * Should this be exposed via Wasmtime-specific C API functions? 263 | 264 | * Is `Store` the right place for storing context? 265 | For WASI, this means all instances that use the same store will share the `WasiCtx`.
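The type-keyed `set`/`get` context mechanism that this open question refers to amounts to a map from a type to at most one value of that type. A standalone sketch of that pattern follows; it is illustrative only, not Wasmtime's actual implementation, and it uses `&mut self` where the proposed `Store::set` uses `&self` with interior mutability.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// A minimal type-keyed map: at most one context value per concrete type,
// mirroring the semantics proposed for `Store::set` and `Store::get`.
#[derive(Default)]
struct ContextMap {
    values: HashMap<TypeId, Box<dyn Any>>,
}

impl ContextMap {
    // Like the proposed `Store::set`: returns the value back as an error
    // if a value of this type was already set.
    fn set<T: Any>(&mut self, value: T) -> Result<(), T> {
        if self.values.contains_key(&TypeId::of::<T>()) {
            return Err(value);
        }
        self.values.insert(TypeId::of::<T>(), Box::new(value));
        Ok(())
    }

    // Like the proposed `Store::get`: a shared reference to the context
    // value of type `T`, if one is present.
    fn get<T: Any>(&self) -> Option<&T> {
        self.values
            .get(&TypeId::of::<T>())
            .and_then(|v| v.downcast_ref::<T>())
    }
}
```

A hypothetical `WasiCtx` stored this way would be shared by every instance created from the same map, which is exactly the sharing concern the open question raises.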
266 | -------------------------------------------------------------------------------- /accepted/vulnerability-response-runbook.md: -------------------------------------------------------------------------------- 1 | # Bytecode Alliance Vulnerability Response Runbook 2 | This document describes a series of steps to be followed by the Bytecode Alliance team when responding to a vulnerability that is discovered or reported privately. 3 | 4 | This document does not cover cases where vulnerabilities are publicly disclosed without coordination, or cases where unpatched vulnerabilities are under active exploitation. In those cases, contact security@bytecodealliance.org. 5 | ## Select an Incident Manager 6 | 7 | Pick an individual who will track that the entire process in this document is followed. 8 | By default, the person inside the Bytecode Alliance who discovers or learns of the 9 | vulnerability first has this duty, until they explicitly hand it off to 10 | someone else. 11 | 12 | ## Identifying the Vulnerability 13 | 14 | Figure out which Bytecode Alliance repository it lives in. 15 | 16 | The vulnerability probably lives in a single repository in the Bytecode 17 | Alliance ecosystem. If fixing it requires coordinated work across two 18 | repositories, this guide may not work. In situations not covered by this guide, please 19 | email security@bytecodealliance.org to seek guidance. 20 | 21 | ## Create a GitHub Security Advisory Draft 22 | 23 | Under the "Security" tab for each repo, you can [create a new draft security 24 | advisory](https://github.com/bytecodealliance/wasmtime/security/advisories/). 25 | 26 | ### Affected product 27 | 28 | Identify the package the vulnerability primarily exists in. This, and all 29 | packages in the Bytecode Alliance org which depend on that package, should 30 | be identified as an Affected Product. 31 | 32 | You can leave the Patched versions blank in the draft to start.
33 | 34 | ### Determine Severity 35 | 36 | We follow the Severity levels defined in the [OpenSSL Security 37 | Policy](https://www.openssl.org/policies/secpolicy.html). 38 | 39 | ### Draft a Description 40 | 41 | The initial description doesn't have to be polished for public disclosure - it 42 | can be edited throughout the draft process. The initial purpose of this 43 | description is to communicate to collaborators what the vulnerability is. 44 | It needs to be sufficient for GitHub Staff to spot-check the draft when 45 | requesting a CVE. 46 | 47 | ### Request a CVE 48 | 49 | Once you have a draft description, you should have GitHub request a CVE on 50 | your behalf. During US west coast business hours, this typically takes under 2 51 | hours to process. If it takes more than a day, ping Bytecode Alliance partners 52 | who may have contacts at GitHub and can inquire about why the process is 53 | stuck. You can keep working on everything else while this request is pending. 54 | 55 | ## Invite collaborators to the Security Advisory Draft 56 | 57 | The incident manager is responsible for selecting trustworthy individuals to collaborate 58 | on authoring the security advisory and its patch. The set of collaborators should be 59 | kept as small as possible. It should include an individual who is a code reviewer on 60 | each Bytecode Alliance project affected by the vulnerability. 61 | 62 | GitHub Security Advisories let you invite individuals to collaborate on the draft. 63 | Collaborators will be able to view and edit the advisory, and push and pull commits 64 | to the private repository where the patch is under development. 65 | 66 | ## Collaborate on a patch 67 | 68 | Use the security release PR features in the GitHub Security Advisory to 69 | collaborate on a patch. GitHub will create a private fork of the repository on 70 | your behalf and invite the advisory collaborators to it.
71 | 72 | In the private fork repository, create a single PR which will track the patch 73 | to resolve the vulnerability. This patch is automatically merged into the 74 | public repository's default branch when the disclosure is made public. 75 | 76 | Cloud-based CI (GitHub Actions or otherwise) will not run on the security 77 | advisory fork. Collaborators are responsible for running CI locally before 78 | pushing to the PR branch. 79 | 80 | Use the PR review features to indicate approval for the patch. 81 | 82 | ### Releases as part of the patch 83 | 84 | Wasmtime has committed that fixes for security issues will be applied to the current 85 | version of Wasmtime and released as patch releases: see the [Wasmtime 1.0 86 | RFC](https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasmtime-one-dot-oh.md#backports---security-fixes). 87 | 88 | Other projects without an existing policy should do the same, unless they have 89 | a very good reason for issuing backports. 90 | 91 | The patch PR should increment any version numbers required to make the patch 92 | release. The merged PR should be immediately publishable to package registries 93 | when the disclosure is made public. Create a merge commit and run your package 94 | publishing tools in dry-run mode to verify this ahead of time as much as 95 | possible. 96 | 97 | ## Advance disclosure to stakeholders 98 | 99 | The incident manager is responsible for identifying trustworthy downstream 100 | stakeholders and making advance disclosure to them. All stakeholders who receive advance disclosure 101 | must maintain secrecy of the vulnerability until public disclosure is made. 102 | 103 | ## Commit to a public disclosure date 104 | 105 | Once you have confidence in a patch, and stakeholders have been pre-notified, it is 106 | time to determine a date for public disclosure.
Incident managers may discuss the public 107 | disclosure date with stakeholders who have received advance disclosure, to understand the feasibility 108 | of and timelines for mitigation. 109 | 110 | By default, public disclosure should be no more than 30 days after initial discovery 111 | of the vulnerability. Incident managers are encouraged to select the shortest timeline 112 | possible. 113 | 114 | However, the incident manager may make an exception to the 30-day limit if the 115 | nature of the vulnerability requires it. Reasons for exceptions may include the 116 | technical feasibility of mitigating the vulnerability, or coordination with other 117 | security disclosure processes. Do not make exceptions lightly: inappropriate exceptions 118 | may undermine trust in the Bytecode Alliance and its projects. 119 | 120 | ## Notify sec-announce 121 | 122 | Once you have committed to a disclosure date, send an email to the 123 | [`sec-announce@bytecodealliance.org` Google 124 | Group](https://groups.google.com/a/bytecodealliance.org/g/sec-announce). 125 | 126 | This email should use this template: 127 | 128 | ```pre 129 | The Bytecode Alliance would like to announce the forthcoming security release 130 | of the <project>. 131 | 132 | The release will be made available on <date> at approximately