├── .gitignore
├── .gitattributes
├── .pr-preview.json
├── .editorconfig
├── .github
│   ├── ISSUE_TEMPLATE
│   │   ├── config.yml
│   │   ├── 0-new-issue.yml
│   │   └── 1-new-feature.yml
│   └── workflows
│       └── build.yml
├── Makefile
├── README.md
├── LICENSE
└── review-drafts
    ├── 2018-07.bs
    ├── 2019-01.bs
    └── 2019-07.bs

/.gitignore:
--------------------------------------------------------------------------------
1 | /infra.spec.whatwg.org/
2 | /deploy.sh
3 | /infra.html
4 |

--------------------------------------------------------------------------------
/.gitattributes:
--------------------------------------------------------------------------------
1 | * text=auto
2 | *.bs diff=html linguist-language=HTML
3 |

--------------------------------------------------------------------------------
/.pr-preview.json:
--------------------------------------------------------------------------------
1 | {
2 |   "src_file": "infra.bs",
3 |   "type": "bikeshed",
4 |   "params": {
5 |     "force": 1,
6 |     "md-status": "LS-PR",
7 |     "md-Text-Macro": "PR-NUMBER {{ pull_request.number }}"
8 |   }
9 | }
10 |

--------------------------------------------------------------------------------
/.editorconfig:
--------------------------------------------------------------------------------
1 | root = true
2 |
3 | [*]
4 | end_of_line = lf
5 | insert_final_newline = true
6 | charset = utf-8
7 | indent_size = 2
8 | indent_style = space
9 | trim_trailing_whitespace = true
10 | max_line_length = 100
11 |
12 | [Makefile]
13 | indent_style = tab
14 |
15 | [*.md]
16 | max_line_length = off
17 |
18 | [*.bs]
19 | indent_size = 1
20 |
21 | [*.py]
22 | indent_size = 4
23 |

--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/config.yml:
--------------------------------------------------------------------------------
1 | blank_issues_enabled: false
2 | contact_links:
3 |   - name: Chat
4 |     url: https://whatwg.org/chat
5 |     about: Please do reach out with questions and feedback!
6 |   - name: Stack Overflow
7 |     url: https://stackoverflow.com/
8 |     about: If you're having trouble building a web page, this is not the right repository. Consider asking your question on Stack Overflow instead.
9 |

--------------------------------------------------------------------------------
/.github/workflows/build.yml:
--------------------------------------------------------------------------------
1 | name: Build
2 |
3 | on:
4 |   pull_request:
5 |     branches:
6 |     - main
7 |   push:
8 |     branches:
9 |     - main
10 |   workflow_dispatch:
11 |
12 | jobs:
13 |   build:
14 |     name: Build
15 |     runs-on: ubuntu-22.04
16 |     steps:
17 |     - uses: actions/checkout@v3
18 |       with:
19 |         fetch-depth: 2
20 |     - uses: actions/setup-python@v4
21 |       with:
22 |         python-version: "3.11"
23 |     - run: pip install bikeshed && bikeshed update
24 |     # Note: `make deploy` will do a deploy dry run on PRs.
25 |     - run: make deploy
26 |       env:
27 |         SERVER: ${{ secrets.MARQUEE_SERVER }}
28 |         SERVER_PUBLIC_KEY: ${{ secrets.MARQUEE_PUBLIC_KEY }}
29 |         SERVER_DEPLOY_KEY: ${{ secrets.MARQUEE_DEPLOY_KEY }}
30 |

--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/0-new-issue.yml:
--------------------------------------------------------------------------------
1 | name: New issue
2 | description: File a new issue against the Infra Standard.
3 | body:
4 |   - type: markdown
5 |     attributes:
6 |       value: |
7 |         Before filling out this form, please familiarize yourself with the [Code of Conduct](https://whatwg.org/code-of-conduct). You might also find the [FAQ](https://whatwg.org/faq) and [Working Mode](https://whatwg.org/working-mode) useful.
8 |
9 |         If at any point you have questions, please reach out to us on [Chat](https://whatwg.org/chat).
10 |   - type: textarea
11 |     attributes:
12 |       label: "What is the issue with the Infra Standard?"
13 |     validations:
14 |       required: true
15 |   - type: markdown
16 |     attributes:
17 |       value: "Thank you for taking the time to improve the Infra Standard!"
18 |

--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | SHELL=/bin/bash -o pipefail
2 | .PHONY: local remote deploy
3 |
4 | remote: infra.bs
5 | 	@ (HTTP_STATUS=$$(curl https://api.csswg.org/bikeshed/ \
6 | 	                       --output infra.html \
7 | 	                       --write-out "%{http_code}" \
8 | 	                       --header "Accept: text/plain, text/html" \
9 | 	                       -F die-on=warning \
10 | 	                       -F md-Text-Macro="COMMIT-SHA LOCAL COPY" \
11 | 	                       -F file=@infra.bs) && \
12 | 	[[ "$$HTTP_STATUS" -eq "200" ]]) || ( \
13 | 		echo ""; cat infra.html; echo ""; \
14 | 		rm -f infra.html; \
15 | 		exit 22 \
16 | 	);
17 |
18 | local: infra.bs
19 | 	bikeshed spec infra.bs infra.html --md-Text-Macro="COMMIT-SHA LOCAL-COPY"
20 |
21 | deploy: infra.bs
22 | 	curl --remote-name --fail https://resources.whatwg.org/build/deploy.sh
23 | 	bash ./deploy.sh
24 |

--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/1-new-feature.yml:
--------------------------------------------------------------------------------
1 | name: New feature
2 | description: Request a new feature in the Infra Standard.
3 | labels: ["addition/proposal", "needs implementer interest"]
4 | body:
5 |   - type: markdown
6 |     attributes:
7 |       value: |
8 |         Before filling out this form, please familiarize yourself with the [Code of Conduct](https://whatwg.org/code-of-conduct), [FAQ](https://whatwg.org/faq), and [Working Mode](https://whatwg.org/working-mode). They help with setting expectations and making sure you know what is required. The FAQ ["How should I go about proposing new features to WHATWG standards?"](https://whatwg.org/faq#adding-new-features) is especially relevant.
9 |
10 |         If at any point you have questions, please reach out to us on [Chat](https://whatwg.org/chat).
11 |   - type: textarea
12 |     attributes:
13 |       label: "What problem are you trying to solve?"
14 |     validations:
15 |       required: true
16 |   - type: textarea
17 |     attributes:
18 |       label: "What solutions exist today?"
19 |   - type: textarea
20 |     attributes:
21 |       label: "How would you solve it?"
22 |   - type: textarea
23 |     attributes:
24 |       label: "Anything else?"
25 |   - type: markdown
26 |     attributes:
27 |       value: "Thank you for taking the time to improve the Infra Standard!"
28 |

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | This repository hosts the [Infra Standard](https://infra.spec.whatwg.org/).
2 |
3 | ## Code of conduct
4 |
5 | We are committed to providing a friendly, safe, and welcoming environment for all. Please read and respect the [Code of Conduct](https://whatwg.org/code-of-conduct).
6 |
7 | ## Contribution opportunities
8 |
9 | Folks notice minor and larger issues with the Infra Standard all the time and we'd love your help fixing those. Pull requests for typographical and grammar errors are also most welcome.
10 |
11 | Issues labeled ["good first issue"](https://github.com/whatwg/infra/labels/good%20first%20issue) are a good place to get a taste for editing the Infra Standard. Note that we don't assign issues and there's no reason to ask for availability either, just provide a pull request.
12 |
13 | If you are thinking of suggesting a new feature, read through the [FAQ](https://whatwg.org/faq) and [Working Mode](https://whatwg.org/working-mode) documents to get yourself familiarized with the process.
14 |
15 | We'd be happy to help you with all of this [on Chat](https://whatwg.org/chat).
16 |
17 | ## Pull requests
18 |
19 | In short, change `infra.bs` and submit your patch, with a [good commit message](https://github.com/whatwg/meta/blob/main/COMMITTING.md).
20 |
21 | Please add your name to the Acknowledgments section in your first pull request, even for trivial fixes. The names are sorted lexicographically.
22 |
23 | To ensure your patch meets all the necessary requirements, please also see the [Contributor Guidelines](https://github.com/whatwg/meta/blob/main/CONTRIBUTING.md). Editors of the Infra Standard are expected to follow the [Maintainer Guidelines](https://github.com/whatwg/meta/blob/main/MAINTAINERS.md).
24 |
25 | ## Tests
26 |
27 | Tests are an essential part of the standardization process and will need to be created or adjusted as changes to the standard are made. Tests for the Infra Standard can be found in the `infra/` directory of [`web-platform-tests/wpt`](https://github.com/web-platform-tests/wpt).
28 |
29 | A dashboard showing the tests running against browser engines can be seen at [wpt.fyi/results/infra](https://wpt.fyi/results/infra).
30 |
31 | ## Building "locally"
32 |
33 | For quick local iteration, run `make`; this will use a web service to build the standard, so that you don't have to install anything. See more in the [Contributor Guidelines](https://github.com/whatwg/meta/blob/main/CONTRIBUTING.md#building).
34 |
35 | ## Formatting
36 |
37 | Use a column width of 100 characters.
38 |
39 | Do not use newlines inside "inline" elements, even if that means exceeding the column width requirement.
40 | ```html
41 |
The
42 | remove(tokens…)
43 | method, when invoked, must run these steps:
44 | ```
45 | is okay and
46 | ```html
47 |
The remove(tokens…) method, when
49 | invoked, must run these steps:
50 | ```
51 | is not.
52 |
53 | Using newlines between "inline" element tag names and their content is also forbidden. (This actually alters the content, by adding spaces.) That is
54 | ```html
55 | token
56 | ```
57 | is fine and
58 | ```html
59 | token
60 |
61 | ```
62 | is not.
63 |
64 | An `
` element inside it, unless it's a child of `
For each token in tokens, in given order, that is not in 71 | tokens, append token to tokens. 72 | ``` 73 | is not indented, but 74 | ```html 75 |
For each token in tokens, run these substeps: 78 | 79 |
If token is the empty string, throw a {{SyntaxError}} exception. 81 | ``` 82 | is. 83 | 84 | End tags may be included (if done consistently) and attributes may be quoted (using double quotes), though the prevalent theme is to omit end tags and not quote attributes (unless they contain a space). 85 | 86 | Place one newline between paragraphs (including list elements). Place three newlines before `
Do not place a newline above. 90 | 91 |
Place a newline above. 92 |
Place a newline above. 95 | 96 | 97 |
A request has an associated 107 | redirect mode,... 108 | ``` 109 | ```html 110 |
Let redirectMode be request's redirect mode. 111 | ``` 112 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright © WHATWG (Apple, Google, Mozilla, Microsoft). 2 | 3 | This work is licensed under a Creative Commons Attribution 4.0 International 4 | License. To the extent portions of it are incorporated into source code, 5 | such portions in the source code are licensed under the BSD 3-Clause License instead. 6 | 7 | - - - - 8 | 9 | Creative Commons Attribution 4.0 International Public License 10 | 11 | By exercising the Licensed Rights (defined below), You accept and agree 12 | to be bound by the terms and conditions of this Creative Commons 13 | Attribution 4.0 International Public License ("Public License"). To the 14 | extent this Public License may be interpreted as a contract, You are 15 | granted the Licensed Rights in consideration of Your acceptance of 16 | these terms and conditions, and the Licensor grants You such rights in 17 | consideration of benefits the Licensor receives from making the 18 | Licensed Material available under these terms and conditions. 19 | 20 | 21 | Section 1 -- Definitions. 22 | 23 | a. Adapted Material means material subject to Copyright and Similar 24 | Rights that is derived from or based upon the Licensed Material 25 | and in which the Licensed Material is translated, altered, 26 | arranged, transformed, or otherwise modified in a manner requiring 27 | permission under the Copyright and Similar Rights held by the 28 | Licensor. For purposes of this Public License, where the Licensed 29 | Material is a musical work, performance, or sound recording, 30 | Adapted Material is always produced where the Licensed Material is 31 | synched in timed relation with a moving image. 32 | 33 | b. 
Adapter's License means the license You apply to Your Copyright 34 | and Similar Rights in Your contributions to Adapted Material in 35 | accordance with the terms and conditions of this Public License. 36 | 37 | c. Copyright and Similar Rights means copyright and/or similar rights 38 | closely related to copyright including, without limitation, 39 | performance, broadcast, sound recording, and Sui Generis Database 40 | Rights, without regard to how the rights are labeled or 41 | categorized. For purposes of this Public License, the rights 42 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 43 | Rights. 44 | 45 | d. Effective Technological Measures means those measures that, in the 46 | absence of proper authority, may not be circumvented under laws 47 | fulfilling obligations under Article 11 of the WIPO Copyright 48 | Treaty adopted on December 20, 1996, and/or similar international 49 | agreements. 50 | 51 | e. Exceptions and Limitations means fair use, fair dealing, and/or 52 | any other exception or limitation to Copyright and Similar Rights 53 | that applies to Your use of the Licensed Material. 54 | 55 | f. Licensed Material means the artistic or literary work, database, 56 | or other material to which the Licensor applied this Public 57 | License. 58 | 59 | g. Licensed Rights means the rights granted to You subject to the 60 | terms and conditions of this Public License, which are limited to 61 | all Copyright and Similar Rights that apply to Your use of the 62 | Licensed Material and that the Licensor has authority to license. 63 | 64 | h. Licensor means the individual(s) or entity(ies) granting rights 65 | under this Public License. 66 | 67 | i. 
Share means to provide material to the public by any means or 68 | process that requires permission under the Licensed Rights, such 69 | as reproduction, public display, public performance, distribution, 70 | dissemination, communication, or importation, and to make material 71 | available to the public including in ways that members of the 72 | public may access the material from a place and at a time 73 | individually chosen by them. 74 | 75 | j. Sui Generis Database Rights means rights other than copyright 76 | resulting from Directive 96/9/EC of the European Parliament and of 77 | the Council of 11 March 1996 on the legal protection of databases, 78 | as amended and/or succeeded, as well as other essentially 79 | equivalent rights anywhere in the world. 80 | 81 | k. You means the individual or entity exercising the Licensed Rights 82 | under this Public License. Your has a corresponding meaning. 83 | 84 | 85 | Section 2 -- Scope. 86 | 87 | a. License grant. 88 | 89 | 1. Subject to the terms and conditions of this Public License, 90 | the Licensor hereby grants You a worldwide, royalty-free, 91 | non-sublicensable, non-exclusive, irrevocable license to 92 | exercise the Licensed Rights in the Licensed Material to: 93 | 94 | a. reproduce and Share the Licensed Material, in whole or 95 | in part; and 96 | 97 | b. produce, reproduce, and Share Adapted Material. 98 | 99 | 2. Exceptions and Limitations. For the avoidance of doubt, where 100 | Exceptions and Limitations apply to Your use, this Public 101 | License does not apply, and You do not need to comply with 102 | its terms and conditions. 103 | 104 | 3. Term. The term of this Public License is specified in Section 105 | 6(a). 106 | 107 | 4. Media and formats; technical modifications allowed. The 108 | Licensor authorizes You to exercise the Licensed Rights in 109 | all media and formats whether now known or hereafter created, 110 | and to make technical modifications necessary to do so. 
The 111 | Licensor waives and/or agrees not to assert any right or 112 | authority to forbid You from making technical modifications 113 | necessary to exercise the Licensed Rights, including 114 | technical modifications necessary to circumvent Effective 115 | Technological Measures. For purposes of this Public License, 116 | simply making modifications authorized by this Section 2(a) 117 | (4) never produces Adapted Material. 118 | 119 | 5. Downstream recipients. 120 | 121 | a. Offer from the Licensor -- Licensed Material. Every 122 | recipient of the Licensed Material automatically 123 | receives an offer from the Licensor to exercise the 124 | Licensed Rights under the terms and conditions of this 125 | Public License. 126 | 127 | b. No downstream restrictions. You may not offer or impose 128 | any additional or different terms or conditions on, or 129 | apply any Effective Technological Measures to, the 130 | Licensed Material if doing so restricts exercise of the 131 | Licensed Rights by any recipient of the Licensed 132 | Material. 133 | 134 | 6. No endorsement. Nothing in this Public License constitutes or 135 | may be construed as permission to assert or imply that You 136 | are, or that Your use of the Licensed Material is, connected 137 | with, or sponsored, endorsed, or granted official status by, 138 | the Licensor or others designated to receive attribution as 139 | provided in Section 3(a)(1)(A)(i). 140 | 141 | b. Other rights. 142 | 143 | 1. Moral rights, such as the right of integrity, are not 144 | licensed under this Public License, nor are publicity, 145 | privacy, and/or other similar personality rights; however, to 146 | the extent possible, the Licensor waives and/or agrees not to 147 | assert any such rights held by the Licensor to the limited 148 | extent necessary to allow You to exercise the Licensed 149 | Rights, but not otherwise. 150 | 151 | 2. Patent and trademark rights are not licensed under this 152 | Public License. 153 | 154 | 3. 
To the extent possible, the Licensor waives any right to 155 | collect royalties from You for the exercise of the Licensed 156 | Rights, whether directly or through a collecting society 157 | under any voluntary or waivable statutory or compulsory 158 | licensing scheme. In all other cases the Licensor expressly 159 | reserves any right to collect such royalties. 160 | 161 | 162 | Section 3 -- License Conditions. 163 | 164 | Your exercise of the Licensed Rights is expressly made subject to the 165 | following conditions. 166 | 167 | a. Attribution. 168 | 169 | 1. If You Share the Licensed Material (including in modified 170 | form), You must: 171 | 172 | a. retain the following if it is supplied by the Licensor 173 | with the Licensed Material: 174 | 175 | i. identification of the creator(s) of the Licensed 176 | Material and any others designated to receive 177 | attribution, in any reasonable manner requested by 178 | the Licensor (including by pseudonym if 179 | designated); 180 | 181 | ii. a copyright notice; 182 | 183 | iii. a notice that refers to this Public License; 184 | 185 | iv. a notice that refers to the disclaimer of 186 | warranties; 187 | 188 | v. a URI or hyperlink to the Licensed Material to the 189 | extent reasonably practicable; 190 | 191 | b. indicate if You modified the Licensed Material and 192 | retain an indication of any previous modifications; and 193 | 194 | c. indicate the Licensed Material is licensed under this 195 | Public License, and include the text of, or the URI or 196 | hyperlink to, this Public License. 197 | 198 | 2. You may satisfy the conditions in Section 3(a)(1) in any 199 | reasonable manner based on the medium, means, and context in 200 | which You Share the Licensed Material. For example, it may be 201 | reasonable to satisfy the conditions by providing a URI or 202 | hyperlink to a resource that includes the required 203 | information. 204 | 205 | 3. 
If requested by the Licensor, You must remove any of the 206 | information required by Section 3(a)(1)(A) to the extent 207 | reasonably practicable. 208 | 209 | 4. If You Share Adapted Material You produce, the Adapter's 210 | License You apply must not prevent recipients of the Adapted 211 | Material from complying with this Public License. 212 | 213 | 214 | Section 4 -- Sui Generis Database Rights. 215 | 216 | Where the Licensed Rights include Sui Generis Database Rights that 217 | apply to Your use of the Licensed Material: 218 | 219 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 220 | to extract, reuse, reproduce, and Share all or a substantial 221 | portion of the contents of the database; 222 | 223 | b. if You include all or a substantial portion of the database 224 | contents in a database in which You have Sui Generis Database 225 | Rights, then the database in which You have Sui Generis Database 226 | Rights (but not its individual contents) is Adapted Material; and 227 | 228 | c. You must comply with the conditions in Section 3(a) if You Share 229 | all or a substantial portion of the contents of the database. 230 | 231 | For the avoidance of doubt, this Section 4 supplements and does not 232 | replace Your obligations under this Public License where the Licensed 233 | Rights include other Copyright and Similar Rights. 234 | 235 | 236 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 237 | 238 | a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 239 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 240 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 241 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 242 | IMPLIED, STATUTORY, OR OTHER. 
THIS INCLUDES, WITHOUT LIMITATION, 243 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 244 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 245 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 246 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 247 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 248 | 249 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 250 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 251 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 252 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 253 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 254 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 255 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 256 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 257 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 258 | 259 | c. The disclaimer of warranties and limitation of liability provided 260 | above shall be interpreted in a manner that, to the extent 261 | possible, most closely approximates an absolute disclaimer and 262 | waiver of all liability. 263 | 264 | 265 | Section 6 -- Term and Termination. 266 | 267 | a. This Public License applies for the term of the Copyright and 268 | Similar Rights licensed here. However, if You fail to comply with 269 | this Public License, then Your rights under this Public License 270 | terminate automatically. 271 | 272 | b. Where Your right to use the Licensed Material has terminated under 273 | Section 6(a), it reinstates: 274 | 275 | 1. automatically as of the date the violation is cured, provided 276 | it is cured within 30 days of Your discovery of the 277 | violation; or 278 | 279 | 2. upon express reinstatement by the Licensor. 
280 | 281 | For the avoidance of doubt, this Section 6(b) does not affect any 282 | right the Licensor may have to seek remedies for Your violations 283 | of this Public License. 284 | 285 | c. For the avoidance of doubt, the Licensor may also offer the 286 | Licensed Material under separate terms or conditions or stop 287 | distributing the Licensed Material at any time; however, doing so 288 | will not terminate this Public License. 289 | 290 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 291 | License. 292 | 293 | 294 | Section 7 -- Other Terms and Conditions. 295 | 296 | a. The Licensor shall not be bound by any additional or different 297 | terms or conditions communicated by You unless expressly agreed. 298 | 299 | b. Any arrangements, understandings, or agreements regarding the 300 | Licensed Material not stated herein are separate from and 301 | independent of the terms and conditions of this Public License. 302 | 303 | 304 | Section 8 -- Interpretation. 305 | 306 | a. For the avoidance of doubt, this Public License does not, and 307 | shall not be interpreted to, reduce, limit, restrict, or impose 308 | conditions on any use of the Licensed Material that could lawfully 309 | be made without permission under this Public License. 310 | 311 | b. To the extent possible, if any provision of this Public License is 312 | deemed unenforceable, it shall be automatically reformed to the 313 | minimum extent necessary to make it enforceable. If the provision 314 | cannot be reformed, it shall be severed from this Public License 315 | without affecting the enforceability of the remaining terms and 316 | conditions. 317 | 318 | c. No term or condition of this Public License will be waived and no 319 | failure to comply consented to unless expressly agreed to by the 320 | Licensor. 321 | 322 | d. 
Nothing in this Public License constitutes or may be interpreted 323 | as a limitation upon, or waiver of, any privileges and immunities 324 | that apply to the Licensor or You, including from the legal 325 | processes of any jurisdiction or authority. 326 | 327 | - - - - 328 | 329 | BSD 3-Clause License 330 | 331 | Redistribution and use in source and binary forms, with or without 332 | modification, are permitted provided that the following conditions are met: 333 | 334 | 1. Redistributions of source code must retain the above copyright notice, this 335 | list of conditions and the following disclaimer. 336 | 337 | 2. Redistributions in binary form must reproduce the above copyright notice, 338 | this list of conditions and the following disclaimer in the documentation 339 | and/or other materials provided with the distribution. 340 | 341 | 3. Neither the name of the copyright holder nor the names of its 342 | contributors may be used to endorse or promote products derived from 343 | this software without specific prior written permission. 344 | 345 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 346 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 347 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 348 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 349 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 350 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 351 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 352 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 353 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 354 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
355 | 356 | - - - - 357 | -------------------------------------------------------------------------------- /review-drafts/2018-07.bs: -------------------------------------------------------------------------------- 1 |
2 | Group: WHATWG 3 | Date: 2018-07-23 4 | H1: Infra 5 | Shortname: infra 6 | Text Macro: TWITTER infrastandard 7 | Abstract: The Infra Standard aims to define the fundamental concepts upon which standards are built. 8 | Translation: ja https://triple-underscore.github.io/infra-ja.html 9 |10 | 11 |
12 | urlPrefix: https://tc39.github.io/ecma262/; spec: ECMA-262; 13 | type: dfn 14 | text: %JSONParse%; url: sec-json.parse 15 | text: List; url: sec-list-and-record-specification-type 16 | text: The String Type; url: sec-ecmascript-language-types-string-type 17 | type: abstract-op; text: Call; url: sec-call 18 |19 | 20 | 21 |
Deduplicate boilerplate in standards. 25 | 26 |
Align standards on conventions, terminology, and data structures. 27 | 28 |
Be a place for concepts used by multiple standards without a good home. 29 | 30 |
Help write clear and readable algorithmic prose by clarifying otherwise ambiguous concepts. 31 |
Suggestions for more goals welcome.
34 | 35 | 36 |To make use of the Infra Standard in a document titled X, use 39 | X depends on the Infra Standard. Additionally, cross-referencing terminology 40 | is encouraged to avoid ambiguity. 41 | 42 |
Specification authors are also encouraged to add their specification to the 43 | list of dependent specifications in 44 | order to help the editors ensure that any future breaking changes to the Infra Standard are 45 | correctly reflected by any such dependencies. 46 | 47 | 48 |
All assertions, diagrams, examples, and notes are non-normative, as are all sections explicitly 53 | marked non-normative. Everything else is normative. 54 | 55 |
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", 56 | "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in 57 | RFC 2119. [[!RFC2119]] 58 | 59 |
These keywords have equivalent meaning when written in lowercase and cannot appear in 60 | non-normative content. 61 | 62 |
This is a willful violation of RFC 8174, motivated by legibility and a desire 63 | to preserve long-standing practice in many non-IETF-published pre-RFC 8174 documents. [[RFC8174]] 64 | 65 |
All of the above is applicable to both this standard and any document that uses this standard. 66 | Documents using this standard are encouraged to limit themselves to "must", "must not", "should", 67 | and "may", and to use these in their lowercase form as that is generally considered to be more 68 | readable. 69 | 70 |
For non-normative content "strongly encouraged", "strongly discouraged", "encouraged", 71 | "discouraged", "can", "cannot", "could", "could not", "might", and "might not" can be used instead. 72 | 73 | 74 |
In general, specifications interact with and rely on a wide variety of other specifications. In 77 | certain circumstances, unfortunately, conflicting needs require a specification to violate the 78 | requirements of other specifications. When this occurs, a document using the Infra Standard should 79 | denote such transgressions as a willful violation, and note the reason for that 80 | violation. 81 | 82 |
The previous section, [[#conformance]], documents a 83 | willful violation of RFC 8174 committed by the Infra Standard. 84 | 85 | 86 |
The word "or", in cases where both inclusive "or" and exclusive "or" are possible (e.g., "if 89 | either width or height is zero"), means an inclusive "or" (implying "or both"), unless it is called 90 | out as being exclusive (with "but not both"). 91 | 92 | 93 |
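As a non-normative illustration, the two readings of "or" can be sketched in Python. The function names below are hypothetical and chosen only for this example; the condition mirrors the sentence "if either width or height is zero" quoted above.

```python
def is_zero_sized(width: float, height: float) -> bool:
    """Inclusive "or": true when width is zero, height is zero, or both."""
    return width == 0 or height == 0


def is_zero_sized_exclusive(width: float, height: float) -> bool:
    """Exclusive "or" ("but not both"): true when exactly one value is zero."""
    return (width == 0) != (height == 0)
```

Under the default inclusive reading, a 0 by 0 box satisfies the condition; it would only be excluded if the prose explicitly said "but not both".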
Algorithms, and requirements phrased in the imperative as part of algorithms (such as "strip any 96 | leading spaces" or "return false") are to be interpreted with the meaning of the keyword (e.g., 97 | "must") used in introducing the algorithm or step. If no such keyword is used, must is implied. 98 | 99 |
For example, were the spec to say:

To eat an orange, the user must:

- Peel the orange.
- Separate each slice of the orange.
- Eat the orange slices.

it would be equivalent to the following:

To eat an orange:

- The user must peel the orange.
- The user must separate each slice of the orange.
- The user must eat the orange slices.

Here the key word is "must".
125 | 126 |Modifying the above example, if the algorithm was introduced only with "To eat 127 | an orange:", it would still have the same meaning, as "must" is implied. 128 |
Conformance requirements phrased as algorithms or specific steps may be implemented in any 131 | manner, so long as the end result is equivalent. (In particular, the algorithms are intended to be 132 | easy to follow, and not intended to be performant.) 133 | 134 |
Performance is tricky to get correct as it is influenced by user perception, computer 135 | architectures, and different types of input that can change over time in how common they are. For 136 | instance, a JavaScript engine likely has many different code paths for what is standardized as a 137 | single algorithm, in order to optimize for speed or memory consumption. Standardizing all those code 138 | paths would be an insurmountable task and not productive as they would not stand the test of time 139 | as well as the single algorithm would. Therefore performance is best left as a field to compete 140 | over. 141 | 142 | 143 |
A variable is declared with "let" and changed with "set". 146 | 147 |
Let |list| be a new list.
148 | 149 |Let |value| be null. 152 | 153 |
If |input| is a string, then set |value| to |input|. 154 | 155 |
Otherwise, set |value| to |input|, UTF-8 decoded. 156 | 157 |
Let activationTarget be 162 | target, if isActivationEvent is true and target has activation 163 | behavior, and null otherwise. 164 | 165 |
Variables must not be used before they are declared. Variables are 166 | block scoped. 167 | Variables must not be declared more than once per algorithm. 168 | 169 | 170 |
The control flow of algorithms is such that a requirement to "return" or "throw" terminates the 173 | algorithm the statement was in. "Return" will hand the given value, if any, to its caller. "Throw" 174 | will make the caller automatically rethrow the given value, if any, and thereby terminate the 175 | caller's algorithm. Using prose the caller has the ability to "catch" the exception and perform 176 | another action. 177 | 178 | 179 |
Sometimes it is useful to stop performing a series of steps once a condition becomes true. 182 | 183 |
To do this, state that a given series of steps will abort when a specific 184 | condition is reached. This indicates that the specified steps must be evaluated, not 185 | as-written, but by additionally inserting a step before each of them that evaluates 186 | condition, and if condition evaluates to true, skips the remaining steps. 187 | 188 |
In such algorithms, the subsequent step can be annotated to run if aborted, in 189 | which case it must run if any of the preceding steps were skipped due to the condition 190 | of the preceding abort when step evaluated to true. 191 | 192 |
The following algorithm 194 | 195 |
Let |result| be an empty list. 197 | 198 |
Run these steps, but abort when the user clicks the "Cancel" button: 200 | 201 |
If aborted, append "Didn't finish!" to |result|.
is equivalent to the more verbose formulation
217 | 218 |Let |result| be an empty list. 220 | 221 |
If the user has not clicked the "Cancel" button, then: 223 | 224 |
Compute the first million digits of π, and append the result 226 | to |result|. 227 | 228 |
If the user has not clicked the "Cancel" button, then: 230 | 231 |
If the user clicked the "Cancel" button, then append "Didn't finish!" to |result|.
Whenever this construct is used, implementations are allowed to evaluate 246 | condition during the specified steps rather than before and after each step, as long as 247 | the end result is indistinguishable. For instance, as long as |result| in the above example is not 248 | mutated during a compute operation, the user agent could stop the computation. 249 | 250 | 251 |
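As a non-normative illustration, the "abort when" construct can be sketched in Python. The names `run_steps_abort_when`, `steps`, and `condition` are hypothetical and not part of this standard. The sketch evaluates the condition before each step, which is the as-written behavior rather than an optimized one, and the "Cancel" click is simulated by a flag.

```python
from typing import Callable, Dict, Iterable, List


def run_steps_abort_when(steps: Iterable[Callable[[], None]],
                         condition: Callable[[], bool]) -> bool:
    """Run each step in order, evaluating `condition` before every step.

    Returns True if remaining steps were skipped (the "if aborted" case).
    """
    for step in steps:
        if condition():
            return True  # Skip this step and all remaining steps.
        step()
    return False


# Hypothetical stand-ins for the example: two cheap steps, and a flag that
# plays the role of the user clicking "Cancel" during the first step.
result: List[str] = []
cancel: Dict[str, bool] = {"clicked": False}


def first_step() -> None:
    result.append("first")
    cancel["clicked"] = True  # Simulate the click.


def second_step() -> None:
    result.append("second")


aborted = run_steps_abort_when([first_step, second_step],
                               lambda: cancel["clicked"])
if aborted:
    result.append("Didn't finish!")
```

Returning a boolean keeps the "if aborted" step outside the helper, which matches how the construct annotates the subsequent step rather than one of the aborted steps.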
There's a variety of ways to repeat a set of steps until a condition is reached. 254 | 255 |
The Infra Standard is not (yet) exhaustive on this; please file an issue if you need 256 | something. 257 | 258 |
As defined for lists (and derivatives) and 261 | maps. 262 | 263 |
An instruction to repeat a set of steps as long as a condition is met. 266 | 267 |
While |condition| is "met":

…
An iteration's flow can be controlled via requirements to 277 | continue or break. 278 | Continue will skip over any remaining steps in an iteration, proceeding to the 279 | next item. If no further items remain, the iteration will stop. Break will skip 280 | over any remaining steps in an iteration, and skip over any remaining items as well, stopping the 281 | iteration. 282 | 283 |
Let |example| be the list « 1, 2, 3, 4 ». The following prose would perform |operation| 285 | upon 1, then 2, then 3, then 4: 286 | 287 |
For each |item| of |example|: 290 |
The following prose would perform |operation| upon 1, then 2, then 4. 3 would be skipped. 297 | 298 |
For each |item| of |example|: 301 |
The following prose would perform |operation| upon 1, then 2. 3 and 4 would be skipped. 309 | 310 |
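The three cases above can be mirrored in Python, whose `continue` and `break` statements behave the same way inside a `for` loop. This is only a sketch: the test "item is 3" and the helper name `perform_on_items` are assumptions made for illustration, and appending to a list stands in for performing |operation|.

```python
from typing import List, Optional


def perform_on_items(example: List[int], at_three: Optional[str] = None) -> List[int]:
    """Return the items on which the operation was performed, in order."""
    performed: List[int] = []
    for item in example:
        if item == 3 and at_three == "continue":
            continue  # Skip the rest of this iteration, proceed to the next item.
        if item == 3 and at_three == "break":
            break  # Skip the rest of this iteration and all remaining items.
        performed.append(item)  # Stands in for "perform |operation| on |item|".
    return performed
```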
To improve readability, it can sometimes help to add assertions to algorithms, stating invariants. To do this, write "Assert:", followed by a statement that must be true. If the statement ends up being false, that indicates an issue with the document using the Infra Standard that should be reported and addressed.
Since the statement can only ever be true, it has no implications for implementations. 330 | 331 |
Let |x| be "Aperture Science".

Assert: |x| is "Aperture Science".
The value null is used to indicate the lack of a value. It can be used interchangeably with the 343 | JavaScript null value. [[!ECMA-262]] 344 | 345 |
Let element be null. 346 | 347 |
If input is the empty string, then return null. 348 | 349 | 350 |
A boolean is either true or false. 353 | 354 |
Let elementSeen be false. 355 | 356 | 357 |
A byte is a sequence of eight bits, represented as a double-digit hexadecimal 360 | number in the range 0x00 to 0xFF, inclusive. 361 | 362 |
An ASCII byte is a byte in the range 0x00 (NUL) to 0x7F (DEL), 363 | inclusive. As illustrated, an ASCII byte, excluding 0x28 and 0x29, may be followed by the 364 | representation outlined in the Standard Code 365 | section of ASCII format for Network Interchange, between parentheses. [[!RFC20]] 366 | 367 |
0x28 may be followed by "(left parenthesis)" and 0x29 by "(right parenthesis)". 368 | 369 |
0x49 (I) when UTF-8 decoded becomes the 370 | code point U+0049 (I). 371 | 372 | 373 |
A byte sequence is a sequence of bytes, represented as a space-separated 376 | sequence of bytes. Byte sequences with bytes in the range 0x20 (SP) to 0x7E (~), inclusive, can 377 | alternately be written as a string, but using backticks instead of quotation marks, to avoid 378 | confusion with an actual string. 379 | 380 |
0x48 0x49 can also be represented as `HI`.

Headers, such as `Content-Type`, are byte sequences.
To get a byte sequence out of a string, using UTF-8 encode from 387 | the Encoding Standard is encouraged. In rare circumstances isomorphic encode might be needed. 388 | [[ENCODING]] 389 | 390 |
A byte sequence's length is the number of 391 | bytes it contains. 392 | 393 |
To byte-lowercase a byte sequence, increase each byte it 394 | contains, in the range 0x41 (A) to 0x5A (Z), inclusive, by 0x20. 395 | 396 |
To byte-uppercase a byte sequence, subtract 0x20 from each byte it contains in the range 0x61 (a) to 0x7A (z), inclusive.
A byte sequence A is a byte-case-insensitive match for a 400 | byte sequence B, if the byte-lowercase of A is the 401 | byte-lowercase of B. 402 | 403 |
To isomorphic decode a byte sequence input, return a 404 | string whose length is equal to input's 405 | length and whose code points have the same values as 406 | input's bytes, in the same order. 407 | 408 | 409 |
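The byte sequence operations defined above are small enough to sketch directly. The following Python is an illustration only; the function names are made up for this example, and Python's `bytes` and `str` types stand in for byte sequences and strings.

```python
def byte_lowercase(data: bytes) -> bytes:
    # Add 0x20 to each byte in the range 0x41 (A) to 0x5A (Z), inclusive.
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in data)


def byte_uppercase(data: bytes) -> bytes:
    # Subtract 0x20 from each byte in the range 0x61 (a) to 0x7A (z), inclusive.
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)


def is_byte_case_insensitive_match(a: bytes, b: bytes) -> bool:
    return byte_lowercase(a) == byte_lowercase(b)


def isomorphic_decode(data: bytes) -> str:
    # Each byte becomes the code point with the same value, in the same order.
    return "".join(chr(b) for b in data)
```

Bytes outside the two ASCII letter ranges, such as 0xE9, pass through the case operations unchanged.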
A code point is a Unicode code point and is 412 | represented as a four-to-six digit hexadecimal number, typically prefixed with "U+". 413 | 414 |
A code point may be followed by its name, by its rendered form between parentheses when it 415 | is not U+0028 or U+0029, or by both. Documents using the Infra Standard are encouraged to follow 416 | code points by their name when they cannot be rendered or are U+0028 or U+0029, and their 417 | rendered form between parentheses otherwise, for legibility. 418 | 419 |
A code point's name is defined in the Unicode Standard and represented in 420 | ASCII uppercase. [[!UNICODE]] 421 | 422 |
The code point rendered as 🤔 is represented as U+1F914. 424 | 425 |
When referring to that code point, we might say "U+1F914 (🤔)", to provide extra context. 426 | Documents are allowed to use "U+1F914 THINKING FACE (🤔)" as well, though this is somewhat verbose. 427 |
Code points that are difficult to render unambiguously, such as U+000A, can be referred to as "U+000A LF". U+0029 can be referred to as "U+0029 RIGHT PARENTHESIS", because even though it renders, this avoids unmatched parentheses.
Code points are sometimes referred to as characters and in certain contexts are 434 | prefixed with "0x" rather than "U+". 435 | 436 |
A surrogate is a code point that is in the range U+D800 to U+DFFF, 437 | inclusive. 438 | 439 |
A scalar value is a code point that is not a surrogate. 440 | 441 |
A noncharacter is a code point that is in the range U+FDD0 to U+FDEF, 442 | inclusive, or U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, 443 | U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, 444 | U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, 445 | U+FFFFF, U+10FFFE, or U+10FFFF. 446 | 447 |
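The three classifications above can be written as predicates over integer code point values. This is a sketch with hypothetical function names. It relies on an observation about the list above: every listed noncharacter outside U+FDD0 to U+FDEF has its low 16 bits equal to 0xFFFE or 0xFFFF.

```python
def is_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDFFF


def is_scalar_value(cp: int) -> bool:
    # A code point (U+0000 to U+10FFFF) that is not a surrogate.
    return 0 <= cp <= 0x10FFFF and not is_surrogate(cp)


def is_noncharacter(cp: int) -> bool:
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of each of the 17 planes.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFF) in (0xFFFE, 0xFFFF)
```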
An ASCII code point is a code point in the range U+0000 NULL to 448 | U+007F DELETE, inclusive. 449 | 450 |
An ASCII tab or newline is 451 | U+0009 TAB, U+000A LF, or U+000D CR. 452 | 453 |
ASCII whitespace is U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, or U+0020 454 | SPACE. 455 | 456 |
"Whitespace" is a mass noun. 457 | 458 |
A C0 control is a code point in the range U+0000 NULL to 459 | U+001F INFORMATION SEPARATOR ONE, inclusive. 460 | 461 |
A C0 control or space is a 462 | C0 control or U+0020 SPACE. 463 | 464 |
A control is a C0 control or a code point in the range 465 | U+007F DELETE to U+009F APPLICATION PROGRAM COMMAND, inclusive. 466 | 467 |
An ASCII digit is a code point in the range U+0030 (0) to U+0039 (9), 468 | inclusive. 469 | 470 |
An ASCII upper hex digit is an ASCII digit or a code point in the 471 | range U+0041 (A) to U+0046 (F), inclusive. 472 | 473 |
An ASCII lower hex digit is an ASCII digit or a code point in the 474 | range U+0061 (a) to U+0066 (f), inclusive. 475 | 476 |
An ASCII hex digit is an ASCII upper hex digit or 477 | ASCII lower hex digit. 478 | 479 |
An ASCII upper alpha is a code point in the range U+0041 (A) to 480 | U+005A (Z), inclusive. 481 | 482 |
An ASCII lower alpha is a code point in the range U+0061 (a) to 483 | U+007A (z), inclusive. 484 | 485 |
An ASCII alpha is an ASCII upper alpha or ASCII lower alpha. 486 | 487 |
An ASCII alphanumeric is an ASCII digit or ASCII alpha. 488 | 489 | 490 |
A JavaScript string is a sequence of unsigned 16-bit integers, also known as 493 | code units. 494 | 495 |
This is different from how the Unicode Standard defines "code unit". In particular it 496 | refers exclusively to how the Unicode Standard defines it for Unicode 16-bit strings. [[UNICODE]] 497 | 498 |
A JavaScript string's length is the number of 499 | code units it contains. 500 | 501 |
A JavaScript string can also be interpreted as containing code points, per the 502 | conversion defined in The String Type section of the JavaScript specification. [[!ECMA-262]] 503 | 504 |
This conversion process converts surrogate pairs into their corresponding 505 | scalar value and maps isolated surrogates to their corresponding code point, leaving 506 | them effectively as-is. 507 | 508 |
A JavaScript string consisting 509 | of the code units 0xD83D, 0xDCA9, and 0xD800, when interpreted as containing 510 | code points, would consist of the code points U+1F4A9 and U+D800. 511 | 512 |
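The interpretation described above can be sketched in Python by modelling a JavaScript string as a list of 16-bit integers. The function name `code_units_to_code_points` is hypothetical, and the pairing arithmetic is the usual UTF-16 surrogate pair formula, which is assumed here to match the conversion the JavaScript specification defines.

```python
from typing import List


def code_units_to_code_points(units: List[int]) -> List[int]:
    code_points: List[int] = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_lead = 0xD800 <= unit <= 0xDBFF
        has_trail = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_lead and has_trail:
            # A surrogate pair becomes one scalar value.
            code_points.append(
                0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # Anything else, including an isolated surrogate, is kept as-is.
            code_points.append(unit)
            i += 1
    return code_points
```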
A scalar value string is a sequence of scalar values. 513 | 514 |
A scalar value string is useful for any kind of I/O or other kind of operation 515 | where UTF-8 encode comes into play. 516 | 517 | 518 |
String can be used to refer to either a JavaScript string or 519 | scalar value string, when it is clear from the context which is meant or when the distinction 520 | is immaterial. Strings are denoted by double quotes and monospace font. 521 | 522 |
"Hello, world!" is a string.
523 |
524 |
A string's length is the number of code points it 525 | contains. 526 | 527 |
To convert a JavaScript string into a 528 | scalar value string, replace any surrogates with U+FFFD. 529 | 530 | 531 |
The replaced surrogates are always isolated surrogates, since the process of 532 | interpreting the JavaScript string as containing code points will have converted surrogate 533 | pairs into scalar values. 534 | 535 |
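A minimal sketch of this conversion in Python follows. A Python `str` is a sequence of code points that may include isolated surrogates, so it is used here as a stand-in for a JavaScript string that has already been interpreted as code points. The function name is hypothetical.

```python
def convert_to_scalar_value_string(string: str) -> str:
    # Replace every (necessarily isolated) surrogate with U+FFFD.
    return "".join(
        "\ufffd" if 0xD800 <= ord(c) <= 0xDFFF else c for c in string)
```

After the conversion the result can be UTF-8 encoded without error, which is the situation the scalar value string type is meant for.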
A scalar value string can always be used as a JavaScript string implicitly since it is a subset. The reverse is only possible if the JavaScript string is known to not contain surrogates; otherwise a conversion must be performed.
An implementation likely has to perform explicit conversion, depending on how it 541 | actually ends up representing JavaScript and 542 | scalar value strings. It is even fairly typical for implementations to have multiple 543 | implementations of just JavaScript strings for performance and memory reasons. 544 | 545 |
To isomorphic encode a string input, run these steps:
548 | 549 |Assert: input contains no code points greater than U+00FF. 551 | 552 |
Return a byte sequence whose length is equal to 553 | input's length and whose bytes have the same values as 554 | input's code points, in the same order. 555 |
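The two steps above translate almost word for word into Python. This is a sketch with a hypothetical function name; the `assert` statement mirrors the Assert step and is not meant as input validation.

```python
def isomorphic_encode(string: str) -> bytes:
    # Assert: input contains no code points greater than U+00FF.
    assert all(ord(c) <= 0xFF for c in string)
    # Each code point becomes the byte with the same value, in the same order.
    return bytes(ord(c) for c in string)
```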
An ASCII string is a string whose code points are all 560 | ASCII code points. 561 | 562 |
To ASCII lowercase a string, replace all ASCII upper alphas in 563 | the string with their corresponding code point in ASCII lower alpha. 564 | 565 |
To ASCII uppercase a string, replace all ASCII lower alphas in 566 | the string with their corresponding code point in ASCII upper alpha. 567 | 568 |
A string A is an ASCII case-insensitive match for a 569 | string B, if the ASCII lowercase of A is the 570 | ASCII lowercase of B. 571 | 572 | 573 |
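The ASCII case operations only touch the 26 ASCII letters. The sketch below uses hypothetical function names and avoids Python's `str.lower()` and `str.upper()`, since those also map non-ASCII code points.

```python
def ascii_lowercase(string: str) -> str:
    # Only U+0041 (A) to U+005A (Z) are changed.
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in string)


def ascii_uppercase(string: str) -> str:
    # Only U+0061 (a) to U+007A (z) are changed.
    return "".join(chr(ord(c) - 0x20) if "a" <= c <= "z" else c for c in string)


def is_ascii_case_insensitive_match(a: str, b: str) -> bool:
    return ascii_lowercase(a) == ascii_lowercase(b)
```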
To strip newlines from a string, remove any U+000A LF and U+000D CR 576 | code points from the string. 577 | 578 |
To strip leading and trailing ASCII whitespace from a string, remove all ASCII whitespace that is at the start or the end of the string.
To strip and collapse ASCII whitespace in a string, replace any sequence 582 | of one or more consecutive code points that are ASCII whitespace in the string 583 | with a single U+0020 SPACE code point, and then remove any leading and trailing 584 | ASCII whitespace from that string. 585 | 586 |
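The three stripping operations can be sketched in Python as follows. The function names and the `ASCII_WHITESPACE` constant are hypothetical. The set of characters is passed explicitly because Python's default `str.strip()` removes more than the five ASCII whitespace code points.

```python
import re

# U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, U+0020 SPACE.
ASCII_WHITESPACE = "\t\n\x0c\r "


def strip_newlines(string: str) -> str:
    return string.replace("\n", "").replace("\r", "")


def strip_leading_and_trailing_ascii_whitespace(string: str) -> str:
    return string.strip(ASCII_WHITESPACE)


def strip_and_collapse_ascii_whitespace(string: str) -> str:
    # Replace each run of ASCII whitespace with a single U+0020 SPACE,
    # then remove leading and trailing ASCII whitespace.
    collapsed = re.sub(r"[\t\n\x0c\r ]+", " ", string)
    return collapsed.strip(ASCII_WHITESPACE)
```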
To collect a sequence of code points meeting a condition condition from 590 | a string input, given a position variable 591 | position tracking the position of the calling algorithm within input:
592 | 593 |Let result be the empty string. 595 | 596 |
While position doesn't point past the end of input and the 598 | code point at position within input meets the condition 599 | condition: 600 | 601 |
Append that code point to the end of result. 603 | 604 |
Advance position by 1. 605 |
Return result. 609 |
In addition to returning the collected code points, this algorithm updates the 612 | position variable in the calling algorithm. 613 | 614 |
To skip ASCII whitespace within a string input given a 615 | position variable position, collect a sequence of code points that are 616 | ASCII whitespace from input given position. The collected 617 | code points are not used, but position is still updated. 618 | 619 |
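Python has no by-reference integer parameters, so one way to model the position variable is to pass the position in and return the updated position alongside the result. The sketch below takes that approach; the function names and the tuple return shape are assumptions of this example, not something this standard prescribes.

```python
from typing import Callable, Tuple

ASCII_WHITESPACE = "\t\n\x0c\r "


def collect_code_points(condition: Callable[[str], bool],
                        string: str, position: int) -> Tuple[str, int]:
    """Return the collected code points and the updated position."""
    result = []
    while position < len(string) and condition(string[position]):
        result.append(string[position])
        position += 1
    return "".join(result), position


def skip_ascii_whitespace(string: str, position: int) -> int:
    # The collected code points are not used; only the position matters.
    _, position = collect_code_points(
        lambda c: c in ASCII_WHITESPACE, string, position)
    return position
```

A caller would write `token, position = collect_code_points(...)` so that its own position variable is updated, as the note above describes.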
To strictly split a string 622 | input on a particular delimiter code point delimiter:
623 | 624 |Let position be a position variable for input, initially 626 | pointing at the start of input. 627 | 628 |
Let tokens be a list of strings, initially empty. 629 | 630 |
Let token be the result of collecting a sequence of code points that are 631 | not equal to delimiter from input, given position. 632 | 633 |
Append token to tokens. 634 | 635 |
While position is not past the end of input: 637 | 638 |
Advance position to the next code point in input. (This 640 | skips past the delimiter.) 641 | 642 |
Let token be the result of collecting a sequence of code points that are 643 | not equal to delimiter from input, given position. 644 | 645 |
Append token to tokens. 646 |
Return tokens. 650 |
This algorithm is a "strict" split, as opposed to the commonly-used variants 653 | for ASCII whitespace and 654 | for commas below, which are both more lenient in various ways involving 655 | interspersed ASCII whitespace. 656 | 657 |
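The strict split steps can be followed literally in Python. This is a sketch with a hypothetical function name, using an integer index as the position variable and a nested helper for the "collect" step.

```python
from typing import List


def strictly_split(string: str, delimiter: str) -> List[str]:
    position = 0
    tokens: List[str] = []

    def collect_until_delimiter() -> str:
        # Collect code points that are not equal to the delimiter.
        nonlocal position
        start = position
        while position < len(string) and string[position] != delimiter:
            position += 1
        return string[start:position]

    tokens.append(collect_until_delimiter())
    while position < len(string):
        position += 1  # Skip past the delimiter.
        tokens.append(collect_until_delimiter())
    return tokens
```

Empty tokens are preserved, and the empty string yields a list containing one empty token.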
To split a 658 | string input on ASCII whitespace: 659 | 660 |
Let position be a position variable for input, initially 662 | pointing at the start of input. 663 | 664 |
Let tokens be a list of strings, initially empty. 665 | 666 |
Skip ASCII whitespace within input given position. 667 | 668 |
While position is not past the end of input: 670 | 671 |
Let token be the result of collecting a sequence of code points that are 673 | not ASCII whitespace from input, given position. 674 | 675 |
Append token to tokens. 676 | 677 |
Skip ASCII whitespace within input given position. 678 |
Return tokens. 682 |
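A direct Python transcription of these steps is sketched below; the function name and the `ASCII_WHITESPACE` constant are hypothetical. Python's built-in `str.split()` with no arguments is not used because it treats additional characters as whitespace.

```python
from typing import List

ASCII_WHITESPACE = "\t\n\x0c\r "


def split_on_ascii_whitespace(string: str) -> List[str]:
    position = 0
    tokens: List[str] = []
    # Skip ASCII whitespace.
    while position < len(string) and string[position] in ASCII_WHITESPACE:
        position += 1
    while position < len(string):
        # Collect code points that are not ASCII whitespace.
        start = position
        while position < len(string) and string[position] not in ASCII_WHITESPACE:
            position += 1
        tokens.append(string[start:position])
        # Skip ASCII whitespace.
        while position < len(string) and string[position] in ASCII_WHITESPACE:
            position += 1
    return tokens
```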
To split a string 685 | input on commas: 686 | 687 |
Let position be a position variable for input, initially 689 | pointing at the start of input. 690 | 691 |
Let tokens be a list of strings, initially empty. 692 | 693 |
While position is not past the end of input: 695 | 696 |
Let token be the result of collecting a sequence of code points that are 699 | not U+002C (,) from input, given position. 700 | 701 |
token might be the empty string. 702 |
Append token to tokens. 707 | 708 |
If position is not past the end of input, then: 710 | 711 |
Assert: the code point at position within input is 713 | U+002C (,). 714 | 715 |
Advance position by 1. 716 |
Return tokens. 722 |
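The comma split can be sketched the same way. One line in the sketch is an assumption: the steps as shown above do not spell out any whitespace handling, but the note on strict splitting describes this variant as lenient about interspersed ASCII whitespace, so the sketch strips leading and trailing ASCII whitespace from each token. The function name is hypothetical.

```python
from typing import List

ASCII_WHITESPACE = "\t\n\x0c\r "


def split_on_commas(string: str) -> List[str]:
    position = 0
    tokens: List[str] = []
    while position < len(string):
        # Collect code points that are not U+002C (,). May be empty.
        start = position
        while position < len(string) and string[position] != ",":
            position += 1
        token = string[start:position]
        # Assumption (not stated in the steps as shown): trim each token.
        token = token.strip(ASCII_WHITESPACE)
        tokens.append(token)
        if position < len(string):
            assert string[position] == ","
            position += 1
    return tokens
```

Because the loop condition is checked after advancing past a comma, a trailing comma does not produce a final empty token, unlike in the strict split.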
To concatenate a list of 725 | strings list, using an optional separator string separator, run 726 | these steps: 727 | 728 |
If list is empty, then return the empty string. 730 | 731 |
If separator is not given, then set separator to the empty string. 732 | 733 |
Return a string whose contents are list's items, in 734 | order, separated from each other by separator. 735 |
To serialize a set set, return the 738 | concatenation of set using U+0020 SPACE. 739 | 740 | 741 |
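Concatenation and set serialization map onto Python's `str.join`. The sketch below uses hypothetical function names and models the optional separator as a default argument.

```python
from typing import List


def concatenate(items: List[str], separator: str = "") -> str:
    if not items:
        return ""
    # Items in order, separated from each other by the separator.
    return separator.join(items)


def serialize_set(items: List[str]) -> str:
    # Concatenation using U+0020 SPACE.
    return concatenate(items, " ")
```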
Conventionally, specifications have operated on a variety of vague specification-level data 744 | structures, based on shared understanding of their semantics. This generally works well, but can 745 | lead to ambiguities around edge cases, such as iteration order or what happens when you 746 | append an item to an ordered set that the set already 747 | contains. It has also led to a variety of divergent notation and phrasing, especially 748 | around more complex data structures such as maps. 749 | 750 |
This standard provides a small set of common data structures, along with notation and phrasing 751 | for working with them, in order to create common ground. 752 | 753 | 754 |
A list is a specification type consisting of a finite ordered sequence of 757 | items. 758 | 759 |
For notational convenience, a literal syntax can be used to express lists, by surrounding 760 | the list by « » characters and separating its items with a comma. An indexing syntax 761 | can be used by providing a zero-based index into a list inside square brackets. The index cannot be 762 | out-of-bounds, except when used with exists. 763 | 764 |
Let |example| be the list « "a", "b", "c", "a" ». Then |example|[1] is the string "b".
To append to a list that is not an ordered set is to 771 | add the given item to the end of the list. 772 | 773 |
To prepend to a list that is not an ordered set is to 774 | add the given item to the beginning of the list. 775 | 776 |
To replace within a list that is not an ordered set is 777 | to replace all items from the list that match a given condition with the given item, 778 | or do nothing if none do. 779 | 780 |
The above definitions are modified when the list is an ordered set; see 781 | below for ordered set append, prepend, and 782 | replace. 783 | 784 |
To insert an item into a list before an 785 | index is to add the given item to the list between the given index − 1 and the given index. If 786 | the given index is 0, then prepend the given item to the list. 787 | 788 |
To remove zero or more items from a list is 789 | to remove all items from the list that match a given condition, or do nothing if none do. 790 | 791 |
Removing |x| from the list « |x|, |y|, |z|, |x| » is to remove all 793 | items from the list that are equal to |x|. The list now is equivalent to « |y|, |z| ». 794 | 795 |
Removing all items that start with the string "a" from the list « "a", "b", "ab", "ba" » is to remove the items "a" and "ab". The list is now equivalent to « "b", "ba" ».
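The condition-based list operations above can be sketched with Python lists, passing the condition as a predicate. The function names are hypothetical; the lists are modified in place to mirror the prose.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def list_replace(items: List[T], condition: Callable[[T], bool], item: T) -> None:
    # Replace every matching item; do nothing if none match.
    items[:] = [item if condition(existing) else existing for existing in items]


def list_insert(items: List[T], index: int, item: T) -> None:
    # Insert before the given index; index 0 is the same as prepending.
    items.insert(index, item)


def list_remove(items: List[T], condition: Callable[[T], bool]) -> None:
    # Remove every matching item; do nothing if none match.
    items[:] = [existing for existing in items if not condition(existing)]
```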
To empty a list is to remove 802 | all of its items. 803 | 804 |
A list contains an 805 | item if it appears in the list. We can also denote this by saying that, for a 806 | list |list| and an index |index|, "|list|[|index|] exists". 807 | 808 |
A list's size is the number of 809 | items the list contains. 810 | 811 |
A list is empty if 812 | its size is zero. 813 | 814 |
To iterate over a list, performing a 815 | set of steps on each item in order, use phrasing of the form 816 | "For each |item| of list", and then operate on |item| in the 817 | subsequent prose. 818 | 819 |
To clone a list |list| is to create a new 820 | list |clone|, of the same designation, and, for each |item| of |list|, 821 | append |item| to |clone|, so that |clone| contains the same 822 | items, in the same order as |list|. 823 | 824 | Note: This is a "shallow clone", as the items themselves are not cloned in any way. 825 | 826 |
Let |original| be the ordered set « "a", "b", "c" ». Cloning |original| creates a new ordered set |clone|, so that replacing "a" with "foo" in |clone| gives « "foo", "b", "c" », while |original|[0] is still the string "a".
The list type originates from the JavaScript specification (where it is capitalized, as 835 | List); we repeat some elements of its definition here for ease of reference, 836 | and provide an expanded vocabulary for manipulating lists. Whenever JavaScript expects a 837 | List, a list as defined here can be used; they are the same type. 838 | [[!ECMA-262]] 839 | 840 |
Some lists are designated as stacks. A stack is a list, 843 | but conventionally, the following operations are used to operate on it, instead of using 844 | append, prepend, or remove. 845 | 846 |
To push onto a stack is to append to it. 847 | 848 |
To pop from a stack is to remove its last 849 | item and return it, if the stack is not empty, or to return 850 | nothing otherwise. 851 | 852 |
Although stacks are lists, for each must not be used with them; 853 | instead, a combination of while and pop is more appropriate. 854 | 855 |
Some lists are designated as queues. A queue is a list, 858 | but conventionally, the following operations are used to operate on it, instead of using 859 | append, prepend, or remove. 860 | 861 |
To enqueue in a queue is to append to it. 862 | 863 |
To dequeue from a queue is to remove its first 864 | item and return it, if the queue is not empty, or to return 865 | nothing if it is. 866 | 867 |
Although queues are lists, for each must not be used with them; 868 | instead, a combination of while and dequeue is more appropriate. 869 | 870 |
Some lists are designated as ordered sets. An 873 | ordered set is a list with the additional semantic that it must not contain the same 874 | item twice. 875 | 876 |
Almost all cases on the web platform require an ordered set, instead of an 877 | unordered one, since interoperability requires that any developer-exposed enumeration of the set's 878 | contents be consistent between browsers. In those cases where order is not important, we still use 879 | ordered sets; implementations can optimize based on the fact that the order is not observable. 880 | 881 |
To append to an ordered set is to do nothing if the set already 882 | contains the given item, or to perform the normal list 883 | append operation otherwise. 884 | 885 |
To prepend to an ordered set is to do nothing if the set already 886 | contains the given item, or to perform the normal list 887 | prepend operation otherwise. 888 | 889 |
To replace within an ordered set 890 | set, given item and replacement: if set 891 | contains item or replacement, then replace the first instance 892 | of either with replacement and remove all other instances. 893 | 894 |
Replacing "a" with "c" within the 895 | ordered set « "a", "b", "c" » gives « "c", "b" ». Within « "c", "b", "a" » it gives 896 | « "c", "b" » as well. 897 | 898 |
An ordered set |set| is a subset of another ordered set 899 | |superset| (and conversely, |superset| is a superset of |set|) if, 900 | for each |item| of |set|, |superset| contains |item|. 901 | 902 |
This implies that an ordered set is both a subset and a 903 | superset of itself. 904 | 905 |
The intersection of ordered sets |A| and |B| is the result of creating a new ordered set |set| and, for each |item| of |A|, if |B| contains |item|, appending |item| to |set|.

The union of ordered sets |A| and |B| is the result of cloning |A| as |set| and, for each |item| of |B|, appending |item| to |set|.
The range n to m, inclusive, creates a new ordered set containing all of the integers from n up to and including m in consecutively increasing order, as long as m is greater than or equal to n.
For each n of the range 1 to 921 | 4, inclusive, … 922 | 923 | 924 |
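The ordered set operations can be sketched on top of a plain Python list, which keeps insertion order. All function names here are hypothetical, and membership is tested with `==`, which is an assumption about what "the same item" means.

```python
from typing import List, TypeVar

T = TypeVar("T")


def set_append(items: List[T], item: T) -> None:
    if item not in items:  # Do nothing if already contained.
        items.append(item)


def set_replace(items: List[T], item: T, replacement: T) -> None:
    result: List[T] = []
    replaced = False
    for existing in items:
        if existing == item or existing == replacement:
            if not replaced:
                result.append(replacement)  # First instance of either.
                replaced = True
            # All other instances are removed.
        else:
            result.append(existing)
    items[:] = result


def intersection(a: List[T], b: List[T]) -> List[T]:
    result: List[T] = []
    for item in a:
        if item in b:
            set_append(result, item)
    return result


def union(a: List[T], b: List[T]) -> List[T]:
    result = list(a)  # Shallow clone of A.
    for item in b:
        set_append(result, item)
    return result


def inclusive_range(n: int, m: int) -> List[int]:
    assert m >= n
    return list(range(n, m + 1))
```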
An ordered map, or sometimes just "map", is a 927 | specification type consisting of a finite ordered sequence of 928 | key/value pairs, with no key appearing twice. 929 | Each key/value pair is called an entry. 930 | 931 |
As with ordered sets, by default we assume that maps need to be ordered for 932 | interoperability among implementations. 933 | 934 |
A literal syntax can be used to express ordered maps, by surrounding the ordered map with 935 | «[ ]» characters, denoting each of its entries as |key| → |value|, and separating its 936 | entries with a comma. An indexing syntax can be used to look up and set values by 937 | providing a key inside square brackets. 938 | 939 |
Let |example| be the ordered map «[ "a" → `x`, "b" → `y` ]». Then |example|["a"] is the byte sequence `x`.
To get the value of an entry in an 946 | ordered map given a key is to retrieve the value of any 947 | existing entry if the map contains an entry with the given key, or 948 | to return nothing otherwise. We can also use the indexing syntax explained above. 949 | 950 |
To set the value of an entry in an 951 | ordered map to a given value is to update the value of any existing 952 | entry if the map contains an entry with the given key, 953 | or if none such exists, to add a new entry with the given key/value to the end of the map. We can 954 | also denote this by saying, for an ordered map |map|, key |key|, and value |value|, 955 | "set |map|[|key|] to |value|". 956 | 957 |
To remove an entry from an ordered map is to remove 958 | all entries from the map that match a given condition, or do nothing if none do. If 959 | the condition is having a certain key, then we can also denote this by saying, for 960 | an ordered map |map| and key |key|, "remove |map|[|key|]". 961 | 962 |
An ordered map contains an 963 | entry with a given key if there exists an entry with that key. 964 | We can also denote this by saying that, for an ordered map |map| and key |key|, "|map|[|key|] 965 | exists". 966 | 967 |
To get the keys of an 968 | ordered map, return a new ordered set whose items are each of the 969 | keys in the map's entries. 970 | 971 |
To get the values of an 972 | ordered map, return a new list whose items are each of the 973 | values in the map's entries. 974 | 975 |
An ordered map's size is the size of the result 976 | of running get the keys on the map. 977 | 978 |
An ordered map is empty if its 979 | size is zero. 980 | 981 |
To iterate over an ordered map, performing 982 | a set of steps on each entry in order, use phrasing of the form 983 | "For each |key| → |value| of |map|", and then operate on |key| and |value| in the 984 | subsequent prose. 985 | 986 | 987 |
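Python's built-in `dict` preserves insertion order and keeps an entry's position when its value is updated, so it can serve as a rough model of an ordered map. The walk-through below is a sketch of that correspondence, with `None` standing in for "nothing"; it is not a claim that implementations use this representation.

```python
# «[ "a" → `x`, "b" → `y` ]»
example = {"a": b"x", "b": b"y"}

value = example.get("a")          # Get the value; None if there is no entry.
missing = example.get("nope")     # No such entry.

example["a"] = b"z"               # Set: the existing entry is updated in place.
example["c"] = b"w"               # Set: a new entry is added at the end.

exists = "b" in example           # example["b"] exists.
example.pop("b", None)            # Remove example["b"]; no error if absent.

keys = list(example)              # Get the keys, in entry order.
values = list(example.values())   # Get the values, in entry order.
size = len(example)               # Size is the number of keys.

# For each key → value of example:
entries = [(key, entry_value) for key, entry_value in example.items()]
```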
A struct is a specification type consisting of a finite set of 990 | items, each of which has a unique and immutable 991 | name. 992 | 993 |
Structs with a defined order are also known as tuples. For notational convenience, a literal syntax can be used to express tuples, by surrounding the tuple with parentheses and separating its items with a comma. To use this notation, the names need to be clear from context. This can be done by preceding the first instance with the name given to the tuple.
A status is an example tuple consisting of a code (a three-digit number) and text (a byte 1005 | sequence). 1006 | 1007 |
A nonsense algorithm that manipulates status tuples for the purpose of demonstrating their 1008 | usage is then:
… `OK`).
… `FOO BAR`).

It is intentional that not all structs are tuples. Documents using the Infra Standard might need the flexibility to add new names to their struct without breaking literal syntax used by their dependencies. In that case a tuple is not appropriate.
Tuples with two items are also known as pairs. 1024 | For pairs, a slightly shorter literal syntax can be used, separating the two 1025 | items with a / character. 1026 | 1027 |
Another way of expressing our |statusInstance| tuple above would be as 200/`OK`.
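For readers who think in code, the status tuple can be approximated with Python's `typing.NamedTuple`, which gives each item a fixed name and a defined order. The class name `Status` and the variable `status_instance` are illustrative. Unlike a specification struct, a `NamedTuple` instance is fully immutable, so changing it means creating a new instance.

```python
from typing import NamedTuple


class Status(NamedTuple):
    code: int    # A three-digit number.
    text: bytes  # A byte sequence.


# The tuple (200, `OK`), also expressible as the pair 200/`OK`.
status_instance = Status(200, b"OK")

# Items can be read by name or by position.
code_by_name = status_instance.code
code_by_position = status_instance[0]
```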
To parse JSON from bytes given bytes, run these steps: 1034 | 1035 |
Let jsonText be the result of running UTF-8 decode on bytes. 1037 | [[!ENCODING]] 1038 | 1039 |
Return ? Call(%JSONParse%, undefined, « jsonText »). 1041 | [[!ECMA-262]] 1042 | 1043 |
The conventions used in this step are those of the JavaScript specification. 1044 |
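A rough Python analogue of these two steps is sketched below. It rests on two assumptions that are labeled in the code: the `utf-8-sig` codec with `errors="replace"` is used as an approximation of UTF-8 decode, and `json.loads` stands in for %JSONParse%, although the two parsers do not accept exactly the same inputs. An exception from `json.loads` plays the role of the abrupt completion that "?" propagates.

```python
import json
from typing import Any


def parse_json_from_bytes(data: bytes) -> Any:
    # Assumption: approximates UTF-8 decode (drops a leading byte order mark,
    # replaces malformed sequences with U+FFFD).
    json_text = data.decode("utf-8-sig", errors="replace")
    # Assumption: json.loads stands in for %JSONParse%; it raises ValueError
    # (json.JSONDecodeError) on invalid input.
    return json.loads(json_text)
```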
To forgiving-base64 encode given a byte sequence data, apply 1050 | the base64 algorithm defined in section 4 of RFC 4648 to data and return the result. 1051 | [[!RFC4648]] 1052 | 1053 |
This is named forgiving-base64 encode for symmetry with 1054 | forgiving-base64 decode, which is different from the RFC as it defines error handling for 1055 | certain inputs. 1056 | 1057 |
To forgiving-base64 decode given a string data, run these steps:
1058 | 1059 |Remove all ASCII whitespace from data. 1061 | 1062 | 1063 |
If data's length divides by 4 leaving no remainder, then: 1065 | 1066 |
If data ends with one or two U+003D (=) code points, then remove them 1068 | from data. 1069 |
If data's length divides by 4 leaving a remainder of 1, then 1072 | return failure. 1073 | 1074 |
If data contains a code point that is not one of 1076 | 1077 |
then return failure. 1084 | 1085 |
Let output be an empty byte sequence. 1086 | 1087 |
Let buffer be an empty buffer that can have bits appended to it. 1088 | 1089 |
Let position be a position variable for data, initially 1090 | pointing at the start of data. 1091 | 1092 |
While position does not point past the end of data: 1094 | 1095 |
Find the code point pointed to by position in the second column of 1097 | Table 1: The Base 64 Alphabet of RFC 4648. Let n be the number given in the first cell 1098 | of the same row. [[!RFC4648]] 1099 | 1100 |
Append the six bits corresponding to n, most significant bit first, to 1101 | buffer. 1102 | 1103 |
If buffer has accumulated 24 bits, interpret them as three 8-bit big-endian 1104 | numbers. Append three bytes with values equal to those numbers to output, in the same 1105 | order, and then empty buffer. 1106 | 1107 |
Advance position by 1. 1108 |
If buffer is not empty, it contains either 12 or 18 bits. If it contains 12 bits, 1112 | then discard the last four and interpret the remaining eight as an 8-bit big-endian number. If it 1113 | contains 18 bits, then discard the last two and interpret the remaining 16 as two 8-bit big-endian 1114 | numbers. Append the one or two bytes with values equal to those one or two numbers to 1115 | output, in the same order.
1116 | 1117 |The discarded bits mean that, for instance, "YQ" and
1118 | "YR" both return `a`.
1119 |
1120 |
Return output. 1121 |
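The following non-normative sketch walks through the steps above in Python. The names `forgiving_base64_decode`, `ASCII_WHITESPACE`, and `BASE64_ALPHABET` are hypothetical and not defined by this standard, and returning `None` is merely one way to model failure.

```python
import string

ASCII_WHITESPACE = "\t\n\f\r "
# Base 64 alphabet of RFC 4648, Table 1; a code point's index is its 6-bit value.
BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def forgiving_base64_decode(data: str):
    """Return the decoded bytes, or None to signal failure (an assumption)."""
    # Remove all ASCII whitespace from data.
    data = "".join(c for c in data if c not in ASCII_WHITESPACE)
    # If the length is a multiple of 4, drop one or two trailing U+003D (=).
    if len(data) % 4 == 0:
        if data.endswith("=="):
            data = data[:-2]
        elif data.endswith("="):
            data = data[:-1]
    if len(data) % 4 == 1:
        return None
    if any(c not in BASE64_ALPHABET for c in data):
        return None
    output = bytearray()
    buffer = 0  # accumulated bits, held as an integer
    bits = 0    # number of bits currently in buffer
    for c in data:
        buffer = (buffer << 6) | BASE64_ALPHABET.index(c)
        bits += 6
        if bits == 24:
            output += buffer.to_bytes(3, "big")
            buffer, bits = 0, 0
    # Leftover bits: 12 (discard the last four) or 18 (discard the last two).
    if bits == 12:
        output += (buffer >> 4).to_bytes(1, "big")
    elif bits == 18:
        output += (buffer >> 2).to_bytes(2, "big")
    return bytes(output)
```

Under these assumptions, `forgiving_base64_decode("YQ")` and `forgiving_base64_decode("YR")` both yield the byte sequence `a`, matching the note about discarded bits.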
The HTML namespace is "http://www.w3.org/1999/xhtml".
1127 |
1128 |
The MathML namespace is "http://www.w3.org/1998/Math/MathML".
1129 |
1130 |
The SVG namespace is "http://www.w3.org/2000/svg".
1131 |
1132 |
The XLink namespace is "http://www.w3.org/1999/xlink".
1133 |
1134 |
The XML namespace is "http://www.w3.org/XML/1998/namespace".
1135 |
1136 |
The XMLNS namespace is "http://www.w3.org/2000/xmlns/".
1137 |
1138 |
1139 |
Many thanks to 1142 | Addison Phillips, 1143 | Aryeh Gregor, 1144 | Chris Rebert, 1145 | Daniel Ehrenberg, 1146 | Dominic Farolino, 1147 | Jake Archibald, 1148 | Jungkee Song, 1149 | Leonid Vasilyev, 1150 | Malika Aubakirova, 1151 | Michael™ Smith, 1152 | Mike West, 1153 | Ms2ger, 1154 | Philip Jägenstedt, 1155 | Rashaun "Snuggs" Stovall, 1156 | Sergey Shekyan, 1157 | Simon Pieters, 1158 | Tab Atkins, 1159 | Tobie Langel, 1160 | triple-underscore, 1161 | and Xue Fuqiao 1162 | for being awesome! 1163 | 1164 |
This standard is written by Anne van Kesteren 1165 | (Mozilla, 1166 | annevk@annevk.nl) and 1167 | Domenic Denicola (Google, 1168 | d@domenic.me). 1169 | -------------------------------------------------------------------------------- /review-drafts/2019-01.bs: -------------------------------------------------------------------------------- 1 |
2 | Group: WHATWG 3 | Date: 2019-01-23 4 | H1: Infra 5 | Shortname: infra 6 | Text Macro: TWITTER infrastandard 7 | Abstract: The Infra Standard aims to define the fundamental concepts upon which standards are built. 8 | Translation: ja https://triple-underscore.github.io/infra-ja.html 9 |10 | 11 |
12 | urlPrefix: https://tc39.github.io/ecma262/; spec: ECMA-262; 13 | type: dfn 14 | text: %JSONParse%; url: sec-json.parse 15 | text: %JSONStringify%; url: #sec-json.stringify 16 | text: List; url: sec-list-and-record-specification-type 17 | text: The String Type; url: sec-ecmascript-language-types-string-type 18 | type: abstract-op; text: Call; url: sec-call 19 |20 | 21 | 22 |
Deduplicate boilerplate in standards. 26 | 27 |
Align standards on conventions, terminology, and data structures. 28 | 29 |
Be a place for concepts used by multiple standards without a good home. 30 | 31 |
Help write clear and readable algorithmic prose by clarifying otherwise ambiguous concepts. 32 |
Suggestions for more goals welcome.
35 | 36 | 37 |To make use of the Infra Standard in a document titled X, use 40 | X depends on the Infra Standard. Additionally, cross-referencing terminology 41 | is encouraged to avoid ambiguity. 42 | 43 |
Specification authors are also encouraged to add their specification to the 44 | list of dependent specifications in 45 | order to help the editors ensure that any future breaking changes to the Infra Standard are 46 | correctly reflected by any such dependencies. 47 | 48 | 49 |
All assertions, diagrams, examples, and notes are non-normative, as are all sections explicitly 54 | marked non-normative. Everything else is normative. 55 | 56 |
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", 57 | "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in 58 | RFC 2119. [[!RFC2119]] 59 | 60 |
These keywords have equivalent meaning when written in lowercase and cannot appear in 61 | non-normative content. 62 | 63 |
This is a willful violation of RFC 8174, motivated by legibility and a desire 64 | to preserve long-standing practice in many non-IETF-published pre-RFC 8174 documents. [[RFC8174]] 65 | 66 |
All of the above is applicable to both this standard and any document that uses this standard. 67 | Documents using this standard are encouraged to limit themselves to "must", "must not", "should", 68 | and "may", and to use these in their lowercase form as that is generally considered to be more 69 | readable. 70 | 71 |
For non-normative content "strongly encouraged", "strongly discouraged", "encouraged", 72 | "discouraged", "can", "cannot", "could", "could not", "might", and "might not" can be used instead. 73 | 74 | 75 |
In general, specifications interact with and rely on a wide variety of other specifications. In 78 | certain circumstances, unfortunately, conflicting needs require a specification to violate the 79 | requirements of other specifications. When this occurs, a document using the Infra Standard should 80 | denote such transgressions as a willful violation, and note the reason for that 81 | violation. 82 | 83 |
The previous section, [[#conformance]], documents a 84 | willful violation of RFC 8174 committed by the Infra Standard. 85 | 86 | 87 |
The word "or", in cases where both inclusive "or" and exclusive "or" are possible (e.g., "if 90 | either width or height is zero"), means an inclusive "or" (implying "or both"), unless it is called 91 | out as being exclusive (with "but not both"). 92 | 93 | 94 |
Algorithms, and requirements phrased in the imperative as part of algorithms (such as "strip any 97 | leading spaces" or "return false") are to be interpreted with the meaning of the keyword (e.g., 98 | "must") used in introducing the algorithm or step. If no such keyword is used, must is implied. 99 | 100 |
For example, were the spec to say:
102 | 103 |104 |112 | 113 |To eat an orange, the user must: 105 | 106 |
107 |
111 |- Peel the orange. 108 |
- Separate each slice of the orange. 109 |
- Eat the orange slices. 110 |
it would be equivalent to the following:
114 | 115 |116 |124 | 125 |To eat an orange: 117 | 118 |
119 |
123 |- The user must peel the orange. 120 |
- The user must separate each slice of the orange. 121 |
- The user must eat the orange slices. 122 |
Here the key word is "must".
126 | 127 |Modifying the above example, if the algorithm was introduced only with "To eat 128 | an orange:", it would still have the same meaning, as "must" is implied. 129 |
Conformance requirements phrased as algorithms or specific steps may be implemented in any 132 | manner, so long as the end result is equivalent. (In particular, the algorithms are intended to be 133 | easy to follow, and not intended to be performant.) 134 | 135 |
Performance is tricky to get correct as it is influenced by user perception, computer 136 | architectures, and different types of input that can change over time in how common they are. For 137 | instance, a JavaScript engine likely has many different code paths for what is standardized as a 138 | single algorithm, in order to optimize for speed or memory consumption. Standardizing all those code 139 | paths would be an insurmountable task and not productive as they would not stand the test of time 140 | as well as the single algorithm would. Therefore performance is best left as a field to compete 141 | over. 142 | 143 | 144 |
A variable is declared with "let" and changed with "set". 147 | 148 |
Let |list| be a new list.
149 | 150 |Let |value| be null. 153 | 154 |
If |input| is a string, then set |value| to |input|. 155 | 156 |
Otherwise, set |value| to |input|, UTF-8 decoded. 157 | 158 |
Let activationTarget be 163 | target, if isActivationEvent is true and target has activation 164 | behavior, and null otherwise. 165 | 166 |
Variables must not be used before they are declared. Variables are 167 | block scoped. 168 | Variables must not be declared more than once per algorithm. 169 | 170 | 171 |
The control flow of algorithms is such that a requirement to "return" or "throw" terminates the 174 | algorithm the statement was in. "Return" will hand the given value, if any, to its caller. "Throw" 175 | will make the caller automatically rethrow the given value, if any, and thereby terminate the 176 | caller's algorithm. Using prose, the caller can "catch" the exception and perform 177 | another action. 178 | 179 | 180 |
Sometimes it is useful to stop performing a series of steps once a condition becomes true. 183 | 184 |
To do this, state that a given series of steps will abort when a specific 185 | condition is reached. This indicates that the specified steps must be evaluated, not 186 | as-written, but by additionally inserting a step before each of them that evaluates 187 | condition, and if condition evaluates to true, skips the remaining steps. 188 | 189 |
In such algorithms, the subsequent step can be annotated to run if aborted, in 190 | which case it must run if any of the preceding steps were skipped due to the condition 191 | of the preceding abort when step evaluated to true. 192 | 193 |
The following algorithm 195 | 196 |
Let |result| be an empty list. 198 | 199 |
Run these steps, but abort when the user clicks the "Cancel" button: 201 | 202 |
212 |If aborted, append "Didn't finish!" to |result|.
215 |
is equivalent to the more verbose formulation
218 | 219 |Let |result| be an empty list. 221 | 222 |
If the user has not clicked the "Cancel" button, then: 224 | 225 |
Compute the first million digits of π, and append the result 227 | to |result|. 228 | 229 |
If the user has not clicked the "Cancel" button, then: 231 | 232 |
239 |If the user clicked the "Cancel" button, then append
242 | "Didn't finish!" to |result|.
243 |
Whenever this construct is used, implementations are allowed to evaluate 247 | condition during the specified steps rather than before and after each step, as long as 248 | the end result is indistinguishable. For instance, as long as |result| in the above example is not 249 | mutated during a compute operation, the user agent could stop the computation. 250 | 251 | 252 |
There's a variety of ways to repeat a set of steps until a condition is reached. 255 | 256 |
The Infra Standard is not (yet) exhaustive on this; please file an issue if you need 257 | something. 258 | 259 |
As defined for lists (and derivatives) and 262 | maps. 263 | 264 |
An instruction to repeat a set of steps as long as a condition is met. 267 | 268 |
While |condition| is "met":
270 |
271 |
… 273 |
An iteration's flow can be controlled via requirements to 278 | continue or break. 279 | Continue will skip over any remaining steps in an iteration, proceeding to the 280 | next item. If no further items remain, the iteration will stop. Break will skip 281 | over any remaining steps in an iteration, and skip over any remaining items as well, stopping the 282 | iteration. 283 | 284 |
Let |example| be the list « 1, 2, 3, 4 ». The following prose would perform |operation| 286 | upon 1, then 2, then 3, then 4: 287 | 288 |
For each |item| of |example|: 291 |
The following prose would perform |operation| upon 1, then 2, then 4. 3 would be skipped. 298 | 299 |
For each |item| of |example|: 302 |
The following prose would perform |operation| upon 1, then 2. 3 and 4 would be skipped. 310 | 311 |
320 |To improve readability, it can sometimes help to add assertions to algorithms, stating 326 | invariants. To do this, write "Assert:", followed by a statement that must be 327 | true. If the statement ends up being false that indicates an issue with the document using the Infra 328 | Standard that should be reported and addressed. 329 | 330 |
Since the statement can only ever be true, it has no implications for implementations. 331 | 332 |
Let |x| be "Aperture Science".
335 |
Assert: |x| is "Aperture Science".
336 |
The value null is used to indicate the lack of a value. It can be used interchangeably with the 344 | JavaScript null value. [[!ECMA-262]] 345 | 346 |
Let element be null. 347 | 348 |
If input is the empty string, then return null. 349 | 350 | 351 |
A boolean is either true or false. 354 | 355 |
Let elementSeen be false. 356 | 357 | 358 |
A byte is a sequence of eight bits, represented as a double-digit hexadecimal 361 | number in the range 0x00 to 0xFF, inclusive. 362 | 363 |
An ASCII byte is a byte in the range 0x00 (NUL) to 0x7F (DEL), 364 | inclusive. As illustrated, an ASCII byte, excluding 0x28 and 0x29, may be followed by the 365 | representation outlined in the Standard Code 366 | section of ASCII format for Network Interchange, between parentheses. [[!RFC20]] 367 | 368 |
0x28 may be followed by "(left parenthesis)" and 0x29 by "(right parenthesis)". 369 | 370 |
0x49 (I) when UTF-8 decoded becomes the 371 | code point U+0049 (I). 372 | 373 | 374 |
A byte sequence is a sequence of bytes, represented as a space-separated 377 | sequence of bytes. Byte sequences with bytes in the range 0x20 (SP) to 0x7E (~), inclusive, can 378 | alternately be written as a string, but using backticks instead of quotation marks, to avoid 379 | confusion with an actual string. 380 | 381 |
0x48 0x49 can also be represented as `HI`.
383 |
384 |
Headers, such as `Content-Type`, are byte sequences.
385 |
To get a byte sequence out of a string, using UTF-8 encode from 388 | the Encoding Standard is encouraged. In rare circumstances isomorphic encode might be needed. 389 | [[ENCODING]] 390 | 391 |
A byte sequence's length is the number of 392 | bytes it contains. 393 | 394 |
To byte-lowercase a byte sequence, increase each byte it 395 | contains, in the range 0x41 (A) to 0x5A (Z), inclusive, by 0x20. 396 | 397 |
To byte-uppercase a byte sequence, subtract 0x20 from each byte it 398 | contains in the range 0x61 (a) to 0x7A (z), inclusive. 399 | 400 |
A byte sequence A is a byte-case-insensitive match for a 401 | byte sequence B, if the byte-lowercase of A is the 402 | byte-lowercase of B. 403 | 404 |
To isomorphic decode a byte sequence input, return a 405 | string whose length is equal to input's 406 | length and whose code points have the same values as 407 | input's bytes, in the same order. 408 | 409 | 410 |
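As a non-normative illustration, the byte sequence operations above could be sketched in Python as follows. The function names are hypothetical, and Python's `bytes` type stands in for a byte sequence.

```python
def byte_lowercase(data: bytes) -> bytes:
    # Add 0x20 to every byte in the range 0x41 (A) to 0x5A (Z), inclusive.
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in data)


def byte_uppercase(data: bytes) -> bytes:
    # Subtract 0x20 from every byte in the range 0x61 (a) to 0x7A (z), inclusive.
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)


def is_byte_case_insensitive_match(a: bytes, b: bytes) -> bool:
    return byte_lowercase(a) == byte_lowercase(b)


def isomorphic_decode(data: bytes) -> str:
    # Each byte becomes the code point with the same value, in the same order.
    return "".join(chr(b) for b in data)
```

Bytes outside the two ASCII letter ranges, such as 0xE9, are left untouched by both case operations.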
A code point is a Unicode code point and is 413 | represented as a four-to-six digit hexadecimal number, typically prefixed with "U+". 414 | 415 |
A code point may be followed by its name, by its rendered form between parentheses when it 416 | is not U+0028 or U+0029, or by both. Documents using the Infra Standard are encouraged to follow 417 | code points by their name when they cannot be rendered or are U+0028 or U+0029, and their 418 | rendered form between parentheses otherwise, for legibility. 419 | 420 |
A code point's name is defined in the Unicode Standard and represented in 421 | ASCII uppercase. [[!UNICODE]] 422 | 423 |
The code point rendered as 🤔 is represented as U+1F914. 425 | 426 |
When referring to that code point, we might say "U+1F914 (🤔)", to provide extra context. 427 | Documents are allowed to use "U+1F914 THINKING FACE (🤔)" as well, though this is somewhat verbose. 428 |
Code points that are difficult 431 | to render unambiguously, such as U+000A, can be referred to as "U+000A LF". U+0029 can be referred 432 | to as "U+0029 RIGHT PARENTHESIS", because even though it renders, this avoids unmatched parentheses. 433 | 434 |
Code points are sometimes referred to as characters and in certain contexts are 435 | prefixed with "0x" rather than "U+". 436 | 437 |
A surrogate is a code point that is in the range U+D800 to U+DFFF, 438 | inclusive. 439 | 440 |
A scalar value is a code point that is not a surrogate. 441 | 442 |
A noncharacter is a code point that is in the range U+FDD0 to U+FDEF, 443 | inclusive, or U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, 444 | U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, 445 | U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, 446 | U+FFFFF, U+10FFFE, or U+10FFFF. 447 | 448 |
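The three definitions above can be expressed as simple predicates. This non-normative Python sketch uses hypothetical function names and represents a code point as an integer. It relies on the observation that, apart from U+FDD0 to U+FDEF, the listed noncharacters are exactly the last two code points of each of the 17 planes.

```python
def is_surrogate(code_point: int) -> bool:
    return 0xD800 <= code_point <= 0xDFFF


def is_scalar_value(code_point: int) -> bool:
    # Assumes code points span U+0000 to U+10FFFF, inclusive.
    return 0 <= code_point <= 0x10FFFF and not is_surrogate(code_point)


def is_noncharacter(code_point: int) -> bool:
    if 0xFDD0 <= code_point <= 0xFDEF:
        return True
    # U+xxFFFE and U+xxFFFF in every plane.
    return 0 <= code_point <= 0x10FFFF and (code_point & 0xFFFE) == 0xFFFE
```

Counting the code points accepted by `is_noncharacter` gives 66: 32 in the contiguous range plus 2 for each plane.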
An ASCII code point is a code point in the range U+0000 NULL to 449 | U+007F DELETE, inclusive. 450 | 451 |
An ASCII tab or newline is 452 | U+0009 TAB, U+000A LF, or U+000D CR. 453 | 454 |
ASCII whitespace is U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, or U+0020 455 | SPACE. 456 | 457 |
"Whitespace" is a mass noun. 458 | 459 |
A C0 control is a code point in the range U+0000 NULL to 460 | U+001F INFORMATION SEPARATOR ONE, inclusive. 461 | 462 |
A C0 control or space is a 463 | C0 control or U+0020 SPACE. 464 | 465 |
A control is a C0 control or a code point in the range 466 | U+007F DELETE to U+009F APPLICATION PROGRAM COMMAND, inclusive. 467 | 468 |
An ASCII digit is a code point in the range U+0030 (0) to U+0039 (9), 469 | inclusive. 470 | 471 |
An ASCII upper hex digit is an ASCII digit or a code point in the 472 | range U+0041 (A) to U+0046 (F), inclusive. 473 | 474 |
An ASCII lower hex digit is an ASCII digit or a code point in the 475 | range U+0061 (a) to U+0066 (f), inclusive. 476 | 477 |
An ASCII hex digit is an ASCII upper hex digit or 478 | ASCII lower hex digit. 479 | 480 |
An ASCII upper alpha is a code point in the range U+0041 (A) to 481 | U+005A (Z), inclusive. 482 | 483 |
An ASCII lower alpha is a code point in the range U+0061 (a) to 484 | U+007A (z), inclusive. 485 | 486 |
An ASCII alpha is an ASCII upper alpha or ASCII lower alpha. 487 | 488 |
An ASCII alphanumeric is an ASCII digit or ASCII alpha. 489 | 490 | 491 |
A JavaScript string is a sequence of unsigned 16-bit integers, also known as 494 | code units. 495 | 496 |
This is different from how the Unicode Standard defines "code unit". In particular it 497 | refers exclusively to how the Unicode Standard defines it for Unicode 16-bit strings. [[UNICODE]] 498 | 499 |
A JavaScript string's length is the number of 500 | code units it contains. 501 | 502 |
A JavaScript string can also be interpreted as containing code points, per the 503 | conversion defined in The String Type section of the JavaScript specification. [[!ECMA-262]] 504 | 505 |
This conversion process converts surrogate pairs into their corresponding 506 | scalar value and maps isolated surrogates to their corresponding code point, leaving 507 | them effectively as-is. 508 | 509 |
A JavaScript string consisting 510 | of the code units 0xD83D, 0xDCA9, and 0xD800, when interpreted as containing 511 | code points, would consist of the code points U+1F4A9 and U+D800. 512 | 513 |
A scalar value string is a sequence of scalar values. 514 | 515 |
A scalar value string is useful for any kind of I/O or other kind of operation 516 | where UTF-8 encode comes into play. 517 | 518 | 519 |
String can be used to refer to either a JavaScript string or 520 | scalar value string, when it is clear from the context which is meant or when the distinction 521 | is immaterial. Strings are denoted by double quotes and monospace font. 522 | 523 |
"Hello, world!" is a string.
524 |
525 |
A string's length is the number of code points it 526 | contains. 527 | 528 |
To convert a JavaScript string into a 529 | scalar value string, replace any surrogates with U+FFFD. 530 | 531 | 532 |
The replaced surrogates are always isolated surrogates, since the process of 533 | interpreting the JavaScript string as containing code points will have converted surrogate 534 | pairs into scalar values. 535 | 536 |
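The two-stage process described above (interpreting code units as code points, then replacing surrogates) can be sketched non-normatively in Python. The function names are hypothetical, code units and code points are modeled as integers, and the lead and trail surrogate ranges are those used by the JavaScript specification rather than anything defined in this section.

```python
def code_units_to_code_points(code_units):
    code_points = []
    i = 0
    while i < len(code_units):
        unit = code_units[i]
        has_next = i + 1 < len(code_units)
        # A lead surrogate followed by a trail surrogate forms a surrogate pair.
        if 0xD800 <= unit <= 0xDBFF and has_next and 0xDC00 <= code_units[i + 1] <= 0xDFFF:
            trail = code_units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (trail - 0xDC00))
            i += 2
        else:
            # Isolated surrogates map to the code point with the same value.
            code_points.append(unit)
            i += 1
    return code_points


def to_scalar_values(code_points):
    # Replace any remaining (necessarily isolated) surrogate with U+FFFD.
    return [0xFFFD if 0xD800 <= cp <= 0xDFFF else cp for cp in code_points]
```

For the code units 0xD83D, 0xDCA9, and 0xD800 from the earlier example, the first function gives U+1F4A9 and U+D800, and applying the second then gives U+1F4A9 and U+FFFD.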
A scalar value string can always be used as a JavaScript string implicitly since it 537 | is a subset. The reverse is only possible if the JavaScript string is known to not contain 538 | surrogates; otherwise a conversion must be 539 | performed. 540 | 541 |
An implementation likely has to perform explicit conversion, depending on how it 542 | actually ends up representing JavaScript and 543 | scalar value strings. It is even fairly typical for implementations to have multiple 544 | implementations of just JavaScript strings for performance and memory reasons. 545 | 546 |
To isomorphic encode a string input, run these steps:
549 | 550 |Assert: input contains no code points greater than U+00FF. 552 | 553 |
Return a byte sequence whose length is equal to 554 | input's length and whose bytes have the same values as 555 | input's code points, in the same order. 556 |
An ASCII string is a string whose code points are all 561 | ASCII code points. 562 | 563 |
To ASCII lowercase a string, replace all ASCII upper alphas in 564 | the string with their corresponding code point in ASCII lower alpha. 565 | 566 |
To ASCII uppercase a string, replace all ASCII lower alphas in 567 | the string with their corresponding code point in ASCII upper alpha. 568 | 569 |
A string A is an ASCII case-insensitive match for a 570 | string B, if the ASCII lowercase of A is the 571 | ASCII lowercase of B. 572 | 573 | 574 |
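A non-normative Python sketch of the ASCII case operations follows, with hypothetical function names. Python's built-in `str.lower()` and `str.upper()` apply Unicode case mappings, so they are deliberately not used here.

```python
def ascii_lowercase(string: str) -> str:
    # Only ASCII upper alphas (A to Z) are changed.
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in string)


def ascii_uppercase(string: str) -> str:
    # Only ASCII lower alphas (a to z) are changed.
    return "".join(chr(ord(c) - 0x20) if "a" <= c <= "z" else c for c in string)


def is_ascii_case_insensitive_match(a: str, b: str) -> bool:
    return ascii_lowercase(a) == ascii_lowercase(b)
```

With these definitions U+212A KELVIN SIGN is not a match for "k", whereas a Unicode-aware lowercase operation would treat the two as equal.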
To ASCII encode a string input, run these steps: 575 | 576 |
Assert: input is an ASCII string. 578 | 579 |
Note: This precondition ensures that isomorphic encode and 580 | UTF-8 encode return the same byte sequence for this input. 581 | 582 |
Return the isomorphic encoding of input. 583 |
To ASCII decode a byte sequence input, run these steps: 586 | 587 |
Assert: All bytes in input are ASCII bytes. 589 | 590 |
Note: This precondition ensures that isomorphic decode and 591 | UTF-8 decode return the same string for this input. 592 | 593 |
Return the isomorphic decoding of input. 594 |
To strip newlines from a string, remove any U+000A LF and U+000D CR 600 | code points from the string. 601 | 602 |
To normalize newlines in a string, replace every U+000D CR U+000A LF 603 | code point pair with a single U+000A LF code point, and then replace every remaining 604 | U+000D CR code point with a U+000A LF code point. 605 | 606 |
To strip leading and trailing ASCII whitespace from a string, remove all 607 | ASCII whitespace that is at the start or the end of the string. 608 | 609 |
To strip and collapse ASCII whitespace in a string, replace any sequence 610 | of one or more consecutive code points that are ASCII whitespace in the string 611 | with a single U+0020 SPACE code point, and then remove any leading and trailing 612 | ASCII whitespace from that string. 613 | 614 |
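The four operations above could be sketched non-normatively in Python as follows. The function names are hypothetical. The whitespace set is passed explicitly because Python's default `str.strip()` also removes code points that are not ASCII whitespace, such as U+00A0.

```python
import re

ASCII_WHITESPACE = "\t\n\f\r "


def strip_newlines(string: str) -> str:
    return string.replace("\n", "").replace("\r", "")


def normalize_newlines(string: str) -> str:
    # CR LF pairs first, then any remaining lone CR.
    return string.replace("\r\n", "\n").replace("\r", "\n")


def strip_leading_and_trailing_ascii_whitespace(string: str) -> str:
    return string.strip(ASCII_WHITESPACE)


def strip_and_collapse_ascii_whitespace(string: str) -> str:
    collapsed = re.sub(r"[\t\n\f\r ]+", " ", string)
    return collapsed.strip(ASCII_WHITESPACE)
```

The order in `normalize_newlines` matters: replacing lone CR code points first would turn each CR LF pair into two LF code points.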
To collect a sequence of code points meeting a condition condition from 618 | a string input, given a position variable 619 | position tracking the position of the calling algorithm within input:
620 | 621 |Let result be the empty string. 623 | 624 |
While position doesn't point past the end of input and the 626 | code point at position within input meets the condition 627 | condition: 628 | 629 |
Append that code point to the end of result. 631 | 632 |
Advance position by 1. 633 |
Return result. 637 |
In addition to returning the collected code points, this algorithm updates the 640 | position variable in the calling algorithm. 641 | 642 |
To skip ASCII whitespace within a string input given a 643 | position variable position, collect a sequence of code points that are 644 | ASCII whitespace from input given position. The collected 645 | code points are not used, but position is still updated. 646 | 647 |
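As a non-normative illustration, a position variable can be modeled in Python as an integer index that is passed in and returned alongside the result. The function names and the tuple return value are assumptions of this sketch.

```python
ASCII_WHITESPACE = "\t\n\f\r "


def collect_code_points(condition, string: str, position: int):
    """Return (collected string, updated position)."""
    result = ""
    while position < len(string) and condition(string[position]):
        result += string[position]
        position += 1
    return result, position


def skip_ascii_whitespace(string: str, position: int) -> int:
    # The collected code points are discarded; only the position is kept.
    _, position = collect_code_points(lambda c: c in ASCII_WHITESPACE, string, position)
    return position
```

For example, collecting ASCII digits from "123abc" starting at position 0 returns "123" and leaves the position at 3.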
To strictly split a string 650 | input on a particular delimiter code point delimiter:
651 | 652 |Let position be a position variable for input, initially 654 | pointing at the start of input. 655 | 656 |
Let tokens be a list of strings, initially empty. 657 | 658 |
Let token be the result of collecting a sequence of code points that are 659 | not equal to delimiter from input, given position. 660 | 661 |
Append token to tokens. 662 | 663 |
While position is not past the end of input: 665 | 666 |
Assert: the code point at position within input is 668 | delimiter. 669 | 670 |
Advance position by 1. 671 | 672 |
Let token be the result of collecting a sequence of code points that are 673 | not equal to delimiter from input, given position. 674 | 675 |
Append token to tokens. 676 |
Return tokens. 680 |
This algorithm is a "strict" split, as opposed to the commonly-used variants 683 | for ASCII whitespace and 684 | for commas below, which are both more lenient in various ways involving 685 | interspersed ASCII whitespace. 686 | 687 |
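The strict split steps can be sketched non-normatively in Python. The name `strictly_split` is hypothetical, and the position variable is modeled as an integer index.

```python
def strictly_split(string: str, delimiter: str) -> list:
    position = 0
    tokens = []

    def collect_token() -> str:
        # Collect code points that are not equal to the delimiter.
        nonlocal position
        start = position
        while position < len(string) and string[position] != delimiter:
            position += 1
        return string[start:position]

    tokens.append(collect_token())
    while position < len(string):
        assert string[position] == delimiter
        position += 1
        tokens.append(collect_token())
    return tokens
```

Because a token is appended before the loop and after every delimiter, the result always has at least one item, and adjacent delimiters produce empty strings: splitting "a,b,,c" on U+002C (,) gives « "a", "b", "", "c" ».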
To split a 688 | string input on ASCII whitespace: 689 | 690 |
Let position be a position variable for input, initially 692 | pointing at the start of input. 693 | 694 |
Let tokens be a list of strings, initially empty. 695 | 696 |
Skip ASCII whitespace within input given position. 697 | 698 |
While position is not past the end of input: 700 | 701 |
Let token be the result of collecting a sequence of code points that are 703 | not ASCII whitespace from input, given position. 704 | 705 |
Append token to tokens. 706 | 707 |
Skip ASCII whitespace within input given position. 708 |
Return tokens. 712 |
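A non-normative Python sketch of splitting on ASCII whitespace follows. The names are hypothetical. Python's argument-less `str.split()` is not used because it also splits on whitespace outside the ASCII whitespace set.

```python
ASCII_WHITESPACE = "\t\n\f\r "


def split_on_ascii_whitespace(string: str) -> list:
    position = 0
    tokens = []

    def collect(condition) -> str:
        nonlocal position
        start = position
        while position < len(string) and condition(string[position]):
            position += 1
        return string[start:position]

    def is_whitespace(c: str) -> bool:
        return c in ASCII_WHITESPACE

    collect(is_whitespace)  # skip leading ASCII whitespace
    while position < len(string):
        tokens.append(collect(lambda c: not is_whitespace(c)))
        collect(is_whitespace)  # skip ASCII whitespace after the token
    return tokens
```

Unlike the strict split, an empty or whitespace-only input produces an empty list, and no token is ever the empty string.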
To split a string 715 | input on commas: 716 | 717 |
Let position be a position variable for input, initially 719 | pointing at the start of input. 720 | 721 |
Let tokens be a list of strings, initially empty. 722 | 723 |
While position is not past the end of input: 725 | 726 |
Let token be the result of collecting a sequence of code points that are 729 | not U+002C (,) from input, given position. 730 | 731 |
token might be the empty string. 732 |
Append token to tokens. 737 | 738 |
If position is not past the end of input, then: 740 | 741 |
Assert: the code point at position within input is 743 | U+002C (,). 744 | 745 |
Advance position by 1. 746 |
Return tokens. 752 |
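A non-normative Python sketch of splitting on commas follows, with hypothetical names. It strips leading and trailing ASCII whitespace from each token, which is an assumption of this sketch: that step does not appear in the steps as shown above, but the earlier note describes this variant as lenient about interspersed ASCII whitespace.

```python
ASCII_WHITESPACE = "\t\n\f\r "


def split_on_commas(string: str) -> list:
    position = 0
    tokens = []
    while position < len(string):
        # Collect code points that are not U+002C (,); the token might be empty.
        start = position
        while position < len(string) and string[position] != ",":
            position += 1
        token = string[start:position]
        # Assumption (not in the visible steps): strip surrounding ASCII whitespace.
        tokens.append(token.strip(ASCII_WHITESPACE))
        if position < len(string):
            assert string[position] == ","
            position += 1
    return tokens
```

Under these assumptions an empty input gives an empty list, and a trailing comma does not produce a trailing empty token.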
To concatenate a list of 755 | strings list, using an optional separator string separator, run 756 | these steps: 757 | 758 |
If list is empty, then return the empty string. 760 | 761 |
If separator is not given, then set separator to the empty string. 762 | 763 |
Return a string whose contents are list's items, in 764 | order, separated from each other by separator. 765 |
To serialize a set set, return the 768 | concatenation of set using U+0020 SPACE. 769 | 770 | 771 |
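In Python, concatenation and set serialization map closely onto `str.join`. The function names in this non-normative sketch are hypothetical, and a Python list stands in for both a list and an ordered set.

```python
def concatenate(strings: list, separator: str = "") -> str:
    # An empty list yields the empty string; a missing separator defaults to "".
    return separator.join(strings)


def serialize_set(ordered_set: list) -> str:
    return concatenate(ordered_set, " ")
```

For example, `serialize_set(["a", "b", "c"])` gives "a b c".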
Conventionally, specifications have operated on a variety of vague specification-level data 774 | structures, based on shared understanding of their semantics. This generally works well, but can 775 | lead to ambiguities around edge cases, such as iteration order or what happens when you 776 | append an item to an ordered set that the set already 777 | contains. It has also led to a variety of divergent notation and phrasing, especially 778 | around more complex data structures such as maps. 779 | 780 |
This standard provides a small set of common data structures, along with notation and phrasing 781 | for working with them, in order to create common ground. 782 | 783 | 784 |
A list is a specification type consisting of a finite ordered sequence of 787 | items. 788 | 789 |
For notational convenience, a literal syntax can be used to express lists, by surrounding 790 | the list by « » characters and separating its items with a comma. An indexing syntax 791 | can be used by providing a zero-based index into a list inside square brackets. The index cannot be 792 | out-of-bounds, except when used with exists. 793 | 794 |
Let |example| be the list « "a",
795 | "b", "c", "a" ». Then |example|[1] is the string
796 | "b".
797 |
798 |
To append to a list that is not an ordered set is to 801 | add the given item to the end of the list. 802 | 803 |
To prepend to a list that is not an ordered set is to 804 | add the given item to the beginning of the list. 805 | 806 |
To replace within a list that is not an ordered set is 807 | to replace all items from the list that match a given condition with the given item, 808 | or do nothing if none do. 809 | 810 |
The above definitions are modified when the list is an ordered set; see 811 | below for ordered set append, prepend, and 812 | replace. 813 | 814 |
To insert an item into a list before an 815 | index is to add the given item to the list between the given index − 1 and the given index. If 816 | the given index is 0, then prepend the given item to the list. 817 | 818 |
To remove zero or more items from a list is 819 | to remove all items from the list that match a given condition, or do nothing if none do. 820 | 821 |
Removing |x| from the list « |x|, |y|, |z|, |x| » is to remove all 823 | items from the list that are equal to |x|. The list now is equivalent to « |y|, |z| ». 824 | 825 |
Removing all items that start with the string "a" from the
826 | list « "a", "b", "ab", "ba" » is to
827 | remove the items "a" and "ab". The list is now equivalent to «
828 | "b", "ba" ».
829 |
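The condition-based list operations above could be sketched non-normatively in Python as follows. The function names are hypothetical, a Python list models a list that is not an ordered set, and conditions are modeled as callables.

```python
def remove(items: list, condition) -> None:
    # Remove every item matching the condition; do nothing if none match.
    items[:] = [item for item in items if not condition(item)]


def replace(items: list, condition, replacement) -> None:
    # Replace every item matching the condition with the given item.
    items[:] = [replacement if condition(item) else item for item in items]


def insert_before(items: list, index: int, item) -> None:
    # Index 0 is equivalent to prepending.
    items.insert(index, item)
```

Applied to the example above, `remove` with the condition "starts with the string a" turns « "a", "b", "ab", "ba" » into « "b", "ba" ».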
To empty a list is to remove 832 | all of its items. 833 | 834 |
A list contains an 835 | item if it appears in the list. We can also denote this by saying that, for a 836 | list |list| and an index |index|, "|list|[|index|] exists". 837 | 838 |
A list's size is the number of 839 | items the list contains. 840 | 841 |
A list is empty if 842 | its size is zero. 843 | 844 |
To iterate over a list, performing a 845 | set of steps on each item in order, use phrasing of the form 846 | "For each |item| of list", and then operate on |item| in the 847 | subsequent prose. 848 | 849 |
To clone a list |list| is to create a new 850 | list |clone|, of the same designation, and, for each |item| of |list|, 851 | append |item| to |clone|, so that |clone| contains the same 852 | items, in the same order as |list|. 853 | 854 | Note: This is a "shallow clone", as the items themselves are not cloned in any way. 855 | 856 |
Let |original| be the ordered set «
857 | "a", "b", "c" ». Cloning |original| creates
858 | a new ordered set |clone|, so that replacing "a" with
859 | "foo" in |clone| gives « "foo", "b", "c" »,
860 | while |original|[0] is still the string "a".
861 |
862 |
The list type originates from the JavaScript specification (where it is capitalized, as 865 | List); we repeat some elements of its definition here for ease of reference, 866 | and provide an expanded vocabulary for manipulating lists. Whenever JavaScript expects a 867 | List, a list as defined here can be used; they are the same type. 868 | [[!ECMA-262]] 869 | 870 |
Some lists are designated as stacks. A stack is a list, 873 | but conventionally, the following operations are used to operate on it, instead of using 874 | append, prepend, or remove. 875 | 876 |
To push onto a stack is to append to it. 877 | 878 |
To pop from a stack is to remove its last 879 | item and return it, if the stack is not empty, or to return 880 | nothing otherwise. 881 | 882 |
Although stacks are lists, for each must not be used with them; 883 | instead, a combination of while and pop is more appropriate. 884 | 885 |
Some lists are designated as queues. A queue is a list, 888 | but conventionally, the following operations are used to operate on it, instead of using 889 | append, prepend, or remove. 890 | 891 |
To enqueue in a queue is to append to it. 892 | 893 |
To dequeue from a queue is to remove its first 894 | item and return it, if the queue is not empty, or to return 895 | nothing if it is. 896 | 897 |
Although queues are lists, for each must not be used with them; 898 | instead, a combination of while and dequeue is more appropriate. 899 | 900 |
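A non-normative Python sketch of the stack and queue operations follows. The function names are hypothetical, and returning `None` is this sketch's way of modeling "return nothing". The draining helper shows the recommended combination of while and pop in place of for each.

```python
from collections import deque


def pop(stack: list):
    # Remove and return the last item, or return None if the stack is empty.
    return stack.pop() if stack else None


def dequeue(queue: deque):
    # Remove and return the first item, or return None if the queue is empty.
    return queue.popleft() if queue else None


def drain_stack(stack: list) -> list:
    visited = []
    while stack:  # while the stack is not empty
        visited.append(pop(stack))
    return visited
```

Pushing and enqueuing both correspond to `append` on the underlying Python container.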
Some lists are designated as ordered sets. An 903 | ordered set is a list with the additional semantic that it must not contain the same 904 | item twice. 905 | 906 |
Almost all cases on the web platform require an ordered set, instead of an 907 | unordered one, since interoperability requires that any developer-exposed enumeration of the set's 908 | contents be consistent between browsers. In those cases where order is not important, we still use 909 | ordered sets; implementations can optimize based on the fact that the order is not observable. 910 | 911 |
To append to an ordered set is to do nothing if the set already 912 | contains the given item, or to perform the normal list 913 | append operation otherwise. 914 | 915 |
To prepend to an ordered set is to do nothing if the set already 916 | contains the given item, or to perform the normal list 917 | prepend operation otherwise. 918 | 919 |
To replace within an ordered set 920 | set, given item and replacement: if set 921 | contains item or replacement, then replace the first instance 922 | of either with replacement and remove all other instances. 923 | 924 |
Replacing "a" with "c" within the 925 | ordered set « "a", "b", "c" » gives « "c", "b" ». Within « "c", "b", "a" » it gives 926 | « "c", "b" » as well. 927 | 928 |
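The ordered set operations can be sketched as follows, modelling an ordered set as a Python list without duplicates. The function names are hypothetical and the linear scans are chosen for clarity, not performance.

```python
from typing import Any, List


def set_append(ordered_set: List[Any], item: Any) -> None:
    """Append item unless the set already contains it."""
    if item not in ordered_set:
        ordered_set.append(item)


def set_prepend(ordered_set: List[Any], item: Any) -> None:
    """Prepend item unless the set already contains it."""
    if item not in ordered_set:
        ordered_set.insert(0, item)


def set_replace(ordered_set: List[Any], item: Any, replacement: Any) -> None:
    """Replace the first instance of item or replacement with replacement,
    then remove every other instance of either."""
    for index, existing in enumerate(ordered_set):
        if existing == item or existing == replacement:
            ordered_set[index] = replacement
            ordered_set[index + 1:] = [
                other for other in ordered_set[index + 1:]
                if other != item and other != replacement
            ]
            return
```

Running `set_replace` on both lists from the example above leaves each of them as `["c", "b"]`.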
An ordered set |set| is a subset of another ordered set 929 | |superset| (and conversely, |superset| is a superset of |set|) if, 930 | for each |item| of |set|, |superset| contains |item|. 931 | 932 |
This implies that an ordered set is both a subset and a 933 | superset of itself. 934 | 935 |
The intersection of ordered sets |A| and |B| is the result of creating a new ordered set |set| and, for each |item| of |A|, if |B| contains |item|, appending |item| to |set|.

The union of ordered sets |A| and |B| is the result of cloning |A| as |set| and, for each |item| of |B|, appending |item| to |set|.
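A minimal sketch of subset, intersection, and union, again modelling ordered sets as duplicate-free Python lists. The function names are hypothetical. Note that the result order follows |A| first, as the definitions above require.

```python
from typing import Any, List


def is_subset(candidate: List[Any], superset: List[Any]) -> bool:
    """True if superset contains every item of candidate."""
    return all(item in superset for item in candidate)


def intersection(a: List[Any], b: List[Any]) -> List[Any]:
    """Items of a that b also contains, in a's order."""
    result: List[Any] = []
    for item in a:
        if item in b and item not in result:
            result.append(item)
    return result


def union(a: List[Any], b: List[Any]) -> List[Any]:
    """A clone of a, followed by the items of b that it lacks."""
    result = list(a)
    for item in b:
        if item not in result:
            result.append(item)
    return result
```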
The range n to m, inclusive, creates a new 946 | ordered set containing all of the integers from n up to and including m 947 | in consecutively increasing order, as long as m is greater than or equal to n. 948 | 949 |
For each n of the range 1 to 950 | 4, inclusive, … 951 | 952 | 953 |
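For illustration, an inclusive range maps onto Python's half-open `range` by adding one to the upper bound. The helper name `inclusive_range` is hypothetical, and raising an error when m is less than n is an assumption of this sketch, since the text above only defines the case where m is greater than or equal to n.

```python
from typing import List


def inclusive_range(n: int, m: int) -> List[int]:
    """Integers from n up to and including m, in increasing order."""
    if m < n:
        # Assumption: the definition only covers m >= n, so reject other input.
        raise ValueError("m must be greater than or equal to n")
    return list(range(n, m + 1))
```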
An ordered map, or sometimes just "map", is a 956 | specification type consisting of a finite ordered sequence of 957 | key/value pairs, with no key appearing twice. 958 | Each key/value pair is called an entry. 959 | 960 |
As with ordered sets, by default we assume that maps need to be ordered for 961 | interoperability among implementations. 962 | 963 |
A literal syntax can be used to express ordered maps, by surrounding the ordered map with 964 | «[ ]» characters, denoting each of its entries as |key| → |value|, and separating its 965 | entries with a comma. An indexing syntax can be used to look up and set values by 966 | providing a key inside square brackets. 967 | 968 |
Let |example| be the ordered map «[
969 | "a" → `x`, "b" → `y` ]». Then
970 | |example|["a"] is the byte sequence `x`.
971 |
972 |
To get the value of an entry in an 975 | ordered map given a key is to retrieve the value of any 976 | existing entry if the map contains an entry with the given key, or 977 | to return nothing otherwise. We can also use the indexing syntax explained above. 978 | 979 |
To set the value of an entry in an 980 | ordered map to a given value is to update the value of any existing 981 | entry if the map contains an entry with the given key, 982 | or if none such exists, to add a new entry with the given key/value to the end of the map. We can 983 | also denote this by saying, for an ordered map |map|, key |key|, and value |value|, 984 | "set |map|[|key|] to |value|". 985 | 986 |
To remove an entry from an ordered map is to remove 987 | all entries from the map that match a given condition, or do nothing if none do. If 988 | the condition is having a certain key, then we can also denote this by saying, for 989 | an ordered map |map| and key |key|, "remove |map|[|key|]". 990 | 991 |
An ordered map contains an 992 | entry with a given key if there exists an entry with that key. 993 | We can also denote this by saying that, for an ordered map |map| and key |key|, "|map|[|key|] 994 | exists". 995 | 996 |
To get the keys of an 997 | ordered map, return a new ordered set whose items are each of the 998 | keys in the map's entries. 999 | 1000 |
To get the values of an 1001 | ordered map, return a new list whose items are each of the 1002 | values in the map's entries. 1003 | 1004 |
An ordered map's size is the size of the result 1005 | of running get the keys on the map. 1006 | 1007 |
An ordered map is empty if its 1008 | size is zero. 1009 | 1010 |
To iterate over an ordered map, performing 1011 | a set of steps on each entry in order, use phrasing of the form 1012 | "For each |key| → |value| of |map|", and then operate on |key| and |value| in the 1013 | subsequent prose. 1014 | 1015 | 1016 |
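Python's built-in `dict` preserves insertion order, so it can serve as a rough model of an ordered map. The sketch below walks through the operations above on the |example| map; the variable names other than `example` are hypothetical, and `None` stands in for "nothing".

```python
from typing import Dict

example: Dict[str, bytes] = {"a": b"x", "b": b"y"}

looked_up = example["a"]        # indexing syntax: example["a"]
missing = example.get("z")      # get the value: None when no entry exists

example["a"] = b"z"             # set: the existing entry keeps its position
example["c"] = b"w"             # set: a new entry is added to the end

keys = list(example.keys())     # get the keys, in map order
values = list(example.values()) # get the values, in map order
size = len(example)             # size is the number of keys

example.pop("b", None)          # remove example["b"]; does nothing if absent
exists = "b" in example         # example["b"] exists

# "For each key → value of example"
entries = [(key, value) for key, value in example.items()]
```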
A struct is a specification type consisting of a finite set of 1019 | items, each of which has a unique and immutable 1020 | name. 1021 | 1022 |
Structs with a defined order are also known as tuples. For notational convenience, a literal syntax can be used to express tuples, by surrounding the tuple with parentheses and separating its items with a comma. To use this notation, the names need to be clear from context. This can be done by preceding the first instance with the name given to the tuple.
A status is an example tuple consisting of a code (a three-digit number) and text (a byte 1034 | sequence). 1035 | 1036 |
A nonsense algorithm that manipulates status tuples for the purpose of demonstrating their 1037 | usage is then:
1038 | 1039 |OK`).
1041 | FOO BAR`).
1042 | It is intentional that not all structs are tuples. Documents using the 1047 | Infra Standard might need the flexibility to add new names to their struct 1048 | without breaking literal syntax used by their dependencies. In that case a tuple is not appropriate. 1049 | 1050 |
Tuples with two items are also known as pairs. 1053 | For pairs, a slightly shorter literal syntax can be used, separating the two 1054 | items with a / character. 1055 | 1056 |
Another way of expressing our |statusInstance| tuple above would be
1057 | as 200/`OK`.
1058 |
1059 |
1060 |
To parse JSON from bytes given bytes, run these steps: 1063 | 1064 |
Let jsonText be the result of running UTF-8 decode on bytes. 1066 | [[!ENCODING]] 1067 | 1068 |
Return ? Call(%JSONParse%, undefined, « jsonText »). 1070 | [[!ECMA-262]] 1071 | 1072 |
The conventions used in this step are those of the JavaScript specification. 1073 | 1074 |
To serialize JSON to bytes given a JavaScript value value, run these steps:
Let jsonString be the result of 1082 | ? Call(%JSONStringify%, undefined, « value »). 1083 | [[!ECMA-262]] 1084 |
The conventions used in this step are those of the JavaScript specification. 1085 | Also, since no additional arguments are passed to %JSONStringify%, the resulting string 1086 | will have no whitespace inserted. 1087 | 1088 |
Return the result of running UTF-8 encode on jsonString. [[!ENCODING]] 1089 |
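The two JSON algorithms can be approximated outside a JavaScript engine. The sketch below uses Python's `json` module as a stand-in for %JSONParse% and %JSONStringify%, which is an assumption: the two differ in edge cases (for instance, Python accepts `NaN` by default and cannot encode lone surrogates as UTF-8). The function names are hypothetical.

```python
import json
from typing import Any


def parse_json_from_bytes(data: bytes) -> Any:
    """Decode the bytes as UTF-8, then parse the text as JSON."""
    # Approximates UTF-8 decode: a leading byte order mark is dropped and
    # invalid sequences become U+FFFD instead of raising.
    json_text = data.decode("utf-8-sig", errors="replace")
    # Raises json.JSONDecodeError on malformed input, analogous to a throw.
    return json.loads(json_text)


def serialize_json_to_bytes(value: Any) -> bytes:
    """Serialize to JSON with no inserted whitespace, then UTF-8 encode."""
    json_string = json.dumps(value, separators=(",", ":"), ensure_ascii=False)
    return json_string.encode("utf-8")
```

Passing compact `separators` mirrors the note above that no whitespace is inserted when no additional arguments are given.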
To forgiving-base64 encode given a byte sequence data, apply 1095 | the base64 algorithm defined in section 4 of RFC 4648 to data and return the result. 1096 | [[!RFC4648]] 1097 | 1098 |
This is named forgiving-base64 encode for symmetry with 1099 | forgiving-base64 decode, which is different from the RFC as it defines error handling for 1100 | certain inputs. 1101 | 1102 |
To forgiving-base64 decode given a string data, run these steps:
1103 | 1104 |Remove all ASCII whitespace from data. 1106 | 1107 | 1108 |
If data's length divides by 4 leaving no remainder, then: 1110 | 1111 |
If data ends with one or two U+003D (=) code points, then remove them 1113 | from data. 1114 |
If data's length divides by 4 leaving a remainder of 1, then 1117 | return failure. 1118 | 1119 |
If data contains a code point that is not one of

- U+002B (+)
- U+002F (/)
- ASCII alphanumeric

then return failure.
Let output be an empty byte sequence. 1131 | 1132 |
Let buffer be an empty buffer that can have bits appended to it. 1133 | 1134 |
Let position be a position variable for data, initially 1135 | pointing at the start of data. 1136 | 1137 |
While position does not point past the end of data: 1139 | 1140 |
Find the code point pointed to by position in the second column of 1142 | Table 1: The Base 64 Alphabet of RFC 4648. Let n be the number given in the first cell 1143 | of the same row. [[!RFC4648]] 1144 | 1145 |
Append the six bits corresponding to n, most significant bit first, to 1146 | buffer. 1147 | 1148 |
If buffer has accumulated 24 bits, interpret them as three 8-bit big-endian 1149 | numbers. Append three bytes with values equal to those numbers to output, in the same 1150 | order, and then empty buffer. 1151 | 1152 |
Advance position by 1. 1153 |
If buffer is not empty, it contains either 12 or 18 bits. If it contains 12 bits, 1157 | then discard the last four and interpret the remaining eight as an 8-bit big-endian number. If it 1158 | contains 18 bits, then discard the last two and interpret the remaining 16 as two 8-bit big-endian 1159 | numbers. Append the one or two bytes with values equal to those one or two numbers to 1160 | output, in the same order.
1161 | 1162 |The discarded bits mean that, for instance, "YQ" and
1163 | "YR" both return `a`.
1164 |
1165 |
Return output. 1166 |
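The forgiving-base64 decode steps can be sketched as follows. This is an illustrative reading of the algorithm, not a reference implementation: the function name is hypothetical, failure is modelled as `None`, and the bit buffer is modelled as an integer plus a bit count.

```python
import string
from typing import Optional

ASCII_WHITESPACE = "\t\n\x0c\r "
# Table 1 of RFC 4648: the index of each character is its 6-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def forgiving_base64_decode(data: str) -> Optional[bytes]:
    # Remove all ASCII whitespace.
    data = "".join(c for c in data if c not in ASCII_WHITESPACE)
    # If the length is a multiple of 4, drop one or two trailing "=".
    if len(data) % 4 == 0:
        if data.endswith("=="):
            data = data[:-2]
        elif data.endswith("="):
            data = data[:-1]
    # A remainder of 1 cannot encode a whole byte.
    if len(data) % 4 == 1:
        return None
    # Only alphabet characters may remain.
    if any(c not in ALPHABET for c in data):
        return None
    output = bytearray()
    buffer = 0  # accumulated bits
    bits = 0    # number of bits currently in buffer
    for c in data:
        buffer = (buffer << 6) | ALPHABET.index(c)
        bits += 6
        if bits == 24:
            output += buffer.to_bytes(3, "big")
            buffer, bits = 0, 0
    if bits == 12:
        # Discard the last four bits, keep one byte.
        output += (buffer >> 4).to_bytes(1, "big")
    elif bits == 18:
        # Discard the last two bits, keep two bytes.
        output += (buffer >> 2).to_bytes(2, "big")
    return bytes(output)
```

With this sketch, both `"YQ"` and `"YR"` decode to the byte sequence `a`, matching the note above about discarded bits.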
The HTML namespace is "http://www.w3.org/1999/xhtml".
1172 |
1173 |
The MathML namespace is "http://www.w3.org/1998/Math/MathML".
1174 |
1175 |
The SVG namespace is "http://www.w3.org/2000/svg".
1176 |
1177 |
The XLink namespace is "http://www.w3.org/1999/xlink".
1178 |
1179 |
The XML namespace is "http://www.w3.org/XML/1998/namespace".
1180 |
1181 |
The XMLNS namespace is "http://www.w3.org/2000/xmlns/".
1182 |
1183 |
1184 |
Many thanks to 1187 | Addison Phillips, 1188 | Aryeh Gregor, 1189 | Chris Rebert, 1190 | Daniel Ehrenberg, 1191 | Dominic Farolino, 1192 | Jake Archibald, 1193 | Jeff Hodges, 1194 | Jungkee Song, 1195 | Leonid Vasilyev, 1196 | Malika Aubakirova, 1197 | Michael™ Smith, 1198 | Mike West, 1199 | Ms2ger, 1200 | Philip Jägenstedt, 1201 | Rashaun "Snuggs" Stovall, 1202 | Sergey Shekyan, 1203 | Simon Pieters, 1204 | Tab Atkins, 1205 | Tobie Langel, 1206 | triple-underscore, 1207 | and Xue Fuqiao 1208 | for being awesome! 1209 | 1210 |
This standard is written by Anne van Kesteren 1211 | (Mozilla, 1212 | annevk@annevk.nl) and 1213 | Domenic Denicola (Google, 1214 | d@domenic.me). 1215 | -------------------------------------------------------------------------------- /review-drafts/2019-07.bs: -------------------------------------------------------------------------------- 1 |
2 | Group: WHATWG 3 | Date: 2019-07-16 4 | H1: Infra 5 | Shortname: infra 6 | Text Macro: TWITTER infrastandard 7 | Abstract: The Infra Standard aims to define the fundamental concepts upon which standards are built. 8 | Translation: ja https://triple-underscore.github.io/infra-ja.html 9 |10 | 11 |
12 | urlPrefix: https://tc39.github.io/ecma262/#; spec: ECMA-262; 13 | type: dfn 14 | text: %JSONParse%; url: sec-json.parse 15 | text: %JSONStringify%; url: sec-json.stringify 16 | text: List; url: sec-list-and-record-specification-type 17 | text: The String Type; url: sec-ecmascript-language-types-string-type 18 | text: realm; url: realm 19 | type: method; for: Array; text: sort(); url: sec-array.prototype.sort 20 | type: abstract-op; 21 | text: Call; url: sec-call 22 | text: Get; url: sec-get-o-p 23 | text: IsArray; url: sec-isarray 24 | text: ToLength; url: sec-tolength 25 | text: ToString; url: sec-tostring 26 | text: Type; url: sec-ecmascript-data-types-and-values 27 |28 | 29 | 30 |
Deduplicate boilerplate in standards. 34 | 35 |
Align standards on conventions, terminology, and data structures. 36 | 37 |
Be a place for concepts used by multiple standards without a good home. 38 | 39 |
Help write clear and readable algorithmic prose by clarifying otherwise ambiguous concepts. 40 |
Suggestions for more goals welcome.
43 | 44 | 45 |To make use of the Infra Standard in a document titled X, use 48 | X depends on the Infra Standard. Additionally, cross-referencing terminology 49 | is encouraged to avoid ambiguity. 50 | 51 |
Specification authors are also encouraged to add their specification to the 52 | list of dependent specifications in 53 | order to help the editors ensure that any future breaking changes to the Infra Standard are 54 | correctly reflected by any such dependencies. 55 | 56 | 57 |
All assertions, diagrams, examples, and notes are non-normative, as are all sections explicitly 62 | marked non-normative. Everything else is normative. 63 | 64 |
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", 65 | "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in 66 | RFC 2119. [[!RFC2119]] 67 | 68 |
These keywords have equivalent meaning when written in lowercase and cannot appear in 69 | non-normative content. 70 | 71 |
This is a willful violation of RFC 8174, motivated by legibility and a desire 72 | to preserve long-standing practice in many non-IETF-published pre-RFC 8174 documents. [[RFC8174]] 73 | 74 |
All of the above is applicable to both this standard and any document that uses this standard. 75 | Documents using this standard are encouraged to limit themselves to "must", "must not", "should", 76 | and "may", and to use these in their lowercase form as that is generally considered to be more 77 | readable. 78 | 79 |
For non-normative content "strongly encouraged", "strongly discouraged", "encouraged", 80 | "discouraged", "can", "cannot", "could", "could not", "might", and "might not" can be used instead. 81 | 82 | 83 |
In general, specifications interact with and rely on a wide variety of other specifications. In 86 | certain circumstances, unfortunately, conflicting needs require a specification to violate the 87 | requirements of other specifications. When this occurs, a document using the Infra Standard should 88 | denote such transgressions as a willful violation, and note the reason for that 89 | violation. 90 | 91 |
The previous section, [[#conformance]], documents a 92 | willful violation of RFC 8174 committed by the Infra Standard. 93 | 94 | 95 |
The word "or", in cases where both inclusive "or" and exclusive "or" are possible (e.g., "if 98 | either width or height is zero"), means an inclusive "or" (implying "or both"), unless it is called 99 | out as being exclusive (with "but not both"). 100 | 101 | 102 |
Algorithms, and requirements phrased in the imperative as part of algorithms (such as "strip any 105 | leading spaces" or "return false") are to be interpreted with the meaning of the keyword (e.g., 106 | "must") used in introducing the algorithm or step. If no such keyword is used, must is implied. 107 | 108 |
For example, were the spec to say:

To eat an orange, the user must:

- Peel the orange.
- Separate each slice of the orange.
- Eat the orange slices.

it would be equivalent to the following:

To eat an orange:

- The user must peel the orange.
- The user must separate each slice of the orange.
- The user must eat the orange slices.

Here the key word is "must".

Modifying the above example, if the algorithm was introduced only with "To eat an orange:", it would still have the same meaning, as "must" is implied.
Conformance requirements phrased as algorithms or specific steps may be implemented in any 140 | manner, so long as the end result is equivalent. (In particular, the algorithms are intended to be 141 | easy to follow, and not intended to be performant.) 142 | 143 |
Performance is tricky to get correct as it is influenced by user perception, computer 144 | architectures, and different types of input that can change over time in how common they are. For 145 | instance, a JavaScript engine likely has many different code paths for what is standardized as a 146 | single algorithm, in order to optimize for speed or memory consumption. Standardizing all those code 147 | paths would be an insurmountable task and not productive as they would not stand the test of time 148 | as well as the single algorithm would. Therefore performance is best left as a field to compete 149 | over. 150 | 151 | 152 |
A variable is declared with "let" and changed with "set". 155 | 156 |
Let |list| be a new list.
157 | 158 |Let |value| be null. 161 | 162 |
If |input| is a string, then set |value| to |input|. 163 | 164 |
Otherwise, set |value| to |input|, UTF-8 decoded. 165 | 166 |
Let activationTarget be 171 | target if isActivationEvent is true and target has activation 172 | behavior; otherwise null. 173 | 174 |
Variables must not be used before they are declared. Variables are 175 | block scoped. 176 | Variables must not be declared more than once per algorithm. 177 | 178 | 179 |
The control flow of algorithms is such that a requirement to "return" or "throw" terminates the 182 | algorithm the statement was in. "Return" will hand the given value, if any, to its caller. "Throw" 183 | will make the caller automatically rethrow the given value, if any, and thereby terminate the 184 | caller's algorithm. Using prose the caller has the ability to "catch" the exception and perform 185 | another action. 186 | 187 | 188 |
Sometimes it is useful to stop performing a series of steps once a condition becomes true. 191 | 192 |
To do this, state that a given series of steps will abort when a specific 193 | condition is reached. This indicates that the specified steps must be evaluated, not 194 | as-written, but by additionally inserting a step before each of them that evaluates 195 | condition, and if condition evaluates to true, skips the remaining steps. 196 | 197 |
In such algorithms, the subsequent step can be annotated to run if aborted, in 198 | which case it must run if any of the preceding steps were skipped due to the condition 199 | of the preceding abort when step evaluated to true. 200 | 201 |
The following algorithm 203 | 204 |
Let |result| be an empty list. 206 | 207 |
Run these steps, but abort when the user clicks the "Cancel" button: 209 | 210 |
220 |If aborted, append "Didn't finish!" to |result|.
223 |
is equivalent to the more verbose formulation
226 | 227 |Let |result| be an empty list. 229 | 230 |
If the user has not clicked the "Cancel" button, then: 232 | 233 |
Compute the first million digits of π, and append the result 235 | to |result|. 236 | 237 |
If the user has not clicked the "Cancel" button, then: 239 | 240 |
247 |If the user clicked the "Cancel" button, then append
250 | "Didn't finish!" to |result|.
251 |
Whenever this construct is used, implementations are allowed to evaluate 255 | condition during the specified steps rather than before and after each step, as long as 256 | the end result is indistinguishable. For instance, as long as |result| in the above example is not 257 | mutated during a compute operation, the user agent could stop the computation. 258 | 259 | 260 |
There's a variety of ways to repeat a set of steps until a condition is reached. 263 | 264 |
The Infra Standard is not (yet) exhaustive on this; please file an issue if you need 265 | something. 266 | 267 |
As defined for lists (and derivatives) and 270 | maps. 271 | 272 |
An instruction to repeat a set of steps as long as a condition is met. 275 | 276 |
While |condition| is "met":

…
An iteration's flow can be controlled via requirements to 286 | continue or break. 287 | Continue will skip over any remaining steps in an iteration, proceeding to the 288 | next item. If no further items remain, the iteration will stop. Break will skip 289 | over any remaining steps in an iteration, and skip over any remaining items as well, stopping the 290 | iteration. 291 | 292 |
Let |example| be the list « 1, 2, 3, 4 ». The following prose would perform |operation| 294 | upon 1, then 2, then 3, then 4: 295 | 296 |
For each |item| of |example|: 299 |
The following prose would perform |operation| upon 1, then 2, then 4. 3 would be skipped. 306 | 307 |
For each |item| of |example|: 310 |
The following prose would perform |operation| upon 1, then 2. 3 and 4 would be skipped. 318 | 319 |
328 |To improve readability, it can sometimes help to add assertions to algorithms, stating 334 | invariants. To do this, write "Assert:", followed by a statement that must be 335 | true. If the statement ends up being false that indicates an issue with the document using the Infra 336 | Standard that should be reported and addressed. 337 | 338 |
Since the statement can only ever be true, it has no implications for implementations. 339 | 340 |
Let |x| be "Aperture Science".
343 |
Assert: |x| is "Aperture Science".
344 |
The value null is used to indicate the lack of a value. It can be used interchangeably with the 352 | JavaScript null value. [[!ECMA-262]] 353 | 354 |
Let element be null. 355 | 356 |
If input is the empty string, then return null. 357 | 358 | 359 |
A boolean is either true or false. 362 | 363 |
Let elementSeen be false. 364 | 365 | 366 |
A byte is a sequence of eight bits, represented as a double-digit hexadecimal 369 | number in the range 0x00 to 0xFF, inclusive. 370 | 371 |
An ASCII byte is a byte in the range 0x00 (NUL) to 0x7F (DEL), 372 | inclusive. As illustrated, an ASCII byte, excluding 0x28 and 0x29, may be followed by the 373 | representation outlined in the Standard Code 374 | section of ASCII format for Network Interchange, between parentheses. [[!RFC20]] 375 | 376 |
0x28 may be followed by "(left parenthesis)" and 0x29 by "(right parenthesis)". 377 | 378 |
0x49 (I) when UTF-8 decoded becomes the 379 | code point U+0049 (I). 380 | 381 | 382 |
A byte sequence is a sequence of bytes, represented as a space-separated 385 | sequence of bytes. Byte sequences with bytes in the range 0x20 (SP) to 0x7E (~), inclusive, can 386 | alternately be written as a string, but using backticks instead of quotation marks, to avoid 387 | confusion with an actual string. 388 | 389 |
0x48 0x49 can also be represented as `HI`.
391 |
392 |
Headers, such as `Content-Type`, are byte sequences.
393 |
To get a byte sequence out of a string, using UTF-8 encode from 396 | the Encoding Standard is encouraged. In rare circumstances isomorphic encode might be needed. 397 | [[ENCODING]] 398 | 399 |
A byte sequence's length is the number of 400 | bytes it contains. 401 | 402 |
To byte-lowercase a byte sequence, increase each byte it 403 | contains, in the range 0x41 (A) to 0x5A (Z), inclusive, by 0x20. 404 | 405 |
To byte-uppercase a byte sequence, subtract 0x20 from each byte it contains that is in the range 0x61 (a) to 0x7A (z), inclusive.
A byte sequence A is a byte-case-insensitive match for a 409 | byte sequence B, if the byte-lowercase of A is the 410 | byte-lowercase of B. 411 | 412 |
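A small sketch of the byte-casing operations, using Python `bytes` as the byte sequence type. The function names are hypothetical. Only bytes in the ASCII letter ranges are touched; everything else, including bytes at or above 0x80, passes through unchanged.

```python
def byte_lowercase(sequence: bytes) -> bytes:
    """Add 0x20 to each byte in the range 0x41 (A) to 0x5A (Z)."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in sequence)


def byte_uppercase(sequence: bytes) -> bytes:
    """Subtract 0x20 from each byte in the range 0x61 (a) to 0x7A (z)."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in sequence)


def byte_case_insensitive_match(a: bytes, b: bytes) -> bool:
    """Compare the byte-lowercased forms of both sequences."""
    return byte_lowercase(a) == byte_lowercase(b)
```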
A byte sequence a starts with a 415 | byte sequence b if the following steps return true: 416 | 417 |
Let i be 0. 419 | 420 |
While true: 422 | 423 |
Let aByte be the ith byte of a if i is 425 | less than a's length; otherwise null. 426 | 427 |
Let bByte be the ith byte of b if i is 428 | less than b's length; otherwise null. 429 | 430 |
If bByte is null, then return true. 431 | 432 |
Return false if aByte is not bByte. 433 | 434 |
Set i to i + 1. 435 |
A byte sequence a is byte less than a byte sequence 440 | b if the following steps return true: 441 | 442 |
If a starts with b, then return false.

If b starts with a, then return true.
Let n be the smallest index such that the nth byte of 448 | a is different from the nth byte of b. (There has to be such an 449 | index, since neither byte sequence starts with the other.) 450 | 451 |
If the nth byte of a is less than the nth byte of 452 | b, then return true. 453 | 454 |
Return false. 455 |
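The "starts with" and "byte less than" algorithms can be sketched as below. The function names are hypothetical. The comparison follows the usual lexicographic ordering, in which a proper prefix sorts before any longer sequence that begins with it and equal sequences are not less than each other; this mirrors the code unit less than definition for strings.

```python
def starts_with(a: bytes, b: bytes) -> bool:
    """True if byte sequence a starts with byte sequence b."""
    i = 0
    while True:
        a_byte = a[i] if i < len(a) else None
        b_byte = b[i] if i < len(b) else None
        if b_byte is None:
            return True
        if a_byte != b_byte:
            return False
        i += 1


def byte_less_than(a: bytes, b: bytes) -> bool:
    """True if a sorts strictly before b in lexicographic byte order."""
    if starts_with(a, b):
        # b is a prefix of a (or equal to it), so a is not less than b.
        return False
    if starts_with(b, a):
        # a is a proper prefix of b.
        return True
    # Neither is a prefix of the other, so a differing index must exist.
    n = next(i for i in range(min(len(a), len(b))) if a[i] != b[i])
    return a[n] < b[n]
```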
To isomorphic decode a byte sequence input, return a 460 | string whose length is equal to input's 461 | length and whose code points have the same values as 462 | input's bytes, in the same order. 463 | 464 | 465 |
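In Python, isomorphic decode happens to coincide with decoding as `latin-1`, which maps each byte 0x00 to 0xFF to the code point with the same value. The function name below is hypothetical.

```python
def isomorphic_decode(sequence: bytes) -> str:
    """Return a string whose code points equal the input's byte values."""
    return sequence.decode("latin-1")
```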
A code point is a Unicode code point and is 468 | represented as a four-to-six digit hexadecimal number, typically prefixed with "U+". 469 | 470 |
A code point may be followed by its name, by its rendered form between parentheses when it 471 | is not U+0028 or U+0029, or by both. Documents using the Infra Standard are encouraged to follow 472 | code points by their name when they cannot be rendered or are U+0028 or U+0029; otherwise, 473 | follow them by their rendered form between parentheses, for legibility. 474 | 475 |
A code point's name is defined in the Unicode Standard and represented in 476 | ASCII uppercase. [[!UNICODE]] 477 | 478 |
The code point rendered as 🤔 is represented as U+1F914. 480 | 481 |
When referring to that code point, we might say "U+1F914 (🤔)", to provide extra context. 482 | Documents are allowed to use "U+1F914 THINKING FACE (🤔)" as well, though this is somewhat verbose. 483 |
Code points that are difficult to render unambiguously, such as U+000A, can be referred to as "U+000A LF". U+0029 can be referred to as "U+0029 RIGHT PARENTHESIS", because even though it renders, this avoids unmatched parentheses.
Code points are sometimes referred to as characters and in certain contexts are 490 | prefixed with "0x" rather than "U+". 491 | 492 |
A surrogate is a code point that is in the range U+D800 to U+DFFF, 493 | inclusive. 494 | 495 |
A scalar value is a code point that is not a surrogate. 496 | 497 |
A noncharacter is a code point that is in the range U+FDD0 to U+FDEF, 498 | inclusive, or U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, 499 | U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, 500 | U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, 501 | U+FFFFF, U+10FFFE, or U+10FFFF. 502 | 503 |
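The three classifications above reduce to simple integer range checks. In this sketch code points are plain integers and the function names are hypothetical. The noncharacter test relies on the observation that, apart from the U+FDD0 to U+FDEF block, every listed value ends in FFFE or FFFF.

```python
def is_surrogate(code_point: int) -> bool:
    return 0xD800 <= code_point <= 0xDFFF


def is_scalar_value(code_point: int) -> bool:
    return 0 <= code_point <= 0x10FFFF and not is_surrogate(code_point)


def is_noncharacter(code_point: int) -> bool:
    if 0xFDD0 <= code_point <= 0xFDEF:
        return True
    # The last two code points of each of the 17 planes: U+nFFFE and U+nFFFF.
    return 0 <= code_point <= 0x10FFFF and (code_point & 0xFFFE) == 0xFFFE
```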
An ASCII code point is a code point in the range U+0000 NULL to 504 | U+007F DELETE, inclusive. 505 | 506 |
An ASCII tab or newline is 507 | U+0009 TAB, U+000A LF, or U+000D CR. 508 | 509 |
ASCII whitespace is U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, or U+0020 510 | SPACE. 511 | 512 |
"Whitespace" is a mass noun. 513 | 514 |
A C0 control is a code point in the range U+0000 NULL to 515 | U+001F INFORMATION SEPARATOR ONE, inclusive. 516 | 517 |
A C0 control or space is a 518 | C0 control or U+0020 SPACE. 519 | 520 |
A control is a C0 control or a code point in the range 521 | U+007F DELETE to U+009F APPLICATION PROGRAM COMMAND, inclusive. 522 | 523 |
An ASCII digit is a code point in the range U+0030 (0) to U+0039 (9), 524 | inclusive. 525 | 526 |
An ASCII upper hex digit is an ASCII digit or a code point in the 527 | range U+0041 (A) to U+0046 (F), inclusive. 528 | 529 |
An ASCII lower hex digit is an ASCII digit or a code point in the 530 | range U+0061 (a) to U+0066 (f), inclusive. 531 | 532 |
An ASCII hex digit is an ASCII upper hex digit or 533 | ASCII lower hex digit. 534 | 535 |
An ASCII upper alpha is a code point in the range U+0041 (A) to 536 | U+005A (Z), inclusive. 537 | 538 |
An ASCII lower alpha is a code point in the range U+0061 (a) to 539 | U+007A (z), inclusive. 540 | 541 |
An ASCII alpha is an ASCII upper alpha or ASCII lower alpha. 542 | 543 |
An ASCII alphanumeric is an ASCII digit or ASCII alpha. 544 | 545 | 546 |
A JavaScript string is a sequence of unsigned 16-bit integers, also known as 549 | code units. 550 | 551 |
This is different from how the Unicode Standard defines "code unit". In particular it 552 | refers exclusively to how the Unicode Standard defines it for Unicode 16-bit strings. [[UNICODE]] 553 | 554 |
A JavaScript string's length is the number of 555 | code units it contains. 556 | 557 |
A JavaScript string can also be interpreted as containing code points, per the 558 | conversion defined in The String Type section of the JavaScript specification. [[!ECMA-262]] 559 | 560 |
This conversion process converts surrogate pairs into their corresponding 561 | scalar value and maps isolated surrogates to their corresponding code point, leaving 562 | them effectively as-is. 563 | 564 |
A JavaScript string consisting 565 | of the code units 0xD83D, 0xDCA9, and 0xD800, when interpreted as containing 566 | code points, would consist of the code points U+1F4A9 and U+D800. 567 | 568 |
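To make the conversion concrete, the sketch below interprets a list of 16-bit code units as code points. The function name is hypothetical and code units are modelled as plain integers. A lead surrogate followed by a trail surrogate is combined; any other surrogate is kept as-is.

```python
from typing import List


def code_units_to_code_points(units: List[int]) -> List[int]:
    code_points: List[int] = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_lead = 0xD800 <= unit <= 0xDBFF
        has_trail = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_lead and has_trail:
            # Surrogate pair: combine into one supplementary code point.
            combined = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            code_points.append(combined)
            i += 2
        else:
            # Ordinary code unit or isolated surrogate: keep as-is.
            code_points.append(unit)
            i += 1
    return code_points
```

Applied to the code units 0xD83D, 0xDCA9, and 0xD800 from the example above, it yields U+1F4A9 followed by U+D800.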
A scalar value string is a sequence of scalar values. 569 | 570 |
A scalar value string is useful for any kind of I/O or other kind of operation 571 | where UTF-8 encode comes into play. 572 | 573 | 574 |
String can be used to refer to either a JavaScript string or 575 | scalar value string, when it is clear from the context which is meant or when the distinction 576 | is immaterial. Strings are denoted by double quotes and monospace font. 577 | 578 |
"Hello, world!" is a string.
579 |
580 |
A string's length is the number of code points it 581 | contains. 582 | 583 |
To convert a JavaScript string into a 584 | scalar value string, replace any surrogates with U+FFFD. 585 | 586 | 587 |
The replaced surrogates are always isolated surrogates, since the process of 588 | interpreting the JavaScript string as containing code points will have converted surrogate 589 | pairs into scalar values. 590 | 591 |
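A minimal sketch of the conversion, assuming the input is a Python `str`. Python strings are sequences of code points, so any surrogate still present in one is already isolated. The function name is hypothetical.

```python
def to_scalar_value_string(text: str) -> str:
    """Replace every surrogate code point with U+FFFD."""
    return "".join(
        "\uFFFD" if 0xD800 <= ord(c) <= 0xDFFF else c
        for c in text
    )
```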
A scalar value string can always be used as a JavaScript string implicitly since it is a subset. The reverse is only possible if the JavaScript string is known to not contain surrogates; otherwise a conversion must be performed.
An implementation likely has to perform explicit conversion, depending on how it 597 | actually ends up representing JavaScript and 598 | scalar value strings. It is even fairly typical for implementations to have multiple 599 | implementations of just JavaScript strings for performance and memory reasons. 600 | 601 |
A string a is a 604 | code unit prefix of a string b 605 | if the following steps return true: 606 | 607 |
Let i be 0. 609 | 610 |
While true: 612 | 613 |
Let aCodeUnit be the ith code unit of a if 615 | i is less than a's length; otherwise null. 616 | 617 |
Let bCodeUnit be the ith code unit of b if 618 | i is less than b's length; otherwise null. 619 | 620 |
If bCodeUnit is null, then return true. 621 | 622 |
Return false if aCodeUnit is different from bCodeUnit. 623 | 624 |
Set i to i + 1. 625 |
When it is clear from context that code units are in play, e.g., because one of the 630 | strings is a literal containing only characters that are in the range U+0020 SPACE to U+007E (~), 631 | "a starts with b" can be used as a synonym for "b is a 632 | code unit prefix of a". 633 | 634 |
With unknown values, it is good to be explicit:
635 | targetString is a code unit prefix of userInput. But with a
636 | literal, we can use plainer language: userInput starts with "!".
637 |
638 |
A string a is code unit less than a string 639 | b if the following steps return true: 640 | 641 |
If b is a code unit prefix of a, then return false. 643 | 644 |
If a is a code unit prefix of b, then return true. 645 | 646 |
Let n be the smallest index such that the nth code unit of 647 | a is different from the nth code unit of b. (There has to be such 648 | an index, since neither string is a prefix of the other.) 649 | 650 |
If the nth code unit of a is less than the nth code unit of 651 | b, then return true. 652 | 653 |
Return false. 654 |
This matches the ordering used by JavaScript's < operator, and its
657 | {{Array/sort()}} method on an array of strings. This ordering compares the 16-bit code units in each
658 | string, producing a highly efficient, consistent, and deterministic sort order. The resulting
659 | ordering will not match any particular alphabet or lexicographic order, particularly for
660 | code points represented by a surrogate pair. [[!ECMA-262]]
661 |
662 |
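To see how code unit order can differ from code point order, the sketch below compares Python strings by their UTF-16 code units. The function names are hypothetical. Python's own `<` on `str` compares code points, so the two orderings disagree once a surrogate pair is involved.

```python
from typing import List


def code_units(text: str) -> List[int]:
    """The 16-bit code units of text, as integers."""
    # "surrogatepass" lets isolated surrogates through unchanged.
    data = text.encode("utf-16-be", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def code_unit_less_than(a: str, b: str) -> bool:
    """True if a sorts strictly before b by 16-bit code units."""
    # List comparison is lexicographic, and a proper prefix sorts first.
    return code_units(a) < code_units(b)
```

For example, U+1F4A9 is encoded with the lead surrogate 0xD83D, which is less than 0xFF5E, so it is code unit less than U+FF5E even though its code point value is greater.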
To isomorphic encode a string input, run these steps:
665 | 666 |Assert: input contains no code points greater than U+00FF. 668 | 669 |
Return a byte sequence whose length is equal to 670 | input's length and whose bytes have the same values as 671 | input's code points, in the same order. 672 |
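In Python, isomorphic encode can be sketched with the `latin-1` codec, which maps code points U+0000 to U+00FF to the bytes with the same values. The function name is hypothetical, and the assertion is written out explicitly to mirror the first step.

```python
def isomorphic_encode(text: str) -> bytes:
    """Return bytes whose values equal the input's code points."""
    assert all(ord(c) <= 0xFF for c in text), "code point greater than U+00FF"
    return text.encode("latin-1")
```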
An ASCII string is a string whose code points are all 677 | ASCII code points. 678 | 679 |
To ASCII lowercase a string, replace all ASCII upper alphas in 680 | the string with their corresponding code point in ASCII lower alpha. 681 | 682 |
To ASCII uppercase a string, replace all ASCII lower alphas in 683 | the string with their corresponding code point in ASCII upper alpha. 684 | 685 |
A string A is an ASCII case-insensitive match for a 686 | string B, if the ASCII lowercase of A is the 687 | ASCII lowercase of B. 688 | 689 | 690 |
To ASCII encode a string input, run these steps: 691 | 692 |
Assert: input is an ASCII string. 694 | 695 |
Note: This precondition ensures that isomorphic encode and 696 | UTF-8 encode return the same byte sequence for this input. 697 | 698 |
Return the isomorphic encoding of input. 699 |
To ASCII decode a byte sequence input, run these steps: 702 | 703 |
Assert: All bytes in input are ASCII bytes. 705 | 706 |
Note: This precondition ensures that isomorphic decode and 707 | UTF-8 decode return the same string for this input. 708 | 709 |
Return the isomorphic decoding of input. 710 |
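The encode and decode algorithms above can be sketched in Python as follows. This is a minimal illustration under the assumption that assertions map to Python `assert` statements. `isomorphic_decode` mirrors the isomorphic decode algorithm defined elsewhere in this standard, and all function names are hypothetical.

```python
def isomorphic_encode(s: str) -> bytes:
    # Precondition: no code point is greater than U+00FF.
    assert all(ord(c) <= 0xFF for c in s)
    return bytes(ord(c) for c in s)


def isomorphic_decode(data: bytes) -> str:
    # Each byte becomes the code point with the same value.
    return "".join(chr(b) for b in data)


def ascii_encode(s: str) -> bytes:
    # Precondition: s is an ASCII string (U+0000 to U+007F).
    assert all(ord(c) <= 0x7F for c in s)
    return isomorphic_encode(s)


def ascii_decode(data: bytes) -> str:
    # Precondition: every byte is an ASCII byte (0x00 to 0x7F).
    assert all(b <= 0x7F for b in data)
    return isomorphic_decode(data)
```

For ASCII input, `ascii_encode(s)` gives the same bytes as `s.encode("utf-8")`, which is the equivalence the notes point out.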
To strip newlines from a string, remove any U+000A LF and U+000D CR 716 | code points from the string. 717 | 718 |
To normalize newlines in a string, replace every U+000D CR U+000A LF 719 | code point pair with a single U+000A LF code point, and then replace every remaining 720 | U+000D CR code point with a U+000A LF code point. 721 | 722 |
To strip leading and trailing ASCII whitespace from a string, remove all 723 | ASCII whitespace that are at the start or the end of the string. 724 | 725 |
To strip and collapse ASCII whitespace in a string, replace any sequence 726 | of one or more consecutive code points that are ASCII whitespace in the string 727 | with a single U+0020 SPACE code point, and then remove any leading and trailing 728 | ASCII whitespace from that string. 729 | 730 |
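A minimal Python sketch of the four operations above follows. It assumes ASCII whitespace is U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, and U+0020 SPACE, as defined elsewhere in this standard. The function names are hypothetical.

```python
import re

ASCII_WHITESPACE = "\t\n\f\r "


def strip_newlines(s: str) -> str:
    return s.replace("\n", "").replace("\r", "")


def normalize_newlines(s: str) -> str:
    # CR LF pairs are handled first so they collapse to a single LF.
    return s.replace("\r\n", "\n").replace("\r", "\n")


def strip_leading_and_trailing_ascii_whitespace(s: str) -> str:
    # The explicit set matters: a bare str.strip() also removes non-ASCII whitespace.
    return s.strip(ASCII_WHITESPACE)


def strip_and_collapse_ascii_whitespace(s: str) -> str:
    collapsed = re.sub(r"[\t\n\f\r ]+", " ", s)
    return collapsed.strip(ASCII_WHITESPACE)
```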
To collect a sequence of code points meeting a condition condition from 734 | a string input, given a position variable 735 | position tracking the position of the calling algorithm within input:
736 | 737 |Let result be the empty string. 739 | 740 |
While position doesn't point past the end of input and the 742 | code point at position within input meets the condition 743 | condition: 744 | 745 |
Append that code point to the end of result. 747 | 748 |
Advance position by 1. 749 |
Return result. 753 |
In addition to returning the collected code points, this algorithm updates the 756 | position variable in the calling algorithm. 757 | 758 |
To skip ASCII whitespace within a string input given a 759 | position variable position, collect a sequence of code points that are 760 | ASCII whitespace from input given position. The collected 761 | code points are not used, but position is still updated. 762 | 763 |
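The position variable can be modeled in Python by passing the position in and returning the updated value alongside the result. This is one possible sketch, not the only way to express the shared state. The names `collect_code_points` and `skip_ascii_whitespace` are hypothetical.

```python
from typing import Callable

ASCII_WHITESPACE = "\t\n\f\r "


def collect_code_points(condition: Callable[[str], bool],
                        s: str, position: int) -> tuple[str, int]:
    """Return the collected code points and the updated position."""
    result = []
    while position < len(s) and condition(s[position]):
        result.append(s[position])
        position += 1
    return "".join(result), position


def skip_ascii_whitespace(s: str, position: int) -> int:
    # The collected code points are discarded; only the position is kept.
    _, position = collect_code_points(lambda c: c in ASCII_WHITESPACE, s, position)
    return position
```

A caller continues parsing from the returned position, for example `digits, position = collect_code_points(lambda c: c in "0123456789", "123abc", 0)`.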
To strictly split a string 766 | input on a particular delimiter code point delimiter:
767 | 768 |Let position be a position variable for input, initially 770 | pointing at the start of input. 771 | 772 |
Let tokens be a list of strings, initially empty. 773 | 774 |
Let token be the result of collecting a sequence of code points that are 775 | not equal to delimiter from input, given position. 776 | 777 |
Append token to tokens. 778 | 779 |
While position is not past the end of input: 781 | 782 |
Assert: the code point at position within input is 784 | delimiter. 785 | 786 |
Advance position by 1. 787 | 788 |
Let token be the result of collecting a sequence of code points that are 789 | not equal to delimiter from input, given position. 790 | 791 |
Append token to tokens. 792 |
Return tokens. 796 |
This algorithm is a "strict" split, as opposed to the commonly-used variants 799 | for ASCII whitespace and 800 | for commas below, which are both more lenient in various ways involving 801 | interspersed ASCII whitespace. 802 | 803 |
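The strict split steps can be sketched in Python as below. The inline `collect` helper is a hypothetical stand-in for collecting a sequence of code points, and the function names are assumptions made for illustration.

```python
def collect(condition, s: str, position: int) -> tuple[str, int]:
    start = position
    while position < len(s) and condition(s[position]):
        position += 1
    return s[start:position], position


def strictly_split(s: str, delimiter: str) -> list[str]:
    position = 0
    tokens = []
    token, position = collect(lambda c: c != delimiter, s, position)
    tokens.append(token)
    while position < len(s):
        assert s[position] == delimiter
        position += 1
        token, position = collect(lambda c: c != delimiter, s, position)
        tokens.append(token)
    return tokens
```

Because the first token is appended before the loop, the result always has at least one item: strictly splitting the empty string yields a list holding one empty string.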
To split a 804 | string input on ASCII whitespace: 805 | 806 |
Let position be a position variable for input, initially 808 | pointing at the start of input. 809 | 810 |
Let tokens be a list of strings, initially empty. 811 | 812 |
Skip ASCII whitespace within input given position. 813 | 814 |
While position is not past the end of input: 816 | 817 |
Let token be the result of collecting a sequence of code points that are 819 | not ASCII whitespace from input, given position. 820 | 821 |
Append token to tokens. 822 | 823 |
Skip ASCII whitespace within input given position. 824 |
Return tokens. 828 |
To split a string 831 | input on commas: 832 | 833 |
Let position be a position variable for input, initially 835 | pointing at the start of input. 836 | 837 |
Let tokens be a list of strings, initially empty. 838 | 839 |
While position is not past the end of input: 841 | 842 |
Let token be the result of collecting a sequence of code points that are 845 | not U+002C (,) from input, given position. 846 | 847 |
token might be the empty string.
Strip leading and trailing ASCII whitespace from token.
Append token to tokens.
If position is not past the end of input, then: 856 | 857 |
Assert: the code point at position within input is 859 | U+002C (,). 860 | 861 |
Advance position by 1. 862 |
Return tokens. 868 |
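The two lenient splits above can be sketched in Python as follows. The sketch assumes that the comma variant strips leading and trailing ASCII whitespace from each token before appending it, which is the leniency the note on strict splitting refers to. All names are hypothetical.

```python
ASCII_WHITESPACE = "\t\n\f\r "


def collect(condition, s: str, position: int) -> tuple[str, int]:
    start = position
    while position < len(s) and condition(s[position]):
        position += 1
    return s[start:position], position


def split_on_ascii_whitespace(s: str) -> list[str]:
    position = 0
    tokens = []
    _, position = collect(lambda c: c in ASCII_WHITESPACE, s, position)
    while position < len(s):
        token, position = collect(lambda c: c not in ASCII_WHITESPACE, s, position)
        tokens.append(token)
        _, position = collect(lambda c: c in ASCII_WHITESPACE, s, position)
    return tokens


def split_on_commas(s: str) -> list[str]:
    position = 0
    tokens = []
    while position < len(s):
        token, position = collect(lambda c: c != ",", s, position)
        # Assumption: tokens are stripped of surrounding ASCII whitespace.
        tokens.append(token.strip(ASCII_WHITESPACE))
        if position < len(s):
            assert s[position] == ","
            position += 1
    return tokens
```

Unlike the strict split, both functions return an empty list for the empty string, and the comma variant does not produce a trailing empty token for input ending in a comma.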
To concatenate a list of 871 | strings list, using an optional separator string separator, run 872 | these steps: 873 | 874 |
If list is empty, then return the empty string. 876 | 877 |
If separator is not given, then set separator to the empty string. 878 | 879 |
Return a string whose contents are list's items, in 880 | order, separated from each other by separator. 881 |
To serialize a set set, return the 884 | concatenation of set using U+0020 SPACE. 885 | 886 | 887 |
Conventionally, specifications have operated on a variety of vague specification-level data 890 | structures, based on shared understanding of their semantics. This generally works well, but can 891 | lead to ambiguities around edge cases, such as iteration order or what happens when you 892 | append an item to an ordered set that the set already 893 | contains. It has also led to a variety of divergent notation and phrasing, especially 894 | around more complex data structures such as maps. 895 | 896 |
This standard provides a small set of common data structures, along with notation and phrasing 897 | for working with them, in order to create common ground. 898 | 899 | 900 |
A list is a specification type consisting of a finite ordered sequence of 903 | items. 904 | 905 |
For notational convenience, a literal syntax can be used to express lists, by surrounding 906 | the list by « » characters and separating its items with a comma. An indexing syntax 907 | can be used by providing a zero-based index into a list inside square brackets. The index cannot be 908 | out-of-bounds, except when used with exists. 909 | 910 |
Let |example| be the list « "a", "b", "c", "a" ». Then |example|[1] is the string "b".
To append to a list that is not an ordered set is to 917 | add the given item to the end of the list. 918 | 919 |
To extend a list |A| with a list |B|, 920 | for each |item| of |B|, append |item| to |A|. 921 | 922 |
To prepend to a list that is not an ordered set is to 936 | add the given item to the beginning of the list. 937 | 938 |
To replace within a list that is not an ordered set is 939 | to replace all items from the list that match a given condition with the given item, 940 | or do nothing if none do. 941 | 942 |
The above definitions are modified when the list is an ordered set; see 943 | below for ordered set append, prepend, and 944 | replace. 945 | 946 |
To insert an item into a list before an 947 | index is to add the given item to the list between the given index − 1 and the given index. If 948 | the given index is 0, then prepend the given item to the list. 949 | 950 |
To remove zero or more items from a list is 951 | to remove all items from the list that match a given condition, or do nothing if none do. 952 | 953 |
Removing |x| from the list « |x|, |y|, |z|, |x| » is to remove all 955 | items from the list that are equal to |x|. The list now is equivalent to « |y|, |z| ». 956 | 957 |
Removing all items that start with the string "a" from the list « "a", "b", "ab", "ba" » is to remove the items "a" and "ab". The list is now equivalent to « "b", "ba" ».
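As a rough illustration, the list operations above map onto a Python `list` as sketched below. The helper names are hypothetical, and conditions are modeled as predicate functions, which is an assumption of this sketch.

```python
from typing import Any, Callable


def list_prepend(items: list, item: Any) -> None:
    items.insert(0, item)


def list_insert_before(items: list, index: int, item: Any) -> None:
    # Adds item between index - 1 and index; index 0 behaves like prepend.
    items.insert(index, item)


def list_replace(items: list, condition: Callable[[Any], bool], item: Any) -> None:
    # Replaces every matching item; does nothing if none match.
    for i, existing in enumerate(items):
        if condition(existing):
            items[i] = item


def list_remove(items: list, condition: Callable[[Any], bool]) -> None:
    # Removes every matching item in place; does nothing if none match.
    items[:] = [existing for existing in items if not condition(existing)]
```

Append and extend correspond directly to `list.append` and `list.extend`.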
To empty a list is to remove 964 | all of its items. 965 | 966 |
A list contains an 967 | item if it appears in the list. We can also denote this by saying that, for a 968 | list |list| and an index |index|, "|list|[|index|] exists". 969 | 970 |
A list's size is the number of 971 | items the list contains. 972 | 973 |
A list is empty if 974 | its size is zero. 975 | 976 |
To iterate over a list, performing a 977 | set of steps on each item in order, use phrasing of the form 978 | "For each |item| of list", and then operate on |item| in the 979 | subsequent prose. 980 | 981 |
To clone a list |list| is to create a new 982 | list |clone|, of the same designation, and, for each |item| of |list|, 983 | append |item| to |clone|, so that |clone| contains the same 984 | items, in the same order as |list|. 985 | 986 | Note: This is a "shallow clone", as the items themselves are not cloned in any way. 987 | 988 |
Let |original| be the ordered set « "a", "b", "c" ». Cloning |original| creates a new ordered set |clone|, so that replacing "a" with "foo" in |clone| gives « "foo", "b", "c" », while |original|[0] is still the string "a".
To sort in ascending order 995 | a list |list|, with a less than algorithm |lessThanAlgo|, is to create a new list 996 | |sorted|, containing the same items as |list| but sorted so that according to 997 | |lessThanAlgo|, each item is less than the one following it, if any. For items that sort the same 998 | (i.e., for which |lessThanAlgo| returns false for both comparisons), their relative order in 999 | |sorted| must be the same as it was in |list|. 1000 | 1001 |
To sort in descending order 1002 | a list |list|, with a less than algorithm |lessThanAlgo|, is to create a new list 1003 | |sorted|, containing the same items as |list| but sorted so that according to 1004 | |lessThanAlgo|, each item is less than the one preceding it, if any. For items that sort the same 1005 | (i.e., for which |lessThanAlgo| returns false for both comparisons), their relative order in 1006 | |sorted| must be the same as it was in |list|. 1007 | 1008 |
Let |original| be the list « (200, "OK"), (404, "Not Found"), (null, "OK") ». Sorting |original| in ascending order, with |a| being less than |b| if |a|'s second item is code unit less than |b|'s second item, gives the result « (404, "Not Found"), (200, "OK"), (null, "OK") ».
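The stability requirement can be illustrated with Python's `sorted`, which is guaranteed to be stable, including with `reverse=True`. In this sketch the less than algorithm is a hypothetical two-argument predicate, and Python's `<` on the ASCII strings used here is assumed to agree with code unit less than.

```python
from functools import cmp_to_key


def _comparator(less_than):
    def compare(a, b):
        if less_than(a, b):
            return -1
        if less_than(b, a):
            return 1
        return 0  # Neither is less: the items sort the same.
    return cmp_to_key(compare)


def sort_ascending(items: list, less_than) -> list:
    # Returns a new list; items that sort the same keep their relative order.
    return sorted(items, key=_comparator(less_than))


def sort_descending(items: list, less_than) -> list:
    return sorted(items, key=_comparator(less_than), reverse=True)
```

With `original = [(200, "OK"), (404, "Not Found"), (None, "OK")]` and `lambda a, b: a[1] < b[1]`, the ascending sort places (404, "Not Found") first and keeps (200, "OK") ahead of (None, "OK"), as in the example above.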
The list type originates from the JavaScript specification (where it is capitalized, as 1017 | List); we repeat some elements of its definition here for ease of reference, 1018 | and provide an expanded vocabulary for manipulating lists. Whenever JavaScript expects a 1019 | List, a list as defined here can be used; they are the same type. 1020 | [[!ECMA-262]] 1021 | 1022 |
Some lists are designated as stacks. A stack is a list, 1025 | but conventionally, the following operations are used to operate on it, instead of using 1026 | append, prepend, or remove. 1027 | 1028 |
To push onto a stack is to append to it. 1029 | 1030 |
To pop from a stack: if the stack 1031 | is not empty, then remove its last item and return 1032 | it; otherwise, return nothing. 1033 | 1034 |
Although stacks are lists, for each must not be used with them; 1035 | instead, a combination of while and pop is more appropriate. 1036 | 1037 |
Some lists are designated as queues. A queue is a list, 1040 | but conventionally, the following operations are used to operate on it, instead of using 1041 | append, prepend, or remove. 1042 | 1043 |
To enqueue in a queue is to append to it. 1044 | 1045 |
To dequeue from a queue is to remove its first 1046 | item and return it, if the queue is not empty, or to return 1047 | nothing if it is. 1048 | 1049 |
Although queues are lists, for each must not be used with them; 1050 | instead, a combination of while and dequeue is more appropriate. 1051 | 1052 |
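A small Python sketch of the stack and queue conventions follows. Returning `None` stands in for "return nothing", which is an assumption of this sketch and would be ambiguous if `None` were itself stored as an item. The function names are hypothetical.

```python
from collections import deque


def pop(stack: list):
    # Remove and return the last item, or None if the stack is empty.
    return stack.pop() if stack else None


def dequeue(queue: deque):
    # Remove and return the first item, or None if the queue is empty.
    return queue.popleft() if queue else None


def drain(stack: list) -> list:
    # The "while and pop" pattern used instead of for each.
    visited = []
    while stack:
        visited.append(pop(stack))
    return visited
```

Push and enqueue are both a plain `append` in this model.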
Some lists are designated as ordered sets. An 1055 | ordered set is a list with the additional semantic that it must not contain the same 1056 | item twice. 1057 | 1058 |
Almost all cases on the web platform require an ordered set, instead of an 1059 | unordered one, since interoperability requires that any developer-exposed enumeration of the set's 1060 | contents be consistent between browsers. In those cases where order is not important, we still use 1061 | ordered sets; implementations can optimize based on the fact that the order is not observable. 1062 | 1063 |
To append to an ordered set: if the set contains 1064 | the given item, then do nothing; otherwise, perform the normal list 1065 | append operation. 1066 | 1067 |
To prepend to an ordered set: if the set 1068 | contains the given item, then do nothing; otherwise, perform the 1069 | normal list prepend operation. 1070 | 1071 |
To replace within an ordered set 1072 | set, given item and replacement: if set 1073 | contains item or replacement, then replace the first instance 1074 | of either with replacement and remove all other instances. 1075 | 1076 |
Replacing "a" with "c" within the 1077 | ordered set « "a", "b", "c" » gives « "c", "b" ». Within « "c", "b", "a" » it gives 1078 | « "c", "b" » as well. 1079 | 1080 |
An ordered set |set| is a subset of another ordered set 1081 | |superset| (and conversely, |superset| is a superset of |set|) if, 1082 | for each |item| of |set|, |superset| contains |item|. 1083 | 1084 |
This implies that an ordered set is both a subset and a 1085 | superset of itself. 1086 | 1087 |
The intersection of ordered sets |A| and |B|, is the result 1088 | of creating a new ordered set |set| and, for each |item| of |A|, if |B| 1089 | contains |item|, appending |item| to |set|. 1090 | 1091 |
The union of ordered sets |A| and |B|, is the result of 1092 | cloning |A| as |set| and, for each |item| of |B|, 1093 | appending |item| to |set|. 1094 | 1095 |
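The ordered set operations above can be sketched on top of a Python `list` whose items are kept unique. This is an illustrative model only; the function names are hypothetical and membership is checked with Python equality.

```python
def set_append(s: list, item) -> None:
    if item not in s:
        s.append(item)


def set_prepend(s: list, item) -> None:
    if item not in s:
        s.insert(0, item)


def set_replace(s: list, item, replacement) -> None:
    # Replace the first instance of item or replacement, drop all later ones.
    result = []
    replaced = False
    for existing in s:
        if existing == item or existing == replacement:
            if not replaced:
                result.append(replacement)
                replaced = True
        else:
            result.append(existing)
    s[:] = result


def is_subset(s: list, superset: list) -> bool:
    return all(item in superset for item in s)


def intersection(a: list, b: list) -> list:
    result = []
    for item in a:
        if item in b:
            set_append(result, item)
    return result


def union(a: list, b: list) -> list:
    result = list(a)  # Shallow clone of a.
    for item in b:
        set_append(result, item)
    return result
```

Both results follow the order of the first operand, with union adding the second operand's new items at the end.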
The range n to m, inclusive, creates a new 1098 | ordered set containing all of the integers from n up to and including m 1099 | in consecutively increasing order, as long as m is greater than or equal to n. 1100 | 1101 |
For each n of the range 1 to 1102 | 4, inclusive, … 1103 | 1104 | 1105 |
An ordered map, or sometimes just "map", is a 1108 | specification type consisting of a finite ordered sequence of 1109 | key/value pairs, with no key appearing twice. 1110 | Each key/value pair is called an entry. 1111 | 1112 |
As with ordered sets, by default we assume that maps need to be ordered for 1113 | interoperability among implementations. 1114 | 1115 |
A literal syntax can be used to express ordered maps, by surrounding the ordered map with 1116 | «[ ]» characters, denoting each of its entries as |key| → |value|, and separating its 1117 | entries with a comma. An indexing syntax can be used to look up and set values by 1118 | providing a key inside square brackets. The index cannot be out-of-bounds, except 1119 | when used with exists. 1120 | 1121 |
Let |example| be the ordered map «[ "a" → `x`, "b" → `y` ]». Then |example|["a"] is the byte sequence `x`.
To get the value of an entry in an 1128 | ordered map given a key, return the value of the 1129 | entry whose key is the given key. We can also use the indexing syntax 1130 | explained above. 1131 | 1132 |
To set the value of an entry in an 1133 | ordered map to a given value is to update the value of any existing 1134 | entry if the map contains an entry with the given key, 1135 | or if none such exists, to add a new entry with the given key/value to the end of the map. We can 1136 | also denote this by saying, for an ordered map |map|, key |key|, and value |value|, 1137 | "set |map|[|key|] to |value|". 1138 | 1139 |
To remove an entry from an ordered map is to remove 1140 | all entries from the map that match a given condition, or do nothing if none do. If 1141 | the condition is having a certain key, then we can also denote this by saying, for 1142 | an ordered map |map| and key |key|, "remove |map|[|key|]". 1143 | 1144 |
An ordered map contains an 1145 | entry with a given key if there exists an entry with that key. 1146 | We can also denote this by saying that, for an ordered map |map| and key |key|, "|map|[|key|] 1147 | exists". 1148 | 1149 |
To get the keys of an 1150 | ordered map, return a new ordered set whose items are each of the 1151 | keys in the map's entries. 1152 | 1153 |
To get the values of an 1154 | ordered map, return a new list whose items are each of the 1155 | values in the map's entries. 1156 | 1157 |
An ordered map's size is the size of the result 1158 | of running get the keys on the map. 1159 | 1160 |
An ordered map is empty if its 1161 | size is zero. 1162 | 1163 |
To iterate over an ordered map, performing 1164 | a set of steps on each entry in order, use phrasing of the form 1165 | "For each |key| → |value| of |map|", and then operate on |key| and |value| in the 1166 | subsequent prose. 1167 | 1168 |
To sort in ascending order 1169 | a map |map|, with a less than algorithm |lessThanAlgo|, is to create a new map 1170 | |sorted|, containing the same entries as |map| but sorted so that according to 1171 | |lessThanAlgo|, each entry is less than the one following it, if any. For entries that sort the same 1172 | (i.e., for which |lessThanAlgo| returns false for both comparisons), their relative order in 1173 | |sorted| must be the same as it was in |map|. 1174 | 1175 |
To sort in descending order 1176 | a map |map|, with a less than algorithm |lessThanAlgo|, is to create a new map 1177 | |sorted|, containing the same entries as |map| but sorted so that according to 1178 | |lessThanAlgo|, each entry is less than the one preceding it, if any. For entries that sort the same 1179 | (i.e., for which |lessThanAlgo| returns false for both comparisons), their relative order in 1180 | |sorted| must be the same as it was in |map|. 1181 | 1182 | 1183 |
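Python's built-in `dict` preserves insertion order and keeps an existing key in place when its value is updated, so it can serve as a rough model of an ordered map. The helper names below are hypothetical, and the less than algorithm is assumed to take two (key, value) entries.

```python
from functools import cmp_to_key


def map_set(m: dict, key, value) -> None:
    # Updates an existing entry in place, or adds a new entry at the end.
    m[key] = value


def map_remove(m: dict, key) -> None:
    # Does nothing if no entry has the given key.
    m.pop(key, None)


def map_sort_ascending(m: dict, less_than) -> dict:
    def compare(a, b):
        if less_than(a, b):
            return -1
        if less_than(b, a):
            return 1
        return 0
    # sorted() is stable, so entries that sort the same keep their order.
    return dict(sorted(m.items(), key=cmp_to_key(compare)))
```

In this model `key in m` corresponds to "|map|[|key|] exists", while `list(m)`, `list(m.values())`, and `len(m)` correspond to getting the keys, getting the values, and the size.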
A struct is a specification type consisting of a finite set of 1186 | items, each of which has a unique and immutable 1187 | name. 1188 | 1189 |
Structs with a defined order are also known as tuples. For notational convenience, a literal syntax can be used to express tuples, by surrounding the tuple with parentheses and separating its items with a comma. To use this notation, the names need to be clear from context. This can be done by preceding the first instance with the name given to the tuple.
A status is an example tuple consisting of a code (a three-digit number) and text (a byte 1201 | sequence). 1202 | 1203 |
A nonsense algorithm that manipulates status tuples for the purpose of demonstrating their usage is then:
Let statusInstance be the status (200, `OK`).
Set statusInstance to (301, `FOO BAR`).
1209 | It is intentional that not all structs are tuples. Documents using the 1214 | Infra Standard might need the flexibility to add new names to their struct 1215 | without breaking literal syntax used by their dependencies. In that case a tuple is not appropriate. 1216 | 1217 |
Tuples with two items are also known as pairs. 1220 | For pairs, a slightly shorter literal syntax can be used, separating the two 1221 | items with a / character. 1222 | 1223 |
Another way of expressing our |statusInstance| tuple above would be as 200/`OK`.
The conventions used in the algorithms in this section are those of the JavaScript 1230 | specification. [[!ECMA-262]] 1231 | 1232 |
To parse JSON from bytes given bytes, run these steps: 1233 | 1234 |
Let jsonText be the result of running UTF-8 decode on bytes. 1236 | [[!ENCODING]] 1237 | 1238 |
Return ? Call(%JSONParse%, undefined, « jsonText »). 1239 |
To serialize JSON to bytes a given JavaScript value value, run these 1242 | steps: 1243 | 1244 |
Let jsonString be 1247 | ? Call(%JSONStringify%, undefined, « value »). 1248 | 1249 |
Since no additional arguments are passed to %JSONStringify%, the resulting 1250 | string will have no whitespace inserted. 1251 | 1252 |
Return the result of running UTF-8 encode on jsonString. [[!ENCODING]] 1253 |
The above two operations operate on JavaScript values directly; in particular, this means that 1258 | the involved objects or arrays are tied to a particular JavaScript realm. In 1259 | standards, it is often more convenient to parse JSON into realm-independent maps, 1260 | lists, strings, booleans, numbers, and nulls. 1261 | 1262 |
To parse JSON into Infra values, given a string jsonText: 1263 | 1264 |
Let |jsValue| be ? [$Call$](%JSONParse%, undefined, « |jsonText| »). 1266 | 1267 |
Return the result of [=converting a JSON-derived JavaScript value to an Infra value=], given 1268 | |jsValue|. 1269 |
To convert a JSON-derived JavaScript value to an Infra value, 1272 | given a JavaScript value jsValue: 1273 | 1274 |
If [$Type$](|jsValue|) is Null, Boolean, String, or Number, then return |jsValue|.
If [$IsArray$](|jsValue|) is true, then: 1279 | 1280 |
Let |result| be an empty [=list=]. 1282 | 1283 |
Let |length| be ! [$ToLength$](! [$Get$](|jsValue|, "length")).
[=list/For each=] |index| of [=the range=] 0 to |length| − 1, inclusive: 1287 | 1288 |
Let |indexName| be ! [$ToString$](|index|). 1290 | 1291 |
Let |jsValueAtIndex| be ! [$Get$](|jsValue|, |indexName|). 1292 | 1293 |
Let |infraValueAtIndex| be the result of [=converting a JSON-derived JavaScript value to an Infra value=], 1294 | given |jsValueAtIndex|. 1295 | 1296 |
[=list/Append=] |infraValueAtIndex| to |result|. 1297 |
Return |result|. 1301 |
Let |result| be an empty [=ordered map=]. 1305 | 1306 |
[=list/For each=] |key| of ! |jsValue|.\[[OwnPropertyKeys]](): 1308 | 1309 |
Let |jsValueAtKey| be ! [$Get$](|jsValue|, |key|). 1311 | 1312 |
Let |infraValueAtKey| be the result of [=converting a JSON-derived JavaScript value to an Infra value=], 1313 | given |jsValueAtKey|. 1314 | 1315 |
[=map/Set=] |result|[|key|] to |infraValueAtKey|. 1316 |
Return |result|. 1320 |
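The recursive shape of the conversion can be sketched in Python, with the caveat that this is only an analogy. Python's `json.loads` stands in for %JSONParse% and already returns realm-independent values, and it differs from JavaScript in details such as accepting `NaN` and in how object keys are ordered. The function names are hypothetical.

```python
import json


def convert_json_derived_value(value):
    # Null, booleans, strings, and numbers are returned unchanged.
    if value is None or isinstance(value, (bool, str, int, float)):
        return value
    # Arrays become lists, converting each element recursively.
    if isinstance(value, list):
        return [convert_json_derived_value(item) for item in value]
    # Objects become ordered maps (a dict keeps insertion order).
    if isinstance(value, dict):
        return {key: convert_json_derived_value(item) for key, item in value.items()}
    raise TypeError("not a JSON-derived value: " + repr(value))


def parse_json_into_infra_values(json_text: str):
    # Parse errors propagate to the caller, loosely like the ? prefix.
    return convert_json_derived_value(json.loads(json_text))
```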
To forgiving-base64 encode given a byte sequence data, apply 1326 | the base64 algorithm defined in section 4 of RFC 4648 to data and return the result. 1327 | [[!RFC4648]] 1328 | 1329 |
This is named forgiving-base64 encode for symmetry with 1330 | forgiving-base64 decode, which is different from the RFC as it defines error handling for 1331 | certain inputs. 1332 | 1333 |
To forgiving-base64 decode given a string data, run these steps:
Remove all ASCII whitespace from data.
If data's length divides by 4 leaving no remainder, then: 1341 | 1342 |
If data ends with one or two U+003D (=) code points, then remove them 1344 | from data. 1345 |
If data's length divides by 4 leaving a remainder of 1, then 1348 | return failure. 1349 | 1350 |
If data contains a code point that is not one of U+002B (+), U+002F (/), or an ASCII alphanumeric, then return failure.
Let output be an empty byte sequence. 1362 | 1363 |
Let buffer be an empty buffer that can have bits appended to it. 1364 | 1365 |
Let position be a position variable for data, initially 1366 | pointing at the start of data. 1367 | 1368 |
While position does not point past the end of data: 1370 | 1371 |
Find the code point pointed to by position in the second column of 1373 | Table 1: The Base 64 Alphabet of RFC 4648. Let n be the number given in the first cell 1374 | of the same row. [[!RFC4648]] 1375 | 1376 |
Append the six bits corresponding to n, most significant bit first, to 1377 | buffer. 1378 | 1379 |
If buffer has accumulated 24 bits, interpret them as three 8-bit big-endian 1380 | numbers. Append three bytes with values equal to those numbers to output, in the same 1381 | order, and then empty buffer. 1382 | 1383 |
Advance position by 1. 1384 |
If buffer is not empty, it contains either 12 or 18 bits. If it contains 12 bits, then discard the last four and interpret the remaining eight as an 8-bit big-endian number. If it contains 18 bits, then discard the last two and interpret the remaining 16 as two 8-bit big-endian numbers. Append the one or two bytes with values equal to those one or two numbers to output, in the same order.
The discarded bits mean that, for instance, "YQ" and "YR" both return `a`.
Return output. 1397 |
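The decoding steps above can be followed fairly literally in Python. In this sketch failure is modeled as `None`, the bit buffer is an integer with a separate bit count, and the name `forgiving_base64_decode` is hypothetical. The alphabet is assumed to be Table 1 of RFC 4648.

```python
import string

# Table 1 of RFC 4648: the index of each code point is its 6-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
ASCII_WHITESPACE = "\t\n\f\r "


def forgiving_base64_decode(data: str):
    data = "".join(c for c in data if c not in ASCII_WHITESPACE)
    if len(data) % 4 == 0:
        # Remove one or two trailing U+003D (=) code points.
        if data.endswith("=="):
            data = data[:-2]
        elif data.endswith("="):
            data = data[:-1]
    if len(data) % 4 == 1:
        return None  # failure
    if any(c not in ALPHABET for c in data):
        return None  # failure
    output = bytearray()
    buffer, bits = 0, 0
    for c in data:
        buffer = (buffer << 6) | ALPHABET.index(c)
        bits += 6
        if bits == 24:
            output += buffer.to_bytes(3, "big")
            buffer, bits = 0, 0
    if bits == 12:
        output.append(buffer >> 4)  # Discard the last four bits.
    elif bits == 18:
        output += (buffer >> 2).to_bytes(2, "big")  # Discard the last two bits.
    return bytes(output)
```

The padding is only removed when the length is a multiple of four, so an input such as "YQ=" still contains U+003D (=) when the alphabet check runs and therefore returns failure.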
The HTML namespace is "http://www.w3.org/1999/xhtml".
The MathML namespace is "http://www.w3.org/1998/Math/MathML".
The SVG namespace is "http://www.w3.org/2000/svg".
The XLink namespace is "http://www.w3.org/1999/xlink".
The XML namespace is "http://www.w3.org/XML/1998/namespace".
The XMLNS namespace is "http://www.w3.org/2000/xmlns/".
Many thanks to 1418 | Addison Phillips, 1419 | Aryeh Gregor, 1420 | Chris Rebert, 1421 | Daniel Ehrenberg, 1422 | Dominic Farolino, 1423 | Jake Archibald, 1424 | Jeff Hodges, 1425 | Jungkee Song, 1426 | Leonid Vasilyev, 1427 | Maciej Stachowiak, 1428 | Malika Aubakirova, 1429 | Michael™ Smith, 1430 | Mike West, 1431 | Ms2ger, 1432 | Pavel "Al Arz" Kurochkin, 1433 | Philip Jägenstedt, 1434 | Rashaun "Snuggs" Stovall, 1435 | Sergey Shekyan, 1436 | Simon Pieters, 1437 | Tab Atkins, 1438 | Tobie Langel, 1439 | triple-underscore, 1440 | and Xue Fuqiao 1441 | for being awesome! 1442 | 1443 |
This standard is written by Anne van Kesteren 1444 | (Mozilla, 1445 | annevk@annevk.nl) and 1446 | Domenic Denicola (Google, 1447 | d@domenic.me). 1448 | --------------------------------------------------------------------------------