├── .gitignore
├── .travis-github-deploy-key.enc
├── .travis.yml
├── LICENSE
├── README.md
├── docs
    └── index.html
├── missing-productions.json
├── package.json
└── spec.emu

/.gitignore:
--------------------------------------------------------------------------------
####################################################################################################
# https://github.com/github/gitignore/blob/9a1d4adec95789f45efd3242099628a2369b2fc8/Node.gitignore
####################################################################################################

# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*

# Runtime data
pids
*.pid
*.seed
*.pid.lock

# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov

# Coverage directory used by tools like istanbul
coverage

# nyc test coverage
.nyc_output

# Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files)
.grunt

# Bower dependency directory (https://bower.io/)
bower_components

# node-waf configuration
.lock-wscript

# Compiled binary addons (http://nodejs.org/api/addons.html)
build/Release

# Dependency directories
node_modules/
jspm_packages/

# Typescript v1 declaration files
typings/

# Optional npm cache directory
.npm

# Optional eslint cache
.eslintcache

# Optional REPL history
.node_repl_history

# Output of 'npm pack'
*.tgz

# Yarn Integrity file
.yarn-integrity

# dotenv environment variables file
.env


####################################################################################################
# https://github.com/github/gitignore/blob/9a1d4adec95789f45efd3242099628a2369b2fc8/Global/Vim.gitignore
####################################################################################################

# Swap
[._]*.s[a-v][a-z]
[._]*.sw[a-p]
[._]s[a-v][a-z]
[._]sw[a-p]

# Session
Session.vim

# Temporary
.netrwhist
*~
# Auto-generated tag files
tags

--------------------------------------------------------------------------------
/.travis-github-deploy-key.enc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tc39/proposal-well-formed-stringify/26e901e8356c01b6c0fc45a9c06df6740ae04f4c/.travis-github-deploy-key.enc

--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
sudo: false
language: node_js
node_js:
  - node
script:
  - npm run build &&
    { [ "@$TRAVIS_BRANCH" = "@master" ] || exit 0; } &&
    $(npm bin)/set-up-ssh
      --key "$encrypted_8dd0e096afa0_key"
      --iv "$encrypted_8dd0e096afa0_iv"
      --path-encrypted-key ".travis-github-deploy-key.enc" &&
    git checkout "$TRAVIS_BRANCH" &&
    git add docs &&
    {
      git diff --staged --stat &&
      git commit --no-verify -m "Generated content @ $TRAVIS_COMMIT [skip ci]" ||
      exit 0;
    } &&
    git push "git@github.com:${TRAVIS_REPO_SLUG}.git" HEAD

git:
  depth: 1

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 Richard Gibson

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Well-formed JSON.stringify

A proposal to prevent `JSON.stringify` from returning ill-formed Unicode strings.

## Status
This proposal is at stage 4 of [the TC39 Process](https://tc39.github.io/process-document/).

## Champions
* Mathias Bynens

## Motivation
[RFC 8259 section 8.1](https://tools.ietf.org/html/rfc8259#section-8.1) requires JSON text exchanged outside the scope of a closed ecosystem to be encoded using UTF-8, but [`JSON.stringify`](https://tc39.github.io/ecma262/#sec-json.stringify) can return strings including code points that have no representation in UTF-8 (specifically, surrogate code points U+D800 through U+DFFF).
And contrary to [the description of `JSON.stringify`](https://tc39.github.io/ecma262/#sec-json.stringify), such strings are not "in UTF-16" because "isolated UTF-16 code units in the range D800₁₆..DFFF₁₆ are ill-formed" per [The Unicode Standard, Version 10.0.0, Section 3.4](https://unicode.org/versions/Unicode10.0.0/ch03.pdf#G7404) at definition D91 and excluded from being "in UTF-16" per definition D89.

However, returning such invalid Unicode strings is unnecessary, because JSON strings can include Unicode escape sequences.

## Proposed Solution
Rather than return unpaired surrogate code points as single UTF-16 code units, represent them with JSON escape sequences.

## Illustrative examples
```js
// Non-BMP characters still serialize to surrogate pairs.
JSON.stringify('𝌆')
// → '"𝌆"'
JSON.stringify('\uD834\uDF06')
// → '"𝌆"'

// Unpaired surrogate code units will serialize to escape sequences.
JSON.stringify('\uDF06\uD834')
// → '"\\udf06\\ud834"'
JSON.stringify('\uDEAD')
// → '"\\udead"'
```

## Discussion
### Backwards Compatibility
This change is backwards-compatible, under an assumption of consumer compliance with the JSON specification.
User-visible effects will be limited to the replacement of some rare single UTF-16 code units in `JSON.stringify` output with equivalent six-character escape sequences that can be represented both in UTF-16 and in UTF-8.
It is the authors' opinion that any consumer accepting the current ill-formed output will be unaffected by this change (this is true in particular of ECMAScript `JSON.parse`).
Any consumer rejecting the current ill-formed output will have a new opportunity to accept its well-formed representation, although such consumers may still reject input that specifies strings including Unicode code points that are not scalar values (e.g., because they only accept [I-JSON](https://tools.ietf.org/html/rfc7493) input), but those that accept it must have mechanisms for dealing with unpaired surrogates (as mentioned in the specification of JSON).

### Validity
Unicode escape sequences are valid JSON, and—being completely ASCII—are well-formed in both UTF-16 and UTF-8.

## Specification
The specification is available in [ecmarkup](spec.emu) or [rendered HTML](https://tc39.github.io/proposal-well-formed-stringify/).

## Implementations
* [V8](https://bugs.chromium.org/p/v8/issues/detail?id=7782), enabled by default in V8 v7.2.10 and Chrome 72
* [SpiderMonkey](https://bugzilla.mozilla.org/show_bug.cgi?id=1469021), shipping in Firefox 64
* [JavaScriptCore](https://bugs.webkit.org/show_bug.cgi?id=191677), shipping in [Safari Technology Preview 73](https://webkit.org/blog/8555/release-notes-for-safari-technology-preview-73/)

--------------------------------------------------------------------------------
/docs/index.html:
--------------------------------------------------------------------------------
Well-formed JSON.stringify

Proposal proposal-well-formed-stringify

Stage 4 Draft / January 29, 2019

Well-formed JSON.stringify


1 Runtime Semantics: QuoteJSONString ( value )


The abstract operation QuoteJSONString with argument value wraps a String value in QUOTATION MARK code units and escapes certain other code units within it.

  1. Let product be the String value consisting solely of the code unit 0x0022 (QUOTATION MARK).
  2. For each code point C in value when interpreted as UTF-16 encoded Unicode text as described in 6.1.4, do
    1. If the numeric value of C is listed in the Code Point column of Table 1, then
      1. Set product to the string-concatenation of product and the Escape Sequence for C as specified in Table 1.
    2. Else if C has a numeric value less than 0x0020 (SPACE), or C has the same numeric value as a leading-surrogate code unit or trailing-surrogate code unit, then
      1. Let unit be a code unit whose numeric value is that of C.
      2. Set product to the string-concatenation of product and UnicodeEscape(unit).
    3. Else,
      1. Set product to the string-concatenation of product and the UTF16Encoding of C.
  3. Set product to the string-concatenation of product and the code unit 0x0022 (QUOTATION MARK).
  4. Return product.
Table 1: JSON Single Character Escape Sequences

| Code Point | Unicode Character Name | Escape Sequence |
| --- | --- | --- |
| U+0008 | BACKSPACE | \b |
| U+0009 | CHARACTER TABULATION | \t |
| U+000A | LINE FEED (LF) | \n |
| U+000C | FORM FEED (FF) | \f |
| U+000D | CARRIAGE RETURN (CR) | \r |
| U+0022 | QUOTATION MARK | \" |
| U+005C | REVERSE SOLIDUS | \\ |
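The algorithm and Table 1 can be sketched in plain JavaScript as follows. This is only an illustrative approximation, not the text of the specification or any engine's implementation: the names `quoteJSONString`, `unicodeEscape`, and `SINGLE_CHARACTER_ESCAPES` are hypothetical, and the sketch assumes that iterating a string with `for...of` yields one element per code point, with an unpaired surrogate yielded on its own.

```javascript
// Table 1: code points with a dedicated single-character escape sequence.
// (Hypothetical name; the spec refers to this only as "Table 1".)
const SINGLE_CHARACTER_ESCAPES = new Map([
  [0x0008, '\\b'],
  [0x0009, '\\t'],
  [0x000A, '\\n'],
  [0x000C, '\\f'],
  [0x000D, '\\r'],
  [0x0022, '\\"'],
  [0x005C, '\\\\'],
]);

// Hypothetical stand-in for UnicodeEscape: "\u" plus four lowercase hex digits.
function unicodeEscape(codeUnit) {
  return '\\u' + codeUnit.toString(16).padStart(4, '0');
}

// Sketch of QuoteJSONString ( value ) following the numbered steps above.
function quoteJSONString(value) {
  let product = '"'; // Step 1
  // Step 2: string iteration walks code points, so a well-formed surrogate
  // pair arrives as one element and a lone surrogate arrives by itself.
  for (const ch of value) {
    const c = ch.codePointAt(0);
    if (SINGLE_CHARACTER_ESCAPES.has(c)) {
      product += SINGLE_CHARACTER_ESCAPES.get(c); // Step 2.1
    } else if (c < 0x0020 || (c >= 0xD800 && c <= 0xDFFF)) {
      product += unicodeEscape(c); // Step 2.2: control or lone surrogate
    } else {
      product += ch; // Step 2.3: emit the code point unchanged
    }
  }
  return product + '"'; // Steps 3 and 4
}

quoteJSONString('\uD834\uDF06');
// → '"𝌆"'
quoteJSONString('\uDF06\uD834');
// → '"\\udf06\\ud834"'
quoteJSONString('\uDEAD');
// → '"\\udead"'
```

Checking the surrogate range directly on the code point value works here because a code point in U+D800 through U+DFFF can only come out of the iteration when the surrogate was unpaired.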

A Copyright & Software License


Copyright Notice


© 2019 Richard Gibson


Software License


All Software contained in this document ("Software") is protected by copyright and is being made available under the "BSD License", included below. This Software may be subject to third party rights (rights from parties other than Ecma International), including patent rights, and no licenses under such third party rights are granted under this license even if the third party concerned is a member of Ecma International. SEE THE ECMA CODE OF CONDUCT IN PATENT MATTERS AVAILABLE AT https://ecma-international.org/memento/codeofconduct.htm FOR INFORMATION REGARDING THE LICENSING OF PATENT CLAIMS THAT ARE REQUIRED TO IMPLEMENT ECMA INTERNATIONAL STANDARDS.


Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. Neither the name of the authors nor Ecma International may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE ECMA INTERNATIONAL "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL ECMA INTERNATIONAL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

--------------------------------------------------------------------------------
/missing-productions.json:
--------------------------------------------------------------------------------
{
  "https://tc39.github.io/ecma262/": [
    {
      "type": "production",
      "id": "prod-annexB-LegacyOctalEscapeSequence",
      "name": "LegacyOctalEscapeSequence"
    },
    {
      "type": "term",
      "id": "leading-surrogate-code-unit",
      "term": "leading-surrogate code unit"
    },
    {
      "type": "term",
      "id": "trailing-surrogate-code-unit",
      "term": "trailing-surrogate code unit"
    }
  ]
}

--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
{
  "private": true,
  "devDependencies": {
    "ecmarkup": "^3.0.0",
    "@alrra/travis-scripts": "^3.0.1"
  },
  "scripts": {
    "test": "exit 0",
    "build": "ecmarkup spec.emu docs/index.html"
  }
}

--------------------------------------------------------------------------------
/spec.emu:
--------------------------------------------------------------------------------
title: Well-formed JSON.stringify
status: proposal
stage: 4
shortname: <a href="https://github.com/gibson042/ecma262-proposal-well-formed-stringify">proposal-well-formed-stringify</a>
contributors: Richard Gibson

Runtime Semantics: QuoteJSONString ( _value_ )


The abstract operation QuoteJSONString with argument _value_ wraps a String value in QUOTATION MARK code units and escapes certain other code units within it.

  1. Let _product_ be the String value consisting solely of the code unit 0x0022 (QUOTATION MARK).
  1. For each code point _C_ in _value_ when interpreted as UTF-16 encoded Unicode text as described in 6.1.4, do
    1. If the numeric value of _C_ is listed in the Code Point column of Table 1, then
      1. Set _product_ to the string-concatenation of _product_ and the Escape Sequence for _C_ as specified in Table 1.
    1. Else if _C_ has a numeric value less than 0x0020 (SPACE), or _C_ has the same numeric value as a leading-surrogate code unit or trailing-surrogate code unit, then
      1. Let _unit_ be a code unit whose numeric value is that of _C_.
      1. Set _product_ to the string-concatenation of _product_ and UnicodeEscape(_unit_).
    1. Else,
      1. Set _product_ to the string-concatenation of _product_ and the UTF16Encoding of _C_.
  1. Set _product_ to the string-concatenation of _product_ and the code unit 0x0022 (QUOTATION MARK).
  1. Return _product_.

| Code Point | Unicode Character Name | Escape Sequence |
| --- | --- | --- |
| U+0008 | BACKSPACE | `\\b` |
| U+0009 | CHARACTER TABULATION | `\\t` |
| U+000A | LINE FEED (LF) | `\\n` |
| U+000C | FORM FEED (FF) | `\\f` |
| U+000D | CARRIAGE RETURN (CR) | `\\r` |
| U+0022 | QUOTATION MARK | `\\"` |
| U+005C | REVERSE SOLIDUS | `\\\\` |

--------------------------------------------------------------------------------