All Software contained in this document ("Software") is protected by copyright and is being made available under the "BSD License", included below. This Software may be subject to third party rights (rights from parties other than Ecma International), including patent rights, and no licenses under such third party rights are granted under this license even if the third party concerned is a member of Ecma International. SEE THE ECMA CODE OF CONDUCT IN PATENT MATTERS AVAILABLE AT http://www.ecma-international.org/memento/codeofconduct.htm FOR INFORMATION REGARDING THE LICENSING OF PATENT CLAIMS THAT ARE REQUIRED TO IMPLEMENT ECMA INTERNATIONAL STANDARDS.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the authors nor Ecma International may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE ECMA INTERNATIONAL "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL ECMA INTERNATIONAL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Template literals should allow the embedding of other languages (DSLs, etc.), but restrictions on escape sequences make this problematic.

For example, consider making a LaTeX processor with templates:

```javascript
function latex(strings) {
  // ...
}

let document = latex`
\newcommand{\fun}{\textbf{Fun!}} // works just fine
\newcommand{\unicode}{\textbf{Unicode!}} // Illegal token!
\newcommand{\xerxes}{\textbf{King!}} // Illegal token!

Breve over the h goes \u{h}ere // Illegal token!
`
```

The problem here is that `\u` is the start of a Unicode escape, but the ES grammar forces it to be of the form `\u00FF` or `\u{42}` and considers the token `\unicode` illegal. Similarly, `\x` is the start of a hex escape like `\xFF`, but `\xerxes` is illegal. Octal literal escapes have the same problem: `\0100` is illegal.

Proposal Overview

Remove the restriction on escape sequences in tagged template literals.

Lifting the restriction raises the question of how to handle cooked template values that contain illegal escape sequences. Currently, cooked template values are supposed to replace escape sequences with the "Unicode code point represented by the escape sequence", but this can't happen if the escape sequence is not valid.

The proposed solution is to set the cooked value to `undefined` for template values that contain illegal escape sequences. The raw value is still accessible via `.raw`, so embedded DSLs that might contain `undefined` cooked values can just use the raw string:

```javascript
function tag(strs) {
  strs[0] === undefined; // true: "\unicode" is not a valid escape, so there is no cooked value
  strs.raw[0] === "\\unicode and \\u{55}"; // true: the raw value is the source text
}
tag`\unicode and \u{55}`;
```
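
With the cooked value gone, a DSL tag such as the `latex` processor from the motivating example can rebuild its input from the raw strings alone. The following is a minimal sketch; its body is illustrative, not part of the proposal, and it assumes an engine that implements this proposal (ES2018 or later). It interleaves `strings.raw` with the substitutions, which is also what the built-in `String.raw` tag does.

```javascript
// Hypothetical tag: rebuilds the template's source text from the raw strings,
// converting each substitution with String().
function latex(strings, ...values) {
  let out = strings.raw[0];
  for (let i = 0; i < values.length; i++) {
    out += String(values[i]) + strings.raw[i + 1];
  }
  return out;
}

const title = "Fun!";
// `\unicode` is an invalid escape, so the cooked value would be undefined;
// the raw strings are still complete.
const doc = latex`\newcommand{\unicode}{\textbf{${title}}}`;
console.log(doc); // \newcommand{\unicode}{\textbf{Fun!}}
```

A real LaTeX processor would parse `out` rather than return it; the point is only that nothing is lost when the cooked strings are `undefined`.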

This loosening of the escape sequence restriction only applies to tagged template literals; untagged templates still throw an early error for invalid escape sequences:

```javascript
let bad = `bad escape sequence: \unicode`; // throws early error
```
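
One way to observe the difference, assuming an engine that implements this proposal (ES2018 or later), is to compile both forms without running them. `new Function` parses its body when it is constructed, so an early error surfaces there as a `SyntaxError`. The `compiles` helper is hypothetical.

```javascript
// Hypothetical helper: true if `source` parses as a function body,
// false if parsing throws a SyntaxError. The body is compiled but never
// called, so the undeclared `tag` identifier is harmless.
function compiles(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(compiles("tag`bad escape sequence: \\unicode`")); // true
console.log(compiles("`bad escape sequence: \\unicode`"));    // false
```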

A template literal component is interpreted as a sequence of Unicode code points. The Template Value (TV) of a literal component is described in terms of code unit values (SV, 11.8.4) contributed by the various parts of the template literal component. As part of this process, some Unicode code points within the template component are interpreted as having a mathematical value (MV, 11.8.3). In determining a TV, escape sequences are replaced by the UTF-16 code unit(s) of the Unicode code point represented by the escape sequence. The Template Raw Value (TRV) is similar to a Template Value with the difference that in TRVs escape sequences are interpreted literally.
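
In JavaScript terms, the TV of each literal component is the "cooked" string a tag function receives in its first argument, and the TRV is the matching entry of that argument's `.raw` array. A small sketch, assuming an ES2015+ engine and a hypothetical `show` helper that copies both arrays:

```javascript
// Hypothetical helper: copies the cooked (TV) and raw (TRV) strings of a template.
function show(strings) {
  return { cooked: [...strings], raw: [...strings.raw] };
}

const r = show`a\nb \x41 \u{1F600}`;

// TV: each escape is replaced by the UTF-16 code unit(s) it denotes;
// U+1F600 needs a surrogate pair, so the cooked string has 8 code units.
console.log(r.cooked[0].length); // 8
console.log(r.cooked[0] === "a\nb A \uD83D\uDE00"); // true

// TRV: escapes are kept literally, backslashes included.
console.log(r.raw[0]); // a\nb \x41 \u{1F600}
```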

The TV and TRV of NoSubstitutionTemplate :: ``` ``` is the empty code unit sequence.

The TV and TRV of TemplateHead :: ``` `${` is the empty code unit sequence.

The TV and TRV of TemplateMiddle :: `}` `${` is the empty code unit sequence.

The TV and TRV of TemplateTail :: `}` ``` is the empty code unit sequence.
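
These four rules are why a tag function always receives exactly one more string than it has substitutions, even where the template has no literal text. A sketch, assuming an ES2015+ engine and a hypothetical `parts` helper:

```javascript
// Hypothetical helper: copies the cooked and raw arrays and counts substitutions.
function parts(strings, ...values) {
  return { cooked: [...strings], raw: [...strings.raw], substitutions: values.length };
}

// NoSubstitutionTemplate with no characters: a single empty string.
console.log(JSON.stringify(parts``.cooked)); // [""]

// TemplateHead `${, TemplateMiddle }${ and TemplateTail }` with no characters:
// three empty strings around two substitutions.
const p = parts`${1}${2}`;
console.log(JSON.stringify(p.cooked), p.substitutions); // ["","",""] 2
```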

The TV of NoSubstitutionTemplate :: ``` TemplateCharacters ``` is the TV of |TemplateCharacters|.

The TV of TemplateHead :: ``` TemplateCharacters `${` is the TV of |TemplateCharacters|.

The TV of TemplateMiddle :: `}` TemplateCharacters `${` is the TV of |TemplateCharacters|.

The TV of TemplateTail :: `}` TemplateCharacters ``` is the TV of |TemplateCharacters|.

The TV of TemplateCharacters :: TemplateCharacter is the TV of |TemplateCharacter|.

Currently, the TV of TemplateCharacters :: TemplateCharacter TemplateCharacters is a sequence consisting of the code units in the TV of |TemplateCharacter| followed by all the code units in the TV of |TemplateCharacters| in order. This proposal replaces that rule with the following:

The TV of TemplateCharacters :: TemplateCharacter TemplateCharacters is:

- *undefined* if the TV of |TemplateCharacter| is *undefined* or the TV of |TemplateCharacters| is *undefined*, or
- a sequence consisting of the code units in the TV of |TemplateCharacter| followed by all the code units in the TV of |TemplateCharacters| in order.

The TV of TemplateCharacter :: SourceCharacter but not one of ``` or `\` or `$` or LineTerminator is the UTF16Encoding of the code point value of |SourceCharacter|.

The TV of TemplateCharacter :: `$` is the code unit value 0x0024.

The TV of TemplateCharacter :: `\` EscapeSequence is the SV of |EscapeSequence|.

The TV of TemplateCharacter :: `\` NotEscapeSequence is *undefined*.
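
Taken together, these rules spread *undefined* to the whole literal component: one `\` NotEscapeSequence anywhere in a component leaves the entire cooked string `undefined`, while other components of the same template are cooked as usual. A sketch, assuming an engine that implements this proposal (ES2018 or later) and a hypothetical `show` helper:

```javascript
// Hypothetical helper: copies the cooked (TV) and raw (TRV) strings of a template.
function show(strings) {
  return { cooked: [...strings], raw: [...strings.raw] };
}

const r = show`ok \unicode ok${0}fine \x41`;

// The invalid `\u` makes the whole first component's TV undefined, not just the escape.
console.log(r.cooked[0]); // undefined
// The next component is cooked independently.
console.log(r.cooked[1]); // fine A
// The raw value of the first component is still available.
console.log(r.raw[0]); // ok \unicode ok
```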

The TV of TemplateCharacter :: LineContinuation is the TV of |LineContinuation|.

The TV of TemplateCharacter :: LineTerminatorSequence is the TRV of |LineTerminatorSequence|.

The TV of LineContinuation :: `\` LineTerminatorSequence is the empty code unit sequence.

The TRV of NoSubstitutionTemplate :: ``` TemplateCharacters ``` is the TRV of |TemplateCharacters|.

The TRV of TemplateHead :: ``` TemplateCharacters `${` is the TRV of |TemplateCharacters|.

The TRV of TemplateMiddle :: `}` TemplateCharacters `${` is the TRV of |TemplateCharacters|.

The TRV of TemplateTail :: `}` TemplateCharacters ``` is the TRV of |TemplateCharacters|.

The TRV of TemplateCharacters :: TemplateCharacter is the TRV of |TemplateCharacter|.

The TRV of TemplateCharacters :: TemplateCharacter TemplateCharacters is a sequence consisting of the code units in the TRV of |TemplateCharacter| followed by all the code units in the TRV of |TemplateCharacters|, in order.

The TRV of TemplateCharacter :: SourceCharacter but not one of ``` or `\` or `$` or LineTerminator is the UTF16Encoding of the code point value of |SourceCharacter|.

The TRV of TemplateCharacter :: `$` is the code unit value 0x0024.

The TRV of TemplateCharacter :: `\` EscapeSequence is the sequence consisting of the code unit value 0x005C followed by the code units of TRV of |EscapeSequence|.

The TRV of TemplateCharacter :: `\` NotEscapeSequence is the sequence consisting of the code unit value 0x005C followed by the code units of TRV of |NotEscapeSequence|.

The TRV of TemplateCharacter :: LineContinuation is the TRV of |LineContinuation|.

The TRV of TemplateCharacter :: LineTerminatorSequence is the TRV of |LineTerminatorSequence|.

The TRV of EscapeSequence :: CharacterEscapeSequence is the TRV of the |CharacterEscapeSequence|.

The TRV of EscapeSequence :: `0` is the code unit value 0x0030 (DIGIT ZERO).

The TRV of EscapeSequence :: HexEscapeSequence is the TRV of the |HexEscapeSequence|.

The TRV of EscapeSequence :: UnicodeEscapeSequence is the TRV of the |UnicodeEscapeSequence|.

The TRV of NotEscapeSequence :: `0` DecimalDigit is the sequence consisting of the code unit value 0x0030 (DIGIT ZERO) followed by the code units of the TRV of |DecimalDigit|.

The TRV of NotEscapeSequence :: `x` [lookahead <! HexDigit] is the code unit value 0x0078.

The TRV of NotEscapeSequence :: `x` HexDigit [lookahead <! HexDigit] is the sequence consisting of the code unit value 0x0078 followed by the code units of the TRV of |HexDigit|.

The TRV of NotEscapeSequence :: `u` [lookahead <! { HexDigit, `{` }] is the code unit value 0x0075.

The TRV of NotEscapeSequence :: `u` HexDigit [lookahead <! { HexDigit, `{` }] is the sequence consisting of the code unit value 0x0075 followed by the code units of the TRV of |HexDigit|.

The TRV of NotEscapeSequence :: `u` HexDigit HexDigit [lookahead <! { HexDigit, `{` }] is the sequence consisting of the code unit value 0x0075 followed by the code units of the TRV of the first |HexDigit| followed by the code units of the TRV of the second |HexDigit|.

The TRV of NotEscapeSequence :: `u` HexDigit HexDigit HexDigit [lookahead <! { HexDigit, `{` }] is the sequence consisting of the code unit value 0x0075 followed by the code units of the TRV of the first |HexDigit| followed by the code units of the TRV of the second |HexDigit| followed by the code units of the TRV of the third |HexDigit|.

The TRV of NotEscapeSequence :: `u` `{` [lookahead <! HexDigit] is the sequence consisting of the code unit value 0x0075 followed by the code unit value 0x007B.

The TRV of NotEscapeSequence :: `u` `{` NotCodePoint is the sequence consisting of the code unit value 0x0075 followed by the code unit value 0x007B followed by the code units of the TRV of |NotCodePoint|.

The TRV of NotEscapeSequence :: `u` `{` CodePoint [lookahead <! `}`] is the sequence consisting of the code unit value 0x0075 followed by the code unit value 0x007B followed by the code units of the TRV of |CodePoint|.

The TRV of DecimalDigit :: one of `0` `1` `2` `3` `4` `5` `6` `7` `8` `9` is the SV of the |SourceCharacter| that is that single code point.

The TRV of CodePoint :: HexDigits [> but not if MV of HexDigits > 0x10FFFF ] is the sequence consisting of the code units of the TRV of |HexDigits|.

The TRV of NotCodePoint :: HexDigits [> but not if MV of HexDigits ≤ 0x10FFFF ] is the sequence consisting of the code units of the TRV of |HexDigits|.
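
To see these rules in action, the sketch below evaluates one tagged template per NotEscapeSequence alternative and checks that the cooked value is `undefined` while the raw value reproduces the source text exactly. It assumes an engine that implements this proposal (ES2018 or later); the `cooked` and `raw` helpers are hypothetical.

```javascript
// Hypothetical helpers: the first cooked / raw segment of a tagged template.
const cooked = (strings) => strings[0];
const raw = (strings) => strings.raw[0];

// One entry per NotEscapeSequence alternative: [cooked value, raw value, expected raw].
const cases = [
  [cooked`\01`, raw`\01`, "\\01"],                     // `0` DecimalDigit
  [cooked`\x`, raw`\x`, "\\x"],                         // `x` [lookahead <! HexDigit]
  [cooked`\xA`, raw`\xA`, "\\xA"],                      // `x` HexDigit
  [cooked`\u`, raw`\u`, "\\u"],                         // `u` [lookahead <! { HexDigit, `{` }]
  [cooked`\uA`, raw`\uA`, "\\uA"],                      // `u` HexDigit
  [cooked`\uAB`, raw`\uAB`, "\\uAB"],                   // `u` HexDigit HexDigit
  [cooked`\uABC`, raw`\uABC`, "\\uABC"],                // `u` HexDigit HexDigit HexDigit
  [cooked`\u{`, raw`\u{`, "\\u{"],                      // `u` `{` [lookahead <! HexDigit]
  [cooked`\u{110000}`, raw`\u{110000}`, "\\u{110000}"], // `u` `{` NotCodePoint
  [cooked`\u{41`, raw`\u{41`, "\\u{41"],                // `u` `{` CodePoint [lookahead <! `}`]
];

// Every alternative has no cooked value, and its raw value is the source text.
console.log(cases.every(([c, r, expected]) => c === undefined && r === expected)); // true

// Contrast: `\0` on its own is a valid EscapeSequence and still cooks.
console.log(cooked`\0` === "\0", raw`\0` === "\\0"); // true true
```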

The TRV of CharacterEscapeSequence :: SingleEscapeCharacter is the TRV of the |SingleEscapeCharacter|.

The TRV of CharacterEscapeSequence :: NonEscapeCharacter is the SV of the |NonEscapeCharacter|.

The TRV of SingleEscapeCharacter :: one of `'` `"` `\` `b` `f` `n` `r` `t` `v` is the SV of the |SourceCharacter| that is that single code point.
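
For a CharacterEscapeSequence, the raw value is the backslash followed by the escaped character exactly as written, while the cooked value depends on whether that character is a SingleEscapeCharacter. A sketch, assuming an ES2015+ engine and a hypothetical `first` helper:

```javascript
// Hypothetical helper: the first cooked and raw segment of a template.
function first(strings) {
  return { cooked: strings[0], raw: strings.raw[0] };
}

// SingleEscapeCharacter: `\t` cooks to a tab; the raw value keeps backslash + "t".
const tab = first`\t`;
console.log(tab.cooked === "\t", tab.raw === "\\t"); // true true

// NonEscapeCharacter: the cooked value is the character itself, which is how
// a literal backtick or "${" is written inside a template.
console.log(first`\``.cooked);    // `
console.log(first`\${x}`.cooked); // ${x}
console.log(first`\q`.raw);       // \q
```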

The TRV of HexEscapeSequence :: `x` HexDigit HexDigit is the sequence consisting of code unit value 0x0078 followed by TRV of the first |HexDigit| followed by the TRV of the second |HexDigit|.

The TRV of UnicodeEscapeSequence :: `u` Hex4Digits is the sequence consisting of code unit value 0x0075 followed by TRV of |Hex4Digits|.

The TRV of UnicodeEscapeSequence :: `u{` HexDigits `}` is the sequence consisting of code unit value 0x0075 followed by code unit value 0x007B followed by TRV of |HexDigits| followed by code unit value 0x007D.

The TRV of Hex4Digits :: HexDigit HexDigit HexDigit HexDigit is the sequence consisting of the TRV of the first |HexDigit| followed by the TRV of the second |HexDigit| followed by the TRV of the third |HexDigit| followed by the TRV of the fourth |HexDigit|.

The TRV of HexDigits :: HexDigit is the TRV of |HexDigit|.

The TRV of HexDigits :: HexDigits HexDigit is the sequence consisting of TRV of |HexDigits| followed by TRV of |HexDigit|.

The TRV of a |HexDigit| is the SV of the |SourceCharacter| that is that |HexDigit|.
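
Because the TRV of each HexDigit is simply its source character, the raw value preserves how an escape was spelled (letter case, leading zeros, and which escape form), while the cooked value normalizes every spelling to the same code point. A sketch, assuming an ES2015+ engine and a hypothetical `first` helper:

```javascript
// Hypothetical helper: the first cooked and raw segment of a template.
function first(strings) {
  return { cooked: strings[0], raw: strings.raw[0] };
}

// Four spellings of U+00AF MACRON.
const spellings = [first`\xaf`, first`\xAF`, first`\u00AF`, first`\u{0000af}`];

// Cooked: every spelling denotes the same code point.
console.log(spellings.every((s) => s.cooked === "\u00AF")); // true

// Raw: each TRV keeps the HexDigits exactly as written, including case and leading zeros.
console.log(spellings.map((s) => s.raw).join(" ")); // \xaf \xAF \u00AF \u{0000af}
```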

The TRV of LineContinuation :: `\` LineTerminatorSequence is the sequence consisting of the code unit value 0x005C followed by the code units of TRV of |LineTerminatorSequence|.

The TRV of LineTerminatorSequence :: <LF> is the code unit value 0x000A.

The TRV of LineTerminatorSequence :: <CR> is the code unit value 0x000A.

The TRV of LineTerminatorSequence :: <LS> is the code unit value 0x2028.

The TRV of LineTerminatorSequence :: <PS> is the code unit value 0x2029.

The TRV of LineTerminatorSequence :: <CR><LF> is the sequence consisting of the code unit value 0x000A.
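
Raw carriage returns are awkward to keep intact in a source file, so this sketch builds each template's source text as a string, compiles it with `new Function`, and inspects the code points of the raw value returned by the standard `String.raw` tag. Assuming an ES2015+ engine, <CR>, <CR><LF> and <LF> should all come out as 0x000A, while <LS> and <PS> are kept. The `rawWith` and `codes` helpers are hypothetical.

```javascript
// Hypothetical helper: compiles `String.raw` applied to a template whose body is
// "a", the given line terminator, then "b", and returns the resulting raw string.
function rawWith(terminator) {
  return new Function("return String.raw`a" + terminator + "b`;")();
}

// Hypothetical helper: lists the code points of a string in hex, e.g. "61 a 62".
const codes = (s) => Array.from(s, (c) => c.codePointAt(0).toString(16)).join(" ");

console.log(codes(rawWith("\n")));     // 61 a 62
console.log(codes(rawWith("\r")));     // 61 a 62
console.log(codes(rawWith("\r\n")));   // 61 a 62
console.log(codes(rawWith("\u2028"))); // 61 2028 62
console.log(codes(rawWith("\u2029"))); // 61 2029 62
```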

Note

TV excludes the code units of |LineContinuation| while TRV includes them. <CR><LF> and <CR> |LineTerminatorSequence|s are normalized to <LF> for both TV and TRV. An explicit |EscapeSequence| is needed to include a <CR> or <CR><LF> sequence.
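
The note can be checked directly. In the sketch below (assuming an ES2015+ engine and a hypothetical `first` helper), a backslash at the end of a line is a LineContinuation: it disappears from the cooked value but stays, with its line terminator, in the raw value. An explicit `\r` escape, by contrast, puts a real carriage return into the cooked value.

```javascript
// Hypothetical helper: the first cooked and raw segment of a template.
function first(strings) {
  return { cooked: strings[0], raw: strings.raw[0] };
}

// LineContinuation: a backslash at the end of the line.
const cont = first`one \
two`;
console.log(JSON.stringify(cont.cooked)); // "one two"
console.log(JSON.stringify(cont.raw));    // "one \\\ntwo"

// An explicit escape is the way to get a carriage return into the cooked value.
const cr = first`x\ry`;
console.log(cr.cooked.charCodeAt(1)); // 13
console.log(JSON.stringify(cr.raw));  // "x\\ry"
```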