├── README.md
└── spec.html

/README.md:
--------------------------------------------------------------------------------
# RegExp Named Capture Groups

Stage 4

Champions: Daniel Ehrenberg (Igalia) & Mathias Bynens (Google)

## Introduction

Numbered capture groups allow one to refer to certain portions of a string that a regular expression matches. Each capture group is assigned a unique number and can be referenced using that number, but this can make a regular expression hard to grasp and refactor.

For example, given `/(\d{4})-(\d{2})-(\d{2})/` that matches a date, one cannot be sure which group corresponds to the month and which one is the day without examining the surrounding code. Also, if one wants to swap the order of the month and the day, the group references must also be updated.

Named capture groups provide a nice solution for these issues.

## High Level API

A capture group can be given a name using the `(?<name>...)` syntax, for any identifier `name`. The regular expression for a date can then be written as `/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u`. Each name should be unique and follow the grammar for ECMAScript IdentifierName.

Named groups can be accessed from properties of a `groups` property of the regular expression result. Numbered references to the groups are also created, just as for non-named groups. For example:

```js
let re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;
let result = re.exec('2015-01-02');
// result.groups.year === '2015';
// result.groups.month === '01';
// result.groups.day === '02';

// result[0] === '2015-01-02';
// result[1] === '2015';
// result[2] === '01';
// result[3] === '02';
```

The interface interacts nicely with destructuring, as in the following example:

```js
let {groups: {one, two}} = /^(?<one>.*):(?<two>.*)$/u.exec('foo:bar');
console.log(`one: ${one}, two: ${two}`);  // prints one: foo, two: bar
```

### Backreferences

A named group can be accessed within a regular expression via the `\k<name>` construct. For example,

```js
let duplicate = /^(?<half>.*).\k<half>$/u;
duplicate.test('a*b'); // false
duplicate.test('a*a'); // true
```

Named references can also be used simultaneously with numbered references.

```js
let triplicate = /^(?<part>.*).\k<part>.\1$/u;
triplicate.test('a*a*a'); // true
triplicate.test('a*a*b'); // false
```

### Replacement targets

Named groups can be referenced from the replacement value passed to `String.prototype.replace` too. If the value is a string, named groups can be accessed using the `$<name>` syntax. For example:

```js
let re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;
let result = '2015-01-02'.replace(re, '$<day>/$<month>/$<year>');
// result === '02/01/2015'
```

Note that an ordinary string literal, not a template literal, is passed into `replace`, as that method will resolve the values of `day` etc. rather than having them as local variables. An alternative would be to use `${day}` syntax (while remaining not a template string); this proposal uses `$<day>` to draw a parallel to the definition of the group and a distinction from template literals.

If the second argument to `String.prototype.replace` is a function, then the named groups can be accessed via a new parameter called `groups`. The new signature would be `function (matched, capture1, ..., captureN, position, S, groups)`. Named captures would still participate in numbering, as usual. For example:

```js
let re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;
let result = '2015-01-02'.replace(re, (...args) => {
  let {day, month, year} = args[args.length - 1];
  return `${day}/${month}/${year}`;
});
// result === '02/01/2015'
```

## Details

### Overlapping group names

RegExp result objects have some non-numerical properties already, which named capture groups may overlap with, namely `length`, `index` and `input`. In this proposal, to avoid ambiguity and edge cases around overlapping names, named group properties are placed on a separate `groups` object which is a property of the match object. This solution will permit additional properties to be placed on the result of `exec` in future ECMAScript versions without creating any web compatibility hazards.

The groups object is only created for RegExps with named groups. It does not include numbered group properties, only the named ones. Properties are created on the `groups` object for all groups which are mentioned in the RegExp; if they are not encountered in the match, the value is `undefined`.

### Backwards compatibility of new syntax

The syntax for creating a new named group, `/(?<name>)/`, is currently a syntax error in ECMAScript RegExps, so it can be added to all RegExps without ambiguity. However, the named backreference syntax, `/\k<foo>/`, is currently permitted in non-Unicode RegExps and matches the literal string `"k<foo>"`. In Unicode RegExps, such escapes are banned.

In this proposal, `\k<foo>` in non-Unicode RegExps will continue to match the literal string `"k<foo>"` *unless* the RegExp contains a named group, in which case it will match that group or be a syntax error, depending on whether or not the RegExp has a named group named `foo`. This does not affect existing code, since no currently valid RegExp can have a named group. It would be a refactoring hazard, although only for code which contained `\k` in a RegExp.

## Precedent in other programming languages

This proposal is analogous to what many other programming languages have done for named capture groups. It seems to be what the consensus syntax is moving towards, though the Python syntax is an interesting and compelling outlier which would address the non-Unicode backreference issue.

### Perl [ref](http://perldoc.perl.org/perlre.html#Regular-Expressions)

Perl uses the same syntax as this proposal for named capture groups `/(?<name>)/` and backreferences `/\k<name>/`.

### Python [ref](https://docs.python.org/2/library/re.html#regular-expression-syntax)

Named captures have the syntax `"(?P<name>)"` and have backreferences with `(?P=name)`.

### Java [ref](https://blogs.oracle.com/xuemingshen/entry/named_capturing_group_in_jdk7)

JDK7+ supports named capture groups with syntax like Perl and this proposal.

### .NET [ref](https://msdn.microsoft.com/en-us/library/bs2twtah(v=vs.110).aspx#Anchor_1)

C# and VB.NET support named capture groups with the syntax `"(?<name>)"` as well as `"(?'name')"` and backreferences with `"\k<name>"`.

### PHP

According to a [Stack Overflow post](http://stackoverflow.com/questions/6971287/named-capture-in-php-using-regex) and a [comment on php.net docs](http://php.net/manual/en/function.preg-match.php#89418), PHP has long supported named groups with the syntax `"(?P<foo>)"`, which is available as a property in the resulting match object.

### Ruby [ref](https://ruby-doc.org/core-2.2.0/Regexp.html#class-Regexp-label-Capturing)

Ruby's syntax is identical to .NET, with named capture groups with the syntax `"(?<name>)"` as well as `"(?'name')"` and backreferences with `"\k<name>"`.

## Draft specification

[Draft spec](https://tc39.github.io/proposal-regexp-named-groups/)

## Implementations

* [V8](https://bugs.chromium.org/p/v8/issues/detail?id=5437), shipping in Chrome 64
* [XS](https://github.com/Moddable-OpenSource/moddable/blob/public/xs/sources/xsre.c), in [January 17, 2018 update](http://blog.moddable.tech/blog/january-17-2017-big-update-to-moddable-sdk/)
* [Transpiler (Babel plugin)](https://github.com/DmitrySoshnikov/babel-plugin-transform-modern-regexp#named-capturing-groups)
* [Safari](https://developer.apple.com/safari/technology-preview/release-notes/) beginning in Safari Technology Preview 40

--------------------------------------------------------------------------------
/spec.html:
--------------------------------------------------------------------------------

Patterns (#sec-patterns)

Syntax

Atom[U, N] ::
  PatternCharacter
  `.`
  `\` AtomEscape[?U, ?N]
  CharacterClass[?U, ?N]
  `(` GroupSpecifier Disjunction[?U, ?N] `)`
  `(` `?` `:` Disjunction[?U, ?N] `)`

AtomEscape[U, N] ::
  DecimalEscape
  CharacterClassEscape
  CharacterEscape[?U]
  [+N] `k` GroupName[?U]

GroupSpecifier[U] ::
  [empty]
  `?` GroupName[?U]

GroupName[U] ::
  `<` RegExpIdentifierName[?U] `>`

RegExpIdentifierName[U] ::
  RegExpIdentifierStart[?U]
  RegExpIdentifierName[?U] RegExpIdentifierPart[?U]

RegExpIdentifierStart[U] ::
  UnicodeIDStart
  `$`
  `_`
  `\` RegExpUnicodeEscapeSequence[?U]

RegExpIdentifierPart[U] ::
  UnicodeIDContinue
  `$`
  `_`
  `\` RegExpUnicodeEscapeSequence[?U]
  <ZWNJ>
  <ZWJ>
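
The `RegExpIdentifierName` grammar above can be exercised with a few group names. This is a minimal sketch rather than part of the specification text: the variable names and sample inputs are made up for illustration, and it assumes an engine that already implements named groups.

```js
// Names may start with `$`, `_`, or a Unicode ID_Start character.
const dollar = /(?<$amount>\d+)/u.exec('42');
const underscore = /(?<_id>\w+)/u.exec('abc');
const unicodeName = /(?<π>\d\.\d+)/u.exec('3.14');

// A Unicode escape inside the name denotes the same name as the
// literal character it encodes (U+03C0 is π).
const escaped = /(?<\u{3c0}>\d\.\d+)/u.exec('3.14');

// A name may not start with a digit, so this pattern is rejected.
let digitFirstIsError = false;
try {
  new RegExp('(?<1st>x)', 'u');
} catch (e) {
  digitFirstIsError = e instanceof SyntaxError;
}

console.log(dollar.groups.$amount, underscore.groups._id);
console.log(unicodeName.groups.π, escaped.groups.π, digitFirstIsError);
```

The digit-first pattern goes through the `RegExp` constructor so that the failure surfaces as a catchable exception instead of a parse error for the whole script.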

Static Semantics: Early Errors (#sec-patterns-static-semantics-early-errors)

Pattern :: Disjunction
  • It is a Syntax Error if |Pattern| contains multiple |GroupSpecifier|s whose enclosed |RegExpIdentifierName|s have the same StringValue.

AtomEscape[U] :: [+N] `k` GroupName
  • It is a Syntax Error if the enclosing RegExp does not contain a |GroupSpecifier| with an enclosed |RegExpIdentifierName| whose StringValue equals the StringValue of the |RegExpIdentifierName| of this production's |GroupName|.

RegExpIdentifierStart[U] :: `\` RegExpUnicodeEscapeSequence[?U]
  • It is a Syntax Error if SV(|RegExpUnicodeEscapeSequence|) is none of `"$"`, or `"_"`, or the UTF16Encoding of a code point matched by the |UnicodeIDStart| lexical grammar production.

RegExpIdentifierPart[U] :: `\` RegExpUnicodeEscapeSequence[?U]
  • It is a Syntax Error if SV(|RegExpUnicodeEscapeSequence|) is none of `"$"`, or `"_"`, or the UTF16Encoding of either <ZWNJ> or <ZWJ>, or the UTF16Encoding of a Unicode code point that would be matched by the |UnicodeIDContinue| lexical grammar production.
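
The early errors above can be observed from script by compiling patterns through the `RegExp` constructor. The sketch below is illustrative only; the `compiles` helper is a hypothetical name introduced here, not something the proposal defines.

```js
// Hypothetical helper: reports whether a pattern compiles, treating
// SyntaxError as "no" and rethrowing anything else.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Two groups in sequence with the same name: early error.
const duplicateName = compiles('(?<a>x)(?<a>y)', 'u');

// \k<b> refers to a name that no GroupSpecifier declares: early error.
const danglingReference = compiles('(?<a>x)\\k<b>', 'u');

// The rule only requires the group to exist somewhere in the pattern,
// so a reference that precedes its group is accepted.
const forwardReference = compiles('\\k<a>(?<a>x)', 'u');

// \u0031 is the digit 1, which is not a valid identifier start.
const invalidEscapedStart = compiles('(?<\\u0031a>x)', 'u');

console.log(duplicateName, danglingReference, forwardReference, invalidEscapedStart);
```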

Static Semantics: StringValue

RegExpIdentifierName[U] ::
  RegExpIdentifierStart[?U]
  RegExpIdentifierName[?U] RegExpIdentifierPart[?U]

1. Return the String value consisting of the sequence of code units corresponding to |RegExpIdentifierName|. In determining the sequence any occurrences of `\\` |RegExpUnicodeEscapeSequence| are first replaced with the code point represented by the |RegExpUnicodeEscapeSequence| and then the code points of the entire |RegExpIdentifierName| are converted to code units by UTF16Encoding each code point.

Runtime Semantics: BackreferenceMatcher Abstract Operation

The abstract operation BackreferenceMatcher takes one argument, an integer _n_, and performs the following steps:

1. Return an internal Matcher closure that takes two arguments, a State _x_ and a Continuation _c_, and performs the following steps:
  1. Let _cap_ be _x_'s _captures_ List.
  1. Let _s_ be _cap_[_n_].
  1. If _s_ is *undefined*, return _c_(_x_).
  1. Let _e_ be _x_'s _endIndex_.
  1. Let _len_ be the number of elements in _s_.
  1. Let _f_ be _e_+_len_.
  1. If _f_>_InputLength_, return ~failure~.
  1. If there exists an integer _i_ between 0 (inclusive) and _len_ (exclusive) such that Canonicalize(_s_[_i_]) is not the same character value as Canonicalize(_Input_[_e_+_i_]), return ~failure~.
  1. Let _y_ be the State (_f_, _cap_).
  1. Call _c_(_y_) and return its result.

This abstract operation is extracted from the runtime semantics of AtomEscape :: DecimalEscape, and when this text is integrated into the main specification, it would be called from there as well.
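
The BackreferenceMatcher steps can be modelled in a few lines of JavaScript. This is a rough sketch under stated assumptions, not how any engine implements matching: a State is modelled as a plain `{endIndex, captures}` object, `captures` is an array whose index 0 is unused so that indices line up with group numbers, and `FAILURE`, `backreferenceMatcher`, and the `canonicalize` parameter are names invented for this illustration.

```js
// Hypothetical stand-in for the specification's ~failure~ value.
const FAILURE = Symbol('failure');

// Returns a matcher closure for capture number n over the given input
// (an array of characters). canonicalize defaults to the identity, which
// corresponds to matching without the ignoreCase flag.
function backreferenceMatcher(n, input, canonicalize = (ch) => ch) {
  return (x, c) => {
    const s = x.captures[n];
    // An unset group matches the empty string: continue unchanged.
    if (s === undefined) return c(x);
    const e = x.endIndex;
    const f = e + s.length;
    if (f > input.length) return FAILURE;
    for (let i = 0; i < s.length; i++) {
      if (canonicalize(s[i]) !== canonicalize(input[e + i])) return FAILURE;
    }
    // Advance past the repeated text and hand over to the continuation.
    return c({ endIndex: f, captures: x.captures });
  };
}

const input = [...'a*a'];
const matcher = backreferenceMatcher(1, input);
const identity = (state) => state;

// Group 1 captured "a"; at index 2 the input repeats it.
const hit = matcher({ endIndex: 2, captures: [undefined, ['a']] }, identity);
// At index 1 the input has "*", so the backreference fails.
const miss = matcher({ endIndex: 1, captures: [undefined, ['a']] }, identity);
// Group 1 never participated, so the state passes through untouched.
const unset = matcher({ endIndex: 0, captures: [undefined, undefined] }, identity);

console.log(hit.endIndex, miss === FAILURE, unset.endIndex);
```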

AtomEscape

The production AtomEscape[U] :: [+N] `k` GroupName evaluates as follows:

1. Search the enclosing RegExp for an instance of a |GroupSpecifier| for a |RegExpIdentifierName| which has a StringValue equal to the StringValue of the |RegExpIdentifierName| contained in |GroupName|.
1. Assert: A unique such |GroupSpecifier| is found.
1. Let _parenIndex_ be the number of left capturing parentheses in the entire regular expression that occur to the left of the located |GroupSpecifier|. This is the total number of times the Atom :: `(` GroupSpecifier Disjunction `)` production is expanded prior to that production's |Term| plus the total number of Atom :: `(` GroupSpecifier Disjunction `)` productions enclosing this |Term|.
1. Call BackreferenceMatcher(_parenIndex_) and return its Matcher result.

Runtime Semantics: RegExpInitialize ( _obj_, _pattern_, _flags_ )

When the abstract operation RegExpInitialize with arguments _obj_, _pattern_, and _flags_ is called, the following steps are taken:

1. If _pattern_ is *undefined*, let _P_ be the empty String.
1. Else, let _P_ be ? ToString(_pattern_).
1. If _flags_ is *undefined*, let _F_ be the empty String.
1. Else, let _F_ be ? ToString(_flags_).
1. If _F_ contains any code unit other than `"g"`, `"i"`, `"m"`, `"u"`, or `"y"` or if it contains the same code unit more than once, throw a *SyntaxError* exception.
1. If _F_ contains `"u"`, let _BMP_ be *false*; else let _BMP_ be *true*.
1. If _BMP_ is *true*, then
  1. Parse _P_ using the grammars in Patterns (#sec-patterns) and interpreting each of its 16-bit elements as a Unicode BMP code point. UTF-16 decoding is not applied to the elements. The goal symbol for the parse is |Pattern[~U, ~N]|. If the result of parsing contains a |GroupName|, reparse with the goal symbol |Pattern[~U, +N]| and use this result instead. Throw a *SyntaxError* exception if _P_ did not conform to the grammar in either parsing attempt, if any elements of _P_ were not matched by the parse, or if any Early Error conditions exist.
  1. Let _patternCharacters_ be a List whose elements are the code unit elements of _P_.
1. Else,
  1. Parse _P_ using the grammars in Patterns (#sec-patterns) and interpreting _P_ as UTF-16 encoded Unicode code points. The goal symbol for the parse is |Pattern[+U, +N]|. Throw a *SyntaxError* exception if _P_ did not conform to the grammar, if any elements of _P_ were not matched by the parse, or if any Early Error conditions exist.
  1. Let _patternCharacters_ be a List whose elements are the code points resulting from applying UTF-16 decoding to _P_'s sequence of elements.
1. Set _obj_.[[OriginalSource]] to _P_.
1. Set _obj_.[[OriginalFlags]] to _F_.
1. Set _obj_.[[RegExpMatcher]] to the internal procedure that evaluates the above parse of _P_ by applying the semantics provided in Pattern Semantics using _patternCharacters_ as the pattern's List of |SourceCharacter| values and _F_ as the flag parameters.
1. Perform ? Set(_obj_, `"lastIndex"`, 0, *true*).
1. Return _obj_.
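
The two-pass parse for non-Unicode patterns is what keeps `\k` backwards compatible. The following sketch shows the observable effect; the `compiles` helper and the sample strings are assumptions made for the example.

```js
// Hypothetical helper: true if the pattern compiles, false on SyntaxError.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// No GroupName in the pattern: the first parse (~N) stands, and \k is an
// identity escape that matches a literal "k".
const legacyMatchesLiteral = /\k<a>/.test('k<a>');

// A GroupName is present: the pattern is reparsed with +N, so \k<a> is a
// backreference to group "a".
const named = /(?<a>x)\k<a>/;
const namedMatchesRepeat = named.test('xx');
const namedMatchesLiteral = named.test('xk<a>');

// Under +N, a \k that does not name an existing group is an error.
const bareKWithNamedGroup = compiles('(?<a>x)\\k');
const unknownNameWithNamedGroup = compiles('(?<a>x)\\k<b>');

// Unicode patterns are always parsed with +N.
const unicodeBareReference = compiles('\\k<a>', 'u');

console.log(legacyMatchesLiteral, namedMatchesRepeat, namedMatchesLiteral);
console.log(bareKWithNamedGroup, unknownNameWithNamedGroup, unicodeBareReference);
```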

Runtime Semantics: RegExpBuiltinExec ( _R_, _S_ )

The abstract operation RegExpBuiltinExec with arguments _R_ and _S_ performs the following steps:

1. Assert: _R_ is an initialized RegExp instance.
1. Assert: Type(_S_) is String.
1. Let _length_ be the number of code units in _S_.
1. Let _flags_ be _R_.[[OriginalFlags]].
1. If _flags_ contains `"g"`, let _global_ be *true*, else let _global_ be *false*.
1. If _flags_ contains `"y"`, let _sticky_ be *true*, else let _sticky_ be *false*.
1. If _global_ is *false* and _sticky_ is *false*, let _lastIndex_ be 0.
1. Else, let _lastIndex_ be ? ToLength(? Get(_R_, `"lastIndex"`)).
1. Let _matcher_ be _R_.[[RegExpMatcher]].
1. If _flags_ contains `"u"`, let _fullUnicode_ be *true*, else let _fullUnicode_ be *false*.
1. Let _matchSucceeded_ be *false*.
1. Repeat, while _matchSucceeded_ is *false*
  1. If _lastIndex_ > _length_, then
    1. If _global_ is *true* or _sticky_ is *true*, then
      1. Perform ? Set(_R_, `"lastIndex"`, 0, *true*).
    1. Return *null*.
  1. Let _r_ be _matcher_(_S_, _lastIndex_).
  1. If _r_ is ~failure~, then
    1. If _sticky_ is *true*, then
      1. Perform ? Set(_R_, `"lastIndex"`, 0, *true*).
      1. Return *null*.
    1. Let _lastIndex_ be AdvanceStringIndex(_S_, _lastIndex_, _fullUnicode_).
  1. Else,
    1. Assert: _r_ is a State.
    1. Set _matchSucceeded_ to *true*.
1. Let _e_ be _r_'s _endIndex_ value.
1. If _fullUnicode_ is *true*, then
  1. _e_ is an index into the _Input_ character list, derived from _S_, matched by _matcher_. Let _eUTF_ be the smallest index into _S_ that corresponds to the character at element _e_ of _Input_. If _e_ is greater than or equal to the length of _Input_, then _eUTF_ is the number of code units in _S_.
  1. Let _e_ be _eUTF_.
1. If _global_ is *true* or _sticky_ is *true*, then
  1. Perform ? Set(_R_, `"lastIndex"`, _e_, *true*).
1. Let _n_ be the length of _r_'s _captures_ List. (This is the same value as the pattern's _NcapturingParens_.)
1. Let _A_ be ArrayCreate(_n_ + 1).
1. Assert: The value of _A_'s `"length"` property is _n_ + 1.
1. Let _matchIndex_ be _lastIndex_.
1. Perform ! CreateDataProperty(_A_, `"index"`, _matchIndex_).
1. Perform ! CreateDataProperty(_A_, `"input"`, _S_).
1. Let _matchedSubstr_ be the matched substring (i.e. the portion of _S_ between offset _lastIndex_ inclusive and offset _e_ exclusive).
1. Perform ! CreateDataProperty(_A_, `"0"`, _matchedSubstr_).
1. If _R_ contains any |GroupName|, then
  1. Let _groups_ be ObjectCreate(*null*).
1. Else,
  1. Let _groups_ be *undefined*.
1. Perform ! CreateDataProperty(_A_, `"groups"`, _groups_).
1. For each integer _i_ such that _i_ > 0 and _i_ ≤ _n_
  1. Let _captureI_ be the _i_th element of _r_'s _captures_ List.
  1. If _captureI_ is *undefined*, let _capturedValue_ be *undefined*.
  1. Else if _fullUnicode_ is *true*, then
    1. Assert: _captureI_ is a List of code points.
    1. Let _capturedValue_ be a string whose code units are the UTF16Encoding of the code points of _captureI_.
  1. Else _fullUnicode_ is *false*,
    1. Assert: _captureI_ is a List of code units.
    1. Let _capturedValue_ be a string consisting of the code units of _captureI_.
  1. Perform ! CreateDataProperty(_A_, ! ToString(_i_), _capturedValue_).
  1. If the _i_th capture of _R_ was defined with a |GroupName|, then
    1. Let _s_ be the StringValue of the corresponding |RegExpIdentifierName|.
    1. Perform ! CreateDataProperty(_groups_, _s_, _capturedValue_).
1. Return _A_.
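
The `groups` handling in the steps above has a few consequences that are easy to check from script. The sketch below is only an illustration with made-up patterns and inputs.

```js
// No GroupName in the pattern: the result still has an own `groups`
// property, but its value is undefined.
const plain = /(\d+)/.exec('abc 123');
const plainHasOwnGroups = Object.prototype.hasOwnProperty.call(plain, 'groups');

// Named groups: `groups` is created with a null prototype, and every named
// group gets a property even if it did not participate in the match.
const partial = /(?<word>[a-z]+)|(?<num>\d+)/u.exec('123');
const groupsProto = Object.getPrototypeOf(partial.groups);
const wordIsOwnProperty = 'word' in partial.groups;

console.log(plainHasOwnGroups, plain.groups);          // true undefined
console.log(groupsProto, wordIsOwnProperty);           // null true
console.log(partial.groups.word, partial.groups.num);  // undefined 123
// Named groups still take part in ordinary numbering.
console.log(partial[1], partial[2]);                   // undefined 123
```

Because the object has a null prototype, a group named `toString` or `constructor` cannot collide with anything inherited from `Object.prototype`.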

String.prototype.replace ( _searchValue_, _replaceValue_ )

When the `replace` method is called with arguments _searchValue_ and _replaceValue_, the following steps are taken:

1. Let _O_ be ? RequireObjectCoercible(*this* value).
1. If _searchValue_ is neither *undefined* nor *null*, then
  1. Let _replacer_ be ? GetMethod(_searchValue_, @@replace).
  1. If _replacer_ is not *undefined*, then
    1. Return ? Call(_replacer_, _searchValue_, « _O_, _replaceValue_ »).
1. Let _string_ be ? ToString(_O_).
1. Let _searchString_ be ? ToString(_searchValue_).
1. Let _functionalReplace_ be IsCallable(_replaceValue_).
1. If _functionalReplace_ is *false*, then
  1. Let _replaceValue_ be ? ToString(_replaceValue_).
1. Search _string_ for the first occurrence of _searchString_ and let _pos_ be the index within _string_ of the first code unit of the matched substring and let _matched_ be _searchString_. If no occurrences of _searchString_ were found, return _string_.
1. If _functionalReplace_ is *true*, then
  1. Let _replValue_ be ? Call(_replaceValue_, *undefined*, « _matched_, _pos_, _string_ »).
  1. Let _replStr_ be ? ToString(_replValue_).
1. Else,
  1. Let _captures_ be a new empty List.
  1. Let _replStr_ be GetSubstitution(_matched_, _string_, _pos_, _captures_, *undefined*, _replaceValue_).
1. Let _tailPos_ be _pos_ + the number of code units in _matched_.
1. Let _newString_ be the String formed by concatenating the first _pos_ code units of _string_, _replStr_, and the trailing substring of _string_ starting at index _tailPos_. If _pos_ is 0, the first element of the concatenation will be the empty String.
1. Return _newString_.

The `replace` function is intentionally generic; it does not require that its *this* value be a String object. Therefore, it can be transferred to other kinds of objects for use as a method.
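
When the search value is a plain string, the steps above pass *undefined* as _namedCaptures_, so `$<` in the replacement is not treated as a group reference. A short sketch with made-up inputs:

```js
// String search value: there are no named captures, so "$<sep>" is copied
// into the result as literal text.
const literal = 'a-b'.replace('-', '$<sep>');

// RegExp search value with a named group: "$<sep>" is substituted.
const viaRegExp = 'a-b'.replace(/(?<sep>-)/, '[$<sep>]');

console.log(literal);   // a$<sep>b
console.log(viaRegExp); // a[-]b
```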

Runtime Semantics: GetSubstitution( _matched_, _str_, _position_, _captures_, _namedCaptures_, _replacement_ )

The abstract operation GetSubstitution performs the following steps:

1. Assert: Type(_matched_) is String.
1. Let _matchLength_ be the number of code units in _matched_.
1. Assert: Type(_str_) is String.
1. Let _stringLength_ be the number of code units in _str_.
1. Assert: _position_ is a nonnegative integer.
1. Assert: _position_ ≤ _stringLength_.
1. Assert: _captures_ is a possibly empty List of Strings.
1. Assert: Type(_replacement_) is String.
1. Let _tailPos_ be _position_ + _matchLength_.
1. Let _m_ be the number of elements in _captures_.
1. If _namedCaptures_ is not *undefined*, then
  1. Let _namedCaptures_ be ? ToObject(_namedCaptures_).
1. Let _result_ be a String value derived from _replacement_ by copying code unit elements from _replacement_ to _result_ while performing replacements as specified in the table below. These `$` replacements are done left-to-right, and, once such a replacement is performed, the new replacement text is not subject to further replacements.
1. Return _result_.

| Code units | Unicode Characters | Replacement text |
| --- | --- | --- |
| 0x0024, 0x0024 | `$$` | `$` |
| 0x0024, 0x0026 | `$&` | _matched_ |
| 0x0024, 0x0060 | `` $` `` | If _position_ is 0, the replacement is the empty String. Otherwise the replacement is the substring of _str_ that starts at index 0 and whose last code unit is at index _position_ - 1. |
| 0x0024, 0x0027 | `$'` | If _tailPos_ ≥ _stringLength_, the replacement is the empty String. Otherwise the replacement is the substring of _str_ that starts at index _tailPos_ and continues to the end of _str_. |
| 0x0024, N where 0x0031 ≤ N ≤ 0x0039 | `$n` where `n` is one of `1 2 3 4 5 6 7 8 9` and `$n` is not followed by a decimal digit | The _n_th element of _captures_, where _n_ is a single digit in the range 1 to 9. If _n_≤_m_ and the _n_th element of _captures_ is *undefined*, use the empty String instead. If _n_>_m_, the result is implementation-defined. |
| 0x0024, N, N where 0x0030 ≤ N ≤ 0x0039 | `$nn` where `n` is one of `0 1 2 3 4 5 6 7 8 9` | The _nn_th element of _captures_, where _nn_ is a two-digit decimal number in the range 01 to 99. If _nn_≤_m_ and the _nn_th element of _captures_ is *undefined*, use the empty String instead. If _nn_ is 00 or _nn_>_m_, no replacement is done. |
| 0x0024, 0x003C | `$<` | If _namedCaptures_ is *undefined*, the replacement text is the String `"$<"`. Otherwise, scan until the next `>`. If none is found, the replacement text is the String `"$<"`. Otherwise, let the enclosed substring be _groupName_ and let _capture_ be ? Get(_namedCaptures_, _groupName_). If _capture_ is *undefined*, replace the text through `>` with the empty string. Otherwise, replace the text through this following `>` with ? ToString(_capture_). |
| 0x0024 | `$` in any context that does not match any of the above. | `$` |
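
The `$<` row has several branches, which the following sketch walks through one by one. The pattern and inputs are made up for illustration, and the results assume an engine that implements the table as written.

```js
const re = /(?<year>\d{4})-(?<month>\d{2})/u;

// Both names exist: each "$<name>" is replaced by the captured text.
const swapped = '2015-01'.replace(re, '$<month>/$<year>');

// The name is not a property of the groups object: Get returns undefined,
// so the reference is replaced with the empty string.
const unknownName = '2015-01'.replace(re, '[$<day>]');

// No closing ">": the text "$<" is kept and the rest is copied as-is.
const unterminated = '2015-01'.replace(re, '$<year');

// The pattern has no named groups, so namedCaptures is undefined and
// "$<year>" stays literal.
const noNamedGroups = '2015-01'.replace(/(\d{4})-(\d{2})/u, '$<year>');

// "$$" is consumed first and yields "$"; the following "<year>" is then
// plain text and is not rescanned.
const escapedDollar = '2015-01'.replace(re, '$$<year>');

console.log(swapped);       // 01/2015
console.log(unknownName);   // []
console.log(unterminated);  // $<year
console.log(noNamedGroups); // $<year>
console.log(escapedDollar); // $<year>
```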

RegExp.prototype [ @@replace ] ( _string_, _replaceValue_ )

When the @@`replace` method is called with arguments _string_ and _replaceValue_, the following steps are taken:

1. Let _rx_ be the *this* value.
1. If Type(_rx_) is not Object, throw a *TypeError* exception.
1. Let _S_ be ? ToString(_string_).
1. Let _lengthS_ be the number of code unit elements in _S_.
1. Let _functionalReplace_ be IsCallable(_replaceValue_).
1. If _functionalReplace_ is *false*, then
  1. Let _replaceValue_ be ? ToString(_replaceValue_).
1. Let _global_ be ToBoolean(? Get(_rx_, `"global"`)).
1. If _global_ is *true*, then
  1. Let _fullUnicode_ be ToBoolean(? Get(_rx_, `"unicode"`)).
  1. Perform ? Set(_rx_, `"lastIndex"`, 0, *true*).
1. Let _results_ be a new empty List.
1. Let _done_ be *false*.
1. Repeat, while _done_ is *false*
  1. Let _result_ be ? RegExpExec(_rx_, _S_).
  1. If _result_ is *null*, set _done_ to *true*.
  1. Else _result_ is not *null*,
    1. Append _result_ to the end of _results_.
    1. If _global_ is *false*, set _done_ to *true*.
    1. Else,
      1. Let _matchStr_ be ? ToString(? Get(_result_, `"0"`)).
      1. If _matchStr_ is the empty String, then
        1. Let _thisIndex_ be ? ToLength(? Get(_rx_, `"lastIndex"`)).
        1. Let _nextIndex_ be AdvanceStringIndex(_S_, _thisIndex_, _fullUnicode_).
        1. Perform ? Set(_rx_, `"lastIndex"`, _nextIndex_, *true*).
1. Let _accumulatedResult_ be the empty String value.
1. Let _nextSourcePosition_ be 0.
1. Repeat, for each _result_ in _results_,
  1. Let _nCaptures_ be ? ToLength(? Get(_result_, `"length"`)).
  1. Let _nCaptures_ be max(_nCaptures_ - 1, 0).
  1. Let _matched_ be ? ToString(? Get(_result_, `"0"`)).
  1. Let _matchLength_ be the number of code units in _matched_.
  1. Let _position_ be ? ToInteger(? Get(_result_, `"index"`)).
  1. Let _position_ be max(min(_position_, _lengthS_), 0).
  1. Let _n_ be 1.
  1. Let _captures_ be a new empty List.
  1. Repeat while _n_ ≤ _nCaptures_
    1. Let _capN_ be ? Get(_result_, ! ToString(_n_)).
    1. If _capN_ is not *undefined*, then
      1. Let _capN_ be ? ToString(_capN_).
    1. Append _capN_ as the last element of _captures_.
    1. Let _n_ be _n_+1.
  1. Let _namedCaptures_ be ? Get(_result_, `"groups"`).
  1. If _functionalReplace_ is *true*, then
    1. Let _replacerArgs_ be « _matched_ ».
    1. Append in list order the elements of _captures_ to the end of the List _replacerArgs_.
    1. Append _position_ and _S_ as the last two elements of _replacerArgs_.
    1. If _namedCaptures_ is not *undefined*, then
      1. Append _namedCaptures_ as the last element of _replacerArgs_.
    1. Let _replValue_ be ? Call(_replaceValue_, *undefined*, _replacerArgs_).
    1. Let _replacement_ be ? ToString(_replValue_).
  1. Else,
    1. Let _replacement_ be GetSubstitution(_matched_, _S_, _position_, _captures_, _namedCaptures_, _replaceValue_).
  1. If _position_ ≥ _nextSourcePosition_, then
    1. NOTE _position_ should not normally move backwards. If it does, it is an indication of an ill-behaving RegExp subclass or use of an access triggered side-effect to change the global flag or other characteristics of _rx_. In such cases, the corresponding substitution is ignored.
    1. Let _accumulatedResult_ be the String formed by concatenating the code units of the current value of _accumulatedResult_ with the substring of _S_ consisting of the code units from _nextSourcePosition_ (inclusive) up to _position_ (exclusive) and with the code units of _replacement_.
    1. Let _nextSourcePosition_ be _position_ + _matchLength_.
1. If _nextSourcePosition_ ≥ _lengthS_, return _accumulatedResult_.
1. Return the String formed by concatenating the code units of _accumulatedResult_ with the substring of _S_ consisting of the code units from _nextSourcePosition_ (inclusive) up through the final code unit of _S_ (inclusive).

The value of the `name` property of this function is `"[Symbol.replace]"`.
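
In the steps above, the groups object is appended to the replacer arguments only when it is not *undefined*. The sketch below, using made-up inputs, shows the resulting difference in argument count and a global replacement that reads the groups object from the last argument.

```js
// Without named groups the callback receives
// (matched, capture1, position, S): four arguments.
const withoutNames = 'ab'.replace(/(a)/, (...args) => String(args.length));

// With a named group the groups object is appended as a fifth argument.
const withNames = 'ab'.replace(/(?<x>a)/, (...args) => String(args.length));

// Reading the groups object from the end works for any number of captures.
const flipped = 'a=1, b=2'.replace(/(?<key>\w+)=(?<value>\w+)/gu, (...args) => {
  const groups = args[args.length - 1];
  return `${groups.value}=${groups.key}`;
});

console.log(withoutNames); // 4b
console.log(withNames);    // 5b
console.log(flipped);      // 1=a, 2=b
```

A callback that may be used with patterns of either kind should check that the last argument is an object before treating it as the groups object, since for patterns without named groups the last argument is the input string.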

Regular Expressions Patterns

The syntax of Patterns (#sec-patterns) is modified and extended as follows. These changes introduce ambiguities that are broken by the ordering of grammar productions and by contextual information. When parsing using the following grammar, each alternative is considered only if previous production alternatives do not match.

This alternative pattern grammar and semantics only changes the syntax and semantics of BMP patterns. The following grammar extensions include productions parameterized with the [U] parameter. However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [U] parameter present on the goal symbol.

Syntax

Term[U, N] ::
  [+U] Assertion[+U, ?N]
  [+U] Atom[+U, ?N]
  [+U] Atom[+U, ?N] Quantifier
  [~U] QuantifiableAssertion Quantifier
  [~U] Assertion[~U, ?N]
  [~U] ExtendedAtom[?N] Quantifier
  [~U] ExtendedAtom[?N]

Assertion[U,N] ::
  `^`
  `$`
  `\` `b`
  `\` `B`
  [+U] `(` `?` `=` Disjunction[+U, ?N] `)`
  [+U] `(` `?` `!` Disjunction[+U, ?N] `)`
  [~U] QuantifiableAssertion[N]

QuantifiableAssertion[N] ::
  `(` `?` `=` Disjunction[~U, ?N] `)`
  `(` `?` `!` Disjunction[~U, ?N] `)`

ExtendedAtom[N] ::
  `.`
  `\` AtomEscape[~U, ?N]
  CharacterClass[~U, ?N]
  `(` Disjunction[~U, ?N] `)`
  `(` `?` `:` Disjunction[~U, ?N] `)`
  InvalidBracedQuantifier
  ExtendedPatternCharacter

InvalidBracedQuantifier ::
  `{` DecimalDigits `}`
  `{` DecimalDigits `,` `}`
  `{` DecimalDigits `,` DecimalDigits `}`

ExtendedPatternCharacter ::
  SourceCharacter but not one of `^` `$` `.` `*` `+` `?` `(` `)` `[` `|`

AtomEscape[U, N] ::
  [+U] DecimalEscape
  [~U] DecimalEscape [> but only if the integer value of |DecimalEscape| is <= _NcapturingParens_]
  CharacterClassEscape
  CharacterEscape[~U, ?N]
  [+N] `k` GroupName

CharacterEscape[U, N] ::
  ControlEscape
  `c` ControlLetter
  `0` [lookahead <! DecimalDigit]
  HexEscapeSequence
  RegExpUnicodeEscapeSequence[?U]
  [~U] LegacyOctalEscapeSequence
  IdentityEscape[?U, ?N]

IdentityEscape[U, N] ::
  [+U] SyntaxCharacter
  [+U] `/`
  [~U] SourceCharacter but not `c`
  [~U] SourceCharacterIdentityEscape[?N]

SourceCharacterIdentityEscape[N] ::
  [~N] SourceCharacter but not `c`
  [+N] SourceCharacter but not one of `c` or `k`

ClassAtomNoDash[U, N] ::
  SourceCharacter but not one of `\` or `]` or `-`
  `\` ClassEscape[?U, ?N]
  `\` [lookahead == `c`]

ClassEscape[U, N] ::
  `b`
  [+U] `-`
  [~U] `c` ClassControlLetter
  CharacterClassEscape
  CharacterEscape[?U, ?N]

ClassControlLetter ::
  DecimalDigit
  `_`

When the same left-hand side occurs with both [+U] and [~U] guards, it is to control the disambiguation priority.

--------------------------------------------------------------------------------