19 changes: 10 additions & 9 deletions files/en-us/web/javascript/guide/regular_expressions/index.md
@@ -306,15 +306,16 @@ If you need to access the properties of a regular expression created with an obj
Regular expressions have optional flags that allow for functionality like global searching and case-insensitive searching.
These flags can be used separately or together in any order, and are included as part of the regular expression.

| Flag | Description | Corresponding property |
| ---- | --------------------------------------------------------------------------------------------- | --------------------------------------------- |
| `d` | Generate indices for substring matches. | {{jsxref("RegExp/hasIndices", "hasIndices")}} |
| `g` | Global search. | {{jsxref("RegExp/global", "global")}} |
| `i` | Case-insensitive search. | {{jsxref("RegExp/ignoreCase", "ignoreCase")}} |
| `m` | Allows `^` and `$` to match newline characters. | {{jsxref("RegExp/multiline", "multiline")}} |
| `s` | Allows `.` to match newline characters. | {{jsxref("RegExp/dotAll", "dotAll")}} |
| `u` | "Unicode"; treat a pattern as a sequence of Unicode code points. | {{jsxref("RegExp/unicode", "unicode")}} |
| `y` | Perform a "sticky" search that matches starting at the current position in the target string. | {{jsxref("RegExp/sticky", "sticky")}} |
| Flag | Description | Corresponding property |
| ---- | --------------------------------------------------------------------------------------------- | ----------------------------------------------- |
| `d` | Generate indices for substring matches. | {{jsxref("RegExp/hasIndices", "hasIndices")}} |
| `g` | Global search. | {{jsxref("RegExp/global", "global")}} |
| `i` | Case-insensitive search. | {{jsxref("RegExp/ignoreCase", "ignoreCase")}} |
| `m` | Makes `^` and `$` match at the start and end of each line, not only at the start and end of the whole string. | {{jsxref("RegExp/multiline", "multiline")}} |
| `s` | Allows `.` to match newline characters. | {{jsxref("RegExp/dotAll", "dotAll")}} |
| `u` | "Unicode"; treat a pattern as a sequence of Unicode code points. | {{jsxref("RegExp/unicode", "unicode")}} |
| `v` | An upgrade to the `u` mode with more Unicode features. | {{jsxref("RegExp/unicodeSets", "unicodeSets")}} |
| `y` | Perform a "sticky" search that matches starting at the current position in the target string. | {{jsxref("RegExp/sticky", "sticky")}} |
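
As a small illustration of the table above, the following sketch combines three flags and reads the corresponding properties back. The pattern and the sample string are made up for this example.

```js
// Hypothetical pattern combining three flags: global, case-insensitive, dotAll.
const re = /a.c/gis;

// Each flag is reflected by its corresponding boolean property.
const flagState = {
  global: re.global,
  ignoreCase: re.ignoreCase,
  dotAll: re.dotAll,
  multiline: re.multiline, // not set in this pattern
};
console.log(flagState);

// With `s`, `.` also matches the newline; with `i`, "A" matches `a`;
// with `g`, match() returns every match instead of only the first one.
const matches = "A\nc abc".match(re);
console.log(matches);
```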

To include a flag with the regular expression, use this syntax:

@@ -57,7 +57,7 @@ The following properties are deprecated. This does not affect their use in [repl

The {{jsxref("RegExp/compile", "compile()")}} method is deprecated. Construct a new `RegExp` instance instead.

The following regex syntaxes are deprecated and only available in non-[unicode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) mode. In unicode mode, they are all syntax errors:
The following regex syntaxes are deprecated and only available in [Unicode-unaware mode](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode#unicode-aware_mode). In Unicode-aware mode, they are all syntax errors:

- [Lookahead assertions](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Lookahead_assertion) can have [quantifiers](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Quantifier).
- [Backreferences](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Backreference) that do not refer to an existing capturing group become [legacy octal escapes](#escape_sequences).
@@ -6,7 +6,7 @@ page-type: javascript-error

{{jsSidebar("Errors")}}

The JavaScript exception "invalid regular expression flag" occurs when the flags in a regular expression contain any flag that is not one of: `d`, `g`, `i`, `m`, `s`, `u`, or `y`.
The JavaScript exception "invalid regular expression flag" occurs when the flags in a regular expression contain any flag that is not one of: `d`, `g`, `i`, `m`, `s`, `u`, `v`, or `y`.

It may also be raised if the expression contains more than one instance of a valid flag.

@@ -26,7 +26,7 @@ SyntaxError: Invalid regular expression: invalid flags (Safari)

The regular expression contains invalid flags, or valid flags have been used more than once in the expression.

The valid (allowed) flags are `d`, `g`, `i`, `m`, `s`, `u`, and `y`. They are introduced in more detail in [Regular expressions > Advanced searching with flags](/en-US/docs/Web/JavaScript/Guide/Regular_expressions#advanced_searching_with_flags).
The valid (allowed) flags are `d`, `g`, `i`, `m`, `s`, `u`, `v`, and `y`. They are introduced in more detail in [Regular expressions > Advanced searching with flags](/en-US/docs/Web/JavaScript/Guide/Regular_expressions#advanced_searching_with_flags).
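
A minimal sketch of the two failure modes described above: an unknown flag and a repeated flag. `flagError` is a helper invented for this illustration and is not part of any API.

```js
// Returns the name of the error thrown by the RegExp constructor for the
// given flags, or null if the flags are accepted.
function flagError(flags) {
  try {
    new RegExp("a", flags);
    return null;
  } catch (e) {
    return e.name;
  }
}

const accepted = flagError("gi"); // valid combination
const unknownFlag = flagError("x"); // "x" is not a flag
const repeatedFlag = flagError("gg"); // "g" appears twice

console.log(accepted, unknownFlag, repeatedFlag);
```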

## Examples

@@ -63,7 +63,7 @@ When the regex is sticky and global, it would still perform sticky matches — i
console.log("ab-c".match(/[abc]/gy)); // [ 'a', 'b' ]
```

If the current match is an empty string, the `lastIndex` would still be advanced — if the regex has the [`u`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) flag, it would advance by one Unicode code point; otherwise, it advances by one UTF-16 code unit.
If the current match is an empty string, the `lastIndex` would still be advanced — if the regex is [Unicode-aware](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode#unicode-aware_mode), it would advance by one Unicode code point; otherwise, it advances by one UTF-16 code unit.

```js
console.log("😄".match(/(?:)/g)); // [ '', '', '' ]
@@ -64,7 +64,7 @@ When the regex is sticky and global, it would still perform sticky matches — i
console.log("aa-a".replace(/a/gy, "b")); // "bb-a"
```

If the current match is an empty string, the `lastIndex` would still be advanced — if the regex has the [`u`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) flag, it would advance by one Unicode code point; otherwise, it advances by one UTF-16 code unit.
If the current match is an empty string, the `lastIndex` would still be advanced — if the regex is [Unicode-aware](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode#unicode-aware_mode), it would advance by one Unicode code point; otherwise, it advances by one UTF-16 code unit.

```js
console.log("😄".replace(/(?:)/g, " ")); // " \ud83d \ude04 "
@@ -48,7 +48,7 @@ The `RegExp.prototype[@@split]()` base method exhibits the following behaviors:
- If the target string is empty, and the regexp can match empty strings (for example, `/a?/`), an empty array is returned. Otherwise, if the regexp can't match an empty string, `[""]` is returned.
- The matching proceeds by continuously calling `this.exec()`. Since the regexp is always sticky, this will move along the string, each time yielding a matching string, index, and any capturing groups.
- For each match, the substring between the last matched string's end and the current matched string's beginning is first appended to the result array. Then, the capturing groups' values are appended one-by-one.
- If the current match is an empty string, or if the regexp doesn't match at the current position (since it's sticky), the `lastIndex` would still be advanced — if the regex has the [`u`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) flag, it would advance by one Unicode code point; otherwise, it advances by one UTF-16 code unit.
- If the current match is an empty string, or if the regexp doesn't match at the current position (since it's sticky), the `lastIndex` would still be advanced — if the regex is [Unicode-aware](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode#unicode-aware_mode), it would advance by one Unicode code point; otherwise, it advances by one UTF-16 code unit.
- If the regexp doesn't match the target string, the target string is returned as-is, wrapped in an array.
- The returned array's length will never exceed the `limit` parameter, if provided, while trying to be as close as possible. Therefore, the last match and its capturing groups may not all be present in the returned array if the array is already filled.
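
The behaviors listed above can be observed through `String.prototype.split()`, which delegates to this method. The inputs below are made up for this sketch.

```js
// Empty target string: result depends on whether the regexp can match "".
const emptyCanMatchEmpty = "".split(/a?/);
const emptyCannotMatchEmpty = "".split(/a/);

// Capturing groups are spliced into the result after each separator.
const withGroups = "a1b2c".split(/(\d)/);

// The limit caps the array length, so later pieces are dropped.
const limited = "a1b2c".split(/(\d)/, 2);

// Empty matches advance by one UTF-16 code unit, or by one code point
// when the regex is Unicode-aware.
const byCodeUnit = "😄".split(/(?:)/);
const byCodePoint = "😄".split(/(?:)/u);

// No match at all: the target string is returned wrapped in an array.
const noMatch = "abc".split(/x/);

console.log(emptyCanMatchEmpty, emptyCannotMatchEmpty);
console.log(withGroups, limited);
console.log(byCodeUnit.length, byCodePoint.length);
console.log(noMatch);
```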

@@ -13,7 +13,7 @@ The **`flags`** accessor property of {{jsxref("RegExp")}} instances returns the

## Description

`RegExp.prototype.flags` has a string as its value. Flags in the `flags` property are sorted alphabetically (from left to right, e.g. `"dgimsuy"`). It actually invokes the other flag accessors ([`hasIndices`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/hasIndices), [`global`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/global), etc.) one-by-one and concatenates the results.
`RegExp.prototype.flags` has a string as its value. Flags in the `flags` property are sorted alphabetically (from left to right, e.g. `"dgimsuvy"`). It actually invokes the other flag accessors ([`hasIndices`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/hasIndices), [`global`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/global), etc.) one-by-one and concatenates the results.

All built-in functions read the `flags` property instead of reading individual flag accessors.
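
A short sketch of both points in the description: the flags come back sorted regardless of the order they were written in, and the getter builds its result by reading the individual flag properties. Calling the getter on a plain object is only a demonstration device here, not a recommended pattern.

```js
// Flags are given in arbitrary order but reported alphabetically.
const sortedFlags = new RegExp("a", "ymsig").flags;
console.log(sortedFlags);

// The getter is generic: it reads `global`, `sticky`, etc. from its receiver
// and concatenates the letters for the truthy ones.
const flagsGetter = Object.getOwnPropertyDescriptor(
  RegExp.prototype,
  "flags",
).get;
const fromPlainObject = flagsGetter.call({ global: true, sticky: true });
console.log(fromPlainObject);
```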

@@ -13,9 +13,11 @@ The **`ignoreCase`** accessor property of {{jsxref("RegExp")}} instances returns

## Description

`RegExp.prototype.ignoreCase` has the value `true` if the `i` flag was used; otherwise, `false`. The `i` flag indicates that case should be ignored while attempting a match in a string.
`RegExp.prototype.ignoreCase` has the value `true` if the `i` flag was used; otherwise, `false`. The `i` flag indicates that case should be ignored while attempting a match in a string. Case-insensitive matching is done by mapping both the expected character set and the matched string to the same casing.

If the regex has the [`unicode`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) flag, the case mapping happens as specified in [`CaseFolding.txt`](https://unicode.org/Public/UCD/latest/ucd/CaseFolding.txt). Otherwise, case mapping uses the [Unicode Default Case Conversion](https://unicode-org.github.io/icu/userguide/transforms/casemappings.html) — the same algorithm used in [`String.prototype.toUpperCase()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toUpperCase) and [`String.prototype.toLowerCase()`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toLowerCase).
If the regex is [Unicode-aware](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode#unicode-aware_mode), the case mapping happens through _simple case folding_, as specified in [`CaseFolding.txt`](https://unicode.org/Public/UCD/latest/ucd/CaseFolding.txt). The mapping always maps to a single code point, so it does not map, for example, `ß` (U+00DF LATIN SMALL LETTER SHARP S) to `ss` (which is _full case folding_, not _simple case folding_). It may, however, map code points outside the Basic Latin block to code points within it. For example, `ſ` (U+017F LATIN SMALL LETTER LONG S) case-folds to `s` (U+0073 LATIN SMALL LETTER S) and `K` (U+212A KELVIN SIGN) case-folds to `k` (U+006B LATIN SMALL LETTER K). Therefore, `ſ` and `K` can be matched by `/[a-z]/ui`.

If the regex is Unicode-unaware, case mapping uses the [Unicode Default Case Conversion](https://unicode-org.github.io/icu/userguide/transforms/casemappings.html), the same algorithm used in {{jsxref("String.prototype.toUpperCase()")}}. For example, `Ω` (U+2126 OHM SIGN) and `Ω` (U+03A9 GREEK CAPITAL LETTER OMEGA) are both mapped by simple case folding to `ω` (U+03C9 GREEK SMALL LETTER OMEGA), so `"ω"` is matched by `/[\u2126]/ui` and `/[\u03a9]/ui`. Default Case Conversion, on the other hand, maps U+2126 to itself while mapping both U+03A9 and U+03C9 to U+03A9, so `"ω"` is matched by `/[\u03a9]/i` but not by `/[\u2126]/i`. This algorithm also prevents code points outside the Basic Latin block from being mapped to code points within it, so `ſ` and `K` mentioned previously are not matched by `/[a-z]/i`.
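
The difference between the two case-mapping algorithms can be checked directly. This is a small sketch using escape sequences for the code points named above; the variable names are made up for illustration.

```js
const longS = "\u017f"; // LATIN SMALL LETTER LONG S
const kelvin = "\u212a"; // KELVIN SIGN
const smallOmega = "\u03c9"; // GREEK SMALL LETTER OMEGA

const caseResults = {
  // Simple case folding (Unicode-aware) maps these into the Basic Latin block.
  longSAware: /[a-z]/ui.test(longS),
  kelvinAware: /[a-z]/ui.test(kelvin),
  // Default Case Conversion (Unicode-unaware) refuses to do so.
  longSUnaware: /[a-z]/i.test(longS),
  kelvinUnaware: /[a-z]/i.test(kelvin),
  // OHM SIGN folds to small omega only under simple case folding.
  ohmAware: /[\u2126]/ui.test(smallOmega),
  ohmUnaware: /[\u2126]/i.test(smallOmega),
  // Simple case folding never maps one code point to two.
  sharpSAware: /^\u00df$/ui.test("ss"),
};
console.log(caseResults);
```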

The set accessor of `ignoreCase` is `undefined`. You cannot change this property directly.

@@ -138,6 +138,8 @@ These properties are defined on `RegExp.prototype` and shared by all `RegExp` in
- : Whether or not the search is sticky.
- {{jsxref("RegExp.prototype.unicode")}}
- : Whether or not Unicode features are enabled.
- {{jsxref("RegExp.prototype.unicodeSets")}}
- : Whether or not the `v` flag, an upgrade to the `u` mode, is enabled.

These properties are own properties of each `RegExp` instance.

@@ -48,6 +48,8 @@ RegExp(pattern, flags)
- : Allows `.` to match newlines.
- [`u` (unicode)](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode)
- : Treat `pattern` as a sequence of Unicode code points.
- [`v` (unicodeSets)](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets)
- : An upgrade to the `u` flag that enables set notation in character classes as well as properties of strings.
- [`y` (sticky)](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/sticky)
- : Matches only from the index indicated by the `lastIndex` property of this regular expression in the target string. Does not attempt to match from any later indexes.
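
To illustrate the `y` flag description above, here is a minimal sketch showing that a sticky regex only tries the position stored in `lastIndex`. The pattern and input are arbitrary.

```js
const sticky = new RegExp("foo", "y");

// "foo" starts at index 3 of "barfoo", so the match succeeds there.
sticky.lastIndex = 3;
const matchedAtThree = sticky.test("barfoo");
const lastIndexAfterMatch = sticky.lastIndex; // moved past the match

// At index 2 there is no match, and later indexes are not attempted.
sticky.lastIndex = 2;
const matchedAtTwo = sticky.test("barfoo");
const lastIndexAfterFailure = sticky.lastIndex; // reset on failure

console.log(matchedAtThree, lastIndexAfterMatch);
console.log(matchedAtTwo, lastIndexAfterFailure);
```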

@@ -23,6 +23,12 @@ There are other changes to the parsing behavior that prevent possible syntax mis

The set accessor of `unicode` is `undefined`. You cannot change this property directly.

### Unicode-aware mode

When we refer to _Unicode-aware mode_, we mean the regex has either the `u` or the [`v`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets) flag, in which case the regex enables Unicode-related features (such as [Unicode character class escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape)) and has much stricter syntax rules. Because `u` and `v` interpret the same regex in incompatible ways, using both flags results in a {{jsxref("SyntaxError")}}.

Similarly, a regex is _Unicode-unaware_ if it has neither the `u` nor the `v` flag. In this case, the regex is interpreted as a sequence of UTF-16 code units, and there are many legacy syntaxes that do not become syntax errors.
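
A rough sketch of the distinction, using only the `u` flag so that it also runs on engines without `v` support. `compiles` is a helper invented for this example; the quantified lookahead is one of the legacy syntaxes that is only accepted in Unicode-unaware mode.

```js
// Reports whether the RegExp constructor accepts the given source and flags.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch {
    return false;
  }
}

const modeResults = {
  // Unicode-unaware: "😄" is two UTF-16 code units, so a single `.` is not enough.
  dotUnaware: /^.$/.test("😄"),
  // Unicode-aware: "😄" is one code point.
  dotAware: /^.$/u.test("😄"),
  // Unicode character class escapes need Unicode-aware mode.
  propertyEscape: /\p{Lu}/u.test("\u00c9"),
  // Legacy syntax: a quantifier on a lookahead assertion.
  legacyUnaware: compiles("(?=a)*", ""),
  legacyAware: compiles("(?=a)*", "u"),
  // `u` and `v` cannot be combined.
  bothFlags: compiles("a", "uv"),
};
console.log(modeResults);
```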

## Examples

### Using the unicode property
@@ -0,0 +1,58 @@
---
title: RegExp.prototype.unicodeSets
slug: Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets
page-type: javascript-instance-accessor-property
browser-compat: javascript.builtins.RegExp.unicodeSets
---

{{JSRef}}

The **`unicodeSets`** accessor property of {{jsxref("RegExp")}} instances returns whether or not the `v` flag is used with this regular expression.

## Description

`RegExp.prototype.unicodeSets` has the value `true` if the `v` flag was used; otherwise, `false`. The `v` flag is an "upgrade" to the [`u`](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode) flag that enables more Unicode-related features. ("v" is the next letter after "u" in the alphabet.) Because `u` and `v` interpret the same regex in incompatible ways, using both flags results in a {{jsxref("SyntaxError")}}. With the `v` flag, you get all features mentioned in the `u` flag description, plus:

- The [`\p`](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape) escape sequence can be additionally used to match properties of strings, instead of just characters.
- The [character class](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class) syntax is upgraded to allow intersection, union, and subtraction syntaxes, as well as matching multiple Unicode characters.
- The character class complement syntax `[^...]` constructs a complement class instead of negating the match result, avoiding some confusing behaviors with case-insensitive matching. For more information, see [Complement classes and case-insensitive matching](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_class#complement_classes_and_case-insensitive_matching).
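
The set operations in the list above can be sketched as follows. The patterns are made up for illustration, and `tryRegExp` is a hypothetical helper that returns `null` on engines that do not support the `v` flag, so the snippet degrades instead of throwing.

```js
// Builds a regex, or returns null if the engine rejects the source or flags.
function tryRegExp(source, flags) {
  try {
    return new RegExp(source, flags);
  } catch {
    return null;
  }
}

// Subtraction: lowercase ASCII letters except vowels.
const consonant = tryRegExp(String.raw`^[[a-z]--[aeiou]]$`, "v");
// Intersection: code points that are both ASCII and letters.
const asciiLetter = tryRegExp(String.raw`^[\p{ASCII}&&\p{Letter}]$`, "v");
// String alternatives inside a class: matches "abc" or "d".
const multiChar = tryRegExp(String.raw`^[\q{abc|d}]$`, "v");

const vSupported = Boolean(consonant && asciiLetter && multiChar);
if (vSupported) {
  console.log(consonant.test("b"), consonant.test("a"));
  console.log(asciiLetter.test("a"), asciiLetter.test("1"), asciiLetter.test("\u00e9"));
  console.log(multiChar.test("abc"), multiChar.test("d"), multiChar.test("ab"));
} else {
  console.log("The v flag is not supported in this engine.");
}
```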

Some valid `u`-mode regexes become invalid in `v`-mode. Specifically, the character class syntax is different:

- In addition to `]` and `\`, the following characters must be escaped in character classes if they represent literal characters: `(`, `)`, `[`, `{`, `}`, `/`, `-`, `|`. This list is somewhat similar to the list of [syntax characters](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character), except that `^`, `$`, `*`, `+`, and `?` are not reserved inside character classes, while `/` and `-` are not reserved outside character classes (although `/` may delimit a regex literal and therefore still needs to be escaped). All these characters may also be optionally escaped in `u`-mode character classes.
- The following "double punctuator" sequences must be escaped in character classes (but they don't make much sense without the `v` flag anyway): `&&`, `!!`, `##`, `$$`, `%%`, `**`, `++`, `,,`, `..`, `::`, `;;`, `<<`, `==`, `>>`, `??`, `@@`, `^^`, ` `` `, `~~`. In `u` mode, some of these characters can only appear literally within character classes and cause a syntax error when escaped. In `v` mode, they must be escaped when appearing in pairs, but can be optionally escaped when appearing alone. For example, `/[\!]/u` is invalid because it's an [identity escape](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Character_escape), but both `/[\!]/v` and `/[!]/v` are valid, while `/[!!]/v` is invalid. The [literal character](/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Literal_character) reference has a detailed table of which characters can appear escaped or unescaped.
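
The escaping rules above can be checked with a small sketch that tries to compile each character class in both modes. `isValid` is a helper invented for this example, and the `v`-mode results are only meaningful on engines that support the `v` flag (detected by `supportsV`).

```js
// Reports whether the RegExp constructor accepts the given source and flags.
function isValid(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch {
    return false;
  }
}

const supportsV = isValid("a", "v");

const escapeChecks = {
  // `(` is a literal in u-mode classes but must be escaped in v-mode classes.
  parenU: isValid("[(]", "u"),
  parenV: isValid("[(]", "v"),
  escapedParenV: isValid("[\\(]", "v"),
  // `\!` is an invalid identity escape in u mode but allowed in v mode.
  escapedBangU: isValid("[\\!]", "u"),
  escapedBangV: isValid("[\\!]", "v"),
  // A single `!` is fine in v mode; the pair `!!` is reserved.
  singleBangV: isValid("[!]", "v"),
  doubleBangV: isValid("[!!]", "v"),
  doubleBangU: isValid("[!!]", "u"),
};

console.log(supportsV ? escapeChecks : "The v flag is not supported in this engine.");
```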

> **Note:** The `v` mode does not interpret grapheme clusters as single characters; they are still multiple code points. For example, `/[🇺🇳]/v` is still able to match `"🇺"`.

The set accessor of `unicodeSets` is `undefined`. You cannot change this property directly.

## Examples

### Using the unicodeSets property

```js
const regex = /[\p{Script_Extensions=Greek}&&\p{Letter}]/v;

console.log(regex.unicodeSets); // true
```

## Specifications

{{Specifications}}

## Browser compatibility

{{Compat}}

## See also

- {{jsxref("RegExp.prototype.lastIndex")}}
- {{JSxRef("RegExp.prototype.dotAll")}}
- {{JSxRef("RegExp.prototype.global")}}
- {{JSxRef("RegExp.prototype.hasIndices")}}
- {{JSxRef("RegExp.prototype.ignoreCase")}}
- {{JSxRef("RegExp.prototype.multiline")}}
- {{JSxRef("RegExp.prototype.source")}}
- {{JSxRef("RegExp.prototype.sticky")}}
- {{JSxRef("RegExp.prototype.unicode")}}
- [RegExp v flag with set notation and properties of strings](https://v8.dev/features/regexp-v-flag) on v8.dev (June 27, 2022)