@markandrus

I ran into a problem very similar to the one mentioned in #543 while adapting some of the grammars from the ECMAScript standard. For example, here is part of the grammar for IdentifierName:

# https://tc39.es/ecma262/#sec-identifier-names

IdentifierName  -> IdentifierStart               {% id %}
                |  IdentifierName IdentifierPart {% xs => xs.join('') %}
IdentifierStart -> IdentifierStartChar           {% id %}
IdentifierPart  -> IdentifierPartChar            {% id %}

IdentifierStartChar -> UnicodeIDStart    {% id %}
                    |  "$"               {% id %}
                    |  "_"               {% id %}
IdentifierPartChar  -> UnicodeIDContinue {% id %}
                    |  "$"               {% id %}
                    |  ZWNJ              {% id %}
                    |  ZWJ               {% id %}

ZWNJ -> "\u200C" {% id %}
ZWJ  -> "\u200D" {% id %}

UnicodeIDStart    -> [\p{ID_Start}]    {% id %}
UnicodeIDContinue -> [\p{ID_Continue}] {% id %}

Crucially, UnicodeIDStart and UnicodeIDContinue are defined in terms of Unicode properties. For that, the \p{ID_Start} and \p{ID_Continue} escapes have to work inside the RegExp-based charclasses, and JavaScript only recognizes them as property escapes when the RegExp has the u flag enabled.
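
To illustrate why the flag matters, here is a small sketch in plain JavaScript, independent of Nearley. It builds the same character class with and without the u flag and compares the results. The variable names are only for illustration, and it assumes a runtime with Unicode property escape support (ES2018 or later).

```javascript
// Same character class source, compiled without and with the u flag.
const withoutU = new RegExp('[\\p{ID_Start}]');
const withU = new RegExp('[\\p{ID_Start}]', 'u');

const results = {
  // Without u, \p is just an escaped "p", so the class matches only the
  // literal characters p, {, I, D, _, S, t, a, r and }.
  greekWithoutU: withoutU.test('λ'),
  braceWithoutU: withoutU.test('{'),

  // With u, \p{ID_Start} is a Unicode property escape.
  asciiLetterWithU: withU.test('a'),
  greekWithU: withU.test('λ'),
  braceWithU: withU.test('{'),

  // "$" and "_" are not ID_Start, which is why the grammar lists them separately.
  dollarWithU: withU.test('$'),
  underscoreWithU: withU.test('_'),
};

console.log(results);
```

Without the flag the pattern still compiles, so the grammar would load without error and then quietly match the wrong characters.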

I'm a very new user of Nearley, so I don't know if it's safe to turn this on for everyone, if it should be opt-in, or if it could cause other problems. What do you think? Is this useful?
