@badsyntax
Contributor

Hi there.

Background: I've been experimenting with finding and replacing Unicode words in the DOM. Traditionally one would use the word boundary metacharacter (\b) to match words, but in JavaScript that only recognises ASCII word characters. I've therefore created a regular expression that matches Unicode words, but it requires the use of capture groups.
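To illustrate the limitation, here is a minimal sketch of \b failing on a non-ASCII word, next to one possible capture-group workaround. The pattern below is an assumption for illustration only (it is not the regular expression from this pull request), and it relies on the `u` flag and `\p{...}` property escapes, which need a modern JavaScript engine.

```javascript
const sample = 'une école primaire';

// \b is defined in terms of \w, i.e. [A-Za-z0-9_]. Neither the space nor "é"
// is a \w character, so no boundary is detected before "école".
const asciiResult = /\bécole\b/.test(sample);

// Hypothetical Unicode-aware alternative: group 1 consumes the character
// before the word (or the start of the string), group 2 is the word itself.
const unicodeWord = /(^|[^\p{L}\p{N}_])(école)(?=$|[^\p{L}\p{N}_])/u;
const unicodeMatch = unicodeWord.exec(sample);

console.log(asciiResult);      // false
console.log(unicodeMatch[2]);  // "école"
```

Because the leading character is part of the overall match, only group 2 should be replaced, which is why a capture group argument is needed.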

Solution: I've modified this script slightly to support capture groups. I've added an additional argument that specifies which capture group to use in the match. This allows you to do something like:

findAndReplaceDOMText(/(TEST)hello/g, d, 'x', 1);

... where it will match 'TESThello' but only replace 'TEST'.
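As a rough sketch of the same idea on a plain string rather than the DOM: the helper `captureGroupRange` below is hypothetical (it is not the code in this pull request), and it assumes the group's text first appears inside the full match at the group's own position.

```javascript
// Hypothetical helper: returns the [start, end) range of one capture group
// within the searched text, or null if the group did not participate.
function captureGroupRange(match, groupIndex) {
  const groupText = match[groupIndex];
  if (groupText === undefined) return null;
  // Simplification: locate the group inside the full match by its text.
  const offsetInMatch = match[0].indexOf(groupText);
  const start = match.index + offsetInMatch;
  return [start, start + groupText.length];
}

const text = 'say TESThello twice: TESThello';
const regex = /(TEST)hello/g;
let result = '';
let cursor = 0;
let match;
while ((match = regex.exec(text)) !== null) {
  const [start, end] = captureGroupRange(match, 1);
  // Copy the untouched text, then substitute only the captured part.
  result += text.slice(cursor, start) + 'x';
  cursor = end;
}
result += text.slice(cursor);

console.log(result); // "say xhello twice: xhello"
```

The whole of 'TESThello' has to match, but only the range of group 1 is swapped out.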

I've had to stop using regex.lastIndex, as it doesn't work with capture groups. Instead I'm using match.index and determining the index of the capture group within the match. NOTE: This also fixes a bug when running non-greedy regexes. Previously you were using indexOf to find the index of a match:

m = text.match(regex);
index = text.indexOf(m[0]);

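... that will break if you use a word boundary in your regex. For example, with /\bat\b/ and the text 'matching at', the regex matches the standalone 'at' at index 9, but indexOf('at') returns 1, the 'at' inside 'matching'.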
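The difference is easy to reproduce on a plain string. This is only a small demonstration of the two lookups, not the library code itself:

```javascript
const text = 'matching at';
const regex = /\bat\b/;

const m = text.match(regex);

// Searching for the matched text finds the first "at", inside "matching".
const viaIndexOf = text.indexOf(m[0]);

// The match object records where the regex actually matched.
const viaMatchIndex = m.index;

console.log(viaIndexOf);    // 1
console.log(viaMatchIndex); // 9
```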

I've added tests for both capture groups and non-greedy matches that use word boundaries.

Please let me know what you think of this change.

@padolsey padolsey merged commit 983349e into padolsey:master Oct 25, 2012
@padolsey
Owner

Thanks very much for this change! And thanks for taking the time to create tests too.

Everything looks good. Merged. Also incremented version to 0.2 as it's a new feature.

@badsyntax
Contributor Author

Thank you :)
