The “unwanted characters regex” matches wanted characters #35

@remi

Hi!

I found a potential bug in the following line:

|> remove_unwanted_chars(separator, ~r/([^A-Za-z0-9가-힣])+/)

The line replaces every character that does not match A-Za-z0-9가-힣 with the separator character.

However, we found unwanted characters that this expression leaves untouched, namely the thin space (U+2009).

source = "foo\u2009bar" # This is "foo<U+2009>bar"

String.replace(source, ~r/([^a-z0-9가-힣])+/, "-")
# => "foo bar" (unchanged, U+2009 is still there)

String.replace(source, ~r/([^a-z0-9])+/, "-")
# => "foo-bar"

The behavior is the same for U+2010, U+2011, U+2012, and other characters in that block.
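
A possible explanation, offered as an assumption and not confirmed by the maintainers: without the `u` modifier the regex is compiled in byte mode, so `가-힣` is read as individual UTF-8 bytes and the class ends up containing the byte range 0x80-0xED. The three bytes of U+2009 (0xE2 0x80 0x89) all fall inside that range, so the character is treated as wanted. The sketch below compares the pattern from the issue with a variant that adds `u`; the module and function names are hypothetical and not part of the library.

```elixir
# Hypothetical reproduction module; not part of the library.
defmodule SlugRegexDemo do
  # Pattern as written in the issue: no `u` modifier, so matching is byte based.
  def byte_mode(source) do
    String.replace(source, ~r/([^a-z0-9가-힣])+/, "-")
  end

  # Same pattern with `u`: the class is interpreted as Unicode code points,
  # so 가-힣 covers only the Hangul syllables U+AC00 to U+D7A3.
  def unicode_mode(source) do
    String.replace(source, ~r/([^a-z0-9가-힣])+/u, "-")
  end
end

source = "foo\u2009bar"

# Byte mode leaves the thin space in place.
IO.inspect(SlugRegexDemo.byte_mode(source) == source)
# => true

# Unicode mode replaces it with the separator.
IO.inspect(SlugRegexDemo.unicode_mode(source))
# => "foo-bar"

# Hangul is still kept, and U+2010 is replaced.
IO.inspect(SlugRegexDemo.unicode_mode("한글\u2010slug"))
# => "한글-slug"
```

If this diagnosis is right, the call in the library could become `remove_unwanted_chars(separator, ~r/([^A-Za-z0-9가-힣])+/u)`, though that has not been checked against the library's test suite.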
