The “unwanted characters regex” matches wanted characters #35

Hi!

I found a potential bug in the following line:

```elixir
|> remove_unwanted_chars(separator, ~r/([^A-Za-z0-9가-힣])+/)
```

This line replaces every run of characters that do not match `A-Za-z0-9가-힣` with the separator.

However, we found an unwanted character that this expression does not match, so it is kept as if it were wanted: U+2009 THIN SPACE ([U+2009](https://www.fileformat.info/info/unicode/char/2009/index.htm)).

```elixir
source = "foo\u2009bar" # "foo", then U+2009 THIN SPACE, then "bar"

String.replace(source, ~r/([^a-z0-9가-힣])+/, "-")
# => "foo\u2009bar" (unchanged: the thin space is not replaced)

String.replace(source, ~r/([^a-z0-9])+/, "-")
# => "foo-bar"
```
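My guess at the cause (an assumption, not checked against the library's code): the sigil has no `u` modifier, so the pattern is compiled and matched as raw bytes instead of Unicode code points. In byte mode, the UTF-8 encodings of `가` (0xEA 0xB0 0x80) and `힣` (0xED 0x9E 0xA3) turn `가-힣` into the bytes 0xEA and 0xB0, the byte range 0x80 to 0xED, and the bytes 0x9E and 0xA3. Every byte of U+2009 (0xE2 0x80 0x89) lies inside that range. A quick check:

```elixir
# Raw UTF-8 bytes of the characters involved.
:binary.bin_to_list("가")      # => [234, 176, 128]  (0xEA 0xB0 0x80)
:binary.bin_to_list("힣")      # => [237, 158, 163]  (0xED 0x9E 0xA3)
:binary.bin_to_list("\u2009")  # => [226, 128, 137]  (0xE2 0x80 0x89)

# Without /u the class is byte based, and all three bytes of U+2009
# fall in 0x80..0xED, so a thin space is accepted as "Hangul".
Regex.match?(~r/^[가-힣]+$/, "\u2009")   # => true

# With /u the class is the code point range U+AC00..U+D7A3.
Regex.match?(~r/^[가-힣]+$/u, "\u2009")  # => false
Regex.match?(~r/^[가-힣]+$/u, "가나")     # => true
```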

The same happens with U+2010 (HYPHEN), U+2011 (NON-BREAKING HYPHEN), U+2012 (FIGURE DASH), and other characters.
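If the cause is the missing `u` (Unicode) modifier on the sigil, which is my assumption here, the fix could be as small as `~r/([^A-Za-z0-9가-힣])+/u`. The sketch below uses a hypothetical `SlugSketch.remove_unwanted_chars/3` that just calls `String.replace/3`, since I don't know the library's actual implementation, and checks the characters listed above while making sure Hangul is still kept:

```elixir
defmodule SlugSketch do
  # Hypothetical stand-in for the library's helper: replace every run of
  # characters matched by `pattern` with `separator`.
  def remove_unwanted_chars(string, separator, pattern) do
    String.replace(string, pattern, separator)
  end
end

# Same character class as the original line, plus the /u modifier.
pattern = ~r/([^A-Za-z0-9가-힣])+/u

for char <- ["\u2009", "\u2010", "\u2011", "\u2012"] do
  SlugSketch.remove_unwanted_chars("foo" <> char <> "bar", "-", pattern)
end
# => ["foo-bar", "foo-bar", "foo-bar", "foo-bar"]

# Hangul syllables (U+AC00..U+D7A3) are still kept; the ASCII space is replaced.
SlugSketch.remove_unwanted_chars("안녕 세상", "-", pattern)
# => "안녕-세상"
```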
