Description & Motivation
We would like to allow chunking on two-grapheme boundaries such as ?! (not only on single graphemes).
Pitch
Currently, the chunk_text algorithm in textsplit.py assumes that every split point is a single grapheme (! or ., for instance). We would like to remove that assumption.
Alternatives
No response
Additional context
To do this, textsplit.py and text_config.py would need to be refactored to accept strong and weak boundaries as lists. They are currently stored as strings.
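A rough sketch of what list-based boundaries could look like, to make the request concrete. This is not the current code in textsplit.py or text_config.py: the names STRONG_BOUNDARIES, WEAK_BOUNDARIES, and split_on_boundaries are hypothetical, and the real chunk_text logic likely does more (for example, chunk length limits). The only point illustrated is matching longer boundaries before shorter ones.

```python
import re

# Hypothetical config values: boundaries stored as lists of strings,
# so an entry may be longer than one grapheme.
STRONG_BOUNDARIES = ["?!", "!", "?", "."]
WEAK_BOUNDARIES = [",", ";", ":"]


def split_on_boundaries(text, boundaries):
    """Split text after each boundary, preferring longer boundaries."""
    if not boundaries:
        return [text.strip()] if text.strip() else []

    # Sort longest-first so "?!" is tried before "?" at the same position.
    ordered = sorted(boundaries, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(b) for b in ordered))

    chunks = []
    start = 0
    for match in pattern.finditer(text):
        # Keep the boundary attached to the chunk it terminates.
        chunk = text[start:match.end()].strip()
        if chunk:
            chunks.append(chunk)
        start = match.end()

    # Whatever follows the last boundary becomes the final chunk.
    tail = text[start:].strip()
    if tail:
        chunks.append(tail)
    return chunks


print(split_on_boundaries("Really?! Yes. Fine", STRONG_BOUNDARIES))
```

With "?!" in the list, "Really?!" stays together as one chunk. With only the single graphemes "?" and "!", the same input would be cut between the two marks, which is the behavior this issue asks to change.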