Fix: validate regex pattern in split_with_pattern to prevent crash #12633
Merged
KevinHuSh merged 1 commit into infiniflow:main on Jan 15, 2026
…nfiniflow#12605) Add a try-except block to handle invalid user-provided regex patterns gracefully. Instead of crashing with re.error, the function now logs a warning and falls back to returning the content as a single chunk.
@KevinHuSh @yongtenglei, could you review my PR?
What problem does this PR solve?
Fix regex pattern validation in split_with_pattern (#12605)
Problem
Parsing DOCX files with custom regex delimiters crashes with re.error: nothing to repeat at position 9 when users provide invalid regex patterns.

Closes #12605
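For reference, Python's re module raises this error whenever a quantifier such as * or + has nothing before it to repeat. The sketch below compiles a pattern and reports the parser message and offset instead of letting re.error escape; the helper name describe_pattern_error and the sample patterns are illustrative assumptions, not code from this PR.

```python
import re
from typing import Optional


def describe_pattern_error(pattern: str) -> Optional[str]:
    """Return a readable description of why a regex is invalid, or None if it compiles.

    Hypothetical helper for illustration only; not part of RAGFlow.
    """
    try:
        re.compile(pattern)
    except re.error as err:
        # re.error exposes the bare parser message (msg) and the offset
        # of the offending character in the pattern (pos).
        return f"{err.msg} at position {err.pos}"
    return None


# A leading "*" has no preceding item to repeat, so compilation fails.
print(describe_pattern_error("*"))        # nothing to repeat at position 0
# A well-formed delimiter (two or more newlines) compiles fine.
print(describe_pattern_error(r"\n\n+"))   # None
```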
Solution
Validate and compile the regex pattern before use. If the pattern is invalid, log a warning and return the content as a single chunk instead of crashing.
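A minimal sketch of this validate-then-fall-back approach, assuming a simplified standalone signature (content, pattern). The name split_with_pattern_sketch and the choice to drop empty pieces are illustrative assumptions; this is not the actual split_with_pattern code in rag/nlp/__init__.py.

```python
import logging
import re
from typing import List

logger = logging.getLogger(__name__)


def split_with_pattern_sketch(content: str, pattern: str) -> List[str]:
    """Split content on a user-supplied regex, falling back to a single chunk.

    Hypothetical simplified helper; RAGFlow's real function has its own signature.
    """
    try:
        # Compile up front so an invalid pattern is caught before any splitting.
        regex = re.compile(pattern)
    except re.error as err:
        logger.warning(
            "Invalid split pattern %r: %s; returning content as a single chunk",
            pattern,
            err,
        )
        return [content]

    # re.split yields None for capturing groups that did not participate in a
    # match, and empty strings around adjacent delimiters; drop both (assumption).
    return [piece for piece in regex.split(content) if piece and piece.strip()]


print(split_with_pattern_sketch("intro\n\nbody\n\nend", r"\n\n+"))  # ['intro', 'body', 'end']
print(split_with_pattern_sketch("a*b", "*"))                        # ['a*b']
```

Compiling once serves double duty here: it validates the pattern and produces a reusable compiled object for the split. Returning [content] on failure keeps downstream chunking working with one large chunk rather than aborting the whole parse.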
Changes
rag/nlp/__init__.py: Add regex validation in the split_with_pattern() function

Type of change
Contribution by Gittensor; see my contribution statistics at https://gittensor.io/miners/details?githubId=42954461