Fix: validate regex pattern in split_with_pattern to prevent crash #12633

Merged
KevinHuSh merged 1 commit into infiniflow:main from eureka928:fix/validate-regex-delimiter-pattern
Jan 15, 2026

Conversation

eureka928 (Contributor) commented Jan 15, 2026

What problem does this PR solve?

Fix regex pattern validation in split_with_pattern (#12605)

  • Add try-except block to validate user-provided regex patterns before use
  • Gracefully fall back to a single chunk when an invalid regex is provided
  • Prevent server crash during DOCX parsing with malformed delimiters

Problem

Parsing DOCX files with custom regex delimiters crashes with re.error: nothing to repeat at position 9 when users provide invalid regex patterns.

Closes #12605
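
The failure mode described above can be reproduced in isolation with Python's `re` module. This is only a sketch: the delimiter below is a hypothetical invalid pattern chosen for illustration, not the one from the original bug report, so the reported position will differ from the one in the issue.

```python
import re

# Hypothetical invalid delimiter: a quantifier with nothing before it to repeat.
bad_delimiter = "*abc"

try:
    re.compile(bad_delimiter)
    error_message = None
except re.error as exc:
    # Without this handler the exception would propagate to the caller.
    error_message = str(exc)

print(error_message)
```

The printed message should contain "nothing to repeat", the same class of error reported for the DOCX parsing crash.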

Solution

Validate and compile the regex pattern before use. On an invalid pattern, log a warning and return the content as a single chunk instead of crashing.
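
The approach can be sketched roughly as follows. This is not the actual diff from rag/nlp/__init__.py: the helper name `split_or_single_chunk`, its signature, and the log wording are assumptions made for illustration, and the real `split_with_pattern()` may take different arguments and return a different structure.

```python
import logging
import re

logger = logging.getLogger(__name__)


def split_or_single_chunk(content: str, pattern: str) -> list[str]:
    """Hypothetical helper: split content on a user-supplied regex delimiter."""
    try:
        # Compiling up front surfaces malformed patterns before any splitting.
        compiled = re.compile(pattern)
    except re.error as exc:
        # Fallback: keep the whole content as one chunk instead of raising.
        logger.warning("Invalid delimiter pattern %r: %s", pattern, exc)
        return [content]
    # Drop empty or whitespace-only pieces (and None from unmatched groups).
    return [piece for piece in compiled.split(content) if piece and piece.strip()]


valid_result = split_or_single_chunk("a;b;c", ";")
fallback_result = split_or_single_chunk("a;b;c", "*bad")
```

With a valid delimiter the content is split as usual; with the malformed pattern the call returns the original content as a single-element list and only a warning is logged.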

Changes

  • rag/nlp/__init__.py: Add regex validation in split_with_pattern() function

Type of change

  • Bug Fix (non-breaking change which fixes an issue)

Contribution by Gittensor, see my contribution statistics at https://gittensor.io/miners/details?githubId=42954461

…nfiniflow#12605)

Add try-except block to handle invalid user-provided regex patterns gracefully.
Instead of crashing with re.error, the function now logs a warning and falls
back to returning the content as a single chunk.
dosubot bot added the size:S and 🐞 bug labels Jan 15, 2026

eureka928 (Contributor, Author) commented

@KevinHuSh @yongtenglei would you review my PR?
Thank you

KevinHuSh added the ci label Jan 15, 2026
KevinHuSh marked this pull request as draft January 15, 2026 05:47
KevinHuSh marked this pull request as ready for review January 15, 2026 05:47
KevinHuSh merged commit d8192f8 into infiniflow:main Jan 15, 2026
1 check passed
Labels

🐞 bug, ci, size:S

Development

Successfully merging this pull request may close these issues.

[Bug]: Parse docx error!