UPSTREAM PR #17786: fix: prevent segfault in tokenizer on highly repetitive input #452

Open
loci-dev wants to merge 1 commit into main from upstream-PR17786-branch_ServeurpersoCom-fix/regex-nosubs-no-segfault

@loci-dev loci-dev commented Dec 5, 2025

Mirrored from ggml-org/llama.cpp#17786

Add nosubs|optimize flags to std::regex constructors to prevent catastrophic backtracking when processing prompts with repeated identical characters (e.g., 'A' * 10000).

The nosubs flag disables capture of sub-expressions, which significantly reduces memory usage and backtracking on uniform token sequences.
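
As a rough illustration of the flags named above, here is a minimal sketch of constructing a `std::regex` with `nosubs | optimize`. It is not the actual change in this PR: the helper names `count_matches` and `mark_count_with` are hypothetical, and the patterns are placeholders rather than the tokenizer's real pre-tokenization expressions.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not from llama.cpp): counts non-overlapping matches of
// `pattern` in `text`, with the regex compiled using nosubs | optimize.
std::size_t count_matches(const std::string & text, const std::string & pattern) {
    const std::regex re(pattern,
                        std::regex::ECMAScript | std::regex::nosubs | std::regex::optimize);
    std::size_t n = 0;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        ++n;
    }
    return n;
}

// Hypothetical helper: reports how many capture groups the compiled regex records.
// With nosubs, every "(...)" is treated as a non-capturing group.
unsigned mark_count_with(const std::string & pattern, std::regex::flag_type flags) {
    return std::regex(pattern, flags).mark_count();
}
```

Since only whole-match boundaries are needed when splitting text, dropping the capture groups does not change which spans are matched; `mark_count_with("(a)(b)", std::regex::ECMAScript | std::regex::nosubs)` reports zero groups, whereas the same pattern without `nosubs` reports two.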

Before:

/root/llama.cpp.pascal/build/bin/llama-server --port 8088 -m /var/www/ia/models/lmstudio-community/gpt-oss-20b-GGUF/gpt-oss-20b-MXFP4.gguf

You are a helpful assistant<|end|><|start|>user<|message|>Hello<|end|><|start|>assistant<|channel|>final<|message|>Hi there<|end|><|start|>user<|message|>How are you?<|end|><|start|>assistant'
main: model loaded
main: server is listening on http://127.0.0.1:8088
main: starting the main loop...
srv  update_slots: all slots are idle
Segmentation fault

After:

(root|~/llama.cpp.pascal) curl -X POST http://localhost:8088/v1/chat/completions   -H "Content-Type: application/json"   -d '{"messages":[{"role":"user","content":"'"$(python3 -c "print('A'*10000)")"' Say OK"}]}'
{"choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","reasoning_content":"The user typed a long string of \"A\" and then says \"Say OK\". So they likely want the assistant to respond with \"OK\". The instruction: \"Say OK\". So we just reply \"OK\". But also must obey system instruction: We should not mention policy. Just reply \"OK\".","content":"OK"}}],"created":1764928092,"model":"gpt-oss-20b-MXFP4.gguf","system_fingerprint":"b7321-147310d71","object":"chat.completion","usage":{"completion_tokens":72,"prompt_tokens":1319,"total_tokens":1391},"id":"chatcmpl-i1GMuuGvb2X3irH73aTbLP0ZkwBYbJdf","timings":{"cache_n":0,"prompt_n":1319,"prompt_ms":171.43,"prompt_per_token_ms":0.12996967399545112,"prompt_per_second":7694.102549145423,"predicted_n":72,"predicted_ms":199.156,"predicted_per_token_ms":2.7660555555555555,"predicted_per_second":361.52563819317515}}

Close #17636
