fix: handle non-capturing groups (?:...) in JSON schema pattern converter #21124
Merged
pwilkin merged 1 commit into ggml-org:master on Mar 28, 2026
Conversation
Contributor
Can you add a test to …
Force-pushed 42b7e0d to dd6d5f7

fix: handle non-capturing groups (?:...) in JSON schema pattern converter
The regex-to-grammar converter in _visit_pattern() crashes with SIGSEGV
when a JSON schema "pattern" field contains a non-capturing group (?:...).
Root cause: when the parser sees '(' followed by '?', it pushes a warning
but does not advance the index past '?:'. The recursive transform() call
then interprets the '?' as a quantifier and calls seq.back() on an empty
vector, which is undefined behavior.
This commonly occurs when serving OpenAI-compatible tool calls from
clients that include complex regex patterns in their JSON schemas (e.g.,
date validation patterns like ^(?:(?:\d\d[2468][048]|...)-02-29|...)$).
The fix:
- Skip '?:' after '(' to treat non-capturing groups as regular groups
- For unsupported syntax (?=, ?!, etc.), skip to matching ')' safely,
handling escaped characters to avoid miscounting parenthesis depth
- Rewrite the ')' unbalanced-parentheses check to use direct character
  comparisons instead of substr
- Add test cases for non-capturing groups (C++ only, as the JS/Python
implementations do not yet support this syntax)
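The skipping logic described above can be sketched as a small standalone helper. This is not the actual llama.cpp code; `skip_group` is a hypothetical function, simplified to show only the index-advancing behavior the fix adds (skip `?:`, and skip unsupported `(?=`/`(?!` syntax to its matching `)` while honoring escapes):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical sketch of the '(' handling: returns the index just past the
// group's matching ')'. Non-capturing groups are normalized to regular
// groups; unsupported lookaround syntax is skipped over entirely.
static size_t skip_group(const std::string & pattern, size_t i /* index of '(' */) {
    size_t j = i + 1;
    if (j + 1 < pattern.size() && pattern[j] == '?' && pattern[j + 1] == ':') {
        j += 2; // non-capturing group: treat like a regular group
    } else if (j < pattern.size() && pattern[j] == '?') {
        // unsupported syntax (?=, ?!, ...): skip to the matching ')',
        // stepping over escaped characters so depth is counted correctly
        int depth = 1;
        for (++j; j < pattern.size() && depth > 0; ++j) {
            if (pattern[j] == '\\') { ++j; continue; } // skip escaped char
            if (pattern[j] == '(')      ++depth;
            else if (pattern[j] == ')') --depth;
        }
        if (depth != 0) throw std::runtime_error("unbalanced parentheses");
        return j;
    }
    // regular (or now-normalized) group: scan to its matching ')'
    int depth = 1;
    for (; j < pattern.size() && depth > 0; ++j) {
        if (pattern[j] == '\\') { ++j; continue; }
        if (pattern[j] == '(')      ++depth;
        else if (pattern[j] == ')') --depth;
    }
    if (depth != 0) throw std::runtime_error("unbalanced parentheses");
    return j;
}
```

The key point is that both branches always move the index forward, so the recursive call can never see a leading '?' and dereference an empty sequence.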
Force-pushed dd6d5f7 to 0b7e5f1
pwilkin approved these changes on Mar 28, 2026
aldehir approved these changes on Mar 28, 2026
icex added a commit to icex/llama.cpp that referenced this pull request on Apr 5, 2026
Includes:
- fix: handle non-capturing groups (?:...) in JSON schema pattern converter (ggml-org#21124)
- memory: respect unified KV cache in hybrid memory for eval tasks (ggml-org#21224)
- fix: CUDA FA kernel selection, head dimension 512 support
- rotate activations for better quantization (ggml-org#21038)
- Various parser, jinja, webui, and CI fixes

Conflicts resolved:
- llama-kv-cache.cpp: keep TurboQuant InnerQ stubs + upstream Hadamard helpers
- llama-graph.cpp: keep TurboQuant V-padding + upstream self_v_rot
- fattn-tile.cu: add upstream D=512 before TurboQuant HIP guard
- fattn.cu: combine D=512 (upstream) + D=640 (TurboQuant) exclusions
Summary
Fix a SIGSEGV in `_visit_pattern()` when a JSON schema `pattern` contains non-capturing groups `(?:...)`. Skip `?:` after `(` to treat non-capturing groups as regular groups.

Root cause
In `common/json-schema-to-grammar.cpp`, `_visit_pattern()` (line ~420): when the parser sees `(` followed by `?`, it pushes a warning but does not advance `i` past `?:`. The recursive `transform()` call then interprets `?` as a quantifier and calls `seq.back()` on an empty vector (undefined behavior, SIGSEGV).

Reproducer
Start `llama-server` with any model, then send a completion request whose JSON schema includes a `pattern` containing `(?:...)`. The server crashes with SIGSEGV before returning a response. With this fix, it returns HTTP 200 correctly.
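The actual reproducer request is elided from this page. An illustrative request body of the shape that triggers the crash might look like the following; the field names follow the OpenAI-compatible `response_format` convention, and the specific model name and pattern are assumptions, not taken from the PR:

```json
{
  "model": "any-model",
  "messages": [{ "role": "user", "content": "Give me a date." }],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "date",
      "schema": {
        "type": "object",
        "properties": {
          "date": {
            "type": "string",
            "pattern": "^(?:\\d{4})-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\\d|3[01])$"
          }
        }
      }
    }
  }
}
```

Any `pattern` containing `(?:` is enough to reach the unadvanced-index path in the converter.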
This affects real-world clients (e.g. Claude Code with Notion MCP tools) that send tool schemas containing date-validation patterns like `^(?:(?:\d\d[2468][048]|...)-02-29|...)$`.
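As a sanity check that such patterns are ordinary regex syntax (and only the converter mishandled them), here is a small sketch using `std::regex`. The PR's full leap-year pattern is elided above, so the simplified date-shape pattern below is purely illustrative:

```cpp
#include <regex>
#include <string>

// Illustrative only: a date-shaped pattern built from non-capturing groups.
// The actual pattern from the bug report is longer and elided above.
static bool is_date_shaped(const std::string & s) {
    static const std::regex date(
        R"(^(?:\d{4})-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])$)");
    return std::regex_match(s, date);
}
```

std::regex (ECMAScript grammar) accepts `(?:...)` natively, which is why clients reasonably emit such patterns in their schemas.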