
fix: handle non-capturing groups (?:...) in JSON schema pattern converter#21124

Merged
pwilkin merged 1 commit into ggml-org:master from XciD:fix/visit-pattern-non-capturing-group on Mar 28, 2026

Conversation

@XciD XciD (Contributor) commented Mar 28, 2026

Summary

  • Fix SIGSEGV in _visit_pattern() when a JSON schema pattern contains non-capturing groups (?:...)
  • Skip ?: after ( to treat non-capturing groups as regular groups
  • Safely handle unsupported lookahead/lookbehind syntax by skipping to matching )

Root cause

In common/json-schema-to-grammar.cpp, _visit_pattern() (around line 420): when the parser sees ( followed by ?, it pushes a warning but does not advance i past the ?:. The recursive transform() call then interprets the ? as a quantifier and calls seq.back() on an empty vector, which is undefined behavior and manifests as a SIGSEGV.

Reproducer

Start llama-server with any model, then:

curl -X POST http://127.0.0.1:8080/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: dummy" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"any","max_tokens":50,"messages":[{"role":"user","content":"hi"}],
  "tools":[{"name":"t","description":"t","input_schema":{"type":"object",
  "properties":{"arr":{"type":"array","items":{"anyOf":[{"type":"object",
  "properties":{"d":{"type":"string","pattern":"^(?:foo)$"}},
  "required":["d"]}]}}}}}],"tool_choice":{"type":"auto"}}'

The server crashes with SIGSEGV before returning a response. With this fix, it returns HTTP 200 correctly.

This affects real-world clients (e.g. Claude Code with Notion MCP tools) that send tool schemas containing date validation patterns like ^(?:(?:\d\d[2468][048]|...)-02-29|...)$.

@XciD XciD requested a review from a team as a code owner March 28, 2026 15:55
@aldehir aldehir (Contributor) commented Mar 28, 2026

Can you add a test to tests/test-json-schema-to-grammar.cpp?

@XciD XciD force-pushed the fix/visit-pattern-non-capturing-group branch from 42b7e0d to dd6d5f7 on March 28, 2026 15:59
@XciD XciD requested a review from ggerganov as a code owner March 28, 2026 15:59
@github-actions github-actions bot added the testing Everything test related label Mar 28, 2026
fix: handle non-capturing groups (?:...) in JSON schema pattern converter

The regex-to-grammar converter in _visit_pattern() crashes with SIGSEGV
when a JSON schema "pattern" field contains a non-capturing group (?:...).

Root cause: when the parser sees '(' followed by '?', it pushes a warning
but does not advance past '?:'. The recursive transform() call then
interprets '?' as a quantifier and calls seq.back() on an empty vector,
causing undefined behavior.

This commonly occurs when serving OpenAI-compatible tool calls from
clients that include complex regex patterns in their JSON schemas (e.g.,
date validation patterns like ^(?:(?:\d\d[2468][048]|...)-02-29|...)$).

The fix:
- Skip '?:' after '(' to treat non-capturing groups as regular groups
- For unsupported syntax (?=, ?!, etc.), skip to matching ')' safely,
  handling escaped characters to avoid miscounting parenthesis depth
- Adjust the ')' unbalanced-parentheses check using direct char
  comparisons instead of substr
- Add test cases for non-capturing groups (C++ only, as the JS/Python
  implementations do not yet support this syntax)
@XciD XciD force-pushed the fix/visit-pattern-non-capturing-group branch from dd6d5f7 to 0b7e5f1 on March 28, 2026 16:02
@pwilkin pwilkin merged commit e397d38 into ggml-org:master Mar 28, 2026
44 of 45 checks passed
icex added a commit to icex/llama.cpp that referenced this pull request Apr 5, 2026
Includes:
- fix: handle non-capturing groups (?:...) in JSON schema pattern converter (ggml-org#21124)
- memory: respect unified KV cache in hybrid memory for eval tasks (ggml-org#21224)
- fix: CUDA FA kernel selection, head dimension 512 support
- rotate activations for better quantization (ggml-org#21038)
- Various parser, jinja, webui, and CI fixes

Conflicts resolved:
- llama-kv-cache.cpp: keep TurboQuant InnerQ stubs + upstream Hadamard helpers
- llama-graph.cpp: keep TurboQuant V-padding + upstream self_v_rot
- fattn-tile.cu: add upstream D=512 before TurboQuant HIP guard
- fattn.cu: combine D=512 (upstream) + D=640 (TurboQuant) exclusions

3 participants