common : restore grammar-based rejection sampling by ggerganov · Pull Request #18137 · ggml-org/llama.cpp

ggerganov · 2025-12-17T12:15:48Z

cont #17937
ref #18107 (comment)
rel #18135

I keep underestimating how much the grammar sampling is used and how slow it is. For now, let's restore the old behaviour, and I'll try to figure out a way to adapt #17004 to this.
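For readers unfamiliar with the term, the general idea of grammar-based rejection sampling can be sketched as follows. This is a minimal illustration, not the llama.cpp code: `sample_from`, `sample_with_grammar`, and the `grammar_accepts` callback are hypothetical names, and the sampler is a plain softmax rather than a full sampler chain.

```python
import math
import random


def sample_from(logits, rng):
    """Draw one token id from {token_id: logit} using a softmax distribution."""
    ids = list(logits)
    peak = max(logits[i] for i in ids)
    weights = [math.exp(logits[i] - peak) for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


def sample_with_grammar(logits, grammar_accepts, rng):
    """Rejection-style sampling: check the grammar only when necessary.

    grammar_accepts(token_id) -> bool is a hypothetical callback standing in
    for the (expensive) grammar check.
    """
    # Fast path: sample first, then test just the one sampled token.
    tok = sample_from(logits, rng)
    if grammar_accepts(tok):
        return tok

    # Slow path: the sampled token was rejected, so filter every candidate
    # through the grammar and resample from the survivors.
    allowed = {t: l for t, l in logits.items() if grammar_accepts(t)}
    if not allowed:
        raise ValueError("grammar rejects every candidate token")
    return sample_from(allowed, rng)
```

The point of the fast path is that, when the model already tends to produce grammar-conforming tokens, the grammar is consulted for a single token per step instead of for the whole vocabulary.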

aldehir · 2025-12-17T12:37:13Z

Works as it did before.

I'll look into other constrained decoding implementations. It does seem like a challenging problem to solve, though.

pwilkin · 2025-12-17T13:03:15Z

Dumb idea - what if we precomputed, for each grammar state, the entire set of accepted / rejected tokens?

aldehir · 2025-12-17T13:08:24Z

@pwilkin it's not dumb, I was considering this too but doing it lazily by caching the result.

The tricky part is handling UTF-8 codepoints that span tokens, so you'd need to treat the partially decoded codepoint as part of the state as well.
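The lazy-caching variant of this idea could look roughly like the sketch below. Everything here is an assumption made for illustration: `ToyGrammar` is a tiny character-level automaton standing in for a real grammar, whose state is considerably richer, and `CachedMask` is a hypothetical name. Tokens are plain strings, so the multi-token UTF-8 issue is deliberately ignored.

```python
class ToyGrammar:
    """Hypothetical stand-in for a grammar: a small automaton over characters."""

    def __init__(self, transitions):
        # transitions maps (state, char) -> next_state
        self.transitions = transitions

    def advance(self, state, text):
        """Return the state reached after consuming text, or None if rejected."""
        for ch in text:
            state = self.transitions.get((state, ch))
            if state is None:
                return None
        return state


class CachedMask:
    """Computes the accepted-token set per grammar state on first use only."""

    def __init__(self, grammar, vocab):
        self.grammar = grammar
        self.vocab = vocab      # list of token strings; index = token id
        self.cache = {}         # state -> frozenset of accepted token ids
        self.misses = 0         # number of full-vocabulary scans performed

    def accepted(self, state):
        if state not in self.cache:
            # Cache miss: scan the whole vocabulary once for this state.
            self.misses += 1
            self.cache[state] = frozenset(
                tok_id
                for tok_id, text in enumerate(self.vocab)
                if self.grammar.advance(state, text) is not None
            )
        return self.cache[state]
```

With a grammar that alternates `a` and `b`, for example `ToyGrammar({(0, "a"): 1, (1, "b"): 0})`, each of the two states triggers one vocabulary scan and every later lookup is a dictionary hit. Whether this pays off in practice depends on how often grammar states repeat, which this sketch does not address.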

pwilkin · 2025-12-17T13:17:06Z

Fair point, but I think that can be optimized this way:
-> for most cases, you can wildcard the Unicode codepoints (in the best case, you can already reject based on the first codepoint if all characters from that codepoint are rejected)
-> for the cases where it matters, you can introduce pseudo-states, i.e. pairs of <grammar state, codepoints so far>. This will introduce extra memory overhead, but since most codepoints can be optimized away as in the point above, it shouldn't matter that much
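One way to picture the pseudo-state idea is to carry the undecoded trailing bytes alongside the grammar state, as in the sketch below. The names `step` and `advance` are hypothetical, the grammar is reduced to a per-character callback, and only the pseudo-state bookkeeping is shown: the wildcard / early-rejection optimization from the first point is not implemented, so an incomplete codepoint is always accepted optimistically.

```python
import codecs


def step(pseudo_state, token_bytes, advance):
    """Advance a (grammar_state, pending_bytes) pair by one token.

    advance(state, ch) -> next state or None is a hypothetical per-character
    grammar callback. Returns the new pair, or None if the token is rejected.
    """
    state, pending = pseudo_state
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # Decode as many complete codepoints as possible; an incomplete
        # trailing sequence stays buffered inside the decoder.
        text = decoder.decode(pending + token_bytes, final=False)
    except UnicodeDecodeError:
        return None  # bytes can never form valid UTF-8

    for ch in text:
        state = advance(state, ch)
        if state is None:
            return None  # grammar rejects this character

    # The decoder's buffered bytes become the "codepoints so far" component.
    return (state, decoder.getstate()[0])
```

Because the returned pair is hashable, it could serve directly as the key of a per-state cache of accepted tokens. For instance, with a grammar that only allows `é`, the token `b"\xc3"` leads to the pseudo-state `(0, b"\xc3")`, from which `b"\xa9"` is accepted and `b"\xb1"` (which would complete `ñ`) is rejected.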

* common : restart grammar-based rejection sampling
* sampling : allow null samplers

common : restart grammar-based rejection sampling

746c47e

ggerganov changed the title ~~common : restart grammar-based rejection sampling~~ common : restore grammar-based rejection sampling Dec 17, 2025

ggerganov requested a review from aldehir December 17, 2025 12:16

aldehir approved these changes Dec 17, 2025

sampling : allow null samplers

c3c1300

loci-dev mentioned this pull request Dec 17, 2025

UPSTREAM PR #18137: common : restore grammar-based rejection sampling auroralabs-loci/llama.cpp#607

Open

github-actions bot added the examples label Dec 17, 2025

ggerganov merged commit 4301e27 into master Dec 17, 2025
64 of 71 checks passed

ggerganov deleted the gg/common-restore-grammar branch December 17, 2025 17:46

Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026

common : restore grammar-based rejection sampling (ggml-org#18137)

8511415

* common : restart grammar-based rejection sampling
* sampling : allow null samplers

blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026

common : restore grammar-based rejection sampling (#18137)

baa2a66

* common : restart grammar-based rejection sampling
* sampling : allow null samplers

wallentri88 mentioned this pull request Feb 24, 2026

Eval bug: qwen35 and qwen35moe graph split issues (Severe PP impact, crashes) #19864

Closed

ggerganov merged 2 commits into master from gg/common-restore-grammar
