tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice by angt · Pull Request #19954 · ggml-org/llama.cpp

angt · 2026-02-27T14:37:17Z

llama-perplexity -hf unsloth/Qwen3-0.6B-GGUF:Q4_K_M -f winogrande-debiased-eval.csv --winogrande

winogrande_score : tokenizing selected tasks
winogrande_score : calculating winogrande score over selected tasks.
split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag)
decode: failed to find a memory slot for batch of size 46
failed to decode the batch, n_batch = 2048, ret = 1
winogrande_score: llama_decode() failed

same for hellaswag:

split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag)
decode: failed to find a memory slot for batch of size 99
failed to decode the batch, n_batch = 2048, ret = 1
hellaswag_score: llama_decode() failed

…choice llama-perplexity -hf unsloth/Qwen3-0.6B-GGUF:Q4_K_M -f winogrande-debiased-eval.csv --winogrande winogrande_score : tokenizing selected tasks winogrande_score : calculating winogrande score over selected tasks. split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag) decode: failed to find a memory slot for batch of size 46 failed to decode the batch, n_batch = 2048, ret = 1 winogrande_score: llama_decode() failed same for hellaswag: split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag) decode: failed to find a memory slot for batch of size 99 failed to decode the batch, n_batch = 2048, ret = 1 hellaswag_score: llama_decode() failed Signed-off-by: Adrien Gallouët <[email protected]>

angt · 2026-03-13T20:05:38Z

cc @ggerganov
Maybe you prefer to avoid this change but we just add -kvu in the example?

ggerganov · 2026-03-13T20:11:45Z

Think it is OK as proposed.

…choice (ggml-org#19954) llama-perplexity -hf unsloth/Qwen3-0.6B-GGUF:Q4_K_M -f winogrande-debiased-eval.csv --winogrande winogrande_score : tokenizing selected tasks winogrande_score : calculating winogrande score over selected tasks. split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag) decode: failed to find a memory slot for batch of size 46 failed to decode the batch, n_batch = 2048, ret = 1 winogrande_score: llama_decode() failed same for hellaswag: split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag) decode: failed to find a memory slot for batch of size 99 failed to decode the batch, n_batch = 2048, ret = 1 hellaswag_score: llama_decode() failed Signed-off-by: Adrien Gallouët <[email protected]>

The hybrid memory paths (`llama-memory-hybrid.cpp` and `llama-memory-hybrid-iswa.cpp`) always used sequential equal split, ignoring the unified KV cache flag. This caused hellaswag, winogrande, and multiple-choice evaluations to fail on hybrid models (models with both attention and recurrent/SSM layers, such as Qwen3.5-35B-A3B) with: split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag) PR ggml-org#19954 fixed this for `llama-kv-cache-iswa.cpp` by automatically enabling unified KV mode and setting n_parallel >= 4 for multi-choice eval tasks. However, the hybrid memory paths were not updated. This commit mirrors the iswa fix: use non-sequential split when KV cache is unified (n_stream == 1), which is automatically set by llama-perplexity for hellaswag/winogrande/multiple-choice since ggml-org#19954. Tested on Qwen3.5-35B-A3B (hybrid attention+SSM MoE model): - HellaSwag: 83.0% (400 tasks) - Winogrande: 74.5% (400 tasks) - MMLU: 41.2% - ARC-Challenge: 56.2% - TruthfulQA: 37.7% All previously failed with llama_decode() error.

) The hybrid memory paths (`llama-memory-hybrid.cpp` and `llama-memory-hybrid-iswa.cpp`) always used sequential equal split, ignoring the unified KV cache flag. This caused hellaswag, winogrande, and multiple-choice evaluations to fail on hybrid models (models with both attention and recurrent/SSM layers, such as Qwen3.5-35B-A3B) with: split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag) PR #19954 fixed this for `llama-kv-cache-iswa.cpp` by automatically enabling unified KV mode and setting n_parallel >= 4 for multi-choice eval tasks. However, the hybrid memory paths were not updated. This commit mirrors the iswa fix: use non-sequential split when KV cache is unified (n_stream == 1), which is automatically set by llama-perplexity for hellaswag/winogrande/multiple-choice since #19954. Tested on Qwen3.5-35B-A3B (hybrid attention+SSM MoE model): - HellaSwag: 83.0% (400 tasks) - Winogrande: 74.5% (400 tasks) - MMLU: 41.2% - ARC-Challenge: 56.2% - TruthfulQA: 37.7% All previously failed with llama_decode() error.

angt requested a review from ggerganov as a code owner February 27, 2026 14:37

github-actions bot added the examples label Feb 27, 2026

angt changed the title ~~tools : enable kvu in perplexity for hellaswag, winogrande, multiple-…~~ tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice Feb 27, 2026

ggerganov approved these changes Mar 13, 2026

View reviewed changes

angt merged commit 463b6a9 into ggml-org:master Mar 13, 2026
77 of 78 checks passed

mudler mentioned this pull request Mar 31, 2026

memory: respect unified KV cache in hybrid memory for eval tasks #21224

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice#19954

tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice#19954
angt merged 1 commit intoggml-org:masterfrom
angt:tools-enable-kvu-in-perplexity-for-hellaswag-winogrande-multiple-choice

angt commented Feb 27, 2026 •

edited

Loading

Uh oh!

angt commented Mar 13, 2026

Uh oh!

ggerganov commented Mar 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

angt commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

angt commented Mar 13, 2026

Uh oh!

ggerganov commented Mar 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

angt commented Feb 27, 2026 •

edited

Loading