vulkan: fix noncontig check for mat_mul_id splitting #14683
0cc4m merged 2 commits into ggml-org:master
Conversation
Remove supports_op check for > 4096 (splitting fixes this)
```cpp
return
    tensor->nb[0] == ggml_type_size(tensor->type) &&
    tensor->nb[1] == (tensor->nb[0]*tensor->ne[0])/ggml_blck_size(tensor->type) &&
    tensor->nb[3] == tensor->nb[2]*tensor->ne[2];
```
@0cc4m do you recall why there is a check for dim3 here at all? Based on the function name it seems like it should only care about dims 0 and 1.
Yeah, it should. I'm not 100% sure, but it was maybe related to multiple mul_mat calls or broadcasting. When this was written the mul_mat shader handled only the first two dimensions and was called multiple times to do the other dimensions.
If I remove the last part of the check, there are some failures in mul_mat tests. Maybe worth looking into, but I think this change is OK for now.
Probably because it falls back to dequant to fp16 + matmul in a few cases due to the third check.
I found that this was hitting the dequant path in mul_mat and was only dequantizing the first batch. The most recent commit fixes this. I can still see some failures in IQ quants if I force this path, but those happen even when the batch dimension is 1.
* vulkan: fix noncontig check for mat_mul_id splitting
  Remove supports_op check for > 4096 (splitting fixes this)
* vulkan: fix batched matmul dequant for Q*_K
Reported at ikawrakow/ik_llama.cpp#608 (comment); this PR uses a different fix.
I'm still seeing flash attention fail with this model, but I'll look into that separately.