vulkan: fix noncontig check for mat_mul_id splitting #14683

Merged
0cc4m merged 2 commits into ggml-org:master from jeffbolznv:mul_mat_id_contig
Jul 15, 2025
Conversation

@jeffbolznv
Contributor

Reported at ikawrakow/ik_llama.cpp#608 (comment), but a different fix.

I'm still seeing flash attention fail with this model, but I'll look into that separately.

Remove supports_op check for > 4096 (splitting fixes this)
@jeffbolznv jeffbolznv requested a review from 0cc4m July 14, 2025 21:43
    return
        tensor->nb[0] == ggml_type_size(tensor->type) &&
        tensor->nb[1] == (tensor->nb[0]*tensor->ne[0])/ggml_blck_size(tensor->type) &&
        tensor->nb[3] == tensor->nb[2]*tensor->ne[2];
@jeffbolznv
Contributor Author

@0cc4m do you recall why there is a check for dim3 here at all? Based on the function name, it seems like it should only care about dims 0 and 1.

@0cc4m
Contributor

Yeah, it should. I'm not 100% sure, but it was maybe related to multiple mul_mat calls or broadcasting. When this was written, the mul_mat shader handled only the first two dimensions and was called multiple times to cover the others.

@jeffbolznv
Contributor Author

If I remove the last part of the check, there are some failures in the mul_mat tests. Maybe worth looking into, but I think this change is OK for now.

@0cc4m
Contributor

Probably because it falls back to dequantizing to fp16 + matmul in a few cases due to the third check.

@github-actions github-actions bot added Vulkan Issues specific to the Vulkan backend ggml changes relating to the ggml tensor library for machine learning labels Jul 14, 2025
@jeffbolznv
Contributor Author

> I'm still seeing flash attention fail with this model, but I'll look into that separately.

I found that this was hitting the dequant path in mul_mat, which was only dequantizing the first batch. The most recent commit fixes this. I can still see some failures in IQ quants if I force this path, but those happen even when the batch dimension is 1.

Contributor

@0cc4m 0cc4m left a comment


LGTM

@0cc4m 0cc4m merged commit ba1ceb3 into ggml-org:master Jul 15, 2025
44 of 48 checks passed
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
* vulkan: fix noncontig check for mat_mul_id splitting

Remove supports_op check for > 4096 (splitting fixes this)

* vulkan: fix batched matmul dequant for Q*_K