UPSTREAM PR #17606: vulkan : fix FA mask load with bounds check (coopmat2) (#369)
Conversation
Performance Analysis Summary - PR #369

Analysis: Vulkan shader bug fix removing variable shadowing in the Flash Attention cooperative matrix implementation.

Performance Impact: Zero measurable change across all binaries. Power consumption analysis shows 0.0% variation in all 16 binaries (libllama.so: 193,067 nJ unchanged; llama-run: 192,101 nJ unchanged). No functions exhibit response-time or throughput changes.

Code Change: Single-line modification in flash_attn_cm2.

Inference Impact: No tokens-per-second impact. Core inference functions (llama_decode, llama_encode, llama_tokenize) show identical execution profiles between versions.
Force-pushed 1c3cc79 to 0332e09; force-pushed 6649a5f to 7d0b0c3.
Mirrored from ggml-org/llama.cpp#17606
Bug was caused by variable shadowing in flash_attn_cm2, introduced in #17186. It was only observable with coopmat2 and GGML_KQ_MASK_PAD=1.

Repro: edit ggml.h to set GGML_KQ_MASK_PAD to 1, then run test-backend-ops -o FLASH_ATTN_EXT.