
UPSTREAM PR #17606: vulkan : fix FA mask load with bounds check (coopmat2) #369

Open
loci-dev wants to merge 1 commit into main from upstream-PR17606-branch_Acly-vulkan-fix-fa-mask-nopad

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#17606

The bug was caused by variable shadowing in flash_attn_cm2, introduced in #17186.

It was only observable with coopmat2 and GGML_KQ_MASK_PAD=1.

Repro:

  1. Modify ggml.h and set GGML_KQ_MASK_PAD to 1.
  2. Run test-backend-ops -o FLASH_ATTN_EXT.

@loci-review

loci-review bot commented Nov 30, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary - PR #369

Analysis: Vulkan shader bug fix removing variable shadowing in Flash Attention cooperative matrix implementation.

Performance Impact: Zero measurable change across all binaries. Power consumption analysis shows 0.0% variation in all 16 binaries (libllama.so: 193,067 nJ unchanged, llama-run: 192,101 nJ unchanged). No functions exhibit response time or throughput changes.

Code Change: Single-line modification in flash_attn_cm2.comp removes redundant mv variable declaration that shadowed outer-scope variable. This correctness fix eliminates undefined behavior when GGML_KQ_MASK_PAD=1 is enabled with Vulkan coopmat2 backend.

Inference Impact: No tokens-per-second impact. Core inference functions (llama_decode, llama_encode, llama_tokenize) show identical execution profiles between versions.

@loci-dev loci-dev force-pushed the main branch 27 times, most recently from 1c3cc79 to 0332e09 on December 2, 2025 at 21:09.
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 6649a5f to 7d0b0c3 on December 6, 2025 at 18:10.
