UPSTREAM PR #17606: vulkan : fix FA mask load with bounds check (coopmat2) (#369)
Conversation
Performance Analysis Summary - PR #369

Analysis: Vulkan shader bug fix removing variable shadowing in the Flash Attention cooperative matrix implementation.

Performance Impact: Zero measurable change across all binaries. Power consumption analysis shows 0.0% variation in all 16 binaries (libllama.so: 193,067 nJ unchanged; llama-run: 192,101 nJ unchanged). No functions exhibit response-time or throughput changes.

Code Change: Single-line modification in flash_attn_cm2.

Inference Impact: No tokens-per-second impact. Core inference functions (llama_decode, llama_encode, llama_tokenize) show identical execution profiles between versions.
Force-pushed 1c3cc79 to 0332e09; force-pushed 6649a5f to 7d0b0c3.
Mirrored from ggml-org/llama.cpp#17606
Bug was caused by variable shadowing in flash_attn_cm2, introduced in #17186. It was only observable with coopmat2 and GGML_KQ_MASK_PAD=1.

Repro: edit ggml.h to set GGML_KQ_MASK_PAD to 1, then run test-backend-ops -o FLASH_ATTN_EXT.