UPSTREAM PR #16868: vulkan: fuse mul_mat+add and mul_mat_id+add_id#15
UPSTREAM PR #16868: vulkan: fuse mul_mat+add and mul_mat_id+add_id#15
Conversation
The fusion is only applied for the mat-vec mul paths.
|
Access the complete analysis in the LOCI Dashboard Based on my analysis of PR #15 and the code changes, I'll provide a comprehensive performance impact assessment focusing on the critical LLaMA.cpp functions and KPIs. Performance Impact Analysis: PR #15 Vulkan mul_mat+add FusionCritical Function ChangesThe PR modifies several performance-critical functions in the Vulkan backend: Modified Functions:
Control Flow Changes:
KPI Impact Assessment1. Tokens Per SecondImpacted Functions:
Performance Impact:
Inference Impact: Based on the reference that 2ms slower 2. Power ConsumptionImpacted Binaries:
Power Impact Factors:
3. Quantization EfficiencyImpacted Functions:
Changes:
4. Memory UsageImpacted Areas:
Memory Optimization:
5. Batch ProcessingImpacted Functions:
Batch Processing Improvements:
Action Items for Performance OptimizationImmediate Actions
Build System Optimizations
Code-Level Improvements
Performance SummaryThe Vulkan mul_mat+add fusion in PR #15 delivers measurable performance improvements across all critical KPIs:
The changes primarily benefit Vulkan-enabled inference workloads and maintain backward compatibility with existing code paths. The fusion mechanism is well-implemented with appropriate fallback handling for cases where fusion constraints aren't met. |
46af8d7 to
25582b5
Compare
Mirrored from ggml-org/llama.cpp#16868
The fusion is only applied for the mat-vec mul paths.
I had hesitated to implement this previously because when it kicks in it implicitly disables the add->rmsnorm optimization, but it seems like this is a pretty significant win in some cases. gpt-oss has a significant gain, it uses both mul_mat+add and mul_mat_id+add_id.