UPSTREAM PR #17030: ggml-cpu: handle 3d tensors in repack mat_mul (#94)
Conversation
Access the complete analysis in the LOCI Dashboard.

Performance Analysis Summary: PR #94 - 3D Tensor Support in Matrix Multiplication

Overview
PR #94 introduces 3D tensor support for batched matrix multiplication operations in the GGML CPU backend, specifically targeting models like LFM2 that require batched operations.

Key Findings
Performance Impact:
Core Function Impact:
Power Consumption Analysis:
Technical Analysis:
Implementation Details:
Actionable Recommendations:

The modifications successfully enable batched matrix operations for advanced model architectures while introducing acceptable performance overhead in specialized code paths.
Force-pushed eadb483 to 0b86651
Access the complete analysis in the LOCI Dashboard.

Performance Analysis Summary: PR #94 - 3D Tensor Support

Overview
PR #94 introduces 3D tensor support for repack matrix multiplication operations, specifically targeting models like LFM2 that require batched operations.

Key Findings
Performance Impact:
Core Function Impact:
Power Consumption Analysis:
Technical Analysis:
Implementation Changes:
Scope Assessment:
Force-pushed b1ace60 to bff7103

Force-pushed 733e776 to 2c7fec2
Access the complete analysis in the LOCI Dashboard.

Performance Analysis Summary

Overview
Analysis of PR #94 implementing 3D tensor support in repack matrix multiplication reveals measurable performance impacts in the CPU backend, with the primary changes affecting quantized matrix operations rather than core inference functions.

Key Findings
Performance Metrics:
Core Function Impact:
Power Consumption:
Technical Analysis:
Affected Components:
Actionable Recommendations:

The changes successfully address correctness for 3D tensor operations while maintaining inference performance for standard workloads.
Force-pushed 9ea0205 to 1308d3f
Mirrored from ggml-org/llama.cpp#17030
While testing #16739, perplexities for LFM2 skyrocketed. @ggerganov pointed out that some matrix shapes would probably not be supported.
LFM2 has some layers that have two batches, so MAT_MULs were only done partially, leading to incorrect results. See ggml-org/llama.cpp#16739 (comment)
This patch adds basic support for tensors with ne2 > 1, using very naive chunking based on the non-repack MUL_MAT. Perplexities using this patch:
I can provide logs for other models if needed.