UPSTREAM PR #17241: ggml-cpu: handle 3d tensors in repack mat_mul #191
Access the complete analysis in the LOCI Dashboard
Performance Analysis Summary: PR #191 - 3D Tensor Support in GGML CPU Repack
Overview
Pull Request #191 introduces 3D tensor support to the GGML CPU repack matrix multiplication system. The changes enable processing of transformer models with batch dimensions while maintaining numerical accuracy, but introduce measurable performance overhead in quantized operations.
Key Findings
Performance Impact:
Core Function Impact:
Inference Performance Impact:
Technical Analysis:
Affected Binaries:
The implementation successfully enables 3D tensor processing while maintaining backward compatibility, with performance trade-offs concentrated in specific quantized operation paths.
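As a rough illustration of what "3D tensor support" in a matrix multiplication means, the sketch below runs a 2D product once per batch slice while sharing a single 2D weight matrix. It is not the ggml repack code: the function names matmul_2d and matmul_3d, the plain float type, and the row-major layout are all assumptions chosen for readability, and the real path operates on quantized, repacked blocks.

```c
#include <stddef.h>

/* Hypothetical helper (not a ggml function): c[j][i] = dot(a[i], b[j]).
 * a: m rows of length k (weights), b: n rows of length k (activations),
 * c: n rows of length m. All buffers are contiguous and row-major;
 * this layout is an assumption made for the example. */
void matmul_2d(const float *a, const float *b, float *c,
               size_t m, size_t n, size_t k) {
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            float sum = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                sum += a[i * k + p] * b[j * k + p];
            }
            c[j * m + i] = sum;
        }
    }
}

/* Hypothetical 3D wrapper: b and c carry an extra batch dimension,
 * while the 2D weight matrix a is reused for every batch slice. */
void matmul_3d(const float *a, const float *b, float *c,
               size_t m, size_t n, size_t k, size_t batch) {
    for (size_t s = 0; s < batch; ++s) {
        /* Offset into slice s of the activations and of the output. */
        matmul_2d(a, b + s * n * k, c + s * n * m, m, n, k);
    }
}
```

Calling matmul_3d with batch set to 1 reduces to the plain 2D case, which is one simple way to picture the backward compatibility mentioned above.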
Mirrored from ggml-org/llama.cpp#17241
This is a continuation of #17030 after a performance regression was reported.
Perplexity Comparison (Repack vs Non-Repack)
Command:
Llama-bench
build: c77bafd (6967) THIS PR
build: 2776db6 (7047) MASTER