UPSTREAM PR #17694: model : add ASR support for LFM2-Audio-1.5B #578
Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #578 - LFM2-Audio ASR Support

Overview
PR #578 introduces ASR support for the LFM2-Audio-1.5B multimodal model through 678 additions across 16 files. The changes implement a new conformer-based audio encoder architecture, extend CUDA SSM convolution kernels to support kernel size 9, and add dynamic mel spectrogram generation with configurable FFT parameters.

Key Findings

Performance-Critical Function Impact
The analysis reveals no modifications to core inference functions (
CUDA SSM Convolution Kernel (
Audio Preprocessing (
LFM2 Encoder Graph (

Tokens Per Second Impact
No degradation expected for text inference. The reference metric (7% TPS reduction for 2000 ns slower
For audio-to-text inference, the new LFM2 encoder introduces expected computational overhead inherent to the conformer architecture, but this represents new functionality rather than regression of existing capabilities.

Power Consumption Analysis
Binary-level analysis shows no changes to core llama inference binaries. The PR adds new code paths within the mtmd (multimodal) tooling:
Power consumption impact is limited to audio processing workloads. Text inference power draw remains unchanged as the execution path does not invoke audio encoder operations.

Code Implementation Assessment
The changes implement well-structured additions:
The implementation represents a feature addition rather than modification of existing inference paths, explaining the absence of performance impact on core text generation metrics.
Mirrored from ggml-org/llama.cpp#17694
LFM2-Audio-1.5B supports audio input and audio output.
This PR adds only ASR support. To perform ASR, invoke the CLI with
Changes to existing code:
- -sys enabled for llama-mtmd-cli
- n_fft values
- OP_SSM_CONV for CUDA backend is extended to support kernel size 9

cc: @ngxson
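
For readers unfamiliar with the operation, the following is a minimal CPU reference sketch of what an SSM convolution with kernel size 9 computes. It assumes a causal depthwise 1D convolution over an input that is already left-padded with d_conv - 1 samples per channel. The function name ssm_conv_ref and the memory layout are hypothetical and chosen for illustration; this is not the CUDA kernel from the PR, nor ggml's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for a causal depthwise 1D convolution.
// Layout assumptions (illustrative only, not taken from the PR):
//   x: d_inner channels, each holding (d_conv - 1 + n_tokens) samples,
//      i.e. already left-padded with d_conv - 1 state samples.
//   w: d_inner channels, each holding d_conv weights.
// Returns y with n_tokens rows of d_inner values: y[t * d_inner + c].
std::vector<float> ssm_conv_ref(const std::vector<float>& x,
                                const std::vector<float>& w,
                                std::size_t d_conv,
                                std::size_t d_inner,
                                std::size_t n_tokens) {
    assert(d_conv >= 1);
    const std::size_t row = d_conv - 1 + n_tokens;  // samples per channel
    assert(x.size() == d_inner * row);
    assert(w.size() == d_inner * d_conv);

    std::vector<float> y(n_tokens * d_inner, 0.0f);
    for (std::size_t t = 0; t < n_tokens; ++t) {
        for (std::size_t c = 0; c < d_inner; ++c) {
            // Dot product of the d_conv-wide window starting at t
            // with this channel's weights.
            float sum = 0.0f;
            for (std::size_t k = 0; k < d_conv; ++k) {
                sum += x[c * row + t + k] * w[c * d_conv + k];
            }
            y[t * d_inner + c] = sum;
        }
    }
    return y;
}
```

Calling ssm_conv_ref(x, w, 9, d_inner, n_tokens) exercises the kernel-size-9 case. A GPU kernel would typically parallelize the outer loops over channels and tokens, but the arithmetic per output element is the same 9-term dot product.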