
UPSTREAM PR #18876: mtmd : fix ASR for LFM2.5-Audio-1.5B #939

Open
loci-dev wants to merge 1 commit into main from upstream-PR18876-branch_tdakhran-tarek/fix/fix-asr

@loci-dev

Mirrored from ggml-org/llama.cpp#18876

The callback was renaming the input tensor, which led to the following error:

/data/git/llama.cpp/tools/mtmd/clip.cpp:3358: Failed to get tensor inp_raw

The first commit causing the issue is ggml-org/llama.cpp#17914.
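The failure mode described above can be illustrated with a minimal, self-contained sketch. Everything in it (`toy_tensor`, `toy_graph`, `build_graph`, `rename_cb`, the `skip_input` flag) is hypothetical and stands in for the real ggml and clip.cpp types; only the tensor name `inp_raw` comes from the error message. The guard shown is one possible way to avoid the problem and is not necessarily the change made in this PR.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT the ggml or clip.cpp types.
struct toy_tensor {
    std::string name;
};

struct toy_graph {
    std::vector<toy_tensor> nodes;

    // Name-based lookup, similar in spirit to fetching "inp_raw"
    // before filling the input tensor with data.
    toy_tensor * find(const std::string & name) {
        for (auto & t : nodes) {
            if (t.name == name) {
                return &t;
            }
        }
        return nullptr;
    }
};

using toy_callback = std::function<void(toy_tensor &, const std::string &)>;

// A callback that relabels every tensor it is handed.
inline void rename_cb(toy_tensor & t, const std::string & label) {
    t.name = label;
}

// Builds a two-node graph and invokes the callback on each node.
// With skip_input == false the input tensor is renamed as well,
// so a later lookup by its original name fails.
inline toy_graph build_graph(const toy_callback & cb, bool skip_input) {
    toy_graph g;
    g.nodes.push_back({"inp_raw"});
    g.nodes.push_back({"node_0"});
    for (std::size_t i = 0; i < g.nodes.size(); ++i) {
        if (skip_input && g.nodes[i].name == "inp_raw") {
            continue; // assumed guard: leave the input tensor's name alone
        }
        cb(g.nodes[i], "layer_out");
    }
    return g;
}
```

In this sketch, `build_graph(rename_cb, false).find("inp_raw")` returns `nullptr`, which mirrors the "Failed to get tensor inp_raw" error, while `build_graph(rename_cb, true).find("inp_raw")` still finds the input.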

@loci-review

loci-review bot commented Jan 16, 2026

Explore the complete analysis inside the Version Insights

Performance Review Report

Summary

This update has a negligible performance impact: a single commit fixes ASR (Automatic Speech Recognition) for the LFM2.5-Audio-1.5B model. The measured changes affect only non-critical logging and metadata utility functions in the mtmd (multimodal) library, with improvements under 50 ns.

Performance Impact

Two functions in build.bin.libmtmd.so show minor improvements through compiler optimizations without source code changes:

  • clip_log_internal_v: Response time reduced by 46ns (269ns → 223ns) via entry block consolidation
  • gguf_data_to_str: Response time reduced by 38ns (1802ns → 1764ns) through instruction optimization

Both functions operate outside inference hot paths—one handles diagnostic logging for InternVL vision models, the other converts GGUF metadata to strings during model loading.

Power Consumption

The mtmd library shows a 0.233% increase in power consumption (178,378 → 178,793 nJ), representing a 415 nJ increase. This negligible change aligns with the addition of 37 new files for ASR functionality. All other binaries show zero power consumption change.

Code Changes

The commit adds ASR support for the LFM2.5-Audio-1.5B audio model with 37 new files and 2 modifications. The performance improvements in the two analyzed functions result from compiler-level optimizations rather than intentional code changes, likely due to build configuration updates accompanying the new ASR feature.

@loci-dev force-pushed the main branch 3 times, most recently from d36a5a9 to 49ab1eb, January 16, 2026 13:20
@Melisa-aurora

@loci-dev What is the biggest difference in response time?

@loci-review

loci-review bot commented Jan 16, 2026

Based on the performance review report, the biggest difference in response time is 46ns for the function clip_log_internal_v, which improved from 269ns to 223ns.

This represents the largest absolute change among the functions analyzed in this update. The only other function with a response time change is gguf_data_to_str, which improved by 38ns (from 1802ns to 1764ns).

Both of these are very small improvements in absolute terms - we're talking about nanoseconds - and they occurred in non-critical utility functions rather than in the main inference hot paths.

@loci-dev force-pushed the main branch 22 times, most recently from d664a5a to 48924ee, January 21, 2026 12:17
@loci-dev force-pushed the main branch 30 times, most recently from 6d56868 to a50395f, January 27, 2026 14:16
3 participants