
UPSTREAM PR #17042: ggml-hexagon: fix test-backend-ops failures on specific binary ops#100

Open
DajanaV wants to merge 1 commit into main from upstream-PR17042-branch_chraac-dev-fix-test-failure

Conversation

Collaborator

@DajanaV DajanaV commented Nov 6, 2025

Mirrored from ggml-org/llama.cpp#17042

Summary

Fixes test-backend-ops failures in ggml-hexagon by correcting the index calculation for binary operations.

Changes

  • Fixed the index calculation in binary ops to match the CPU implementation in ggml/src/ggml-cpu/ops.cpp

Testing

TODO:

@DajanaV DajanaV force-pushed the main branch 4 times, most recently from b16251e to 95f6e9b Compare November 6, 2025 13:17
@loci-review

loci-review bot commented Nov 6, 2025

Access the complete analysis in the LOCI Dashboard

Performance Analysis Summary

Overview

Analysis of version 274a4670-5520-4a90-8221-9afd5d526898 compared to base version 0211d925-b37b-4bdf-9ef8-7fce0422fc7b reveals minimal performance variations with no impact on core inference functions.

Key Findings

Performance Metrics:

  • Highest Response Time change: can_reuse function (+0.096%, +0.06 ns) in build.bin.libllama.so
  • Highest Throughput change: common_sampler_accept function (-0.11%, -0.07 ns improvement) in build.bin.llama-tts
  • Both functions are non-core utilities with negligible absolute impact

Core Function Impact:
No changes detected in critical inference functions (llama_decode, llama_encode, llama_tokenize). The modified functions operate in auxiliary roles:

  • can_reuse: Graph parameter validation utility
  • common_sampler_accept: Sampling mechanism helper

Inference Performance Impact:
Token throughput remains unaffected as no core tokenization or inference functions show performance changes. The reference baseline (7% tokens/second reduction per 2ms llama_decode slowdown) does not apply since llama_decode shows no measurable change.

Power Consumption Analysis:
System-wide power consumption remains stable across all binaries with changes <0.001%. Affected binaries show negligible decreases:

  • build.bin.libllama.so: -0.0002% power reduction
  • build.bin.llama-tts: -0.0002% power reduction

Technical Analysis:

  • Flame Graph: can_reuse exhibits simple leaf function behavior with 65 ns self-contained execution
  • CFG Comparison: Identical assembly code between versions indicates performance variation stems from external factors (memory layout, cache alignment) rather than algorithmic changes
  • Code Review: PR #100 (mirroring upstream PR #17042) modifies GGML Hexagon backend binary operations, which is unrelated to the measured performance variations

Conclusion:
The analysis reveals stable performance with minor variations in non-critical functions. No regressions affect core inference capabilities, and the system maintains consistent energy efficiency and computational performance.

@DajanaV DajanaV force-pushed the main branch 22 times, most recently from aa2fc28 to 0ad40ce Compare November 9, 2025 17:06
@DajanaV DajanaV force-pushed the main branch 30 times, most recently from e97d4a6 to 29827de Compare November 15, 2025 10:08
