UPSTREAM PR #17279: convert : set expert gating func in base class #217

Open

DajanaV wants to merge 1 commit into main from
upstream-PR17279-branch_ggml-org-cisc/convert-common-expert-gating-func

Conversation


@DajanaV DajanaV commented Nov 14, 2025

Mirrored from ggml-org/llama.cpp#17279

Move the add_expert_gating_func call to the base class; there is no point in duplicating it.

Also fixes a conversion failure for dots1 caused by the following fixes to the model:


loci-review bot commented Nov 15, 2025

Access the complete analysis in the LOCI Dashboard

Performance Analysis Summary

Overview

Pull Request #217 implements a code refactoring to centralize expert gating function configuration in the base TextModel class, eliminating duplicate implementations across multiple model classes. The changes affect the Python model conversion script (convert_hf_to_gguf.py) rather than the core C++ inference engine.

Performance Impact Assessment

The analysis identified minimal performance variations in unrelated functions:

  • Highest Response Time Change: linenoiseSetCompletionCallback (+0.076%, +0.011 ns absolute)
  • Highest Throughput Change: make_unique<llm_graph_input_attn_no_cache> (+0.112%, +0.078 ns absolute)
  • Power Consumption: 0.0% change across all binaries

These performance changes are unrelated to the PR modifications and represent normal compilation variance rather than functional impacts.

Code Analysis

The refactoring consolidates expert gating function logic by:

  • Adding centralized parameter detection for ["score_function", "scoring_func", "score_func"]
  • Removing 25+ lines of duplicate code across 5 model classes
  • Standardizing error handling for unsupported gating functions
  • Fixing conversion failures for the dots1 model
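The consolidation described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual convert_hf_to_gguf.py code: the class name TextModel and the method add_expert_gating_func mirror names mentioned in the review, but the hparams structure, gating-function constants, and error message are assumptions.

```python
class TextModel:
    """Minimal sketch of a base model class that centralizes expert
    gating function detection (hypothetical, for illustration only)."""

    # Known config-key aliases for the expert scoring function,
    # as listed in the review above.
    _SCORE_KEYS = ("score_function", "scoring_func", "score_func")

    def __init__(self, hparams: dict):
        self.hparams = hparams
        self.gguf_params: dict = {}

    def set_gguf_parameters(self):
        # Probe all known aliases once, in the base class, instead of
        # duplicating this loop in every subclass.
        score_func = None
        for key in self._SCORE_KEYS:
            if key in self.hparams:
                score_func = self.hparams[key]
                break

        if score_func is None:
            return  # model has no expert gating config; nothing to set

        # Standardized error handling for unsupported gating functions.
        if score_func == "sigmoid":
            self.add_expert_gating_func("SIGMOID")
        elif score_func == "softmax":
            self.add_expert_gating_func("SOFTMAX")
        else:
            raise ValueError(f"Unsupported expert score function: {score_func}")

    def add_expert_gating_func(self, func: str):
        # Stand-in for the GGUF writer call in the real script.
        self.gguf_params["expert_gating_func"] = func


# Usage: a subclass no longer needs its own detection code.
model = TextModel({"scoring_func": "sigmoid"})
model.set_gguf_parameters()
print(model.gguf_params)  # {'expert_gating_func': 'SIGMOID'}
```

Because the detection and the error path live in one place, a model like dots1 that uses a different alias is handled automatically, which is how the PR fixes its conversion failure.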

Key Findings

Core Function Impact: None. The changes affect only the model conversion pipeline, not the critical inference functions (llama_decode, llama_encode, llama_tokenize) that determine tokens-per-second performance.

Performance Metrics: All detected changes fall within measurement noise (<1 ns absolute change). The functions showing variation (linenoiseSetCompletionCallback and a template instantiation) are unrelated to the code modifications.

Power Consumption: No measurable impact across any binary components, confirming the changes don't affect runtime execution efficiency.

Code Quality: The refactoring improves maintainability by eliminating code duplication and providing consistent parameter handling across model classes.

Critical Issues: None identified. The implementation maintains backward compatibility while fixing conversion issues for specific model types.

The changes represent a positive code quality improvement with no meaningful performance impact on the inference engine.

@DajanaV DajanaV force-pushed the main branch 25 times, most recently from f333350 to 9c4623f Compare November 18, 2025 09:10
@loci-dev loci-dev force-pushed the main branch 2 times, most recently from 64f477c to 7c4fc52 Compare November 20, 2025 11:08
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 409b78f to b789b13 Compare November 27, 2025 00:34

2 participants