
UPSTREAM PR #18013: convert : refactor rope scaling handling#560

Open
loci-dev wants to merge 3 commits into main from upstream-PR18013-branch_ggml-org-cisc/convert-rope-parameters

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18013

Handles rope scaling in set_gguf_parameters to deduplicate code and to support the new rope_parameters config (to which rope_theta has also moved), introduced in huggingface/transformers#39847.

Obsoletes #18008

@loci-review

loci-review bot commented Dec 14, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary - PR #560

Overview

This PR refactors RoPE scaling parameter handling in the HuggingFace-to-GGUF conversion script (convert_hf_to_gguf.py). The changes are purely architectural, consolidating 269 lines of duplicated RoPE configuration logic across 30+ model classes into a unified base class implementation. No runtime inference code or compiled binaries are modified.

Performance Impact

Zero performance impact on inference. The refactoring affects only the Python-based model conversion script, which executes before inference begins. The compiled binaries (libllama.so, libggml-cpu.so, etc.) remain unchanged, as confirmed by power consumption analysis showing 0.0% change across all 16 binaries.

Key inference functions unaffected:

  • llama_decode - No changes (tokenization/inference path intact)
  • llama_encode - No changes
  • llama_tokenize - No changes
  • ggml_mul_mat - No changes (matrix operations unchanged)
  • llama_graph_compute - No changes (computation graph unchanged)

Tokens per second: No impact. Since no inference-path functions are modified, token generation throughput remains identical to the baseline.

Power consumption: All binaries show 0.0% change, with maximum observed variance of ±0.0002% (measurement noise). Binaries analyzed: libllama.so (195,281 nJ), libggml-cpu.so (116,389 nJ), llama-run (220,352 nJ), and 13 others.

Code Changes

The refactoring introduces self.rope_parameters as a unified configuration dictionary in ModelBase.__init__(), normalizing RoPE parameter access across all model architectures. The base class set_gguf_parameters() now handles five RoPE scaling types (linear, yarn, longrope, dynamic, llama3) that were previously duplicated in individual model classes. Model-specific implementations (Llama2Model, Qwen2Model, Ministral3Model, etc.) now delegate common parameter handling to the base class via super().set_gguf_parameters(), retaining only architecture-specific logic.
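The delegation pattern described above can be sketched as follows. This is an illustrative reconstruction, not the actual convert_hf_to_gguf.py code: class names like Llama2Model match the PR description, but the attribute handling and GGUF key names are assumptions.

```python
# Hypothetical sketch of consolidating RoPE scaling handling in a base class.
# The real convert_hf_to_gguf.py differs in detail; this only shows the shape
# of the refactoring: normalize once in __init__, handle common scaling types
# in the base set_gguf_parameters(), and let subclasses delegate via super().

class ModelBase:
    def __init__(self, hparams: dict):
        # Newer HF configs nest RoPE settings under "rope_parameters";
        # older configs use a top-level "rope_scaling" dict.
        self.rope_parameters = (
            hparams.get("rope_parameters") or hparams.get("rope_scaling") or {}
        )
        self.gguf = {}  # stand-in for the GGUF metadata writer

    def set_gguf_parameters(self):
        # Common handling for the five scaling types named in the summary.
        rope_type = self.rope_parameters.get("rope_type") or self.rope_parameters.get("type")
        if rope_type in ("linear", "yarn", "longrope", "dynamic", "llama3"):
            self.gguf["rope.scaling.type"] = rope_type
            if "factor" in self.rope_parameters:
                self.gguf["rope.scaling.factor"] = self.rope_parameters["factor"]

class Llama2Model(ModelBase):
    def set_gguf_parameters(self):
        super().set_gguf_parameters()  # shared RoPE logic lives in the base class
        self.gguf["architecture"] = "llama"  # only architecture-specific keys remain
```

With this shape, adding a new scaling type means touching one method instead of 30+ model classes.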

The changes improve code maintainability and support the new rope_parameters format from HuggingFace Transformers while maintaining full backward compatibility with existing rope_scaling configurations. GGUF metadata output remains functionally equivalent.

@loci-review

loci-review bot commented Dec 14, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary - PR #560

Overview

This PR refactors RoPE (Rotary Position Embedding) parameter handling in the Python conversion script convert_hf_to_gguf.py. The changes affect model conversion workflow only, with no modifications to C++ inference binaries or performance-critical execution paths.

Analysis Results

Performance Metrics: No function-level performance data available. The summary report returned "no_data" status, indicating no measurable changes in Response Time or Throughput across all analyzed functions.

Power Consumption: All 16 binaries show 0.0% change in estimated power consumption between versions. The three binaries with negligible deltas (< 0.001%) are:

  • libllama.so: -0.56 nJ (-0.0003%)
  • llama-run: -0.19 nJ (-0.0001%)
  • llama-tts: -1.05 nJ (-0.0004%)

These sub-nanojoule variations represent measurement noise rather than functional changes.

Code Changes: The PR centralizes RoPE configuration management by introducing self.rope_parameters in ModelBase.__init__ and consolidating scaling logic in the base set_gguf_parameters() method. This eliminates 269 lines of duplicated code across 30+ model classes while adding 88 lines of unified handling logic.
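The backward-compatibility aspect mentioned above (new rope_parameters format alongside legacy rope_scaling, with rope_theta relocated) can be illustrated with a small accessor. The helper name and default value are assumptions for illustration, not the PR's actual API:

```python
# Hypothetical helper showing how a converter can read rope_theta from either
# config layout: newer transformers configs move it inside "rope_parameters",
# while older configs keep it at the top level of the model config.

def get_rope_theta(hparams: dict, default: float = 10000.0) -> float:
    rope_params = hparams.get("rope_parameters") or {}
    # Prefer the nested (new) location, fall back to the legacy top-level key.
    return rope_params.get("rope_theta", hparams.get("rope_theta", default))
```

Because the lookup falls back in order, existing configs keep converting unchanged while new-format configs are picked up automatically.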

Inference Impact: None. The refactoring affects only the Python-based model conversion process. No changes were made to:

  • llama_decode - primary inference function
  • llama_encode - prompt processing
  • llama_tokenize - tokenization logic
  • ggml_mul_mat - matrix operations
  • ggml_rope - RoPE computation kernels

The C++ inference engine remains unchanged, meaning tokens per second performance is unaffected. RoPE parameters are read from GGUF metadata at model load time, not during token generation.

Conclusion: This is a code quality improvement with zero runtime performance impact. The conversion script refactoring improves maintainability and adds support for new HuggingFace Transformers config formats without affecting inference speed or power consumption.

@loci-dev loci-dev force-pushed the main branch 23 times, most recently from 765e416 to 3c6cece Compare December 16, 2025 18:12
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 7ac0e44 to 5b544dd Compare December 22, 2025 12:15
