
UPSTREAM PR #17898: model : Qwen3-Next-80B-A3B has 48 layers#506

Open
loci-dev wants to merge 2 commits into main from upstream-PR17898-branch_EZForever-model-qwen3next-layers

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#17898

Qwen3-Next-80B-A3B has 48 layers instead of 80, as pointed out by the model README and a comment in the original PR.

This change should be purely cosmetic; it fixes the "?B" model names shown by llama-bench, etc.

@loci-review

loci-review bot commented Dec 10, 2025

Explore the complete analysis inside the Version Insights

Pull Request #506 Technical Review

PR Summary

Title: UPSTREAM PR #17898: model : Qwen3-Next-80B-A3B has 48 layers
Changes: Corrects layer count for Qwen3-Next-80B-A3B model from 80 to 48 layers and adds missing type name string mapping.

Code Changes Analysis

Modified File: src/llama-model.cpp

Change 1 - llm_type_name() function (line 123):

  • Added case statement: case LLM_TYPE_80B_A3B: return "80B.A3B";
  • Purpose: Provides string representation for the 80B_A3B model type enum
  • Impact: Enables proper model name display in llama-bench and other tools

Change 2 - llama_model::load_hparams() function (line 2261):

  • Modified layer count check: `case 80:` changed to `case 48:`
  • Purpose: Corrects model architecture detection for Qwen3-Next-80B-A3B
  • Impact: Ensures correct model type assignment during model loading based on actual layer count

Performance Impact Assessment

Function: llm_type_name()

  • Base response time: 57 ns
  • Current response time: 62 ns
  • Absolute change: +5 ns
  • Analysis: The addition of one case statement in the switch block adds minimal overhead. The 5 ns increase represents a single additional comparison in the switch dispatch logic.

Function: llama_model::load_hparams()

  • This function is part of model loading, not inference path
  • Changes affect model initialization only, executed once per model load
  • No impact on per-token inference performance

Inference Performance:
No functions in the inference path (llama_decode, llama_encode, llama_tokenize) were modified. The changes are isolated to model metadata handling and initialization logic. Tokens per second remains unaffected.

Power Consumption:

  • Binary: build.bin.libllama.so
  • Change: +0.018% (+36 nJ)
  • Analysis: Negligible increase consistent with one additional switch case

The changes are cosmetic corrections to model metadata with no measurable impact on inference performance or throughput.

