UPSTREAM PR #18009: model-conversion : cast logits to float32 #557

Open

loci-dev wants to merge 1 commit into main from
upstream-PR18009-branch_ggml-org-gg/fix-logits-type

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18009

Always dump F32 logits

@loci-review

loci-review bot commented Dec 13, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #557

Overview

PR #557 introduces a single-line change in a Python validation script (run-org-model.py) that adds explicit float32 casting to logits before NumPy conversion. This modification affects model conversion testing infrastructure, not the runtime inference engine.

Analysis Result: No performance impact detected on llama.cpp binaries or inference functions.

Key Findings

Code Change Scope:

  • Modified file: examples/model-conversion/scripts/causal/run-org-model.py (testing utility)
  • Change: Added .float() call to ensure FP32 precision in logits extraction
  • Purpose: Guarantees consistent float32 output for model validation comparisons
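The change itself is a single cast; the sketch below shows what it likely looks like in context. The helper name and surrounding code are illustrative assumptions, not taken from the script — only the `.float()` call before NumPy conversion is what the PR describes.

```python
import numpy as np
import torch

def dump_logits(logits: torch.Tensor) -> np.ndarray:
    """Illustrative helper: convert model logits to a NumPy array.

    Without the cast, a model running in FP16/BF16 hands back logits in
    that dtype, and the dumped reference array inherits it (a bfloat16
    tensor even raises a TypeError on direct NumPy conversion). Calling
    .float() first guarantees an F32 array every time.
    """
    return logits.detach().float().cpu().numpy()
```

A caller comparing these reference logits against llama.cpp output can then rely on both sides being float32.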

Performance Metrics:

  • Response Time changes: 0 functions affected
  • Throughput changes: 0 functions affected
  • Power consumption: All binaries show 0.0% change
    • build.bin.libllama.so: 0 nJ change
    • build.bin.libggml.so: 0 nJ change
    • build.bin.llama-bench: 0 nJ change
    • All other binaries: 0 nJ change

Inference Impact:

  • No changes to llama_decode, llama_encode, or llama_tokenize functions
  • No modifications to core inference paths in C++ codebase
  • Tokens per second: Unaffected (0 ns change in inference functions)

Technical Context:
The modified script is a validation tool that runs PyTorch models to generate reference logits for comparison with llama.cpp outputs. The .float() addition ensures precision consistency when models use mixed precision (FP16/BF16), preventing false positives in validation tests. This change operates outside the runtime inference pipeline and executes only during model conversion validation workflows.
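The precision hazard the cast guards against can be seen with a toy NumPy example (the value here is made up for illustration):

```python
import numpy as np

# A reference logit as computed in float32.
ref_f32 = np.array([1.0001234], dtype=np.float32)

# The same value after an FP16 round trip, as would happen if the
# reference script dumped half-precision logits.
ref_f16 = ref_f32.astype(np.float16).astype(np.float32)

# float16 spacing near 1.0 is 2^-10 (~0.001), so the round trip loses
# more than a tight comparison tolerance allows. A validation diff
# against F32 llama.cpp logits would then flag a mismatch that has
# nothing to do with the conversion being tested.
diff = np.abs(ref_f32 - ref_f16)
```

Dumping the reference logits in float32 removes this source of spurious mismatches.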

Conclusion:
This PR improves correctness of the model validation tooling without impacting runtime performance, binary efficiency, or token generation throughput.

@loci-dev loci-dev force-pushed the main branch 26 times, most recently from ac67b1d to ba9dcbb Compare December 17, 2025 00:35
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 25154fc to 7ac0e44 Compare December 22, 2025 10:10