UPSTREAM PR #18009: model-conversion : cast logits to float32 #557

Open

loci-dev wants to merge 1 commit into main from
upstream-PR18009-branch_ggml-org-gg/fix-logits-type

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18009

Always dump F32 logits

@loci-review

loci-review bot commented Dec 13, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #557

Overview

PR #557 introduces a single-line change in a Python validation script (run-org-model.py) that adds explicit float32 casting to logits before NumPy conversion. This modification affects model conversion testing infrastructure, not the runtime inference engine.

Analysis Result: No performance impact detected on llama.cpp binaries or inference functions.

Key Findings

Code Change Scope:

  • Modified file: examples/model-conversion/scripts/causal/run-org-model.py (testing utility)
  • Change: Added .float() call to ensure FP32 precision in logits extraction
  • Purpose: Guarantees consistent float32 output for model validation comparisons
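The change itself is a single cast; the sketch below shows what it likely looks like in context. The helper name and surrounding code are illustrative assumptions, not taken from the script — only the `.float()` call before NumPy conversion is what the PR describes.

```python
import numpy as np
import torch

def dump_logits(logits: torch.Tensor) -> np.ndarray:
    """Illustrative helper: convert model logits to a NumPy array.

    Without the cast, a model running in FP16/BF16 hands back logits in
    that dtype, and the dumped reference array inherits it (a bfloat16
    tensor even raises a TypeError on direct NumPy conversion). Calling
    .float() first guarantees an F32 array every time.
    """
    return logits.detach().float().cpu().numpy()
```

A caller comparing these reference logits against llama.cpp output can then rely on both sides being float32.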

Performance Metrics:

  • Response Time changes: 0 functions affected
  • Throughput changes: 0 functions affected
  • Power consumption: All binaries show 0.0% change
    • build.bin.libllama.so: 0 nJ change
    • build.bin.libggml.so: 0 nJ change
    • build.bin.llama-bench: 0 nJ change
    • All other binaries: 0 nJ change

Inference Impact:

  • No changes to llama_decode, llama_encode, or llama_tokenize functions
  • No modifications to core inference paths in C++ codebase
  • Tokens per second: Unaffected (0 ns change in inference functions)

Technical Context:
The modified script is a validation tool that runs PyTorch models to generate reference logits for comparison with llama.cpp outputs. The .float() addition ensures precision consistency when models use mixed precision (FP16/BF16), preventing false positives in validation tests. This change operates outside the runtime inference pipeline and executes only during model conversion validation workflows.
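The precision hazard the cast guards against can be seen with a toy NumPy example (the value here is made up for illustration):

```python
import numpy as np

# A reference logit as computed in float32.
ref_f32 = np.array([1.0001234], dtype=np.float32)

# The same value after an FP16 round trip, as would happen if the
# reference script dumped half-precision logits.
ref_f16 = ref_f32.astype(np.float16).astype(np.float32)

# float16 spacing near 1.0 is 2^-10 (~0.001), so the round trip loses
# more than a tight comparison tolerance allows. A validation diff
# against F32 llama.cpp logits would then flag a mismatch that has
# nothing to do with the conversion being tested.
diff = np.abs(ref_f32 - ref_f16)
```

Dumping the reference logits in float32 removes this source of spurious mismatches.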

Conclusion:
This PR improves correctness of the model validation tooling without impacting runtime performance, binary efficiency, or token generation throughput.

@loci-dev loci-dev force-pushed the main branch 26 times, most recently from ac67b1d to ba9dcbb Compare December 17, 2025 00:35
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 25154fc to 7ac0e44 Compare December 22, 2025 10:10