
UPSTREAM PR #17889: convert: allow using quantized Mistral weight #501

Open

loci-dev wants to merge 3 commits into main from upstream-PR17889-branch_ngxson-xsn/devstral2_convert

Conversation


@loci-dev loci-dev commented Dec 9, 2025

Mirrored from ggml-org/llama.cpp#17889

target model: https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512

Requires --mistral-format; without it the tokenizer is not handled correctly.

Co-authored-by: compilade <[email protected]>

loci-review bot commented Dec 9, 2025

Explore the complete analysis inside the Version Insights

Based on the available context, I cannot provide a comprehensive performance analysis as the required performance metrics (version_id, version_id_base, project_id) and analysis tools are not accessible in the current conversation state.

Available Information:
The commit modifies convert_hf_to_gguf.py, a Python script for converting Hugging Face models to GGUF format. This is a conversion utility, not part of the inference runtime path.

Analysis Limitation:
Without access to:

  • Binary performance metrics from LCLM predictions
  • Function-level throughput and response time data
  • Flame graph comparisons
  • Power consumption analysis results

I cannot determine the actual performance impact on inference operations like llama_decode, llama_encode, or llama_tokenize.

Expected Impact:
Changes to the conversion script typically do not affect runtime inference performance or tokens per second, as this code executes during model preparation, not during inference. The inference performance is determined by the generated GGUF file structure and the runtime execution in llama.cpp, not by the conversion script itself.

To provide the requested analysis, please supply the project_id, version_id, and version_id_base parameters so I can retrieve the actual performance metrics.


loci-review bot commented Dec 9, 2025

Explore the complete analysis inside the Version Insights

Performance Review Summary

PR #501: Mistral Quantized Weight Conversion Support

This PR modifies the model conversion utility (convert_hf_to_gguf.py) to support Mistral-format FP8-quantized models. The changes add tensor name mapping for .qscale_weight suffixes, remove a blocking error for quantized vision weights, and transform Mistral quantization config to HuggingFace format.
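The dequantization step this implies — folding a companion scale tensor into its weight so the emitted GGUF holds plain floats — might look roughly like the following sketch. The tensor names, the `.qscale_weight` pairing convention, and the per-output-channel scale layout are assumptions for illustration, not the converter's actual code:

```python
# Hypothetical sketch: fold FP8-style scale tensors into their weights
# so downstream tooling sees ordinary float tensors.
import numpy as np

QSCALE_SUFFIX = ".qscale_weight"  # assumed companion-tensor naming

def dequantize_pairs(tensors: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Multiply each weight by its matching scale tensor and drop the scale."""
    out = {}
    for name, t in tensors.items():
        if name.endswith(QSCALE_SUFFIX):
            continue  # consumed together with its weight below
        scale = tensors.get(name.replace(".weight", QSCALE_SUFFIX))
        if scale is not None and name.endswith(".weight"):
            # Assumed per-output-channel scale: broadcast across columns.
            out[name] = t.astype(np.float32) * scale.reshape(-1, 1)
        else:
            out[name] = t
    return out
```

Because the scales are consumed at conversion time, the resulting GGUF contains no `.qscale_weight` tensors, which is why the runtime can treat the model like any non-quantized checkpoint.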

Performance Impact: Zero impact on inference performance. The conversion script executes during model preparation, not during runtime inference. Power consumption analysis confirms 0.0% change across all binaries (libllama.so, llama-run, llama-cli, llama-server). No functions in the inference path (llama_decode, llama_encode, llama_tokenize) are modified. Tokens per second remains unchanged.

The code changes are isolated to the conversion utility and do not affect the compiled binaries or runtime execution paths. The converter outputs standard GGUF format with dequantized weights, which llama.cpp processes identically to non-quantized models.
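The config transformation mentioned above — rewriting a Mistral-style quantization section into the `quantization_config` layout that HuggingFace-oriented tooling reads — could be sketched as follows. The key names (`quantization`, `method`, `quant_method`) are hypothetical placeholders, not the PR's exact mapping:

```python
# Hypothetical sketch: move a Mistral-style "quantization" section into the
# HF-style "quantization_config" key that the converter's HF paths expect.
def mistral_quant_to_hf(config: dict) -> dict:
    quant = config.pop("quantization", None)  # assumed Mistral key
    if quant is not None:
        config["quantization_config"] = {
            "quant_method": quant.get("method", "fp8"),  # assumed default
            **{k: v for k, v in quant.items() if k != "method"},
        }
    return config
```

A transformation like this lets the rest of the script stay on a single HF-shaped code path instead of branching on the config dialect.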

@loci-dev loci-dev force-pushed the main branch 23 times, most recently from 8ed91d0 to 985a61f Compare December 12, 2025 19:07
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 691dba3 to 5c24b24 Compare December 17, 2025 18:13
