fix: convert_hf_to_gguf - map new mistral-common valid_tokenizer_files output to avoid crash with --mistral-format #17712
Conversation
…istral-common versions.
I think the correct fix is to try to import and use … fallback to old logic otherwise.
Yeah makes sense. Checking for … What about this? … also, so there's another (kind of redundant) … Tested on …
Thank you for taking the time to fix this, will merge once CI gets its act together. :)
Mistral never provided chat templates before, but it would make sense to use …
…l-format) (ggml-org#17712)

* fix convert_hf_to_gguf.py failing with --mistral-format using later mistral-common versions.
* use get_one_valid_tokenizer_file from mistral-common if available and fallback to old logic otherwise.
* use file name instead of file path for get_one_valid_tokenizer_file.
* fix --mistral-format tokenizer file failing for tokenizers in subdirectories.
* move get_one_valid_tokenizer_file import to avoid nested try-except.
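The second and last bullets describe an optional-import pattern: import the mistral-common helper once at module level and fall back to the old selection logic when it is unavailable. A minimal sketch of that shape, where the import path and the fallback preference list are illustrative assumptions, not the PR's actual code:

```python
# Try the optional import once, at module level, so call sites need no
# nested try/except. (The real import path in mistral-common may differ;
# this one is an assumption for illustration.)
try:
    from mistral_common.tokens.tokenizers.utils import get_one_valid_tokenizer_file
except ImportError:
    get_one_valid_tokenizer_file = None


def choose_tokenizer_file(file_names):
    """Pick one tokenizer file name, preferring mistral-common's helper."""
    if get_one_valid_tokenizer_file is not None:
        # Per the PR notes: pass file *names*, not full paths.
        return get_one_valid_tokenizer_file(file_names)
    # Old fallback logic (hypothetical): prefer well-known tokenizer files.
    for preferred in ("tekken.json", "tokenizer.model.v3", "tokenizer.model"):
        if preferred in file_names:
            return preferred
    return file_names[0]
```

Doing the import up front keeps the selection function to a single branch instead of wrapping each call site in its own try/except.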
* origin/master:
  server: strip content-length header on proxy (ggml-org#17734)
  server: move msg diffs tracking to HTTP thread (ggml-org#17740)
  examples : add missing code block end marker [no ci] (ggml-org#17756)
  common : skip model validation when --help is requested (ggml-org#17755)
  ggml-cpu : remove asserts always evaluating to false (ggml-org#17728)
  convert: use existing local chat_template if mistral-format model has one. (ggml-org#17749)
  cmake : simplify build info detection using standard variables (ggml-org#17423)
  ci : disable ggml-ci-x64-amd-* (ggml-org#17753)
  common: use native MultiByteToWideChar (ggml-org#17738)
  metal : use params per pipeline instance (ggml-org#17739)
  llama : fix sanity checks during quantization (ggml-org#17721)
  build : move _WIN32_WINNT definition to headers (ggml-org#17736)
  build: enable parallel builds in msbuild using MTT (ggml-org#17708)
  ggml-cpu: remove duplicate conditional check 'iid' (ggml-org#17650)
  Add a couple of file types to the text section (ggml-org#17670)
  convert : support latest mistral-common (fix conversion with --mistral-format) (ggml-org#17712)
  Use OpenAI-compatible `/v1/models` endpoint by default (ggml-org#17689)
  webui: Fix zero pasteLongTextToFileLen to disable conversion being overridden (ggml-org#17445)
Fixes #17691
mistral-common updated `_filter_valid_tokenizer_files` to return additional data we don't need or expect, causing the conversion to crash when `--mistral-format` is used. This change just maps the output back into the format originally expected.
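As a rough illustration of that mapping, here is a sketch where the input shapes are assumptions rather than mistral-common's actual return types: newer output (in this sketch) bundles extra data with each entry, and the converter only wants the bare file names it used to get back:

```python
from pathlib import Path


def normalize_valid_tokenizer_files(result):
    """Map a tokenizer-file listing back to a plain list of file names.

    Hypothetical input shapes: either bare path strings (old behaviour)
    or (path, extra_metadata) pairs standing in for the newer, richer
    output described in the PR.
    """
    names = []
    for entry in result:
        if isinstance(entry, (tuple, list)):
            entry = entry[0]  # drop the extra data we don't need or expect
        # File name rather than full path, so tokenizers that live in
        # subdirectories still resolve (see the commit list above).
        names.append(Path(entry).name)
    return names
```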
Tested on `mistral-common==1.8.3` and `mistral-common==1.8.6`.

That said, there's a different issue with the new Ministral models, which is partially related to how `--mistral-format` works: `get_community_chat_templates()` is invoked with `--mistral-format`, and the logic there results in the model getting the `"unsloth-mistral-Devstral-Small-2507.jinja"` template, not Ministral's local `chat_template.jinja` (the local file is only used for `SpecialVocab` models, not `MistralVocab`). To avoid this, users currently need to manually specify the chat template when running the model on e.g. llama-server with `--jinja --chat-template-file "./chat_template.jinja"`. This issue exists regardless of this PR, as long as `--mistral-format` is used, I think.