
Conversation

@mgoin (Member) commented Mar 25, 2025

Works for text input and single-image batches. Requires a fix to the Pixtral processing in Transformers (huggingface/transformers#37019). It still fails a full ChartQA eval in vLLM V1, seemingly due to batched-encoding issues, so I have forced this model to run only with V0 for now.

FIX #15212

Testing

Server started with

vllm serve nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic
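
For a quick offline sanity check, something like the following can also be used (a minimal sketch; the prompt and sampling parameters are just illustrative, and VLLM_USE_V1=0 mirrors the V0-only restriction noted above):

# Minimal offline smoke test (illustrative prompt and sampling parameters).
import os
os.environ["VLLM_USE_V1"] = "0"  # mirror the V0-only restriction for this model

from vllm import LLM, SamplingParams

llm = LLM(model="nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic")
outputs = llm.generate(
    ["Describe the Eiffel Tower in one sentence."],
    SamplingParams(temperature=0, max_tokens=64),
)
print(outputs[0].outputs[0].text)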

ChartQA Eval

FP8 checkpoint:

python -m eval.run eval_vllm --model_name nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "chartqa"
Waiting for VLLM server to come online at http://0.0.0.0:9000/health ...
Timeout is 120s
Waiting for server (0s) ...
Server is up!
Loading lmms-lab/ChartQA [test]: 100%|██████████| 2500/2500 [00:11<00:00, 210.68it/s]
Querying model: 100%|██████████| 2500/2500 [06:53<00:00, 6.05it/s]
100%|██████████| 2500/2500 [00:00<00:00, 22477.08it/s]
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.8136,
    "anywhere_in_answer_relaxed_correctness": 0.8144
}
================================================================================

Original checkpoint:

python -m eval.run eval_vllm --model_name mistralai/Mistral-Small-3.1-24B-Instruct-2503 --url http://0.0.0.0:9000 --output_dir output/ --eval_name "chartqa"
Waiting for VLLM server to come online at http://0.0.0.0:9000/health ...
Timeout is 120s
Waiting for server (0s) ...
Server is up!
Loading lmms-lab/ChartQA [test]: 100%|██████████| 2500/2500 [00:11<00:00, 220.94it/s]
Querying model: 100%|██████████| 2500/2500 [09:40<00:00, 4.31it/s]
100%|██████████| 2500/2500 [00:00<00:00, 27337.92it/s]
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.818,
    "anywhere_in_answer_relaxed_correctness": 0.8192
}
================================================================================

Single-image example script

Client script:

from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(api_key=openai_api_key, base_url=openai_api_base)
model_id = client.models.list().data[0].id

# Text inference
chat_response = client.chat.completions.create(
    model=model_id,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Who are you?"},
        ],
    }],
)
print("Text Chat completion output:", chat_response.choices[0].message.content)

# Single-image input inference (one image per request)
image_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/d/da/2015_Kaczka_krzy%C5%BCowka_w_wodzie_%28samiec%29.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/7/77/002_The_lion_king_Snyggve_in_the_Serengeti_National_Park_Photo_by_Giles_Laurent.jpg",
]
for img in image_urls:
    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": img}},
        ],
    }]
    chat_response = client.chat.completions.create(model=model_id, messages=messages)
    print("Single image Chat completion output:", chat_response.choices[0].message.content)

Output:

Text Chat completion output: 
I am Mistral Small 3, a Large Language Model created by Mistral AI, a French startup located in Paris. I help answer questions, provide explanations, and assist with a variety of tasks to the best of my ability. I don't have personal experiences or feelings, but I'm here to provide helpful and respectful assistance.


Single image Chat completion output: 
The image depicts a serene and picturesque natural landscape. In the foreground, there is a wooden walkway that leads the viewer's eye through a lush, green field of tall grass. The grass appears vibrant and well-maintained, creating a sense of depth and leading towards the horizon.

In the background, there are various types of vegetation, including bushes and trees, which add layers of color and texture to the scene. The sky above is vast and filled with soft, wispy clouds, contributing to the overall tranquil atmosphere of the image.

The image captures the beauty of nature, inviting the viewer to imagine walking along the path and exploring the peaceful surroundings. The vibrant green of the grass and the calming presence of the sky create a sense of relaxation and connection with the natural world. There are no human-made structures or people visible in the image, emphasizing the untouched beauty of the landscape.


Single image Chat completion output: 
The image depicts a duck floating on calm, reflective water. The duck has a distinctive, colorful appearance:

- Its head is a bright green color.
- The body has a combination of brown and cream-colored patches.
- The duck's bill is yellow with a black tip.
- The surrounding water is clear and slightly rippling, which creates a serene atmosphere.

This particular duck is likely a Mallard, a common type of duck found in many regions around the world.


Single image Chat completion output: 
This image features a **lion** standing in a field of tall, dry grass. The lion has a distinctive **brownish-gold mane** and appears to be looking directly at the camera. The setting appears to be a natural habitat, likely a savanna or grassland, characterized by the tall, golden grass that surrounds the lion. The lighting suggests that the photo might have been taken during the early morning or late afternoon, casting a warm glow over the scene. 

The lion's stance and gaze give it a commanding and majestic presence. There are no other animals or distinctive landmarks visible in the image. The focal point is entirely on the lion.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Signed-off-by: mgoin <[email protected]>
@mgoin changed the title from "Support HF format Mistral3" to "[Model] Support Mistral3 in the HF Transformers format" on Mar 27, 2025
@mgoin added the new-model (Requests to new models) label on Mar 27, 2025
@mgoin marked this pull request as ready for review on March 27, 2025 04:04
Signed-off-by: mgoin <[email protected]>
@mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Mar 27, 2025
@mgoin requested a review from ywang96 as a code owner on March 27, 2025 17:17
@mergify bot added the documentation (Improvements or additions to documentation) label on Mar 27, 2025
Signed-off-by: mgoin <[email protected]>
@DarkLight1337 (Member)

The Transformers PR has been merged; can you update this one?

mgoin and others added 3 commits March 31, 2025 20:28
@DarkLight1337 (Member)

Can you merge neuralmagic#56 into this PR? Otherwise the PR LGTM

@thies1006

Hello, is there also an example that makes use of the chat_template?
There is a template provided, but it seems to stop working once I include a system prompt.

@DarkLight1337 (Member)

Hello, is there also an example that makes use of the chat_template?
There is a template provided, but it seems to stop working once I include a system prompt.

What do you mean by not working?

@thies1006

I start the model:

VLLM_USE_V1=0 vllm serve /secondary/thies/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic/ --tensor-parallel-size 8 --max-model-len 8000 --gpu-memory-utilization 0.9

Now when I send a query, e.g.:

curl IP:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"/secondary/thies/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic/", "messages": [{"role":"system","content":"you are a helpful assistant"},{"role": "user", "content": "whats your name"}], "max_tokens": 64, "temperature": 0}'

I get this error:

ERROR 04-01 10:14:02 [serving_chat.py:201] Error in preprocessing prompt inputs
ERROR 04-01 10:14:02 [serving_chat.py:201] Traceback (most recent call last):
ERROR 04-01 10:14:02 [serving_chat.py:201]   File "/secondary/thies/vllm/vllm/entrypoints/openai/serving_chat.py", line 184, in create_chat_completion
ERROR 04-01 10:14:02 [serving_chat.py:201]     ) = await self._preprocess_chat(
ERROR 04-01 10:14:02 [serving_chat.py:201]   File "/secondary/thies/vllm/vllm/entrypoints/openai/serving_engine.py", line 417, in _preprocess_chat
ERROR 04-01 10:14:02 [serving_chat.py:201]     request_prompt = apply_hf_chat_template(
ERROR 04-01 10:14:02 [serving_chat.py:201]   File "/secondary/thies/vllm/vllm/entrypoints/chat_utils.py", line 1174, in apply_hf_chat_template
ERROR 04-01 10:14:02 [serving_chat.py:201]     return tokenizer.apply_chat_template(
ERROR 04-01 10:14:02 [serving_chat.py:201]   File "/secondary/thies/.virtualenvs/vllm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1702, in apply_chat_template
ERROR 04-01 10:14:02 [serving_chat.py:201]     rendered_chat = compiled_template.render(
ERROR 04-01 10:14:02 [serving_chat.py:201]   File "/secondary/thies/.virtualenvs/vllm/lib/python3.10/site-packages/jinja2/environment.py", line 1295, in render
ERROR 04-01 10:14:02 [serving_chat.py:201]     self.environment.handle_exception()
ERROR 04-01 10:14:02 [serving_chat.py:201]   File "/secondary/thies/.virtualenvs/vllm/lib/python3.10/site-packages/jinja2/environment.py", line 942, in handle_exception
ERROR 04-01 10:14:02 [serving_chat.py:201]     raise rewrite_traceback_stack(source=source)
ERROR 04-01 10:14:02 [serving_chat.py:201]   File "<template>", line 13, in top-level template code
ERROR 04-01 10:14:02 [serving_chat.py:201] TypeError: can only concatenate str (not "list") to str

Without the system prompt it works.
I have not modified any files in the repository, so there is no chat template in tokenizer_config.json, but there is a chat_template.json file which I guess is used (?).
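
For reference, the equivalent request through the OpenAI Python client (a minimal sketch mirroring the curl call above) should hit the same template error:

from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
model_id = client.models.list().data[0].id

# Same system + user messages as the curl request above.
resp = client.chat.completions.create(
    model=model_id,
    messages=[
        {"role": "system", "content": "you are a helpful assistant"},
        {"role": "user", "content": "whats your name"},
    ],
    max_tokens=64,
    temperature=0,
)
print(resp.choices[0].message.content)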

@mgoin (Member, Author) commented Apr 1, 2025

Yes, the chat_template.json in the model repo is used by default now, per the current behavior of Transformers. I would expect this to be an issue with the chat template itself rather than vLLM, so I would recommend opening an issue on the upstream model card.
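
For reference, the template Transformers picks up can be inspected with something like this (a quick sketch; assumes a transformers release recent enough that the processor loads chat_template.json from the model repo):

from transformers import AutoProcessor

# The processor picks up the chat template from chat_template.json in the model repo.
processor = AutoProcessor.from_pretrained("mistralai/Mistral-Small-3.1-24B-Instruct-2503")
print(processor.chat_template)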

@karuko24 commented Apr 1, 2025

@thies1006 I had the same issue and am now using a modified version of the template (set via --chat-template):

EDIT: fixed system prompt

{%- set today = strftime_now("%Y-%m-%d") %}
{%- set default_system_message = "You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.\nYour knowledge base was last updated on 2023-10-01. The current date is " + today + ".\n\nWhen you're not sure about some information, you say that you don't have the information and don't make up anything.\nIf the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. \"What are some good restaurants around me?\" => \"Where are you?\" or \"When is the next flight to Tokyo\" => \"Where do you travel from?\")" %}

{{- bos_token }}

{%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content'][0]['text'] %}
    {%- set loop_messages = messages[1:] %}
{%- else %}
    {%- set system_message = default_system_message %}
    {%- set loop_messages = messages %}
{%- endif %}
{{- '[SYSTEM_PROMPT]' + system_message + '[/SYSTEM_PROMPT]' }}

{%- for message in loop_messages %}
    {%- if message['role'] == 'user' %}
        {%- if message['content'] is string %}
            {{- '[INST]' + message['content']|join + '[/INST]' }}
        {%- else %}
            {{- '[INST]' }}
            {%- for block in message['content'] %}
                {%- if block['type'] == 'text' %}
                    {{- block['text'] }}
                {%- elif block['type'] == 'image' or block['type'] == 'image_url' %}
                    {{- '[IMG]' }}
                {%- else %}
                    {{- raise_exception('Only text and image blocks are supported in message content!') }}
                {%- endif %}
            {%- endfor %}
            {{- '[/INST]' }}
        {%- endif %}
    {%- elif message['role'] == 'system' %}
        {{- '[SYSTEM_PROMPT]' + message['content'] + '[/SYSTEM_PROMPT]' }}
    {%- elif message['role'] == 'assistant' %}
        {{- message['content'][0]['text'] + eos_token }}
    {%- else %}
        {{- raise_exception('Only user, system and assistant roles are supported!') }}
    {%- endif %}
{%- endfor %}
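
Saved to a file, this can then be passed to the server via --chat-template (the path and filename below are just examples):

vllm serve /secondary/thies/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic/ --chat-template ./mistral_small_chat_template.jinja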

I've opened a PR on HF

@vllm-bot vllm-bot merged commit 51d7c6a into vllm-project:main Apr 1, 2025
33 of 35 checks passed
WrRan pushed a commit to WrRan/vllm that referenced this pull request Apr 1, 2025
…15505)

Signed-off-by: mgoin <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: Wang Ran (汪然) <[email protected]>
Alex4210987 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Apr 5, 2025
…15505)

Signed-off-by: mgoin <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: xinyuxiao <[email protected]>
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
…15505)

Signed-off-by: mgoin <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: Louis Ulmer <[email protected]>
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
…15505)

Signed-off-by: mgoin <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
…15505)

Signed-off-by: mgoin <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…15505)

Signed-off-by: mgoin <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: Mu Huai <[email protected]>