
Conversation

@mgoin (Member) commented Mar 25, 2025

Works for text input and single-image batches. Requires a fix to the Pixtral processing in Transformers (huggingface/transformers#37019). It still fails a full ChartQA eval in vLLM V1, seemingly due to batched-encoding issues, so I have forced this model to run only with V0 for now.

FIX #15212

Testing

Server started with

vllm serve nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic
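
For a quick offline sanity check, something like the following can also be used (a minimal sketch; the prompt and sampling parameters are just illustrative, and VLLM_USE_V1=0 mirrors the V0-only restriction noted above):

# Minimal offline smoke test (illustrative prompt and sampling parameters).
import os
os.environ["VLLM_USE_V1"] = "0"  # mirror the V0-only restriction for this model

from vllm import LLM, SamplingParams

llm = LLM(model="nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic")
outputs = llm.generate(
    ["Describe the Eiffel Tower in one sentence."],
    SamplingParams(temperature=0, max_tokens=64),
)
print(outputs[0].outputs[0].text)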

ChartQA Eval

FP8 checkpoint:

python -m eval.run eval_vllm --model_name nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "chartqa"
Waiting for VLLM server to come online at http://0.0.0.0:9000/health ...
Timeout is 120s
Waiting for server (0s) ...
Server is up!
Loading lmms-lab/ChartQA [test]: 100%|██████████| 2500/2500 [00:11<00:00, 210.68it/s]
Querying model: 100%|██████████| 2500/2500 [06:53<00:00, 6.05it/s]
100%|██████████| 2500/2500 [00:00<00:00, 22477.08it/s]
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.8136,
    "anywhere_in_answer_relaxed_correctness": 0.8144
}
================================================================================

Original checkpoint:

python -m eval.run eval_vllm --model_name mistralai/Mistral-Small-3.1-24B-Instruct-2503 --url http://0.0.0.0:9000 --output_dir output/ --eval_name "chartqa"
Waiting for VLLM server to come online at http://0.0.0.0:9000/health ...
Timeout is 120s
Waiting for server (0s) ...
Server is up!
Loading lmms-lab/ChartQA [test]: 100%|██████████| 2500/2500 [00:11<00:00, 220.94it/s]
Querying model: 100%|██████████| 2500/2500 [09:40<00:00, 4.31it/s]
100%|██████████| 2500/2500 [00:00<00:00, 27337.92it/s]
================================================================================
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.818,
    "anywhere_in_answer_relaxed_correctness": 0.8192
}
================================================================================

Single-image example script

Client script:

from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
client = OpenAI(api_key=openai_api_key, base_url=openai_api_base)
model_id = client.models.list().data[0].id

# Text inference
chat_response = client.chat.completions.create(
    model=model_id,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Who are you?"},
        ],
    }],
)
print("Text Chat completion output:", chat_response.choices[0].message.content)

# Single-image input inference (one image per request)
image_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/d/da/2015_Kaczka_krzy%C5%BCowka_w_wodzie_%28samiec%29.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/7/77/002_The_lion_king_Snyggve_in_the_Serengeti_National_Park_Photo_by_Giles_Laurent.jpg",
]
for img in image_urls:
    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": img}},
        ],
    }]
    chat_response = client.chat.completions.create(model=model_id, messages=messages)
    print("Single image Chat completion output:", chat_response.choices[0].message.content)

Output:

Text Chat completion output: 
I am Mistral Small 3, a Large Language Model created by Mistral AI, a French startup located in Paris. I help answer questions, provide explanations, and assist with a variety of tasks to the best of my ability. I don't have personal experiences or feelings, but I'm here to provide helpful and respectful assistance.


Single image Chat completion output: 
The image depicts a serene and picturesque natural landscape. In the foreground, there is a wooden walkway that leads the viewer's eye through a lush, green field of tall grass. The grass appears vibrant and well-maintained, creating a sense of depth and leading towards the horizon.

In the background, there are various types of vegetation, including bushes and trees, which add layers of color and texture to the scene. The sky above is vast and filled with soft, wispy clouds, contributing to the overall tranquil atmosphere of the image.

The image captures the beauty of nature, inviting the viewer to imagine walking along the path and exploring the peaceful surroundings. The vibrant green of the grass and the calming presence of the sky create a sense of relaxation and connection with the natural world. There are no human-made structures or people visible in the image, emphasizing the untouched beauty of the landscape.


Single image Chat completion output: 
The image depicts a duck floating on calm, reflective water. The duck has a distinctive, colorful appearance:

- Its head is a bright green color.
- The body has a combination of brown and cream-colored patches.
- The duck's bill is yellow with a black tip.
- The surrounding water is clear and slightly rippling, which creates a serene atmosphere.

This particular duck is likely a Mallard, a common type of duck found in many regions around the world.


Single image Chat completion output: 
This image features a **lion** standing in a field of tall, dry grass. The lion has a distinctive **brownish-gold mane** and appears to be looking directly at the camera. The setting appears to be a natural habitat, likely a savanna or grassland, characterized by the tall, golden grass that surrounds the lion. The lighting suggests that the photo might have been taken during the early morning or late afternoon, casting a warm glow over the scene. 

The lion's stance and gaze give it a commanding and majestic presence. There are no other animals or distinctive landmarks visible in the image. The focal point is entirely on the lion.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Signed-off-by: mgoin <[email protected]>
@mgoin changed the title from "Support HF format Mistral3" to "[Model] Support Mistral3 in the HF Transformers format" on Mar 27, 2025
@mgoin added the new-model (Requests to new models) label on Mar 27, 2025
@mgoin marked this pull request as ready for review on March 27, 2025 04:04
Signed-off-by: mgoin <[email protected]>
@mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Mar 27, 2025
@mgoin requested a review from ywang96 as a code owner on March 27, 2025 17:17
@mergify bot added the documentation (Improvements or additions to documentation) label on Mar 27, 2025
Signed-off-by: mgoin <[email protected]>
@DarkLight1337 (Member)

The Transformers PR has been merged; can you update this one?

mgoin and others added 3 commits March 31, 2025 20:28
@DarkLight1337 (Member)

Can you merge neuralmagic#56 into this PR? Otherwise the PR LGTM

@thies1006

Hello, is there also an example that makes use of the chat_template?
There is a template provided, but it seems to stop working once I include a system prompt.

@DarkLight1337 (Member)

Hello, is there also an example that makes use of the chat_template?
There is a template provided, but it seems to stop working once I include a system prompt.

What do you mean by not working?

@thies1006

I start the model:

VLLM_USE_V1=0 vllm serve /secondary/thies/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic/ --tensor-parallel-size 8 --max-model-len 8000 --gpu-memory-utilization 0.9

Now when I send a query, e.g.:

curl IP:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"/secondary/thies/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic/", "messages": [{"role":"system","content":"you are a helpful assistant"},{"role": "user", "content": "whats your name"}], "max_tokens": 64, "temperature": 0}'

I get this error:

ERROR 04-01 10:14:02 [serving_chat.py:201] Error in preprocessing prompt inputs
ERROR 04-01 10:14:02 [serving_chat.py:201] Traceback (most recent call last):
ERROR 04-01 10:14:02 [serving_chat.py:201]   File "/secondary/thies/vllm/vllm/entrypoints/openai/serving_chat.py", line 184, in create_chat_completion
ERROR 04-01 10:14:02 [serving_chat.py:201]     ) = await self._preprocess_chat(
ERROR 04-01 10:14:02 [serving_chat.py:201]   File "/secondary/thies/vllm/vllm/entrypoints/openai/serving_engine.py", line 417, in _preprocess_chat
ERROR 04-01 10:14:02 [serving_chat.py:201]     request_prompt = apply_hf_chat_template(
ERROR 04-01 10:14:02 [serving_chat.py:201]   File "/secondary/thies/vllm/vllm/entrypoints/chat_utils.py", line 1174, in apply_hf_chat_template
ERROR 04-01 10:14:02 [serving_chat.py:201]     return tokenizer.apply_chat_template(
ERROR 04-01 10:14:02 [serving_chat.py:201]   File "/secondary/thies/.virtualenvs/vllm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1702, in apply_chat_template
ERROR 04-01 10:14:02 [serving_chat.py:201]     rendered_chat = compiled_template.render(
ERROR 04-01 10:14:02 [serving_chat.py:201]   File "/secondary/thies/.virtualenvs/vllm/lib/python3.10/site-packages/jinja2/environment.py", line 1295, in render
ERROR 04-01 10:14:02 [serving_chat.py:201]     self.environment.handle_exception()
ERROR 04-01 10:14:02 [serving_chat.py:201]   File "/secondary/thies/.virtualenvs/vllm/lib/python3.10/site-packages/jinja2/environment.py", line 942, in handle_exception
ERROR 04-01 10:14:02 [serving_chat.py:201]     raise rewrite_traceback_stack(source=source)
ERROR 04-01 10:14:02 [serving_chat.py:201]   File "<template>", line 13, in top-level template code
ERROR 04-01 10:14:02 [serving_chat.py:201] TypeError: can only concatenate str (not "list") to str

Without the system prompt it works.
I have not modified any files in the repository, so there is no chat template in tokenizer_config.json, but there is a chat_template.json file which I guess is used (?).
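
For reference, the equivalent request through the OpenAI Python client (a minimal sketch mirroring the curl call above) should hit the same template error:

from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
model_id = client.models.list().data[0].id

# Same system + user messages as the curl request above.
resp = client.chat.completions.create(
    model=model_id,
    messages=[
        {"role": "system", "content": "you are a helpful assistant"},
        {"role": "user", "content": "whats your name"},
    ],
    max_tokens=64,
    temperature=0,
)
print(resp.choices[0].message.content)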

@mgoin (Member, Author) commented Apr 1, 2025

Yes, the chat_template.json in the model repo is used by default now, per the current behavior of Transformers. I would expect this to be an issue with the chat template itself rather than vLLM, so I would recommend opening an issue on the upstream model card.
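
For reference, the template Transformers picks up can be inspected with something like this (a quick sketch; assumes a transformers release recent enough that the processor loads chat_template.json from the model repo):

from transformers import AutoProcessor

# The processor picks up the chat template from chat_template.json in the model repo.
processor = AutoProcessor.from_pretrained("mistralai/Mistral-Small-3.1-24B-Instruct-2503")
print(processor.chat_template)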

@karuko24 commented Apr 1, 2025

@thies1006 I had the same issue and am now using a modified version of the template (set via --chat-template):

EDIT: fixed system prompt

{%- set today = strftime_now("%Y-%m-%d") %}
{%- set default_system_message = "You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.\nYour knowledge base was last updated on 2023-10-01. The current date is " + today + ".\n\nWhen you're not sure about some information, you say that you don't have the information and don't make up anything.\nIf the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. \"What are some good restaurants around me?\" => \"Where are you?\" or \"When is the next flight to Tokyo\" => \"Where do you travel from?\")" %}

{{- bos_token }}

{%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content'][0]['text'] %}
    {%- set loop_messages = messages[1:] %}
{%- else %}
    {%- set system_message = default_system_message %}
    {%- set loop_messages = messages %}
{%- endif %}
{{- '[SYSTEM_PROMPT]' + system_message + '[/SYSTEM_PROMPT]' }}

{%- for message in loop_messages %}
    {%- if message['role'] == 'user' %}
        {%- if message['content'] is string %}
            {{- '[INST]' + message['content']|join + '[/INST]' }}
        {%- else %}
            {{- '[INST]' }}
            {%- for block in message['content'] %}
                {%- if block['type'] == 'text' %}
                    {{- block['text'] }}
                {%- elif block['type'] == 'image' or block['type'] == 'image_url' %}
                    {{- '[IMG]' }}
                {%- else %}
                    {{- raise_exception('Only text and image blocks are supported in message content!') }}
                {%- endif %}
            {%- endfor %}
            {{- '[/INST]' }}
        {%- endif %}
    {%- elif message['role'] == 'system' %}
        {{- '[SYSTEM_PROMPT]' + message['content'] + '[/SYSTEM_PROMPT]' }}
    {%- elif message['role'] == 'assistant' %}
        {{- message['content'][0]['text'] + eos_token }}
    {%- else %}
        {{- raise_exception('Only user, system and assistant roles are supported!') }}
    {%- endif %}
{%- endfor %}
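
Saved to a file, this can then be passed to the server via --chat-template (the path and filename below are just examples):

vllm serve /secondary/thies/Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic/ --chat-template ./mistral_small_chat_template.jinja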

I've opened a PR on HF

@vllm-bot vllm-bot merged commit 51d7c6a into vllm-project:main Apr 1, 2025
33 of 35 checks passed
WrRan pushed a commit to WrRan/vllm that referenced this pull request Apr 1, 2025
…15505)

Signed-off-by: mgoin <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: Wang Ran (汪然) <[email protected]>
Alex4210987 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Apr 5, 2025
…15505)

Signed-off-by: mgoin <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: xinyuxiao <[email protected]>
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
…15505)

Signed-off-by: mgoin <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: Louis Ulmer <[email protected]>
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
…15505)

Signed-off-by: mgoin <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
…15505)

Signed-off-by: mgoin <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
…15505)

Signed-off-by: mgoin <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: DarkLight1337 <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: Mu Huai <[email protected]>