### Your current environment
I confirmed that it still fails on current main at the time of writing. See below for reproduction instructions:
### 🐛 Describe the bug
We found an edge case that causes requests to error out when requesting logprobs with the MistralTokenizer.

Example serving command using the MistralTokenizer (with a model that uses the Tekken tokenizer):
```shell
vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer-mode mistral --config-format mistral --load-format mistral
```
Simplified reproduction case:
```shell
curl -s -X POST \
  -H "Content-Type: application/json" \
  "http://localhost:8000/v1/chat/completions" \
  --data-binary @- << _EOF
{
  "model": "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
  "logprobs": true,
  "top_logprobs": 2,
  "messages": [
    {
      "role": "user",
      "content": " "
    }
  ],
  "guided_json": {"properties": {}}
}
_EOF
```
The relevant part of the stack trace:
```text
...
  File "/workspace/my-vllm/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 477, in create_chat_completion
    generator = await handler.create_chat_completion(request, raw_request)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/my-vllm/lib64/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 267, in create_chat_completion
    return await self.chat_completion_full_generator(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/my-vllm/lib64/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 925, in chat_completion_full_generator
    logprobs = self._create_chat_logprobs(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/my-vllm/lib64/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 1145, in _create_chat_logprobs
    step_token = step_top_logprobs[token_id]
                 ~~~~~~~~~~~~~~~~~^^^^^^^^^^
KeyError: 2
```
From my investigation, this occurs when all of the top logprobs are special tokens. In that case, with the MistralTokenizer, `decoded_tokens` ends up being an empty list, resulting in an empty dict in `logprobs` that is then accessed via `step_top_logprobs`. When skipping special tokens (which is required/default for the MistralTokenizer), `convert_ids_list_to_tokens` can return a list shorter than the list of input token ids.
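A minimal sketch of the failure mode (plain Python, not vLLM's actual code; the `dict(zip(...))` pairing only illustrates how a too-short decoded list yields an empty dict):

```python
# Suppose both top-logprob tokens for a step are special tokens.
token_ids = [1, 2]

# MistralTokenizer skips special tokens when decoding, so the
# decoded list comes back shorter than the id list -- here, empty.
decoded_tokens = []

# Pairing the ids with the short decoded list silently drops entries,
# leaving an empty logprobs dict for this step.
step_top_logprobs = dict(zip(token_ids, decoded_tokens))

# The later lookup then fails, matching the stack trace above.
step_token = step_top_logprobs[2]  # KeyError: 2
```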
### Before submitting a new issue...