Description
According to OpenAI's documentation, the Chat Completions API returns logprobs in a different format than the legacy Completions API:
https://platform.openai.com/docs/api-reference/chat/create

However, vLLM currently returns logprobs in the legacy Completions format for Chat Completions responses as well.
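For reference, the responses shown below can be produced with a request along these lines (a minimal sketch using the official openai Python client, assuming a vLLM OpenAI-compatible server running locally on the default port; the exact logprobs request parameters accepted by vLLM at this commit may differ):

from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server is running at the default local address.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen1.5-72B-Chat-GPTQ-Int4",
    messages=[{"role": "user", "content": "Say hello"}],
    max_tokens=1,
    logprobs=True,   # request logprobs for the sampled tokens
    top_logprobs=2,  # and the top-2 alternatives per position
)

# Per the Chat Completions schema, this should be a list of per-token entries,
# each carrying token / logprob / bytes / top_logprobs.
print(response.choices[0].logprobs)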
This is the example chat completion output from the OpenAI API's documentation:
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1702685778,
  "model": "gpt-3.5-turbo-0125",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello"
      },
      "logprobs": {
        "content": [
          {
            "token": "Hello",
            "logprob": -0.31725305,
            "bytes": [72, 101, 108, 108, 111],
            "top_logprobs": [
              {
                "token": "Hello",
                "logprob": -0.31725305,
                "bytes": [72, 101, 108, 108, 111]
              },
              {
                "token": "Hi",
                "logprob": -1.3190403,
                "bytes": [72, 105]
              }
            ]
          }
        ]
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 1,
    "total_tokens": 11
  },
  "system_fingerprint": null
}

And this is what vLLM returns with commit 901cf4c:
{
  "id": "cmpl-feb5333e02ef436e95c09b8f2255e4c0",
  "object": "chat.completion",
  "created": 997761,
  "model": "Qwen/Qwen1.5-72B-Chat-GPTQ-Int4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello"
      },
      "logprobs": {
        "text_offset": [0],
        "token_logprobs": [-0.015691734850406647],
        "tokens": ["Hello"],
        "top_logprobs": [
          {
            "Hello": -0.015691734850406647,
            "How": -5.140691757202148
          }
        ]
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 11,
    "completion_tokens": 1
  }
}

Here is how the Completions LogProbs schema is reused inside the ChatCompletion response:
vllm/vllm/entrypoints/openai/protocol.py, lines 245 to 249 in 901cf4c:

class LogProbs(BaseModel):
    text_offset: List[int] = Field(default_factory=list)
    token_logprobs: List[Optional[float]] = Field(default_factory=list)
    tokens: List[str] = Field(default_factory=list)
    top_logprobs: Optional[List[Optional[Dict[int, float]]]] = None
vllm/vllm/entrypoints/openai/protocol.py, lines 289 to 293 in 901cf4c:

class ChatCompletionResponseChoice(BaseModel):
    index: int
    message: ChatMessage
    logprobs: Optional[LogProbs] = None
    finish_reason: Optional[Literal["stop", "length"]] = None
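To match the documented Chat Completions schema, the response would need per-token entries rather than parallel lists. A rough sketch of what such Pydantic models could look like (illustrative names only, not existing vLLM classes):

from typing import List, Optional

from pydantic import BaseModel, Field


class ChatCompletionTopLogprob(BaseModel):
    # One alternative token at a given position.
    token: str
    logprob: float
    bytes: Optional[List[int]] = None


class ChatCompletionTokenLogprob(BaseModel):
    # The sampled token plus its top alternatives.
    token: str
    logprob: float
    bytes: Optional[List[int]] = None
    top_logprobs: List[ChatCompletionTopLogprob] = Field(default_factory=list)


class ChatCompletionLogProbs(BaseModel):
    # The "logprobs" object documented for chat completions:
    # a list of per-token entries under "content".
    content: Optional[List[ChatCompletionTokenLogprob]] = None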
The _create_logprobs helper, which was written for the Completions API, is then reused in serving_chat.py:

vllm/vllm/entrypoints/openai/serving_chat.py, line 241 in 901cf4c:

logprobs = self._create_logprobs(
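To illustrate the gap: producing the documented chat format from the Completions-style object amounts to regrouping the parallel lists into per-token entries, roughly as below. This is only a sketch reusing the illustrative models above and the existing LogProbs class, not a description of how vLLM is implemented:

def completion_to_chat_logprobs(logprobs: LogProbs) -> ChatCompletionLogProbs:
    """Regroup Completions-style parallel lists into per-token chat entries."""
    content: List[ChatCompletionTokenLogprob] = []
    for i, token in enumerate(logprobs.tokens):
        top: List[ChatCompletionTopLogprob] = []
        if logprobs.top_logprobs and logprobs.top_logprobs[i]:
            top = [
                ChatCompletionTopLogprob(
                    token=str(alt),
                    logprob=lp,
                    bytes=list(str(alt).encode("utf-8")),
                )
                for alt, lp in logprobs.top_logprobs[i].items()
            ]
        content.append(
            ChatCompletionTokenLogprob(
                token=token,
                logprob=logprobs.token_logprobs[i] or 0.0,
                bytes=list(token.encode("utf-8")),
                top_logprobs=top,
            )
        )
    return ChatCompletionLogProbs(content=content)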