
The logprobs in ChatCompletion responses incorrectly reuse the Completions schema, not following the OpenAI API spec #3179

@ghost

Description

According to the OpenAI documentation, logprobs are returned in a different format by the Chat Completions API than by the legacy Completions API:
https://platform.openai.com/docs/api-reference/chat/create

However, vLLM returns logprobs for ChatCompletion responses in the Completions API format.
This is the example chat completion output from the OpenAI API documentation:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1702685778,
  "model": "gpt-3.5-turbo-0125",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello"
      },
      "logprobs": {
        "content": [
          {
            "token": "Hello",
            "logprob": -0.31725305,
            "bytes": [72, 101, 108, 108, 111],
            "top_logprobs": [
              {
                "token": "Hello",
                "logprob": -0.31725305,
                "bytes": [72, 101, 108, 108, 111]
              },
              {
                "token": "Hi",
                "logprob": -1.3190403,
                "bytes": [72, 105]
              }
            ]
          }
        ]
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 1,
    "total_tokens": 11
  },
  "system_fingerprint": null
}
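A client written against this spec (for example with the openai Python SDK v1) reads the per-token entries from choices[0].logprobs.content. The snippet below is a minimal sketch of such a client pointed at a vLLM server; the base URL and model name are only examples:

# Minimal sketch, assuming the openai Python SDK v1 and an OpenAI-compatible
# vLLM server at http://localhost:8000/v1 (URL and model name are examples).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen1.5-72B-Chat-GPTQ-Int4",
    messages=[{"role": "user", "content": "Say hello"}],
    max_tokens=1,
    logprobs=True,
    top_logprobs=2,
)

# Per the spec, logprobs.content is a list of per-token objects, each carrying
# token, logprob, bytes, and its own list of top_logprobs alternatives.
for entry in resp.choices[0].logprobs.content:
    print(entry.token, entry.logprob)
    for alt in entry.top_logprobs:
        print("  alt:", alt.token, alt.logprob)

Code like this has nothing to iterate over when the server returns the Completions-style payload shown next.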

And this is what vLLM returns as of commit 901cf4c:

{
  "id": "cmpl-feb5333e02ef436e95c09b8f2255e4c0",
  "object": "chat.completion",
  "created": 997761,
  "model": "Qwen/Qwen1.5-72B-Chat-GPTQ-Int4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello"
      },
      "logprobs": {
        "text_offset": [
          0
        ],
        "token_logprobs": [
          -0.015691734850406647
        ],
        "tokens": [
          "Hello"
        ],
        "top_logprobs": [
          {
            "Hello": -0.015691734850406647,
            "How": -5.140691757202148
          }
        ]
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 11,
    "completion_tokens": 1
  }
}

Here is how the Completions API's LogProbs schema is reused inside the ChatCompletion response:

from typing import Dict, List, Literal, Optional

from pydantic import BaseModel, Field

# ChatMessage is defined elsewhere in the same module.


class LogProbs(BaseModel):
    text_offset: List[int] = Field(default_factory=list)
    token_logprobs: List[Optional[float]] = Field(default_factory=list)
    tokens: List[str] = Field(default_factory=list)
    top_logprobs: Optional[List[Optional[Dict[int, float]]]] = None


class ChatCompletionResponseChoice(BaseModel):
    index: int
    message: ChatMessage
    logprobs: Optional[LogProbs] = None
    finish_reason: Optional[Literal["stop", "length"]] = None
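For comparison, a response model matching the Chat Completions spec shown at the top of this issue would look roughly like the sketch below. The class names are hypothetical placeholders, not existing vLLM code:

# Hedged sketch of a spec-shaped schema; the class names are hypothetical and
# do not exist in vLLM at this commit.
from typing import List, Optional

from pydantic import BaseModel, Field


class ChatCompletionLogProb(BaseModel):
    token: str
    logprob: float
    bytes: Optional[List[int]] = None


class ChatCompletionLogProbsContent(ChatCompletionLogProb):
    # Each generated token carries its own list of alternative tokens.
    top_logprobs: List[ChatCompletionLogProb] = Field(default_factory=list)


class ChatCompletionLogProbs(BaseModel):
    content: Optional[List[ChatCompletionLogProbsContent]] = None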

And the _create_logprobs function, which was written for the Completions API, is then reused in serving_chat.py:

logprobs = self._create_logprobs(
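A possible direction for a fix, sketched below under the assumption that the Completions-style lists (tokens, token_logprobs, top_logprobs) remain available, is to reshape them into the nested per-token structure the spec expects before building the ChatCompletion response. The helper name to_chat_logprobs is hypothetical:

# Hedged sketch, not vLLM's actual code: reshape Completions-style arrays
# (as shown in the response above) into the spec's nested per-token format.
from typing import Dict, List, Optional


def to_chat_logprobs(
    tokens: List[str],
    token_logprobs: List[Optional[float]],
    top_logprobs: Optional[List[Optional[Dict[str, float]]]] = None,
) -> Dict[str, list]:
    content = []
    top_logprobs = top_logprobs or [None] * len(tokens)
    for token, logprob, alternatives in zip(tokens, token_logprobs, top_logprobs):
        content.append({
            "token": token,
            "logprob": logprob,
            "bytes": list(token.encode("utf-8")),
            "top_logprobs": [
                {"token": t, "logprob": lp, "bytes": list(t.encode("utf-8"))}
                for t, lp in (alternatives or {}).items()
            ],
        })
    return {"content": content}

Applied to the values in the vLLM response above, this yields the same shape as the OpenAI example at the top of the issue.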
