Misc. bug: server is always sending usage statistic #16048
Closed
Description
Name and Version
version: 6497 (cd08fc3)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server -m models/gemma-3-1b-it-q4_0.gguf
Problem description & steps to reproduce
The server always sends a final chunk with an empty "choices" list that carries usage statistics, e.g.:
...
data: {"choices":[],"created":1758108431,"id":"chatcmpl-Ca3q6j8NlTmm9h34TBbwr3wjgO47kWpz","model":"google/gemma-3-1b-it-qat-q4_0","system_fingerprint":"b6497-cd08fc3e","object":"chat.completion.chunk","usage":{"completion_tokens":31,"prompt_tokens":75,"total_tokens":106},"timings":{"cache_n":0,"prompt_n":75,"prompt_ms":180.025,"prompt_per_token_ms":2.400333333333333,"prompt_per_second":416.60880433273155,"predicted_n":31,"predicted_ms":875.429,"predicted_per_token_ms":28.239645161290323,"predicted_per_second":35.41120981827196}}
data: [DONE]
This was introduced with PR #15444, which makes the server always send this chunk at the end of a stream. The spec, however, says it should be sent only if "stream_options": {"include_usage": true} is set in the request.
We should change the server to send usage stats only when the user requests them, not unconditionally.
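For reference, per the OpenAI streaming spec the client opts in to the usage chunk explicitly. A minimal sketch of such a request body (the model name and message content are just illustrative):

```python
import json

# Request body a client would send to opt in to usage statistics
# per the OpenAI streaming spec.
body = {
    "model": "google/gemma-3-1b-it-qat-q4_0",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
    # Only when this option is set should the server append the final
    # usage-only chunk (with an empty "choices" list) before [DONE].
    "stream_options": {"include_usage": True},
}
print(json.dumps(body))
```

Without the "stream_options" field, a spec-compliant server would emit no usage chunk at all.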
First Bad Commit
No response
Relevant log output