server : avoid antiprompt in probabilities of final response #2849
Merged
jhen0409 merged 1 commit into ggml-org:master on Sep 2, 2023
Conversation
SlyEcho (Contributor) reviewed on Sep 1, 2023 and left a comment:
Yep, that does the trick.
SlyEcho approved these changes on Sep 1, 2023
sayap added a commit to sayap/ik_llama.cpp that referenced this pull request on Nov 22, 2025:
The logic to skip the logprobs of the stop token was originally from ggml-org/llama.cpp#2849, and was later modified as part of ggml-org/llama.cpp#10643 to be applied only to STOP_TYPE_WORD. The latter change wasn't included in ikawrakow#723. Then, after ikawrakow#958 got merged, the logic got inadvertently applied to GLM-4.5/4.6 and Kimi K2, resulting in truncated logprobs when streaming is off.

This commit reverts the logic from ggml-org/llama.cpp#2849, so that the logprobs of the stop token are always included in the response when logprobs is enabled. From testing, this matches the behavior of the Fireworks inference server for both the chat completions and text completions endpoints.

Also fix logprobs param handling for the text completion endpoint.
ikawrakow pushed a commit to ikawrakow/ik_llama.cpp that referenced this pull request on Nov 24, 2025:
The logic to skip the logprobs of the stop token was originally from ggml-org/llama.cpp#2849, and was later modified as part of ggml-org/llama.cpp#10643 to be applied only to STOP_TYPE_WORD. The latter change wasn't included in #723. Then, after #958 got merged, the logic got inadvertently applied to GLM-4.5/4.6 and Kimi K2, resulting in truncated logprobs when streaming is off.

This commit reverts the logic from ggml-org/llama.cpp#2849, so that the logprobs of the stop token are always included in the response when logprobs is enabled. From testing, this matches the behavior of the Fireworks inference server for both the chat completions and text completions endpoints.

Also fix logprobs param handling for the text completion endpoint.
This fixes an issue where the probabilities of the stopping_word were included in the final response of /completion.

To test the response without stream mode:
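The original test commands are not preserved here; the following is a minimal sketch, assuming a llama.cpp server running locally on port 8080 and the /completion parameters `n_probs` (to request token probabilities) and `stop` (the stopping word). The prompt, port, and stopping word are placeholders.

```python
import json
import urllib.request

# Sketch: query /completion without streaming and inspect the probabilities
# attached to the final response. With this fix, the stopping word should no
# longer appear among them. Server URL and parameter values are assumptions.
payload = {
    "prompt": "Building a website can be done in 10 simple steps:",
    "n_predict": 32,
    "n_probs": 5,          # ask for top-5 token probabilities
    "stop": ["step"],      # stopping word (antiprompt)
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())

print(json.dumps(body.get("completion_probabilities"), indent=2))
```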
With stream mode (see the last event):
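A sketch of the streaming case, under the same assumptions as above; it additionally assumes the server delivers the stream as server-sent events, one JSON payload per `data:` line, with the final response carried by the last event.

```python
import json
import urllib.request

# Sketch: stream the completion and keep only the last event, which carries
# the final response whose probabilities should not include the stopping word.
payload = {
    "prompt": "Building a website can be done in 10 simple steps:",
    "n_predict": 32,
    "n_probs": 5,
    "stop": ["step"],
    "stream": True,
}

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
last_event = None
with urllib.request.urlopen(req) as resp:
    for raw in resp:
        line = raw.decode().strip()
        if line.startswith("data: "):
            last_event = json.loads(line[len("data: "):])

# Inspect the last event (the final response).
print(json.dumps(last_event, indent=2))
```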
Or see the console output in the web UI.