
[Frontend] Improve error message when tool_choice=required hits max_tokens#36883

Closed
chenwenxiaolive wants to merge 1 commit intovllm-project:mainfrom
chenwenxiaolive:fix/improve-tool-choice-required-error-message

Conversation


@chenwenxiaolive chenwenxiaolive commented Mar 12, 2026

Summary

  • When tool_choice='required' is set and the model exhausts max_tokens before producing a complete tool call, users previously received a cryptic JSON parse error (Invalid JSON: EOF while parsing a string). This PR adds clear, actionable error messages that guide users to increase max_tokens.
  • Added an early check in chat_completion_full_generator() that detects finish_reason='length' with tool_choice='required' and raises a descriptive error before attempting JSON parsing.
  • Wrapped validate_json() in _parse_tool_calls_from_content() with a try-except to provide a helpful error message as a fallback.
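The two error paths described above can be sketched as follows. This is a minimal standalone approximation, not the actual PR diff: `chat_completion_full_generator()` and `_parse_tool_calls_from_content()` are the real vLLM names mentioned in the summary, but the function signatures and surrounding plumbing here are simplified stand-ins.

```python
import json


def raise_if_truncated(tool_choice: str, finish_reason: str) -> None:
    # Early check: detect truncation before attempting JSON parsing,
    # mirroring the check added in chat_completion_full_generator().
    if tool_choice == "required" and finish_reason == "length":
        raise ValueError(
            "Model reached the max token limit (finish_reason='length') "
            "before generating complete tool calls, but "
            "tool_choice='required' demands valid tool calls. "
            "Please increase `max_tokens`."
        )


def parse_tool_calls_from_content(content: str):
    # Fallback: wrap JSON validation so a truncated payload yields an
    # actionable message instead of a raw parse error, mirroring the
    # try-except added around validate_json().
    try:
        return json.loads(content)
    except json.JSONDecodeError as e:
        raise ValueError(
            "When tool_choice='required', the model must produce valid "
            "JSON tool calls, but the generated content could not be "
            "parsed. Consider increasing `max_tokens`."
        ) from e
```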

Fixes #36794

Test plan

  • Added unit test test_parse_tool_calls_incomplete_json_gives_clear_error verifying the improved error message
  • Existing tool_choice=required tests continue to pass
  • Manual verification: set tool_choice="required" with a low max_tokens and confirm the new error message appears
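A plain-Python sketch of what the new unit test might look like. The helper below is a hypothetical stand-in reproducing only the error path this PR adds, not the real vLLM function:

```python
import json


def _parse_tool_calls_from_content(content: str):
    # Hypothetical stand-in for the vLLM helper, reproducing only the
    # improved error path added by this PR.
    try:
        return json.loads(content)
    except json.JSONDecodeError as e:
        raise ValueError(
            "When tool_choice='required', the model must produce valid "
            "JSON tool calls, but the generated content could not be parsed."
        ) from e


def test_parse_tool_calls_incomplete_json_gives_clear_error():
    # Simulate model output cut off mid-string by max_tokens.
    truncated = '{"name": "get_weather", "arguments": {"city": "Par'
    try:
        _parse_tool_calls_from_content(truncated)
    except ValueError as e:
        assert "tool_choice='required'" in str(e)
    else:
        raise AssertionError("expected a ValueError for truncated JSON")
```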

🤖 Generated with Claude Code

…okens

When tool_choice='required' is set and the model exhausts max_tokens
before producing a complete tool call, users previously got a cryptic
JSON parse error. This adds clear, actionable error messages that
guide users to increase max_tokens.

Fixes vllm-project#36794

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mergify

mergify bot commented Mar 12, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chenwenxiaolive.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs will not trigger a full CI run by default. Instead, only the fastcheck CI will run, covering a small and essential subset of tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@chaunceyjiang
Collaborator

see #36841

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request improves error messaging for truncated tool calls when tool_choice='required'. The changes add an early check for finish_reason='length' and a fallback for JSON parsing errors, which is a good approach. I have one high-severity suggestion regarding test coverage for the new logic to ensure its robustness.

Comment on lines +1458 to +1467
if (request.tool_choice == "required"
        and output.finish_reason == "length"):
    raise ValueError(
        "Model reached the max token limit "
        "(finish_reason='length') before generating "
        "complete tool calls, but tool_choice='required' "
        "demands valid tool calls. Please increase "
        "`max_tokens` to allow the model to finish "
        "generating the tool call."
    )
Contributor


high

This is a good addition for providing a clear error message. However, this new logic path is not covered by automated tests. The new test in test_tool_choice_required.py only covers the fallback mechanism in _parse_tool_calls_from_content. To prevent future regressions, it's important to add a unit test that specifically triggers this early check. This would likely involve mocking self.engine.generate to return a result with output.finish_reason='length' and asserting that the correct ValueError is raised.
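One way to exercise that early check without a real engine is with stub objects. The sketch below inlines the check rather than calling it through `chat_completion_full_generator()`; all names here are hypothetical stand-ins for the vLLM request/output types:

```python
from types import SimpleNamespace


def raise_if_length_with_required_tools(request, output) -> None:
    # Inlined copy of the early check this review comment asks to cover.
    if request.tool_choice == "required" and output.finish_reason == "length":
        raise ValueError(
            "Model reached the max token limit (finish_reason='length') "
            "before generating complete tool calls."
        )


def test_required_tool_choice_hits_length_limit():
    # Stub out just the attributes the check inspects.
    request = SimpleNamespace(tool_choice="required")
    output = SimpleNamespace(finish_reason="length")
    try:
        raise_if_length_with_required_tools(request, output)
    except ValueError as e:
        assert "finish_reason='length'" in str(e)
    else:
        raise AssertionError("expected a ValueError")
```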

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request improves error handling for cases where tool_choice='required' is used and the model fails to produce valid tool call JSON, particularly when hitting a token limit. It introduces an early check for finish_reason='length' in chat_completion_full_generator and adds a more descriptive error in _parse_tool_calls_from_content when JSON parsing fails. While the changes are a good improvement, there is a logical inconsistency between the two new error handling paths that results in a potentially misleading error message. I've suggested a change to address this.

Comment on lines +1133 to +1141
raise ValueError(
    "When tool_choice='required', the model must produce "
    "valid JSON tool calls, but the generated content "
    "could not be parsed. This can happen when the model "
    "hits the max token limit (finish_reason='length') "
    "before completing the tool call. Consider increasing "
    "`max_tokens` to allow the model to finish generating "
    "the tool call."
) from e
Contributor


high

The new pre-check in chat_completion/serving.py specifically handles the finish_reason='length' case, so by the time this except block is reached from the chat_completion_full_generator call path, the finish reason is known not to be 'length'. The part of this error message suggesting the max token limit (finish_reason='length') as a possible cause is therefore misleading on this primary call path. The message should be generalized to say that JSON parsing failed for some other reason. Note that this change will require updating the assertion in test_parse_tool_calls_incomplete_json_gives_clear_error.

                raise ValueError(
                    "When tool_choice='required', the model must produce "
                    "valid JSON tool calls, but the generated content "
                    "could not be parsed. This may be due to the model "
                    "failing to follow tool use instructions or generating "
                    "malformed JSON."
                ) from e



Successfully merging this pull request may close these issues.

[Feature]: Improve Error Message When Max Length Reached With Tool Call=Required
