[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens by chaunceyjiang · Pull Request #37258 · vllm-project/vllm

chaunceyjiang · 2026-03-17T06:28:05Z

Purpose

Follow-up to #36841.

Fixes the CI failure at https://buildkite.com/vllm/ci/builds/56537?group_by=test#019cf9dc-06da-4341-aa86-6e0d6cb06ec8


[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]     return await self.responses_full_generator(
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/responses/serving.py", line 711, in responses_full_generator
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]     output = self._make_response_output_items(request, final_output, tokenizer)
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/responses/serving.py", line 904, in _make_response_output_items
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]     return parser.extract_response_outputs(
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]   File "/usr/local/lib/python3.12/dist-packages/vllm/parser/abstract_parser.py", line 325, in extract_response_outputs
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]     tool_calls, content = self._parse_tool_calls(
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]                           ^^^^^^^^^^^^^^^^^^^^^^^
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]   File "/usr/local/lib/python3.12/dist-packages/vllm/parser/abstract_parser.py", line 426, in _parse_tool_calls
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]     tool_calls = TypeAdapter(list[FunctionDefinition]).validate_json(content)
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]   File "/usr/local/lib/python3.12/dist-packages/pydantic/type_adapter.py", line 492, in validate_json
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]     return self.validator.validate_json(
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374] pydantic_core._pydantic_core.ValidationError: 1 validation error for list[function-wrap[__log_extra_fields__()]]

Test Plan

See the e2e test.

For comparison, the same request was sent to gpt-5 through the OpenAI API:

response = client.responses.create(
    model="gpt-5",
    input=prompt,
    tools=tools,
    tool_choice="required",
    max_output_tokens=1,
)

Test Result

gpt-5

{
    "id": "resp_0c59c8e31591e43d0069b8f1e2a17c8190bffa061a344becb7",
    "created_at": 1773728226.0,
    "error": null,
    "incomplete_details": {
        "reason": "max_output_tokens"
    },
    "instructions": null,
    "metadata": {},
    "model": "gpt-5",
    "object": "response",
    "output": [
        {
            "id": "rs_0c59c8e31591e43d0069b8f1e3de108190b7204a20852a1dca",
            "summary": [],
            "type": "reasoning",
            "content": null,
            "encrypted_content": null,
            "status": null
        }
    ],
    "parallel_tool_calls": true,
    "temperature": 1.0,
    "tool_choice": "required",
....
}

vllm

{
    "id": "resp_89d52120b02c63ff",
    "created_at": 1773728620.0,
    "error": null,
    "incomplete_details": {
        "reason": "max_output_tokens"
    },
    "instructions": null,
    "metadata": null,
    "model": "my-model",
    "object": "response",
    "output": [
        {
            "id": "rs_a1ff3b9137e892f3",
            "summary": [],
            "type": "reasoning",
            "content": [
                {
                    "text": "The",
                    "type": "reasoning_text"
                }
            ],
            "encrypted_content": null,
            "status": null
        }
    ],
    "parallel_tool_calls": true,
    "temperature": 1.0,
    "tool_choice": "required",
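
Both results above report `incomplete_details.reason` as `max_output_tokens` and contain only a reasoning item, with no tool call despite `tool_choice="required"`. A client can check for this case before trying to dispatch a tool. The following is a minimal sketch, assuming the response has already been converted to a plain dict; `summarize_response` is a hypothetical helper, and `function_call` is assumed to be the output item type used for tool calls.

```python
def summarize_response(response: dict) -> dict:
    """Report whether a Responses API payload was cut off and what it holds."""
    # incomplete_details is null (None) for a response that finished normally.
    details = response.get("incomplete_details") or {}
    output_types = [item.get("type") for item in response.get("output") or []]
    return {
        "truncated": details.get("reason") == "max_output_tokens",
        # Assumption: tool calls appear as output items of type "function_call".
        "has_tool_call": "function_call" in output_types,
        "output_types": output_types,
    }


# Trimmed copy of the vLLM result shown above.
vllm_response = {
    "incomplete_details": {"reason": "max_output_tokens"},
    "output": [
        {
            "type": "reasoning",
            "content": [{"text": "The", "type": "reasoning_text"}],
        }
    ],
    "tool_choice": "required",
}

print(summarize_response(vllm_response))
# {'truncated': True, 'has_tool_call': False, 'output_types': ['reasoning']}
```

With this check, a caller can retry with a larger `max_output_tokens` instead of assuming that a required tool call is always present.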



gemini-code-assist

Code Review

This pull request addresses a crash in the Responses API when tool_choice="required" and the generated output for the tool call exceeds max_output_tokens. The fix correctly handles potential ValidationError during JSON parsing of the model's output by suppressing the exception. This prevents the crash and ensures that if the tool call JSON is invalid or truncated, no tool call is returned, which is the desired behavior. A new test case is added to validate this fix, confirming that the system remains stable under these conditions.
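
The behavior described in the review can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: the traceback shows the real parser validating with pydantic's `TypeAdapter(list[FunctionDefinition]).validate_json(content)`, while this sketch swaps in the standard-library `json` module so that it runs without dependencies. The helper `parse_required_tool_calls` and the tool name `get_weather` are hypothetical.

```python
import json
from contextlib import suppress


def parse_required_tool_calls(content: str) -> list[dict]:
    """Parse a JSON array of tool calls, returning [] if it is unusable.

    Hypothetical stand-in for the pydantic validation in the traceback:
    a truncated or malformed payload yields no tool calls instead of raising.
    """
    tool_calls: list[dict] = []
    # json.JSONDecodeError subclasses ValueError, so a single suppress()
    # covers both the decode failure and the shape checks below.
    with suppress(ValueError):
        parsed = json.loads(content)
        if not isinstance(parsed, list):
            raise ValueError("expected a JSON array of tool calls")
        for item in parsed:
            if not isinstance(item, dict) or "name" not in item:
                raise ValueError("each tool call needs a 'name' field")
        tool_calls = parsed
    return tool_calls


complete = '[{"name": "get_weather", "parameters": {"city": "Paris"}}]'
truncated = '[{"name": "get_wea'  # generation stopped at max_output_tokens

print(parse_required_tool_calls(complete))   # one parsed tool call
print(parse_required_tool_calls(truncated))  # []
```

Returning an empty list leaves it to the caller to report the truncation, for example through `incomplete_details`, rather than failing the whole request.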


chaunceyjiang · 2026-03-17T06:48:09Z

/cc @DarkLight1337 PTAL.


[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens
32bfefa
Signed-off-by: chaunceyjiang <[email protected]>

chaunceyjiang requested review from DarkLight1337, NickLucche, aarnphm and robertgshaw2-redhat as code owners March 17, 2026 06:28

mergify bot added the "bug" label (Something isn't working) Mar 17, 2026

gemini-code-assist bot reviewed Mar 17, 2026


chaunceyjiang added 2 commits March 17, 2026 14:32

[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens
3af2655
Signed-off-by: chaunceyjiang <[email protected]>

[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens
b663a2a
Signed-off-by: chaunceyjiang <[email protected]>

chaunceyjiang mentioned this pull request Mar 17, 2026

[Build] Bump python openai version #32316

Merged

5 tasks

DarkLight1337 approved these changes Mar 17, 2026


DarkLight1337 enabled auto-merge (squash) March 17, 2026 07:00

github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Mar 17, 2026

DarkLight1337 merged commit 132bfd4 into vllm-project:main Mar 17, 2026
47 checks passed

chaunceyjiang deleted the response_required_max_tokens branch March 17, 2026 09:03

zhenwei-intel pushed a commit to zhenwei-intel/vllm that referenced this pull request Mar 17, 2026

[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens (vllm-project#37258)
c1ab60f
Signed-off-by: chaunceyjiang <[email protected]>

will-deines mentioned this pull request Mar 17, 2026

[Responses API] Unified tool_choice + structured output via triggered tags will-deines/vllm#1

Closed

7 tasks

Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026

[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens (vllm-project#37258)
ac2afcd
Signed-off-by: chaunceyjiang <[email protected]>

andylolu2 pushed a commit to andylolu2/vllm that referenced this pull request Mar 18, 2026

[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens (vllm-project#37258)
c4a158f
Signed-off-by: chaunceyjiang <[email protected]>

wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026

[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens (vllm-project#37258)
5e7ade3
Signed-off-by: chaunceyjiang <[email protected]>

will-deines mentioned this pull request Mar 18, 2026

[Responses API] tool_choice support (auto / required / none) for GPT-OSS #37433

Open

12 tasks

fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026

[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens (vllm-project#37258)
26994a3
Signed-off-by: chaunceyjiang <[email protected]>

khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026

[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens (vllm-project#37258)
d3aa127
Signed-off-by: chaunceyjiang <[email protected]>

JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026

[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens (vllm-project#37258)
42f7743
Signed-off-by: chaunceyjiang <[email protected]>

liuchenbing2026 pushed a commit to liuchenbing2026/vllm that referenced this pull request Apr 4, 2026

[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens (vllm-project#37258)
f1b950b
Signed-off-by: chaunceyjiang <[email protected]>

DarkLight1337 merged 3 commits into vllm-project:main from chaunceyjiang:response_required_max_tokens
