
fix: detect and protect against truncated tool call output#2021

Merged
tanzhenxin merged 2 commits into QwenLM:main from sundapeng:fix/truncated-tool-call-protection
Mar 2, 2026

Conversation

@sundapeng
Contributor

fix: detect and protect against truncated tool call output

TLDR

When LLM output hits max_tokens, tool call JSON gets truncated mid-stream. Some providers (DashScope/Qwen) misreport finish_reason: "stop" instead of "length", making truncation invisible. This can silently write incomplete files to disk via write_file/edit.

This PR detects incomplete JSON at the streaming parser level, overrides misleading finish_reason, and rejects Kind.Edit tool calls with recovery guidance.

Dive Deeper

Problem

Two failure modes when tool call JSON is truncated:

  1. Retry loop: Truncated JSON fails validation, but the error gives no truncation hint — the LLM retries with the same oversized content indefinitely.
  2. Silent corruption: jsonrepair "fixes" truncated JSON into valid-but-incomplete params — write_file writes a half-finished file with no warning.

Compounding this, some OpenAI-compatible providers report finish_reason: "stop" even when output was cut off, so the existing MAX_TOKENS check never fires.
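The silent-corruption mode can be illustrated with a toy stand-in for jsonrepair (hypothetical code, not the library's actual algorithm): a naive repair pass closes the open string literal and any unbalanced braces, and the result parses cleanly even though the content was cut off.

```typescript
// Toy stand-in for jsonrepair's behavior on truncated input (illustration
// only; the real library is more sophisticated). It closes any open string
// and any unbalanced braces, yielding valid-but-incomplete JSON.
function naiveRepair(truncated: string): string {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (const ch of truncated) {
    if (escaped) { escaped = false; continue; }
    if (ch === "\\") { escaped = true; continue; }
    if (ch === '"') { inString = !inString; continue; }
    if (!inString && ch === "{") depth++;
    if (!inString && ch === "}") depth--;
  }
  let repaired = truncated;
  if (inString) repaired += '"'; // close the dangling string
  repaired += "}".repeat(Math.max(depth, 0)); // close dangling objects
  return repaired;
}

// A write_file call cut off mid-`content`:
const truncated =
  '{"file_path": "/tmp/test.tsx", "content": "export function App() {';
const repaired = naiveRepair(truncated);
// JSON.parse succeeds — the truncation is now invisible to validation,
// and write_file would happily write the half-finished file.
const params = JSON.parse(repaired);
```

This is exactly why a structural check at the streaming parser (before any repair step) is needed: once the JSON has been "fixed", nothing downstream can tell it was incomplete.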

Solution

Four-layer defense chain:

StreamingToolCallParser → OpenAI Converter → Turn → CoreToolScheduler
  (detect incomplete JSON)  (fix finish_reason)  (set flag)  (reject + guide)
  1. streamingToolCallParser.ts — hasIncompleteToolCalls() checks JSON parsing state (depth > 0 or inString) for structural completeness, independent of the provider-reported finish_reason.
  2. converter.ts — Overrides finish_reason to "length" when the parser detects incomplete JSON but the provider reported "stop".
  3. turn.ts — Stamps wasOutputTruncated = true on all pending ToolCallRequestInfo when finishReason === MAX_TOKENS.
  4. coreToolScheduler.ts — Rejects Kind.Edit tools outright with TRUNCATION_EDIT_REJECTION; appends TRUNCATION_PARAM_GUIDANCE to any truncated param validation errors.
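The layer-1 detection can be sketched as follows (a minimal hypothetical tracker, not the actual StreamingToolCallParser; the class and member names here are illustrative). JSON arguments are structurally complete only when all braces/brackets are balanced (depth === 0) and no string literal is left open (inString === false):

```typescript
// Minimal sketch of the layer-1 structural check (hypothetical; the real
// parser tracks more state, e.g. per-tool-call buffers).
class ToolCallArgsTracker {
  private depth = 0;
  private inString = false;
  private escaped = false;

  feed(chunk: string): void {
    for (const ch of chunk) {
      if (this.escaped) { this.escaped = false; continue; }
      if (this.inString) {
        if (ch === "\\") this.escaped = true;
        else if (ch === '"') this.inString = false;
        continue; // braces inside strings don't count
      }
      if (ch === '"') this.inString = true;
      else if (ch === "{" || ch === "[") this.depth++;
      else if (ch === "}" || ch === "]") this.depth--;
    }
  }

  // Mirrors hasIncompleteToolCalls(): true when the stream ended mid-structure.
  hasIncompleteToolCalls(): boolean {
    return this.depth > 0 || this.inString;
  }
}
```

Because the check is purely structural, it flags truncation even when the provider reports finish_reason: "stop".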

Scope

  • Only Kind.Edit (write_file, edit) is rejected — non-edit tools fail safely at worst.
  • Only the OpenAI path needs this fix. Anthropic uses structured content_block_stop events (truncated calls are never emitted). Gemini SDK returns complete Part objects with reliable finishReason.
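The layer-2 override on the OpenAI path reduces to a small decision (hypothetical helper, not the actual converter.ts code): trust the parser's structural check over the provider's finish_reason, since some providers report "stop" even when output was cut off at max_tokens.

```typescript
// Sketch of the converter-level finish_reason override (hypothetical
// helper name; the actual converter.ts integrates this into its
// chunk-conversion path).
type FinishReason = "stop" | "length" | "tool_calls" | null;

function normalizeFinishReason(
  reported: FinishReason,
  hasIncompleteToolCalls: boolean,
): FinishReason {
  if (hasIncompleteToolCalls && reported === "stop") {
    // Provider misreported; surface the truncation so downstream layers
    // (turn.ts, coreToolScheduler.ts) can react.
    return "length";
  }
  return reported;
}
```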

Files Changed

| File | Change |
| --- | --- |
| streamingToolCallParser.ts | Add hasIncompleteToolCalls() |
| converter.ts | Override finish_reason when incomplete JSON detected |
| turn.ts | Add wasOutputTruncated flag to ToolCallRequestInfo |
| coreToolScheduler.ts | Reject truncated edit calls; append guidance to param errors |
| tool-error.ts | Add ToolErrorType.OUTPUT_TRUNCATED |
| *.test.ts (3 files) | 607 lines of new test coverage |

Reviewer Test Plan

Automated

npx vitest run packages/core/src/core/coreToolScheduler.test.ts \
  packages/core/src/core/turn.test.ts \
  packages/core/src/core/openaiContentGenerator/converter.test.ts \
  packages/core/src/core/openaiContentGenerator/streamingToolCallParser.test.ts

Manual

  1. Build: npm run build
  2. Use a model with low max_tokens or request a very large file write (e.g., "Write a 500-line React component to /tmp/test.tsx").
  3. Before fix: File written with truncated content, no error.
  4. After fix: Tool call rejected with clear message + LLM retries with split content.

Key Scenarios

| Scenario | Expected |
| --- | --- |
| write_file truncated, provider reports stop | Rejected, OUTPUT_TRUNCATED |
| write_file truncated, provider reports length | Rejected, OUTPUT_TRUNCATED |
| write_file complete, normal stop | Executes normally |
| edit truncated | Rejected, OUTPUT_TRUNCATED |
| read_file truncated | Executes normally |
| Multi-chunk streaming truncation | finishReason overridden to MAX_TOKENS |
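The layer-4 policy behind these scenarios can be sketched as a small decision function (hypothetical; the names and guidance strings are illustrative, not the actual TRUNCATION_EDIT_REJECTION / TRUNCATION_PARAM_GUIDANCE constants): edit-kind tools are rejected outright on truncation, while other tools merely get guidance appended to any validation error.

```typescript
// Sketch of the scheduler-level truncation policy (hypothetical code;
// the real coreToolScheduler.ts wires this into its validation path).
type Kind = "edit" | "read" | "other";

interface TruncationDecision {
  reject: boolean;
  guidance?: string;
}

function decideOnTruncation(
  kind: Kind,
  wasOutputTruncated: boolean,
): TruncationDecision {
  if (!wasOutputTruncated) return { reject: false };
  if (kind === "edit") {
    // Editing tools can corrupt files on disk, so reject outright and
    // tell the model how to recover.
    return {
      reject: true,
      guidance:
        "Output was truncated at max_tokens; retry with the content split across multiple smaller calls.",
    };
  }
  // Non-edit tools fail safely at worst; only annotate validation errors.
  return { reject: false, guidance: "Parameters may have been truncated." };
}
```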

Testing Matrix

|          | 🍏 | 🪟 | 🐧 |
| -------- | -- | -- | -- |
| npm run  |    |    |    |
| npx      |    |    |    |
| Docker   |    |    |    |
| Podman   | -  | -  |    |
| Seatbelt | -  | -  |    |

Linked issues / bugs

When LLM streaming output exceeds token limits, JSON arguments for tool calls
can be truncated mid-stream. This causes validation errors or silent data
corruption when the truncated JSON passes validation but writes incomplete files.

The fix adds truncation detection at the streaming parser level and overrides
misleading finish_reason values from providers (e.g., DashScope/Qwen reporting
'stop' instead of 'length'). This ensures downstream code correctly identifies
truncated responses and provides clear guidance to the LLM for retrying with
split content.

Changes:
- turn.ts: Add wasOutputTruncated flag to ToolCallRequestInfo
- coreToolScheduler.ts: Reject truncated edit tool calls, append guidance for write_file
- converter.ts: Override finish_reason when streaming parser detects incomplete JSON
- streamingToolCallParser.ts: Add hasIncompleteToolCalls() method
- Tests: Add comprehensive test coverage for truncation detection scenarios

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Collaborator

@tanzhenxin tanzhenxin left a comment


LGTM!

@tanzhenxin tanzhenxin merged commit f770be4 into QwenLM:main Mar 2, 2026
14 checks passed