fix(core): hard-stop repeated identical tool calls#5036
Conversation
| userMessageTimestamp, | ||
| ); | ||
| resetIdenticalToolCallLoop(); | ||
| return StreamProcessingStatus.Completed; |
There was a problem hiding this comment.
[Critical] The early return StreamProcessingStatus.Completed here is inside the try block. The finally block runs (buffer cleanup), but dualOutput?.finalizeAssistantMessage() at line 1725 is skipped entirely. For --output-format json and other dual-output consumers, message_start has no matching message_stop, producing malformed stream output.
| return StreamProcessingStatus.Completed; | |
| if (recordToolCallForHardLoopGuard(event.value)) { | |
| toolCallRequests.length = 0; | |
| dualOutput?.finalizeAssistantMessage(); | |
| addItem( | |
| { | |
| type: 'info', | |
| text: | |
| `Stopped repeated identical tool calls after ` + | |
| `${HARD_IDENTICAL_TOOL_CALL_LIMIT} consecutive attempts.`, | |
| }, | |
| userMessageTimestamp, | |
| ); | |
| resetIdenticalToolCallLoop(); | |
| return StreamProcessingStatus.Completed; | |
| } |
— qwen3.7-max via Qwen Code /review
| case ServerGeminiEventType.ToolCallRequest: | ||
| flushBufferedStreamEvents(); | ||
| if (recordToolCallForHardLoopGuard(event.value)) { | ||
| toolCallRequests.length = 0; |
There was a problem hiding this comment.
[Suggestion] toolCallRequests.length = 0 discards all accumulated tool call requests — including legitimate, distinct calls that preceded the identical loop. For example, [read_file("a.ts"), run_shell_command("echo x") × 10] would lose the valid read_file call.
Consider clearing only from the first identical call onward (e.g., track the index where the identical run started and splice from there), or add a comment explaining this is intentional given the pathological nature of 10 identical calls.
Also note: the Retry event handler (line ~1674) clears toolCallRequests.length = 0 but does not call resetIdenticalToolCallLoop(). This means identical calls before and after a retry accumulate — the counter could reach 10 across two retry attempts without either attempt individually looking like a loop.
— qwen3.7-max via Qwen Code /review
| expect(result.current.streamingState).toBe(StreamingState.Idle); | ||
| }); | ||
|
|
||
| it('should hard-stop repeated identical tool calls before scheduling them', async () => { |
There was a problem hiding this comment.
[Suggestion] Only the exact-trigger case (10 identical calls → guard fires) is tested. Consider adding boundary tests to catch regressions:
- 9 identical calls: should proceed normally (
mockScheduleToolCallsIS called, no info message) - Mixed calls: e.g., 5×A, 1×B, 5×A — counter should reset on B, guard should NOT fire
- Cross-turn reset: same tool in two sequential user queries — counter should reset between turns
— qwen3.7-max via Qwen Code /review
a0c4266 to
853ce2a
Compare
|
Addressed the review on the hard-stop guard and force-pushed the updated head Changes:
Validation:
I also tried |
wenshao
left a comment
There was a problem hiding this comment.
All R1 issues (early return skipping finalize, discarding distinct calls, missing boundary tests) are correctly addressed. The break streamLoop → finally → finalizeAssistantMessage flow is sound, splice math correctly preserves distinct earlier calls, and counter reset paths (retry, new query, hard-stop trigger) are all correct. 5 new tests cover the key scenarios well. tsc 0, eslint 0, 108/108 focused tests pass. LGTM! ✅ — qwen3.7-max via Qwen Code /review
Local build & live-replay verification on macOS (merge reference)I built this PR locally and replayed the #5015 attack shape against the real CLI — interactive TUI driven via tmux ( Environment: macOS (Darwin 25.5.0), Node v22.22.2, head Checks
Live replay matrix
Headline numbers: under default settings the interactive TUI goes from 60 executions (mock-capped, unbounded) on baseline to 9 executions + clean stop with this PR. Post-stop recoveryAfter a hard stop I sent follow-up messages and inspected the subsequent request payloads. The aborted assistant turn (with its tool_calls) is never recorded into history, and the orphaned tool response of the kept distinct call is stripped before the next send — the wire payload stays valid for strict OpenAI-compatible providers (no dangling Merge-relevant caveats (recorded for follow-up, not blockers for this scoped guard)
Overall: within its scope the guard behaves exactly as designed, implementation details (stable key serialization, splice bounds, counter resets on new top-level submits and stream 中文版(Chinese version)macOS 本地构建与真实回放验证(合并参考)我在本地构建了此 PR,并用本地确定性 OpenAI 兼容 mock provider 回放 #5015 的攻击形态,对真实 CLI 做了验证——交互式 TUI 用 tmux 驱动( 环境:macOS(Darwin 25.5.0)、Node v22.22.2、head 检查项
真实回放矩阵
核心数字:默认设置下,交互式 TUI 从基线的 60 次执行(mock 截断,实际无界) 降到本 PR 的 9 次执行 + 干净停止。 停止后的恢复硬停止后我继续发送消息并检查了后续请求的 payload:被中断的助手回合(连同其 tool_calls)不会写入历史,被保留的不同调用的孤儿工具响应也会在下次发送前被剥离——发往严格 OpenAI 兼容 provider 的 payload 保持合法(没有悬空 与合并相关的注意事项(供后续跟进记录,不阻塞这个限定范围的保护)
总体:在其限定范围内,该保护的行为与设计完全一致;实现细节(稳定序列化 key、splice 边界、新顶层提交与流 |
|
After re-examining the layering, I'm walking back my approval and requesting a placement change: this guard should live in core, not in the TUI hook. What I verified in the codebase:
Suggested change: move the deterministic identical-call hard stop into core — e.g. in the 中文说明重新审视分层后,我收回此前的 approve,建议把这个守卫挪到 core:
建议:把确定性的 identical-call 硬熔断下沉到 core——在 |
Dismissing my approval — after re-checking the layering I think the hard stop belongs in core (it currently only protects the interactive TUI, and only when skipLoopDetection is on). Details: #5036 (comment)
853ce2a to
087d858
Compare
|
Thanks for the correction. I reworked this so the guard now lives in core instead of the TUI hook. What changed in this revision:
Validation run on Windows: npm run test --workspace=packages/core -- src/services/loopDetectionService.test.ts
npm run test --workspace=packages/core -- src/core/client.test.ts
npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx
npm run typecheck --workspace=packages/core
npm run typecheck --workspace=packages/cli
npx prettier --check packages/core/src/core/client.ts packages/core/src/core/client.test.ts packages/core/src/services/loopDetectionService.ts packages/core/src/services/loopDetectionService.test.ts packages/cli/src/ui/hooks/useGeminiStream.ts packages/cli/src/ui/hooks/useGeminiStream.test.tsx
npx eslint packages/core/src/core/client.ts packages/core/src/core/client.test.ts packages/core/src/services/loopDetectionService.ts packages/core/src/services/loopDetectionService.test.ts packages/cli/src/ui/hooks/useGeminiStream.ts packages/cli/src/ui/hooks/useGeminiStream.test.tsx --max-warnings 0
git diff --check upstream/main --I also rebuilt |
qqqys
left a comment
There was a problem hiding this comment.
Critical-only re-review: the prior core-placement blocker is addressed on head 087d858. The deterministic identical-tool-call backstop now runs in core, still runs when skipLoopDetection is true, trims the repeated pending tail, and CI is green. I did not find a new critical blocker in this pass.
Re-verification of the core-placement rework (merge reference)This supersedes my two earlier comments: the live-replay report on the old TUI-hook implementation (head Headline: the rework does what my placement comment asked — the deterministic identical-tool-call backstop now runs in core for every client, ungated by What the rework does (confirmed in the diff at
|
| Command | Result |
|---|---|
vitest run loopDetectionService.test.ts client.test.ts (core) |
237 passed (45 + 192) |
vitest run useGeminiStream.test.tsx (cli) |
105 passed (the removed CLI-guard tests are gone, as intended) |
typecheck core + cli |
exit 0 / exit 0 |
prettier --check + eslint --max-warnings 0 (4 files) + git diff --check |
clean |
2. Mutation test — the new core tests are non-vacuous
Reverted client.ts to the merge-base 0db3273174 (restoring the old if (!getSkipLoopDetection()) gate), kept the PR's tests:
× client.test.ts › keeps deterministic tool-call checks when skipLoopDetection is true
× client.test.ts › hard-stops identical tool calls even when skipLoopDetection is true
Tests 2 failed | 190 passed
Exactly the two new integration tests fail (the others stay green → backward-compatible). The headline test — hard-stops identical tool calls even when skipLoopDetection is true — is precisely the behavior the whole PR is about, and it flips red on pre-fix client.ts.
3. Live A/B in the real binary — the #5015 headless path is now bounded
Local deterministic OpenAI-compatible mock: every response requests the same run_shell_command with identical args; each execution appends one byte to a side-effect log (execution counter); the mock caps tool-call turns at 40 so a missing backstop still terminates. Config: model.skipLoopDetection: true (the shipping default), --approval-mode yolo, isolated $HOME. Baseline = the four files reverted to merge-base and the workspace rebuilt.
| Path | Build | skipLoopDetection |
shell executions | provider requests | outcome |
|---|---|---|---|---|---|
headless -p |
baseline | true | 40 (capped → unbounded) | 42 | no stop — reproduces #5015 / my report-1 case C |
headless -p |
PR | true | 4 | 5 | ✅ Loop detection halted the run (consecutive_identical_tool_calls…) |
| interactive TUI (tmux) | PR | true | 4 | 5 | ✅ "A potential loop was detected" dialog → Esc (keep enabled) → request has been halted, returns to idle, no further executions |
So under the shipping default, the headless -p runaway my first report called out goes from 40 executions (mock-capped, i.e. unbounded) to 4 + a clean halt, and the deterministic backstop now fires through the same LoopDetected path in the TUI too. The 5th identical call trips it (threshold 5 → ~4 side effects), and core stops before scheduling the repeated tail.
Notes (non-blocking)
- Threshold is now 5, not the old CLI guard's 10 — slightly more aggressive, and consistent with core's existing identical-call threshold. Intentional per the PR.
- Splice / distinct-call preservation (one response carrying a distinct call + an identical streak) is covered by the new unit tests and was exercised live in my first report (case E); I did not re-run it here.
- Escape hatch: choosing "Disable loop detection for this session" in the TUI dialog calls
disableForSession(), which also disables the deterministic backstop for the rest of that session — a deliberate, user-initiated opt-out, distinct from the passiveskipLoopDetectiondefault. Headless-phas no such dialog, so it is always protected. - The mixed alternating-shape residual (report-1 case G) is orthogonal to this placement change and unchanged.
Verdict: the rework implements the core placement I asked for; all clients — crucially the previously-unbounded headless -p/ACP paths — now inherit the deterministic identical-call backstop, with one implementation and clean tests. Re-confirmed end-to-end at 087d858d91. LGTM to merge.
中文版(Chinese version)
core 放置重构的复验(合并参考)
本评论取代我之前的两条:针对旧的 TUI-hook 实现的真实回放报告(head 853ce2a14),以及要求把熔断下沉到 core 的评论。本修订(087d858d91)完全按该要求重构,因此我在 Linux(Node 22.22.2)上重新从零构建真实 qwen 二进制做了验证。
要点:重构正是我放置评论所要求的——确定性的"相同工具调用"熔断现在在 core 中对所有客户端生效,不再受 skipLoopDetection 门控。我第一份报告指出的仍然无界的 headless -p 路径(注意事项 #2 / 用例 C)现在被收敛了。实测:40 → 4 次执行。建议合并。
重构做了什么(已在 087d858d91 的 diff 中确认)
LoopDetectionService拆分:addAndCheckDeterministicToolCallLoop()(name+args 相同调用计数,阈值 5)与addAndCheckHeuristicLoops()(read-file/停滞/内容启发式)分离。- 在
GeminiClient.sendMessageStream()中,确定性熔断无条件运行;只有启发式检测器仍在!getSkipLoopDetection()之后。原来把所有检测都门控住的if (!getSkipLoopDetection()) { addAndCheck }包裹已移除。 - 确定性命中
ToolCallRequest时,core 仅把重复的尾部从turn.pendingToolCalls中splice掉(splice(len - repeatedCount)),保留更早的不同 pending 调用。 - 此前 CLI 本地调度器守卫(阈值 10、仅 TUI)已删除——一份实现、一个阈值(5),覆盖 TUI / 非交互
-p/ ACP / serve / SDK。 - 确定性熔断仍尊重显式的
disableForSession()逃生通道(对话框中的"本会话禁用"),因此它相对被动的skipLoopDetection设置是无条件的,但相对用户主动选择禁用则不是。
1. PR 测试计划 —— 新 head 上全绿
| 命令 | 结果 |
|---|---|
vitest run loopDetectionService.test.ts client.test.ts(core) |
237 通过(45 + 192) |
vitest run useGeminiStream.test.tsx(cli) |
105 通过(被移除的 CLI 守卫测试已随之删除,符合预期) |
typecheck core + cli |
exit 0 / exit 0 |
prettier --check + eslint --max-warnings 0(4 文件)+ git diff --check |
干净 |
2. 变异测试 —— 新增 core 测试非空壳
把 client.ts 回退到 merge-base 0db3273174(恢复旧的 if (!getSkipLoopDetection()) 门控),保留 PR 测试:
× client.test.ts › keeps deterministic tool-call checks when skipLoopDetection is true
× client.test.ts › hard-stops identical tool calls even when skipLoopDetection is true
Tests 2 failed | 190 passed
恰好两个新增集成测试失败(其余保持绿色 → 向后兼容)。其中承重的那条——skipLoopDetection 为 true 时仍硬熔断相同工具调用——正是整个 PR 的核心行为,在 pre-fix 的 client.ts 上翻红。
3. 真实二进制 A/B —— #5015 的 headless 路径现已收敛
本地确定性 OpenAI 兼容 mock:每次响应都请求同一个 run_shell_command、参数相同;每次执行向副作用日志追加一字节(执行计数);mock 把工具调用轮数截断在 40,以便熔断缺失时仍能终止。配置:model.skipLoopDetection: true(出厂默认)、--approval-mode yolo、隔离 $HOME。基线 = 四个文件回退到 merge-base 并重建工作区。
| 路径 | 构建 | skipLoopDetection |
shell 执行次数 | provider 请求数 | 结果 |
|---|---|---|---|---|---|
headless -p |
基线 | true | 40(截断 → 无界) | 42 | 不停止——复现 #5015 / 我报告 1 的用例 C |
headless -p |
PR | true | 4 | 5 | ✅ Loop detection halted the run (consecutive_identical_tool_calls…) |
| 交互 TUI(tmux) | PR | true | 4 | 5 | ✅ "A potential loop was detected" 对话框 → Esc(保持启用)→ request has been halted,回到空闲,无后续执行 |
因此在出厂默认下,我第一份报告点名的 headless -p runaway 从 40 次执行(mock 截断,即无界) 降到 4 次 + 干净停止;确定性熔断在 TUI 中也走同一个 LoopDetected 路径。第 5 次相同调用触发(阈值 5 → 约 4 次副作用),core 在调度重复尾部之前停止。
备注(不阻断)
- 阈值现在是 5,不是旧 CLI 守卫的 10——略更激进,且与 core 既有的相同调用阈值一致。按 PR 设计如此。
- splice / 保留不同调用(单条响应同时包含一个不同调用和一串相同调用)由新增单元测试覆盖,并在我第一份报告(用例 E)中实测过;此处未重跑。
- 逃生通道:在 TUI 对话框选"本会话禁用循环检测"会调用
disableForSession(),这会在本会话剩余时间内也关闭确定性熔断——这是用户主动选择的退出,区别于被动的skipLoopDetection默认值。headless-p没有该对话框,因此始终受保护。 - 混合交替形态的残余(报告 1 用例 G)与本次放置变更正交,未改变。
结论:重构实现了我要求的 core 放置;所有客户端——尤其是此前无界的 headless -p/ACP 路径——现在都继承了确定性的相同调用熔断,一份实现、测试干净。在 087d858d91 上端到端再次确认。建议合并。
|
@qwen-code /triage |
|
Thanks for the PR! Template looks good ✓ On direction: this is squarely aligned — #5015 is a real reliability bug where deterministic provider streams can execute the same side-effecting tool call indefinitely, burning API credits and causing unintended side effects. A hard backstop in core that covers all entry points (TUI, headless On approach: the scope is tight and the layering is clean. Splitting Moving on to code review. 🔍 中文说明感谢贡献! 模板完整 ✓ 方向:完全对齐——#5015 是真实的可靠性问题,确定性 provider 流可能无限重复同一副作用工具调用,消耗 API 额度并产生意外副作用。在 core 中设置覆盖所有入口(TUI、headless 方案:范围紧凑,分层清晰。将 进入代码审查 🔍 — Qwen Code · qwen3.7-max |
Code ReviewIndependent proposal before reading the diff: split The PR's approach matches this exactly. The implementation is clean and minimal:
No correctness bugs, no security issues, no AGENTS.md violations. The backward-compatible Test ResultsNew tests are non-vacuous — Smoke Test (tmux)Build succeeds with PR changes. CLI starts cleanly: Full reproduction of the #5015 scenario requires a deterministic mock provider (identical tool calls on every response) — not achievable with a real LLM prompt. Comprehensive live A/B testing was already done by reviewer @wenshao (cases A–H, head 中文说明代码审查独立方案(读 diff 之前):在 PR 方案与此完全一致。实现干净且最小化:
无正确性 bug、无安全问题、无 AGENTS.md 违规。向后兼容的 测试结果新测试非空壳—— 冒烟测试 (tmux)PR 改动构建成功。CLI 正常启动(见上方终端输出)。 完整复现 #5015 场景需要确定性 mock provider(每次响应返回相同工具调用),无法用真实 LLM prompt 实现。审查者 @wenshao 已完成全面的真实 A/B 测试(用例 A–H,head — Qwen Code · qwen3.7-max |
VerdictThis PR does one thing well: it moves the deterministic identical-tool-call backstop from a TUI-only hook into the core stream loop, so every client — interactive, headless, ACP, SDK — gets the same protection. The layering is right: All 344 unit tests pass. Typecheck and lint clean. The code is straightforward — two methods where there was one, a splice that preserves earlier distinct calls, and a counter getter. Nothing over-engineered, nothing speculative. Reviewer @wenshao's live A/B testing (head LGTM — approving. ✅ 中文说明结论这个 PR 做好了一件事:把确定性的相同工具调用熔断从仅 TUI 的 hook 移到 core 流循环中,让所有客户端——交互式、headless、ACP、SDK——获得同样的保护。分层正确: 344 个单元测试全部通过。Typecheck 和 lint 干净。代码直白——一个方法拆成两个、一个保留不同调用的 splice、一个计数器 getter。没有过度设计,没有投机性代码。 审查者 @wenshao 的真实 A/B 测试(head 建议合并 ✅ — Qwen Code · qwen3.7-max |
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
LGTM, looks ready to ship. ✅
…op (#5128) #5036 carved the deterministic identical-tool-call check out of the `model.skipLoopDetection` gate, turning it into a hard-stop that fires even when loop detection is disabled. Because `skipLoopDetection` defaults to true (settingsSchema: "to avoid false-positive interruptions"), this silently re-enabled loop halts for the default configuration and broke the documented escape hatch — the non-interactive guidance in nonInteractiveCli.ts told users to set `model.skipLoopDetection: true`, which no longer disabled the halt and is unreachable in non-interactive mode (no disable dialog). Gate both the deterministic and heuristic detector paths behind the single flag again. The deterministic split, retry-reset, and pending tool-call splice introduced by #5036 still apply once detection is explicitly enabled (skipLoopDetection: false), so the runaway guard remains available as opt-in without overriding the default-off contract.
What this PR does
Moves the repeated-identical-tool-call hard stop into the core stream loop instead of the TUI hook.
Concretely:
LoopDetectionServicenow exposes a deterministic identical-tool-call backstop separately from heuristic loop checks.GeminiClient.sendMessageStream()always runs that deterministic backstop before yielding or scheduling more stream events, even whenmodel.skipLoopDetectionis enabled.model.skipLoopDetectionand the existing session disable path.Turn.pendingToolCallsbefore returning, so earlier distinct pending calls are preserved.Why it's needed
Issue #5015 shows a deterministic provider stream that can repeat the exact same tool call and arguments indefinitely. The existing identical-call detector already recognizes this pattern, but the check lived behind the soft loop-detection gate in the core client path. That means clients or modes that skip heuristic loop detection could still execute repeated side-effecting calls.
This PR keeps
skipLoopDetectionas a heuristic-loop setting while making the narrow deterministic identical-tool-call backstop run in core for every client.Reviewer Test Plan
How to verify
Run the focused core/client tests and confirm that repeated identical
ToolCallRequestevents produceLoopDetectedeven whenskipLoopDetectionis true. Also confirm the CLI hook tests still pass after removing the TUI-local guard.Evidence (Before & After)
Before this change,
GeminiClient.sendMessageStream()skipped all loop checks whenmodel.skipLoopDetectionwas true. After this change, only heuristic detectors are skipped; the deterministic identical-tool-call backstop still fires and core stops before handing another repeated tail to pending tool scheduling.Tested on
Environment
Windows 11, Node/npm from the existing repo workspace.
Commands run:
I also rebuilt
packages/coreandpackages/acp-bridgebefore rerunning the CLI typecheck so the local workspace outputs matched the rebased source.Risk & Scope
Linked Issues
Fixes #5015
中文说明
这个 PR 做了什么
把连续相同工具调用的硬保护从 TUI hook 移到 core 的流处理路径。
具体来说:
LoopDetectionService将确定性的“连续相同 tool call”保护和启发式 loop 检测拆开。GeminiClient.sendMessageStream()会始终先运行这个确定性保护,即使开启了model.skipLoopDetection。model.skipLoopDetection和现有 session disable 路径。Turn.pendingToolCalls中移除重复尾部,再返回;此前不同的 pending tool call 会保留。为什么需要
#5015 展示了一个确定性 provider 流:模型可能持续请求完全相同的工具和参数。已有 detector 能识别这个模式,但 core client 里整体被软性 loop detection gate 控制。这样在跳过启发式 loop detection 的客户端或模式下,仍可能继续执行重复的副作用工具调用。
这个 PR 保留
skipLoopDetection作为启发式检测开关,但让范围更窄、更确定的 identical-tool-call backstop 在 core 中始终生效。验证
已运行 core service/client focused tests、CLI hook focused test、core/CLI typecheck、Prettier、ESLint 和
git diff --check。命令见英文 Test Plan。