
fix(core): hard-stop repeated identical tool calls #5036

Merged
wenshao merged 1 commit into QwenLM:main from he-yufeng:fix/hard-stop-identical-tool-loop
Jun 14, 2026

he-yufeng commented Jun 12, 2026

Contributor

What this PR does

Moves the repeated-identical-tool-call hard stop into the core stream loop instead of the TUI hook.

Concretely:

  • LoopDetectionService now exposes a deterministic identical-tool-call backstop separately from heuristic loop checks.
  • GeminiClient.sendMessageStream() always runs that deterministic backstop before yielding or scheduling more stream events, even when model.skipLoopDetection is enabled.
  • Heuristic loop detection still respects model.skipLoopDetection and the existing session disable path.
  • When the deterministic backstop fires, core removes the repeated pending tool-call tail from Turn.pendingToolCalls before returning, so earlier distinct pending calls are preserved.
  • The previous CLI-local scheduler guard is removed, so the behavior applies consistently to TUI, non-interactive mode, ACP/serve, and SDK callers.
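For readers new to the code, a minimal sketch of the deterministic half of the split may help. It assumes a simplified tool-call shape ({ name, args }) and a threshold of 5 consecutive identical calls (the value the review thread reports for core's TOOL_CALL_LOOP_THRESHOLD). The names stableStringify and IdenticalToolCallBackstop are hypothetical and are not the actual LoopDetectionService API. The point is that the check is purely mechanical: it serializes name plus arguments with sorted object keys and counts consecutive matches, with no heuristics or model calls involved.

```typescript
// Hypothetical, simplified tool-call shape; the real ToolCallRequest event differs.
interface ToolCallInfo {
  name: string;
  args: Record<string, unknown>;
}

// Serialize with object keys sorted so that {a: 1, b: 2} and {b: 2, a: 1}
// produce the same key. Arrays keep their element order.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => stableStringify(v)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${parts.join(',')}}`;
  }
  // Primitives. JSON.stringify yields undefined for undefined or functions.
  return JSON.stringify(value) ?? 'null';
}

// Hypothetical backstop: counts consecutive identical (name + args) calls.
class IdenticalToolCallBackstop {
  private readonly threshold: number;
  private lastKey: string | null = null;
  private runLength = 0;

  constructor(threshold = 5) {
    this.threshold = threshold;
  }

  // Returns true once the current call completes a run of `threshold`
  // identical calls; a distinct call starts a new run of length 1.
  check(call: ToolCallInfo): boolean {
    const key = `${call.name}:${stableStringify(call.args)}`;
    this.runLength = key === this.lastKey ? this.runLength + 1 : 1;
    this.lastKey = key;
    return this.runLength >= this.threshold;
  }

  // A caller would reset on a new user turn, on retry, or after firing.
  reset(): void {
    this.lastKey = null;
    this.runLength = 0;
  }
}

const backstop = new IdenticalToolCallBackstop(5);
const call = { name: 'run_shell_command', args: { command: 'echo x' } };
const fired = [1, 2, 3, 4, 5].map(() => backstop.check(call));
console.log(fired); // [ false, false, false, false, true ]
```

Sorting keys matters because two calls a model considers identical may serialize their arguments in different orders; comparing raw JSON strings would then miss the repeat.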

Why it's needed

Issue #5015 shows a deterministic provider stream that can repeat the exact same tool call and arguments indefinitely. The existing identical-call detector already recognizes this pattern, but the check lived behind the soft loop-detection gate in the core client path. That means clients or modes that skip heuristic loop detection could still execute repeated side-effecting calls.

This PR keeps skipLoopDetection as a heuristic-loop setting while making the narrow deterministic identical-tool-call backstop run in core for every client.

Reviewer Test Plan

How to verify

Run the focused core/client tests and confirm that repeated identical ToolCallRequest events produce LoopDetected even when skipLoopDetection is true. Also confirm the CLI hook tests still pass after removing the TUI-local guard.

Evidence (Before & After)

Before this change, GeminiClient.sendMessageStream() skipped all loop checks when model.skipLoopDetection was true. After this change, only heuristic detectors are skipped; the deterministic identical-tool-call backstop still fires and core stops before handing another repeated tail to pending tool scheduling.

Tested on

| OS | Status |
| --- | --- |
| 🍏 macOS | ⚠️ not tested |
| 🪟 Windows | ✅ tested |
| 🐧 Linux | ⚠️ not tested |

Environment

Windows 11, Node/npm from the existing repo workspace.

Commands run:

npm run test --workspace=packages/core -- src/services/loopDetectionService.test.ts
npm run test --workspace=packages/core -- src/core/client.test.ts
npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx
npm run typecheck --workspace=packages/core
npm run typecheck --workspace=packages/cli
npx prettier --check packages/core/src/core/client.ts packages/core/src/core/client.test.ts packages/core/src/services/loopDetectionService.ts packages/core/src/services/loopDetectionService.test.ts packages/cli/src/ui/hooks/useGeminiStream.ts packages/cli/src/ui/hooks/useGeminiStream.test.tsx
npx eslint packages/core/src/core/client.ts packages/core/src/core/client.test.ts packages/core/src/services/loopDetectionService.ts packages/core/src/services/loopDetectionService.test.ts packages/cli/src/ui/hooks/useGeminiStream.ts packages/cli/src/ui/hooks/useGeminiStream.test.tsx --max-warnings 0
git diff --check upstream/main --

I also rebuilt packages/core and packages/acp-bridge before rerunning the CLI typecheck so the local workspace outputs matched the rebased source.

Risk & Scope

  • Main risk or tradeoff: a legitimate workflow that asks for the exact same tool and arguments five times in a row now hits the same core identical-call backstop even if heuristic loop detection is skipped. That is intentionally narrower than keeping all loop detectors enabled.
  • Not validated / out of scope: live provider replay from the gist in #5015 and end-to-end TUI recording.
  • Breaking changes / migration notes: none.

Linked Issues

Fixes #5015


userMessageTimestamp,
);
resetIdenticalToolCallLoop();
return StreamProcessingStatus.Completed;

Collaborator

[Critical] The early return StreamProcessingStatus.Completed here is inside the try block. The finally block runs (buffer cleanup), but dualOutput?.finalizeAssistantMessage() at line 1725 is skipped entirely. For --output-format json and other dual-output consumers, message_start has no matching message_stop, producing malformed stream output.

Suggested change, replacing:
return StreamProcessingStatus.Completed;
with:
if (recordToolCallForHardLoopGuard(event.value)) {
  toolCallRequests.length = 0;
  dualOutput?.finalizeAssistantMessage();
  addItem(
    {
      type: 'info',
      text:
        `Stopped repeated identical tool calls after ` +
        `${HARD_IDENTICAL_TOOL_CALL_LIMIT} consecutive attempts.`,
    },
    userMessageTimestamp,
  );
  resetIdenticalToolCallLoop();
  return StreamProcessingStatus.Completed;
}

— qwen3.7-max via Qwen Code /review

case ServerGeminiEventType.ToolCallRequest:
flushBufferedStreamEvents();
if (recordToolCallForHardLoopGuard(event.value)) {
toolCallRequests.length = 0;

Collaborator

[Suggestion] toolCallRequests.length = 0 discards all accumulated tool call requests — including legitimate, distinct calls that preceded the identical loop. For example, [read_file("a.ts"), run_shell_command("echo x") × 10] would lose the valid read_file call.

Consider clearing only from the first identical call onward (e.g., track the index where the identical run started and splice from there), or add a comment explaining this is intentional given the pathological nature of 10 identical calls.

Also note: the Retry event handler (line ~1674) clears toolCallRequests.length = 0 but does not call resetIdenticalToolCallLoop(). This means identical calls before and after a retry accumulate — the counter could reach 10 across two retry attempts without either attempt individually looking like a loop.

— qwen3.7-max via Qwen Code /review

expect(result.current.streamingState).toBe(StreamingState.Idle);
});

it('should hard-stop repeated identical tool calls before scheduling them', async () => {

Collaborator

[Suggestion] Only the exact-trigger case (10 identical calls → guard fires) is tested. Consider adding boundary tests to catch regressions:

  • 9 identical calls: should proceed normally (mockScheduleToolCalls IS called, no info message)
  • Mixed calls: e.g., 5×A, 1×B, 5×A — counter should reset on B, guard should NOT fire
  • Cross-turn reset: same tool in two sequential user queries — counter should reset between turns

— qwen3.7-max via Qwen Code /review

he-yufeng force-pushed the fix/hard-stop-identical-tool-loop branch from a0c4266 to 853ce2a on June 12, 2026 12:22
@he-yufeng

Contributor Author

Addressed the review on the hard-stop guard and force-pushed the updated head 853ce2a14.

Changes:

  • The hard-stop path no longer returns from inside the stream try block. It breaks the stream loop, then falls through the existing cleanup/finalization path so dual-output consumers still get the matching assistant-message finalization.
  • The guard now removes only the repeated tail. Distinct tool calls that arrived before the repeated run are still scheduled.
  • Retry now resets the identical-tool-call guard together with the stale pending tool requests.
  • Added regression coverage for:
    • 10 identical calls hard-stopping without scheduling repeats
    • a distinct earlier call surviving a repeated tail
    • 9 identical calls staying below the hard-stop threshold
    • mixed 5xA, 1xB, 5xA resetting on the distinct call
    • retry resetting the guard instead of accumulating across attempts
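The retry case is easy to miss, so here is a small self-contained sketch of why the reset matters. It models the stream as a hypothetical event union, keys calls with plain JSON.stringify (a real implementation would need key-order-stable serialization), and uses the earlier CLI guard's limit of 10; none of these names are the actual useGeminiStream code. Without a reset on retry, six identical calls before a retry and six after add up to twelve and trip the guard, even though neither attempt alone looks like a loop.

```typescript
// Hypothetical, simplified stream events.
type StreamEvent =
  | { type: 'tool_call_request'; name: string; args: Record<string, unknown> }
  | { type: 'retry' }
  | { type: 'content'; text: string };

const HARD_LIMIT = 10; // the limit used by the earlier CLI-local guard

// Returns true if the consecutive identical-call guard would fire.
function guardFires(events: StreamEvent[], resetOnRetry: boolean): boolean {
  let lastKey: string | null = null;
  let run = 0;
  for (const ev of events) {
    if (ev.type === 'retry') {
      if (resetOnRetry) {
        // Stale pending requests are dropped on retry, so the run restarts too.
        lastKey = null;
        run = 0;
      }
      continue;
    }
    // Content events leave the run unchanged in this sketch.
    if (ev.type !== 'tool_call_request') continue;
    const key = `${ev.name}:${JSON.stringify(ev.args)}`;
    run = key === lastKey ? run + 1 : 1;
    lastKey = key;
    if (run >= HARD_LIMIT) return true;
  }
  return false;
}

const call: StreamEvent = {
  type: 'tool_call_request',
  name: 'run_shell_command',
  args: { command: 'echo x' },
};
const attempt = new Array<StreamEvent>(6).fill(call);
const events: StreamEvent[] = [...attempt, { type: 'retry' }, ...attempt];

console.log(guardFires(events, false)); // true: 12 calls accumulate across the retry
console.log(guardFires(events, true)); // false: each attempt has only 6
```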

Validation:

  • npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx -> 108 passed
  • npx prettier --check packages/cli/src/ui/hooks/useGeminiStream.ts packages/cli/src/ui/hooks/useGeminiStream.test.tsx -> passed
  • npx eslint packages/cli/src/ui/hooks/useGeminiStream.ts packages/cli/src/ui/hooks/useGeminiStream.test.tsx --max-warnings 0 -> passed
  • git diff --check upstream/main..HEAD -> passed

I also tried npm run typecheck --workspace=packages/cli; it currently fails in unrelated ACP/export files (ToolNames.ENTER_PLAN_MODE, setTitleRecordedCallback, and adjacent signature mismatches), outside this PR's two touched files.

qqqys previously approved these changes Jun 12, 2026

qqqys left a comment

Collaborator

Critical issue from the previous review appears resolved on head 853ce2a; I did not find any new critical blocker in this pass.

wenshao previously approved these changes Jun 12, 2026

wenshao left a comment

Collaborator

All R1 issues (early return skipping finalize, discarding distinct calls, missing boundary tests) are correctly addressed. The break streamLoop → finally → finalizeAssistantMessage flow is sound, splice math correctly preserves distinct earlier calls, and counter reset paths (retry, new query, hard-stop trigger) are all correct. 5 new tests cover the key scenarios well. tsc 0, eslint 0, 108/108 focused tests pass. LGTM! ✅ — qwen3.7-max via Qwen Code /review

wenshao commented Jun 12, 2026

Collaborator

Local build & live-replay verification on macOS (merge reference)

I built this PR locally and replayed the #5015 attack shape against the real CLI — interactive TUI driven via tmux (send-keys / capture-pane) and headless -p runs — using a local deterministic OpenAI-compatible mock provider (same shape as the gist: every response requests the same run_shell_command with identical arguments; executions counted via an append-to-log side effect; the mock stops feeding tool calls after a request cap so nothing runs away). Baseline = the same workspace with the two PR files reverted to merge-base 78f0635.
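The mock provider shape described above can be sketched as a function from request index to an OpenAI-style chat completion body. This is an illustrative assumption, not the reviewer's actual mock: it returns non-streaming chat.completion objects (a streaming mock would send the same tool call as chat.completion.chunk deltas over SSE), and the run_shell_command argument name command, the log file name, and mockResponse itself are hypothetical. Each response carries a fresh tool-call id while the name and arguments stay byte-identical, which is exactly what a name+args detector keys on.

```typescript
interface ToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface ChatCompletion {
  id: string;
  object: 'chat.completion';
  created: number;
  model: string;
  choices: {
    index: number;
    message: { role: 'assistant'; content: string | null; tool_calls?: ToolCall[] };
    finish_reason: 'stop' | 'tool_calls';
  }[];
}

// Same arguments every time. Each execution appends one byte to a log file,
// so the log size counts executions. (Hypothetical command and argument name.)
const REPEATED_ARGS = JSON.stringify({ command: 'printf x >> side-effect.log' });

// requestIndex is 0-based. After `cap` tool-call turns the mock answers with
// plain text, so a run without any backstop still terminates.
function mockResponse(requestIndex: number, cap: number): ChatCompletion {
  const base = {
    id: `mock-${requestIndex}`,
    object: 'chat.completion' as const,
    created: 0,
    model: 'mock-model',
  };
  if (requestIndex >= cap) {
    return {
      ...base,
      choices: [{ index: 0, message: { role: 'assistant', content: 'done' }, finish_reason: 'stop' }],
    };
  }
  return {
    ...base,
    choices: [
      {
        index: 0,
        message: {
          role: 'assistant',
          content: null,
          tool_calls: [
            {
              id: `call_${requestIndex}`, // unique per response
              type: 'function',
              function: { name: 'run_shell_command', arguments: REPEATED_ARGS },
            },
          ],
        },
        finish_reason: 'tool_calls',
      },
    ],
  };
}
```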

Environment: macOS (Darwin 25.5.0), Node v22.22.2, head 853ce2a14, --auth-type openai against the mock, --approval-mode yolo, isolated $HOME.

Checks

  • vitest run src/ui/hooks/useGeminiStream.test.tsx -> 108/108 pass (incl. the 5 new cases); npm run lint ✅ and npm run typecheck ✅ on @qwen-code/qwen-code. This covers the macOS row left untested in the PR test plan.
  • Test validity (mutation check): with the new tests kept but useGeminiStream.ts reverted to merge-base, exactly the 2 behavior-guarding cases fail (hard-stop … before scheduling, keep earlier distinct tool calls) and the 3 negative-boundary cases pass by design — the new tests genuinely pin the new behavior.

Live replay matrix

| # | Build | Mode | model.skipLoopDetection | Stream shape | Provider requests | Repeated-call executions | Outcome |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | PR | interactive TUI | default (true) | identical call every response | 10 | 9 | ✅ hard stop: "● Stopped repeated identical tool calls after 10 consecutive attempts." shown, TUI returns to idle input |
| B | baseline | interactive TUI | default | same | 62 (mock-capped) | 60 | ❌ unbounded |
| C | PR | headless -p | default | same | 62 (mock-capped) | 60 | ❌ unbounded; see caveat 2 |
| D | PR | interactive TUI | false | same | 5 | 4 | ✅ soft detector fires first (confirmation dialog); the two layers don't interfere |
| E | PR | interactive TUI | default | 1 distinct read_file + 12 identical shell calls in one streamed response | 4 | 0 | ✅ splice keeps and runs the distinct call, drops the 12 repeats, turn continues normally |
| F | baseline | headless -p | default | identical every response | 62 (mock-capped) | 60 | ❌ unbounded (reproduces #5015 against current main) |
| G | PR | interactive TUI | default | E's mixed shape on every response | 27 (mock-capped) | 0 | ⚠️ repeated side effects fully eliminated, but the kept distinct call sustains the request cycle (guard fires once per response) |
| H | PR | headless -p | false | identical every response | 5 | 4 | ✅ soft detector halts headless: "Loop detection halted the run (consecutive_identical_tool_calls …)" |

Headline numbers: under default settings the interactive TUI goes from 60 executions (mock-capped, unbounded) on baseline to 9 executions + clean stop with this PR.

Post-stop recovery

After a hard stop I sent follow-up messages and inspected the subsequent request payloads. The aborted assistant turn (with its tool_calls) is never recorded into history, and the orphaned tool response of the kept distinct call is stripped before the next send — the wire payload stays valid for strict OpenAI-compatible providers (no dangling tool_calls, so no 400 risk), and the session remains fully usable afterwards. The cost is that the model keeps no record of the aborted turn, which is consistent with the existing soft-loop stop semantics.
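The payload-validity point can be illustrated with a minimal sanitizer over OpenAI-style chat messages. This is a sketch under assumptions: the message types are simplified, and stripOrphanedToolResponses is a hypothetical name, not Qwen Code's actual history-cleanup code. A tool message whose tool_call_id was never announced by an earlier assistant tool_calls entry is dropped, because strict OpenAI-compatible providers reject a tool message that does not answer a preceding tool call.

```typescript
interface ToolCallRef {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

type ChatMessage =
  | { role: 'system' | 'user'; content: string }
  | { role: 'assistant'; content: string | null; tool_calls?: ToolCallRef[] }
  | { role: 'tool'; tool_call_id: string; content: string };

// Drop tool responses that answer a tool call the history no longer contains
// (e.g. because the aborted assistant turn was never recorded).
// Simplified: it only checks that the id was announced earlier, not that the
// tool message directly follows its assistant message, and it does not handle
// the reverse case of assistant tool_calls left without responses.
function stripOrphanedToolResponses(messages: ChatMessage[]): ChatMessage[] {
  const announced = new Set<string>();
  const kept: ChatMessage[] = [];
  for (const msg of messages) {
    if (msg.role === 'assistant') {
      for (const call of msg.tool_calls ?? []) announced.add(call.id);
    } else if (msg.role === 'tool' && !announced.has(msg.tool_call_id)) {
      continue; // orphaned: its assistant tool_calls turn is missing
    }
    kept.push(msg);
  }
  return kept;
}

const history: ChatMessage[] = [
  { role: 'user', content: 'run it' },
  // The aborted assistant turn (with tool_calls for call_1) was never recorded,
  // but the tool result for call_1 survived:
  { role: 'tool', tool_call_id: 'call_1', content: 'ok' },
  { role: 'user', content: 'follow-up' },
];
console.log(stripOrphanedToolResponses(history).map((m) => m.role)); // [ 'user', 'user' ]
```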

Merge-relevant caveats (recorded for follow-up, not blockers for this scoped guard)

  1. Soft loop detection is off by default. The settingsSchema default for model.skipLoopDetection is true (and packages/cli/src/config/config.ts applies ?? true). So this guard is not a backstop behind an active detector — under default settings it is the only protection, and only for the interactive TUI.
  2. The #5015 reproducer path stays unbounded (case C). The gist invokes qwen … -p, which goes through nonInteractiveCli.ts's own stream loop, not useGeminiStream; the ACP session loop is likewise uncovered. I'd treat this PR as a partial mitigation for #5015: either keep the issue open / file a follow-up for the non-interactive and ACP paths, or revisit the skipLoopDetection default for unattended runs (case H shows re-enabling it bounds headless at 4 executions today).
  3. Mixed-shape residual (case G). When every response pairs the identical streak with one distinct call, the kept distinct call keeps the request cycle alive (one guard fire + one distinct execution per response, one info line per cycle in the UI). Still strictly better than baseline, which would additionally execute all 12 repeats per cycle — recording as residual risk, not a regression.

Overall: within its scope the guard behaves exactly as designed, implementation details (stable key serialization, splice bounds, counter resets on new top-level submits and stream Retry) all check out in live runs, and my earlier R1 review items remain correctly addressed on this head.


wenshao commented Jun 12, 2026

Collaborator

After re-examining the layering, I'm walking back my approval and requesting a placement change: this guard should live in core, not in the TUI hook.

What I verified in the codebase:

  1. Core already enforces loop detection centrally and authoritatively. In GeminiClient.sendMessageStream (packages/core/src/core/client.ts), every stream event goes through loopDetector.addAndCheck(event), and on detection core itself terminates the turn (return turn). This line of defense covers every client — TUI, non-interactive -p, ACP, serve/daemon, SDK — because they all consume the same stream. The identical tool-call (same name + same args) threshold there is 5 (TOOL_CALL_LOOP_THRESHOLD in loopDetectionService.ts).

  2. Consequently, with soft loop detection enabled, core stops the turn at the 5th identical call — this PR's 10-call guard in useGeminiStream can never fire. The new guard only ever has effect when model.skipLoopDetection: true.

  3. But in exactly that configuration, ACP (Zed etc.), qwen serve / daemon, non-interactive qwen -p, and SDK consumers bypass useGeminiStream entirely and get zero protection. Those unattended paths are where a #5015-style runaway hurts most: nobody is watching to press ESC, and the API bill keeps burning. The interactive TUI is the one entry point that already has a human in the loop.

  4. It also makes the setting's semantics inconsistent: the same model.skipLoopDetection: true now means "soft detection off, hard backstop on" in the TUI but "everything off" everywhere else.

  5. Maintenance-wise this adds a second, independent implementation of "stable-serialize name+args and count consecutive repeats" next to core's existing checkToolCallLoop, with a different threshold (10 vs 5).

Suggested change: move the deterministic identical-call hard stop into core — e.g. in the sendMessageStream event loop where addAndCheck already runs, keep a minimal identical-tool-call counter that is not gated by getSkipLoopDetection(). skipLoopDetection would continue to disable the heuristic detectors (content-chunk repetition, LLM-based judgment — the false-positive-prone ones users actually want to turn off), while the deterministic "N identical calls in a row" backstop stays always-on for every client. That keeps this PR's intent and roughly its size, but all entry points inherit the protection and there's only one implementation to maintain.


wenshao dismissed their stale review June 12, 2026 23:02

Dismissing my approval — after re-checking the layering I think the hard stop belongs in core (it currently only protects the interactive TUI, and only when skipLoopDetection is on). Details: #5036 (comment)

he-yufeng force-pushed the fix/hard-stop-identical-tool-loop branch from 853ce2a to 087d858 on June 13, 2026 19:19
he-yufeng changed the title from "fix(cli): hard-stop repeated identical tool calls" to "fix(core): hard-stop repeated identical tool calls" on Jun 13, 2026
@he-yufeng

Contributor Author

Thanks for the correction. I reworked this so the guard now lives in core instead of the TUI hook.

What changed in this revision:

  • moved the deterministic repeated-identical-tool-call backstop into GeminiClient.sendMessageStream();
  • split LoopDetectionService so the deterministic tool-call backstop is separate from heuristic loop checks;
  • kept heuristic detectors behind model.skipLoopDetection, while the deterministic identical-call backstop still runs when that setting is enabled;
  • removed the previous CLI-local scheduler guard;
  • when the backstop fires, core removes only the repeated pending tail from Turn.pendingToolCalls, preserving earlier distinct pending tool calls.
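Trimming only the repeated tail comes down to measuring the trailing run of identical calls and splicing that many entries off the end, the splice(len - repeatedCount) step described in the later review. A minimal generic sketch, assuming calls are compared through a caller-supplied key function; trailingRunLength and trimRepeatedTail are hypothetical helpers, not the names used in Turn or GeminiClient.

```typescript
// Length of the run of identical items at the end of `items` (0 if empty).
function trailingRunLength<T>(items: readonly T[], keyOf: (item: T) => string): number {
  if (items.length === 0) return 0;
  const lastKey = keyOf(items[items.length - 1]!);
  let run = 0;
  for (let i = items.length - 1; i >= 0 && keyOf(items[i]!) === lastKey; i--) {
    run++;
  }
  return run;
}

// Remove the last `repeatedCount` entries in place (clamped to the array
// bounds) and return them; earlier distinct entries stay in `pending`.
function trimRepeatedTail<T>(pending: T[], repeatedCount: number): T[] {
  const count = Math.max(0, Math.min(repeatedCount, pending.length));
  return pending.splice(pending.length - count, count);
}

// Usage: one distinct read_file call followed by a run of identical shell calls.
const pending = [
  { name: 'read_file', args: '{"path":"a.ts"}' },
  ...Array.from({ length: 12 }, () => ({ name: 'run_shell_command', args: '{"command":"echo x"}' })),
];
const keyOf = (c: { name: string; args: string }) => `${c.name}:${c.args}`;
const removed = trimRepeatedTail(pending, trailingRunLength(pending, keyOf));
console.log(removed.length, pending.map((c) => c.name)); // 12 [ 'read_file' ]
```

Clamping the count keeps the splice in bounds even if the counter and the pending list ever disagree, which is cheaper than reasoning about every path that mutates the list.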

Validation run on Windows:

npm run test --workspace=packages/core -- src/services/loopDetectionService.test.ts
npm run test --workspace=packages/core -- src/core/client.test.ts
npm run test --workspace=packages/cli -- src/ui/hooks/useGeminiStream.test.tsx
npm run typecheck --workspace=packages/core
npm run typecheck --workspace=packages/cli
npx prettier --check packages/core/src/core/client.ts packages/core/src/core/client.test.ts packages/core/src/services/loopDetectionService.ts packages/core/src/services/loopDetectionService.test.ts packages/cli/src/ui/hooks/useGeminiStream.ts packages/cli/src/ui/hooks/useGeminiStream.test.tsx
npx eslint packages/core/src/core/client.ts packages/core/src/core/client.test.ts packages/core/src/services/loopDetectionService.ts packages/core/src/services/loopDetectionService.test.ts packages/cli/src/ui/hooks/useGeminiStream.ts packages/cli/src/ui/hooks/useGeminiStream.test.tsx --max-warnings 0
git diff --check upstream/main --

I also rebuilt packages/core and packages/acp-bridge before rerunning the CLI typecheck so the workspace outputs were fresh after the rebase.

qqqys left a comment

Collaborator

Critical-only re-review: the prior core-placement blocker is addressed on head 087d858. The deterministic identical-tool-call backstop now runs in core, still runs when skipLoopDetection is true, trims the repeated pending tail, and CI is green. I did not find a new critical blocker in this pass.

wenshao commented Jun 13, 2026

Collaborator

Re-verification of the core-placement rework (merge reference)

This supersedes my two earlier comments: the live-replay report on the old TUI-hook implementation (head 853ce2a14), and the placement request to move the backstop into core. This revision (087d858d91) reworks the PR exactly as requested, so I re-verified from scratch on Linux (Node 22.22.2), built into the real qwen binary.

Headline: the rework does what my placement comment asked — the deterministic identical-tool-call backstop now runs in core for every client, ungated by skipLoopDetection. The headless -p path that my first report flagged as still-unbounded (caveat #2 / case C) is now bounded. Verified live: 40 → 4 executions. LGTM.

What the rework does (confirmed in the diff at 087d858d91)

  • LoopDetectionService is split: addAndCheckDeterministicToolCallLoop() (the name+args identical-call counter, threshold 5) is separate from addAndCheckHeuristicLoops() (read-file/stagnation/content heuristics).
  • In GeminiClient.sendMessageStream(), the deterministic backstop runs unconditionally; only the heuristic detectors stay behind !getSkipLoopDetection(). The old if (!getSkipLoopDetection()) { addAndCheck } wrapper that gated everything is gone.
  • On a deterministic hit for a ToolCallRequest, core splices only the repeated tail off turn.pendingToolCalls (splice(len - repeatedCount)), preserving earlier distinct pending calls.
  • The previous CLI-local scheduler guard (threshold 10, TUI-only) is removed — one implementation, one threshold (5), for TUI / non-interactive -p / ACP / serve / SDK.
  • The deterministic backstop still honors the explicit disableForSession() escape hatch (the dialog's "Disable for this session"), so it is unconditional w.r.t. the passive skipLoopDetection setting but not w.r.t. a deliberate user opt-out.
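The resulting gating can be summarized in a few lines. This is a sketch, not the actual sendMessageStream code: the LoopChecks and LoopOptions shapes, the detectLoop name, and the order in which the two layers are evaluated are assumptions, and the detectors are passed in as plain predicates. It captures the three cases listed above: the passive skipLoopDetection setting removes only the heuristic layer, the deterministic backstop runs regardless of it, and an explicit disableForSession() opt-out turns both off.

```typescript
// Hypothetical detector predicates; each returns true when it sees a loop.
interface LoopChecks<E> {
  deterministic: (event: E) => boolean; // identical name + args, threshold 5
  heuristic: (event: E) => boolean; // read-file, stagnation, content repetition, ...
}

interface LoopOptions {
  skipLoopDetection: boolean; // passive setting: disables heuristics only
  disabledForSession: boolean; // explicit user opt-out via the dialog
}

type LoopKind = 'deterministic' | 'heuristic';

function detectLoop<E>(event: E, checks: LoopChecks<E>, opts: LoopOptions): LoopKind | null {
  if (opts.disabledForSession) return null;
  // Evaluated whatever skipLoopDetection says.
  if (checks.deterministic(event)) return 'deterministic';
  if (!opts.skipLoopDetection && checks.heuristic(event)) return 'heuristic';
  return null;
}

const alwaysLoop: LoopChecks<string> = { deterministic: () => true, heuristic: () => true };
console.log(detectLoop('ev', alwaysLoop, { skipLoopDetection: true, disabledForSession: false })); // deterministic
```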

1. PR test plan — all green at 087d858d91

| Command | Result |
| --- | --- |
| vitest run loopDetectionService.test.ts client.test.ts (core) | 237 passed (45 + 192) |
| vitest run useGeminiStream.test.tsx (cli) | 105 passed (the removed CLI-guard tests are gone, as intended) |
| typecheck core + cli | exit 0 / exit 0 |
| prettier --check + eslint --max-warnings 0 (4 files) + git diff --check | clean |

2. Mutation test — the new core tests are non-vacuous

Reverted client.ts to the merge-base 0db3273174 (restoring the old if (!getSkipLoopDetection()) gate), kept the PR's tests:

× client.test.ts › keeps deterministic tool-call checks when skipLoopDetection is true
× client.test.ts › hard-stops identical tool calls even when skipLoopDetection is true
 Tests  2 failed | 190 passed

Exactly the two new integration tests fail (the others stay green → backward-compatible). The headline test — hard-stops identical tool calls even when skipLoopDetection is true — is precisely the behavior the whole PR is about, and it flips red on pre-fix client.ts.

3. Live A/B in the real binary — the #5015 headless path is now bounded

Local deterministic OpenAI-compatible mock: every response requests the same run_shell_command with identical args; each execution appends one byte to a side-effect log (execution counter); the mock caps tool-call turns at 40 so a missing backstop still terminates. Config: model.skipLoopDetection: true (the shipping default), --approval-mode yolo, isolated $HOME. Baseline = the four files reverted to merge-base and the workspace rebuilt.

| Path | Build | skipLoopDetection | Shell executions | Provider requests | Outcome |
| --- | --- | --- | --- | --- | --- |
| headless -p | baseline | true | 40 (capped → unbounded) | 42 | no stop; reproduces #5015 / my report-1 case C |
| headless -p | PR | true | 4 | 5 | Loop detection halted the run (consecutive_identical_tool_calls…) |
| interactive TUI (tmux) | PR | true | 4 | 5 | ✅ "A potential loop was detected" dialog → Esc (keep enabled) → request has been halted, returns to idle, no further executions |

So under the shipping default, the headless -p runaway my first report called out goes from 40 executions (mock-capped, i.e. unbounded) to 4 + a clean halt, and the deterministic backstop now fires through the same LoopDetected path in the TUI too. The 5th identical call trips it (threshold 5 → ~4 side effects), and core stops before scheduling the repeated tail.
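The "threshold 5 → ~4 side effects" arithmetic follows from the check running before scheduling: the call that completes the run is the one that gets dropped. A small simulation, assuming one identical tool call per provider response and nothing else in the stream (executionsBeforeStop is a hypothetical helper, not project code):

```typescript
// Simulate one identical tool call per provider response. The backstop is
// checked before a call is scheduled, so the call that completes the run of
// `threshold` identical calls never executes.
function executionsBeforeStop(
  threshold: number,
  maxRequests: number,
): { executions: number; requests: number } {
  let run = 0;
  let executions = 0;
  for (let request = 1; request <= maxRequests; request++) {
    run += 1; // every call matches the previous one
    if (run >= threshold) return { executions, requests: request };
    executions += 1; // below the threshold: scheduled and executed
  }
  return { executions, requests: maxRequests };
}

console.log(executionsBeforeStop(5, 40)); // { executions: 4, requests: 5 }
console.log(executionsBeforeStop(10, 60)); // { executions: 9, requests: 10 }
```

The first line matches the 4 executions over 5 provider requests reported for the PR build; with the earlier CLI guard's limit of 10, the same model gives the 9 executions over 10 requests seen in case A of the first report.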

Notes (non-blocking)

  • Threshold is now 5, not the old CLI guard's 10 — slightly more aggressive, and consistent with core's existing identical-call threshold. Intentional per the PR.
  • Splice / distinct-call preservation (one response carrying a distinct call + an identical streak) is covered by the new unit tests and was exercised live in my first report (case E); I did not re-run it here.
  • Escape hatch: choosing "Disable loop detection for this session" in the TUI dialog calls disableForSession(), which also disables the deterministic backstop for the rest of that session — a deliberate, user-initiated opt-out, distinct from the passive skipLoopDetection default. Headless -p has no such dialog, so it is always protected.
  • The mixed alternating-shape residual (report-1 case G) is orthogonal to this placement change and unchanged.

Verdict: the rework implements the core placement I asked for; all clients — crucially the previously-unbounded headless -p/ACP paths — now inherit the deterministic identical-call backstop, with one implementation and clean tests. Re-confirmed end-to-end at 087d858d91. LGTM to merge.


@wenshao

wenshao commented Jun 14, 2026

Collaborator

@qwen-code /triage

@qwen-code-ci-bot

Collaborator

Thanks for the PR!

Template looks good ✓

On direction: this is squarely aligned — #5015 is a real reliability bug where deterministic provider streams can execute the same side-effecting tool call indefinitely, burning API credits and causing unintended side effects. A hard backstop in core that covers all entry points (TUI, headless -p, ACP, SDK) is exactly the right fix.

On approach: the scope is tight and the layering is clean. Splitting LoopDetectionService into a deterministic identical-tool-call check (always-on) and heuristic detectors (behind skipLoopDetection) is the minimal correct change: one implementation, one threshold, and every client inherits the protection. The splice logic that preserves earlier distinct pending calls is a nice touch. This reads as a focused bug fix, not a feature-creep PR.
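A minimal TypeScript sketch of the layering described above, assuming a simplified `ToolCall` shape and a `name:JSON(args)` string as the identity of a call. `IdenticalToolCallBackstop` and `isLoop` are hypothetical names; `getConsecutiveToolCallCount()` and `disableForSession()` are borrowed from the reviews, but their bodies here are guesses, not the actual `LoopDetectionService` code. The threshold of 5 is the one quoted in the reviews.

```typescript
// Hypothetical request shape; the real ToolCallRequest type may differ.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Threshold quoted in the reviews: "5 identical calls in a row".
const IDENTICAL_CALL_THRESHOLD = 5;

// Sketch of a deterministic backstop: counts consecutive calls whose name and
// arguments are identical. Not the actual LoopDetectionService implementation.
class IdenticalToolCallBackstop {
  private lastKey: string | null = null;
  private count = 0;
  private disabled = false;

  // Returns true once the same call has been seen THRESHOLD times in a row.
  addAndCheck(call: ToolCall): boolean {
    if (this.disabled) return false;
    // JSON.stringify depends on key insertion order; a real implementation
    // might normalize arguments before comparing them.
    const key = `${call.name}:${JSON.stringify(call.args)}`;
    this.count = key === this.lastKey ? this.count + 1 : 1;
    this.lastKey = key;
    return this.count >= IDENTICAL_CALL_THRESHOLD;
  }

  getConsecutiveToolCallCount(): number {
    return this.count;
  }

  // The review mentions a counter reset on Retry; modeled as an explicit call.
  reset(): void {
    this.lastKey = null;
    this.count = 0;
  }

  // Explicit user opt-out ("disable for this session").
  disableForSession(): void {
    this.disabled = true;
  }
}

// Hypothetical per-event check: the deterministic backstop always runs,
// heuristics run only when skipLoopDetection is false.
function isLoop(
  call: ToolCall,
  backstop: IdenticalToolCallBackstop,
  heuristicCheck: (call: ToolCall) => boolean,
  skipLoopDetection: boolean,
): boolean {
  if (backstop.addAndCheck(call)) return true;
  return !skipLoopDetection && heuristicCheck(call);
}
```

With `skipLoopDetection: true`, five identical calls in a row make the fifth check return true even though the heuristic never runs, which is the behavior the PR's headline test pins.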

Moving on to code review. 🔍


Qwen Code · qwen3.7-max

@wenshao wenshao merged commit e2fc161 into QwenLM:main Jun 14, 2026
23 checks passed
@qwen-code-ci-bot

Collaborator

Code Review

Independent proposal before reading the diff: split addAndCheck in LoopDetectionService so the deterministic identical-tool-call check can run unconditionally in GeminiClient.sendMessageStream(), while heuristic detectors stay behind skipLoopDetection. Add a getter for the repetition count so core can splice the repeated tail from pendingToolCalls.

The PR's approach matches this exactly. The implementation is clean and minimal:

  • addAndCheckDeterministicToolCallLoop() — always-on, handles Retry counter reset, respects disableForSession(). Correct.
  • addAndCheckHeuristicLoops() — existing heuristic detectors minus checkToolCallLoop (now in the deterministic path). The trackToolCall call is correctly preserved here for non-deterministic heuristic tracking.
  • Client splice logic (Math.max(0, len - repeatedCount)) safely removes only the repeated tail, preserving earlier distinct pending calls.
  • getConsecutiveToolCallCount() exposes the counter for the splice. Clean.
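The splice step can be sketched as follows, assuming a minimal `PendingCall` type in place of the real `Turn.pendingToolCalls` entries. `dropRepeatedTail` is a hypothetical name; only the `Math.max(0, len - repeatedCount)` expression comes from the review.

```typescript
// Hypothetical pending-call shape; the real pending entries carry more fields.
interface PendingCall {
  callId: string;
  name: string;
}

// Removes the repeated tail from the pending list in place, mirroring the
// Math.max(0, len - repeatedCount) expression quoted in the review.
// When the identical run began in an earlier turn, repeatedCount can exceed
// the current list length; the clamp keeps the start index at 0.
function dropRepeatedTail(pending: PendingCall[], repeatedCount: number): PendingCall[] {
  const start = Math.max(0, pending.length - repeatedCount);
  // Array.prototype.splice(start) removes everything from start to the end
  // and returns the removed elements.
  return pending.splice(start);
}
```

Earlier distinct calls sit before the repeated run, so cutting from `len - repeatedCount` leaves them in place.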

No correctness bugs, no security issues, no AGENTS.md violations. The backward-compatible addAndCheck() wrapper is preserved for any external callers.

Test Results

✓ src/services/loopDetectionService.test.ts  45 passed
✓ src/core/client.test.ts                   192 passed
✓ src/ui/hooks/useGeminiStream.test.tsx      107 passed
✓ typecheck (core)                          exit 0
✓ eslint --max-warnings 0 (4 files)         clean

New tests are non-vacuous: the test "hard-stops identical tool calls even when skipLoopDetection is true" directly pins the headline behavior, and the retry-reset and count-exposure tests cover the path that supports the splice.

Smoke Test (tmux)

Build succeeds with PR changes. CLI starts cleanly:

runner@runnervm1li68:~/work/qwen-code/qwen-code$ npm run dev -- --version 2>&1

> @qwen-code/qwen-code@0.18.0 dev
> node scripts/dev.js --version

DEV is set to true, but the React DevTools server is not running. Start it with:

$ npx react-devtools

dev
runner@runnervm1li68:~/work/qwen-code/qwen-code$

Full reproduction of the #5015 scenario requires a deterministic mock provider (identical tool calls on every response) — not achievable with a real LLM prompt. Comprehensive live A/B testing was already done by reviewer @wenshao (cases A–H, head 087d858d91), confirming the headless -p path goes from 40 executions (unbounded) to 4 + clean halt, and the interactive TUI shows the expected hard-stop dialog.
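A toy TypeScript simulation of that mock setup, assuming one tool call per response and that the backstop is checked before each call is executed. `mockProviderResponse`, `simulate`, and the 40-round cap are hypothetical stand-ins for the mock described in the reviews. It reproduces the arithmetic of the reported result: with a threshold of 5, four calls run and the fifth identical request is stopped before execution.

```typescript
// Hypothetical request shape for the simulation.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

const THRESHOLD = 5; // identical-call threshold quoted in the reviews
const MOCK_MAX_ROUNDS = 40; // mock truncation so the baseline terminates

// Hypothetical deterministic provider: every response asks for the same call.
function mockProviderResponse(): ToolCall {
  return { name: "run_shell_command", args: { command: "echo x" } };
}

interface RunResult {
  executions: number;
  providerRequests: number;
  halted: boolean;
}

// Simulates the agent loop: request, check backstop, execute tool, repeat.
function simulate(backstopEnabled: boolean): RunResult {
  let lastKey: string | null = null;
  let count = 0;
  let executions = 0;
  let providerRequests = 0;
  for (let round = 0; round < MOCK_MAX_ROUNDS; round++) {
    providerRequests++;
    const call = mockProviderResponse();
    const key = `${call.name}:${JSON.stringify(call.args)}`;
    count = key === lastKey ? count + 1 : 1;
    lastKey = key;
    if (backstopEnabled && count >= THRESHOLD) {
      // Halt before scheduling the repeated call.
      return { executions, providerRequests, halted: true };
    }
    executions++; // the side-effecting tool runs
  }
  return { executions, providerRequests, halted: false };
}
```

`simulate(true)` yields 4 executions and a halt on the 5th request; `simulate(false)` runs until the 40-round cap.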


Qwen Code · qwen3.7-max

@qwen-code-ci-bot

Collaborator

Verdict

This PR does one thing well: it moves the deterministic identical-tool-call backstop from a TUI-only hook into the core stream loop, so every client — interactive, headless, ACP, SDK — gets the same protection. The layering is right: skipLoopDetection still controls the heuristic detectors that users actually want to toggle, while the narrow "5 identical calls in a row" hard stop is always on.

All 344 unit tests pass. Typecheck and lint clean. The code is straightforward — two methods where there was one, a splice that preserves earlier distinct calls, and a counter getter. Nothing over-engineered, nothing speculative.

Reviewer @wenshao's live A/B testing (head 087d858d91, Linux) confirmed the #5015 headless path goes from unbounded to 4 executions + clean halt. That's the key gap this rework closes over the original TUI-only approach.

LGTM — approving. ✅


Qwen Code · qwen3.7-max

@qwen-code-ci-bot qwen-code-ci-bot left a comment

Collaborator


LGTM, looks ready to ship. ✅

wenshao added a commit that referenced this pull request Jun 15, 2026
…op (#5128)

#5036 carved the deterministic identical-tool-call check out of the
`model.skipLoopDetection` gate, turning it into a hard-stop that fires
even when loop detection is disabled. Because `skipLoopDetection`
defaults to true (settingsSchema: "to avoid false-positive
interruptions"), this silently re-enabled loop halts for the default
configuration and broke the documented escape hatch — the
non-interactive guidance in nonInteractiveCli.ts told users to set
`model.skipLoopDetection: true`, which no longer disabled the halt and
is unreachable in non-interactive mode (no disable dialog).

Gate both the deterministic and heuristic detector paths behind the
single flag again. The deterministic split, retry-reset, and pending
tool-call splice introduced by #5036 still apply once detection is
explicitly enabled (skipLoopDetection: false), so the runaway guard
remains available as opt-in without overriding the default-off contract.
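A sketch contrasting the gating as merged in #5036 with the gating the #5128 commit message describes. `LoopDetectionConfig`, `planAfter5036`, and `planAfter5128` are hypothetical names, and it assumes the explicit session opt-out still disables both paths after #5128 (the commit message does not say).

```typescript
// Hypothetical inputs to the gating decision.
interface LoopDetectionConfig {
  skipLoopDetection: boolean; // model.skipLoopDetection (defaults to true per #5128)
  disabledForSession: boolean; // set via the "disable for this session" dialog
}

interface DetectorPlan {
  deterministic: boolean;
  heuristic: boolean;
}

// As merged in #5036: the deterministic check ignores skipLoopDetection
// but still honors the explicit session opt-out.
function planAfter5036(cfg: LoopDetectionConfig): DetectorPlan {
  return {
    deterministic: !cfg.disabledForSession,
    heuristic: !cfg.skipLoopDetection && !cfg.disabledForSession,
  };
}

// As described in the #5128 commit message: both paths behind one flag again.
function planAfter5128(cfg: LoopDetectionConfig): DetectorPlan {
  const enabled = !cfg.skipLoopDetection && !cfg.disabledForSession;
  return { deterministic: enabled, heuristic: enabled };
}
```

Under the default `skipLoopDetection: true`, `planAfter5036` still runs the deterministic check while `planAfter5128` runs neither; setting `skipLoopDetection: false` turns both on in either version.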

Development

Successfully merging this pull request may close these issues.

Qwen Code executes repeated identical tool calls

4 participants