test: stabilize simple MCP integration check by he-yufeng · Pull Request #5072 · QwenLM/qwen-code

he-yufeng · 2026-06-13T03:09:29Z

What this PR does

This makes the simple-mcp-server integration test ask the model to fetch an opaque token from the MCP server instead of asking it to add 5 + 10.

The test still verifies the same integration path: the local MCP server is discovered, its tool is exposed as mcp__addition-server__get_integration_token, the model calls that tool, and the final output contains the returned value.

Why it's needed

The current release run in #5068 failed in both sandbox modes because simple-mcp-server.test.ts never observed the MCP add tool call. The prompt asks the model to add two small numbers, so a model can answer 15 directly without calling the tool. That makes the release gate depend on model choice rather than MCP wiring.

Using an opaque token keeps the assertion focused on MCP tool availability: the expected value is not inferable from the prompt, so the model has a much stronger reason to call the MCP tool.

Reviewer Test Plan

How to verify

Run the release integration jobs or the targeted command with the same OPENAI_API_KEY, OPENAI_BASE_URL, OPENAI_MODEL, and auth configuration used by release CI:

npx vitest run --root ./integration-tests cli/simple-mcp-server.test.ts

Expected result: the test records mcp__addition-server__get_integration_token in telemetry and the model output contains qwen-mcp-tool-token-7f31d0.

Evidence (Before & After)

Before: release run 27450768357 failed in both Integration Tests (No Sandbox) and Integration Tests (Docker) with Expected to find an add tool call: expected false to be truthy in cli/simple-mcp-server.test.ts.

After: the test no longer asks for a trivial arithmetic result. It requires the model to use an MCP tool to retrieve a token it cannot derive from the prompt.

Tested on

OS	Status
🍏 macOS	⚠️ not tested
🪟 Windows	✅ `npm ci --no-audit --progress=false`, `npm run build`, `npm exec -- eslint integration-tests\\cli\\simple-mcp-server.test.ts`, `npm exec -- prettier --check integration-tests\\cli\\simple-mcp-server.test.ts`, `git diff --check`; ⚠️ targeted integration test is auth-gated locally
🐧 Linux	⚠️ not tested

Environment (optional)

Local Windows shell has OPENAI_API_KEY but not the full release auth/model setup, so the targeted integration command stops before the patched assertion with No auth type is selected. Release CI already supplies the required OpenAI-compatible environment for this test.

Risk & Scope

Main risk or tradeoff: this changes the fixture tool from an arithmetic tool to a token-returning tool. The change is confined to the test fixture, so it is a test-only behavior change.
Not validated / out of scope: I did not run a successful live model-backed integration call locally because the local auth configuration is incomplete.
Breaking changes / migration notes: none.

Linked Issues

Fixes #5068

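Chinese description (translated)

What this PR does

This PR changes the simple-mcp-server integration test from "have the model compute 5 + 10" to "have the model fetch a token from the MCP server that cannot be inferred from the prompt".

The path covered by the test is unchanged: the local MCP server is discovered, the tool is exposed as mcp__addition-server__get_integration_token, the model calls that tool, and the final output contains the tool's return value.

Why it's needed

The release run for #5068 failed in both integration test jobs (no sandbox and Docker). The failure point was that simple-mcp-server.test.ts did not observe the MCP add tool call. The original prompt asked the model to add two very small numbers, so the model could answer 15 directly without necessarily calling the tool. In practice the release gate therefore depended on model choice, rather than verifying only the MCP wiring.

With the opaque token the assertion is more focused: the expected value cannot be derived from the prompt, so the model has to obtain the result through the MCP tool.

Reviewer Test Plan

Run with the same OPENAI_API_KEY, OPENAI_BASE_URL, OPENAI_MODEL, and auth configuration as release CI:

npx vitest run --root ./integration-tests cli/simple-mcp-server.test.ts

Expected result: mcp__addition-server__get_integration_token appears in telemetry and the model output contains qwen-mcp-tool-token-7f31d0.

Evidence (Before & After)

Before: both integration test jobs of release run 27450768357 failed in cli/simple-mcp-server.test.ts with the error Expected to find an add tool call: expected false to be truthy.

After: the test no longer asks the model for a small arithmetic result it can work out in its head. Instead it requires the model to retrieve, through the MCP tool, a token that cannot be inferred from the prompt.

Risk & Scope

The main risk is that the fixture tool changes from an addition tool to a token-returning tool, but this is a test-only behavior change. Because my local auth configuration is incomplete, I did not complete a real model-backed integration call locally; release CI has the required environment.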

wenshao · 2026-06-13T11:14:49Z

  };
 });

+const INTEGRATION_TOKEN = 'qwen-mcp-tool-token-7f31d0';


[Suggestion] The token value 'qwen-mcp-tool-token-7f31d0' is hardcoded in two independent locations: here as INTEGRATION_TOKEN (inside the serverScript template literal) and again as integrationToken in the describe block (line 166). If a future maintainer updates one but not the other, the tool call will succeed but the output assertion will fail with a confusing "Expected output to contain the MCP tool token" error that doesn't reveal the root cause is a token mismatch.

Since serverScript is already a template literal, you can hoist the constant to module scope (before the const serverScript = ... declaration) and interpolate it here via ${INTEGRATION_TOKEN}. Then reference the same module-scope constant in the describe block's assertions, giving a single source of truth.
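A minimal sketch of the single-source-of-truth layout suggested above. Only `INTEGRATION_TOKEN`, `serverScript`, and the token value come from this thread; the body of the generated script and the `outputContainsToken` helper are hypothetical placeholders, not the real fixture.

```typescript
// Single module-scope constant: the only place the token literal appears.
const INTEGRATION_TOKEN = 'qwen-mcp-tool-token-7f31d0';

// The generated fixture script interpolates the constant instead of repeating
// the literal. JSON.stringify produces a correctly quoted JavaScript string.
// NOTE: the script body is a hypothetical placeholder, not the real server.
const serverScript = `
const INTEGRATION_TOKEN = ${JSON.stringify(INTEGRATION_TOKEN)};
// ... hypothetical MCP server code that returns INTEGRATION_TOKEN ...
`;

// Hypothetical helper: the assertion side reads the same constant,
// so the server script and the expectation cannot drift apart.
function outputContainsToken(output: string): boolean {
  return output.includes(INTEGRATION_TOKEN);
}
```

With this layout, changing the token means editing one line, and a mismatch between the server and the assertion is no longer possible by construction.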

— qwen3.7-max via Qwen Code /review

wenshao · 2026-06-13T12:45:52Z

Maintainer verification report — local real-run against a live model

TL;DR. The add → opaque-token change is a sound, strictly-stronger assertion and is safe (test-only). But it does not make this test pass, and it does not fix #5068. I ran the patched test end-to-end against a live model and it still fails — because the fixture MCP server is skipped by the new workspace approval gating (#4615), so zero MCP tools are ever discovered (mcp_tools_count = 0). The model never sees get_integration_token (nor add), so the waitForToolCall assertion can never be satisfied — for either the old or the new fixture. The token rename addresses a different (and largely hypothetical, given the gating) failure mode.

How this was verified (real run, not a mock)

Item	Value
Bundle	Built fresh from this PR's head `5f33525` (`npm ci` → `build` → `bundle`); confirmed `QWEN_CODE_LEGACY_MCP_BLOCKING` is present in `dist/chunks/*.js`
Command	`npx vitest run --root ./integration-tests cli/simple-mcp-server.test.ts` (the PR's own command), driven in `tmux`
Model	`claude-opus-4-6` via an Anthropic-compatible endpoint, `--yolo`, `QWEN_SANDBOX=false` (mirrors the No-Sandbox job)
Config	Isolated global dir via `QWEN_HOME` so nothing from my local setup leaks in (mirrors a clean CI runner)

Provider note: I could not use the idealab OpenAI endpoint because qwen-code classifies any *.alibaba-inc.com host as DashScope and injects a metadata field the GPT backend rejects ('metadata' … only allowed when 'store' is enabled). That is environment-specific and unrelated to this PR. The root cause below is provider- and model-independent — it triggers before any model call.

What actually happens

All runs FAIL, and telemetry shows the server connects but exposes no tools:

mcp_servers_count: [1, 1, 1]
mcp_tools_count:   [0, 0, 0]      ← the addition-server registers but yields ZERO tools

The --debug log gives the reason directly:

[MCP] Skipping MCP server pending approval: addition-server
ToolRegistry created: ["tool_search","agent",…]   ← no mcp__addition-server__* present

With the tool absent, the model just spins (calling the built-in tool_search 14× looking for it) or states "No such tool exists… neither built-in, deferred, nor MCP." Either way: Expected to find a get_integration_token tool call: expected false to be truthy.

Root cause: workspace approval gating (#4615), not model triviality

MCP servers from workspace/project settings are gated (isGatedMcpScope → scope === 'project' | 'workspace'). The fixture is written to <testDir>/.qwen/settings.json → workspace scope → gated.
A gated server stays pending until explicitly approved, and the only code path that approves one is the interactive startup dialog (useMcpApproval.ts). There is no --yolo / headless / integration-test bypass. In non-interactive --prompt mode (how this test and release CI run), a gated server can therefore never be approved → it is skipped before tools/list is ever called.
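The gating rule described above can be sketched as follows. The `isGatedMcpScope` name and the `'project' | 'workspace'` condition are quoted from this comment; the `McpScope` and `ApprovalState` types, the `'user'` label, and the `shouldSkipServer` helper are assumptions for illustration, and the real signatures in qwen-code may differ.

```typescript
// Hypothetical scope union; 'user' stands in for a non-gated scope.
type McpScope = 'user' | 'project' | 'workspace';
// Hypothetical approval states for a configured server.
type ApprovalState = 'approved' | 'pending';

// Mirrors the condition quoted above: project and workspace scopes are gated.
function isGatedMcpScope(scope: McpScope): boolean {
  return scope === 'project' || scope === 'workspace';
}

// A gated server is skipped unless it is already approved. In non-interactive
// mode nothing can move a server from 'pending' to 'approved' at runtime,
// so a pending workspace server is skipped before its tools are listed.
function shouldSkipServer(scope: McpScope, state: ApprovalState): boolean {
  return isGatedMcpScope(scope) && state !== 'approved';
}
```

Under these assumptions, `shouldSkipServer('workspace', 'pending')` is true for the fixture, which matches the `mcp_tools_count = 0` telemetry, while a user-scope server is never skipped.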

The timeline makes this conclusive for #5068:

Event	Time (UTC)	Commit
Gating merged (`#4615`/`#4713`)	`2026-06-13 00:23:33`	`44627a24be`
Failing release run `27450768357` (cited by #5068)	`2026-06-13 00:24:51` (78 s later)	`44627a24be` (same commit)

The release run that #5068 is built on ran on the exact commit that introduced the gating, and failed with "Expected to find an add tool call" — i.e. the add tool was never discovered, because its server was skipped. Same mechanism I reproduce here with the token.

Proof the PR's token mechanism is otherwise fine

When the server is made non-gated (defined in user scope, which isGatedMcpScope does not gate), the tool is discovered and the patched test's exact prompt works end-to-end:

ToolRegistry created: ["mcp__addition-server__get_integration_token", …]
model output: The returned token is: `qwen-mcp-tool-token-7f31d0`

I also confirmed the PR's core premise empirically (no tools available):

old prompt (add 5 and 10) → model answers 15 in 3/3 runs (so the old assertion was satisfiable without the tool),
new prompt → model produces the token in 0/3 runs (it refuses to guess).

So the assertion hardening is legitimate — it just isn't what's failing.

Why the rename can't fix #5068

The gating skips the server before any tool name is known (Skipping MCP server pending approval). Changing the tool (add → get_integration_token) cannot change a server-level skip. Both fixtures fail identically with mcp_tools_count = 0.

Recommendation

The token change is a fine improvement and can stay, but to actually turn the gate green the test must make the fixture server reachable in non-interactive mode. Options, roughly in order of preference:

1. Pre-seed an approval in the harness: test-helper sets QWEN_CODE_MCP_APPROVALS_PATH and writes an approved record for the test's projectRoot + server (bound to hashMcpServerConfig(config)). Most faithful to real usage.
2. Define the fixture server in a non-gated scope (user-scope) for the test.
3. (Arguably the real bug) #4615 left non-interactive/headless runs with no way to approve a gated server; this likely breaks all headless MCP usage, not just this test. A --yolo/headless auto-approve (or a documented QWEN_CODE_MCP_APPROVALS_PATH recipe) would fix the class of problem.

Net: I'd hold off on merging this as the fix for #5068, because merging it will not make simple-mcp-server.test.ts pass in release CI. Happy to send a follow-up implementing option 1 or 3 if useful.

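Chinese version (translated)

Maintainer local verification report (real run against a real model)

Summary. Changing add to an unguessable token is a stronger and safe assertion improvement (test-only change). But it does not make this test pass, and it cannot fix #5068. I ran the patched test end-to-end against a real model and it still fails. The reason is that the MCP server in the fixture is skipped by the new workspace approval gate (#4615), so no MCP tools are discovered at all (mcp_tools_count = 0). The model sees neither get_integration_token nor add, so the waitForToolCall assertion can never be satisfied, with either the old or the new fixture. The token rename addresses a different failure mode, one that essentially cannot occur while the gate is in place.

Verification method (real run, not a mock)

Item	Value
Build	Built fresh from this PR's head `5f33525` (`npm ci` → `build` → `bundle`); confirmed that `dist/chunks/*.js` contains `QWEN_CODE_LEGACY_MCP_BLOCKING`
Command	`npx vitest run --root ./integration-tests cli/simple-mcp-server.test.ts` (the PR's own command), run in `tmux`
Model	`claude-opus-4-6` (Anthropic-compatible endpoint), `--yolo`, `QWEN_SANDBOX=false` (matches the No-Sandbox job)
Config	Global config isolated with `QWEN_HOME` so that local configuration does not interfere (matches a clean CI runner)

About the provider: the idealab OpenAI endpoint could not be used. qwen-code classifies any *.alibaba-inc.com host as DashScope and injects a metadata field that the GPT backend rejects ('metadata' … only allowed when 'store' is enabled). This is an environment issue unrelated to this PR. The root cause below is independent of provider and model: it occurs before any model call.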

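What actually happened

All runs fail, and telemetry shows that the server connects but exposes no tools:

mcp_servers_count: [1, 1, 1]
mcp_tools_count:   [0, 0, 0]      ← addition-server registers, but 0 tools are discovered

The --debug log states the reason directly:

[MCP] Skipping MCP server pending approval: addition-server
ToolRegistry created: ["tool_search","agent",…]   ← no mcp__addition-server__* present

With the tool missing, the model can only spin (calling the built-in tool_search 14 times to look for it) or answer outright that no such tool exists. The result is the same either way: Expected to find a get_integration_token tool call: expected false to be truthy.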

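Root cause: workspace approval gate (#4615), not "the model being clever"

MCP servers that come from workspace/project settings are gated (isGatedMcpScope → scope === 'project' | 'workspace'). The fixture is written to <testDir>/.qwen/settings.json → workspace scope → gated.
A gated server stays pending until it is explicitly approved, and the only code path that can approve it is the interactive startup dialog (useMcpApproval.ts). There is no --yolo/headless/integration-test bypass. In non-interactive --prompt mode (which both this test and release CI use), a gated server can never be approved → it is skipped before tools/list is called.

The timeline makes the attribution for #5068 conclusive:

Event	Time (UTC)	Commit
Gate merged (`#4615`/`#4713`)	`2026-06-13 00:23:33`	`44627a24be`
Failing release run `27450768357` cited by #5068	`2026-06-13 00:24:51` (78 seconds later)	`44627a24be` (same commit)

The release run that #5068 is based on happened to run on exactly the commit that introduced the gate, and it failed with "Expected to find an add tool call", i.e. the add tool was never discovered because its server was skipped. This is exactly the mechanism I reproduce here with the token.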

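Proof that the PR's token mechanism itself is fine

When the server is made non-gated (defined in user scope, which isGatedMcpScope does not gate), the tool is discovered normally and the patched test's exact prompt works end-to-end:

ToolRegistry created: ["mcp__addition-server__get_integration_token", …]
model output: The returned token is: `qwen-mcp-tool-token-7f31d0`

I also verified the PR's core premise experimentally (with no tools available):

old prompt (add 5 and 10) → the model answers 15 in 3/3 runs (so the old assertion could be satisfied without calling the tool),
new prompt → the model produces the token in 0/3 runs (it refuses to guess).

So the assertion hardening itself holds up; it is just not the cause of the current failure.

Why the rename cannot fix #5068

The gate skips the server, and it does so before any tool name is known (Skipping MCP server pending approval). Changing the tool (add → get_integration_token) cannot change a server-level skip. Both fixtures fail the same way with mcp_tools_count = 0.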

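Recommendation

The token change itself is good and can stay, but to actually turn the gate green the test must make the fixture server reachable in non-interactive mode. In order of preference:

1. Pre-seed an approval in the harness: test-helper sets QWEN_CODE_MCP_APPROVALS_PATH and writes an approved record for the test's projectRoot + server (bound to hashMcpServerConfig(config)). Closest to real usage.
2. Define the fixture server in a non-gated scope (user scope).
3. (More likely the real bug) #4615 leaves non-interactive/headless runs with no way at all to approve a gated server. This very likely affects all headless MCP usage, not just this test. Adding a --yolo/headless auto-approve (or documenting the QWEN_CODE_MCP_APPROVALS_PATH usage) would fix this whole class of problem.

In short: I do not recommend merging this as the fix for #5068. If it is merged, simple-mcp-server.test.ts will still not pass in release CI. If useful, I can send a follow-up implementing option 1 or 3.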

Verified locally on macOS via tmux: PR head 5f33525, fresh bundle, claude-opus-4-6 (Anthropic-compatible endpoint), isolated QWEN_HOME, QWEN_SANDBOX=false. Root cause (#4615 approval gating) is provider/model-independent and reproduced across 3 runs + a direct discovery trace; the inverse (user-scope) run returns the token.

he-yufeng · 2026-06-13T18:30:41Z

Updated the branch to address the real failure mode from your local run.

The PR no longer only renames the fixture tool. The test now:

keeps the opaque token as a single module-level source of truth
writes a hash-bound mcpApprovals.json for the test workspace before spawning the CLI
keeps the fixture in workspace settings, so the test still exercises the gated workspace-server path instead of moving the server to a trusted scope

I also rebased on latest upstream/main.

Validation run locally:

npx prettier --check integration-tests/cli/simple-mcp-server.test.ts
npx eslint integration-tests/cli/simple-mcp-server.test.ts --max-warnings 0
npm run build
npm run typecheck --workspace=packages/cli
git diff --check
a small local sanity check using getPendingGatedMcpServers confirmed the generated approval record leaves addition-server out of the pending list

I also tried the full no-sandbox integration command:

npx cross-env QWEN_SANDBOX=false vitest run --root ./integration-tests cli/simple-mcp-server.test.ts

That still cannot run to the MCP phase on my Windows machine because this checkout has no non-interactive auth type configured:

No auth type is selected. Please configure an auth type (e.g. via settings or --auth-type) before running in non-interactive mode.

So the end-to-end model run is still best verified in your release/test environment, but the approval-gate part is now covered directly by the fixture setup instead of relying on the interactive dialog.

wenshao · 2026-06-13T18:45:01Z

🔁 Re-verification — head `007237623`: the root-cause fix works end-to-end ✅

Follow-up to my first review, where I showed the original add → token rename did not fix #5068: the fixture MCP server is gated by the #4615 workspace-approval check, so in non-interactive mode it’s skipped before tools/list (mcp_tools_count = 0) and the tool is never discovered — for either fixture. I recommended seeding a hash-bound approval (option 1). This force-push implements exactly that, and I re-verified it end-to-end on Linux 6.12 / Node v22.22.2 against a live model (the run the PR’s test plan couldn’t do locally).

Verdict: the new approach fixes the real failure mode. The gated workspace server is now reachable in non-interactive mode and the test passes end-to-end. Recommend merge.

What the force-push does (and why it’s the right shape)

Seeds <testDir>/.qwen/mcpApprovals.json via QWEN_CODE_MCP_APPROVALS_PATH before spawning the CLI: { [resolve(testDir)]: { 'addition-server': { hash: hashMcpServerConfig(config), status: 'approved' } } }, restored in afterAll.
Keeps the fixture in workspace scope, so it still exercises the gated path (the actual failure surface of #5068, "Release Failed for v0.18.0-nightly.20260613.44627a24b on 2026-06-13"), rather than relocating the server to a trusted scope (which would have hidden the gating). 👍
Keeps the opaque token as a single module-level source of truth.
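The seeding step listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `stableHash` is a hypothetical stand-in for `hashMcpServerConfig` (the real function lives in `@qwen-code/qwen-code-core` and its algorithm is not shown in this thread), the contents of `additionServerConfig` are placeholders, and `seedApproval` is a made-up helper name. Only the record shape and the `QWEN_CODE_MCP_APPROVALS_PATH` variable come from the comment.

```typescript
import { createHash } from 'node:crypto';
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join, resolve } from 'node:path';

// Hypothetical stand-in for hashMcpServerConfig; the real algorithm may differ.
function stableHash(config: unknown): string {
  return createHash('sha256').update(JSON.stringify(config)).digest('hex');
}

// Placeholder fixture config, shared by the settings file and the approval hash.
const additionServerConfig = { command: 'node', args: ['mcp-server.cjs'] };

// Writes <testDir>/.qwen/mcpApprovals.json and returns the file path.
function seedApproval(testDir: string): string {
  const qwenDir = join(testDir, '.qwen');
  mkdirSync(qwenDir, { recursive: true });
  const approvalsPath = join(qwenDir, 'mcpApprovals.json');
  const approvals = {
    [resolve(testDir)]: {
      'addition-server': {
        hash: stableHash(additionServerConfig),
        status: 'approved',
      },
    },
  };
  writeFileSync(approvalsPath, JSON.stringify(approvals, null, 2));
  return approvalsPath;
}

// Seed before spawning the CLI, then point the env var at the seeded file.
const testDir = mkdtempSync(join(tmpdir(), 'mcp-approval-'));
const approvalsPath = seedApproval(testDir);
process.env['QWEN_CODE_MCP_APPROVALS_PATH'] = approvalsPath;

// Read the file back to confirm what the CLI would see.
const seeded = JSON.parse(readFileSync(approvalsPath, 'utf8'));
```

Hashing the same config object that is written to the workspace settings is what keeps the two in step: if the fixture config changes, the seeded hash changes with it.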

End-to-end run against a live model (DeepSeek, OpenAI-compatible, `--yolo`, `QWEN_SANDBOX=false`)

npx vitest run --root ./integration-tests cli/simple-mcp-server.test.ts
→ The returned token is: `qwen-mcp-tool-token-7f31d0`
→ MCP server test: Model output validated successfully.
✓ simple-mcp-server > should call an MCP tool and return its result (3.7s)
  Test Files 1 passed (1) | Tests 1 passed (1)

Telemetry from that run confirms the root cause is gone:

Signal	Before fix (my 1st review)	After fix (this run)
`mcp_servers_count`	1	1
`mcp_tools_count`	0 (gated → skipped)	1 (discovered)
`mcp__addition-server__get_integration_token` call	never	recorded ✓
`Skipping MCP server pending approval`	present	absent

So the seeded hash actually matched the runtime config hash (otherwise getState returns pending and the server stays gated) — the make-or-break detail, confirmed empirically rather than assumed.

Counter-proof — the seeding is exactly what un-gates it

I changed only the seeded hash to a bogus value and re-ran the same live-model test:

mcp_tools_count: 0
model: "I don't have the get_integration_token tool available …"
waitForToolCall → poll attempts 5,10,…,40 (never satisfied) → test FAILS

A hash mismatch sends the server back to pending (per mcpApprovals.getState: record.hash !== hashMcpServerConfig(config) ⇒ pending), reproducing the exact mcp_tools_count = 0 failure from my first review. Restoring the real hash → green again. This isolates the fix precisely.
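The state rule quoted above (`record.hash !== hashMcpServerConfig(config) ⇒ pending`) can be sketched as below. The `getState` name and the rule come from the comment; the types, the `hashConfig` stand-in for `hashMcpServerConfig`, and the sample config are assumptions for illustration.

```typescript
import { createHash } from 'node:crypto';

type ApprovalState = 'approved' | 'pending';

// Hypothetical shape of one stored approval record.
interface ApprovalRecord {
  hash: string;
  status: ApprovalState;
}

// Hypothetical stand-in for hashMcpServerConfig; the real algorithm may differ.
function hashConfig(config: unknown): string {
  return createHash('sha256').update(JSON.stringify(config)).digest('hex');
}

// A missing record or a hash that no longer matches the runtime config
// both resolve to 'pending'; only a matching record keeps its stored status.
function getState(
  record: ApprovalRecord | undefined,
  config: unknown,
): ApprovalState {
  if (record === undefined) return 'pending';
  if (record.hash !== hashConfig(config)) return 'pending';
  return record.status;
}

// Placeholder config and two seeded records: one valid, one with a bogus hash.
const config = { command: 'node', args: ['mcp-server.cjs'] };
const goodRecord: ApprovalRecord = { hash: hashConfig(config), status: 'approved' };
const bogusRecord: ApprovalRecord = { hash: 'bogus', status: 'approved' };
```

Under these assumptions the counter-proof corresponds to `getState(bogusRecord, config)`, which falls back to `'pending'` even though the record says `'approved'`.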

Bonus robustness note (non-blocking)

In the gated counter-proof run the model, denied the tool, read the token straight out of mcp-server.cjs in the workspace and printed it. The test still failed — because the primary assertion is waitForToolCall('mcp__addition-server__get_integration_token'), which a source-read can’t satisfy. Good: the tool-call assertion is the real gate; the output.includes(TOKEN) check is secondary (and technically satisfiable by reading the fixture source, so it’s right that it isn’t the sole gate).
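A small sketch of why the tool-call check has to be the primary gate. The tool name and token are taken from this thread; the `RunResult` and `ToolCallEvent` shapes, the helper names, and the `read_file` tool name are hypothetical. The real test polls telemetry through `waitForToolCall` rather than inspecting a finished array.

```typescript
const TOOL_NAME = 'mcp__addition-server__get_integration_token';
const TOKEN = 'qwen-mcp-tool-token-7f31d0';

// Hypothetical shape of a recorded tool-call telemetry event.
interface ToolCallEvent {
  toolName: string;
}

// Hypothetical summary of one CLI run.
interface RunResult {
  toolCalls: ToolCallEvent[];
  output: string;
}

// Primary gate: the MCP tool must appear in telemetry.
function toolWasCalled(run: RunResult): boolean {
  return run.toolCalls.some((event) => event.toolName === TOOL_NAME);
}

// Secondary check: the output mentions the token.
function outputHasToken(run: RunResult): boolean {
  return run.output.includes(TOKEN);
}

// The run passes only when both checks hold.
function runPasses(run: RunResult): boolean {
  return toolWasCalled(run) && outputHasToken(run);
}

// Gated run: the model read the fixture source ('read_file' is a made-up name).
const sourceReadRun: RunResult = {
  toolCalls: [{ toolName: 'read_file' }],
  output: `The token is ${TOKEN}`,
};

// Normal run: the model called the MCP tool and echoed its result.
const toolCallRun: RunResult = {
  toolCalls: [{ toolName: TOOL_NAME }],
  output: `The returned token is: ${TOKEN}`,
};
```

The source-read run satisfies `outputHasToken` but not `toolWasCalled`, so it fails overall, which is the behavior observed in the counter-proof.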

Static

prettier --check ✅, eslint … --max-warnings 0 exit 0.
Typecheck: no type errors attributable to the test file. (tsc -p integration-tests emits one TS5063 for the "//" doc-key in integration-tests/tsconfig.json — pre-existing on main, not introduced here.)
File is byte-identical to PR head after my counter-proof edits were reverted.

CI

007237623: Lint ✅, CodeQL ✅, Classify ✅; the OS Test jobs were still running at post time. Note the simple-mcp-server integration test runs in the release/integration jobs (auth-gated), which is why a local live-model run is the meaningful check here — and it passes.

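🇨🇳 Chinese version (translated)

🔁 Re-verification of head `007237623`: the root-cause fix works end-to-end ✅

Following up on my first review: I pointed out then that renaming add to token does not fix #5068. The fixture's MCP server is blocked by the #4615 workspace approval gate and, in non-interactive mode, is skipped before tools/list (mcp_tools_count = 0), so the tool is never discovered, with either the old or the new fixture. I recommended pre-seeding a hash-bound approval record (option 1). This force-push does exactly that, and I re-verified it end-to-end with a real model on Linux 6.12 / Node v22.22.2 (the step the PR's test plan could not run locally).

Conclusion: the new approach fixes the real failure mode. The gated workspace server is now reachable in non-interactive mode, and the test passes end-to-end. Recommend merging.

What the force-push does (and why its shape is right)

Before spawning the CLI, writes <testDir>/.qwen/mcpApprovals.json via QWEN_CODE_MCP_APPROVALS_PATH: { [resolve(testDir)]: { 'addition-server': { hash: hashMcpServerConfig(config), status: 'approved' } } }, and restores it in afterAll.
The fixture stays in workspace scope, so it still goes through the gated path (the real failure surface of #5068, "Release Failed for v0.18.0-nightly.20260613.44627a24b on 2026-06-13"), instead of moving the server to a trusted scope (which would mask the gating problem). 👍
The opaque token serves as the single module-level source of truth.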

End-to-end run against a real model (DeepSeek, OpenAI-compatible, --yolo, QWEN_SANDBOX=false)

npx vitest run --root ./integration-tests cli/simple-mcp-server.test.ts
→ The returned token is: `qwen-mcp-tool-token-7f31d0`
→ MCP server test: Model output validated successfully.
✓ simple-mcp-server > should call an MCP tool and return its result (3.7s)
  Test Files 1 passed (1) | Tests 1 passed (1)

Telemetry from this run confirms that the root cause is gone:

Signal	Before fix (my first review)	After fix (this run)
`mcp_servers_count`	1	1
`mcp_tools_count`	0 (gated → skipped)	1 (discovered)
`mcp__addition-server__get_integration_token` call	never	recorded ✓
`Skipping MCP server pending approval`	present	absent

That is, the seeded hash really does match the runtime config hash (otherwise getState returns pending and the server stays gated). This make-or-break detail was confirmed by measurement, not assumed.

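Counter-proof: the seeded approval is exactly what lifts the gate

I changed only the seeded hash to a wrong value and re-ran the same real-model test:

mcp_tools_count: 0
model: "I don't have the get_integration_token tool available …"
waitForToolCall → poll attempts 5,10,…,40 (never satisfied) → test fails

A hash mismatch sends the server back to pending (see mcpApprovals.getState: record.hash !== hashMcpServerConfig(config) ⇒ pending), exactly reproducing the mcp_tools_count = 0 failure from my first review. Restoring the correct hash → green again. The causal effect of the fix is precisely isolated.

Additional robustness note (non-blocking)

In the counter-proof (gated) run, the model could not get the tool, so it read the token directly out of the mcp-server.cjs source in the workspace and printed it. The test still failed, because the primary assertion is waitForToolCall('mcp__addition-server__get_integration_token'), which reading the source cannot satisfy. Good: the tool-call assertion is the real gate; output.includes(TOKEN) is secondary (and can in principle be satisfied by reading the fixture source, so it is right that it is not the sole gate).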

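Static checks

prettier --check ✅, eslint … --max-warnings 0 exit 0.
Typecheck: the test file itself has no type errors. (tsc -p integration-tests reports one TS5063 for the "//" doc-key in integration-tests/tsconfig.json; this already exists on main and is not introduced by this PR.)
After the counter-proof edits were reverted, the file is byte-identical to PR head.

CI for 007237623: Lint ✅, CodeQL ✅, Classify ✅; the OS Test jobs were still running at the time of writing. Note that the simple-mcp-server integration test runs in the release/integration (auth-gated) jobs, so a local real-model run is the meaningful verification here, and it passed.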

wenshao · 2026-06-14T16:40:53Z

@qwen-code /triage

qwen-code-ci-bot · 2026-06-14T16:42:38Z

Thanks for the PR!

Template looks good ✓

On direction: this is squarely aligned. The release gate (simple-mcp-server.test.ts) was failing because the model could answer 5 + 10 = 15 without calling the MCP tool, making the assertion model-dependent rather than wiring-dependent. Switching to an opaque token that can't be inferred from the prompt is the right fix for that class of flakiness. The second issue — workspace-scope MCP servers being gated by the #4615 approval check with no non-interactive bypass — is the actual root cause of the #5068 failure, and seeding mcpApprovals.json with a hash-bound approval record addresses it directly.

On approach: scope is tight — one file, test-only, two logically distinct improvements that both need to land together. The additionServerConfig defined at module scope and shared between the settings object and the approval hash is a nice touch: if either side drifts, the hash won't match and the test fails, which is exactly the invariant you want. Nothing to cut here.

Moving on to code review. 🔍

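Chinese description (translated)

Thanks for the contribution!

Template is complete ✓

Direction: fully aligned. The release gate (simple-mcp-server.test.ts) failed because the model could answer 5 + 10 = 15 without relying on the tool, which made the assertion depend on model choice rather than MCP wiring. Switching to an opaque token that cannot be inferred from the prompt is the right way to fix this kind of flakiness. The second problem, workspace-scope MCP servers being blocked by the #4615 approval gate with no bypass in non-interactive mode, is the real root cause of the #5068 failure, and pre-seeding the approval with a hash-bound approval record resolves it directly.

Approach: the scope is tight. One file, test-only, two improvements that logically need to land together. additionServerConfig is defined at module level and shared between the settings and the approval hash; if either side drifts, the hash does not match and the test fails, which is exactly the desired invariant. Nothing to cut.

Moving on to code review 🔍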

— Qwen Code · qwen3.7-max

qwen-code-ci-bot · 2026-06-14T16:43:06Z

Code Review

Reviewed the diff against the current main (integration-tests/cli/simple-mcp-server.test.ts).

Independent proposal (before reading diff): I would have done two things — (1) replace the arithmetic assertion with something the model can't short-circuit (opaque token, UUID, etc.), and (2) pre-seed a workspace approval so the #4615 gating doesn't skip the fixture server in non-interactive mode. The PR does exactly this.

Comparison: The implementation matches and slightly exceeds my proposal. The additionServerConfig constant shared between rig.setup settings and hashMcpServerConfig is a nice self-guarding invariant — the hash only matches if the config object is identical. The INTEGRATION_TOKEN as a single module-level source of truth (used in both the generated server script and the assertion) eliminates the risk of string drift. The afterAll cleanup correctly restores the env var. The QWEN_CODE_LEGACY_MCP_BLOCKING env var from the original test is preserved, which is necessary until #4163 is resolved.
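The env-var cleanup mentioned above can be illustrated with a small save-and-restore helper. This is a generic sketch, not the PR's code: `overrideEnv` is a hypothetical name, and only `QWEN_CODE_MCP_APPROVALS_PATH` comes from the thread.

```typescript
const ENV_KEY = 'QWEN_CODE_MCP_APPROVALS_PATH';

// Sets an environment variable and returns a function that restores the
// previous state exactly. The "was unset" case needs `delete`, because
// assigning undefined to process.env stores the string "undefined" in Node.js.
function overrideEnv(key: string, value: string): () => void {
  const previous = process.env[key];
  process.env[key] = value;
  return () => {
    if (previous === undefined) {
      delete process.env[key];
    } else {
      process.env[key] = previous;
    }
  };
}
```

In a vitest suite the override would typically happen in `beforeAll` and the returned function would be called from `afterAll`, so later test files do not inherit the seeded approvals path.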

No blockers found. Code is clean, idiomatic, follows project conventions (ESM imports, noImplicitAny-compatible, vitest patterns). hashMcpServerConfig is properly exported from @qwen-code/qwen-code-core. QWEN_CODE_MCP_APPROVALS_PATH is a documented env var consumed by packages/cli/src/config/mcpApprovals.ts.

Testing

This integration test is auth-gated (requires OPENAI_API_KEY, OPENAI_BASE_URL, OPENAI_MODEL, and full auth configuration from release CI). It cannot be exercised end-to-end in a local triage environment.

Collaborator @wenshao independently verified the PR head (007237623) end-to-end against a live model (DeepSeek, OpenAI-compatible, --yolo, QWEN_SANDBOX=false):

Test passes: ✓ simple-mcp-server > should call an MCP tool and return its result (3.7s)
Telemetry confirms mcp_tools_count went from 0 (gated) to 1 (discovered)
Counter-proof: changing only the seeded hash to a bogus value reproduces mcp_tools_count: 0 and test failure, isolating the fix causally

CI: all 7 checks passing (Classify, CodeQL, Lint, Test on macOS/Ubuntu/Windows), 11 skipped (bot review jobs still pending).

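Chinese description (translated)

Code Review

Reviewed the diff against current main (integration-tests/cli/simple-mcp-server.test.ts).

Independent proposal (before reading the diff): I would do two things: (1) replace the arithmetic assertion with a value the model cannot bypass (opaque token, UUID, etc.), and (2) pre-seed a workspace approval so that the #4615 gate does not skip the fixture server in non-interactive mode. The PR does exactly these two things.

Comparison: The implementation matches and slightly improves on my proposal. The additionServerConfig constant is shared between the rig.setup settings and hashMcpServerConfig, forming a self-guarding invariant: the hash matches only when the config object is exactly identical. INTEGRATION_TOKEN as the single module-level source of truth (used in both the generated server script and the assertion) eliminates the risk of string drift. The afterAll cleanup correctly restores the environment variable. The existing QWEN_CODE_LEGACY_MCP_BLOCKING is preserved (still needed until #4163 is fixed).

No blocking issues found. The code is clean and follows project conventions (ESM imports, noImplicitAny-compatible, vitest patterns). hashMcpServerConfig is correctly exported from @qwen-code/qwen-code-core. QWEN_CODE_MCP_APPROVALS_PATH is a documented environment variable consumed by packages/cli/src/config/mcpApprovals.ts.

Testing

This integration test requires auth credentials (OPENAI_API_KEY, OPENAI_BASE_URL, OPENAI_MODEL, and the full auth configuration from release CI), so it cannot be run end-to-end in a local triage environment.

Collaborator @wenshao independently verified PR head (007237623) end-to-end against a real model (DeepSeek, OpenAI-compatible, --yolo, QWEN_SANDBOX=false):

Test passes: ✓ simple-mcp-server > should call an MCP tool and return its result (3.7s)
Telemetry confirms mcp_tools_count went from 0 (gated) to 1 (discovered)
Counter-proof: changing only the seeded hash to a wrong value reproduces mcp_tools_count: 0 and the test failure, causally isolating the fix

CI: all 7 checks passed (Classify, CodeQL, Lint, tests on macOS/Ubuntu/Windows), 11 skipped (bot review jobs pending).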

— Qwen Code · qwen3.7-max

qwen-code-ci-bot · 2026-06-14T16:43:29Z

Stepping back: this PR does two things, both correct, and the combination is what makes it actually fix the problem rather than just paper over it.

The opaque token swap is the "obvious" improvement: stop asking the model to do something it can answer from training data. But the real value is the approval seeding, which addresses the actual root cause (the #4615 workspace-scope gating that silently skips the fixture server in non-interactive mode). Without the seeding, the token change would be irrelevant because the MCP server never gets discovered in the first place. Without the token change, the seeding would fix the immediate failure but leave a model-dependent test: a model can answer 5 + 10 directly without calling the tool, in which case the tool-call assertion fails even though the MCP wiring is fine.

The implementation is the right shape for the problem: one file, test-only, shared constants that form a self-guarding invariant. If someone changes the fixture config without updating the approval, the hash won't match and the test fails loudly. That's better than a comment saying "keep these in sync."

The counter-proof from @wenshao's live verification is the strongest evidence available — same code, same model, same environment, one hash value flipped, and the test flips from green to red. That isolates the causal mechanism.

My independent proposal would have been the same two changes, implemented less elegantly (I would have hardcoded the hash string or duplicated the config object). The PR's approach is strictly better.

Verdict: ship it. ✅

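Chinese description (translated)

Stepping back: this PR does two things, both correct, and only the combination truly fixes the problem rather than merely masking the symptom.

The opaque token replacement is the "obvious" improvement: stop asking the model to do something it can answer directly from training data. But the real value lies in the approval seeding, which resolves the actual root cause (the #4615 workspace-scope gate silently skipping the fixture server in non-interactive mode). Without the seeding, the token change is meaningless, because the MCP server is never discovered at all. Without the token change, the seeding fixes the current failure, but the assertion stays weaker: the model might happen to call the tool and also happen to get the arithmetic right.

The shape of the implementation is right: one file, test-only, shared constants forming a self-guarding invariant. If someone changes the fixture config without updating the approval, the hash does not match and the test fails loudly. That is much better than a comment saying "keep these in sync".

@wenshao's counter-proof is the strongest evidence available: same code, same model, same environment, only one hash value changed, and the test goes from green to red. The causal mechanism is precisely isolated.

My independent proposal would have been the same two changes, but implemented less elegantly (I might have hardcoded the hash string or duplicated the config object). The PR's approach is strictly better.

Conclusion: merge ✅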

— Qwen Code · qwen3.7-max

qwen-code-ci-bot

LGTM, looks ready to ship. ✅

wenshao reviewed Jun 13, 2026

test: stabilize simple MCP integration check

0072376

he-yufeng force-pushed the fix/stabilize-simple-mcp-integration branch from 5f33525 to 0072376 Compare June 13, 2026 18:30

wenshao approved these changes Jun 14, 2026

qwen-code-ci-bot approved these changes Jun 14, 2026

wenshao merged commit 5689d29 into QwenLM:main Jun 14, 2026
23 checks passed

doudouOUC pushed a commit that referenced this pull request Jun 15, 2026

test: stabilize simple MCP integration check (#5072)

085bedc
