test: stabilize simple MCP integration check#5072
Conversation
| }; | ||
| }); | ||
|
|
||
| const INTEGRATION_TOKEN = 'qwen-mcp-tool-token-7f31d0'; |
There was a problem hiding this comment.
[Suggestion] The token value 'qwen-mcp-tool-token-7f31d0' is hardcoded in two independent locations: here as INTEGRATION_TOKEN (inside the serverScript template literal) and again as integrationToken in the describe block (line 166). If a future maintainer updates one but not the other, the tool call will succeed but the output assertion will fail with a confusing "Expected output to contain the MCP tool token" error that doesn't reveal the root cause is a token mismatch.
Since serverScript is already a template literal, you can hoist the constant to module scope (before the const serverScript = ... declaration) and interpolate it here via ${INTEGRATION_TOKEN}. Then reference the same module-scope constant in the describe block's assertions, giving a single source of truth.
— qwen3.7-max via Qwen Code /review
Maintainer verification report — local real-run against a live modelTL;DR. The How this was verified (real run, not a mock)
What actually happensAll runs FAIL, and telemetry shows the server connects but exposes no tools: The With the tool absent, the model just spins (calling the built-in Root cause: workspace approval gating (#4615), not model triviality
The timeline makes this conclusive for #5068:
The release run that #5068 is built on ran on the exact commit that introduced the gating, and failed with "Expected to find an Proof the PR's token mechanism is otherwise fineWhen the server is made non-gated (defined in user scope, which I also confirmed the PR's core premise empirically (no tools available):
So the assertion hardening is legitimate — it just isn't what's failing. Why the rename can't fix #5068The gating skips the server, before any tool name is known ( RecommendationThe token change is a fine improvement and can stay, but to actually turn the gate green the test must make the fixture server reachable in non-interactive mode. Options, roughly in order of preference:
Net: I'd hold this as the fix for #5068 — merging it will not make 中文版(点击展开)维护者本地真机验证报告(对接真实模型)结论速览。 把 验证方式(真机,非 mock)
实际发生了什么所有运行都失败,telemetry 显示 server 连上了但没有暴露任何工具:
工具缺失后,模型只能空转(调用内置 根因:workspace 审批门禁(#4615),不是"模型耍小聪明"
时间线让 #5068 的归因变得确凿:
#5068 依据的那次 release run 恰好跑在引入门禁的那个 commit 上,失败信息是 *"Expected to find an 证明 PR 的 token 机制本身没问题当把 server 改成不受门禁(定义在 user 作用域, 我也用实验验证了 PR 的核心前提(不给任何工具时):
所以断言加固本身是成立的——只是它不是当前失败的原因。 为什么改名修不了 #5068门禁跳过的是 server,发生在还不知道任何工具名之前( 建议token 改动本身不错,可以保留;但要真正让 gate 变绿,测试必须让 fixture server 在非交互模式下可达。按优先级:
总之:不建议把它当作 #5068 的修复合并——合了它, Verified locally on macOS via tmux: PR head |
5f33525 to
0072376
Compare
|
Updated the branch to address the real failure mode from your local run. The PR no longer only renames the fixture tool. The test now:
I also rebased on latest Validation run locally:
I also tried the full no-sandbox integration command:
That still cannot run to the MCP phase on my Windows machine because this checkout has no non-interactive auth type configured:
So the end-to-end model run is still best verified in your release/test environment, but the approval-gate part is now covered directly by the fixture setup instead of relying on the interactive dialog. |
🔁 Re-verification — head
|
| Signal | Before fix (my 1st review) | After fix (this run) |
|---|---|---|
mcp_servers_count |
1 | 1 |
mcp_tools_count |
0 (gated → skipped) | 1 (discovered) |
mcp__addition-server__get_integration_token call |
never | recorded ✓ |
Skipping MCP server pending approval |
present | absent |
So the seeded hash actually matched the runtime config hash (otherwise getState returns pending and the server stays gated) — the make-or-break detail, confirmed empirically rather than assumed.
Counter-proof — the seeding is exactly what un-gates it
I changed only the seeded hash to a bogus value and re-ran the same live-model test:
mcp_tools_count: 0
model: "I don't have the get_integration_token tool available …"
waitForToolCall → poll attempts 5,10,…,40 (never satisfied) → test FAILS
A hash mismatch sends the server back to pending (per mcpApprovals.getState: record.hash !== hashMcpServerConfig(config) ⇒ pending), reproducing the exact mcp_tools_count = 0 failure from my first review. Restoring the real hash → green again. This isolates the fix precisely.
Bonus robustness note (non-blocking)
In the gated counter-proof run the model, denied the tool, read the token straight out of mcp-server.cjs in the workspace and printed it. The test still failed — because the primary assertion is waitForToolCall('mcp__addition-server__get_integration_token'), which a source-read can’t satisfy. Good: the tool-call assertion is the real gate; the output.includes(TOKEN) check is secondary (and technically satisfiable by reading the fixture source, so it’s right that it isn’t the sole gate).
Static
prettier --check✅,eslint … --max-warnings 0exit0.- Typecheck: no type errors attributable to the test file. (
tsc -p integration-testsemits oneTS5063for the"//"doc-key inintegration-tests/tsconfig.json— pre-existing onmain, not introduced here.) - File is byte-identical to PR head after my counter-proof edits were reverted.
CI
007237623: Lint ✅, CodeQL ✅, Classify ✅; the OS Test jobs were still running at post time. Note the simple-mcp-server integration test runs in the release/integration jobs (auth-gated), which is why a local live-model run is the meaningful check here — and it passes.
🇨🇳 中文版(点击展开)
🔁 复核 — head 007237623:根因修复端到端有效 ✅
承接我的首次评审:我当时指出把 add 改名成 token 并不能修复 #5068——fixture 的 MCP server 被 #4615 的 workspace 审批门禁拦住,非交互模式下在 tools/list 之前就被跳过(mcp_tools_count = 0),工具根本不会被发现,新旧 fixture 都一样。我建议预置一条 hash 绑定的审批记录(方案 1)。这次强推正是这么做的,我在 Linux 6.12 / Node v22.22.2 上用真实模型端到端复核了(也就是 PR 测试计划本地跑不了的那一步)。
结论:新做法修复了真正的失败模式。被门禁的 workspace server 现在在非交互模式下可达,测试端到端通过。建议合并。
强推做了什么(且形态正确)
- 在 spawn CLI 之前,通过
QWEN_CODE_MCP_APPROVALS_PATH写入<testDir>/.qwen/mcpApprovals.json:{ [resolve(testDir)]: { 'addition-server': { hash: hashMcpServerConfig(config), status: 'approved' } } },并在afterAll还原。 - fixture 仍保留在 workspace 作用域 —— 因此依然走受门禁的路径(即 Release Failed for v0.18.0-nightly.20260613.44627a24b on 2026-06-13 #5068 真正的面),而不是把 server 挪到受信任作用域(那样会把门禁问题掩盖掉)。👍
- opaque token 作为唯一的模块级真值来源。
对接真实模型的端到端运行(DeepSeek,OpenAI 兼容,--yolo,QWEN_SANDBOX=false)
npx vitest run --root ./integration-tests cli/simple-mcp-server.test.ts
→ The returned token is: `qwen-mcp-tool-token-7f31d0`
→ MCP server test: Model output validated successfully.
✓ simple-mcp-server > should call an MCP tool and return its result (3.7s)
Test Files 1 passed (1) | Tests 1 passed (1)
该运行的 telemetry 证实根因已消除:
| 信号 | 修复前(我的首评) | 修复后(本次) |
|---|---|---|
mcp_servers_count |
1 | 1 |
mcp_tools_count |
0(门禁→跳过) | 1(已发现) |
mcp__addition-server__get_integration_token 调用 |
从未 | 已记录 ✓ |
Skipping MCP server pending approval |
出现 | 消失 |
即预置的 hash 确实与运行时 config 的 hash 一致(否则 getState 返回 pending、server 仍被门禁)——这个成败关键点是实测确认的,不是假设。
反证 —— 预置审批正是“解禁”的关键
我只把预置的 hash 改成一个错误值,再跑同样的真实模型测试:
mcp_tools_count: 0
模型:"I don't have the get_integration_token tool available …"
waitForToolCall → 轮询 5,10,…,40(始终不满足)→ 测试失败
hash 不匹配会让 server 回到 pending(见 mcpApprovals.getState:record.hash !== hashMcpServerConfig(config) ⇒ pending),完美复现我首评里的 mcp_tools_count = 0 失败。还原正确 hash → 重新变绿。修复因果被精确隔离。
附带的健壮性说明(不阻塞)
在反证(被门禁)那次运行里,模型拿不到工具,就直接从 workspace 的 mcp-server.cjs 源码里读出 token 并打印。测试仍然失败——因为主断言是 waitForToolCall('mcp__addition-server__get_integration_token'),读源码满足不了它。很好:工具调用断言才是真正的 gate;output.includes(TOKEN) 是次要的(且理论上能靠读 fixture 源码满足,所以它不作为唯一 gate 是对的)。
静态检查
prettier --check✅、eslint … --max-warnings 0exit0。- 类型检查:测试文件本身无类型错误。(
tsc -p integration-tests会报一条TS5063,针对integration-tests/tsconfig.json里的"//"文档键——这是main上既有的,不是本 PR 引入。) - 反证改动还原后,文件与 PR head 字节一致。
CI:007237623 —— Lint ✅、CodeQL ✅、Classify ✅;撰写时各 OS Test job 仍在运行。注意 simple-mcp-server 集成测试跑在release/集成(auth-gated)job 里,所以本地真实模型运行才是这里有意义的验证——而它通过了。
|
@qwen-code /triage |
|
Thanks for the PR! Template looks good ✓ On direction: this is squarely aligned. The release gate ( On approach: scope is tight — one file, test-only, two logically distinct improvements that both need to land together. The Moving on to code review. 🔍 中文说明感谢贡献! 模板完整 ✓ 方向:完全对齐。release gate( 方案:范围紧凑——单文件、纯测试改动、两个逻辑上需要一起落地的改进。 进入代码审查 🔍 — Qwen Code · qwen3.7-max |
Code ReviewReviewed the diff against the current Independent proposal (before reading diff): I would have done two things — (1) replace the arithmetic assertion with something the model can't short-circuit (opaque token, UUID, etc.), and (2) pre-seed a workspace approval so the #4615 gating doesn't skip the fixture server in non-interactive mode. The PR does exactly this. Comparison: The implementation matches and slightly exceeds my proposal. The No blockers found. Code is clean, idiomatic, follows project conventions (ESM imports, TestingThis integration test is auth-gated (requires Collaborator @wenshao independently verified the PR head (
CI: all 7 checks passing (Classify, CodeQL, Lint, Test on macOS/Ubuntu/Windows), 11 skipped (bot review jobs still pending). 中文说明代码审查对照当前 独立提案(读 diff 前): 我会做两件事——(1) 用模型无法绕过的值(opaque token、UUID 等)替换算术断言,(2) 预置 workspace 审批以避免 #4615 门禁在非交互模式下跳过 fixture server。PR 正好做了这两件事。 对比: 实现匹配并略优于我的提案。 未发现阻塞问题。 代码干净、符合项目规范(ESM 导入、 测试此集成测试需要 auth 凭证( 协作者 @wenshao 已独立验证 PR head(
CI:全部 7 项检查通过(Classify、CodeQL、Lint、macOS/Ubuntu/Windows 测试),11 项跳过(bot review jobs 待处理)。 — Qwen Code · qwen3.7-max |
|
Stepping back: this PR does two things, both correct, and the combination is what makes it actually fix the problem rather than just paper over it. The opaque token swap is the "obvious" improvement — stop asking the model to do something it can answer from training data. But the real value is the approval seeding, which addresses the actual root cause (the #4615 workspace-scope gating that silently skips the fixture server in non-interactive mode). Without the seeding, the token change would be irrelevant because the MCP server never gets discovered in the first place. Without the token change, the seeding would fix the immediate failure but leave a weaker assertion that could still be short-circuited by a model that happens to call the tool and gets the arithmetic right by luck. The implementation is the right shape for the problem: one file, test-only, shared constants that form a self-guarding invariant. If someone changes the fixture config without updating the approval, the hash won't match and the test fails loudly. That's better than a comment saying "keep these in sync." The counter-proof from @wenshao's live verification is the strongest evidence available — same code, same model, same environment, one hash value flipped, and the test flips from green to red. That isolates the causal mechanism. My independent proposal would have been the same two changes, implemented less elegantly (I would have hardcoded the hash string or duplicated the config object). The PR's approach is strictly better. Verdict: ship it. ✅ 中文说明退一步看:这个 PR 做了两件事,都正确,且组合起来才真正修复了问题而非仅掩盖症状。 opaque token 替换是"显而易见的"改进——不再让模型做它可以从训练数据中直接回答的事情。但真正的价值在于审批预置,它解决了实际的根因(#4615 的 workspace 作用域门禁在非交互模式下静默跳过 fixture server)。没有预置,token 改动无意义,因为 MCP server 根本不会被发现。没有 token 改动,预置虽然修复了当前的失败,但断言仍然较弱——模型可能碰巧调用了工具又凑巧算对了算术。 实现的形态正确:单文件、纯测试、共享常量形成自守护不变量。如果有人改了 fixture config 但没更新 approval,hash 不匹配测试就会大声失败。这比写一行注释说"保持同步"好得多。 @wenshao 的反证是可获得的最强证据——同样的代码、同样的模型、同样的环境,只改一个 hash 值,测试就从绿变红。因果机制被精确隔离。 我的独立提案会是同样的两处改动,但实现不会这么优雅(我可能会硬编码 hash 字符串或复制 config 对象)。PR 的方案严格更优。 结论:合并 ✅ — Qwen Code · qwen3.7-max |
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
LGTM, looks ready to ship. ✅
What this PR does
This makes the
simple-mcp-serverintegration test ask the model to fetch an opaque token from the MCP server instead of asking it to add5 + 10.The test still verifies the same integration path: the local MCP server is discovered, its tool is exposed as
mcp__addition-server__get_integration_token, the model calls that tool, and the final output contains the returned value.Why it's needed
The current release run in #5068 failed in both sandbox modes because
simple-mcp-server.test.tsnever observed the MCPaddtool call. The prompt asks the model to add two small numbers, so a model can answer15directly without calling the tool. That makes the release gate depend on model choice rather than MCP wiring.Using an opaque token keeps the assertion focused on MCP tool availability: the expected value is not inferable from the prompt, so the model has a much stronger reason to call the MCP tool.
Reviewer Test Plan
How to verify
Run the release integration jobs or the targeted command with the same
OPENAI_API_KEY,OPENAI_BASE_URL,OPENAI_MODEL, and auth configuration used by release CI:Expected result: the test records
mcp__addition-server__get_integration_tokenin telemetry and the model output containsqwen-mcp-tool-token-7f31d0.Evidence (Before & After)
Before: release run 27450768357 failed in both
Integration Tests (No Sandbox)andIntegration Tests (Docker)withExpected to find an add tool call: expected false to be truthyincli/simple-mcp-server.test.ts.After: the test no longer asks for a trivial arithmetic result. It requires the model to use an MCP tool to retrieve a token it cannot derive from the prompt.
Tested on
npm ci --no-audit --progress=false,npm run build,npm exec -- eslint integration-tests\\cli\\simple-mcp-server.test.ts,npm exec -- prettier --check integration-tests\\cli\\simple-mcp-server.test.ts,git diff --check;Environment (optional)
Local Windows shell has
OPENAI_API_KEYbut not the full release auth/model setup, so the targeted integration command stops before the patched assertion withNo auth type is selected. Release CI already supplies the required OpenAI-compatible environment for this test.Risk & Scope
Linked Issues
Fixes #5068
中文说明
What this PR does
这个 PR 把
simple-mcp-server集成测试从“让模型计算 5 + 10”改成“让模型从 MCP server 获取一个不可从 prompt 推断出来的 token”。测试覆盖的路径不变:本地 MCP server 被发现,工具暴露为
mcp__addition-server__get_integration_token,模型调用该工具,最终输出包含工具返回值。Why it's needed
#5068 对应的 release run 在无 sandbox 和 Docker 两个集成测试 job 里都失败了,失败点是
simple-mcp-server.test.ts没观察到 MCPadd工具调用。原 prompt 要求模型加两个很小的数字,模型可以直接回答15,不一定会调用工具。这样 release gate 实际上依赖模型选择,而不是只验证 MCP wiring。改成 opaque token 后,断言更聚焦:期望值不能从 prompt 推出来,模型必须通过 MCP 工具拿到结果。
Reviewer Test Plan
用 release CI 相同的
OPENAI_API_KEY、OPENAI_BASE_URL、OPENAI_MODEL和 auth 配置运行:期望结果:telemetry 里出现
mcp__addition-server__get_integration_token,模型输出包含qwen-mcp-tool-token-7f31d0。Evidence (Before & After)
Before:release run 27450768357 的两个集成测试 job 都在
cli/simple-mcp-server.test.ts失败,错误是Expected to find an add tool call: expected false to be truthy。After:测试不再要求模型给出一个可以心算的小算术结果,而是要求模型通过 MCP 工具获取无法从 prompt 推断的 token。
Risk & Scope
主要风险是 fixture 工具从加法工具变成 token 返回工具,但这是纯测试行为变更。由于本地 auth 配置不完整,我没有在本地跑通真实模型集成调用;release CI 具备对应环境。