fix(memory): avoid stale tool schema recall #5058
wenshao
left a comment
Thanks — the diagnosis is right, the three-layer approach (extraction prompt / model selector / heuristic fallback) is the correct shape, and the PR itself is clean and well-tested with an honest description. But while tracing the call chain I found a structural gap that changes what this PR actually does at runtime, plus a naming-convention mismatch. Details inline; summary:
Blocking
1. recentTools is never populated in production. The only production caller of recall is client.ts:1692 (memory prefetch), and it passes only config / excludedFilePaths / abortSignal, on main and on this branch. So options.recentTools ?? [] is always []: the new heuristic filter never fires, and the Recently used tools: ... context line (gated on recentTools.length > 0) never reaches the model selector either. Of the three changes, only the extraction-prompt bullet has live effect; the recall-side hardening, the headline of this PR, is currently dead code, and the tests pass because they inject recentTools explicitly. Either wire it up in client.ts (collect the tool names executed recently in this session), or reposition the PR as "extraction-side fix + recall-side groundwork" and soften Fixes #4976 to Refs #4976.
2. toolAliases splits on '::', but qwen-code registers MCP tools as mcp__<server>__<tool> (see generateValidName(`mcp__${serverName}__${serverToolName}`) in mcp-tool.ts). ata::article-list-query is the issue #4976 transcript format, not the runtime format; once wiring lands, the alias derivation won't match real names.
Strongly suggested
3. The old prompt's explicit carve-out — "DO still select memories containing warnings, gotchas, or known issues about those tools — active use is exactly when those matter" — was dropped. Combined with broad heuristic markers (mcp tool, tool usage, api docs), the highest-value memories (behavioral warnings about an active tool) get suppressed exactly when they matter most. Excluding schemas/failed transcripts and keeping behavioral warnings are compatible goals.
4. The extraction bullet's blanket "failed tool-call attempts" would also block saving confirmed workarounds for recurring failures — worth distinguishing raw transcripts/guessed schemas (don't save) from confirmed workarounds/gotchas (do save).
Positives: filtering happens before scoring so excluded docs free up top-5 slots; the model-success path and telemetry (docs_scanned stays unfiltered) are untouched; tests cover each layer; bilingual PR description with a real test plan.
}
const selectedDocs = selectRelevantAutoMemoryDocuments(query, docs, limit);
const heuristicDocs = docs.filter(
This filter never fires in production: the only production caller of recall is client.ts:1692 (the memory prefetch in sendMessageStream), and its options carry only config / excludedFilePaths / abortSignal — recentTools is not passed anywhere outside tests (this is true on main too; the parameter has had no producer since it was introduced). So options.recentTools ?? [] is always [] and isActiveToolUsageMemory returns false on its first line for every doc. The same applies to the model-selector side: the Recently used tools: ... context line is gated on recentTools.length > 0, so the strengthened system-prompt instruction is vacuous at runtime.
To actually close #4976's recall side, client.ts needs to collect the tool names executed recently in the session and pass them here. If that's intended as a follow-up, consider repositioning this PR (Fixes → Refs).
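A minimal sketch of why the filter is inert without a producer. The types and the helper filterForFallback are hypothetical stand-ins, and the predicate body is simplified; only the names recentTools, excludedFilePaths and isActiveToolUsageMemory come from the discussion above.

```typescript
// Hypothetical shapes for illustration; the real types in recall.ts may differ.
interface MemoryDoc {
  path: string;
  body: string;
}

interface RecallOptions {
  excludedFilePaths?: string[];
  recentTools?: string[]; // the field no production caller sets
}

// Simplified stand-in for the PR's predicate: it bails out when no tools are known.
function isActiveToolUsageMemory(doc: MemoryDoc, recentTools: string[]): boolean {
  if (recentTools.length === 0) return false; // always taken when nothing is passed
  const body = doc.body.toLowerCase();
  return recentTools.some((tool) => body.includes(tool.toLowerCase()));
}

// Hypothetical helper mirroring the fallback's pre-scoring filter.
function filterForFallback(docs: MemoryDoc[], options: RecallOptions): MemoryDoc[] {
  const recentTools = options.recentTools ?? [];
  return docs.filter((doc) => !isActiveToolUsageMemory(doc, recentTools));
}

const docs: MemoryDoc[] = [
  { path: 'ata-schema.md', body: 'Parameter schema for mcp__ata__article-list-query' },
  { path: 'oncall.md', body: 'Escalate ATA outages to the platform team' },
];

// Production-style call: recentTools is absent, so nothing is filtered.
const unwired = filterForFallback(docs, { excludedFilePaths: [] });

// Test-style call: recentTools is injected, so the schema doc is dropped.
const wired = filterForFallback(docs, {
  recentTools: ['mcp__ata__article-list-query'],
});
```

The two calls differ only in whether recentTools is supplied, which is the gap between the tests and the production path described in this comment.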
function toolAliases(toolName: string): string[] {
  return Array.from(
    new Set(
      [toolName, toolName.split('::').at(-1) ?? '']
The '::' separator comes from the issue #4976 transcript, not from this codebase: qwen-code registers MCP tools as mcp__<server>__<tool> (generateValidName(`mcp__${serverName}__${serverToolName}`) in mcp-tool.ts). Once recentTools is wired with real registry names like mcp__ata__article-list-query, this split is a no-op, the alias list contains only the full name, and a memory that mentions just article-list-query won't match, so the filter misses its own target scenario.
Suggest splitting on both '::' and '__', dropping the mcp prefix segment, and adding a test that uses the real naming format (that test would also have surfaced the wiring gap).
Micro-nit: .at(-1) ?? '' — split() never returns an empty array, so the ?? '' branch is dead (harmless).
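One way the suggested alias derivation could look, as a sketch rather than the PR's actual code. It assumes the tool segment itself contains no '__' or '::' and that 'mcp' only appears as the leading prefix segment; both assumptions would need checking against generateValidName.

```typescript
// Sketch: derive match aliases from either naming format.
//   'ata::article-list-query'       (issue transcript format)
//   'mcp__ata__article-list-query'  (runtime registry format)
function toolAliases(toolName: string): string[] {
  const aliases = new Set<string>([toolName]);

  // Split on either separator and ignore empty segments.
  const segments = toolName.split(/::|__/).filter((s) => s.length > 0);

  // Drop the leading 'mcp' prefix segment if present.
  const meaningful = segments[0] === 'mcp' ? segments.slice(1) : segments;

  // The last remaining segment is the bare tool name.
  const bare = meaningful[meaningful.length - 1];
  if (bare !== undefined) aliases.add(bare);

  return Array.from(aliases);
}

// toolAliases('mcp__ata__article-list-query')
//   -> ['mcp__ata__article-list-query', 'article-list-query']
// toolAliases('ata::article-list-query')
//   -> ['ata::article-list-query', 'article-list-query']
// toolAliases('read_file')
//   -> ['read_file']   (single underscore is not a separator)
```

Indexing the last segment directly also avoids the dead ?? '' branch noted above, and the Set removes the duplicate when a name has no separator.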
const MAX_DOC_BODY_CHARS = 1_200;
const debugLogger = createDebugLogger('AUTO_MEMORY_RECALL');
const ACTIVE_TOOL_USAGE_MEMORY_MARKERS = [
Two notes on the marker list:
- Singular/plural pairs are redundant under substring matching: parameter schema already matches parameter schemas, and the same holds for field mapping / tool schema. The list can shrink by about a third.
- The broad markers (mcp tool, tool usage, api docs) make this filter catch warnings-class memories too: a feedback note like "when using mcp tool X, always pass locale=zh or it silently returns empty" names the active tool and hits mcp tool, so it is filtered exactly when it's most needed. Consider skipping type: 'feedback' docs here, or narrowing the markers to schema-ish terms only (schema / mapping / guessed / failed-call), to stay consistent with the warnings carve-out discussion on the selector prompt.
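The two notes could combine roughly as follows. This is a sketch under assumptions: the MemoryDoc shape, the SCHEMA_MARKERS list and the shouldSuppress name are hypothetical, and only the 'feedback' type value and the marker phrases are taken from the comment.

```typescript
// Hypothetical document shape; the real front-matter fields may differ.
interface MemoryDoc {
  type: string; // e.g. 'feedback'; other values are not assumed here
  description: string;
  body: string;
}

// Narrowed, singular-only markers. Substring matching means
// 'parameter schema' also covers 'parameter schemas'.
const SCHEMA_MARKERS = [
  'parameter schema',
  'tool schema',
  'field mapping',
  'guessed',
  'failed call',
];

function shouldSuppress(doc: MemoryDoc, activeAliases: string[]): boolean {
  if (activeAliases.length === 0) return false;

  // Behavioral warnings and gotchas live in feedback docs: never suppress them.
  if (doc.type === 'feedback') return false;

  const text = `${doc.description}\n${doc.body}`.toLowerCase();
  const namesActiveTool = activeAliases.some((alias) =>
    text.includes(alias.toLowerCase()),
  );
  const looksLikeSchema = SCHEMA_MARKERS.some((marker) => text.includes(marker));

  // Both conditions must hold before a doc is dropped.
  return namesActiveTool && looksLikeSchema;
}
```

With these markers, a note that only says "mcp tool" no longer matches, so a warning is kept even if it was not saved with type 'feedback'.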
- If you are unsure if a memory will be useful in processing the user's query, then do not include it in your list. Be selective and discerning.
- If there are no memories in the list that would clearly be useful, feel free to return an empty list.
- If a list of recently-used tools is provided, do not select memories that are usage reference or API documentation for those tools (the assistant is already exercising them). DO still select memories containing warnings, gotchas, or known issues about those tools — active use is exactly when those matter.`;
- If a list of recently-used tools is provided, do not select memories that are usage reference, API documentation, parameter schemas, field mappings, guessed call formats, or failed-call transcripts for those tools. Live tool definitions are the source of truth. Only select a tool-related memory if it contains durable operational context that cannot be obtained from the live schema, such as credentials location, ownership, or an external escalation path.`;
The old wording carried an explicit carve-out: "DO still select memories containing warnings, gotchas, or known issues about those tools — active use is exactly when those matter." The new whitelist (credentials location / ownership / escalation path) doesn't obviously cover it, and the preceding exclusion list could plausibly classify a parameter gotcha ("date param must be UTC or it silently returns empty") as schema-ish — suppressing the highest-value memory class right when it matters most.
The two goals are compatible: exclude schemas / guessed formats / failed-call transcripts, AND keep behavioral warnings. Suggest re-adding the carve-out sentence alongside the new exclusions.
(Also note: this instruction is currently unreachable at runtime — the Recently used tools: line never appears because no production caller passes recentTools; see the recall.ts comment.)
'- Code patterns, conventions, architecture, file paths, or project structure — these can be derived by reading the current project state.',
'- Git history, recent changes, or who-changed-what — `git log` / `git blame` are authoritative.',
'- Debugging solutions or fix recipes — the fix is in the code; the commit message has the context.',
'- MCP tool names, parameter schemas, field mappings, guessed tool-call formats, or failed tool-call attempts — live tool definitions are authoritative and may change.',
The blanket "failed tool-call attempts" is a bit over-broad: a recurring failure with its confirmed workaround is exactly the kind of durable feedback memory worth saving ("tool X 500s when page>10 — paginate with cursor instead"). Raw failure transcripts and guessed schemas, agreed, should never be saved. Suggest wording that distinguishes the two, e.g. "...or raw failed-call transcripts — live tool definitions are authoritative. (Confirmed workarounds or gotchas about a tool's behavior are still worth saving.)"
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
03eaab4 to e02d6ae
Thanks for the careful trace. I pushed an update that addresses the runtime gap rather than just changing the tests:
Validation:
private lastSentIdeContext: IdeContext | undefined;
private forceFullIdeContext = true;
private recentCompletedToolNames: string[] = [];
private pendingMemoryPrefetch: MemoryPrefetchHandle | undefined;
[Suggestion] resetChat() (line ~628) does not clear recentCompletedToolNames. Every other piece of per-conversation state is explicitly reset there (surfacedRelevantAutoMemoryPaths, cachedGitStatus, lastApiCompletionTimestamp, file read cache, deferred tools, etc.), but this new field was omitted.
After a /clear, stale tool names from the previous conversation persist and continue to drive the isActiveToolUsageMemory filter — ephemeral tool-schema memories for tools used in the old conversation will be incorrectly suppressed in the new one.
| private pendingMemoryPrefetch: MemoryPrefetchHandle | undefined; | |
| private recentCompletedToolNames: string[] = []; | |
| // NOTE: also reset in resetChat() |
— qwen3.7-max via Qwen Code /review
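A minimal sketch of the lifecycle this suggestion asks for, isolated into a small class so it can run on its own. RecentToolTracker, recordCompleted, snapshot and the cap MAX_RECENT_TOOLS are hypothetical names; only recentCompletedToolNames and resetChat() come from the comment, and the real GeminiClient keeps this state inline.

```typescript
// Hypothetical cap on how many tool names are remembered.
const MAX_RECENT_TOOLS = 8;

class RecentToolTracker {
  private recentCompletedToolNames: string[] = [];

  // Record a completed tool call: dedupe, keep most recent last, cap the length.
  recordCompleted(toolName: string): void {
    this.recentCompletedToolNames = [
      ...this.recentCompletedToolNames.filter((n) => n !== toolName),
      toolName,
    ].slice(-MAX_RECENT_TOOLS);
  }

  // Copy handed to recall so callers cannot mutate internal state.
  snapshot(): string[] {
    return [...this.recentCompletedToolNames];
  }

  // Per-conversation state must not survive /clear.
  resetChat(): void {
    this.recentCompletedToolNames = [];
  }
}

const tracker = new RecentToolTracker();
tracker.recordCompleted('mcp__ata__article-list-query');
tracker.recordCompleted('read_file');
tracker.recordCompleted('mcp__ata__article-list-query');
// tracker.snapshot() -> ['read_file', 'mcp__ata__article-list-query']
tracker.resetChat();
// tracker.snapshot() -> []
```

Clearing in resetChat() keeps the active-tool filter from acting on names that belong to the previous conversation.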
@@ -286,7 +288,10 @@ export class GeminiClient {
// Check if we're resuming from a previous session
const resumedSessionData = this.config.getResumedSessionData();
[Suggestion] When a session is resumed (resumedSessionData is truthy), the code replays UI telemetry and rebuilds API history but never extracts tool-call names to seed recentCompletedToolNames. The field stays at [] until the first new tool call completes, so the active-tool filter is inert during the resumed session's first query — which is the query most likely to benefit from memory recall.
Consider walking resumedSessionData.conversation.messages after buildApiHistoryFromConversation to extract the last N assistant functionCall.name values and assign them to this.recentCompletedToolNames.
— qwen3.7-max via Qwen Code /review
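A sketch of the seeding step, assuming a simplified message shape. HistoryPart, HistoryMessage and seedRecentToolNames are hypothetical; the functionCall / functionResponse part names follow the Gemini-style history mentioned above, but the actual fields on resumedSessionData.conversation.messages are not shown in this thread. The sketch counts a tool as completed only when a response matches an earlier call, which is stricter than taking the last N functionCall.name values.

```typescript
// Hypothetical, simplified history shapes.
interface HistoryPart {
  functionCall?: { name: string };
  functionResponse?: { name: string };
}

interface HistoryMessage {
  role: 'user' | 'model';
  parts: HistoryPart[];
}

// Return up to `limit` tool names whose calls have a matching response,
// ordered oldest to most recent, without duplicates.
function seedRecentToolNames(messages: HistoryMessage[], limit: number): string[] {
  const outstanding = new Map<string, number>(); // tool name -> unanswered calls
  const completed: string[] = [];

  for (const message of messages) {
    for (const part of message.parts) {
      if (part.functionCall) {
        const toolName = part.functionCall.name;
        outstanding.set(toolName, (outstanding.get(toolName) ?? 0) + 1);
      } else if (part.functionResponse) {
        const toolName = part.functionResponse.name;
        const open = outstanding.get(toolName) ?? 0;
        if (open === 0) continue; // response without a recorded call: ignore
        outstanding.set(toolName, open - 1);
        const existing = completed.indexOf(toolName);
        if (existing !== -1) completed.splice(existing, 1);
        completed.push(toolName); // most recent completion goes last
      }
    }
  }

  // slice(-0) would return everything, so handle non-positive limits explicitly.
  return limit > 0 ? completed.slice(-limit) : [];
}
```

The result would be assigned to recentCompletedToolNames after the API history is rebuilt, so the first query of a resumed session already carries tool context.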
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Thanks for the follow-up review. I pushed one more update for both points:
Validation:
qqqys
left a comment
Critical follow-up rechecked: the production recent-tool context is now wired into recall, MCP tool aliases handle the runtime mcp__server__tool shape, resetChat clears stale tool context, and resumed sessions seed completed tool names from matched call/response history. I did not find any remaining critical issue in this pass.
Local runtime verification report (Linux, real build + real MCP server)
I built this branch (
Rig. A local stdio MCP server registered as
The relevance selector only ever sees doc headers, so a sentinel can reach the main model only via actual recall injection: "model echoes the token" is injection proof (probes forbade file reads; transcripts were checked for stray reads, and one contaminated probe was discarded). For per-turn ground truth I added log-only instrumentation to the compiled
Verified working (PR build, vs merge-base control)
1. The production wiring gap is genuinely fixed. After a real MCP call in turn 1, the next turn's prefetch carries runtime tool names into the selector prompt:
Merge-base control, same scenario:
2. The deterministic fallback layer works end-to-end. With the side query deliberately failing fast (workspace
Probe answer:
3.
4. Extraction guard suppresses the harmful save class. Same mixed prompt in fresh workspaces ("remember: I guessed the call format {topic: string} for the ata tool and the call failed with a type error; also the on-call owner is the platform team in #ata-ops"):
5. Focused unit tests: all 4 touched test files pass in the worktree (212/212). Core + CLI build clean.
Findings to weigh (none are regressions; the PR beats base in every scenario I ran)
F1. On the happy path the protection is advisory, and glm-4.7 often ignores it. With the tools section present and a doc literally described as "Parameter schema and usage reference for the <active tool>", the model selector still selected the schema doc in 3 of 4 instrumented probes (temperature 0; the probe asks how to use the tool, so query-relevance and the exclusion rule conflict, which is representative of the #4976 scenario). The deterministic filter only runs when the side query fails. If we want the guarantee on the healthy path too,
F2. Heuristic markers are English-only; Chinese memories pass through. For zh-locale users the extraction agent writes Chinese descriptions/bodies (live demo: merge-base extraction produced
F3. (Pre-existing, for awareness) A slow-failing side model skips the fallback entirely. With a misconfigured provider that fails slowly (real 502s × 5 retries ≈ 5.5 s), the prefetch is aborted when the first stream completes, the
Verdict
The three fixes from the earlier review rounds (production wiring, runtime alias shape, resetChat/resume lifecycle) are all real and verified live against the merge-base. The deterministic fallback works end-to-end, and the extraction guard measurably stops the harmful save class. F1/F2 bound the protection but never make anything worse than base, so they are fine as follow-up issues rather than blockers. LGTM from the runtime-behavior side.
@qwen-code /triage
Thanks for the PR, @he-yufeng!
Template looks good ✓: all required sections present, bilingual, linked to #4976.
On direction: this is a clear win. Issue #4976 shows a textbook case of auto-memory reinforcing a guessed tool-call pattern across sessions, turning a one-time detour into a persistent feedback loop. Making memory more conservative about live tool definitions is exactly the right fix: tool schemas should come from the live registry, not from stale notes. No CHANGELOG reference in other agents, but the area is core to qwen-code's memory system.
On approach: the three-layer defense (extraction guard → model selector prompt → deterministic heuristic fallback) is well-scoped and each layer covers a distinct failure mode. I considered whether a single post-filter on recall output would suffice, but the extraction guard prevents the bad saves in the first place, which the post-filter alone can't do. The scope feels right: it constrains memory behavior without touching MCP discovery or tool loading.
Moving on to code review. 🔍
— Qwen Code · qwen3.7-max
Code Review
The implementation is clean and well-structured. Three layers of defense, each independently useful:
No critical blockers. No AGENTS.md violations. Code follows project conventions (ESM, strict TS, collocated tests).
Known limitations (all from wenshao's runtime verification, none are regressions):
Tests
Unit tests for
Typecheck errors in the worktree are all in files not touched by this PR (
Smoke Test (main working tree,
Reflection
Going back to my independent proposal before reading the diff: I'd have done essentially the same thing: add a "don't save tool schemas" rule to the extraction prompt, pass recent tool names to the relevance selector with a skip instruction, and add a deterministic post-filter on the heuristic fallback. The PR matches this and adds the lifecycle wiring (resume seeding, resetChat clearing) that I would have missed in a first pass. That's a sign the author thought through the full lifecycle, not just the happy path.
The problem is real: #4976 is a vivid 14-turn detour where guessed tool patterns got reinforced by memory. The fix is narrowly scoped: it touches exactly the three points where stale tool details can enter or persist in memory (extraction, model selection, heuristic fallback). No scope creep, no speculative abstractions.
wenshao's runtime verification is the strongest evidence I've seen on a community PR: sentinel tokens, merge-base A/B comparison, deterministic assertions against compiled output, all three layers verified independently. The three findings (F1 advisory happy-path, F2 English-only markers, F3 slow-fail gap) are valid follow-ups that don't make anything worse than the current baseline.
The code is straightforward, the tests are focused, and the implementation follows project conventions. If I had to maintain this in six months, I'd thank the author for the clear layering and the conservative filter logic (three conditions must all hold, with a durable whitelist override).
Verdict: LGTM. Approving. ✅
— Qwen Code · qwen3.7-max
qwen-code-ci-bot
left a comment
LGTM, looks ready to ship. ✅
* fix(memory): avoid stale tool schema recall
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
* fix(memory): seed resumed tool recall context
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
---------
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
What this PR does
This makes managed auto-memory more conservative around MCP and other live tool definitions. It stops auto-memory from saving MCP tool names, parameter schemas, field mappings, guessed call formats, or failed tool-call attempts, tightens model-based recall so active tool schemas are not selected while the tool is in use, and applies the same protection to the heuristic recall fallback.
Why it's needed
Issue #4976 describes a long tool-call loop where stale or guessed tool usage could become persistent context and influence later calls. Live tool definitions are a better source of truth than memory for names, parameter shapes, and call formats. Keeping those details out of extraction and recall reduces the chance that old failed attempts get injected near a future tool call.
Reviewer Test Plan
How to verify
Run the focused memory tests and checks below. The new tests cover the extraction prompt guard, model-selector prompt guidance, and heuristic fallback filtering for recently used tools.
Evidence (Before & After)
N/A for UI. Before this change, the heuristic fallback ignored recentTools, so a memory document about ata::article-list-query parameter schemas or failed calls could still be selected when the model selector failed. After this change, those active-tool schema memories are filtered while durable operational context, such as an escalation path, can still be recalled.
Tested on
Environment (optional)
Node.js from the local repository environment after npm ci.
Commands run:
npm run test --workspace=@qwen-code/qwen-code-core -- src/memory/prompt.test.ts src/memory/relevanceSelector.test.ts src/memory/recall.test.ts
npx eslint packages/core/src/memory/prompt.ts packages/core/src/memory/relevanceSelector.ts packages/core/src/memory/recall.ts packages/core/src/memory/prompt.test.ts packages/core/src/memory/relevanceSelector.test.ts packages/core/src/memory/recall.test.ts --max-warnings 0
npx prettier --check packages/core/src/memory/prompt.ts packages/core/src/memory/relevanceSelector.ts packages/core/src/memory/recall.ts packages/core/src/memory/prompt.test.ts packages/core/src/memory/relevanceSelector.test.ts packages/core/src/memory/recall.test.ts
npm run typecheck --workspace=@qwen-code/qwen-code-core
git diff --check
Risk & Scope
Linked Issues
Fixes #4976
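Chinese description
What this PR does
This PR makes managed auto-memory more conservative around MCP and other live tool definitions: it no longer saves MCP tool names, parameter schemas, field mappings, guessed call formats, or failed tool-call records; it strengthens the model selector prompt so that stale schema memories for a tool are not recalled while that tool is in use; and it adds the same filtering to the heuristic fallback.
Why it's needed
#4976 describes a very long tool-call detour. Part of the problem is that old, guessed tool usage can be saved into memory and then injected near tool-call context in later sessions. Tool names, parameter shapes, and call formats should follow the current live tool definitions, not old memory. This change lowers the chance that a wrong call gets amplified across sessions.
Verification
Locally on Windows I ran the memory-related focused tests, ESLint, Prettier check, core typecheck, and git diff --check. The new tests cover the extraction prompt, the model selector prompt, and the heuristic fallback's filtering of active-tool schema/failed-call memories after the model selector fails.
Scope and risk
This PR does not rework the MCP tool discovery flow and does not implement schema-first tool loading; it only fixes the memory side so that memory does not reinforce outdated tool-call details. The fallback filter is conditional: a document must match both a currently active tool name and a schema/parameter/failed-call marker; durable operational context such as owner or escalation path can still be recalled.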