fix(evals): update save_memory evals and simplify tool description by NTaylorMullen · Pull Request #18610 · google-gemini/gemini-cli

NTaylorMullen · 2026-02-09T07:59:54Z

Summary

This PR fixes the failing save_memory behavioral evaluations by aligning them with the tool's core mandate: storing global user context, not workspace-local facts.

Details

The original evaluations were 'testing for failure' by asking the agent to store project-specific details (DB schemas, entry points) in long-term global memory. This caused significant regressions, especially with Gemini 3 Pro, which correctly identified these as project-local facts and refused to call the tool.

Key changes:

- Eval Updates: Converted project-specific memory tests into negative evaluations that assert save_memory is NOT called.
- Term Standardization: Replaced 'project' with 'workspace' to clarify that context local to the current environment (including sub-folders) should remain local.
- Tool Description: Simplified and strengthened the save_memory description to explicitly forbid workspace-local facts while minimizing token usage.
- Environment Stability: Added standard file tools to evaluations to prevent hallucinations when models try to verify workspace context.

Verified that the evaluations pass stably on both Gemini 2.5 Pro and Gemini 3 Pro.
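The negative-evaluation idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `ToolLogEntry` shape, the `wasToolCalled` helper, and the sample log entries are all assumptions made for the example. Only the tool names and the "assert save_memory is NOT called" pattern come from the description.

```typescript
// Hypothetical shape of one tool-call log entry; the real eval rig's log format may differ.
interface ToolLogEntry {
  toolRequest: { name: string };
}

// Returns true if any logged call targeted the named tool.
function wasToolCalled(logs: ToolLogEntry[], toolName: string): boolean {
  return logs.some((log) => log.toolRequest.name === toolName);
}

// Illustrative run for a workspace-local prompt: the agent inspects files
// but does not write anything to global memory.
const workspaceRun: ToolLogEntry[] = [
  { toolRequest: { name: 'list_directory' } },
  { toolRequest: { name: 'read_file' } },
];

// Illustrative run for a global personal fact (for example, the user's birthday):
// here save_memory is expected to be called.
const birthdayRun: ToolLogEntry[] = [{ toolRequest: { name: 'save_memory' } }];

// Negative evaluation: must be false for workspace-local facts.
const savedWorkspaceFact = wasToolCalled(workspaceRun, 'save_memory');
// Positive evaluation: must be true for global user context.
const savedBirthday = wasToolCalled(birthdayRun, 'save_memory');
```

In an eval, the first result would be asserted to be `false` and the second `true`, so a model that correctly declines to store workspace facts passes rather than fails.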

Related Issues

Part of the ongoing behavioral evaluation stabilization effort.

gemini-code-assist · 2026-02-09T08:00:13Z

Summary of Changes

Hello @NTaylorMullen, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses issues with the save_memory tool's behavioral evaluations by refining its intended use. The core change ensures that the tool is exclusively used for storing global user context, such as personal preferences, and explicitly prevents its invocation for workspace-specific details like file paths or project commands. This clarification is reinforced through updated evaluation tests, standardized terminology, and a more precise tool description, leading to more stable and accurate model behavior.

Highlights

- Evaluation Updates: Refactored save_memory evaluations to correctly test that the tool is not called for workspace-specific information, addressing previous failures where models incorrectly attempted to save local facts globally.
- Tool Description Clarity: The save_memory tool's description has been significantly simplified and strengthened to explicitly state it's for global user context only, strictly forbidding the storage of workspace-local facts.
- Terminology Alignment: Standardized the use of 'workspace' instead of 'project' in evaluations to better reflect the scope of local context and avoid ambiguity.
- Evaluation Environment Enhancement: Enhanced evaluation settings by including standard file tools (list_directory, read_file, run_shell_command) to prevent model hallucinations and provide a more stable testing environment.
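As a rough illustration of the environment change in the last highlight, the sketch below adds the three file tools to a `tools.core` list. The `EvalSettings` interface and the `withFileTools` helper are hypothetical names introduced for this example. Only the `tools.core` key and the tool names are taken from the summary above.

```typescript
// Hypothetical settings shape; only the tools.core key and tool names come from the PR summary.
interface EvalSettings {
  tools: { core: string[] };
}

const FILE_TOOLS = ['list_directory', 'read_file', 'run_shell_command'];

// Returns a copy of the settings with the file tools appended to tools.core,
// skipping any tool that is already listed.
function withFileTools(settings: EvalSettings): EvalSettings {
  const core = settings.tools.core.slice();
  for (const tool of FILE_TOOLS) {
    if (core.indexOf(tool) === -1) {
      core.push(tool);
    }
  }
  return { ...settings, tools: { ...settings.tools, core } };
}

const baseSettings: EvalSettings = { tools: { core: ['save_memory'] } };
const evalSettings = withFileTools(baseSettings);
```

With these tools available, a model that wants to verify a workspace fact can actually list or read files instead of calling a tool that does not exist in the evaluation environment.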

Changelog

evals/save_memory.eval.ts
- Changed several evalTest cases from expecting save_memory to be called for "project's database schema location", "specific project test command", and "project's main entry point" to expecting it not to be called for "workspace's database schema location", "workspace build artifact location", and "workspace's main entry point".
- Updated the assert logic in these negative tests to check rig.readToolLogs().some(...) and expect(...).toBe(false).
- Added list_directory, read_file, run_shell_command to the tools.core settings for these evaluations.
- Replaced the "Agent remembers project's main entry point" test with "Agent remembers user's birthday", asserting save_memory is called for this global personal fact.
- Removed checkModelOutputContent calls from the negative evaluations.
packages/core/src/tools/memoryTool.ts
- Removed the explicit FunctionDeclaration import and memoryToolSchemaData constant, inlining the schema definition.
- Rewrote the memoryToolDescription to be much more concise and emphatic about its purpose: "Saves concise global user context (preferences, facts) for use across ALL workspaces." It now includes a "CRITICAL: GLOBAL CONTEXT ONLY" section explicitly forbidding workspace-specific context.
- Appended examples like "Always lint after building" to the description passed to the super constructor.
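For orientation, an inlined description and schema along the lines of the memoryTool.ts changes might look like the sketch below. Only the first description sentence and the "CRITICAL: GLOBAL CONTEXT ONLY" heading are quoted from the changelog. The remaining description wording, the `fact` parameter name, and the overall schema layout are assumptions for illustration and may not match the merged code.

```typescript
// Sketch only: the two quoted strings come from the changelog above;
// everything else is an illustrative assumption.
const memoryToolDescription = [
  'Saves concise global user context (preferences, facts) for use across ALL workspaces.',
  '',
  'CRITICAL: GLOBAL CONTEXT ONLY',
  // Illustrative wording, not the text merged in the PR.
  'Never save workspace-specific context such as file paths or build commands.',
].join('\n');

// Schema written inline instead of through a separate schema-data constant.
// The parameter name `fact` is an assumption.
const memoryToolSchema = {
  name: 'save_memory',
  description: memoryToolDescription,
  parameters: {
    type: 'object',
    properties: {
      fact: {
        type: 'string',
        description: 'The global fact or preference to remember.',
      },
    },
    required: ['fact'],
  },
};
```

Keeping the description short serves both goals named in the PR: the prohibition on workspace-local facts is hard to miss, and fewer tokens are spent on every request that includes the tool.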

Activity

No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

github-actions · 2026-02-09T08:04:12Z

Size Change: -1.03 kB (0%)

Total Size: 23.9 MB

Filename	Size	Change
`./bundle/gemini.js`	23.8 MB	-1.03 kB (0%)

Unchanged files

Filename	Size
`./bundle/sandbox-macos-permissive-closed.sb`	1.03 kB
`./bundle/sandbox-macos-permissive-open.sb`	890 B
`./bundle/sandbox-macos-permissive-proxied.sb`	1.31 kB
`./bundle/sandbox-macos-restrictive-closed.sb`	3.29 kB
`./bundle/sandbox-macos-restrictive-open.sb`	3.36 kB
`./bundle/sandbox-macos-restrictive-proxied.sb`	3.56 kB


gemini-code-assist

Code Review

This pull request effectively addresses the failing save_memory evaluations by correctly aligning them with the tool's intended purpose of storing global, not workspace-local, context. The simplification and strengthening of the save_memory tool description is a significant improvement that makes the instructions for the model much clearer and more forceful. The changes in the evaluation tests are also well-aligned with this new, stricter definition of the tool's function. The suggestion to further improve the formatting of the tool description prompt is valid and has been retained.

packages/core/src/tools/memoryTool.ts

- Updated behavioral evaluations to verify workspace-local memory restriction using clear examples (schema, artifacts, entry points).
- Simplified and consolidated save_memory tool description and schema to explicitly forbid workspace-specific facts while maintaining token efficiency.
- Ensured evaluations have appropriate file tools to prevent hallucinations or incorrect behavior.
- Updated MemoryTool unit test expectation to match the new description.
- Refined save_memory evaluation prompts to be more explicitly workspace-specific to reduce flakiness.
- Verified stable passage across both Gemini 2.5 and Gemini 3 models.

NTaylorMullen requested a review from a team as a code owner February 9, 2026 07:59

gemini-cli bot added the status/need-issue label (Pull requests that need to have an associated issue.) Feb 9, 2026

gemini-code-assist bot reviewed Feb 9, 2026

Reviewed changes: packages/core/src/tools/memoryTool.ts (comment resolved)

NTaylorMullen force-pushed the ntm/fix-save-memory-evals branch from 947a6ec to 38d8a6c February 9, 2026 08:49

NTaylorMullen merged commit fe70052 into main Feb 9, 2026
26 checks passed

NTaylorMullen deleted the ntm/fix-save-memory-evals branch February 9, 2026 09:06

aswinashok44 pushed a commit to aswinashok44/gemini-cli that referenced this pull request Feb 9, 2026

fix(evals): update save_memory evals and simplify tool description (g…

df30827

…oogle-gemini#18610)

This was referenced Feb 18, 2026

Changelog for v0.29.0 #19361

Merged

Changelog for v0.30.0-preview.5 #20107

Merged

kuishou68 pushed a commit to iOfficeAI/aioncli that referenced this pull request Feb 27, 2026

fix(evals): update save_memory evals and simplify tool description (g…

4df5c26

…oogle-gemini#18610)

NTaylorMullen merged 1 commit into main from ntm/fix-save-memory-evals
