Commit e317ed5

Merge remote-tracking branch 'upstream/main' into fix/url-trailing-punctuation
2 parents 6562494 + da66c7c commit e317ed5

30 files changed

Lines changed: 1395 additions & 559 deletions

docs/cli/index.md

Lines changed: 2 additions & 0 deletions
@@ -23,6 +23,8 @@ overview of Gemini CLI, see the [main documentation page](../index.md).
 
 ## Advanced features
 
+- **[Plan mode (experimental)](./plan-mode.md):** Use a safe, read-only mode for
+  planning complex changes.
 - **[Checkpointing](./checkpointing.md):** Automatically save and restore
   snapshots of your session and files.
 - **[Enterprise configuration](./enterprise.md):** Deploy and manage Gemini CLI

docs/cli/plan-mode.md

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
+# Plan Mode (experimental) <!-- omit in toc -->
+
+Plan Mode is a safe, read-only mode for researching and designing complex
+changes. It prevents modifications while you research, design and plan an
+implementation strategy.
+
+> **Note: Plan Mode is currently an experimental feature.**
+>
+> Experimental features are subject to change. To use Plan Mode, enable it via
+> `/settings` (search for `Plan`) or add the following to your `settings.json`:
+>
+> ```json
+> {
+>   "experimental": {
+>     "plan": true
+>   }
+> }
+> ```
+>
+> Your feedback is invaluable as we refine this feature. If you have ideas,
+> suggestions, or encounter issues:
+>
+> - Use the `/bug` command within the CLI to file an issue.
+> - [Open an issue](https://github.com/google-gemini/gemini-cli/issues) on
+>   GitHub.
+
+- [Starting in Plan Mode](#starting-in-plan-mode)
+- [How to use Plan Mode](#how-to-use-plan-mode)
+  - [Entering Plan Mode](#entering-plan-mode)
+  - [The Planning Workflow](#the-planning-workflow)
+  - [Exiting Plan Mode](#exiting-plan-mode)
+- [Tool Restrictions](#tool-restrictions)
+
+## Starting in Plan Mode
+
+You can configure Gemini CLI to start directly in Plan Mode by default:
+
+1. Type `/settings` in the CLI.
+2. Search for `Approval Mode`.
+3. Set the value to `Plan`.
+
+Other ways to start in Plan Mode:
+
+- **CLI Flag:** `gemini --approval-mode=plan`
+- **Manual Settings:** Manually update your `settings.json`:
+
+  ```json
+  {
+    "tools": {
+      "approvalMode": "plan"
+    }
+  }
+  ```
+
+## How to use Plan Mode
+
+### Entering Plan Mode
+
+You can enter Plan Mode in three ways:
+
+1. **Keyboard Shortcut:** Press `Shift+Tab` to cycle through approval modes
+   (`Default` -> `Plan` -> `Auto-Edit`).
+2. **Command:** Type `/plan` in the input box.
+3. **Natural Language:** Ask the agent to "start a plan for...".
+
+### The Planning Workflow
+
+1. **Requirements:** The agent clarifies goals using `ask_user`.
+2. **Exploration:** The agent uses read-only tools (like [`read_file`]) to map
+   the codebase and validate assumptions.
+3. **Planning:** A detailed plan is written to a temporary Markdown file.
+4. **Review:** You review the plan.
+   - **Approve:** Exit Plan Mode and start implementation (switching to
+     Auto-Edit or Default approval mode).
+   - **Iterate:** Provide feedback to refine the plan.
+
+### Exiting Plan Mode
+
+To exit Plan Mode:
+
+1. **Keyboard Shortcut:** Press `Shift+Tab` to cycle to the desired mode.
+1. **Tool:** The agent calls the `exit_plan_mode` tool to present the finalized
+   plan for your approval.
+
+## Tool Restrictions
+
+Plan Mode enforces strict safety policies to prevent accidental changes.
+
+These are the only allowed tools:
+
+- **FileSystem (Read):** [`read_file`], [`list_directory`], [`glob`]
+- **Search:** [`grep_search`], [`google_web_search`]
+- **Interaction:** `ask_user`
+- **MCP Tools (Read):** Read-only [MCP tools] (e.g., `github_read_issue`,
+  `postgres_read_schema`) are allowed.
+- **Planning (Write):** [`write_file`] and [`replace`] ONLY allowed for `.md`
+  files in the `~/.gemini/tmp/<project>/plans/` directory.
+
+[`list_directory`]: ../tools/file-system.md#1-list_directory-readfolder
+[`read_file`]: ../tools/file-system.md#2-read_file-readfile
+[`grep_search`]: ../tools/file-system.md#5-grep_search-searchtext
+[`write_file`]: ../tools/file-system.md#3-write_file-writefile
+[`glob`]: ../tools/file-system.md#4-glob-findfiles
+[`google_web_search`]: ../tools/web-search.md
+[`replace`]: ../tools/file-system.md#6-replace-edit
+[MCP tools]: ../tools/mcp-server.md
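
Reviewer note: the "Tool Restrictions" section in `plan-mode.md` describes an allowlist plus a narrow write exception. The following is a minimal sketch of how such a check could look, not the actual Gemini CLI policy code. The function name `isAllowedInPlanMode`, the two sets, and the `plansDir` parameter are all hypothetical, and read-only MCP tools are left out because the doc does not say how they are identified.

```typescript
import * as path from 'node:path';

// Hypothetical allowlist mirroring the built-in tools named in the Plan Mode doc.
const READ_ONLY_TOOLS = new Set([
  'read_file',
  'list_directory',
  'glob',
  'grep_search',
  'google_web_search',
  'ask_user',
]);

// Tools the doc allows only for `.md` files inside the plans directory.
const PLAN_WRITE_TOOLS = new Set(['write_file', 'replace']);

// Hypothetical helper: returns true when a tool call would be permitted.
function isAllowedInPlanMode(
  tool: string,
  plansDir: string,
  filePath?: string,
): boolean {
  if (READ_ONLY_TOOLS.has(tool)) return true;
  if (!PLAN_WRITE_TOOLS.has(tool) || filePath === undefined) return false;

  // Resolve both paths so `..` segments cannot escape the plans directory.
  const rel = path.relative(path.resolve(plansDir), path.resolve(filePath));
  const inside =
    rel !== '' &&
    rel !== '..' &&
    !rel.startsWith('..' + path.sep) &&
    !path.isAbsolute(rel);
  return inside && path.extname(filePath) === '.md';
}
```

Resolving the path before comparing is the design choice worth keeping in any real implementation: a plain prefix check on the raw string would accept `plans/../other.md`.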

docs/sidebar.json

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@
   { "label": "Project context (GEMINI.md)", "slug": "docs/cli/gemini-md" },
   { "label": "Shell commands", "slug": "docs/tools/shell" },
   { "label": "Session management", "slug": "docs/cli/session-management" },
+  { "label": "Plan mode (experimental)", "slug": "docs/cli/plan-mode" },
   { "label": "Todos", "slug": "docs/tools/todos" },
   { "label": "Web search and fetch", "slug": "docs/tools/web-search" }
 ]

evals/save_memory.eval.ts

Lines changed: 80 additions & 37 deletions
@@ -109,7 +109,7 @@ describe('save_memory', () => {
     params: {
       settings: { tools: { core: ['save_memory'] } },
     },
-    prompt: `My dog's name is Buddy. What is my dog's name?`,
+    prompt: `Please remember that my dog's name is Buddy.`,
     assert: async (rig, result) => {
       const wasToolCalled = await rig.waitForToolCall('save_memory');
       expect(wasToolCalled, 'Expected save_memory tool to be called').toBe(
@@ -145,25 +145,34 @@ describe('save_memory', () => {
     },
   });
 
-  const rememberingDbSchemaLocation =
-    "Agent remembers project's database schema location";
+  const ignoringDbSchemaLocation =
+    "Agent ignores workspace's database schema location";
   evalTest('ALWAYS_PASSES', {
-    name: rememberingDbSchemaLocation,
+    name: ignoringDbSchemaLocation,
     params: {
-      settings: { tools: { core: ['save_memory'] } },
+      settings: {
+        tools: {
+          core: [
+            'save_memory',
+            'list_directory',
+            'read_file',
+            'run_shell_command',
+          ],
+        },
+      },
     },
-    prompt: `The database schema for this project is located in \`db/schema.sql\`.`,
+    prompt: `The database schema for this workspace is located in \`db/schema.sql\`.`,
     assert: async (rig, result) => {
-      const wasToolCalled = await rig.waitForToolCall('save_memory');
-      expect(wasToolCalled, 'Expected save_memory tool to be called').toBe(
-        true,
-      );
+      await rig.waitForTelemetryReady();
+      const wasToolCalled = rig
+        .readToolLogs()
+        .some((log) => log.toolRequest.name === 'save_memory');
+      expect(
+        wasToolCalled,
+        'save_memory should not be called for workspace-specific information',
+      ).toBe(false);
 
       assertModelHasOutput(result);
-      checkModelOutputContent(result, {
-        expectedContent: [/database schema|ok|remember|will do/i],
-        testName: `${TEST_PREFIX}${rememberingDbSchemaLocation}`,
-      });
     },
   });
 
@@ -189,38 +198,74 @@ describe('save_memory', () => {
     },
   });
 
-  const rememberingTestCommand =
-    'Agent remembers specific project test command';
+  const ignoringBuildArtifactLocation =
+    'Agent ignores workspace build artifact location';
   evalTest('ALWAYS_PASSES', {
-    name: rememberingTestCommand,
+    name: ignoringBuildArtifactLocation,
     params: {
-      settings: { tools: { core: ['save_memory'] } },
+      settings: {
+        tools: {
+          core: [
+            'save_memory',
+            'list_directory',
+            'read_file',
+            'run_shell_command',
+          ],
+        },
+      },
     },
-    prompt: `The command to run all backend tests is \`npm run test:backend\`.`,
+    prompt: `In this workspace, build artifacts are stored in the \`dist/artifacts\` directory.`,
     assert: async (rig, result) => {
-      const wasToolCalled = await rig.waitForToolCall('save_memory');
-      expect(wasToolCalled, 'Expected save_memory tool to be called').toBe(
-        true,
-      );
+      await rig.waitForTelemetryReady();
+      const wasToolCalled = rig
+        .readToolLogs()
+        .some((log) => log.toolRequest.name === 'save_memory');
+      expect(
+        wasToolCalled,
+        'save_memory should not be called for workspace-specific information',
+      ).toBe(false);
+
+      assertModelHasOutput(result);
+    },
+  });
+
+  const ignoringMainEntryPoint = "Agent ignores workspace's main entry point";
+  evalTest('ALWAYS_PASSES', {
+    name: ignoringMainEntryPoint,
+    params: {
+      settings: {
+        tools: {
+          core: [
+            'save_memory',
+            'list_directory',
+            'read_file',
+            'run_shell_command',
+          ],
+        },
+      },
+    },
+    prompt: `The main entry point for this workspace is \`src/index.js\`.`,
+    assert: async (rig, result) => {
+      await rig.waitForTelemetryReady();
+      const wasToolCalled = rig
+        .readToolLogs()
+        .some((log) => log.toolRequest.name === 'save_memory');
+      expect(
+        wasToolCalled,
+        'save_memory should not be called for workspace-specific information',
+      ).toBe(false);
 
       assertModelHasOutput(result);
-      checkModelOutputContent(result, {
-        expectedContent: [
-          /command to run all backend tests|ok|remember|will do/i,
-        ],
-        testName: `${TEST_PREFIX}${rememberingTestCommand}`,
-      });
     },
   });
 
-  const rememberingMainEntryPoint =
-    "Agent remembers project's main entry point";
+  const rememberingBirthday = "Agent remembers user's birthday";
   evalTest('ALWAYS_PASSES', {
-    name: rememberingMainEntryPoint,
+    name: rememberingBirthday,
     params: {
       settings: { tools: { core: ['save_memory'] } },
     },
-    prompt: `The main entry point for this project is \`src/index.js\`.`,
+    prompt: `My birthday is on June 15th.`,
     assert: async (rig, result) => {
       const wasToolCalled = await rig.waitForToolCall('save_memory');
       expect(wasToolCalled, 'Expected save_memory tool to be called').toBe(
@@ -229,10 +274,8 @@ describe('save_memory', () => {
 
       assertModelHasOutput(result);
       checkModelOutputContent(result, {
-        expectedContent: [
-          /main entry point for this project|ok|remember|will do/i,
-        ],
-        testName: `${TEST_PREFIX}${rememberingMainEntryPoint}`,
+        expectedContent: [/June 15th|ok|remember|will do/i],
+        testName: `${TEST_PREFIX}${rememberingBirthday}`,
       });
     },
   });
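
Reviewer note: the rewritten evals in `save_memory.eval.ts` flip from "the tool must be called" to "the tool must not be called" for workspace-specific facts, and they check this by scanning the recorded tool logs. A minimal sketch of that negative check in isolation, assuming a log entry shape with only the `toolRequest.name` field the eval reads. `ToolLogEntry`, `wasToolCalled`, and the sample arrays are hypothetical, not part of the test rig.

```typescript
// Hypothetical shape of one recorded tool call; only the field the eval inspects.
interface ToolLogEntry {
  toolRequest: { name: string };
}

// Returns true if any recorded call targeted the named tool.
function wasToolCalled(logs: ToolLogEntry[], toolName: string): boolean {
  return logs.some((log) => log.toolRequest.name === toolName);
}

// Illustrative data: a workspace fact that only triggered read-only exploration.
const workspaceFactLogs: ToolLogEntry[] = [
  { toolRequest: { name: 'list_directory' } },
  { toolRequest: { name: 'read_file' } },
];

// Illustrative data: a personal fact that was saved to memory.
const personalFactLogs: ToolLogEntry[] = [
  { toolRequest: { name: 'save_memory' } },
];

const savedWorkspaceFact = wasToolCalled(workspaceFactLogs, 'save_memory');
const savedPersonalFact = wasToolCalled(personalFactLogs, 'save_memory');
```

An absence check like this is only meaningful once the logs are complete, which is presumably why the evals call `rig.waitForTelemetryReady()` before reading them.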

evals/test-helper.ts

Lines changed: 2 additions & 2 deletions
@@ -49,7 +49,7 @@ export function evalTest(policy: EvalPolicy, evalCase: EvalCase) {
     // bootstrap test projects.
     const rootNodeModules = path.join(process.cwd(), 'node_modules');
     const testNodeModules = path.join(rig.testDir || '', 'node_modules');
-    if (fs.existsSync(rootNodeModules)) {
+    if (fs.existsSync(rootNodeModules) && !fs.existsSync(testNodeModules)) {
       fs.symlinkSync(rootNodeModules, testNodeModules, 'dir');
     }
 
@@ -162,7 +162,7 @@ export function evalTest(policy: EvalPolicy, evalCase: EvalCase) {
   if (policy === 'USUALLY_PASSES' && !process.env['RUN_EVALS']) {
     it.skip(evalCase.name, fn);
   } else {
-    it(evalCase.name, fn);
+    it(evalCase.name, fn, evalCase.timeout);
   }
 }
 
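
Reviewer note: the first hunk in `test-helper.ts` guards `fs.symlinkSync` so that it no longer throws `EEXIST` when the link is already present. A minimal sketch of that guard as a standalone helper, exercised against a temporary directory. The name `linkNodeModules` and the directory names are hypothetical; only the two `fs` calls come from the diff.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical wrapper around the guarded symlink from the hunk.
// Returns true when a link was created, false when it was skipped.
function linkNodeModules(
  rootNodeModules: string,
  testNodeModules: string,
): boolean {
  if (fs.existsSync(rootNodeModules) && !fs.existsSync(testNodeModules)) {
    fs.symlinkSync(rootNodeModules, testNodeModules, 'dir');
    return true;
  }
  return false;
}

// Exercise the helper twice in a scratch directory.
const scratch = fs.mkdtempSync(path.join(os.tmpdir(), 'eval-link-'));
const rootDir = path.join(scratch, 'root_modules');
const linkPath = path.join(scratch, 'test_modules');
fs.mkdirSync(rootDir);

const createdFirst = linkNodeModules(rootDir, linkPath); // link is created
const createdSecond = linkNodeModules(rootDir, linkPath); // skipped, no EEXIST
const isLink = fs.lstatSync(linkPath).isSymbolicLink();

fs.rmSync(scratch, { recursive: true, force: true });
```

One caveat: `fs.existsSync` follows symlinks, so a dangling link at the target path reports `false` and `symlinkSync` would still throw. If that case matters, `fs.lstatSync` inside a `try` block is a stricter check.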

evals/validation_fidelity.eval.ts

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+/**
+ * @license
+ * Copyright 2026 Google LLC
+ * SPDX-License-Identifier: Apache-2.0
+ */
+
+import { describe, expect } from 'vitest';
+import { evalTest } from './test-helper.js';
+
+describe('validation_fidelity', () => {
+  evalTest('ALWAYS_PASSES', {
+    name: 'should perform exhaustive validation autonomously when guided by system instructions',
+    files: {
+      'src/types.ts': `
+export interface LogEntry {
+  level: 'info' | 'warn' | 'error';
+  message: string;
+}
+`,
+      'src/logger.ts': `
+import { LogEntry } from './types.js';
+
+export function formatLog(entry: LogEntry): string {
+  return \`[\${entry.level.toUpperCase()}] \${entry.message}\`;
+}
+`,
+      'src/logger.test.ts': `
+import { expect, test } from 'vitest';
+import { formatLog } from './logger.js';
+import { LogEntry } from './types.js';
+
+test('formats log correctly', () => {
+  const entry: LogEntry = { level: 'info', message: 'test message' };
+  expect(formatLog(entry)).toBe('[INFO] test message');
+});
+`,
+      'package.json': JSON.stringify({
+        name: 'test-project',
+        type: 'module',
+        scripts: {
+          test: 'vitest run',
+          build: 'tsc --noEmit',
+        },
+      }),
+      'tsconfig.json': JSON.stringify({
+        compilerOptions: {
+          target: 'ESNext',
+          module: 'ESNext',
+          moduleResolution: 'node',
+          strict: true,
+          esModuleInterop: true,
+          skipLibCheck: true,
+          forceConsistentCasingInFileNames: true,
+        },
+      }),
+    },
+    prompt:
+      "Refactor the 'LogEntry' interface in 'src/types.ts' to rename the 'message' field to 'payload'.",
+    timeout: 600000,
+    assert: async (rig) => {
+      // The goal of this eval is to see if the agent realizes it needs to update usages
+      // AND run 'npm run build' or 'tsc' autonomously to ensure project-wide structural integrity.
+
+      const toolLogs = rig.readToolLogs();
+      const shellCalls = toolLogs.filter(
+        (log) => log.toolRequest.name === 'run_shell_command',
+      );
+
+      const hasBuildOrTsc = shellCalls.some((log) => {
+        const cmd = JSON.parse(log.toolRequest.args).command.toLowerCase();
+        return (
+          cmd.includes('npm run build') ||
+          cmd.includes('tsc') ||
+          cmd.includes('typecheck') ||
+          cmd.includes('npm run verify')
+        );
+      });
+
+      expect(
+        hasBuildOrTsc,
+        'Expected the agent to autonomously run a build or type-check command to verify the refactoring',
+      ).toBe(true);
+    },
+  });
+});
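
Reviewer note: the assertion in `validation_fidelity.eval.ts` uses substring matching (`cmd.includes('tsc')`) and an unguarded `JSON.parse`. The following sketch shows one possible tightening, offered as a suggestion rather than a description of the committed code. `ShellToolLog`, `VERIFY_PATTERN`, and `ranVerification` are hypothetical names, and the log shape is assumed from the fields the eval reads.

```typescript
// Hypothetical shape of a tool-log entry; `args` is a JSON string, as in the eval.
interface ShellToolLog {
  toolRequest: { name: string; args: string };
}

// Word boundaries keep "tsc" inside another word (e.g. "notsc.txt") from matching.
const VERIFY_PATTERN = /\b(npm run (build|verify)|tsc|typecheck)\b/i;

// Returns true if any shell call ran a build or type-check command.
function ranVerification(logs: ShellToolLog[]): boolean {
  return logs
    .filter((log) => log.toolRequest.name === 'run_shell_command')
    .some((log) => {
      try {
        const parsed: unknown = JSON.parse(log.toolRequest.args);
        const command = (parsed as { command?: unknown }).command;
        return typeof command === 'string' && VERIFY_PATTERN.test(command);
      } catch {
        // Malformed or non-object args count as "no verification".
        return false;
      }
    });
}

// Illustrative data: an edit followed by a type-check.
const sampleLogs: ShellToolLog[] = [
  { toolRequest: { name: 'replace', args: '{}' } },
  {
    toolRequest: {
      name: 'run_shell_command',
      args: JSON.stringify({ command: 'npx tsc --noEmit' }),
    },
  },
];

const verified = ranVerification(sampleLogs);
```

The trade-off is strictness: the regex rejects incidental matches, but it would also miss wrappers such as `pnpm build`, which the committed substring check misses too.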
