32 changes: 32 additions & 0 deletions docs/feature-requests/001-flash-lite-image-models.md
@@ -0,0 +1,32 @@
# Feature: Add support for flash-lite and image models

## Summary

This feature enhances the Gemini CLI by adding support for the flash-lite model and for image input and generation.

## Motivation

- **Flash-lite Model:** To offer a faster and potentially more cost-effective option for simpler tasks, improving user experience and performance.
- **Image Processing:** To enable users to interact with Gemini models that support image input and generation, expanding the CLI's utility.

## Proposed Changes

1. **Model Configuration:**
* Add `gemini-2.5-flash-lite` alias to `defaultModelConfigs.ts`.
* Add `gemini-2.5-image` and `gemini-2.5-flash-lite-image` aliases to `defaultModelConfigs.ts`.
* Update `resolveModel` in `models.ts` to correctly map these aliases to their respective model names, respecting the preview features flag.
2. **Routing Strategy for Images:**
* Create a new `ImageStrategy` in `routing/strategies/ImageStrategy.ts`.
* This strategy will detect if a user request includes image parts (e.g., via `inlineData`) or explicitly asks for image generation.
* It will route such requests to the appropriate image-capable model (using the new aliases).
* Ensure that the use of preview image models is controlled by the `--preview` flag.
3. **Testing:**
* Add unit tests for the `ImageStrategy` to cover cases with and without images, and with/without preview features enabled.
* Update golden files if necessary.

## Acceptance Criteria

- Users can specify `flash-lite` as a model alias, and it correctly routes to the flash-lite model.
- Users can include image parts in their prompts, and the CLI correctly routes these to an image-capable model.
- Users can request image generation (e.g., using prompts like "create an image..."), and the CLI routes these to an image-capable model.
- The preview features flag correctly controls access to preview image models.
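The detection described in proposed change 2 can be sketched as follows. This is a minimal illustration, not the final implementation: the `Part` shape mirrors the Gemini API's text/`inlineData` parts, and the function names are assumptions.

```typescript
// Sketch of the image-detection checks proposed for ImageStrategy.
// The Part shape mirrors the Gemini API; function names are illustrative.
interface Part {
  text?: string;
  inlineData?: { mimeType: string; data: string };
}

// True when any part carries inline image data.
function hasImagePart(parts: Part[]): boolean {
  return parts.some(
    (p) => p.inlineData !== undefined && p.inlineData.mimeType.startsWith('image/'),
  );
}

// True when the prompt text explicitly asks for image generation.
function requestsImageGeneration(parts: Part[]): boolean {
  const text = parts.map((p) => p.text ?? '').join(' ');
  return /generate an image|create an image|draw a picture/i.test(text);
}
```

Either check returning true would cause the strategy to route to an image-capable model; the phrase list here is only a starting point.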
42 changes: 42 additions & 0 deletions docs/feature-requests/002-script-output-summarization.md
@@ -0,0 +1,42 @@
# Feature: Implement script output summarization with excerpt preservation

## Summary

This feature introduces a mechanism to summarize lengthy script outputs before they are processed by the main thinking model. Summarization will primarily use the 'flash-lite' model for efficiency, with options to skip summarization for short outputs or when explicitly disallowed.

## Motivation

- **Efficiency:** Long script outputs can overwhelm the context window of the main thinking model and increase processing time and cost. Summarizing them first with a faster, cheaper model like 'flash-lite' can improve overall performance and cost-effectiveness.
- **Clarity:** Summaries can distill key information from complex script outputs, making them easier for the main LLM to process and act upon.
- **Preservation:** The summarization process should preserve critical excerpts from the original output to avoid loss of crucial information.

## Proposed Changes

1. **Shell Tool Output Tagging:**
    * Modify `ShellToolInvocation.execute` to prepend a distinctive prefix (e.g., `[SHELL_OUTPUT]\n`) to the `llmContent` of shell command results. This will help identify shell outputs for the routing strategy.
* Add a `toolSpecificInfo: { isShellOutput: true }` flag to the `ToolResult` to explicitly mark shell command outputs.
2. **Summarization Routing Strategy:**
* Create a new `ScriptOutputSummarizationStrategy` in `routing/strategies/scriptOutputSummarizationStrategy.ts`.
* This strategy will inspect the `RoutingContext` for shell output indicators (`toolSpecificInfo.isShellOutput`).
* **Summarization Conditions:**
* Summarize if the script output is longer than a defined threshold (e.g., 500 characters).
* Skip summarization if the output is shorter than the threshold.
* **(Optional/Future):** Implement a mechanism to respect explicit "no summarization" requests from the model or user (this might require further discussion on how such a signal would be passed).
* Use the 'flash-lite' model (via `DEFAULT_GEMINI_FLASH_MODEL` alias) for summarization.
3. **Model Router Integration:**
* Add `ScriptOutputSummarizationStrategy` to the `CompositeStrategy` in `ModelRouterService.ts`, ensuring it is evaluated before other strategies that might process general text.
4. **Testing:**
* Add unit tests for `ScriptOutputSummarizationStrategy` to cover:
* Correct identification of shell output.
* Correct skipping of summarization for short outputs.
* Correct summarization of long outputs using 'flash-lite'.
* Proper handling of empty script outputs.
* Ensure the summarization prompt includes instructions to preserve key excerpts.

## Acceptance Criteria

- Long script outputs are summarized using the 'flash-lite' model.
- Short script outputs are passed through without summarization.
- Summaries of script outputs retain critical information and key phrases from the original output.
- The system correctly identifies shell command outputs to apply this logic.
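The threshold decision in proposed change 2 can be sketched as below. The 500-character cutoff comes from this proposal; the function name, interface, and model string are illustrative assumptions.

```typescript
// Sketch of the summarization decision described above.
// SUMMARIZE_THRESHOLD comes from the proposal; other names are illustrative.
const SUMMARIZE_THRESHOLD = 500;
const FLASH_LITE_MODEL = 'gemini-2.5-flash-lite'; // assumed alias target

interface ShellResult {
  llmContent: string;
  toolSpecificInfo?: { isShellOutput?: boolean };
}

// Returns the model to summarize with, or null to pass the output through.
function summarizationModelFor(result: ShellResult): string | null {
  if (!result.toolSpecificInfo?.isShellOutput) return null; // not a shell output
  if (result.llmContent.length <= SUMMARIZE_THRESHOLD) return null; // short enough as-is
  return FLASH_LITE_MODEL;
}
```

A null return means the output reaches the main model untouched, which covers both the short-output and non-shell cases in the acceptance criteria.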
1 change: 0 additions & 1 deletion package.json
Expand Up @@ -62,7 +62,6 @@
"pre-commit": "node scripts/pre-commit.js"
},
"overrides": {
"ink": "npm:@jrichman/ink@6.4.6",
"wrap-ansi": "9.0.2",
"cliui": {
"wrap-ansi": "7.0.0"
12 changes: 12 additions & 0 deletions packages/core/src/config/defaultModelConfigs.ts
@@ -83,6 +83,18 @@ export const DEFAULT_MODEL_CONFIGS: ModelConfigServiceConfig = {
model: 'gemini-2.5-flash-lite',
},
},
'gemini-2.5-image': {
extends: 'chat-base-2.5',
modelConfig: {
model: 'gemini-2.5-pro-image-preview',
},
},
'gemini-2.5-flash-lite-image': {
extends: 'chat-base-2.5',
modelConfig: {
model: 'gemini-2.5-flash-lite-image-preview',
},
},
// Bases for the internal model configs.
'gemini-2.5-flash-base': {
extends: 'base',
7 changes: 7 additions & 0 deletions packages/core/src/config/models.ts
@@ -15,6 +15,8 @@ export const DEFAULT_GEMINI_MODEL_AUTO = 'auto';
export const GEMINI_MODEL_ALIAS_PRO = 'pro';
export const GEMINI_MODEL_ALIAS_FLASH = 'flash';
export const GEMINI_MODEL_ALIAS_FLASH_LITE = 'flash-lite';
export const GEMINI_MODEL_ALIAS_IMAGE = 'image';
export const GEMINI_MODEL_ALIAS_FLASH_LITE_IMAGE = 'flash-lite-image';

export const DEFAULT_GEMINI_EMBEDDING_MODEL = 'gemini-embedding-001';

@@ -46,6 +48,11 @@ export function resolveModel(
case GEMINI_MODEL_ALIAS_FLASH_LITE: {
return DEFAULT_GEMINI_FLASH_LITE_MODEL;
}
case GEMINI_MODEL_ALIAS_IMAGE: {
return previewFeaturesEnabled
? 'gemini-2.5-pro-image-preview'
: DEFAULT_GEMINI_MODEL;
}
default: {
return requestedModel;
}
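The `image` alias branch added above can be exercised in isolation. This sketch reimplements just that branch with illustrative constants; the value of `DEFAULT_GEMINI_MODEL` is an assumption, not taken from the diff.

```typescript
// Standalone sketch of the `image` alias branch added to resolveModel above.
// The alias and preview model name mirror the diff; the default model value is assumed.
const GEMINI_MODEL_ALIAS_IMAGE = 'image';
const DEFAULT_GEMINI_MODEL = 'gemini-2.5-pro'; // assumed default

function resolveModelSketch(
  requestedModel: string,
  previewFeaturesEnabled: boolean,
): string {
  switch (requestedModel) {
    case GEMINI_MODEL_ALIAS_IMAGE:
      // Preview image model only when preview features are enabled.
      return previewFeaturesEnabled
        ? 'gemini-2.5-pro-image-preview'
        : DEFAULT_GEMINI_MODEL;
    default:
      // Non-alias names pass through unchanged.
      return requestedModel;
  }
}
```

Note the diff only handles `GEMINI_MODEL_ALIAS_IMAGE` here; the `flash-lite-image` alias is resolved via the config entries in `defaultModelConfigs.ts`.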
5 changes: 5 additions & 0 deletions packages/core/src/routing/modelRouterService.ts
@@ -19,6 +19,8 @@ import { ClassifierStrategy } from './strategies/classifierStrategy.js';
import { CompositeStrategy } from './strategies/compositeStrategy.js';
import { FallbackStrategy } from './strategies/fallbackStrategy.js';
import { OverrideStrategy } from './strategies/overrideStrategy.js';
import { ImageStrategy } from './strategies/ImageStrategy.js';
import { ScriptOutputSummarizationStrategy } from './strategies/scriptOutputSummarizationStrategy.js';

import { logModelRouting } from '../telemetry/loggers.js';
import { ModelRoutingEvent } from '../telemetry/types.js';
@@ -42,6 +44,9 @@ export class ModelRouterService {
[
new FallbackStrategy(),
new OverrideStrategy(),
new ScriptOutputSummarizationStrategy(),
new ScriptOutputSummarizationStrategy(),
Contributor

critical

The ScriptOutputSummarizationStrategy is instantiated twice. This will cause the strategy to be executed twice for each routing decision, which is inefficient and may lead to unintended side effects. Please remove the duplicate instance.

new ImageStrategy(),
new ClassifierStrategy(),
new DefaultStrategy(),
],
179 changes: 179 additions & 0 deletions packages/core/src/routing/strategies/ImageStrategy.test.ts
@@ -0,0 +1,179 @@

import { describe, it, expect, vi } from 'vitest';
import { ImageStrategy } from './ImageStrategy.js';
import {
  DEFAULT_GEMINI_MODEL,
  GEMINI_MODEL_ALIAS_FLASH_LITE_IMAGE,
} from '../../config/models.js';
import { RoutingContext } from '../routingStrategy.js';
import { Config } from '../../config/config.js';

describe('ImageStrategy', () => {
const mockConfig = {
getModel: vi.fn(),
getPreviewFeatures: vi.fn(),
} as unknown as Config;

it('should return flash-lite image model if request has image and flash-lite is preferred general model', async () => {
const strategy = new ImageStrategy();
const context = {
request: {
parts: [
{ text: 'Describe this image' },
{ inlineData: { mimeType: 'image/png', data: 'base64...' } },
],
},
} as RoutingContext;

// Mock config to return flash-lite as preferred general model
vi.spyOn(mockConfig, 'getModel').mockReturnValue('flash-lite');
vi.spyOn(mockConfig, 'getPreviewFeatures').mockReturnValue(true);

const decision = await strategy.route(context, mockConfig, {} as any);

expect(decision).toEqual({
model: 'gemini-2.5-flash-lite-image-preview',
metadata: {
source: 'image',
latencyMs: 0,
reasoning: 'Request contains an image.',
},
});
});

it('should return pro image model if request has image and pro is preferred general model (preview enabled)', async () => {
const strategy = new ImageStrategy();
const context = {
request: {
parts: [
{ text: 'Describe this image' },
{ inlineData: { mimeType: 'image/png', data: 'base64...' } },
],
},
} as RoutingContext;

// Mock config to return pro as preferred general model
vi.spyOn(mockConfig, 'getModel').mockReturnValue('pro');
vi.spyOn(mockConfig, 'getPreviewFeatures').mockReturnValue(true);

const decision = await strategy.route(context, mockConfig, {} as any);

expect(decision).toEqual({
model: 'gemini-2.5-pro-image-preview',
metadata: {
source: 'image',
latencyMs: 0,
reasoning: 'Request contains an image.',
},
});
});

it('should return pro model if request has image and preview features are disabled', async () => {
const strategy = new ImageStrategy();
const context = {
request: {
parts: [
{ text: 'Describe this image' },
{ inlineData: { mimeType: 'image/png', data: 'base64...' } },
],
},
} as RoutingContext;

// Mock config to disable preview features
vi.spy10('getPreviewFeatures').mockReturnValue(false);
Contributor

critical

There is a typo: `vi.spy10` should be `vi.spyOn`. This will cause the test to fail. Additionally, `spyOn` requires the object to be spied on as the first argument.

Suggested change
vi.spy10('getPreviewFeatures').mockReturnValue(false);
vi.spyOn(mockConfig, 'getPreviewFeatures').mockReturnValue(false);

vi.spyOn(mockConfig, 'getModel').mockReturnValue('pro');

const decision = await strategy.route(context, mockConfig, {} as any);

expect(decision).toEqual({
model: DEFAULT_GEMINI_MODEL,
metadata: {
source: 'image',
latencyMs: 0,
reasoning: 'Request contains an image.',
},
});
});

it('should return flash-lite image model if request asks to generate image and flash-lite is preferred general model', async () => {
const strategy = new ImageStrategy();
const context = {
request: {
parts: [{ text: 'generate an image of a cat' }],
},
} as RoutingContext;

// Mock config to return flash-lite as preferred general model
vi.spyOn(mockConfig, 'getModel').mockReturnValue('flash-lite');
vi.spyOn(mockConfig, 'getPreviewFeatures').mockReturnValue(true);

const decision = await strategy.route(context, mockConfig, {} as any);

expect(decision).toEqual({
model: 'gemini-2.5-flash-lite-image-preview',
metadata: {
source: 'image',
latencyMs: 0,
reasoning: 'Request for image generation.',
},
});
});

it('should return pro image model if request asks to generate image and pro is preferred general model (preview enabled)', async () => {
const strategy = new ImageStrategy();
const context = {
request: {
parts: [{ text: 'create an image of a dog' }],
},
} as RoutingContext;

// Mock config to return pro as preferred general model
vi.spyOn(mockConfig, 'getModel').mockReturnValue('pro');
vi.spyOn(mockConfig, 'getPreviewFeatures').mockReturnValue(true);

const decision = await strategy.route(context, mockConfig, {} as any);

expect(decision).toEqual({
model: 'gemini-2.5-pro-image-preview',
metadata: {
source: 'image',
latencyMs: 0,
reasoning: 'Request for image generation.',
},
});
});

it('should return pro model if request asks to generate image and preview features are disabled', async () => {
const strategy = new ImageStrategy();
const context = {
request: {
parts: [{ text: 'draw a picture of a bird' }],
},
} as RoutingContext;

// Mock config to disable preview features
vi.spyOn(mockConfig, 'getPreviewFeatures').mockReturnValue(false);
vi.spyOn(mockConfig, 'getModel').mockReturnValue('pro');

const decision = await strategy.route(context, mockConfig, {} as any);

expect(decision).toEqual({
model: DEFAULT_GEMINI_MODEL,
metadata: {
source: 'image',
latencyMs: 0,
reasoning: 'Request for image generation.',
},
});
});

it('should return null if the request does not contain an image or image generation request', async () => {
const strategy = new ImageStrategy();
const context = {
request: {
parts: [{ text: 'Hello, world!' }],
},
} as RoutingContext;

    const decision = await strategy.route(context, {} as Config, {} as any);

expect(decision).toBeNull();
});
});
61 changes: 61 additions & 0 deletions packages/core/src/routing/strategies/ImageStrategy.ts
@@ -0,0 +1,61 @@
import {
  RoutingStrategy,
  RoutingContext,
  RoutingDecision,
} from '../routingStrategy.js';
import { Config } from '../../config/config.js';
import { BaseLlmClient } from '../../core/baseLlmClient.js';
import {
  GEMINI_MODEL_ALIAS_IMAGE,
  GEMINI_MODEL_ALIAS_FLASH_LITE_IMAGE,
  resolveModel,
  DEFAULT_GEMINI_MODEL,
} from '../../config/models.js';

export class ImageStrategy implements RoutingStrategy {
readonly name = 'image';

async route(
context: RoutingContext,
config: Config,
baseLlmClient: BaseLlmClient,
): Promise<RoutingDecision | null> {
const hasImage = context.request.parts.some(
(part) =>
'inlineData' in part && part.inlineData?.mimeType.startsWith('image/'),
);

const textRequest = context.request.parts
.map((part) => ('text' in part ? part.text : ''))
.join(' ');

const requestsImageGeneration = /generate an image|create an image|draw a picture/i.test(textRequest);

if (hasImage || requestsImageGeneration) {
const preferredGeneralModel = config.getModel();
// Resolve the general model preference to its concrete name
const resolvedGeneralModel = resolveModel(preferredGeneralModel, config.getPreviewFeatures());

let modelToUse: string;
// Check if the preferred general model is 'flash-lite' based on its name
if (resolvedGeneralModel.includes('flash-lite')) {
// Route to the flash-lite image model
modelToUse = GEMINI_MODEL_ALIAS_FLASH_LITE_IMAGE;
} else {
// Fallback to pro image model or default pro model based on preview flag
modelToUse = config.getPreviewFeatures() ? 'gemini-2.5-pro-image-preview' : DEFAULT_GEMINI_MODEL;
}
Contributor

high

The logic for selecting an image model is inconsistent and has a bug. When the preferred model is 'flash-lite', it routes to a preview model regardless of whether preview features are enabled. This could lead to using a preview model when it's not intended. The logic is also inconsistent as it returns a concrete model name for the 'pro' preference but an alias for 'flash-lite'.

I suggest refactoring to consistently handle the preview flag and return concrete model names for both cases to improve clarity and correctness.

Suggested change
const preferredGeneralModel = config.getModel();
// Resolve the general model preference to its concrete name
const resolvedGeneralModel = resolveModel(preferredGeneralModel, config.getPreviewFeatures());
let modelToUse: string;
// Check if the preferred general model is 'flash-lite' based on its name
if (resolvedGeneralModel.includes('flash-lite')) {
// Route to the flash-lite image model
modelToUse = GEMINI_MODEL_ALIAS_FLASH_LITE_IMAGE;
} else {
// Fallback to pro image model or default pro model based on preview flag
modelToUse = config.getPreviewFeatures() ? 'gemini-2.5-pro-image-preview' : DEFAULT_GEMINI_MODEL;
}
const previewEnabled = config.getPreviewFeatures();
let modelToUse: string;
if (!previewEnabled) {
modelToUse = DEFAULT_GEMINI_MODEL;
} else {
const preferredGeneralModel = config.getModel();
const resolvedGeneralModel = resolveModel(preferredGeneralModel, previewEnabled);
if (resolvedGeneralModel.includes('flash-lite')) {
modelToUse = 'gemini-2.5-flash-lite-image-preview';
} else {
modelToUse = 'gemini-2.5-pro-image-preview';
}
}


return {
model: modelToUse,
metadata: {
source: this.name,
latencyMs: 0, // Placeholder, actual measurement needed
reasoning: hasImage ? 'Request contains an image.' : 'Request for image generation.',
},
};
}

return null;
}
}