-
Notifications
You must be signed in to change notification settings - Fork 13k
feat: Implement script output summarization and refine image routing #14097
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
b273fc1
7074169
81e2a0b
a43f1e1
2e06a22
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| # Feature: Add support for flash-lite and image models | ||
|
|
||
| ## Summary | ||
|
|
||
| This feature aims to enhance the Gemini CLI by incorporating support for the flash-lite model and image processing capabilities. | ||
|
|
||
| ## Motivation | ||
|
|
||
| - **Flash-lite Model:** To offer a faster and potentially more cost-effective option for simpler tasks, improving user experience and performance. | ||
| - **Image Processing:** To enable users to interact with Gemini models that support image input and generation, expanding the CLI's utility. | ||
|
|
||
| ## Proposed Changes | ||
|
|
||
| 1. **Model Configuration:** | ||
| * Add `gemini-2.5-flash-lite` alias to `defaultModelConfigs.ts`. | ||
| * Add `gemini-2.5-image` and `gemini-2.5-flash-lite-image` aliases to `defaultModelConfigs.ts`. | ||
| * Update `resolveModel` in `models.ts` to correctly map these aliases to their respective model names, respecting the preview features flag. | ||
| 2. **Routing Strategy for Images:** | ||
| * Create a new `ImageStrategy` in `routing/strategies/ImageStrategy.ts`. | ||
| * This strategy will detect if a user request includes image parts (e.g., via `inlineData`) or explicitly asks for image generation. | ||
| * It will route such requests to the appropriate image-capable model (using the new aliases). | ||
| * Ensure that the use of preview image models is controlled by the `--preview` flag. | ||
| 3. **Testing:** | ||
| * Add unit tests for the `ImageStrategy` to cover cases with and without images, and with/without preview features enabled. | ||
| * Update golden files if necessary. | ||
|
|
||
| ## Acceptance Criteria | ||
|
|
||
| - Users can specify `flash-lite` as a model alias, and it correctly routes to the flash-lite model. | ||
| - Users can include image parts in their prompts, and the CLI correctly routes these to an image-capable model. | ||
| - Users can request image generation (e.g., using prompts like "create an image..."), and the CLI routes these to an image-capable model. | ||
| - Preview features flag correctly controls access to preview image models. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| # Feature: Implement script output summarization with excerpt preservation | ||
|
|
||
| ## Summary | ||
|
|
||
| This feature introduces a mechanism to summarize lengthy script outputs before they are processed by the main thinking model. Summarization will primarily use the 'flash-lite' model for efficiency, with options to skip summarization for short outputs or when explicitly disallowed. | ||
|
|
||
| ## Motivation | ||
|
|
||
| - **Efficiency:** Long script outputs can overwhelm the context window of the main thinking model and increase processing time and cost. Summarizing them first with a faster, cheaper model like 'flash-lite' can improve overall performance and cost-effectiveness. | ||
| - **Clarity:** Summaries can distill key information from complex script outputs, making them easier for the main LLM to process and act upon. | ||
| - **Preservation:** The summarization process should preserve critical excerpts from the original output to avoid loss of crucial information. | ||
|
|
||
| ## Proposed Changes | ||
|
|
||
| 1. **Shell Tool Output Tagging:** | ||
| * Modify `ShellToolInvocation.execute` to prepend a distinctive prefix (e.g., `[SHELL_OUTPUT] | ||
| `) to the `llmContent` of shell command results. This will help identify shell outputs for the routing strategy. | ||
| * Add a `toolSpecificInfo: { isShellOutput: true }` flag to the `ToolResult` to explicitly mark shell command outputs. | ||
| 2. **Summarization Routing Strategy:** | ||
| * Create a new `ScriptOutputSummarizationStrategy` in `routing/strategies/scriptOutputSummarizationStrategy.ts`. | ||
| * This strategy will inspect the `RoutingContext` for shell output indicators (`toolSpecificInfo.isShellOutput`). | ||
| * **Summarization Conditions:** | ||
| * Summarize if the script output is longer than a defined threshold (e.g., 500 characters). | ||
| * Skip summarization if the output is shorter than the threshold. | ||
| * **(Optional/Future):** Implement a mechanism to respect explicit "no summarization" requests from the model or user (this might require further discussion on how such a signal would be passed). | ||
| * Use the 'flash-lite' model (via `DEFAULT_GEMINI_FLASH_MODEL` alias) for summarization. | ||
| 3. **Model Router Integration:** | ||
| * Add `ScriptOutputSummarizationStrategy` to the `CompositeStrategy` in `ModelRouterService.ts`, ensuring it is evaluated before other strategies that might process general text. | ||
| 4. **Testing:** | ||
| * Add unit tests for `ScriptOutputSummarizationStrategy` to cover: | ||
| * Correct identification of shell output. | ||
| * Correct skipping of summarization for short outputs. | ||
| * Correct summarization of long outputs using 'flash-lite'. | ||
| * Proper handling of empty script outputs. | ||
| * Ensure the summarization prompt includes instructions to preserve key excerpts. | ||
|
|
||
| ## Acceptance Criteria | ||
|
|
||
| - Long script outputs are summarized using the 'flash-lite' model. | ||
| - Short script outputs are passed through without summarization. | ||
| - Summaries of script outputs retain critical information and key phrases from the original output. | ||
| - The system correctly identifies shell command outputs to apply this logic. |
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,179 @@ | ||||||
|
|
||||||
| import { describe, it, expect, vi } from 'vitest'; | ||||||
| import { ImageStrategy } from './ImageStrategy'; | ||||||
| import { DEFAULT_GEMINI_MODEL, GEMINI_MODEL_ALIAS_FLASH_LITE_IMAGE } from '../../config/models'; | ||||||
| import { RoutingContext } from '../routingStrategy'; | ||||||
| import { Config } from '../../config/config'; | ||||||
|
|
||||||
| describe('ImageStrategy', () => { | ||||||
| const mockConfig = { | ||||||
| getModel: vi.fn(), | ||||||
| getPreviewFeatures: vi.fn(), | ||||||
| } as unknown as Config; | ||||||
|
|
||||||
| it('should return flash-lite image model if request has image and flash-lite is preferred general model', async () => { | ||||||
| const strategy = new ImageStrategy(); | ||||||
| const context = { | ||||||
| request: { | ||||||
| parts: [ | ||||||
| { text: 'Describe this image' }, | ||||||
| { inlineData: { mimeType: 'image/png', data: 'base64...' } }, | ||||||
| ], | ||||||
| }, | ||||||
| } as RoutingContext; | ||||||
|
|
||||||
| // Mock config to return flash-lite as preferred general model | ||||||
| vi.spyOn(mockConfig, 'getModel').mockReturnValue('flash-lite'); | ||||||
| vi.spyOn(mockConfig, 'getPreviewFeatures').mockReturnValue(true); | ||||||
|
|
||||||
| const decision = await strategy.route(context, mockConfig, {} as any); | ||||||
|
|
||||||
| expect(decision).toEqual({ | ||||||
| model: 'gemini-2.5-flash-lite-image-preview', | ||||||
| metadata: { | ||||||
| source: 'image', | ||||||
| latencyMs: 0, | ||||||
| reasoning: 'Request contains an image.', | ||||||
| }, | ||||||
| }); | ||||||
| }); | ||||||
|
|
||||||
| it('should return pro image model if request has image and pro is preferred general model (preview enabled)', async () => { | ||||||
| const strategy = new ImageStrategy(); | ||||||
| const context = { | ||||||
| request: { | ||||||
| parts: [ | ||||||
| { text: 'Describe this image' }, | ||||||
| { inlineData: { mimeType: 'image/png', data: 'base64...' } }, | ||||||
| ], | ||||||
| }, | ||||||
| } as RoutingContext; | ||||||
|
|
||||||
| // Mock config to return pro as preferred general model | ||||||
| vi.spyOn(mockConfig, 'getModel').mockReturnValue('pro'); | ||||||
| vi.spyOn(mockConfig, 'getPreviewFeatures').mockReturnValue(true); | ||||||
|
|
||||||
| const decision = await strategy.route(context, mockConfig, {} as any); | ||||||
|
|
||||||
| expect(decision).toEqual({ | ||||||
| model: 'gemini-2.5-pro-image-preview', | ||||||
| metadata: { | ||||||
| source: 'image', | ||||||
| latencyMs: 0, | ||||||
| reasoning: 'Request contains an image.', | ||||||
| }, | ||||||
| }); | ||||||
| }); | ||||||
|
|
||||||
| it('should return pro model if request has image and preview features are disabled', async () => { | ||||||
| const strategy = new ImageStrategy(); | ||||||
| const context = { | ||||||
| request: { | ||||||
| parts: [ | ||||||
| { text: 'Describe this image' }, | ||||||
| { inlineData: { mimeType: 'image/png', data: 'base64...' } }, | ||||||
| ], | ||||||
| }, | ||||||
| } as RoutingContext; | ||||||
|
|
||||||
| // Mock config to disable preview features | ||||||
| vi.spy10('getPreviewFeatures').mockReturnValue(false); | ||||||
|
||||||
| vi.spy10('getPreviewFeatures').mockReturnValue(false); | |
| vi.spyOn(mockConfig, 'getPreviewFeatures').mockReturnValue(false); |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,61 @@ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| RoutingStrategy, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| RoutingContext, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| RoutingDecision, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } from '../routingStrategy'; | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import { Config } from '../../config/config'; | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import { BaseLlmClient } from '../../core/baseLlmClient'; | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| GEMINI_MODEL_ALIAS_IMAGE, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| GEMINI_MODEL_ALIAS_FLASH_LITE_IMAGE, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| resolveModel, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| DEFAULT_GEMINI_MODEL, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } from '../../config/models'; | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| export class ImageStrategy implements RoutingStrategy { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| readonly name = 'image'; | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| async route( | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| context: RoutingContext, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| config: Config, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| baseLlmClient: BaseLlmClient, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ): Promise<RoutingDecision | null> { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| const hasImage = context.request.parts.some( | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| (part) => | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 'inlineData' in part && part.inlineData?.mimeType.startsWith('image/'), | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ); | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| const textRequest = context.request.parts | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .map((part) => ('text' in part ? part.text : '')) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .join(' '); | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| const requestsImageGeneration = /generate an image|create an image|draw a picture/i.test(textRequest); | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if (hasImage || requestsImageGeneration) { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| const preferredGeneralModel = config.getModel(); | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| // Resolve the general model preference to its concrete name | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| const resolvedGeneralModel = resolveModel(preferredGeneralModel, config.getPreviewFeatures()); | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| let modelToUse: string; | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| // Check if the preferred general model is 'flash-lite' based on its name | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if (resolvedGeneralModel.includes('flash-lite')) { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| // Route to the flash-lite image model | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| modelToUse = GEMINI_MODEL_ALIAS_FLASH_LITE_IMAGE; | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } else { | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| // Fallback to pro image model or default pro model based on preview flag | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| modelToUse = config.getPreviewFeatures() ? 'gemini-2.5-pro-image-preview' : DEFAULT_GEMINI_MODEL; | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| const preferredGeneralModel = config.getModel(); | |
| // Resolve the general model preference to its concrete name | |
| const resolvedGeneralModel = resolveModel(preferredGeneralModel, config.getPreviewFeatures()); | |
| let modelToUse: string; | |
| // Check if the preferred general model is 'flash-lite' based on its name | |
| if (resolvedGeneralModel.includes('flash-lite')) { | |
| // Route to the flash-lite image model | |
| modelToUse = GEMINI_MODEL_ALIAS_FLASH_LITE_IMAGE; | |
| } else { | |
| // Fallback to pro image model or default pro model based on preview flag | |
| modelToUse = config.getPreviewFeatures() ? 'gemini-2.5-pro-image-preview' : DEFAULT_GEMINI_MODEL; | |
| } | |
| const previewEnabled = config.getPreviewFeatures(); | |
| let modelToUse: string; | |
| if (!previewEnabled) { | |
| modelToUse = DEFAULT_GEMINI_MODEL; | |
| } else { | |
| const preferredGeneralModel = config.getModel(); | |
| const resolvedGeneralModel = resolveModel(preferredGeneralModel, previewEnabled); | |
| if (resolvedGeneralModel.includes('flash-lite')) { | |
| modelToUse = 'gemini-2.5-flash-lite-image-preview'; | |
| } else { | |
| modelToUse = 'gemini-2.5-pro-image-preview'; | |
| } | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
ScriptOutputSummarizationStrategyis instantiated twice. This will cause the strategy to be executed twice for each routing decision, which is inefficient and may lead to unintended side effects. Please remove the duplicate instance.