
[Gemma x Gemini CLI] Add an Experimental Gemma Router that uses a LiteRT-LM shim into the Composite Model Classifier Strategy#17231

Merged
allenhutchison merged 33 commits into google-gemini:main from sidwan02:gemma-router-pr on Feb 26, 2026

Conversation

@sidwan02 (Contributor) commented Jan 21, 2026

Summary

This pull request introduces an experimental Gemma-based model routing strategy, including a new client for a local LiteRT-LM shim and its integration into the model router service.

Details

  • Experimental Gemma Router: Introduced an experimental Gemma Router that uses a LiteRT-LM shim for model classification within the Composite Model Classifier Strategy.
  • New Dependencies: Added jsonrepair for robust JSON parsing.
  • Configuration Options: New configuration settings for the Gemma Model Router are available under experimental settings, allowing control over its enablement, classifier host, and model.
  • Local LiteRT-LM Client: A new client has been implemented to communicate with the LiteRT-LM server shim; it is designed for JSON responses and includes response cleaning.
  • Gemma Classifier Strategy: A GemmaClassifierStrategy was added to the model routing service, which uses the local Gemma model to classify task complexity (SIMPLE/COMPLEX) and route requests to appropriate models (flash/pro) based on a defined rubric.
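
The SIMPLE/COMPLEX routing described above can be sketched in TypeScript. This is a hypothetical illustration, not the PR's actual code; the type and model names are placeholders:

```typescript
// Sketch of the classifier's routing decision. In the real strategy the
// verdict comes from the local Gemma model via the LiteRT-LM shim.
type Complexity = 'SIMPLE' | 'COMPLEX';

interface ClassifierVerdict {
  complexity: Complexity;
  reasoning: string;
}

// Map a verdict to a model tier (model names are placeholders).
function routeByComplexity(verdict: ClassifierVerdict): string {
  return verdict.complexity === 'COMPLEX'
    ? 'pro-model' // heavier model for multi-step tasks
    : 'flash-model'; // fast model for simple requests
}
```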

Related Issues

Related to https://github.com/google-gemini/maintainers-gemini-cli/issues/1222

How to Validate

Please follow the instructions in go/gcli-litertlm-setup-pr

A separate PR with instructions will be merged shortly after this one.

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • macOS
      • npm run
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker

@sidwan02 sidwan02 requested review from a team as code owners January 21, 2026 19:35
google-cla bot commented Jan 21, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @sidwan02, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates an experimental Gemma Router into the CLI's model routing mechanism. The primary goal is to enhance the system's ability to intelligently select the most suitable large language model (either a 'flash' or 'pro' model) for a given user request, based on its perceived complexity. This is achieved by leveraging a local Gemini-compatible API, allowing for more efficient and context-aware model dispatching without relying solely on external services.

Highlights

  • Experimental Gemma Router: Introduced an experimental Gemma Router that uses a LiteRT-LM shim for model classification within the Composite Model Classifier Strategy.
  • New Dependencies: Added jsonrepair for robust JSON parsing and ollama for interacting with local LLMs, along with its sub-dependency whatwg-fetch.
  • Configuration Options: New configuration settings for the Gemma Model Router are available under experimental settings, allowing control over its enablement, classifier host, and model.
  • Local Gemini Client: A new LocalGeminiClient has been implemented to facilitate communication with local Gemini-compatible APIs, specifically designed for JSON responses and includes response cleaning.
  • Gemma Classifier Strategy: A GemmaClassifierStrategy was added to the model routing service, which uses the local Gemma model to classify task complexity (SIMPLE/COMPLEX) and route requests to appropriate models (flash/pro) based on a defined rubric.

Ignored Files

  • Ignored by pattern: .gemini/** (1)
    • .gemini/settings.json

@sidwan02 sidwan02 marked this pull request as draft January 21, 2026 19:36
@gemini-code-assist bot left a comment


Code Review

This pull request introduces an experimental Gemma-based model router, using a LocalGeminiClient for local Gemini-compatible API communication and a GemmaClassifierStrategy to route user requests. A high-severity security issue was identified in the LocalGeminiClient: sensitive LLM interaction data (prompts and responses) is unconditionally written to a local file (router.txt), a significant data-leakage risk that requires remediation. The review also highlighted a critical issue in chat-history processing for the classifier, which could impact accuracy, and another high-severity issue in the LocalGeminiClient concerning incorrect role mapping.

@gemini-cli bot added the priority/p1 label (Important and should be addressed in the near term) on Jan 21, 2026
gemini-cli bot commented Jan 24, 2026

Hi there! Thank you for your contribution to Gemini CLI.

To improve our contribution process and better track changes, we now require all pull requests to be associated with an existing issue, as announced in our recent discussion and as detailed in our CONTRIBUTING.md.

This pull request is being closed because it is not currently linked to an issue. You can easily reopen this PR once you have linked it to an issue.

How to link an issue:
Add a keyword followed by the issue number (e.g., Fixes #123) in the description of your pull request. For more details, see the GitHub Documentation.

Thank you for your understanding and for being a part of our community!

@gemini-cli gemini-cli bot closed this Jan 24, 2026
@sidwan02 (Author) commented Feb 5, 2026

@allenhutchison allenhutchison reopened this Feb 5, 2026
@allenhutchison allenhutchison self-assigned this Feb 5, 2026
@allenhutchison (Contributor) commented

Go ahead and rebase and resolve conflicts, and work through the bot feedback, and then I'll take a deeper look and run the tests.

@sidwan02 (Author) commented Feb 5, 2026

/gemini review

@gemini-code-assist bot left a comment


Code Review

This pull request introduces an experimental Gemma router that uses a LiteRT-LM shim for model classification. A high-severity security issue was identified in the LocalGeminiClient class, where sensitive chat history is unconditionally written to a local file, posing a significant privacy risk. Additionally, a critical issue involves a debug feature writing files into the user's workspace, and a high-severity issue concerns role handling in the new client, which could lead to silent failures.

@sidwan02 force-pushed the gemma-router-pr branch 3 times, most recently from ea53d90 to 9da39ce, on February 6, 2026 00:49
@allenhutchison left a comment


Hey Sid, thanks for putting this together — the experimental gating and graceful fallback are both done really well. This is a first pass with an LLM helping me review the diff. I'm going to follow up with some manual testing this week.


1. Client instantiation on every route() call (request)

The LocalLiteRtLmClient is being instantiated on every route() call (gemmaClassifierStrategy.ts:213), which creates a new GoogleGenAI SDK client each time. Could you refactor this to lazily initialize the client on first use and reuse it across calls? The existing ClassifierStrategy avoids this by using the injected baseLlmClient — a similar pattern here would be ideal.
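
The requested lazy-initialization pattern could look roughly like this. A sketch under stated assumptions: the class shape and endpoint/model strings are illustrative, and the real client wraps the GoogleGenAI SDK rather than this stub:

```typescript
// Counter exists only to make the reuse observable in this sketch.
let clientConstructions = 0;

// Stand-in for the real LocalLiteRtLmClient; constructing the SDK client
// is the cost we want to pay once, not on every route() call.
class LocalLiteRtLmClient {
  constructor(
    readonly host: string,
    readonly model: string,
  ) {
    clientConstructions++;
  }
}

class GemmaClassifierStrategy {
  private client?: LocalLiteRtLmClient;

  route(): string {
    // Lazily create the client on first use, then reuse it across calls.
    this.client ??= new LocalLiteRtLmClient(
      'http://localhost:9379',
      'gemma3-1b-gpu-custom',
    );
    return this.client.model; // stand-in for the real classification call
  }
}
```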

2. Move jsonrepair to the LiteRT-LM shim (request)

Would it be possible to move the JSON repair logic into the LiteRT-LM shim itself rather than adding jsonrepair as a production dependency in the core package? The shim is the layer that knows about the model's quirks, so it feels like the right place to handle malformed output. That would keep the core package from taking on a dependency for a single experimental feature.
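
If the shim takes over repair duties, the core package may only need a light cleanup pass before parsing. A minimal sketch, assuming the model's main quirk is wrapping its JSON in markdown fences (the function name is hypothetical):

```typescript
// Strip the markdown code fences a small model often wraps around its
// JSON output, then parse. "Response cleaning" without a jsonrepair
// dependency; truly malformed JSON would still throw here.
function parseModelJson(raw: string): unknown {
  const cleaned = raw
    .replace(/^\s*```(?:json)?\s*/i, '') // leading fence, e.g. "```json"
    .replace(/\s*```\s*$/, '') // trailing fence
    .trim();
  return JSON.parse(cleaned);
}
```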

3. Prompt duplication between system prompt and reminder (request)

The rubric and JSON schema are duplicated between LITERT_GEMMA_CLASSIFIER_SYSTEM_PROMPT and LITERT_GEMMA_CLASSIFIER_REMINDER. Could you extract the shared pieces (rubric, schema, output format) into constants and compose both prompts from them? That way if the rubric evolves there's only one place to update.
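
Composing both prompts from shared constants could look like the following sketch. The rubric and schema text here are placeholders, not the PR's actual prompt content:

```typescript
// Shared pieces, defined once (contents are illustrative placeholders).
const RUBRIC =
  'SIMPLE: single-step questions or edits. COMPLEX: multi-step or ambiguous tasks.';
const OUTPUT_SCHEMA =
  '{"complexity": "SIMPLE" | "COMPLEX", "reasoning": string}';

// Both prompts are built from the same constants, so a rubric change
// only has to happen in one place.
const LITERT_GEMMA_CLASSIFIER_SYSTEM_PROMPT = [
  'You are a task-complexity classifier.',
  RUBRIC,
  `Respond with JSON matching: ${OUTPUT_SCHEMA}`,
].join('\n');

const LITERT_GEMMA_CLASSIFIER_REMINDER = [
  `Reminder of the rubric: ${RUBRIC}`,
  `Output only JSON matching: ${OUTPUT_SCHEMA}`,
].join('\n');
```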

4. flattenChatHistory non-text parts (suggestion)

Minor: in flattenChatHistory, the .map((part) => part.text) will produce "undefined" strings for any non-text parts that slip through the earlier filter. A .filter(Boolean) after the map would be a small safety net.
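
The suggested safety net is a one-line change; a sketch with a simplified Part type (the real history parts carry more fields):

```typescript
// Simplified Part type; real parts can also carry non-text payloads.
interface Part {
  text?: string;
  inlineData?: unknown;
}

function flattenParts(parts: Part[]): string {
  return parts
    .map((part) => part.text)
    .filter(Boolean) // drops undefined from non-text parts, instead of joining "undefined"
    .join(' ');
}
```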

5. Unrelated lockfile churn (request)

The lockfile diff has a lot of unrelated "peer": true removals across packages like vite, vitest, react, express, etc. Could you revert those? And if we move jsonrepair to the LiteRT-LM shim per the earlier comment, there shouldn't be any dependency changes needed here at all.

6. Comment on apiKey (nit)

Nit: could you add a brief comment above apiKey: 'no-api-key-needed' explaining that the SDK requires an API key even for local endpoints? It'll save future readers a head-scratch.

7. AbortSignal propagation (request)

The existing ClassifierStrategy passes context.signal through to the LLM call so cancellation works. GemmaClassifierStrategy doesn't propagate the signal to LocalLiteRtLmClient.generateJson(), so a user cancellation would hang until the 10s timeout. Could you thread the AbortSignal through?
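
One way to thread the signal while keeping the 10s timeout is to combine the two into a single AbortSignal. A sketch, assuming a runtime with `AbortSignal.any` and `AbortSignal.timeout` (Node 20+); the function name is hypothetical:

```typescript
// Combine the caller's cancellation signal with a request timeout, so
// whichever fires first aborts the in-flight classifier request.
function combinedSignal(
  callerSignal: AbortSignal,
  timeoutMs = 10_000,
): AbortSignal {
  return AbortSignal.any([callerSignal, AbortSignal.timeout(timeoutMs)]);
}
```

The resulting signal would then be passed through generateJson() down to the underlying HTTP request, so a user cancellation aborts immediately instead of waiting out the timeout.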

@sidwan02 (Author) commented Feb 9, 2026

Thanks Allen!

  1. Resolved in 651471e
  2. LiteRT-LM team is aware of this and working on it.
  3. Resolved in ba64940
  4. Resolved in 41333cb
  5. I'll resolve this after the jsonrepair dependency is removed
  6. Just checked with the LiteRT-LM team and there isn't a need for an API key when using the shim. E.g., you can just run `curl "http://localhost:9379/v1beta/models/gemma3-1b-gpu-custom:generateContent" -H 'Content-Type: application/json' -X POST -d '{"contents":[{"role":"user","parts":[{"text":"Tell me a joke."}]}]}'` - did you mean a comment explaining that the Gemini API key is still needed for the Gemini CLI?
  7. Resolved in 9810bdb. However, the LiteRT-LM shim currently doesn't do anything with the signal, so even if the user can cancel query routing requests and send new requests it's unclear if the model needs to wait for canceled requests to generate. I've put in a feature request with the LiteRT-LM team.

@douglas-reid left a comment


LGTM

@sidwan02 (Author) commented Feb 18, 2026

The apiKey comment, jsonrepair removal, and AbortSignal handling are resolved in commits 9810bdb, e38f5a1, and 6c9e650, respectively.

@allenhutchison left a comment


Thank you for addressing the comments I left last time. I think this is ready to go in. I did find one bug when using Vertex AI instead of vanilla API keys; I shared the details offline.

@allenhutchison allenhutchison added this pull request to the merge queue Feb 26, 2026
Merged via the queue into google-gemini:main with commit 9b7852f Feb 26, 2026
26 of 27 checks passed
BryanBradfo pushed a commit to BryanBradfo/gemini-cli that referenced this pull request Mar 5, 2026
…eRT-LM shim into the Composite Model Classifier Strategy (google-gemini#17231)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Allen Hutchison <[email protected]>
liamhelmer pushed a commit to badal-io/gemini-cli that referenced this pull request Mar 12, 2026
…eRT-LM shim into the Composite Model Classifier Strategy (google-gemini#17231)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Allen Hutchison <[email protected]>

Labels

priority/p1 Important and should be addressed in the near term.

4 participants