docs(local model routing): add docs on how to use Gemma for local model routing by douglas-reid · Pull Request #21365 · google-gemini/gemini-cli

douglas-reid · 2026-03-06T03:41:44Z

Summary

Adds docs for the experimental feature of Gemma Model Routing.

Details

Related Issues

Issue: https://github.com/google-gemini/maintainers-gemini-cli/issues/1222
Prior PR: [Gemma x Gemini CLI] Add an Experimental Gemma Router that uses a LiteRT-LM shim into the Composite Model Classifier Strategy #17231

How to Validate

Run through the steps in local-model-routing.md for your architecture.
Ensure that you see the routing requests hit your local endpoint.

Pre-Merge Checklist

gemini-code-assist · 2026-03-06T03:41:58Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces extensive documentation for an experimental feature in the Gemini CLI: local model routing using Gemma. The changes enable users to leverage locally-running Gemma models for routing decisions, potentially reducing costs and offering comparable latency to hosted models. The new documentation covers setup, configuration, and validation steps, integrating this new capability into the existing model routing and core documentation.

Highlights

New Documentation for Local Model Routing: Added comprehensive documentation explaining how to configure and use local Gemma models for routing decisions within the Gemini CLI. This feature is experimental and aims to reduce costs associated with hosted model usage.
Updated Model Selection Precedence: The model-routing.md document was updated to include local models in the model selection precedence list, clarifying their role when configured in settings.json.
Detailed Setup Instructions: The new local-model-routing.md file provides step-by-step instructions for setting up a local Gemma model, including downloading the LiteRT-LM runtime, pulling the Gemma model, starting the runtime, and verifying model serving across Windows, Linux, and macOS.
Configuration Schema for Local Routing: A detailed configuration schema for enabling local Gemma model routing in settings.json is provided, outlining required fields like enabled, classifier.host, and classifier.model.
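
The fields named in the last highlight can be pictured with a small shell sketch. It only prints a JSON fragment containing `enabled`, `classifier.host`, and `classifier.model`. The helper name `print_router_settings`, the example values, and the flat shape are assumptions for illustration. Where this object nests inside settings.json is not stated in this summary, so check local-model-routing.md for the actual schema.

```shell
# Hypothetical helper: print a JSON fragment with the three fields named in
# the PR summary (enabled, classifier.host, classifier.model).
# The enclosing key in settings.json is NOT shown because this summary does
# not state it; the host and model values passed in are placeholders.
print_router_settings() {
  host="$1"   # base URL of the locally running runtime (placeholder)
  model="$2"  # name of the locally served Gemma model (placeholder)
  printf '{\n'
  printf '  "enabled": true,\n'
  printf '  "classifier": {\n'
  printf '    "host": "%s",\n' "$host"
  printf '    "model": "%s"\n' "$model"
  printf '  }\n'
  printf '}\n'
}

# Example call with placeholder values:
#   print_router_settings "http://localhost:8080" "example-gemma-model"
```

The function writes to stdout rather than editing settings.json directly, so nothing is overwritten by accident.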

Changelog

docs/cli/model-routing.md
- Added a new section titled 'Local Model Routing (Experimental)' explaining the feature's purpose and benefits.
- Updated the 'Model selection precedence' list to include 'Local model (experimental)' as a routing option.
docs/core/index.md
- Added a new entry to the core documentation index linking to the 'Local Model Routing (experimental)' guide.
docs/core/local-model-routing.md
- Added a new document detailing the setup process for local Gemma model routing, including instructions for LiteRT-LM runtime download and model pulling.
- Included platform-specific instructions for Windows, Linux, and macOS for runtime setup and verification.
- Provided a JSON configuration example and schema for enabling the local Gemma model router in settings.json.

Activity

Updated relevant documentation and README (if needed).
Validated on macOS using npm run.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request adds documentation for the new experimental local model routing feature using Gemma. The changes are primarily in markdown files, introducing a new guide and updating existing ones. My review found a few critical and high-severity issues in the documentation that need to be addressed for accuracy and to prevent user confusion. Specifically, the model selection precedence has been updated incorrectly, a URL in the setup guide is broken, and the list of supported Gemma models is inconsistent with the implementation, which could lead to runtime errors for users following the guide.

Note: Security Review has been skipped due to the limited scope of the PR.

sidwan02

LGTM

allenhutchison

A few nitpicks after running through this on a personal macOS machine (Tahoe 26.4). Otherwise, everything worked great.

allenhutchison · 2026-03-12T16:42:03Z

docs/core/local-model-routing.md

+   [lit-macos-arm64](https://github.com/google-ai-edge/LiteRT-LM/releases/download/v0.9.0-alpha03/lit.macos_arm64).
+2. Ensure the binary is executable: `chmod a+x lit.macos_arm64`
+3. (Optional) Test starting the runtime: `./lit.macos_arm64 serve --verbose`
+


I ran this on a fresh macOS device today and got tripped up by macOS security settings. By default, macOS only allows binaries from "App Store and Known Developers", so when I tried to run the server it failed with a message that offered to move the binary to the Trash. I had to go to Settings -> Privacy & Security and click "Allow Anyway" under the notice that "lit.macos_arm64" was blocked to protect your Mac.

Added language to this effect. PTAL.
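
For readers who prefer the terminal, the Gatekeeper block described above can often be cleared by removing the quarantine attribute that macOS attaches to downloaded files. The sketch below is not taken from the PR. It assumes the block is caused by the `com.apple.quarantine` extended attribute, and the helper name `unquarantine` is hypothetical. Only use it on binaries you trust.

```shell
# Hypothetical helper: make a downloaded binary executable and, on macOS,
# remove the quarantine attribute that triggers the Gatekeeper prompt.
# Assumption: the block is caused by the com.apple.quarantine attribute.
unquarantine() {
  bin="$1"
  if [ ! -f "$bin" ]; then
    echo "not found: $bin" >&2
    return 1
  fi
  # Same permission change as step 2 of the guide.
  chmod a+x "$bin"
  if [ "$(uname -s)" = "Darwin" ]; then
    # xattr ships with macOS; -d deletes the named extended attribute.
    # Errors are ignored because the attribute may already be absent.
    xattr -d com.apple.quarantine "$bin" 2>/dev/null || true
  fi
  return 0
}

# Example call, using the file name from the guide:
#   unquarantine ./lit.macos_arm64
```

On Linux the function only applies the `chmod`, so the same snippet can be reused across platforms. The Settings -> Privacy & Security route remains the documented path.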

allenhutchison · 2026-03-12T16:50:57Z

Also looks like you need to rerun the lint to pass the CI.

mattKorwel

Tested across Mac, Windows, and Linux. 🎉 LGTM

douglas-reid · 2026-03-12T20:33:51Z

Updated to address comments and reran the format.

…el routing (#21365) Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com> Co-authored-by: Allen Hutchison <adh@google.com> Co-authored-by: matt korwel <matt.korwel@gmail.com>

douglas-reid added 2 commits March 5, 2026 19:01

docs(model routing): add details on experimental Gemma usage

e0d0d14

Add links to published release artifacts

2524dc8

douglas-reid requested review from a team as code owners March 6, 2026 03:41

gemini-code-assist bot reviewed Mar 6, 2026

douglas-reid added 2 commits March 5, 2026 19:49

clarify language in model-routing.md

53b02a6

remove other model references

6c98228

gemini-cli bot added the status/need-issue label (Pull requests that need to have an associated issue.) Mar 6, 2026

whhone reviewed Mar 6, 2026
