docs(local model routing): add docs on how to use Gemma for local model routing #21365

Merged
allenhutchison merged 10 commits into google-gemini:main from douglas-reid:gemma/local-model-routing-docs
Mar 12, 2026

@douglas-reid
Contributor

Summary

Adds docs for the experimental feature of Gemma Model Routing.

Details

Related Issues

How to Validate

  1. Run through the steps of the local-model-routing.md for your architecture.
  2. Ensure that you see the routing requests hit your local endpoint.
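Before looking for routing requests, it can help to confirm that something is listening on the local endpoint at all. The following is a minimal Python sketch, not part of this PR: the function name and the example URL are hypothetical, and the real host is whatever the guide tells you to put in `classifier.host`. It only checks that the port accepts TCP connections; the routing requests themselves would still need to be observed in the runtime's own output.

```python
import socket
from urllib.parse import urlparse


def endpoint_is_listening(host_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host_url's host and port succeeds."""
    parsed = urlparse(host_url)
    if parsed.hostname is None:
        raise ValueError(f"not a URL with a host: {host_url!r}")
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unresolvable.
        return False


if __name__ == "__main__":
    # Hypothetical URL; substitute the host configured for the local classifier.
    url = "http://localhost:8080"
    print(url, "listening" if endpoint_is_listening(url) else "not reachable")
```

A `False` result usually means the runtime is not started or is bound to a different port than the one in the settings file.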

Pre-Merge Checklist

  • [ X ] Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • [ X ] macOS
      • [ X ] npm run
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker

@douglas-reid douglas-reid requested review from a team as code owners March 6, 2026 03:41
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces extensive documentation for an experimental feature in the Gemini CLI: local model routing using Gemma. The changes enable users to leverage locally-running Gemma models for routing decisions, potentially reducing costs and offering comparable latency to hosted models. The new documentation covers setup, configuration, and validation steps, integrating this new capability into the existing model routing and core documentation.

Highlights

  • New Documentation for Local Model Routing: Added comprehensive documentation explaining how to configure and use local Gemma models for routing decisions within the Gemini CLI. This feature is experimental and aims to reduce costs associated with hosted model usage.
  • Updated Model Selection Precedence: The model-routing.md document was updated to include local models in the model selection precedence list, clarifying their role when configured in settings.json.
  • Detailed Setup Instructions: The new local-model-routing.md file provides step-by-step instructions for setting up a local Gemma model, including downloading the LiteRT-LM runtime, pulling the Gemma model, starting the runtime, and verifying model serving across Windows, Linux, and macOS.
  • Configuration Schema for Local Routing: A detailed configuration schema for enabling local Gemma model routing in settings.json is provided, outlining required fields like enabled, classifier.host, and classifier.model.
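As a rough illustration of the required fields named in the last highlight, here is a minimal Python sketch that checks a routing config for `enabled`, `classifier.host`, and `classifier.model`. The function name, the example values, and the assumption that `classifier` is a nested object are illustrative and not taken from the PR; where this block sits inside settings.json is defined by the guide itself.

```python
from typing import Any


def missing_routing_fields(config: dict[str, Any]) -> list[str]:
    """Return dotted names of required fields that are absent or of the wrong type."""
    problems: list[str] = []
    # `enabled` is assumed to be a boolean switch.
    if not isinstance(config.get("enabled"), bool):
        problems.append("enabled")
    classifier = config.get("classifier")
    if not isinstance(classifier, dict):
        classifier = {}
    # `host` and `model` are assumed to be non-empty strings.
    for key in ("host", "model"):
        value = classifier.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"classifier.{key}")
    return problems


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    example = {
        "enabled": True,
        "classifier": {"host": "http://localhost:8080", "model": "example-gemma-model"},
    }
    print(missing_routing_fields(example))
```

An empty list means all three fields are present; it says nothing about whether the host is reachable or the model name is one the runtime actually serves.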

Changelog

  • docs/cli/model-routing.md
    • Added a new section titled 'Local Model Routing (Experimental)' explaining the feature's purpose and benefits.
    • Updated the 'Model selection precedence' list to include 'Local model (experimental)' as a routing option.
  • docs/core/index.md
    • Added a new entry to the core documentation index linking to the 'Local Model Routing (experimental)' guide.
  • docs/core/local-model-routing.md
    • Added a new document detailing the setup process for local Gemma model routing, including instructions for LiteRT-LM runtime download and model pulling.
    • Included platform-specific instructions for Windows, Linux, and macOS for runtime setup and verification.
    • Provided a JSON configuration example and schema for enabling the local Gemma model router in settings.json.

Activity

  • Updated relevant documentation and README (if needed).
  • Validated on macOS using npm run.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds documentation for the new experimental local model routing feature using Gemma. The changes are primarily in markdown files, introducing a new guide and updating existing ones. My review found a few critical and high-severity issues in the documentation that need to be addressed for accuracy and to prevent user confusion. Specifically, the model selection precedence has been updated incorrectly, a URL in the setup guide is broken, and the list of supported Gemma models is inconsistent with the implementation, which could lead to runtime errors for users following the guide.

Note: Security Review has been skipped due to the limited scope of the PR.

@gemini-cli gemini-cli bot added the status/need-issue Pull requests that need to have an associated issue. label Mar 6, 2026
Contributor

@sidwan02 sidwan02 left a comment

LGTM

@douglas-reid douglas-reid force-pushed the gemma/local-model-routing-docs branch from 58eb42c to bde3299 Compare March 11, 2026 16:44
Contributor

@allenhutchison allenhutchison left a comment

A few nitpicks after running through this on a personal macOS machine (Tahoe 26.4). Otherwise everything worked great.

[lit-macos-arm64](https://github.com/google-ai-edge/LiteRT-LM/releases/download/v0.9.0-alpha03/lit.macos_arm64).
2. Ensure the binary is executable: `chmod a+x lit.macos_arm64`
3. (Optional) Test starting the runtime: `./lit.macos_arm64 serve --verbose`

Contributor

I ran this on a fresh macOS device today and got tripped up by the macOS security settings. By default, macOS only allows binaries from "App Store and Known Developers", so when I tried to run the server it failed with a message that offered to move the binary to the trash. I had to go to Settings -> Privacy & Security and click "Allow Anyway" under the notice that "lit.macos_arm64" was blocked to protect your Mac.

Contributor Author

Added language to this effect. PTAL.

@allenhutchison
Contributor

Also looks like you need to rerun the lint to pass the CI.

@mattKorwel mattKorwel removed the status/need-issue Pull requests that need to have an associated issue. label Mar 12, 2026
Collaborator

@mattKorwel mattKorwel left a comment

Tested across Mac, Windows, and Linux. 🎉 LGTM

@gemini-cli gemini-cli bot added the status/need-issue Pull requests that need to have an associated issue. label Mar 12, 2026
@douglas-reid
Contributor Author

Updated to address comments and reran the format.

@allenhutchison allenhutchison added this pull request to the merge queue Mar 12, 2026
Merged via the queue into google-gemini:main with commit 5abc170 Mar 12, 2026
26 of 27 checks passed
ruomengz pushed a commit that referenced this pull request Mar 13, 2026
…el routing (#21365)

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Allen Hutchison <adh@google.com>
Co-authored-by: matt korwel <matt.korwel@gmail.com>
SUNDRAM07 pushed a commit to SUNDRAM07/gemini-cli that referenced this pull request Mar 30, 2026
…el routing (google-gemini#21365)

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Allen Hutchison <adh@google.com>
Co-authored-by: matt korwel <matt.korwel@gmail.com>
warrenzhu25 pushed a commit to warrenzhu25/gemini-cli that referenced this pull request Apr 9, 2026
…el routing (google-gemini#21365)

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Allen Hutchison <adh@google.com>
Co-authored-by: matt korwel <matt.korwel@gmail.com>
Labels

status/need-issue Pull requests that need to have an associated issue.

5 participants