Add some dos and don'ts to behavioral evals README. #20629

Merged
gundermanc merged 2 commits into main from gundermanc/evals-readme
Mar 2, 2026

Conversation

@gundermanc
Member

Add best practices to README.

@gundermanc gundermanc requested a review from a team as a code owner February 27, 2026 23:12
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the documentation for behavioral evaluations by providing comprehensive best practices and a new guide for troubleshooting and fixing evaluations. Additionally, it improves the readability and conciseness of grep tool outputs for better LLM consumption and increases the content generation timeout for improved stability.

Highlights

  • Behavioral Evals Documentation: The evals/README.md has been significantly updated to include best practices for designing behavioral evaluations and detailed instructions on how to fix failing evaluations using the new /fix-behavioral-eval command.
  • Grep Tool Output Formatting: The output format for both the grep and ripGrep tools has been refined to be more concise and LLM-friendly. Line numbers are now grouped in the file header, and individual line prefixes have been removed, with '...' indicating non-consecutive matched lines.
  • Content Generator Timeout: The HTTP timeout for the content generation process has been increased to 180 seconds to accommodate potentially longer generation times.
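
To make the timeout highlight concrete, here is a minimal sketch of a 180-second HTTP timeout expressed as client options. `HttpOptionsSketch`, `CONTENT_GENERATOR_TIMEOUT_MS`, and `buildHttpOptions` are hypothetical names used only for illustration, and the `timeout` field name is an assumption; the actual change in packages/core/src/core/contentGenerator.ts is not shown in this conversation.

```typescript
// Hypothetical options shape; the real client options type may differ.
interface HttpOptionsSketch {
  timeout?: number; // milliseconds (assumed unit, matching the 180000 ms in the changelog)
  headers?: Record<string, string>;
}

// 3 minutes, written out so the unit conversion is visible.
const CONTENT_GENERATOR_TIMEOUT_MS = 3 * 60 * 1000;

// Returns a copy of the given options with the timeout applied.
// The input object is left unmodified.
function buildHttpOptions(base: HttpOptionsSketch = {}): HttpOptionsSketch {
  return { ...base, timeout: CONTENT_GENERATOR_TIMEOUT_MS };
}
```

Keeping the value in a named constant, rather than a bare `180000` literal, makes it easier to see at the call site that the unit is milliseconds.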
Changelog
  • evals/README.md
    • Expanded the introductory description of behavioral evaluations to include their role in assessing feature reliability and preventing regressions.
    • Added a new 'Best Practices' section with guidelines for designing effective behavioral evals, covering aspects like realistic complexity, maintainable size, unambiguous assertions, 'fail first' principle, and preferring fewer, more realistic tests.
    • Introduced a comprehensive 'Fixing Evaluations' section, detailing the /fix-behavioral-eval command's functionality (Investigate, Fix, Verify, Report) and providing usage examples and manual debugging tips.
  • packages/core/src/core/contentGenerator.ts
    • Increased the HTTP timeout for the GoogleGenAI client to 180000 milliseconds (3 minutes).
  • packages/core/src/tools/grep.test.ts
    • Updated expected llmContent in various grep tool tests to reflect the new, more concise output format, removing individual 'L:' prefixes.
  • packages/core/src/tools/grep.ts
    • Refactored the GrepToolInvocation to format grep results more concisely for LLM consumption, grouping matched line numbers in the file header and indicating non-consecutive lines with '...'.
  • packages/core/src/tools/ripGrep.test.ts
    • Updated expected llmContent in various ripgrep tool tests to reflect the new, more concise output format, removing individual 'L:' prefixes.
  • packages/core/src/tools/ripGrep.ts
    • Refactored the GrepToolInvocation (for ripgrep) to format results more concisely for LLM consumption, grouping matched line numbers in the file header and indicating non-consecutive lines with '...'.
Ignored Files
  • Ignored by pattern: .gemini/** (1)
    • .gemini/commands/fix-behavioral-eval.toml

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gundermanc gundermanc force-pushed the gundermanc/evals-readme branch from 21dab06 to 002b0dc February 27, 2026 23:13
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds documentation for behavioral evals and refactors the output of the grep and ripGrep tools to be more readable. It also adds a timeout to API calls. My review focuses on a code duplication issue in the grep and ripGrep tools. The new formatting logic is identical in both files and should be extracted into a shared utility function to improve maintainability.

I am having trouble creating individual review comments, so my feedback is listed below.

packages/core/src/tools/grep.ts (259-276)

Severity: high

This block of code for formatting the grep results is duplicated in packages/core/src/tools/ripGrep.ts. To improve maintainability and avoid potential inconsistencies in the future, this logic should be extracted into a shared utility function. This function could take matchesByFile as an argument and return the formatted string.

packages/core/src/tools/ripGrep.ts (279-296)

Severity: high

This code block is a duplicate of the formatting logic found in packages/core/src/tools/grep.ts. As mentioned in the other comment, this should be refactored into a shared utility function to adhere to the DRY (Don't Repeat Yourself) principle.
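
The shared utility suggested in the two comments above could look roughly like the sketch below. It is not the code from this pull request: the `GrepMatch` shape, the function name `formatGrepMatches`, the `File: ... (lines ...)` header layout, and the `---` separator between files are all assumptions made for illustration. Only the general behavior (line numbers grouped in the file header, no per-line prefixes, `...` between non-consecutive matches) comes from the summary.

```typescript
// Hypothetical match shape; the real types in grep.ts and ripGrep.ts may differ.
interface GrepMatch {
  lineNumber: number;
  line: string;
}

// Formats matches grouped by file. Line numbers are listed once in the file
// header, and '...' separates matched lines that are not consecutive.
function formatGrepMatches(matchesByFile: Record<string, GrepMatch[]>): string {
  const sections: string[] = [];

  for (const [filePath, matches] of Object.entries(matchesByFile)) {
    if (matches.length === 0) continue;

    // Sort a copy so the caller's array is not mutated.
    const sorted = [...matches].sort((a, b) => a.lineNumber - b.lineNumber);

    // Assumed header layout: file path followed by all matched line numbers.
    const lineList = sorted.map((m) => m.lineNumber).join(', ');
    const header = `File: ${filePath} (lines ${lineList})`;

    const body: string[] = [];
    sorted.forEach((match, i) => {
      // A gap of more than one line from the previous match gets a '...' marker.
      if (i > 0 && match.lineNumber > sorted[i - 1].lineNumber + 1) {
        body.push('...');
      }
      body.push(match.line);
    });

    sections.push([header, ...body].join('\n'));
  }

  // Assumed separator between files.
  return sections.join('\n---\n');
}
```

With a helper along these lines, both tool invocations would call one function, so a later format change would only need to be made (and tested) in one place.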

@github-actions

github-actions bot commented Feb 27, 2026

Size Change: -2 B (0%)

Total Size: 25.8 MB

| Filename | Size | Change |
| --- | --- | --- |
| ./bundle/gemini.js | 25.3 MB | -2 B (0%) |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/client/main.js | 221 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/src/_client-assets.js | 227 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/src/index.js | 11.5 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/src/types.js | 132 B | 0 B |
| ./bundle/sandbox-macos-permissive-open.sb | 890 B | 0 B |
| ./bundle/sandbox-macos-permissive-proxied.sb | 1.31 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-open.sb | 3.36 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-proxied.sb | 3.56 kB | 0 B |
| ./bundle/sandbox-macos-strict-open.sb | 4.82 kB | 0 B |
| ./bundle/sandbox-macos-strict-proxied.sb | 5.02 kB | 0 B |

compressed-size-action

@gemini-cli gemini-cli bot added the status/need-issue label (Pull requests that need to have an associated issue.) Feb 27, 2026
Contributor

@abhipatel12 abhipatel12 left a comment

LGTM

@gundermanc gundermanc enabled auto-merge March 2, 2026 23:02
@gundermanc gundermanc added this pull request to the merge queue Mar 2, 2026
Merged via the queue into main with commit 25f59a0 Mar 2, 2026
27 checks passed
@gundermanc gundermanc deleted the gundermanc/evals-readme branch March 2, 2026 23:24
BryanBradfo pushed a commit to BryanBradfo/gemini-cli that referenced this pull request Mar 5, 2026
struckoff pushed a commit to struckoff/gemini-cli that referenced this pull request Mar 6, 2026
liamhelmer pushed a commit to badal-io/gemini-cli that referenced this pull request Mar 12, 2026
warrenzhu25 pushed a commit to warrenzhu25/gemini-cli that referenced this pull request Apr 9, 2026