ci(evals): only run evals in CI if prompts or tools changed #20898
gundermanc merged 7 commits into main
Conversation
Size Change: -2 B (0%) Total Size: 25.8 MB
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request refines the continuous integration process by implementing conditional execution for nightly evaluation tests: these resource-intensive tests now run only when changes that directly impact prompts or tools are detected, preventing unnecessary CI blockages, streamlining the development workflow, and improving CI efficiency.
Code Review
This pull request introduces a script to conditionally run CI evaluations based on whether prompt or tool files have been modified. The logic is sound, defaulting to running evaluations if any error occurs. However, I've identified a potential issue where the script hardcodes the main branch for comparison, which could lead to incorrect behavior on PRs targeting other branches. My review includes a suggestion to make the script more robust by dynamically determining the target branch from CI environment variables.
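The review's suggestion could look roughly like the following sketch (the function name and fallback are illustrative, not the PR's actual code; `GITHUB_BASE_REF` is the variable GitHub Actions sets on `pull_request` events):

```javascript
// Sketch of the reviewer's suggestion: resolve the comparison branch from
// CI environment variables instead of hardcoding 'main'. GITHUB_BASE_REF
// is populated by GitHub Actions on pull_request events; the 'main'
// fallback covers local runs and other contexts.
function resolveTargetBranch(env = process.env) {
  return env.GITHUB_BASE_REF || 'main';
}
```

With this, a PR targeting a release branch would be diffed against that branch rather than `main`.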
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
scripts/changed_prompt.js
Outdated
    stdio: 'ignore',
  });

  // Find the merge base with main
it looks like the code should find the merge base with the target branch instead of main?
 */
import { execSync } from 'node:child_process';

const EVALS_FILE_PREFIXES = [
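The prefix list drives a simple matching step, roughly like this sketch (the prefix values here are hypothetical; the real list lives in `scripts/changed_prompt.js` and may differ):

```javascript
// Hypothetical prefix values for illustration only.
const EXAMPLE_PREFIXES = [
  'packages/core/src/prompts/',
  'packages/core/src/tools/',
];

// Return true if any changed path falls under an evals-relevant prefix.
function touchesEvals(changedPaths, prefixes = EXAMPLE_PREFIXES) {
  return changedPaths.some((path) =>
    prefixes.some((prefix) => path.startsWith(prefix)),
  );
}
```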
consider adding the evals/ directory too
Ok, added. Note that as written, only the ALWAYS_PASSES evals will end up getting run in CI; breaking changes to USUALLY_PASSES ones will not be caught.
I guess we could make it smart enough to compute the delta, but I'd rather aspire to stabilizing as many tests as possible and running them during the CI by default.
…emini#20898) Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Update the CI to require nightly evals to pass, but only when making changes to prompts and tools.
This change comes in the wake of an issue where the CI was blocked on evals this morning. I temporarily removed the block: #20870
I then investigated, and identified and fixed this regression, which caused the failure: #20890
Now that the tests are once again passing, I am making them required, but only for changes to prompts and tools, to minimize the impact of future failures.
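The fail-open behavior noted in the review (default to running evaluations if any error occurs) can be sketched as follows; the wrapper name is illustrative, not the script's actual API:

```javascript
// Fail-open wrapper: if the changed-file check itself throws (shallow
// clone, missing remote, git not available, etc.), err on the side of
// running the evals rather than silently skipping them.
function shouldRunEvals(check) {
  try {
    return check();
  } catch {
    return true;
  }
}
```

This way a broken check can only cost extra eval runs, never a missed regression.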