Fix GeminiCoder CLI invocation to use positional arguments #38
…es by using consistent printing methods.
…I calls fail, and to DummyMetric if Claude fails.
… logging in runner.py. Added test configuration to support log capture for assertions that downgrade was successful.
… logging in runner.py. Added test configuration to support log capture for assertions that downgrade was successful. Addressed ruff warnings.
…ric after the downgrade.
… verbosity temporarily to debug Claude judge unit test on build server. Adjusted logic to work when multiple coders are specified. Improved log messages.
…ic to DummyMetric.
… for the quota exhaustion fallback logic.
…ic downgrade to DummyMetric on quota check failure. Added notes on potential improvements to unit tests.
…to avoid default encoding errors.
…herwise, create an empty file as UTF-8. Partially addresses Issue #24.
…verwrite. Enforced UTF-8 encoding, switched to safe_dump and added document delimiter between records. Also simplified document generation. Fixes issue #24. Added second test case to literature_mcp_encoding_test.yaml for testing.
… on Windows (where the default codepage is cp1252).
… config. Fixes Issue #27.
…o address validation errors in test suite.
…aude-sonnet-4-20250514.
…lity improvement to support fix for Issue #29. Adding as individual commit in case it needs to be rolled back.
…g the system keyring for secrets (GOOSE_DISABLE_KEYRING).
…stead of unix-specific "." to specify current working directory.
…ble to avoid interfering with unix environment (shell history, etc.). Separate commit in case this needs to be rolled back.
…ath is not being referenced, but rather a server combination.
The gemini CLI (and potentially other CLIs) were hanging because they wait for stdin input. This fix adds stdin=PIPE to the Popen call and immediately closes it, which is equivalent to redirecting from /dev/null. This prevents any subprocess from blocking while waiting for stdin input, without affecting functionality since run_process() doesn't support passing stdin data - all coders pass prompts via command-line arguments. Fixes the hanging issue reported with GeminiCoder.
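The stdin handling described in this commit can be sketched roughly as follows. This is a minimal illustration and not the project's actual run_process(); the helper name run_without_stdin and its timeout parameter are assumptions made for the example, and a small Python child process stands in for the gemini binary.

```python
import subprocess
import sys


def run_without_stdin(command, timeout=60):
    """Run a command whose stdin is closed right away (hypothetical helper)."""
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # Closing the pipe means any read from stdin in the child hits
    # end-of-file at once, much like redirecting from /dev/null.
    proc.stdin.close()
    # Drop the reference so communicate() does not touch the closed pipe.
    proc.stdin = None
    stdout, stderr = proc.communicate(timeout=timeout)
    return proc.returncode, stdout, stderr


# Example: the child tries to read all of stdin and sees EOF immediately.
code, out, _ = run_without_stdin(
    [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
)
```

Passing stdin=subprocess.DEVNULL to Popen would be another way to get a similar effect; the sketch keeps the PIPE-and-close approach because that is what the commit message describes.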
Claude finished @justaddcoffee's task
PR Code Review: Fix GeminiCoder CLI invocation ✅
Conducting comprehensive code review focusing on code quality, potential bugs, performance, security, and test coverage.
Review Checklist:
🎯 Overall Assessment: APPROVED
This is a high-quality PR that addresses a critical bug while introducing valuable evaluation framework enhancements. The changes are well-implemented and follow project best practices.
✅ Core Gemini CLI Fix Analysis
The main fix in …
Before: command = ["sh", "-c", f'echo "{text}" | gemini']
After: command = ["gemini", text]
Benefits:
🔍 Security Analysis
Excellent security improvements:
🧪 Test Coverage Assessment
Strong test coverage:
ClaudeJudge Fallback Testing:
Tests verify graceful degradation at each level with proper error handling and logging.
🚀 Evaluation Framework Enhancements
Excellent additions:
⚡ Performance Considerations
📋 Code Quality
Excellent adherence to project standards:
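The difference between the Before and After invocations quoted in this review can be illustrated with a short sketch. The prompt value is made up for the example, and a tiny Python child process stands in for the gemini binary so that the snippet runs without the CLI installed; it only shows that an argument-list entry reaches the child unchanged.

```python
import subprocess
import sys

# A prompt containing shell metacharacters (illustrative value only).
text = 'What is "2+2"? $(echo injected)'

# "Before" form: the prompt is spliced into a shell string, so the quotes
# and $(...) inside it would be interpreted by sh rather than passed through.
before = ["sh", "-c", f'echo "{text}" | gemini']

# "After" form: the prompt is a single argv entry and no shell is involved.
after = ["gemini", text]

# Stand-in for the real CLI: a child that prints its first argument as-is.
echo_child = [sys.executable, "-c", "import sys; print(sys.argv[1])", text]
result = subprocess.run(echo_child, capture_output=True, text=True, check=True)
received = result.stdout.rstrip("\n")
```

Here received equals text exactly, whereas the shell form would have expanded the $(...) part before gemini ever saw it.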
- Add -m flag support to pass model parameter to gemini CLI
- Follows same pattern as other coders (e.g., opencode)
- Tested with gemini-2.5-flash model
When MCP servers are configured in .gemini/settings.json, the gemini CLI 0.18.0 ignores positional arguments and enters interactive mode instead of processing the prompt. This causes the evaluation to hang indefinitely.
The fix changes from using a positional argument:
gemini "prompt text"
to using the -p flag:
gemini -p "prompt text"
This workaround allows gemini to correctly process prompts even when MCP servers are configured. Tested with artl-mcp server configured and verified that prompts are now processed correctly instead of hanging.
Claude finished @justaddcoffee's task
PR Code Review: Fix GeminiCoder CLI invocation ✅
Conducting comprehensive code review focusing on code quality, potential bugs, performance, security, and test coverage.
Review Checklist:
🎯 Overall Assessment: APPROVED
This is a high-quality PR that addresses a critical bug while introducing valuable evaluation framework enhancements. The changes are well-implemented and follow project best practices.
✅ Core Gemini CLI Fix Analysis
The main fix in …
Before: command = ["sh", "-c", f'echo "{text}" | gemini']
After:
command = ["gemini"]
# Add model parameter if specified
if self.params and self.params.get("model"):
    command.extend(["-m", self.params["model"]])
# Use -p flag for prompt (works with MCP servers, positional doesn't)
command.extend(["-p", text])
Benefits:
🔍 Security Analysis
Excellent security improvements:
🧪 Test Coverage Assessment
Strong test coverage:
ClaudeJudge Fallback Testing:
Tests verify graceful degradation at each level with proper error handling and logging.
🚀 Evaluation Framework Enhancements
Excellent additions:
⚡ Performance Considerations
📋 Code Quality
Excellent adherence to project standards:
Gemini CLI restricts MCP tool file access to workspace directories. Without --include-directories, MCP tools like read_file fail with:
"File path must be within one of the workspace directories"
This causes gemini to ask for clarification instead of completing tasks in headless mode with the -p flag.
Solution: Pass --include-directories pointing to the workdir, allowing MCP tools to access files in the evaluation workspace.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Claude finished @justaddcoffee's task
PR Code Review: Fix GeminiCoder CLI invocation ✅
Conducting comprehensive code review focusing on code quality, potential bugs, performance, security, and test coverage.
Review Checklist:
🎯 Overall Assessment: APPROVED
This is a high-quality PR that addresses a critical bug while introducing valuable evaluation framework enhancements. The changes are well-implemented and follow project best practices.
✅ Core Gemini CLI Fix Analysis
The main fix in …
Before: command = ["sh", "-c", f'echo "{text}" | gemini']
After:
command = ["gemini"]
# Add model parameter if specified
if self.params and self.params.get("model"):
    command.extend(["-m", self.params["model"]])
# Add workspace directory so MCP tools can access files
command.extend(["--include-directories", str(Path.cwd())])
# Use -p flag for prompt (works with MCP servers, positional doesn't)
command.extend(["-p", text])
Benefits:
🔍 Security Analysis
Excellent security improvements:
🧪 Test Coverage Assessment
Strong test coverage:
ClaudeJudge Fallback Testing:
Tests in …
🚀 Evaluation Framework Enhancements
Excellent additions:
⚡ Performance Considerations
📋 Code Quality
Excellent adherence to project standards:
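The command construction quoted in this review can be pulled out into a standalone function so that it is easy to exercise in isolation. This is only a sketch under assumptions: the function name build_gemini_command and the workdir parameter are hypothetical and not taken from the project, while the -m, --include-directories, and -p flags are the ones named in the commits above.

```python
from pathlib import Path
from typing import Optional


def build_gemini_command(
    text: str,
    params: Optional[dict] = None,
    workdir: Optional[Path] = None,
) -> list:
    """Assemble the gemini argv list (hypothetical standalone helper)."""
    command = ["gemini"]
    # Add model parameter if specified.
    if params and params.get("model"):
        command.extend(["-m", params["model"]])
    # Add workspace directory so MCP tools can access files there;
    # falls back to the current directory, as in the quoted snippet.
    command.extend(["--include-directories", str(workdir or Path.cwd())])
    # Pass the prompt with -p, as in the quoted snippet.
    command.extend(["-p", text])
    return command


example = build_gemini_command(
    "What is 2+2?",
    params={"model": "gemini-2.5-flash"},
    workdir=Path("work"),
)
```

Keeping the list construction free of any subprocess call makes it possible to assert on the exact arguments in a unit test without having the gemini CLI installed.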
The gemini CLI was running in interactive mode by default, waiting for additional input after processing the prompt. Adding --output-format text enables non-interactive/headless mode, allowing gemini to exit after completing the prompt. This fixes the evaluation hanging issue where gemini would output 'Okay, my setup is complete. I'm ready for your first command.' and wait indefinitely for stdin input.
The gemini CLI was hanging indefinitely because it was entering interactive mode when invoked with the -p flag. This fix switches to using positional arguments instead, which causes gemini to exit after processing the prompt.
Changes:
- Modified gemini.py to pass prompt as positional argument
- Removed -p flag usage
- Updated log message to reflect new approach
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
The CodexCoder class was incorrectly using the claude CLI command instead of the codex CLI. This fixes the command to properly invoke codex with the correct arguments:
- codex exec --json --dangerously-bypass-approvals-and-sandbox
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
- Add supports_mcp() classmethod returning True
- Add default_config_paths() for AGENTS.md and .codex/config.toml
- Add mcp_config_to_codex_format() to convert MCPConfig
- Add _generate_toml_config() to create TOML config string
- Update default_config_objects() to create .codex/config.toml with MCP servers
- Update docstring with proper Codex documentation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
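The TOML generation step listed above might look roughly like the sketch below. It is not the project's _generate_toml_config(): the function generate_toml_config, the plain-dict input shape, and the example server values are all assumptions, and the [mcp_servers.<name>] table layout with command, args, and env keys reflects an assumed Codex config format that should be checked against the Codex documentation.

```python
import json


def toml_string(value: str) -> str:
    # JSON string syntax is close enough to a TOML basic string for the
    # simple values used here (assumption: no unusual control characters).
    return json.dumps(value, ensure_ascii=False)


def generate_toml_config(servers: dict) -> str:
    """Render MCP server entries as TOML text (hypothetical helper)."""
    lines = []
    for name, spec in servers.items():
        # One table per server; name is assumed to be a valid bare TOML key.
        lines.append(f"[mcp_servers.{name}]")
        lines.append(f"command = {toml_string(spec['command'])}")
        args = ", ".join(toml_string(a) for a in spec.get("args", []))
        lines.append(f"args = [{args}]")
        env = spec.get("env") or {}
        if env:
            pairs = ", ".join(
                f"{toml_string(k)} = {toml_string(v)}" for k, v in env.items()
            )
            lines.append(f"env = {{ {pairs} }}")
        lines.append("")
    return "\n".join(lines)


# Illustrative input only; the command and args are made up for the example.
config_text = generate_toml_config(
    {"artl-mcp": {"command": "uvx", "args": ["artl-mcp"]}}
)
```

Writing the string by hand avoids a dependency on a TOML writer library; for anything beyond flat string values, a real TOML serializer would be the safer choice.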
Remove HOME=. override that was breaking authentication. Codex CLI reads .codex/config.toml from the current directory automatically, so overriding HOME is unnecessary and prevents codex from accessing OpenAI API credentials.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
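The effect of dropping the HOME override can be shown with a small sketch of how a child environment might be assembled. The helper name build_env and the EXAMPLE_FLAG variable are hypothetical, and a Python child process stands in for the codex CLI; the point is only that the child inherits the parent's real HOME when nothing overrides it.

```python
import os
import subprocess
import sys
from typing import Optional


def build_env(extra: Optional[dict] = None) -> dict:
    """Copy the parent environment and layer overrides on top (hypothetical)."""
    env = os.environ.copy()
    env.update(extra or {})
    # Deliberately no env["HOME"] = "." here: the child keeps the real home
    # directory, so credentials stored under it stay reachable.
    return env


env = build_env({"EXAMPLE_FLAG": "1"})
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('HOME', ''))"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
child_home = child.stdout.strip()
```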

Problem
GeminiCoder was invoking the gemini CLI incorrectly using echo "prompt" | gemini, which causes the command to fail with exit status 1. The gemini CLI expects the prompt as a positional argument, not via stdin.
This caused all 100 Gemini evaluation tests to fail in the literature MCP evaluation project.
Solution
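Changed the command invocation from a shell pipeline that echoes the prompt into gemini (command = ["sh", "-c", f'echo "{text}" | gemini']) to a direct call that passes the prompt as a positional argument (command = ["gemini", text]).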
Benefits
… .gemini/settings.json are still loaded correctly
Testing
Verified that:
gemini "What is 2+2?" returns "4"
Related
This PR builds on #37 (custom GEval support) and is needed for the literature MCP evaluation project to successfully test Gemini 2.5 Flash.