-
Notifications
You must be signed in to change notification settings - Fork 0
CI: fix T-pass under-emission (MultiEdit, strict prompts, coverage assert, Sonnet retry) #82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: fix T-pass under-emission (MultiEdit, strict prompts, coverage assert, Sonnet retry) #82
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the ✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR implements a comprehensive CI workflow optimization for the Unity MCP bridge project, addressing 'T-pass under-emission' issues through multiple strategic improvements. The core change splits the test execution into separate passes: a Natural Language (NL) pass handling tests NL-0 through NL-4, and T passes split into two batches (A-E and F-J) using different AI models for better coverage.
The changes introduce several new utilities and optimizations:
-
Test Coverage Validation: A new bash script
validate-nlt-coverage.shvalidates that all expected test fragment XML files are generated, preventing incomplete test runs from passing CI. -
Payload Optimization: Multiple tools now return minimal data by default -
read_resourcereturns only metadata unless text is explicitly requested,find_in_filereturns position coordinates instead of text excerpts,validate_scriptreturns diagnostic counts instead of full details, andget_shareturns only essential fields. -
AI Tool Permissions: New Claude settings in
.claude/settings.jsonestablish security boundaries by allowing only Unity MCP tools and MultiEdit operations on reports directories while blocking potentially dangerous operations. -
Conversation Data Management: A new
prune_tool_results.pyutility compresses verbose tool results into concise summaries, reducing log file sizes for CI workflows while preserving essential information. -
Enhanced Error Handling: The script editing system now includes early structural validation to catch unbalanced braces before expensive validation steps.
-
Test Suite Restructuring: The original combined NL/T test suite has been split into separate prompt files, with the T suite now assuming completion of the NL pass and focusing exclusively on T-A through T-J tests.
These changes collectively create a more robust, efficient, and reliable CI testing environment that better handles AI-assisted Unity development workflows while reducing token consumption and improving test coverage validation.
Changed Files
| Filename | Score | Overview |
|---|---|---|
| scripts/validate-nlt-coverage.sh | 5/5 | New bash script that validates all expected NL and T test fragment XML files are present in reports directory |
| UnityMcpBridge/Editor/Tools/ManageScript.cs | 5/5 | Added structural balance check guard to prevent unbalanced braces from reaching validation |
| prune_tool_results.py | 4/5 | New utility that compresses verbose tool_result JSON content into concise summaries for CI workflows |
| UnityMcpBridge/UnityMcpServer~/src/tools/manage_script.py | 4/5 | Modified validate_script and get_sha functions to return simplified response formats with essential data only |
| tests/test_get_sha.py | 5/5 | Added assertion to verify get_sha returns expected data structure with sha256 and lengthBytes fields |
| .claude/settings.json | 5/5 | New Claude permissions configuration allowing Unity MCP tools and MultiEdit for reports while blocking other operations |
| tests/test_find_in_file_minimal.py | 4/5 | New minimal test validating find_in_file functionality returns correct position information for pattern matches |
| .github/workflows/claude-nl-suite.yml | 4/5 | Restructured workflow splitting tests into NL and T passes with strict prompts, coverage assertions, and Sonnet retry |
| .claude/prompts/nl-unity-suite-nl.md | 4/5 | New comprehensive Unity NL test suite prompt with strict XML formatting and additive test design |
| UnityMcpBridge/UnityMcpServer~/src/tools/resource_tools.py | 4/5 | Modified read_resource to default to metadata-only responses and restructured find_in_file for position-based results |
| README-DEV.md | 4/5 | Added documentation for new tool result pruning utility and updated CI workflow information |
| .claude/prompts/nl-unity-suite-t.md | 4/5 | Renamed and transformed from combined NL/T suite to focus only on T tests building on NL pass state |
| tests/test_read_console_truncate.py | 4/5 | New comprehensive unit tests for read_console functionality testing both full and truncated output modes |
| tests/test_validate_script_summary.py | 3/5 | New test validating validate_script tool correctly counts diagnostic severity levels, with potential logic issue in assertion |
| tests/test_read_resource_minimal.py | 4/5 | New test validating read_resource tool's metadata-only functionality without loading full content |
| UnityMcpBridge/UnityMcpServer~/src/tools/read_console.py | 4/5 | Modified defaults to only 'error' messages in 'json' format and added data trimming when stacktraces disabled |
Confidence score: 4/5
- This PR is generally safe to merge with some areas requiring attention due to complex workflow changes and potential test logic issues
- Score reflects comprehensive improvements to CI reliability but complexity in test splitting and some untested edge cases in new utilities
- Pay close attention to tests/test_validate_script_summary.py for potential assertion logic mismatch and the GitHub workflow changes for proper CI execution
Sequence Diagram
sequenceDiagram
participant User
participant GitHub Actions
participant Unity Container
participant MCP Server
participant Claude AI
participant Reports
User->>GitHub Actions: Trigger workflow_dispatch
GitHub Actions->>GitHub Actions: Check secrets (Unity, Anthropic)
GitHub Actions->>Unity Container: Start headless Unity with MCP bridge
Unity Container->>GitHub Actions: Bridge ready on port
GitHub Actions->>MCP Server: Install and configure MCP server
GitHub Actions->>GitHub Actions: Create report skeletons
GitHub Actions->>Claude AI: Run NL pass (tests NL-0 to NL-4)
Claude AI->>MCP Server: Execute Unity MCP tools
MCP Server->>Unity Container: Apply script edits
Unity Container->>MCP Server: Validate and compile
MCP Server->>Claude AI: Return results
Claude AI->>Reports: Write NL test fragments
GitHub Actions->>Claude AI: Run T pass A-E (tests T-A to T-E)
Claude AI->>MCP Server: Execute structured edits
MCP Server->>Unity Container: Apply method operations
Unity Container->>MCP Server: Validate changes
Claude AI->>Reports: Write T-A through T-E fragments
GitHub Actions->>Claude AI: Run T pass F-J (tests T-F to T-J)
Claude AI->>MCP Server: Execute atomic operations
Claude AI->>Reports: Write T-F through T-J fragments
GitHub Actions->>GitHub Actions: Assert T coverage
alt Coverage incomplete
GitHub Actions->>Claude AI: Retry with Sonnet model
Claude AI->>Reports: Write missing test fragments
end
GitHub Actions->>Reports: Canonicalize testcase names
GitHub Actions->>Reports: Backfill missing tests as failures
GitHub Actions->>Reports: Merge fragments into JUnit
GitHub Actions->>GitHub Actions: Publish JUnit report
GitHub Actions->>Unity Container: Stop Unity container
GitHub Actions->>User: Upload artifacts and summary
16 files reviewed, 10 comments
| @@ -0,0 +1,58 @@ | |||
| #!/usr/bin/env python3 | |||
| import sys, json, re | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Import re module is unused and can be removed
| import sys, json, re | |
| import sys, json |
| lines = data["lines"] or [] | ||
| lvls = {"info":0,"warning":0,"error":0} | ||
| for L in lines: | ||
| lvls[L.get("level","" ).lower()] = lvls.get(L.get("level","" ).lower(),0)+1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Complex dictionary lookup logic could cause KeyError. The nested .get() calls are redundant since lvls is initialized with all keys
| loop = asyncio.new_event_loop() | ||
| try: | ||
| resp = loop.run_until_complete( | ||
| find_in_file(uri="unity://path/Assets/A.txt", pattern="world", ctx=None, project_root=str(proj)) | ||
| ) | ||
| finally: | ||
| loop.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: manually managing event loop is unnecessary here - use pytest-asyncio or asyncio.run() for cleaner async test execution
| loop = asyncio.new_event_loop() | |
| try: | |
| resp = loop.run_until_complete( | |
| find_in_file(uri="unity://path/Assets/A.txt", pattern="world", ctx=None, project_root=str(proj)) | |
| ) | |
| finally: | |
| loop.close() | |
| resp = asyncio.run( | |
| find_in_file(uri="unity://path/Assets/A.txt", pattern="world", ctx=None, project_root=str(proj)) | |
| ) |
| mcp_config: .claude/mcp.json | ||
| settings: .claude/settings.json | ||
| allowed_tools: "mcp__unity,Edit(reports/**),MultiEdit(reports/**)" | ||
| disallowed_tools: "Bash,MultiEdit(/!(reports/**)),WebFetch,WebSearch,Task,TodoWrite,NotebookEdit,NotebookRead" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Complex negation pattern /!(reports/**) may not work as expected in all contexts - consider testing this exclusion syntax
| - Per‑test `system-out` ≤ 400 chars: brief status only (no SHA). | ||
| - Console evidence: fetch the last 10 lines with `include_stacktrace:false` and include ≤ 3 lines in the fragment. | ||
| - Avoid quoting multi‑line diffs; reference markers instead. | ||
| — Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: This line appears to have inconsistent formatting with an em dash instead of a hyphen, unlike other bullet points in the section
| — Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors". | |
| - Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors". |
| - Use this exact format: | ||
|
|
||
| ```xml | ||
| <testcase name="NL-0 — Baseline State Capture" classname="UnityMCP.NL-T"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: The classname 'UnityMCP.NL-T' suggests this template was copied from the T-suite prompt and may need to be updated to 'UnityMCP.NL' for consistency
| <testcase name="NL-0 — Baseline State Capture" classname="UnityMCP.NL-T"> | |
| <testcase name="NL-0 — Baseline State Capture" classname="UnityMCP.NL"> |
| - For `anchor_insert`, always set `"position": "before"` or `"after"`. | ||
| - **Precise ranges / atomic batch**: `mcp__unity__apply_text_edits` (non‑overlapping ranges) | ||
| STRICT OP GUARDRAILS | ||
| - Do not use `anchor_replace`. Structured edits must be one of: `anchor_insert`, `replace_method`, `insert_method`, `delete_method`, `regex_replace`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Missing 'STRICT OP GUARDRAILS' header formatting - this line should likely be formatted as a header like the surrounding sections
| resp = validate_script(None, uri="unity://path/Assets/Scripts/A.cs") | ||
| assert resp == {"success": True, "data": {"warnings": 1, "errors": 2}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: The assertion expects 2 errors but mock data includes both 'error' and 'fatal' severities. Verify if fatal should count as error or be handled separately.
| loop = asyncio.new_event_loop() | ||
| try: | ||
| resp = loop.run_until_complete( | ||
| read_resource(uri="unity://path/Assets/A.txt", ctx=None, project_root=str(proj)) | ||
| ) | ||
| finally: | ||
| loop.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: manually creates and closes event loop instead of using asyncio.run() which is the modern approach
| loop = asyncio.new_event_loop() | |
| try: | |
| resp = loop.run_until_complete( | |
| read_resource(uri="unity://path/Assets/A.txt", ctx=None, project_root=str(proj)) | |
| ) | |
| finally: | |
| loop.close() | |
| resp = asyncio.run( | |
| read_resource(uri="unity://path/Assets/A.txt", ctx=None, project_root=str(proj)) | |
| ) |
| if isinstance(resp, dict) and resp.get("success") and not include_stacktrace: | ||
| lines = resp.get("data", {}).get("lines", []) | ||
| trimmed = [ | ||
| {"level": l.get("level") or l.get("type"), "message": l.get("message") or l.get("text")} | ||
| for l in lines | ||
| ] | ||
| return {"success": True, "data": {"lines": trimmed}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: The trimming logic assumes specific field structures but doesn't handle cases where both 'level' and 'type' are None, or both 'message' and 'text' are None. This could result in entries with None values.
Summary
Testing
pytesthttps://chatgpt.com/codex/tasks/task_e_68bcbbf232cc832791102f8596bd5683