-
Notifications
You must be signed in to change notification settings - Fork 0
Add tool_result pruning utility and split NL/T workflow into two passes #81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tool_result pruning utility and split NL/T workflow into two passes #81
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the ✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR implements a significant architectural refactoring of the Unity MCP testing workflow by splitting the previously unified NL/T (Natural Language/Technical) test suite into two separate execution passes. The changes introduce a tool result pruning utility (prune_tool_results.py) that compacts verbose conversation logs by summarizing tool outputs, and implements "lean tool responses" across multiple Unity MCP tools to reduce token usage by default.
The core change separates the test execution into dedicated Natural Language tests (NL-0 through NL-4) and Technical/Transformation tests (T-A through T-J), with each pass using different AI models and prompt files. The NL pass uses Claude 3.5 Sonnet for complex language tasks, while the T pass uses Claude 3.5 Haiku with Sonnet fallback for efficiency. This separation allows T tests to build upon the final state left by the NL pass, creating a more realistic workflow simulation.
The pruning utility addresses performance concerns by replacing verbose tool outputs with concise summaries - for example, validation results become "validate: OK (warnings=1, errors=2)" instead of full diagnostic objects, and SHA operations return only essential hash and byte length data. Multiple Unity MCP tools have been modified to return minimal payloads by default: validate_script returns warning/error counts, get_sha returns only SHA256 and byte length, read_resource defaults to metadata-only mode, find_in_file returns position data instead of text excerpts, and read_console filters out stacktraces unless explicitly requested.
The workflow infrastructure has been updated with improved Unity bridge status handling using dedicated temp directories, more robust port detection using jq instead of bash parsing, and explicit Claude tool permissions via settings.json. Comprehensive test coverage has been added for the new functionality, including tests for the minimal resource reading, console truncation, script validation summarization, and file search positioning features.
Changed Files
| Filename | Score | Overview |
|---|---|---|
prune_tool_results.py |
4/5 | New utility script that processes conversation JSON to summarize and reduce tool_result content size |
UnityMcpBridge/UnityMcpServer~/src/tools/manage_script.py |
4/5 | Modified validate_script and get_sha to return simplified response data instead of full Unity server responses |
UnityMcpBridge/Editor/Tools/ManageScript.cs |
4/5 | Added structural balance check guard in EditScript method before validation to prevent unbalanced delimiters |
README-DEV.md |
4/5 | Added documentation for tool_result pruning utility and updated CI workflow docs to reflect two-pass execution |
UnityMcpBridge/UnityMcpServer~/src/tools/read_console.py |
4/5 | Modified default parameters and added trimming mechanism for console output when include_stacktrace=False |
.claude/prompts/nl-unity-suite-nl.md |
4/5 | New comprehensive test suite prompt for Unity NL editing tests using additive design pattern |
.claude/prompts/nl-unity-suite-t.md |
4/5 | Refactored and renamed from full suite to focus exclusively on T-tests building on NL pass state |
tests/test_find_in_file_minimal.py |
4/5 | New test file for find_in_file functionality focusing on position information validation |
.github/workflows/claude-nl-suite.yml |
4/5 | Refactored workflow to run NL and T tests in separate passes with different models and configurations |
tests/test_get_sha.py |
4/5 | Added assertion to verify exact return data structure filtering out metadata fields |
UnityMcpBridge/UnityMcpServer~/src/tools/resource_tools.py |
4/5 | Implemented lazy loading behavior and payload optimization with metadata-only defaults |
tests/test_validate_script_summary.py |
3/5 | New test validating script validation tool returns summary counts instead of full diagnostics |
tests/test_read_console_truncate.py |
4/5 | New test for read_console functionality with stacktrace truncation feature |
tests/test_read_resource_minimal.py |
4/5 | New test validating read_resource metadata-only behavior without text content |
Confidence score: 4/5
- This PR is generally safe to merge with solid architectural improvements and comprehensive testing
- Score reflects well-structured refactoring with proper separation of concerns, though the complexity of changes across multiple systems requires careful validation
- Pay close attention to the workflow YAML changes and ensure the two-pass execution works correctly in CI
Sequence Diagram
sequenceDiagram
participant User
participant GitHub as GitHub Actions
participant Unity as Unity Container
participant MCP as MCP Server
participant Claude as Claude AI
User->>GitHub: Trigger workflow dispatch
GitHub->>GitHub: Check secrets (Unity/Anthropic)
GitHub->>GitHub: Setup Python env with uv
GitHub->>Unity: Start headless Unity container
Unity->>Unity: Load project and start MCP bridge
GitHub->>GitHub: Wait for Unity bridge readiness
GitHub->>GitHub: Configure MCP client settings
GitHub->>GitHub: Create report skeletons
GitHub->>Claude: Run NL pass (claude-3-7-sonnet)
Claude->>MCP: list_resources("Assets/Scripts/*.cs")
MCP->>Unity: Query available scripts
Unity-->>MCP: Return script list
MCP-->>Claude: Available scripts
Claude->>MCP: read_resource("unity://path/Assets/Scripts/LongUnityScriptClaudeTest.cs")
MCP->>Unity: Read target script
Unity-->>MCP: Script contents with SHA
MCP-->>Claude: File content
loop NL Tests (NL-0 to NL-4)
Claude->>MCP: find_in_file(pattern for method/anchor)
MCP->>Unity: Locate target positions
Unity-->>MCP: Match positions
MCP-->>Claude: Line/column coordinates
Claude->>MCP: script_apply_edits(structured operations)
MCP->>Unity: Apply method replacements/insertions
Unity->>Unity: Validate syntax and refresh
Unity-->>MCP: Edit results
MCP-->>Claude: Success with new SHA
Claude->>GitHub: Write XML test fragment
GitHub->>GitHub: Store NL-X_results.xml
end
GitHub->>Claude: Run T pass (claude-3-5-haiku)
loop T Tests (T-A to T-J)
Claude->>MCP: get_sha("unity://path/Assets/Scripts/...")
MCP->>Unity: Get current file hash
Unity-->>MCP: SHA256 + metadata
MCP-->>Claude: File hash for precondition
Claude->>MCP: apply_text_edits(ranges with precondition)
MCP->>Unity: Apply atomic text edits
Unity->>Unity: Check precondition SHA
Unity->>Unity: Apply edits and validate
Unity-->>MCP: Edit results or stale_file error
MCP-->>Claude: Success/failure
Claude->>MCP: read_console(errors only)
MCP->>Unity: Check compilation errors
Unity-->>MCP: Console messages
MCP-->>Claude: Error status
Claude->>GitHub: Write XML test fragment
GitHub->>GitHub: Store T-X_results.xml
end
GitHub->>GitHub: Canonicalize test names
GitHub->>GitHub: Backfill missing tests
GitHub->>GitHub: Merge fragments into JUnit XML
GitHub->>GitHub: Generate markdown summary
GitHub->>GitHub: Publish JUnit report
GitHub->>GitHub: Upload artifacts
GitHub->>Unity: Stop Unity container
GitHub->>GitHub: Return Unity license
14 files reviewed, 5 comments
| except Exception: | ||
| return f"tool_result: {len(txt)} bytes" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Using bare Exception catch is overly broad - consider json.JSONDecodeError
| except Exception: | |
| return f"tool_result: {len(txt)} bytes" | |
| except json.JSONDecodeError: | |
| return f"tool_result: {len(txt)} bytes" |
| - Use this exact format: | ||
|
|
||
| ```xml | ||
| <testcase name="NL-0 — Baseline State Capture" classname="UnityMCP.NL-T"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: The classname 'UnityMCP.NL-T' contains 'T' but this is the NL-specific prompt file - should this be 'UnityMCP.NL'?
| - Per‑test `system-out` ≤ 400 chars: brief status only (no SHA). | ||
| - Console evidence: fetch the last 10 lines with `include_stacktrace:false` and include ≤ 3 lines in the fragment. | ||
| - Avoid quoting multi‑line diffs; reference markers instead. | ||
| — Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Line starts with em-dash (—) instead of hyphen (-), inconsistent with other bullet points
| — Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors". | |
| - Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors". |
| - Allowed ops: `anchor_insert`, `replace_method`, `insert_method`, `delete_method`, `regex_replace` | ||
| - For `anchor_insert`, always set `"position": "before"` or `"after"`. | ||
| - **Precise ranges / atomic batch**: `mcp__unity__apply_text_edits` (non‑overlapping ranges) | ||
| STRICT OP GUARDRAILS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Missing line break before 'STRICT OP GUARDRAILS' section heading
| loop = asyncio.new_event_loop() | ||
| try: | ||
| resp = loop.run_until_complete( | ||
| read_resource(uri="unity://path/Assets/A.txt", ctx=None, project_root=str(proj)) | ||
| ) | ||
| finally: | ||
| loop.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Consider using asyncio.run() instead of manually managing event loop lifecycle for cleaner async testing
Summary
Testing
pytesthttps://chatgpt.com/codex/tasks/task_e_68bcbbf232cc832791102f8596bd5683