Add tool_result pruning utility and split NL/T workflow into two passes #81

dsarno · 2025-09-07T15:12:32Z

Summary

separate NL and T suite prompts into dedicated files and point each workflow step at the right one
document new prompt files in the developer guide for adding or editing tests

Testing

pytest

https://chatgpt.com/codex/tasks/task_e_68bcbbf232cc832791102f8596bd5683

coderabbitai · 2025-09-07T15:12:38Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing Touches

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch codex/add-tool_result-json-pruner-gic9uw

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

greptile-apps

Greptile Summary

This PR implements a significant architectural refactoring of the Unity MCP testing workflow by splitting the previously unified NL/T (Natural Language/Technical) test suite into two separate execution passes. The changes introduce a tool result pruning utility (prune_tool_results.py) that compacts verbose conversation logs by summarizing tool outputs, and implements "lean tool responses" across multiple Unity MCP tools to reduce token usage by default.

The core change separates the test execution into dedicated Natural Language tests (NL-0 through NL-4) and Technical/Transformation tests (T-A through T-J), with each pass using different AI models and prompt files. The NL pass uses Claude 3.5 Sonnet for complex language tasks, while the T pass uses Claude 3.5 Haiku with Sonnet fallback for efficiency. This separation allows T tests to build upon the final state left by the NL pass, creating a more realistic workflow simulation.

The pruning utility addresses performance concerns by replacing verbose tool outputs with concise summaries - for example, validation results become "validate: OK (warnings=1, errors=2)" instead of full diagnostic objects, and SHA operations return only essential hash and byte length data. Multiple Unity MCP tools have been modified to return minimal payloads by default: validate_script returns warning/error counts, get_sha returns only SHA256 and byte length, read_resource defaults to metadata-only mode, find_in_file returns position data instead of text excerpts, and read_console filters out stacktraces unless explicitly requested.

The workflow infrastructure has been updated with improved Unity bridge status handling using dedicated temp directories, more robust port detection using jq instead of bash parsing, and explicit Claude tool permissions via settings.json. Comprehensive test coverage has been added for the new functionality, including tests for the minimal resource reading, console truncation, script validation summarization, and file search positioning features.

Changed Files

Filename	Score	Overview
`prune_tool_results.py`	4/5	New utility script that processes conversation JSON to summarize and reduce tool_result content size
`UnityMcpBridge/UnityMcpServer~/src/tools/manage_script.py`	4/5	Modified validate_script and get_sha to return simplified response data instead of full Unity server responses
`UnityMcpBridge/Editor/Tools/ManageScript.cs`	4/5	Added structural balance check guard in EditScript method before validation to prevent unbalanced delimiters
`README-DEV.md`	4/5	Added documentation for tool_result pruning utility and updated CI workflow docs to reflect two-pass execution
`UnityMcpBridge/UnityMcpServer~/src/tools/read_console.py`	4/5	Modified default parameters and added trimming mechanism for console output when include_stacktrace=False
`.claude/prompts/nl-unity-suite-nl.md`	4/5	New comprehensive test suite prompt for Unity NL editing tests using additive design pattern
`.claude/prompts/nl-unity-suite-t.md`	4/5	Refactored and renamed from full suite to focus exclusively on T-tests building on NL pass state
`tests/test_find_in_file_minimal.py`	4/5	New test file for find_in_file functionality focusing on position information validation
`.github/workflows/claude-nl-suite.yml`	4/5	Refactored workflow to run NL and T tests in separate passes with different models and configurations
`tests/test_get_sha.py`	4/5	Added assertion to verify exact return data structure filtering out metadata fields
`UnityMcpBridge/UnityMcpServer~/src/tools/resource_tools.py`	4/5	Implemented lazy loading behavior and payload optimization with metadata-only defaults
`tests/test_validate_script_summary.py`	3/5	New test validating script validation tool returns summary counts instead of full diagnostics
`tests/test_read_console_truncate.py`	4/5	New test for read_console functionality with stacktrace truncation feature
`tests/test_read_resource_minimal.py`	4/5	New test validating read_resource metadata-only behavior without text content

Confidence score: 4/5

This PR is generally safe to merge with solid architectural improvements and comprehensive testing
Score reflects well-structured refactoring with proper separation of concerns, though the complexity of changes across multiple systems requires careful validation
Pay close attention to the workflow YAML changes and ensure the two-pass execution works correctly in CI

Sequence Diagram

sequenceDiagram
    participant User
    participant GitHub as GitHub Actions
    participant Unity as Unity Container
    participant MCP as MCP Server
    participant Claude as Claude AI

    User->>GitHub: Trigger workflow dispatch
    GitHub->>GitHub: Check secrets (Unity/Anthropic)
    GitHub->>GitHub: Setup Python env with uv
    GitHub->>Unity: Start headless Unity container
    Unity->>Unity: Load project and start MCP bridge
    GitHub->>GitHub: Wait for Unity bridge readiness
    GitHub->>GitHub: Configure MCP client settings
    GitHub->>GitHub: Create report skeletons

    GitHub->>Claude: Run NL pass (claude-3-7-sonnet)
    Claude->>MCP: list_resources("Assets/Scripts/*.cs")
    MCP->>Unity: Query available scripts
    Unity-->>MCP: Return script list
    MCP-->>Claude: Available scripts

    Claude->>MCP: read_resource("unity://path/Assets/Scripts/LongUnityScriptClaudeTest.cs")
    MCP->>Unity: Read target script
    Unity-->>MCP: Script contents with SHA
    MCP-->>Claude: File content

    loop NL Tests (NL-0 to NL-4)
        Claude->>MCP: find_in_file(pattern for method/anchor)
        MCP->>Unity: Locate target positions
        Unity-->>MCP: Match positions
        MCP-->>Claude: Line/column coordinates
        
        Claude->>MCP: script_apply_edits(structured operations)
        MCP->>Unity: Apply method replacements/insertions
        Unity->>Unity: Validate syntax and refresh
        Unity-->>MCP: Edit results
        MCP-->>Claude: Success with new SHA
        
        Claude->>GitHub: Write XML test fragment
        GitHub->>GitHub: Store NL-X_results.xml
    end

    GitHub->>Claude: Run T pass (claude-3-5-haiku)
    
    loop T Tests (T-A to T-J)
        Claude->>MCP: get_sha("unity://path/Assets/Scripts/...")
        MCP->>Unity: Get current file hash
        Unity-->>MCP: SHA256 + metadata
        MCP-->>Claude: File hash for precondition
        
        Claude->>MCP: apply_text_edits(ranges with precondition)
        MCP->>Unity: Apply atomic text edits
        Unity->>Unity: Check precondition SHA
        Unity->>Unity: Apply edits and validate
        Unity-->>MCP: Edit results or stale_file error
        MCP-->>Claude: Success/failure
        
        Claude->>MCP: read_console(errors only)
        MCP->>Unity: Check compilation errors
        Unity-->>MCP: Console messages
        MCP-->>Claude: Error status
        
        Claude->>GitHub: Write XML test fragment
        GitHub->>GitHub: Store T-X_results.xml
    end

    GitHub->>GitHub: Canonicalize test names
    GitHub->>GitHub: Backfill missing tests
    GitHub->>GitHub: Merge fragments into JUnit XML
    GitHub->>GitHub: Generate markdown summary
    GitHub->>GitHub: Publish JUnit report
    GitHub->>GitHub: Upload artifacts
    GitHub->>Unity: Stop Unity container
    GitHub->>GitHub: Return Unity license

_{14 files reviewed, 5 comments}

_{Edit Code Review Bot Settings | Greptile}

greptile-apps · 2025-09-07T15:13:10Z

prune_tool_results.py

+    except Exception:
+        return f"tool_result: {len(txt)} bytes"


style: Using bare Exception catch is overly broad - consider json.JSONDecodeError

Suggested change

except Exception:

return f"tool_result: {len(txt)} bytes"

except json.JSONDecodeError:

return f"tool_result: {len(txt)} bytes"

greptile-apps · 2025-09-07T15:13:21Z

.claude/prompts/nl-unity-suite-nl.md

+- Use this exact format:
+
+```xml
+<testcase name="NL-0 — Baseline State Capture" classname="UnityMCP.NL-T">


logic: The classname 'UnityMCP.NL-T' contains 'T' but this is the NL-specific prompt file - should this be 'UnityMCP.NL'?

greptile-apps · 2025-09-07T15:13:22Z

.claude/prompts/nl-unity-suite-nl.md

+- Per‑test `system-out` ≤ 400 chars: brief status only (no SHA).
+- Console evidence: fetch the last 10 lines with `include_stacktrace:false` and include ≤ 3 lines in the fragment.
+- Avoid quoting multi‑line diffs; reference markers instead.
+— Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".


style: Line starts with em-dash (—) instead of hyphen (-), inconsistent with other bullet points

Suggested change

— Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".

- Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".

greptile-apps · 2025-09-07T15:13:22Z

.claude/prompts/nl-unity-suite-nl.md

+  - Allowed ops: `anchor_insert`, `replace_method`, `insert_method`, `delete_method`, `regex_replace`
+  - For `anchor_insert`, always set `"position": "before"` or `"after"`.
+- **Precise ranges / atomic batch**: `mcp__unity__apply_text_edits` (non‑overlapping ranges)
+STRICT OP GUARDRAILS


style: Missing line break before 'STRICT OP GUARDRAILS' section heading

greptile-apps · 2025-09-07T15:13:52Z

tests/test_read_resource_minimal.py

+    loop = asyncio.new_event_loop()
+    try:
+        resp = loop.run_until_complete(
+            read_resource(uri="unity://path/Assets/A.txt", ctx=None, project_root=str(proj))
+        )
+    finally:
+        loop.close()


style: Consider using asyncio.run() instead of manually managing event loop lifecycle for cleaner async testing

Split NL and T prompts and update workflow

f99b925

dsarno added the codex label Sep 7, 2025 — with ChatGPT Codex Connector

greptile-apps bot reviewed Sep 7, 2025

View reviewed changes

dsarno added 2 commits September 7, 2025 08:53

Update workflow configuration

2a65bf3

emit all 10 to t step

08ec160

dsarno closed this Sep 7, 2025

dsarno deleted the codex/add-tool_result-json-pruner-gic9uw branch September 7, 2025 23:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add tool_result pruning utility and split NL/T workflow into two passes #81

Add tool_result pruning utility and split NL/T workflow into two passes #81

Uh oh!

dsarno commented Sep 7, 2025

Uh oh!

coderabbitai bot commented Sep 7, 2025 •

edited

Loading

Other AI code review bot(s) detected

Review skipped

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Sep 7, 2025

Uh oh!

greptile-apps bot Sep 7, 2025

Uh oh!

greptile-apps bot Sep 7, 2025

Uh oh!

greptile-apps bot Sep 7, 2025

Uh oh!

greptile-apps bot Sep 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	— Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".
	- Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".

Add tool_result pruning utility and split NL/T workflow into two passes #81

Add tool_result pruning utility and split NL/T workflow into two passes #81

Uh oh!

Conversation

dsarno commented Sep 7, 2025

Summary

Testing

Uh oh!

coderabbitai bot commented Sep 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Other AI code review bot(s) detected

Review skipped

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Summary

Confidence score: 4/5

Sequence Diagram

Uh oh!

greptile-apps bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai bot commented Sep 7, 2025 •

edited

Loading