Skip to content

Conversation

@dsarno
Copy link
Owner

@dsarno dsarno commented Sep 7, 2025

Summary

  • separate NL and T suite prompts into dedicated files and point each workflow step at the right one
  • document new prompt files in the developer guide for adding or editing tests

Testing

  • pytest

https://chatgpt.com/codex/tasks/task_e_68bcbbf232cc832791102f8596bd5683

@coderabbitai
Copy link

coderabbitai bot commented Sep 7, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch codex/add-tool_result-json-pruner-gic9uw

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR implements a significant architectural refactoring of the Unity MCP testing workflow by splitting the previously unified NL/T (Natural Language/Technical) test suite into two separate execution passes. The changes introduce a tool result pruning utility (prune_tool_results.py) that compacts verbose conversation logs by summarizing tool outputs, and implements "lean tool responses" across multiple Unity MCP tools to reduce token usage by default.

The core change separates the test execution into dedicated Natural Language tests (NL-0 through NL-4) and Technical/Transformation tests (T-A through T-J), with each pass using different AI models and prompt files. The NL pass uses Claude 3.5 Sonnet for complex language tasks, while the T pass uses Claude 3.5 Haiku with Sonnet fallback for efficiency. This separation allows T tests to build upon the final state left by the NL pass, creating a more realistic workflow simulation.

The pruning utility addresses performance concerns by replacing verbose tool outputs with concise summaries - for example, validation results become "validate: OK (warnings=1, errors=2)" instead of full diagnostic objects, and SHA operations return only essential hash and byte length data. Multiple Unity MCP tools have been modified to return minimal payloads by default: validate_script returns warning/error counts, get_sha returns only SHA256 and byte length, read_resource defaults to metadata-only mode, find_in_file returns position data instead of text excerpts, and read_console filters out stacktraces unless explicitly requested.

The workflow infrastructure has been updated with improved Unity bridge status handling using dedicated temp directories, more robust port detection using jq instead of bash parsing, and explicit Claude tool permissions via settings.json. Comprehensive test coverage has been added for the new functionality, including tests for the minimal resource reading, console truncation, script validation summarization, and file search positioning features.

Changed Files
Filename Score Overview
prune_tool_results.py 4/5 New utility script that processes conversation JSON to summarize and reduce tool_result content size
UnityMcpBridge/UnityMcpServer~/src/tools/manage_script.py 4/5 Modified validate_script and get_sha to return simplified response data instead of full Unity server responses
UnityMcpBridge/Editor/Tools/ManageScript.cs 4/5 Added structural balance check guard in EditScript method before validation to prevent unbalanced delimiters
README-DEV.md 4/5 Added documentation for tool_result pruning utility and updated CI workflow docs to reflect two-pass execution
UnityMcpBridge/UnityMcpServer~/src/tools/read_console.py 4/5 Modified default parameters and added trimming mechanism for console output when include_stacktrace=False
.claude/prompts/nl-unity-suite-nl.md 4/5 New comprehensive test suite prompt for Unity NL editing tests using additive design pattern
.claude/prompts/nl-unity-suite-t.md 4/5 Refactored and renamed from full suite to focus exclusively on T-tests building on NL pass state
tests/test_find_in_file_minimal.py 4/5 New test file for find_in_file functionality focusing on position information validation
.github/workflows/claude-nl-suite.yml 4/5 Refactored workflow to run NL and T tests in separate passes with different models and configurations
tests/test_get_sha.py 4/5 Added assertion to verify exact return data structure filtering out metadata fields
UnityMcpBridge/UnityMcpServer~/src/tools/resource_tools.py 4/5 Implemented lazy loading behavior and payload optimization with metadata-only defaults
tests/test_validate_script_summary.py 3/5 New test validating script validation tool returns summary counts instead of full diagnostics
tests/test_read_console_truncate.py 4/5 New test for read_console functionality with stacktrace truncation feature
tests/test_read_resource_minimal.py 4/5 New test validating read_resource metadata-only behavior without text content

Confidence score: 4/5

  • This PR is generally safe to merge with solid architectural improvements and comprehensive testing
  • Score reflects well-structured refactoring with proper separation of concerns, though the complexity of changes across multiple systems requires careful validation
  • Pay close attention to the workflow YAML changes and ensure the two-pass execution works correctly in CI

Sequence Diagram

sequenceDiagram
    participant User
    participant GitHub as GitHub Actions
    participant Unity as Unity Container
    participant MCP as MCP Server
    participant Claude as Claude AI

    User->>GitHub: Trigger workflow dispatch
    GitHub->>GitHub: Check secrets (Unity/Anthropic)
    GitHub->>GitHub: Setup Python env with uv
    GitHub->>Unity: Start headless Unity container
    Unity->>Unity: Load project and start MCP bridge
    GitHub->>GitHub: Wait for Unity bridge readiness
    GitHub->>GitHub: Configure MCP client settings
    GitHub->>GitHub: Create report skeletons

    GitHub->>Claude: Run NL pass (claude-3-7-sonnet)
    Claude->>MCP: list_resources("Assets/Scripts/*.cs")
    MCP->>Unity: Query available scripts
    Unity-->>MCP: Return script list
    MCP-->>Claude: Available scripts

    Claude->>MCP: read_resource("unity://path/Assets/Scripts/LongUnityScriptClaudeTest.cs")
    MCP->>Unity: Read target script
    Unity-->>MCP: Script contents with SHA
    MCP-->>Claude: File content

    loop NL Tests (NL-0 to NL-4)
        Claude->>MCP: find_in_file(pattern for method/anchor)
        MCP->>Unity: Locate target positions
        Unity-->>MCP: Match positions
        MCP-->>Claude: Line/column coordinates
        
        Claude->>MCP: script_apply_edits(structured operations)
        MCP->>Unity: Apply method replacements/insertions
        Unity->>Unity: Validate syntax and refresh
        Unity-->>MCP: Edit results
        MCP-->>Claude: Success with new SHA
        
        Claude->>GitHub: Write XML test fragment
        GitHub->>GitHub: Store NL-X_results.xml
    end

    GitHub->>Claude: Run T pass (claude-3-5-haiku)
    
    loop T Tests (T-A to T-J)
        Claude->>MCP: get_sha("unity://path/Assets/Scripts/...")
        MCP->>Unity: Get current file hash
        Unity-->>MCP: SHA256 + metadata
        MCP-->>Claude: File hash for precondition
        
        Claude->>MCP: apply_text_edits(ranges with precondition)
        MCP->>Unity: Apply atomic text edits
        Unity->>Unity: Check precondition SHA
        Unity->>Unity: Apply edits and validate
        Unity-->>MCP: Edit results or stale_file error
        MCP-->>Claude: Success/failure
        
        Claude->>MCP: read_console(errors only)
        MCP->>Unity: Check compilation errors
        Unity-->>MCP: Console messages
        MCP-->>Claude: Error status
        
        Claude->>GitHub: Write XML test fragment
        GitHub->>GitHub: Store T-X_results.xml
    end

    GitHub->>GitHub: Canonicalize test names
    GitHub->>GitHub: Backfill missing tests
    GitHub->>GitHub: Merge fragments into JUnit XML
    GitHub->>GitHub: Generate markdown summary
    GitHub->>GitHub: Publish JUnit report
    GitHub->>GitHub: Upload artifacts
    GitHub->>Unity: Stop Unity container
    GitHub->>GitHub: Return Unity license
Loading

14 files reviewed, 5 comments

Edit Code Review Bot Settings | Greptile

Comment on lines +7 to +8
except Exception:
return f"tool_result: {len(txt)} bytes"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Using bare Exception catch is overly broad - consider json.JSONDecodeError

Suggested change
except Exception:
return f"tool_result: {len(txt)} bytes"
except json.JSONDecodeError:
return f"tool_result: {len(txt)} bytes"

- Use this exact format:

```xml
<testcase name="NL-0 — Baseline State Capture" classname="UnityMCP.NL-T">
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The classname 'UnityMCP.NL-T' contains 'T' but this is the NL-specific prompt file - should this be 'UnityMCP.NL'?

- Per‑test `system-out` ≤ 400 chars: brief status only (no SHA).
- Console evidence: fetch the last 10 lines with `include_stacktrace:false` and include ≤ 3 lines in the fragment.
- Avoid quoting multi‑line diffs; reference markers instead.
— Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Line starts with em-dash (—) instead of hyphen (-), inconsistent with other bullet points

Suggested change
Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".
- Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".

- Allowed ops: `anchor_insert`, `replace_method`, `insert_method`, `delete_method`, `regex_replace`
- For `anchor_insert`, always set `"position": "before"` or `"after"`.
- **Precise ranges / atomic batch**: `mcp__unity__apply_text_edits` (non‑overlapping ranges)
STRICT OP GUARDRAILS
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Missing line break before 'STRICT OP GUARDRAILS' section heading

Comment on lines +57 to +63
loop = asyncio.new_event_loop()
try:
resp = loop.run_until_complete(
read_resource(uri="unity://path/Assets/A.txt", ctx=None, project_root=str(proj))
)
finally:
loop.close()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Consider using asyncio.run() instead of manually managing event loop lifecycle for cleaner async testing

@dsarno dsarno closed this Sep 7, 2025
@dsarno dsarno deleted the codex/add-tool_result-json-pruner-gic9uw branch September 7, 2025 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants