Skip to content

Conversation

@dsarno
Copy link
Owner

@dsarno dsarno commented Sep 7, 2025

Summary

  • permit MultiEdit for report writes in project settings
  • split T pass into Haiku A–E/F–J batches with strict prompts, coverage check, and Sonnet retry
  • add helper script to validate NL/T fragment coverage locally

Testing

  • pytest

https://chatgpt.com/codex/tasks/task_e_68bcbbf232cc832791102f8596bd5683

@coderabbitai
Copy link

coderabbitai bot commented Sep 7, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch codex/add-tool_result-json-pruner-9jr70m

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR implements a comprehensive CI workflow optimization for the Unity MCP bridge project, addressing 'T-pass under-emission' issues through multiple strategic improvements. The core change splits the test execution into separate passes: a Natural Language (NL) pass handling tests NL-0 through NL-4, and T passes split into two batches (A-E and F-J) using different AI models for better coverage.

The changes introduce several new utilities and optimizations:

  1. Test Coverage Validation: A new bash script validate-nlt-coverage.sh validates that all expected test fragment XML files are generated, preventing incomplete test runs from passing CI.

  2. Payload Optimization: Multiple tools now return minimal data by default - read_resource returns only metadata unless text is explicitly requested, find_in_file returns position coordinates instead of text excerpts, validate_script returns diagnostic counts instead of full details, and get_sha returns only essential fields.

  3. AI Tool Permissions: New Claude settings in .claude/settings.json establish security boundaries by allowing only Unity MCP tools and MultiEdit operations on reports directories while blocking potentially dangerous operations.

  4. Conversation Data Management: A new prune_tool_results.py utility compresses verbose tool results into concise summaries, reducing log file sizes for CI workflows while preserving essential information.

  5. Enhanced Error Handling: The script editing system now includes early structural validation to catch unbalanced braces before expensive validation steps.

  6. Test Suite Restructuring: The original combined NL/T test suite has been split into separate prompt files, with the T suite now assuming completion of the NL pass and focusing exclusively on T-A through T-J tests.

These changes collectively create a more robust, efficient, and reliable CI testing environment that better handles AI-assisted Unity development workflows while reducing token consumption and improving test coverage validation.

Changed Files
Filename Score Overview
scripts/validate-nlt-coverage.sh 5/5 New bash script that validates all expected NL and T test fragment XML files are present in reports directory
UnityMcpBridge/Editor/Tools/ManageScript.cs 5/5 Added structural balance check guard to prevent unbalanced braces from reaching validation
prune_tool_results.py 4/5 New utility that compresses verbose tool_result JSON content into concise summaries for CI workflows
UnityMcpBridge/UnityMcpServer~/src/tools/manage_script.py 4/5 Modified validate_script and get_sha functions to return simplified response formats with essential data only
tests/test_get_sha.py 5/5 Added assertion to verify get_sha returns expected data structure with sha256 and lengthBytes fields
.claude/settings.json 5/5 New Claude permissions configuration allowing Unity MCP tools and MultiEdit for reports while blocking other operations
tests/test_find_in_file_minimal.py 4/5 New minimal test validating find_in_file functionality returns correct position information for pattern matches
.github/workflows/claude-nl-suite.yml 4/5 Restructured workflow splitting tests into NL and T passes with strict prompts, coverage assertions, and Sonnet retry
.claude/prompts/nl-unity-suite-nl.md 4/5 New comprehensive Unity NL test suite prompt with strict XML formatting and additive test design
UnityMcpBridge/UnityMcpServer~/src/tools/resource_tools.py 4/5 Modified read_resource to default to metadata-only responses and restructured find_in_file for position-based results
README-DEV.md 4/5 Added documentation for new tool result pruning utility and updated CI workflow information
.claude/prompts/nl-unity-suite-t.md 4/5 Renamed and transformed from combined NL/T suite to focus only on T tests building on NL pass state
tests/test_read_console_truncate.py 4/5 New comprehensive unit tests for read_console functionality testing both full and truncated output modes
tests/test_validate_script_summary.py 3/5 New test validating validate_script tool correctly counts diagnostic severity levels, with potential logic issue in assertion
tests/test_read_resource_minimal.py 4/5 New test validating read_resource tool's metadata-only functionality without loading full content
UnityMcpBridge/UnityMcpServer~/src/tools/read_console.py 4/5 Modified defaults to only 'error' messages in 'json' format and added data trimming when stacktraces disabled

Confidence score: 4/5

  • This PR is generally safe to merge with some areas requiring attention due to complex workflow changes and potential test logic issues
  • Score reflects comprehensive improvements to CI reliability but complexity in test splitting and some untested edge cases in new utilities
  • Pay close attention to tests/test_validate_script_summary.py for potential assertion logic mismatch and the GitHub workflow changes for proper CI execution

Sequence Diagram

sequenceDiagram
    participant User
    participant GitHub Actions
    participant Unity Container
    participant MCP Server
    participant Claude AI
    participant Reports

    User->>GitHub Actions: Trigger workflow_dispatch
    GitHub Actions->>GitHub Actions: Check secrets (Unity, Anthropic)
    GitHub Actions->>Unity Container: Start headless Unity with MCP bridge
    Unity Container->>GitHub Actions: Bridge ready on port
    GitHub Actions->>MCP Server: Install and configure MCP server
    GitHub Actions->>GitHub Actions: Create report skeletons
    GitHub Actions->>Claude AI: Run NL pass (tests NL-0 to NL-4)
    Claude AI->>MCP Server: Execute Unity MCP tools
    MCP Server->>Unity Container: Apply script edits
    Unity Container->>MCP Server: Validate and compile
    MCP Server->>Claude AI: Return results
    Claude AI->>Reports: Write NL test fragments
    GitHub Actions->>Claude AI: Run T pass A-E (tests T-A to T-E)
    Claude AI->>MCP Server: Execute structured edits
    MCP Server->>Unity Container: Apply method operations
    Unity Container->>MCP Server: Validate changes
    Claude AI->>Reports: Write T-A through T-E fragments
    GitHub Actions->>Claude AI: Run T pass F-J (tests T-F to T-J)
    Claude AI->>MCP Server: Execute atomic operations
    Claude AI->>Reports: Write T-F through T-J fragments
    GitHub Actions->>GitHub Actions: Assert T coverage
    alt Coverage incomplete
        GitHub Actions->>Claude AI: Retry with Sonnet model
        Claude AI->>Reports: Write missing test fragments
    end
    GitHub Actions->>Reports: Canonicalize testcase names
    GitHub Actions->>Reports: Backfill missing tests as failures
    GitHub Actions->>Reports: Merge fragments into JUnit
    GitHub Actions->>GitHub Actions: Publish JUnit report
    GitHub Actions->>Unity Container: Stop Unity container
    GitHub Actions->>User: Upload artifacts and summary
Loading

16 files reviewed, 10 comments

Edit Code Review Bot Settings | Greptile

@@ -0,0 +1,58 @@
#!/usr/bin/env python3
import sys, json, re
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Import re module is unused and can be removed

Suggested change
import sys, json, re
import sys, json

lines = data["lines"] or []
lvls = {"info":0,"warning":0,"error":0}
for L in lines:
lvls[L.get("level","" ).lower()] = lvls.get(L.get("level","" ).lower(),0)+1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Complex dictionary lookup logic could cause KeyError. The nested .get() calls are redundant since lvls is initialized with all keys

Comment on lines +37 to +43
loop = asyncio.new_event_loop()
try:
resp = loop.run_until_complete(
find_in_file(uri="unity://path/Assets/A.txt", pattern="world", ctx=None, project_root=str(proj))
)
finally:
loop.close()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: manually managing event loop is unnecessary here - use pytest-asyncio or asyncio.run() for cleaner async test execution

Suggested change
loop = asyncio.new_event_loop()
try:
resp = loop.run_until_complete(
find_in_file(uri="unity://path/Assets/A.txt", pattern="world", ctx=None, project_root=str(proj))
)
finally:
loop.close()
resp = asyncio.run(
find_in_file(uri="unity://path/Assets/A.txt", pattern="world", ctx=None, project_root=str(proj))
)

mcp_config: .claude/mcp.json
settings: .claude/settings.json
allowed_tools: "mcp__unity,Edit(reports/**),MultiEdit(reports/**)"
disallowed_tools: "Bash,MultiEdit(/!(reports/**)),WebFetch,WebSearch,Task,TodoWrite,NotebookEdit,NotebookRead"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Complex negation pattern /!(reports/**) may not work as expected in all contexts - consider testing this exclusion syntax

- Per‑test `system-out` ≤ 400 chars: brief status only (no SHA).
- Console evidence: fetch the last 10 lines with `include_stacktrace:false` and include ≤ 3 lines in the fragment.
- Avoid quoting multi‑line diffs; reference markers instead.
— Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: This line appears to have inconsistent formatting with an em dash instead of a hyphen, unlike other bullet points in the section

Suggested change
Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".
- Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".

- Use this exact format:

```xml
<testcase name="NL-0 — Baseline State Capture" classname="UnityMCP.NL-T">
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The classname 'UnityMCP.NL-T' suggests this template was copied from the T-suite prompt and may need to be updated to 'UnityMCP.NL' for consistency

Suggested change
<testcase name="NL-0 — Baseline State Capture" classname="UnityMCP.NL-T">
<testcase name="NL-0 — Baseline State Capture" classname="UnityMCP.NL">

- For `anchor_insert`, always set `"position": "before"` or `"after"`.
- **Precise ranges / atomic batch**: `mcp__unity__apply_text_edits` (non‑overlapping ranges)
STRICT OP GUARDRAILS
- Do not use `anchor_replace`. Structured edits must be one of: `anchor_insert`, `replace_method`, `insert_method`, `delete_method`, `regex_replace`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Missing 'STRICT OP GUARDRAILS' header formatting - this line should likely be formatted as a header like the surrounding sections

Comment on lines +67 to +68
resp = validate_script(None, uri="unity://path/Assets/Scripts/A.cs")
assert resp == {"success": True, "data": {"warnings": 1, "errors": 2}}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The assertion expects 2 errors but mock data includes both 'error' and 'fatal' severities. Verify if fatal should count as error or be handled separately.

Comment on lines +57 to +63
loop = asyncio.new_event_loop()
try:
resp = loop.run_until_complete(
read_resource(uri="unity://path/Assets/A.txt", ctx=None, project_root=str(proj))
)
finally:
loop.close()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: manually creates and closes event loop instead of using asyncio.run() which is the modern approach

Suggested change
loop = asyncio.new_event_loop()
try:
resp = loop.run_until_complete(
read_resource(uri="unity://path/Assets/A.txt", ctx=None, project_root=str(proj))
)
finally:
loop.close()
resp = asyncio.run(
read_resource(uri="unity://path/Assets/A.txt", ctx=None, project_root=str(proj))
)

Comment on lines +73 to +79
if isinstance(resp, dict) and resp.get("success") and not include_stacktrace:
lines = resp.get("data", {}).get("lines", [])
trimmed = [
{"level": l.get("level") or l.get("type"), "message": l.get("message") or l.get("text")}
for l in lines
]
return {"success": True, "data": {"lines": trimmed}}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The trimming logic assumes specific field structures but doesn't handle cases where both 'level' and 'type' are None, or both 'message' and 'text' are None. This could result in entries with None values.

@dsarno dsarno closed this Sep 7, 2025
@dsarno dsarno deleted the codex/add-tool_result-json-pruner-9jr70m branch September 7, 2025 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants