CI: fix T-pass under-emission (MultiEdit, strict prompts, coverage assert, Sonnet retry) #82

dsarno · 2025-09-07T18:48:54Z

Summary

permit MultiEdit for report writes in project settings
split T pass into Haiku A–E/F–J batches with strict prompts, coverage check, and Sonnet retry
add helper script to validate NL/T fragment coverage locally

Testing

pytest

https://chatgpt.com/codex/tasks/task_e_68bcbbf232cc832791102f8596bd5683

coderabbitai · 2025-09-07T18:49:00Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing Touches

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch codex/add-tool_result-json-pruner-9jr70m

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

greptile-apps

Greptile Summary

This PR implements a comprehensive CI workflow optimization for the Unity MCP bridge project, addressing 'T-pass under-emission' issues through multiple strategic improvements. The core change splits the test execution into separate passes: a Natural Language (NL) pass handling tests NL-0 through NL-4, and T passes split into two batches (A-E and F-J) using different AI models for better coverage.

The changes introduce several new utilities and optimizations:

Test Coverage Validation: A new bash script validate-nlt-coverage.sh validates that all expected test fragment XML files are generated, preventing incomplete test runs from passing CI.
Payload Optimization: Multiple tools now return minimal data by default - read_resource returns only metadata unless text is explicitly requested, find_in_file returns position coordinates instead of text excerpts, validate_script returns diagnostic counts instead of full details, and get_sha returns only essential fields.
AI Tool Permissions: New Claude settings in .claude/settings.json establish security boundaries by allowing only Unity MCP tools and MultiEdit operations on reports directories while blocking potentially dangerous operations.
Conversation Data Management: A new prune_tool_results.py utility compresses verbose tool results into concise summaries, reducing log file sizes for CI workflows while preserving essential information.
Enhanced Error Handling: The script editing system now includes early structural validation to catch unbalanced braces before expensive validation steps.
Test Suite Restructuring: The original combined NL/T test suite has been split into separate prompt files, with the T suite now assuming completion of the NL pass and focusing exclusively on T-A through T-J tests.

These changes collectively create a more robust, efficient, and reliable CI testing environment that better handles AI-assisted Unity development workflows while reducing token consumption and improving test coverage validation.

Changed Files

Filename	Score	Overview
scripts/validate-nlt-coverage.sh	5/5	New bash script that validates all expected NL and T test fragment XML files are present in reports directory
UnityMcpBridge/Editor/Tools/ManageScript.cs	5/5	Added structural balance check guard to prevent unbalanced braces from reaching validation
prune_tool_results.py	4/5	New utility that compresses verbose tool_result JSON content into concise summaries for CI workflows
UnityMcpBridge/UnityMcpServer~/src/tools/manage_script.py	4/5	Modified validate_script and get_sha functions to return simplified response formats with essential data only
tests/test_get_sha.py	5/5	Added assertion to verify get_sha returns expected data structure with sha256 and lengthBytes fields
.claude/settings.json	5/5	New Claude permissions configuration allowing Unity MCP tools and MultiEdit for reports while blocking other operations
tests/test_find_in_file_minimal.py	4/5	New minimal test validating find_in_file functionality returns correct position information for pattern matches
.github/workflows/claude-nl-suite.yml	4/5	Restructured workflow splitting tests into NL and T passes with strict prompts, coverage assertions, and Sonnet retry
.claude/prompts/nl-unity-suite-nl.md	4/5	New comprehensive Unity NL test suite prompt with strict XML formatting and additive test design
UnityMcpBridge/UnityMcpServer~/src/tools/resource_tools.py	4/5	Modified read_resource to default to metadata-only responses and restructured find_in_file for position-based results
README-DEV.md	4/5	Added documentation for new tool result pruning utility and updated CI workflow information
.claude/prompts/nl-unity-suite-t.md	4/5	Renamed and transformed from combined NL/T suite to focus only on T tests building on NL pass state
tests/test_read_console_truncate.py	4/5	New comprehensive unit tests for read_console functionality testing both full and truncated output modes
tests/test_validate_script_summary.py	3/5	New test validating validate_script tool correctly counts diagnostic severity levels, with potential logic issue in assertion
tests/test_read_resource_minimal.py	4/5	New test validating read_resource tool's metadata-only functionality without loading full content
UnityMcpBridge/UnityMcpServer~/src/tools/read_console.py	4/5	Modified defaults to only 'error' messages in 'json' format and added data trimming when stacktraces disabled

Confidence score: 4/5

This PR is generally safe to merge with some areas requiring attention due to complex workflow changes and potential test logic issues
Score reflects comprehensive improvements to CI reliability but complexity in test splitting and some untested edge cases in new utilities
Pay close attention to tests/test_validate_script_summary.py for potential assertion logic mismatch and the GitHub workflow changes for proper CI execution

Sequence Diagram

sequenceDiagram
    participant User
    participant GitHub Actions
    participant Unity Container
    participant MCP Server
    participant Claude AI
    participant Reports

    User->>GitHub Actions: Trigger workflow_dispatch
    GitHub Actions->>GitHub Actions: Check secrets (Unity, Anthropic)
    GitHub Actions->>Unity Container: Start headless Unity with MCP bridge
    Unity Container->>GitHub Actions: Bridge ready on port
    GitHub Actions->>MCP Server: Install and configure MCP server
    GitHub Actions->>GitHub Actions: Create report skeletons
    GitHub Actions->>Claude AI: Run NL pass (tests NL-0 to NL-4)
    Claude AI->>MCP Server: Execute Unity MCP tools
    MCP Server->>Unity Container: Apply script edits
    Unity Container->>MCP Server: Validate and compile
    MCP Server->>Claude AI: Return results
    Claude AI->>Reports: Write NL test fragments
    GitHub Actions->>Claude AI: Run T pass A-E (tests T-A to T-E)
    Claude AI->>MCP Server: Execute structured edits
    MCP Server->>Unity Container: Apply method operations
    Unity Container->>MCP Server: Validate changes
    Claude AI->>Reports: Write T-A through T-E fragments
    GitHub Actions->>Claude AI: Run T pass F-J (tests T-F to T-J)
    Claude AI->>MCP Server: Execute atomic operations
    Claude AI->>Reports: Write T-F through T-J fragments
    GitHub Actions->>GitHub Actions: Assert T coverage
    alt Coverage incomplete
        GitHub Actions->>Claude AI: Retry with Sonnet model
        Claude AI->>Reports: Write missing test fragments
    end
    GitHub Actions->>Reports: Canonicalize testcase names
    GitHub Actions->>Reports: Backfill missing tests as failures
    GitHub Actions->>Reports: Merge fragments into JUnit
    GitHub Actions->>GitHub Actions: Publish JUnit report
    GitHub Actions->>Unity Container: Stop Unity container
    GitHub Actions->>User: Upload artifacts and summary

_{16 files reviewed, 10 comments}

_{Edit Code Review Bot Settings | Greptile}

greptile-apps · 2025-09-07T18:49:27Z

prune_tool_results.py

@@ -0,0 +1,58 @@
+#!/usr/bin/env python3
+import sys, json, re


style: Import re module is unused and can be removed

Suggested change

import sys, json, re

import sys, json

greptile-apps · 2025-09-07T18:49:27Z

prune_tool_results.py

+        lines = data["lines"] or []
+        lvls = {"info":0,"warning":0,"error":0}
+        for L in lines:
+            lvls[L.get("level","" ).lower()] = lvls.get(L.get("level","" ).lower(),0)+1


logic: Complex dictionary lookup logic could cause KeyError. The nested .get() calls are redundant since lvls is initialized with all keys

greptile-apps · 2025-09-07T18:49:48Z

tests/test_find_in_file_minimal.py

+    loop = asyncio.new_event_loop()
+    try:
+        resp = loop.run_until_complete(
+            find_in_file(uri="unity://path/Assets/A.txt", pattern="world", ctx=None, project_root=str(proj))
+        )
+    finally:
+        loop.close()


style: manually managing event loop is unnecessary here - use pytest-asyncio or asyncio.run() for cleaner async test execution

Suggested change

loop = asyncio.new_event_loop()

try:

resp = loop.run_until_complete(

find_in_file(uri="unity://path/Assets/A.txt", pattern="world", ctx=None, project_root=str(proj))

)

finally:

loop.close()

resp = asyncio.run(

find_in_file(uri="unity://path/Assets/A.txt", pattern="world", ctx=None, project_root=str(proj))

)

greptile-apps · 2025-09-07T18:49:51Z

.github/workflows/claude-nl-suite.yml

+            mcp_config: .claude/mcp.json
+            settings: .claude/settings.json
+            allowed_tools: "mcp__unity,Edit(reports/**),MultiEdit(reports/**)"
+            disallowed_tools: "Bash,MultiEdit(/!(reports/**)),WebFetch,WebSearch,Task,TodoWrite,NotebookEdit,NotebookRead"


style: Complex negation pattern /!(reports/**) may not work as expected in all contexts - consider testing this exclusion syntax

greptile-apps · 2025-09-07T18:49:52Z

.claude/prompts/nl-unity-suite-nl.md

+- Per‑test `system-out` ≤ 400 chars: brief status only (no SHA).
+- Console evidence: fetch the last 10 lines with `include_stacktrace:false` and include ≤ 3 lines in the fragment.
+- Avoid quoting multi‑line diffs; reference markers instead.
+— Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".


style: This line appears to have inconsistent formatting with an em dash instead of a hyphen, unlike other bullet points in the section

Suggested change

— Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".

- Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".

greptile-apps · 2025-09-07T18:49:53Z

.claude/prompts/nl-unity-suite-nl.md

+- Use this exact format:
+
+```xml
+<testcase name="NL-0 — Baseline State Capture" classname="UnityMCP.NL-T">


logic: The classname 'UnityMCP.NL-T' suggests this template was copied from the T-suite prompt and may need to be updated to 'UnityMCP.NL' for consistency

Suggested change

<testcase name="NL-0 — Baseline State Capture" classname="UnityMCP.NL-T">

<testcase name="NL-0 — Baseline State Capture" classname="UnityMCP.NL">

greptile-apps · 2025-09-07T18:49:54Z

.claude/prompts/nl-unity-suite-nl.md

+  - For `anchor_insert`, always set `"position": "before"` or `"after"`.
+- **Precise ranges / atomic batch**: `mcp__unity__apply_text_edits` (non‑overlapping ranges)
+STRICT OP GUARDRAILS
+- Do not use `anchor_replace`. Structured edits must be one of: `anchor_insert`, `replace_method`, `insert_method`, `delete_method`, `regex_replace`.


style: Missing 'STRICT OP GUARDRAILS' header formatting - this line should likely be formatted as a header like the surrounding sections

greptile-apps · 2025-09-07T18:50:06Z

tests/test_validate_script_summary.py

+    resp = validate_script(None, uri="unity://path/Assets/Scripts/A.cs")
+    assert resp == {"success": True, "data": {"warnings": 1, "errors": 2}}


logic: The assertion expects 2 errors but mock data includes both 'error' and 'fatal' severities. Verify if fatal should count as error or be handled separately.

greptile-apps · 2025-09-07T18:50:08Z

tests/test_read_resource_minimal.py

+    loop = asyncio.new_event_loop()
+    try:
+        resp = loop.run_until_complete(
+            read_resource(uri="unity://path/Assets/A.txt", ctx=None, project_root=str(proj))
+        )
+    finally:
+        loop.close()


style: manually creates and closes event loop instead of using asyncio.run() which is the modern approach

Suggested change

loop = asyncio.new_event_loop()

try:

resp = loop.run_until_complete(

read_resource(uri="unity://path/Assets/A.txt", ctx=None, project_root=str(proj))

)

finally:

loop.close()

resp = asyncio.run(

read_resource(uri="unity://path/Assets/A.txt", ctx=None, project_root=str(proj))

)

greptile-apps · 2025-09-07T18:50:13Z

UnityMcpBridge/UnityMcpServer~/src/tools/read_console.py

+        if isinstance(resp, dict) and resp.get("success") and not include_stacktrace:
+            lines = resp.get("data", {}).get("lines", [])
+            trimmed = [
+                {"level": l.get("level") or l.get("type"), "message": l.get("message") or l.get("text")}
+                for l in lines
+            ]
+            return {"success": True, "data": {"lines": trimmed}}


logic: The trimming logic assumes specific field structures but doesn't handle cases where both 'level' and 'type' are None, or both 'message' and 'text' are None. This could result in entries with None values.

ci: add local validate-nlt-coverage helper

26ead87

dsarno added the codex label Sep 7, 2025 — with ChatGPT Codex Connector

greptile-apps bot reviewed Sep 7, 2025

View reviewed changes

dsarno closed this Sep 7, 2025

dsarno deleted the codex/add-tool_result-json-pruner-9jr70m branch September 7, 2025 23:03

	— Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".
	- Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".

	<testcase name="NL-0 — Baseline State Capture" classname="UnityMCP.NL-T">
	<testcase name="NL-0 — Baseline State Capture" classname="UnityMCP.NL">

		resp = validate_script(None, uri="unity://path/Assets/Scripts/A.cs")
		assert resp == {"success": True, "data": {"warnings": 1, "errors": 2}}

CI: fix T-pass under-emission (MultiEdit, strict prompts, coverage assert, Sonnet retry) #82

CI: fix T-pass under-emission (MultiEdit, strict prompts, coverage assert, Sonnet retry) #82

Uh oh!

Conversation

dsarno commented Sep 7, 2025

Summary

Testing

Uh oh!

coderabbitai bot commented Sep 7, 2025

Review skipped

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Summary

Confidence score: 4/5

Sequence Diagram

Uh oh!

greptile-apps bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants