Closed
40 changes: 19 additions & 21 deletions .claude/prompts/nl-unity-suite-full-additive.md
@@ -51,11 +51,11 @@ CI provides:
- Do not restate tool JSON; summarize in ≤ 2 short lines.
- Never paste full file contents. For matches, include only the matched line and ±1 line.
- Prefer `mcp__unity__find_in_file` for targeting; avoid `mcp__unity__read_resource` unless strictly necessary. If needed, limit to `head_bytes ≤ 256` or `tail_lines ≤ 10`.
- Per‑test `system-out` ≤ 400 chars: brief status + latest SHA only.
- Console evidence: fetch the last 10 lines and include ≤ 3 lines in the fragment.
- Per‑test `system-out` ≤ 400 chars: brief status only (no SHA).
- Console evidence: fetch the last 10 lines with `include_stacktrace:false` and include ≤ 3 lines in the fragment.
- Avoid quoting multi‑line diffs; reference markers instead.
— Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries; include ≤ 3 lines total in the fragment; if no errors, state "no errors".
— Final check is folded into T‑J: perform an errors‑only scan and include a single "no errors" line or up to 3 error lines within the T‑J fragment.
— Console scans: perform two reads — last 10 `log/info` lines and up to 3 `error` entries (use `include_stacktrace:false`); include ≤ 3 lines total in the fragment; if no errors, state "no errors".
— Final check is folded into T‑J: perform an errors‑only scan (with `include_stacktrace:false`) and include a single "no errors" line or up to 3 error lines within the T‑J fragment.

---

@@ -85,7 +85,7 @@ STRICT OP GUARDRAILS

**State Tracking:**
- Track file SHA after each test (`mcp__unity__get_sha`) and use it as a precondition
for `apply_text_edits` in T‑F/T‑G/T‑I to exercise `stale_file` semantics.
for `apply_text_edits` in T‑F/T‑G/T‑I to exercise `stale_file` semantics. Do not include SHA values in report fragments.
- Use content signatures (method names, comment markers) to verify expected state
- Validate structural integrity after each major change
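
The SHA-as-precondition flow described above can be illustrated with a minimal Python sketch. It models a file store that rejects an edit when the caller's hash no longer matches the current content. `FileStore`, `apply_edit`, and the returned dictionaries are hypothetical names for illustration, not the Unity MCP server's API; only the `stale_file` status string comes from the text above.

```python
import hashlib


def sha256_of(text: str) -> str:
    """Return the hex SHA-256 of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class FileStore:
    """Hypothetical in-memory stand-in for a single script file."""

    def __init__(self, text: str) -> None:
        self.text = text

    def get_sha(self) -> str:
        return sha256_of(self.text)

    def apply_edit(self, new_text: str, precondition_sha: str) -> dict:
        # Reject the edit if the caller's view of the file is out of date.
        if precondition_sha != self.get_sha():
            return {"success": False, "status": "stale_file"}
        self.text = new_text
        return {"success": True, "sha": self.get_sha()}


store = FileStore("class A {}\n")
sha_before = store.get_sha()
first = store.apply_edit("class A { }\n", sha_before)         # accepted
second = store.apply_edit("class A {  }\n", sha_before)       # rejected: SHA is stale
retry = store.apply_edit("class A {  }\n", store.get_sha())   # accepted with a fresh SHA
```

The second call reuses the hash taken before the first edit, which is the situation T-G and T-I are meant to provoke; the retry re-reads the hash first.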

@@ -138,7 +138,7 @@ STRICT OP GUARDRAILS
- Perform a targeted scan for errors/exceptions (type: errors), up to 3 entries
- Validate no compilation errors from previous operations
- **Expected final state**: State C (unchanged)
- **IMMEDIATELY** write clean XML fragment to `reports/NL-4_results.xml` (no extra text). The `<testcase name>` must start with `NL-4`. Include at most 3 lines total across both reads, or simply state "no errors; console OK" (≤ 400 chars), plus the latest SHA.
- **IMMEDIATELY** write clean XML fragment to `reports/NL-4_results.xml` (no extra text). The `<testcase name>` must start with `NL-4`. Include at most 3 lines total across both reads, or simply state "no errors; console OK" (≤ 400 chars).

### T-A. Temporary Helper Lifecycle (Returns to State C)
**Goal**: Test insert → verify → delete cycle for temporary code
@@ -149,6 +149,9 @@ STRICT OP GUARDRAILS
- Delete helper method via structured delete operation
- **Expected final state**: Return to State C (helper removed, other changes intact)

### Late-Test Editing Rule
- When modifying a method body, use `mcp__unity__script_apply_edits`. If the method is expression-bodied (`=>`), convert it to a block or replace the whole method definition. After the edit, run `mcp__unity__validate_script` and roll back on error. Use `//` comments in inserted code.

### T-B. Method Body Interior Edit (Additive State D)
**Goal**: Edit method interior without affecting structure, on modified file
**Actions**:
@@ -172,7 +175,7 @@ STRICT OP GUARDRAILS
- Use smart anchor matching to find current class-ending brace (after NL-3 tail comments)
- Insert permanent helper before class brace: `private void TestHelper() { /* placeholder */ }`
- Validate with `mcp__unity__validate_script(level:"standard")`
- **IMMEDIATELY** write clean XML fragment to `reports/T-D_results.xml` (no extra text). The `<testcase name>` must start with `T-D`. Include brief evidence and the latest SHA in `system-out`.
- **IMMEDIATELY** write clean XML fragment to `reports/T-D_results.xml` (no extra text). The `<testcase name>` must start with `T-D`. Include brief evidence in `system-out`.
- **Expected final state**: State E + TestHelper() method before class end

### T-E. Method Evolution Lifecycle (Additive State G)
@@ -193,7 +196,7 @@ STRICT OP GUARDRAILS
3. Add final class comment: `// end of test modifications`
- All edits computed from same file snapshot, applied atomically
- **Expected final state**: State G + three coordinated comments
- After applying the atomic edits, run `validate_script(level:"standard")` and emit a clean fragment to `reports/T-F_results.xml` with a short summary and the latest SHA.
- After applying the atomic edits, run `validate_script(level:"standard")` and emit a clean fragment to `reports/T-F_results.xml` with a short summary.

### T-G. Path Normalization Test (No State Change)
**Goal**: Verify URI forms work equivalently on modified file
@@ -203,15 +206,15 @@ STRICT OP GUARDRAILS
- Second should return `stale_file`, retry with updated SHA
- Verify both URI forms target same file
- **Expected final state**: State H (no content change, just path testing)
- Emit `reports/T-G_results.xml` showing evidence of stale SHA handling and final SHA.
- Emit `reports/T-G_results.xml` showing evidence of stale SHA handling.

### T-H. Validation on Modified File (No State Change)
**Goal**: Ensure validation works correctly on heavily modified file
**Actions**:
- Run `validate_script(level:"standard")` on current state
- Verify no structural errors despite extensive modifications
- **Expected final state**: State H (validation only, no edits)
- Emit `reports/T-H_results.xml` confirming validation OK and including the latest SHA.
- Emit `reports/T-H_results.xml` confirming validation OK.

### T-I. Failure Surface Testing (No State Change)
**Goal**: Test error handling on real modified file
@@ -220,7 +223,7 @@ STRICT OP GUARDRAILS
- Attempt edit with stale SHA (should fail cleanly)
- Verify error responses are informative
- **Expected final state**: State H (failed operations don't modify file)
- Emit `reports/T-I_results.xml` capturing error evidence and final SHA; file must contain one `<testcase>`.
- Emit `reports/T-I_results.xml` capturing error evidence; file must contain one `<testcase>`.

### T-J. Idempotency on Modified File (Additive State I)
**Goal**: Verify operations behave predictably when repeated
@@ -232,7 +235,7 @@ STRICT OP GUARDRAILS
- **Remove again** (same `regex_replace`) → expect `no_op: true`.
- `mcp__unity__validate_script(level:"standard")`
- Perform a final console scan for errors/exceptions (errors only, up to 3); include "no errors" if none
- **IMMEDIATELY** write clean XML fragment to `reports/T-J_results.xml` with evidence of both `no_op: true` outcomes and the console result. The `<testcase name>` must start with `T-J` and include the latest SHA.
- **IMMEDIATELY** write clean XML fragment to `reports/T-J_results.xml` with evidence of both `no_op: true` outcomes and the console result. The `<testcase name>` must start with `T-J`.
- **Expected final state**: State H + verified idempotent behavior

---
@@ -299,7 +302,7 @@ BAN ON EXTRA TOOLS AND DIRS

## XML Fragment Templates (T-F .. T-J)

Use these skeletons verbatim as a starting point. Replace the bracketed placeholders with your evidence and the latest SHA. Ensure each file contains exactly one `<testcase>` element and that the `name` begins with the exact test id.
Use these skeletons verbatim as a starting point. Replace the bracketed placeholders with your evidence. Ensure each file contains exactly one `<testcase>` element and that the `name` begins with the exact test id.
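
The fragment rules stated in this prompt (one `<testcase>` per file, a `name` that begins with the test id, `system-out` of at most 400 characters) could be checked mechanically. The following is a minimal sketch in Python; `check_fragment` is a hypothetical helper and is not part of the workflow or the prompt.

```python
import xml.etree.ElementTree as ET


def check_fragment(xml_text, test_id, max_chars=400):
    """Return a list of problems found in a report fragment (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # A second top-level element also ends up here ("junk after document element").
        return [f"not well-formed XML: {exc}"]
    problems = []
    # The root must be the single <testcase>; nested ones are not allowed either.
    if root.tag != "testcase" or root.findall(".//testcase"):
        problems.append("expected exactly one <testcase> element")
    if not root.get("name", "").startswith(test_id):
        problems.append(f"name must start with {test_id!r}")
    out = root.findtext("system-out", default="")
    if len(out.strip()) > max_chars:
        problems.append(f"system-out exceeds {max_chars} chars")
    return problems
```

An empty list means the fragment satisfies all three rules; each string otherwise names one violated rule.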

```xml
<testcase name="T-F — Atomic Multi-Edit" classname="UnityMCP.NL-T">
@@ -309,7 +312,6 @@ Applied 3 non-overlapping edits in one atomic call:
- ApplyBlend(): added "// safe animation"
- End-of-class: added "// end of test modifications"
validate_script: OK
SHA: [sha-here]
]]></system-out>
</testcase>
```
@@ -319,7 +321,6 @@ SHA: [sha-here]
<system-out><![CDATA[
Read Unity console (INFO): OK.
No compilation errors detected.
SHA: [sha-here]
]]></system-out>
</testcase>
```
@@ -328,8 +329,7 @@ SHA: [sha-here]
<testcase name="T-G — Path Normalization Test" classname="UnityMCP.NL-T">
<system-out><![CDATA[
Edit via unity://path/... succeeded.
Same edit via Assets/... returned stale_file, retried with updated SHA: OK.
Final SHA: [sha-here]
Same edit via Assets/... returned stale_file, retried with updated hash: OK.
]]></system-out>
</testcase>
```
@@ -338,7 +338,6 @@ Final SHA: [sha-here]
<testcase name="T-H — Validation on Modified File" classname="UnityMCP.NL-T">
<system-out><![CDATA[
validate_script(level:"standard"): OK on the modified file.
SHA: [sha-here]
]]></system-out>
</testcase>
```
@@ -347,8 +346,8 @@ SHA: [sha-here]
<testcase name="T-I — Failure Surface Testing" classname="UnityMCP.NL-T">
<system-out><![CDATA[
Overlapping edit: failed cleanly (error captured).
Stale SHA edit: failed cleanly (error captured).
File unchanged; final SHA: [sha-here]
Stale hash edit: failed cleanly (error captured).
File unchanged.
]]></system-out>
</testcase>
```
@@ -361,7 +360,6 @@ Insert same marker again: no_op: true.
regex_remove marker: OK.
regex_remove again: no_op: true.
validate_script: OK.
SHA: [sha-here]
]]></system-out>
</testcase>
```
96 changes: 79 additions & 17 deletions .github/workflows/claude-nl-suite.yml
@@ -202,13 +202,15 @@ jobs:
manual_args=(-manualLicenseFile "/root/.local/share/unity3d/Unity/Unity_lic.ulf")
fi

mkdir -p "$RUNNER_TEMP/unity-status"
docker rm -f unity-mcp >/dev/null 2>&1 || true
docker run -d --name unity-mcp --network host \
-e HOME=/root \
-e UNITY_MCP_ALLOW_BATCH=1 \
-e UNITY_MCP_STATUS_DIR=/root/.unity-mcp \
-e UNITY_MCP_BIND_HOST=127.0.0.1 \
-v "${{ github.workspace }}:/workspace" -w /workspace \
-v "$RUNNER_TEMP/unity-status:/root/.unity-mcp" \
-v "$RUNNER_TEMP/unity-config:/root/.config/unity3d:ro" \
-v "$RUNNER_TEMP/unity-local:/root/.local/share/unity3d:ro" \
"$UNITY_IMAGE" /opt/unity/Editor/Unity -batchmode -nographics -logFile - \
@@ -238,7 +240,7 @@ jobs:
logs="$(docker logs unity-mcp 2>&1 || true)"

# 1) Primary: status JSON exposes TCP port
port="$(docker exec unity-mcp bash -lc 'shopt -s nullglob; for f in /root/.unity-mcp/unity-mcp-status-*.json; do grep -ho "\"unity_port\"[[:space:]]*:[[:space:]]*[0-9]\+" "$f"; done | sed -E "s/.*: *([0-9]+).*/\1/" | head -n1' 2>/dev/null || true)"
port="$(jq -r '.unity_port // empty' "$RUNNER_TEMP"/unity-status/unity-mcp-status-*.json 2>/dev/null | head -n1 || true)"
if [[ -n "${port:-}" ]] && timeout 1 bash -lc "exec 3<>/dev/tcp/127.0.0.1/$port"; then
echo "Bridge ready on port $port"
exit 0
@@ -288,12 +290,39 @@ jobs:
"env": {
"PYTHONUNBUFFERED": "1",
"MCP_LOG_LEVEL": "debug",
"UNITY_PROJECT_ROOT": "$GITHUB_WORKSPACE/TestProjects/UnityMCPTests"
"UNITY_PROJECT_ROOT": "$GITHUB_WORKSPACE/TestProjects/UnityMCPTests",
"UNITY_MCP_STATUS_DIR": "$RUNNER_TEMP/unity-status",
"UNITY_MCP_HOST": "127.0.0.1"
}
}
}
}
JSON

- name: Pin Claude tool permissions (.claude/settings.json)
run: |
set -eux
mkdir -p .claude
cat > .claude/settings.json <<'JSON'
{
"permissions": {
"allow": [
"mcp__unity",
"Edit(reports/**)"
],
"deny": [
"Bash",
"MultiEdit",
"WebFetch",
"WebSearch",
"Task",
"TodoWrite",
"NotebookEdit",
"NotebookRead"
]
}
}
JSON

# ---------- Reports & helper ----------
- name: Prepare reports and dirs
@@ -314,32 +343,65 @@ jobs:
</testsuite></testsuites>
XML
printf '# Unity NL/T Editing Suite Test Results\n\n' > "$MD_OUT"

- name: Verify Unity bridge status/port
run: |
set -euxo pipefail
ls -la "$RUNNER_TEMP/unity-status" || true
jq -r . "$RUNNER_TEMP"/unity-status/unity-mcp-status-*.json | sed -n '1,80p' || true

shopt -s nullglob
status_files=("$RUNNER_TEMP"/unity-status/unity-mcp-status-*.json)
if ((${#status_files[@]})); then
port="$(grep -hEo '"unity_port"[[:space:]]*:[[:space:]]*[0-9]+' "${status_files[@]}" \
| sed -E 's/.*: *([0-9]+).*/\1/' | head -n1 || true)"
else
port=""
fi

echo "unity_port=$port"
if [[ -n "$port" ]]; then
timeout 1 bash -lc "exec 3<>/dev/tcp/127.0.0.1/$port" && echo "TCP OK"
fi

# (removed) Revert helper and baseline snapshot are no longer used


# ---------- Run suite ----------
- name: Run Claude NL suite (single pass)
# ---------- Run suite in two passes ----------
- name: Run Claude NL pass
uses: anthropics/claude-code-base-action@beta
if: steps.detect.outputs.anthropic_ok == 'true' && steps.detect.outputs.unity_ok == 'true'
continue-on-error: true
with:
use_node_cache: false
prompt_file: .claude/prompts/nl-unity-suite-full-additive.md
mcp_config: .claude/mcp.json
settings: .claude/settings.json
allowed_tools: "mcp__unity,Edit(reports/**)"
disallowed_tools: "Bash,MultiEdit,WebFetch,WebSearch,Task,TodoWrite,NotebookEdit,NotebookRead"
model: claude-3-7-sonnet-20250219
append_system_prompt: |
You are running the NL pass only. Do not run any T-* tests.
Emit only NL-0..NL-4 fragments and stop.
timeout_minutes: "30"
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}

- name: Run Claude T pass
uses: anthropics/claude-code-base-action@beta
if: steps.detect.outputs.anthropic_ok == 'true' && steps.detect.outputs.unity_ok == 'true'
continue-on-error: true
with:
use_node_cache: false
prompt_file: .claude/prompts/nl-unity-suite-full-additive.md
mcp_config: .claude/mcp.json
allowed_tools: >-
Write,
mcp__unity__manage_editor,
mcp__unity__list_resources,
mcp__unity__read_resource,
mcp__unity__apply_text_edits,
mcp__unity__script_apply_edits,
mcp__unity__validate_script,
mcp__unity__find_in_file,
mcp__unity__read_console,
mcp__unity__get_sha
disallowed_tools: TodoWrite,Task,Bash
model: claude-3-7-sonnet-latest
settings: .claude/settings.json
allowed_tools: "mcp__unity,Edit(reports/**)"
disallowed_tools: "Bash,MultiEdit,WebFetch,WebSearch,Task,TodoWrite,NotebookEdit,NotebookRead"
model: claude-3-5-haiku-20241022
fallback_model: claude-3-7-sonnet-20250219
append_system_prompt: |
You are running the T pass only. Do not run any NL-* tests.
Emit only T-A..T-J fragments and stop.
timeout_minutes: "30"
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}

31 changes: 27 additions & 4 deletions README-DEV.md
@@ -46,6 +46,27 @@ Restores original files from backup.
2. Allows you to select which backup to restore
3. Restores both Unity Bridge and Python Server files

### `prune_tool_results.py`
Compacts large `tool_result` blobs in conversation JSON into concise one-line summaries.

**Usage:**
```bash
python3 prune_tool_results.py < reports/claude-execution-output.json > reports/claude-execution-output.pruned.json
```

The script reads a conversation from `stdin` and writes the pruned version to `stdout`, making logs much easier to inspect or archive.
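
The script itself is not part of this diff. As a rough illustration of the idea, the sketch below walks a parsed JSON document and replaces the `content` of every block whose `type` is `tool_result` with a one-line summary. The block layout (`type` and `content` keys), the 80-character limit, and the function names are assumptions made for illustration, not the behaviour of `prune_tool_results.py`.

```python
import json


def summarize(content, limit=80):
    """Collapse arbitrary tool_result content into a single short line."""
    text = content if isinstance(content, str) else json.dumps(content, ensure_ascii=False)
    one_line = " ".join(text.split())  # fold newlines and runs of whitespace
    if len(one_line) <= limit:
        return one_line
    return f"{one_line[:limit]}... [{len(one_line)} chars]"


def prune(node):
    """Recursively replace the content of every tool_result block with a summary."""
    if isinstance(node, list):
        return [prune(item) for item in node]
    if isinstance(node, dict):
        if node.get("type") == "tool_result" and "content" in node:
            return {**node, "content": summarize(node["content"])}
        return {key: prune(value) for key, value in node.items()}
    return node


def prune_stream(src, dst):
    """Read JSON from src, write the pruned JSON to dst."""
    json.dump(prune(json.load(src)), dst, ensure_ascii=False)
```

Calling `prune_stream(sys.stdin, sys.stdout)` from a `__main__` guard would give the stdin-to-stdout shape shown in the usage line above.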

### Lean Tool Responses
To keep live conversations small, server tools now emit minimal payloads by default:

* `find_in_file` – first match positions only (`startLine/Col`, `endLine/Col`).
* `read_console` – full entries by default; pass `include_stacktrace=False` to trim to `{level, message}` (use `count` to limit).
* `validate_script` – diagnostics summarized as `{warnings, errors}` counts.
* `get_sha` – `{sha256, lengthBytes}` only.
* `read_resource` – returns only `metadata.sha256` and byte length unless `include_text` or window arguments are provided.

These defaults reduce token usage while keeping the fields callers need.
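
To make the `read_console` trimming concrete, here is a minimal sketch in Python of what reducing entries to `{level, message}` could look like. `lean_console_entries` is a hypothetical helper, the `stacktrace` key is an assumed field name, and treating `count` as "the last N entries" is also an assumption; the server's actual implementation is not shown in this diff.

```python
def lean_console_entries(entries, count=None, include_stacktrace=True):
    """Return the last `count` entries, optionally reduced to level and message."""
    if count is None:
        selected = list(entries)
    elif count > 0:
        selected = list(entries[-count:])
    else:
        selected = []
    if include_stacktrace:
        # Full entries: copy each dict so callers cannot mutate the originals.
        return [dict(entry) for entry in selected]
    # Lean entries: keep only the two fields named in the list above.
    return [{"level": e.get("level"), "message": e.get("message")} for e in selected]
```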

## Finding Unity Package Cache Path

Unity stores Git packages under a version-or-hash folder. Expect something like:
@@ -70,10 +91,12 @@ Note: In recent builds, the Python server sources are also bundled inside the pa

We provide a CI job to run a Natural Language Editing mini-suite against the Unity test project. It spins up a headless Unity container and connects via the MCP bridge.

- Trigger: Workflow dispatch (`Claude NL suite (Unity live)`).
- Image: `UNITY_IMAGE` (UnityCI) pulled by tag; the job resolves a digest at runtime. Logs are sanitized.
- Reports: JUnit at `reports/junit-nl-suite.xml`, Markdown at `reports/junit-nl-suite.md`.
- Publishing: JUnit is normalized to `reports/junit-for-actions.xml` and published; artifacts upload all files under `reports/`.
- Trigger: Workflow dispatch (`Claude NL suite (Unity live)`).
- Image: `UNITY_IMAGE` (UnityCI) pulled by tag; the job resolves a digest at runtime. Logs are sanitized.
- Execution: runs in two passes (NL then T) so each session stays lean.
- Tool permissions are pinned via `.claude/settings.json`, allowing Unity MCP tools and edits under `reports/` only.
- Reports: JUnit at `reports/junit-nl-suite.xml`, Markdown at `reports/junit-nl-suite.md`.
- Publishing: JUnit is normalized to `reports/junit-for-actions.xml` and published; artifacts upload all files under `reports/`.

### Test target script
- The repo includes a long, standalone C# script used to exercise larger edits and windows:
4 changes: 4 additions & 0 deletions UnityMcpBridge/Editor/Tools/ManageScript.cs
@@ -1347,6 +1347,10 @@ private static object EditScript(
appliedCount = replacements.Count;
}

// Guard against structural imbalance before validation
if (!CheckBalancedDelimiters(working, out int lineBal, out char expectedBal))
return Response.Error("unbalanced_braces", new { status = "unbalanced_braces", line = lineBal, expected = expectedBal.ToString() });

// No-op guard for structured edits: if text unchanged, return explicit no-op
if (string.Equals(working, original, StringComparison.Ordinal))
{
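
The body of `CheckBalancedDelimiters` is not included in this hunk. The sketch below shows the general stack-based technique such a guard typically relies on, written in Python for brevity rather than C#. It is a simplification under stated assumptions: `check_balanced` is a hypothetical name, it returns a tuple instead of using `out` parameters, and it does not skip string literals or comments, which a check on real C# source would need to do.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}                      # closer -> opener
CLOSER_FOR = {opener: closer for closer, opener in PAIRS.items()}


def check_balanced(text):
    """Return (ok, line, expected) for the first delimiter imbalance found.

    Simplification: string literals and comments are NOT skipped.
    """
    stack = []  # (opener, line) pairs still waiting for their closer
    line = 1
    for ch in text:
        if ch == "\n":
            line += 1
        elif ch in CLOSER_FOR:
            stack.append((ch, line))
        elif ch in PAIRS:
            if not stack or stack[-1][0] != PAIRS[ch]:
                # Expected: the closer for the innermost open delimiter,
                # or the missing opener when nothing is open at all.
                expected = CLOSER_FOR[stack[-1][0]] if stack else PAIRS[ch]
                return (False, line, expected)
            stack.pop()
    if stack:
        # Text ended with something still open: report where it was opened.
        opener, opened_at = stack[-1]
        return (False, opened_at, CLOSER_FOR[opener])
    return (True, 0, "")
```

Running a check like this before validation lets an edit that drops a closing brace be rejected with a line number, rather than surfacing later as a compile error.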