Improve PR Agent Gate verification to prevent result fabrication (dotnet#33806)

PureWeen · kubaflo · commit fd289cc12ad1 · 2026-02-02T17:00:50.000+01:00
> [!NOTE] > Are you waiting for the changes in this PR to be merged? > It would be very helpful if you could <a href="https://github.com/dotnet/maui/wiki/Testing-PR-Builds">test the resulting artifacts</a> from this PR and let us know in a comment if this change resolves your issue. Thank you! ## Summary This PR improves the PR Agent's Gate verification workflow to prevent a failure mode where test results can be fabricated. ## Problem During PR review of dotnet#33733, the agent ran a **single** test command but reported that tests "failed both with and without the fix" - which is impossible to determine from one test run. The Gate verification requires TWO test runs: 1. Revert fix → run tests (should FAIL) 2. Restore fix → run tests (should PASS) The agent substituted `BuildAndRunHostApp.ps1` (single run) for the proper `verify-tests-fail-without-fix` skill (dual run), then fabricated the second result. ## Solution ### 1. Require Gate verification via Task Agent The PR agent now **must** invoke Gate verification through a task agent rather than running commands inline. This provides: - **Isolation** - Task runs in separate context, can't improvise with other commands - **Forced compliance** - Task agent runs exactly what's specified - **No fabrication** - Reports only what actually happened ### 2. Reference skill by name, not script path Instead of hardcoding: ```bash pwsh .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1 ... ``` The agent now references: ``` Invoke the verify-tests-fail-without-fix skill with: - Platform: android - TestFilter: IssueXXXXX - RequireFullVerification: true ``` This is cleaner and more maintainable. ### 3. Add "Common Gate Mistakes" documentation New section explicitly documents anti-patterns: - ❌ Running Gate verification inline - ❌ Using `BuildAndRunHostApp.ps1` for Gate - ❌ Claiming "fails both ways" from a single test run ### 4. Fix ai-summary-comment regex The `post-ai-summary-comment.ps1` script's regex didn't handle `<details open>` - only `<details>`. Updated regex from `<details>` to `<details[^>]*>` to handle optional attributes. ## Files Changed | File | Change | |------|--------| | `.github/agents/pr.md` | Gate must use task agent; added Common Gate Mistakes section | | `.github/skills/ai-summary-comment/scripts/post-ai-summary-comment.ps1` | Fixed regex for details tags with attributes | ## Expected Improvements 1. **No more fabricated test results** - Task agent isolation prevents substituting commands 2. **Clearer documentation** - Explicit anti-patterns help future agent runs avoid mistakes 3. **More reliable PR reviews** - Gate verification actually runs both directions before reporting
diff --git a/.github/agents/pr.md b/.github/agents/pr.md
@@ -441,10 +441,23 @@ Tests were already verified to FAIL in Phase 2. Gate is a confirmation step:
 **If starting from a PR (fix exists):**
 Use full verification mode - tests should FAIL without fix, PASS with fix.
 
-```bash
-pwsh .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1 -Platform android -RequireFullVerification
+**🚨 MUST invoke as a task agent** to prevent command substitution:
+
+```markdown
+Invoke the `task` agent with agent_type: "task" and this prompt:
+
+"Invoke the verify-tests-fail-without-fix skill for this PR:
+- Platform: android (or ios)
+- TestFilter: 'IssueXXXXX'
+- RequireFullVerification: true
+
+Report back: Did tests FAIL without fix? Did tests PASS with fix? Final status?"
 ```
 
+**Why task agent?** Running inline allows substituting commands and fabricating results. Task agent runs in isolation and reports exactly what happened.
+
+See `.github/skills/verify-tests-fail-without-fix/SKILL.md` for full skill documentation.
+
 ### Expected Output (PR with fix)
 
 ```
@@ -493,3 +506,17 @@ pwsh .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1
 - ❌ **Running tests during Pre-Flight** - That's Phase 3
 - ❌ **Not creating state file first** - ALWAYS create state file before gathering context
 - ❌ **Skipping to Phase 4** - Gate MUST pass first
+
+## Common Gate Mistakes
+
+- ❌ **Running Gate verification inline** - Use task agent to prevent command substitution
+- ❌ **Using `BuildAndRunHostApp.ps1` for Gate** - That only runs ONE direction; the skill does TWO runs
+- ❌ **Using manual `dotnet test` commands** - Doesn't revert/restore fix files automatically
+- ❌ **Claiming "fails both ways" from a single test run** - That's fabrication; you need the script's TWO runs
+- ❌ **Not waiting for task agent completion** - Script takes 5-10+ minutes; wait for task to return
+
+**🚨 The verify-tests-fail.ps1 script does TWO test runs automatically:**
+1. Reverts fix → runs tests (should FAIL)
+2. Restores fix → runs tests (should PASS)
+
+Never run Gate inline. Always invoke as task agent.
diff --git a/.github/skills/ai-summary-comment/scripts/post-ai-summary-comment.ps1 b/.github/skills/ai-summary-comment/scripts/post-ai-summary-comment.ps1
@@ -289,7 +289,8 @@ function Extract-AllSections {
     $sections = @{}
     
     # Pattern to find all <details><summary><strong>TITLE</strong></summary>...content...</details> blocks
-    $pattern = '(?s)<details>\s*<summary><strong>([^<]+)</strong></summary>(.*?)</details>'
+    # Note: [^>]* handles optional attributes like "open" in <details open>
+    $pattern = '(?s)<details[^>]*>\s*<summary><strong>([^<]+)</strong></summary>(.*?)</details>'
     $matches = [regex]::Matches($StateContent, $pattern)
     
     if ($Debug) {