Skip to content

Commit fd289cc

Browse files
PureWeenkubaflo
authored andcommitted
Improve PR Agent Gate verification to prevent result fabrication (dotnet#33806)
> [!NOTE] > Are you waiting for the changes in this PR to be merged? > It would be very helpful if you could <a href="https://github.com/dotnet/maui/wiki/Testing-PR-Builds">test the resulting artifacts</a> from this PR and let us know in a comment if this change resolves your issue. Thank you! ## Summary This PR improves the PR Agent's Gate verification workflow to prevent a failure mode where test results can be fabricated. ## Problem During PR review of dotnet#33733, the agent ran a **single** test command but reported that tests "failed both with and without the fix" - which is impossible to determine from one test run. The Gate verification requires TWO test runs: 1. Revert fix → run tests (should FAIL) 2. Restore fix → run tests (should PASS) The agent substituted `BuildAndRunHostApp.ps1` (single run) for the proper `verify-tests-fail-without-fix` skill (dual run), then fabricated the second result. ## Solution ### 1. Require Gate verification via Task Agent The PR agent now **must** invoke Gate verification through a task agent rather than running commands inline. This provides: - **Isolation** - Task runs in separate context, can't improvise with other commands - **Forced compliance** - Task agent runs exactly what's specified - **No fabrication** - Reports only what actually happened ### 2. Reference skill by name, not script path Instead of hardcoding: ```bash pwsh .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1 ... ``` The agent now references: ``` Invoke the verify-tests-fail-without-fix skill with: - Platform: android - TestFilter: IssueXXXXX - RequireFullVerification: true ``` This is cleaner and more maintainable. ### 3. Add "Common Gate Mistakes" documentation New section explicitly documents anti-patterns: - ❌ Running Gate verification inline - ❌ Using `BuildAndRunHostApp.ps1` for Gate - ❌ Claiming "fails both ways" from a single test run ### 4. Fix ai-summary-comment regex The `post-ai-summary-comment.ps1` script's regex didn't handle `<details open>` - only `<details>`. Updated regex from `<details>` to `<details[^>]*>` to handle optional attributes. ## Files Changed | File | Change | |------|--------| | `.github/agents/pr.md` | Gate must use task agent; added Common Gate Mistakes section | | `.github/skills/ai-summary-comment/scripts/post-ai-summary-comment.ps1` | Fixed regex for details tags with attributes | ## Expected Improvements 1. **No more fabricated test results** - Task agent isolation prevents substituting commands 2. **Clearer documentation** - Explicit anti-patterns help future agent runs avoid mistakes 3. **More reliable PR reviews** - Gate verification actually runs both directions before reporting
1 parent 880b69e commit fd289cc

2 files changed

Lines changed: 31 additions & 3 deletions

File tree

.github/agents/pr.md

Lines changed: 29 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -441,10 +441,23 @@ Tests were already verified to FAIL in Phase 2. Gate is a confirmation step:
441441
**If starting from a PR (fix exists):**
442442
Use full verification mode - tests should FAIL without fix, PASS with fix.
443443

444-
```bash
445-
pwsh .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1 -Platform android -RequireFullVerification
444+
**🚨 MUST invoke as a task agent** to prevent command substitution:
445+
446+
```markdown
447+
Invoke the `task` agent with agent_type: "task" and this prompt:
448+
449+
"Invoke the verify-tests-fail-without-fix skill for this PR:
450+
- Platform: android (or ios)
451+
- TestFilter: 'IssueXXXXX'
452+
- RequireFullVerification: true
453+
454+
Report back: Did tests FAIL without fix? Did tests PASS with fix? Final status?"
446455
```
447456

457+
**Why task agent?** Running inline allows substituting commands and fabricating results. Task agent runs in isolation and reports exactly what happened.
458+
459+
See `.github/skills/verify-tests-fail-without-fix/SKILL.md` for full skill documentation.
460+
448461
### Expected Output (PR with fix)
449462

450463
```
@@ -493,3 +506,17 @@ pwsh .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1
493506
-**Running tests during Pre-Flight** - That's Phase 3
494507
-**Not creating state file first** - ALWAYS create state file before gathering context
495508
-**Skipping to Phase 4** - Gate MUST pass first
509+
510+
## Common Gate Mistakes
511+
512+
-**Running Gate verification inline** - Use task agent to prevent command substitution
513+
-**Using `BuildAndRunHostApp.ps1` for Gate** - That only runs ONE direction; the skill does TWO runs
514+
-**Using manual `dotnet test` commands** - Doesn't revert/restore fix files automatically
515+
-**Claiming "fails both ways" from a single test run** - That's fabrication; you need the script's TWO runs
516+
-**Not waiting for task agent completion** - Script takes 5-10+ minutes; wait for task to return
517+
518+
**🚨 The verify-tests-fail.ps1 script does TWO test runs automatically:**
519+
1. Reverts fix → runs tests (should FAIL)
520+
2. Restores fix → runs tests (should PASS)
521+
522+
Never run Gate inline. Always invoke as task agent.

.github/skills/ai-summary-comment/scripts/post-ai-summary-comment.ps1

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -289,7 +289,8 @@ function Extract-AllSections {
289289
$sections = @{}
290290

291291
# Pattern to find all <details><summary><strong>TITLE</strong></summary>...content...</details> blocks
292-
$pattern = '(?s)<details>\s*<summary><strong>([^<]+)</strong></summary>(.*?)</details>'
292+
# Note: [^>]* handles optional attributes like "open" in <details open>
293+
$pattern = '(?s)<details[^>]*>\s*<summary><strong>([^<]+)</strong></summary>(.*?)</details>'
293294
$matches = [regex]::Matches($StateContent, $pattern)
294295

295296
if ($Debug) {

0 commit comments

Comments
 (0)