
Implementation Plan: Add Red-Team Severity Calibration by Experiment Type in review-design #614

Merged

Trecek merged 8 commits into integration from
add-red-team-severity-calibration-by-experiment-type-in-revi/609 on Apr 5, 2026

Conversation

@Trecek (Collaborator) commented Apr 5, 2026

Summary

The review-design skill has L1 severity calibration that correctly caps estimand_clarity and hypothesis_falsifiability by experiment_type — benchmarks can never produce L1 critical findings. But the red-team dimension has no analogous calibration, meaning any critical red-team finding triggers STOP regardless of experiment type. This creates an unresolvable loop for benchmarks: the red-team always finds new critical issues at progressively higher abstraction (the Hydra pattern), exhausting retries without ever producing GO.

The fix adds a red-team severity calibration rubric to review-design/SKILL.md (mirroring the L1 rubric), updates the verdict logic to apply the cap before building stop_triggers, and adds diminishing-return awareness to resolve-design-review/SKILL.md so it can detect goalposts-moving across rounds.
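The cap-before-verdict ordering can be sketched in a few lines of Python. All names below (`RT_MAX_SEVERITY`, `Finding`, `apply_rt_cap`, `build_stop_triggers`) are illustrative rather than the skill's actual identifiers; the ceilings follow the rubric described in the commits (benchmark caps at warning, causal_inference retains critical, exploratory caps at info):

```python
from dataclasses import dataclass, replace

SEVERITY_ORDER = ["info", "warning", "critical"]

# Illustrative ceilings per experiment type: a benchmark can never retain
# a critical red-team finding, so it can never trigger STOP on that dimension.
RT_MAX_SEVERITY = {
    "benchmark": "warning",
    "causal_inference": "critical",
    "exploratory": "info",
}

@dataclass(frozen=True)
class Finding:
    dimension: str
    severity: str
    summary: str

def apply_rt_cap(findings, experiment_type):
    """Downgrade red-team findings above the experiment type's ceiling."""
    ceiling = RT_MAX_SEVERITY.get(experiment_type, "critical")
    cap = SEVERITY_ORDER.index(ceiling)
    return [
        replace(f, severity=ceiling)
        if f.dimension == "red_team" and SEVERITY_ORDER.index(f.severity) > cap
        else f
        for f in findings
    ]

def build_stop_triggers(findings):
    # Built AFTER the cap, so only criticals that survive it can STOP.
    return [f for f in findings if f.severity == "critical"]
```

The essential constraint is the ordering: `build_stop_triggers` must consume the capped list, never the raw red-team output.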

Architecture Impact

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;

    START([Plan submitted])
    GO([GO → execute])
    REVISE_OUT([REVISE → revise_design])
    REVISED_OUT([revised → revise_design])
    FAILED_OUT([failed → design_rejected])

    subgraph ReviewDesign ["● review-design/SKILL.md"]
        direction TB
        L1["L1 Analysis<br/>━━━━━━━━━━<br/>estimand_clarity +<br/>hypothesis_falsifiability"]
        L1GATE{"L1 Fail-Fast<br/>━━━━━━━━━━<br/>Any L1 critical?"}
        PARALLEL["L2 + L3 + L4 + RT<br/>━━━━━━━━━━<br/>Parallel analysis"]
        RTCAP["● RT Severity Cap<br/>━━━━━━━━━━<br/>RT_MAX_SEVERITY[experiment_type]<br/>Downgrade if above ceiling"]
        MERGE["Merge + Dedup<br/>━━━━━━━━━━<br/>All findings pooled"]
        VERDICT{"● Verdict Logic<br/>━━━━━━━━━━<br/>stop_triggers built<br/>AFTER rt_cap applied"}
    end

    subgraph ResolveDesign ["● resolve-design-review/SKILL.md"]
        direction TB
        PARSE["Step 1: Parse Dashboard<br/>━━━━━━━━━━<br/>Extract stop-trigger findings<br/>Classify ADDRESSABLE/STRUCTURAL/DISCUSS"]
        DIMCHECK{"prior_revision_guidance<br/>━━━━━━━━━━<br/>provided?"}
        DIMRET["● Step 1.5: Diminishing-Return<br/>━━━━━━━━━━<br/>Compare ADDRESSABLE themes<br/>vs prior guidance entries"]
        GOALPOST{"goalposts_moving<br/>━━━━━━━━━━<br/>true for any finding?"}
        RECLASSIFY["● Reclassify<br/>━━━━━━━━━━<br/>ADDRESSABLE → STRUCTURAL<br/>annotate prior_theme_match"]
        RESGATE{"Any ADDRESSABLE<br/>or DISCUSS?"}
    end

    subgraph RecipeRouting ["● research.yaml — resolve_design_review step"]
        direction LR
        RECIPE["skill_command passes<br/>━━━━━━━━━━<br/>$context.revision_guidance<br/>as optional 3rd arg"]
    end

    START --> L1
    L1 --> L1GATE
    L1GATE -->|"yes (L1 critical)"| MERGE
    L1GATE -->|"no"| PARALLEL
    PARALLEL --> RTCAP
    RTCAP --> MERGE
    MERGE --> VERDICT
    VERDICT -->|"stop_triggers present"| RECIPE
    VERDICT -->|"critical or ≥3 warnings"| REVISE_OUT
    VERDICT -->|"otherwise"| GO

    RECIPE --> PARSE
    PARSE --> DIMCHECK
    DIMCHECK -->|"yes"| DIMRET
    DIMCHECK -->|"no (round 1)"| RESGATE
    DIMRET --> GOALPOST
    GOALPOST -->|"true"| RECLASSIFY
    GOALPOST -->|"false"| RESGATE
    RECLASSIFY --> RESGATE
    RESGATE -->|"yes"| REVISED_OUT
    RESGATE -->|"all STRUCTURAL"| FAILED_OUT

    class START,GO,REVISE_OUT,REVISED_OUT,FAILED_OUT terminal;
    class L1,PARALLEL handler;
    class L1GATE,VERDICT,DIMCHECK,GOALPOST,RESGATE stateNode;
    class MERGE,PARSE phase;
    class RTCAP,DIMRET,RECLASSIFY newComponent;
    class RECIPE detector;

Color Legend:

| Color | Category | Description |
| --- | --- | --- |
| Dark Blue | Terminal | Start and outcome states |
| Orange | Handler | Analysis agents (L1, parallel L2-L4+RT) |
| Teal | State | Decision points and verdict routing |
| Purple | Phase | Merge and parse aggregation steps |
| Green | Modified Component | ● Nodes changed by this PR (RT cap, diminishing-return detection, reclassify, recipe routing) |
| Red | Detector | Recipe routing gate (passes revision_guidance) |

Closes #609

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-20260404-185816-184240/.autoskillit/temp/make-plan/add-red-team-severity-calibration-by-experiment-type_plan_2026-04-04_185816.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

| Step | input | output | cached | count | time |
| --- | --- | --- | --- | --- | --- |
| plan | 5.5k | 76.5k | 6.0M | 5 | 32m 41s |
| verify | 3.1k | 86.2k | 5.4M | 5 | 31m 25s |
| implement | 1.1k | 116.2k | 22.6M | 6 | 50m 55s |
| fix | 214 | 28.4k | 3.5M | 5 | 30m 58s |
| audit_impl | 137 | 58.9k | 3.1M | 5 | 19m 28s |
| open_pr | 135 | 68.4k | 5.4M | 4 | 23m 1s |
| review_pr | 31 | 22.8k | 1.2M | 1 | 5m 50s |
| Total | 10.2k | 457.5k | 47.2M | | 3h 14m |

Trecek and others added 6 commits April 4, 2026 19:35
Add experiment-type-aware severity cap for red-team findings, mirroring
the existing L1 calibration rubric. Benchmarks cap at warning (no STOP),
causal_inference retains critical, exploratory caps at info. The cap is
applied in Step 7 before verdict logic evaluates stop_triggers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add optional prior_revision_guidance_path argument and Step 1.5 that
detects goalposts-moving findings by comparing current ADDRESSABLE
findings against prior revision themes. Goalposts-moving findings are
reclassified as STRUCTURAL to terminate non-converging review cycles.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
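The goalposts-moving check described in this commit might look roughly like the following sketch. Function and field names are invented for illustration, and the matching here is naive substring overlap, whereas the actual skill presumably compares themes in prose:

```python
def reclassify_goalposts_moving(findings: list[dict], prior_themes: list[str]) -> list[dict]:
    """Reclassify ADDRESSABLE findings that re-tread prior revision themes.

    If a current finding's theme overlaps an entry from the prior revision
    guidance, the review is circling (the Hydra pattern): mark it STRUCTURAL
    and record which prior theme it matched, so the recipe can route to
    design_rejected instead of burning another revision round.
    """
    out = []
    for f in findings:
        match = next(
            (t for t in prior_themes if t in f["theme"] or f["theme"] in t),
            None,
        )
        if f["class"] == "ADDRESSABLE" and match is not None:
            f = {**f, "class": "STRUCTURAL", "prior_theme_match": match}
        out.append(f)
    return out
```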
…urn detection

Add 4 tests for review-design (rubric present, cap before verdict,
benchmark cannot STOP, causal_inference can STOP) and 3 tests for
resolve-design-review (diminishing-return present, goalposts reclassified
as STRUCTURAL, revision_guidance context input).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ch recipe

Update skill_command to include context.revision_guidance as third arg
and add revision_guidance to optional_context_refs for backward compat.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract column-value mapping from header+data rows instead of searching
for experiment type names in data rows (which appear only in headers).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Trecek (Collaborator, Author) left a comment:

AutoSkillit PR Review — Verdict: changes_requested (8 blocking issues found; see inline comments)

```python
def test_red_team_severity_cap_applied_before_verdict(skill_text: str) -> None:
    """Severity cap must be applied BEFORE building stop_triggers in verdict logic.

    Without this ordering, red-team criticals bypass the cap and still trigger STOP.
```

[critical] tests: Syntax error: the docstring for test_red_team_severity_cap_applied_before_verdict is missing its opening """. Line 322 reads """Severity cap must be applied BEFORE building stop_triggers in verdict logic.""" (closing on same line), then L324 is a bare string Without this ordering, red-team criticals bypass the cap... followed by """ on L325. This makes L324 a bare expression that is NOT inside a triple-quoted string, causing a SyntaxError at import time — the entire test module fails to collect.


Investigated — this is intentional. The docstring is correctly formed: """Severity cap must be applied BEFORE building stop_triggers in verdict logic.\n\n Without this ordering...\n """ (opening triple-quote on L322, closing on L325). ast.parse() confirms no SyntaxError. The diff hunk visible to the reviewer was truncated before the closing line. No change needed.


```python
    Without this ordering, red-team criticals bypass the cap and still trigger STOP.
    """
    step7_text = skill_text_between("### Step 7", "### Step 8", skill_text)
```

[warning] tests: skill_text_between("### Step 7", "### Step 8", skill_text) is called but skill_text_between is not imported or defined anywhere in the visible diff. If this helper is absent from the existing test file, the test will raise a NameError at runtime.


Investigated — this is intentional. skill_text_between is defined at line 190 of the same file (def skill_text_between(start_heading: str, end_heading: str, text: str) -> str:). The function predates this PR. The reviewer's diff hunk started at line 294 and did not include the pre-existing helper definition above it.
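For context, a plausible shape for that pre-existing helper, inferred from the quoted signature (the actual implementation in the test file may differ):

```python
def skill_text_between(start_heading: str, end_heading: str, text: str) -> str:
    """Return the SKILL.md slice from start_heading up to end_heading."""
    start = text.index(start_heading)
    end = text.index(end_heading, start)
    return text[start:end]
```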

@Trecek (Collaborator, Author) left a comment:

AutoSkillit review found 8 blocking issues. See inline comments.

Verdict: changes_requested

Critical (1):

  • tests/skills/test_review_design_contracts.py L324: Syntax error — missing opening """ for docstring of test_red_team_severity_cap_applied_before_verdict. Module fails to import.

Warnings (7):

  • tests/skills/test_review_design_contracts.py L326: skill_text_between called but not imported/defined — NameError at runtime
  • tests/skills/test_review_design_contracts.py L314: Arbitrary 1000-char window in rubric presence check — fragile assertion
  • tests/skills/test_review_design_contracts.py L341: Private helper _parse_rt_rubric placed mid-sequence between public tests — cohesion violation
  • tests/skills/test_review_design_contracts.py L346: Missing equal-length guard before zip(headers[1:], values[1:]) — silent truncation
  • tests/skills/test_review_design_contracts.py L347: Off-by-one risk if table_lines[1] absent — IndexError instead of informative assertion
  • tests/skills/test_review_design_contracts.py L350: Fragile [1:] slice assumes stable table structure — silent off-by-one if table reformatted
  • src/autoskillit/skills_extended/review-design/SKILL.md L315: Asymmetric naming: critical vs warning_findings — inconsistent _findings suffix

Info (2, not blocking):

  • src/autoskillit/skills_extended/review-design/SKILL.md L319: Hyphen vs en-dash in comment
  • tests/skills/test_resolve_design_review_contracts.py L93: Three-way OR in assertion is too easy to satisfy accidentally

Trecek and others added 2 commits April 4, 2026 20:47
…tion boundary, add equality assertions, drop [1:] slicing

- Move _parse_rt_rubric to top of red-team section so it precedes all tests that call it
- Replace 1000-char fixed window with next-section-heading boundary in both _parse_rt_rubric and test_red_team_severity_calibration_rubric_present
- Change len(table_lines) >= 2 to == 2 to enforce exact one-header/one-data-row structure
- Add assert len(headers) == len(values) before zip() to catch mismatched column counts
- Drop [1:] slicing; use dict(zip(headers, values)) so callers look up by name without index-based alignment assumptions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ng in verdict logic

warning_findings already uses the _findings suffix; align critical to match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Trecek Trecek enabled auto-merge April 5, 2026 04:00
@Trecek Trecek disabled auto-merge April 5, 2026 04:01
@Trecek Trecek added this pull request to the merge queue Apr 5, 2026
Merged via the queue into integration with commit 75eafa2 Apr 5, 2026
2 checks passed
@Trecek Trecek deleted the add-red-team-severity-calibration-by-experiment-type-in-revi/609 branch April 5, 2026 04:04
