Evolve experiment plan schema: YAML frontmatter for machine-readable metadata #590

## Summary

Update `plan-experiment` to emit a YAML frontmatter block at the top of each experiment plan file. This structured metadata lets automated design review (`review-design`) parse the experiment type, hypotheses, metrics, baselines, and statistical plan directly, without LLM extraction.

## Changes

### plan-experiment skill update
- Add Step 3a, which runs after the prose plan is drafted: extract the structured information into YAML frontmatter, delimited by `---` lines and placed before the `# Experiment Plan:` heading
- Accept an optional second positional argument, `revision_guidance` (path to revision feedback from `review-design`). When present, read the file and incorporate the feedback. When absent or empty, proceed normally (first pass).
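
The optional-argument handling could look roughly like the sketch below. The function name `load_revision_guidance` is hypothetical, and two behaviours are assumptions rather than part of this issue: an empty feedback file is treated the same as an absent argument, and a path that does not exist raises an error.

```python
from pathlib import Path
from typing import Optional, Sequence


def load_revision_guidance(args: Sequence[str]) -> Optional[str]:
    """Return the feedback text, or None for a first-pass run.

    `args` holds the positional arguments only (no program name);
    args[1], when present, is the revision_guidance path.
    """
    if len(args) < 2 or not args[1].strip():
        return None  # argument absent or empty: first pass
    path = Path(args[1])
    if not path.is_file():
        # Assumption: a named but missing file is an error, not a first pass.
        raise FileNotFoundError(f"revision_guidance not found: {path}")
    text = path.read_text(encoding="utf-8")
    # Assumption: a file containing only whitespace also counts as "empty".
    return text if text.strip() else None
```

Returning `None` for the first pass keeps the caller's branch explicit: the skill incorporates feedback only when a non-empty string comes back.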

### Frontmatter schema

```yaml
---
experiment_type: benchmark
# REQUIRED. One of: benchmark, configuration_study, causal_inference, robustness_audit, exploratory

estimand:
  # RECOMMENDED. Required when experiment_type = causal_inference.
  treatment: "{the intervention or manipulation}"
  outcome: "{the measured effect}"
  population: "{scope of units/datasets/contexts}"
  contrast: "{A vs B vs C comparison}"

hypothesis_h0: "{null hypothesis with measurable threshold}"   # REQUIRED
hypothesis_h1: "{alt hypothesis with measurable threshold}"    # REQUIRED

metrics:
  # REQUIRED, min 1
  - name: "{metric_name}"
    unit: "{unit of measurement}"
    canonical_name: "{src/metrics.rs entry or NEW}"
    collection_method: "{exact command or code path}"
    threshold: "{success threshold}"
    direction: "higher_is_better"    # optional: higher_is_better | lower_is_better | target_value
    primary: true                     # optional: true for the one metric H1 references

baselines:
  # REQUIRED for benchmark/causal_inference
  - name: "{comparator name}"
    version: "{package==version or git SHA}"
    tuning_budget: "{what tuning was done, or 'default'}"

statistical_plan:
  # REQUIRED unless experiment_type = exploratory
  test: "{primary statistical test name}"
  alpha: 0.05
  power_target: 0.80
  correction_method: "Holm-Bonferroni"   # null | Bonferroni | Holm-Bonferroni | BH
  sample_size_justification: "{why N is sufficient}"
  min_detectable_effect: "{MDE in metric units}"   # optional

environment:
  # REQUIRED
  type: "custom"   # standard | custom
  spec_path: "research/{slug}/environment.yml"   # required when type=custom

success_criteria:
  # REQUIRED, all three sub-fields
  conclusive_positive: "{conditions supporting H1, referencing metrics}"
  conclusive_negative: "{conditions supporting H0}"
  inconclusive: "{conditions where no conclusion can be drawn}"

experiment_slug: "{YYYY-MM-DD-slug}"   # optional
---
```
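
For consumers of this block, a minimal sketch of separating the frontmatter from the prose plan is shown here. `split_frontmatter` is a hypothetical helper, not an existing function in this repo. It returns the raw YAML text, which would then go to a YAML parser, and it treats an unclosed opening delimiter as "no frontmatter" (an assumption).

```python
from typing import Optional, Tuple

DELIM = "---"


def split_frontmatter(text: str) -> Tuple[Optional[str], str]:
    """Split a plan into (frontmatter, body); frontmatter is None if absent."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != DELIM:
        return None, text  # legacy plan without frontmatter
    for i in range(1, len(lines)):
        if lines[i].strip() == DELIM:
            # Everything between the delimiters is YAML; the rest is prose.
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    return None, text  # opening delimiter never closed: treat as absent
```

Because the body is returned unchanged when no frontmatter is found, the same helper works for plans written before this change.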

### Field requirements by experiment type

| Field | benchmark | configuration_study | causal_inference | robustness_audit | exploratory |
|-------|-----------|-------------|-----------------|-----------------|-------------|
| experiment_type | required | required | required | required | required |
| estimand | recommended | recommended | **required** (with contrast) | recommended | optional |
| hypothesis_h0/h1 | required | required | required | required | required |
| metrics | required | required | required | required | required |
| baselines | **required** | optional | **required** | optional | optional |
| statistical_plan | required | required | required | required | **waived** |
| environment | required | required | required | required | required |
| success_criteria | required | required | required | required | required |
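
The table above can be encoded as a lookup keyed by the `experiment_type` values from the schema. This is a sketch only: `REQUIREMENTS`, `_row`, and `missing_required` are hypothetical names, and `hypothesis_h0`/`hypothesis_h1` are listed as two separate rows.

```python
from typing import Any, Dict, List

TYPES = (
    "benchmark",
    "configuration_study",
    "causal_inference",
    "robustness_audit",
    "exploratory",
)


def _row(default: str, **overrides: str) -> Dict[str, str]:
    """Build one table row: every type gets `default` unless overridden."""
    row = dict.fromkeys(TYPES, default)
    row.update(overrides)
    return row


REQUIREMENTS: Dict[str, Dict[str, str]] = {
    "experiment_type": _row("required"),
    "estimand": _row("recommended", causal_inference="required", exploratory="optional"),
    "hypothesis_h0": _row("required"),
    "hypothesis_h1": _row("required"),
    "metrics": _row("required"),
    "baselines": _row("optional", benchmark="required", causal_inference="required"),
    "statistical_plan": _row("required", exploratory="waived"),
    "environment": _row("required"),
    "success_criteria": _row("required"),
}


def missing_required(frontmatter: Dict[str, Any]) -> List[str]:
    """List required fields that are absent or empty for this experiment type."""
    etype = frontmatter.get("experiment_type")
    if etype not in TYPES:
        return ["experiment_type"]  # no valid type, so no row can be looked up
    return [
        field
        for field, row in REQUIREMENTS.items()
        if row[etype] == "required" and not frontmatter.get(field)
    ]
```

Keeping the table as data means a change to one cell does not require touching the checking logic.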

### Validation rules (applied before writing frontmatter)

```
V1: benchmark/causal_inference → len(baselines) >= 1 AND each baseline.version not empty
    ERROR: "Benchmark/causal_inference experiments require at least one named baseline with a version"

V2: causal_inference → estimand.treatment, estimand.outcome, and estimand.contrast are not null
    ERROR: "causal_inference requires estimand with treatment, outcome, and contrast fields"

V3: !exploratory → statistical_plan present AND test not null
    ERROR: "Non-exploratory experiments require a statistical_plan; use {test: 'none'} to waive"

V4: environment.type=custom → spec_path not null
    ERROR: "Custom environment requires spec_path pointing to environment.yml"

V5: len(metrics) >= 2 → exactly one metric has primary: true
    WARNING: "Multiple metrics but not exactly one designated primary; H1 threshold ambiguous"

V6: any metric.canonical_name = "NEW"
    WARNING: "Plan includes NEW metrics not yet in src/metrics.rs"

V7: hypothesis_h1 has no numeric threshold
    WARNING: "H1 should include a measurable numeric threshold"

V8: success_criteria.conclusive_positive references no metric.name
    WARNING: "Success criteria do not reference any declared metric"
```

Log warnings as YAML comments (`# WARNING: ...`) in the frontmatter block.
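
One possible shape for the V1 to V8 checks, operating on an already-parsed frontmatter dict, is sketched below. The function names are hypothetical. `validate` returns rule IDs rather than message text so the wording stays in the list above. Three choices are assumptions: V2 checks all three estimand fields named in its error message, V7 uses "contains a digit" as a rough heuristic for a numeric threshold, and V8 uses a plain substring match.

```python
import re
from typing import Any, Dict, List, Tuple


def validate(fm: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Apply V1-V8; return (error rule IDs, warning rule IDs)."""
    errors: List[str] = []
    warnings: List[str] = []
    etype = fm.get("experiment_type")
    metrics = fm.get("metrics") or []
    baselines = fm.get("baselines") or []
    estimand = fm.get("estimand") or {}
    stat = fm.get("statistical_plan") or {}
    env = fm.get("environment") or {}
    positive = str((fm.get("success_criteria") or {}).get("conclusive_positive") or "")

    if etype in ("benchmark", "causal_inference"):
        if not baselines or not all(b.get("version") for b in baselines):
            errors.append("V1")
    if etype == "causal_inference":
        if any(estimand.get(k) is None for k in ("treatment", "outcome", "contrast")):
            errors.append("V2")
    if etype != "exploratory" and stat.get("test") is None:
        errors.append("V3")  # also fires when statistical_plan is absent
    if env.get("type") == "custom" and env.get("spec_path") is None:
        errors.append("V4")

    if len(metrics) >= 2 and sum(m.get("primary") is True for m in metrics) != 1:
        warnings.append("V5")
    if any(m.get("canonical_name") == "NEW" for m in metrics):
        warnings.append("V6")
    if not re.search(r"\d", str(fm.get("hypothesis_h1") or "")):
        warnings.append("V7")  # heuristic: no digit means no numeric threshold
    if not any(m.get("name") and m["name"] in positive for m in metrics):
        warnings.append("V8")
    return errors, warnings


def warnings_as_yaml_comments(messages: List[str]) -> str:
    """Render warning messages as `# WARNING: ...` lines for the frontmatter."""
    return "".join(f"# WARNING: {message}\n" for message in messages)
```

Errors would block writing the frontmatter, while warnings are looked up by ID and passed to `warnings_as_yaml_comments` for inclusion in the block.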

### Prose section ↔ frontmatter mapping

| Prose Section | Frontmatter Field(s) |
|---------------|---------------------|
| `## Hypothesis` (H0/H1 bold labels) | `hypothesis_h0`, `hypothesis_h1`, `estimand` |
| `## Independent Variables` table | `estimand.contrast`, `baselines[]` |
| `## Dependent Variables (Metrics)` table | `metrics[]` |
| `## Environment` | `environment` |
| `## Analysis Plan` | `statistical_plan` |
| `## Success Criteria` | `success_criteria` |
| `## Experiment Directory Layout` | `experiment_slug` |

### Backward compatibility
- Plans without frontmatter must still be consumable by all downstream skills (scope, review-design, implement-experiment, run-experiment, write-report)
- Frontmatter is additive — all existing prose sections remain unchanged
- `review-design` handles missing frontmatter via targeted LLM extraction fallback (per-field, not whole-plan)
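
The per-field fallback could be structured as in this sketch. `resolve_fields` and the `extract_field` callback are hypothetical; the callback stands in for whatever targeted LLM extraction `review-design` uses, which this issue does not specify.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def resolve_fields(
    frontmatter: Optional[Dict[str, Any]],
    needed: Iterable[str],
    extract_field: Callable[[str], Any],
) -> Dict[str, Any]:
    """Prefer frontmatter values; call extract_field only for missing ones."""
    frontmatter = frontmatter or {}  # None means the plan has no frontmatter
    resolved: Dict[str, Any] = {}
    for field in needed:
        value = frontmatter.get(field)
        if value is None:
            # Fallback runs per field, never over the whole plan.
            value = extract_field(field)
        resolved[field] = value
    return resolved
```

With complete frontmatter the callback is never invoked. With none, it runs once per requested field, which keeps each extraction narrowly scoped.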

## Tests
- plan-experiment output has valid YAML frontmatter with at least `experiment_type`, `hypothesis_h0`, `hypothesis_h1`, `metrics`
- All 8 validation rules apply correctly for each experiment type
- Plans without frontmatter are handled gracefully by downstream skills
- plan-experiment accepts and uses optional revision_guidance second argument
- plan-experiment without second argument works identically to current behavior

## Dependencies
Depends on #589 (recipe simplification — plan-experiment needs to accept revision_guidance arg)
