## What
Train a small classifier that decides per finding which subset
of pwnkit's 11 triage layers to invoke, instead of statically
toggling them via env vars. The training signal is the per-layer
telemetry from #112.
This is the novel contribution worth publishing in the joint paper
with @quguanni (#67) — every existing hybrid system uses a static
pipeline. Nobody has shipped learned routing.
## Motivation
Per #72's per-profile ablation:
| Profile | Solved | Findings | Cost |
| --- | --- | --- | --- |
| none (no triage at all) | 4/14 | 14 | $9.91 |
| no-triage | 2/14 | 10 | $8.95 |
| moat-only | 0/14 | 5 | $23.48 |
| moat | 0/14 | 1 | $17.22 |
The triage layers help on some findings and hurt on others. Static
toggling at the scan level is the wrong granularity:
- For a high-confidence SQLi with a clear error-based signal, the
oracle is free, deterministic, and sufficient — running the
expensive `adversarialDebate` layer adds latency, cost, and
removes a real finding (per the ablation).
- For an ambiguous logic bug where the agent's confidence is 0.4,
the structured 4-step verify and the PoV gate add real signal.
- For a finding matching a known FP pattern in `triageMemories`,
  every other layer should be skipped — auto-reject.
The right granularity is per-finding, not per-scan. The routing
decision is itself a learnable function of the finding's features
and category.
## Architecture

```
Finding arrives
        ↓
Layer 1 feature extractor (already shipped)
        ↓
45-dim vector + finding text
        ↓
Routing classifier (NEW)
        ↓
{
  "tp_score": 0.72,
  "layers_to_run": ["oracle", "structured_verify"],
  "layers_to_skip": ["pov_gate", "adversarial_debate", "memories"],
  "auto_accept_if_score_above": 0.95,
  "auto_reject_if_score_below": 0.20
}
        ↓
For each layer in layers_to_run: invoke
For each layer in layers_to_skip: don't invoke
Aggregate verdicts → final triage decision
```
The routing head can be a multi-label classifier sitting alongside
the TP/FP head, both consuming the same 45-feature vector + text
embedding from the VulnBERT-style hybrid encoder.
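The flow above, plus the auto-accept/auto-reject short-circuit, can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `route_finding` fakes the learned head with externally supplied per-layer probabilities, the layer list mirrors the example decision, and the "any reject wins" aggregation in `triage` is an assumption, not pwnkit's actual rule.

```python
# Hypothetical sketch of the routing step. The layer names mirror the
# example decision above; the stub probabilities and thresholds stand in
# for the not-yet-trained classifier.

ALL_LAYERS = ["oracle", "structured_verify", "pov_gate",
              "adversarial_debate", "memories"]

def route_finding(tp_score: float, run_probs: dict[str, float],
                  run_threshold: float = 0.5) -> dict:
    """Turn per-layer 'should run' probabilities into a routing decision."""
    run = [l for l in ALL_LAYERS if run_probs.get(l, 0.0) >= run_threshold]
    skip = [l for l in ALL_LAYERS if l not in run]
    return {
        "tp_score": tp_score,
        "layers_to_run": run,
        "layers_to_skip": skip,
        "auto_accept_if_score_above": 0.95,
        "auto_reject_if_score_below": 0.20,
    }

def triage(decision: dict, layer_fns: dict) -> str:
    """Short-circuit on the auto thresholds, else run the chosen layers.

    Aggregation rule assumed here: any 'reject' verdict rejects the finding.
    """
    if decision["tp_score"] >= decision["auto_accept_if_score_above"]:
        return "accepted"
    if decision["tp_score"] <= decision["auto_reject_if_score_below"]:
        return "rejected"
    verdicts = [layer_fns[l](decision) for l in decision["layers_to_run"]]
    return "rejected" if "reject" in verdicts else "accepted"
```

A finding with `tp_score` 0.10 never touches a layer (auto-reject), while a 0.72 finding runs exactly the layers the router selected — which is the whole point of per-finding granularity.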
## Training data
Comes from #112 once it lands. Each finding becomes a row of:
```json
{
  "features": [...45 numbers],
  "text": "Title: ...\nCategory: ...\n...",
  "ground_truth_label": 1,
  "layer_verdicts": [
    { "layer": "holding_it_wrong", "verdict": "pass", "duration_ms": 0.3 },
    { "layer": "evidence_gate", "verdict": "pass", "duration_ms": 0.1 },
    { "layer": "oracle", "verdict": "pass", "confidence": 0.95, "duration_ms": 4200 },
    { "layer": "structured_verify", "verdict": "pass", "confidence": 0.88, "duration_ms": 18400, "cost_usd": 0.12 },
    { "layer": "adversarial_debate", "verdict": "reject", "confidence": 0.62, "duration_ms": 31200, "cost_usd": 0.41 }
  ],
  "final_verdict": "rejected"
}
```
The router's learning objective: predict which subset of layers
would have produced the same final verdict at the lowest cost.
A layer that never changes the verdict for findings of a given
shape is dead weight for that shape — the router learns to skip it.
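Under two simplifying assumptions (verdicts aggregate as "any reject wins", and cost is dollars plus a small time penalty), the training label for one telemetry row can be brute-forced: with 11 layers there are only 2^11 subsets. Both assumptions are stand-ins for whatever pwnkit's real aggregation and cost model turn out to be.

```python
from itertools import combinations

def cheapest_sufficient_subset(layer_verdicts: list[dict],
                               final_verdict: str) -> list[str]:
    """Brute-force the training label: the lowest-cost subset of layers
    whose aggregated verdict reproduces the recorded final verdict.

    Assumes 'any reject wins' aggregation and a cost of dollars plus a
    tiny per-millisecond penalty -- both simplifications.
    """
    def cost(layers):
        return sum(v.get("cost_usd", 0.0) + v["duration_ms"] / 1e6
                   for v in layers)

    def verdict(layers):
        return ("rejected" if any(v["verdict"] == "reject" for v in layers)
                else "accepted")

    best = None
    for r in range(len(layer_verdicts) + 1):
        for subset in combinations(layer_verdicts, r):
            if verdict(subset) == final_verdict:
                if best is None or cost(subset) < cost(best):
                    best = subset
    return [v["layer"] for v in (best or [])]
```

On the example row above, the only layer that changed the outcome was `adversarial_debate`, so under these assumptions it alone is the cheapest sufficient subset — exactly the "dead weight" signal the router is supposed to learn.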
## Why this is novel
Existing hybrid vulnerability triage systems use static pipelines:
| System | Architecture | Routing |
| --- | --- | --- |
| Datadog Bits AI | LLM classifier → SAST findings | Static (one model on every finding) |
| GitHub Sec Lab Taskflow | GPT-4 + 7 YAML subtasks | Static (every subtask runs every time) |
| Endor Labs AI SAST | Rules + reachability + LLM | Static (rules → reachability → LLM, fixed order) |
| Semgrep Assistant | LLM + per-target memories | Static (memory check → LLM, fixed order) |
| SAST-Genius (arXiv 2509.15433) | Static analysis + LLM hybrid | Static |
| SecureFixAgent (arXiv 2509.16275) | Bandit + LoRA LLM iterative loop | Static |
| VulnBERT (Pebblebed) | CodeBERT + features + cross-attention | Static (one model, one verdict) |
Nobody routes findings to different verification stages based on a
learned function of the finding's content. That's the gap.
## The pwnkit-side analogy
pwnkit already has a precedent for this pattern, just not at the
triage layer:
- `payload_lookup` — the agent doesn't get the 22KB JSFuck XSS
  payload in its context by default. It calls `payload_lookup` when
  it detects a letter/digit filter. The agent decides when to
  unlock the capability.
- `dynamicPlaybooks` — vulnerability-specific playbooks (SQLi,
  SSTI, auth chain) get auto-injected at ~30% of the budget based
  on what the agent observed during recon. The injection is
  conditional on detected patterns, not static.
- EGATS — the agent expands attack-tree branches based on
  evidence scores, not a fixed schedule.
All three are existing dynamic-capability-unlock patterns. The
routing classifier is the same idea applied to the triage stage:
unlock layers based on what the finding looks like, not on what
the operator set in env vars.
## What this issue is asking for

Phase 1 — design doc only. No code yet. Specifically:

- The router architecture (a classifier over the 45 features? a
  small MLP head on the VulnBERT-web encoder? a per-category
  gating tree?). Defend the choice with the constraint that
  inference must be sub-millisecond per finding.
- The training objective. Multi-label classification over
  `(layer, would_produce_same_final_verdict)` is the
  obvious starting point but needs to be defended against
  simpler baselines.
- The telemetry schema from #112. It is the right shape but the
  units need pinning down.
- The rollout plan (behind the feature flag
  `PWNKIT_FEATURE_LEARNED_ROUTER`), the A/B harness against the
  existing static stack, and the comparison points (none, default,
  moat, router).
- What training data is needed and how much of it (are 100 labeled
  findings enough? 10k?).

Phase 2 (separate issue, blocked on this) — actually train a model
and run the A/B.

## Out of scope

- The paper framing itself, which is a separate call (research:
  joint paper plan — agentic exploitation as a labeling oracle #67).
- Changing the triage layers themselves: this issue is about
  routing the existing layers, not redesigning them.

## Related

- #72 per-profile ablation (the triage layers help on some
  findings, hurt on others)