## What
Train a small classifier that decides per finding which subset
of pwnkit's 11 triage layers to invoke, instead of statically
toggling them via env vars. The training signal is the per-layer
telemetry from #112.
This is the novel contribution worth publishing in the joint paper
with @quguanni (#67) — every existing hybrid system uses a static
pipeline. Nobody has shipped learned routing.
## Motivation
Per #72's per-profile ablation:
| Profile | Solved | Findings | Cost |
| --- | --- | --- | --- |
| none (no triage at all) | 4/14 | 14 | $9.91 |
| no-triage | 2/14 | 10 | $8.95 |
| moat-only | 0/14 | 5 | $23.48 |
| moat | 0/14 | 1 | $17.22 |
The triage layers help on some findings and hurt on others. Static
toggling at the scan level is the wrong granularity:
- For a high-confidence SQLi with a clear error-based signal, the
oracle is free, deterministic, and sufficient — running the
expensive `adversarialDebate` layer adds latency, cost, and
removes a real finding (per the ablation).
- For an ambiguous logic bug where the agent's confidence is 0.4,
the structured 4-step verify and the PoV gate add real signal.
- For a finding matching a known FP pattern in `triageMemories`,
  every other layer should be skipped — auto-reject.
The right granularity is per-finding, not per-scan. The routing
decision is itself a learnable function of the finding's features
and category.
## Architecture

```
Finding arrives
        ↓
Layer 1 feature extractor (already shipped)
        ↓
45-dim vector + finding text
        ↓
Routing classifier (NEW)
        ↓
{
  "tp_score": 0.72,
  "layers_to_run": ["oracle", "structured_verify"],
  "layers_to_skip": ["pov_gate", "adversarial_debate", "memories"],
  "auto_accept_if_score_above": 0.95,
  "auto_reject_if_score_below": 0.20
}
        ↓
For each layer in layers_to_run: invoke
For each layer in layers_to_skip: don't invoke
Aggregate verdicts → final triage decision
```
The routing head can be a multi-label classifier sitting alongside
the TP/FP head, both consuming the same 45-feature vector + text
embedding from the VulnBERT-style hybrid encoder.
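The flow above, plus the auto-accept/auto-reject short-circuit, can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `route_finding` fakes the learned head with externally supplied per-layer probabilities, the layer list mirrors the example decision, and the "any reject wins" aggregation in `triage` is an assumption, not pwnkit's actual rule.

```python
# Hypothetical sketch of the routing step. The layer names mirror the
# example decision above; the stub probabilities and thresholds stand in
# for the not-yet-trained classifier.

ALL_LAYERS = ["oracle", "structured_verify", "pov_gate",
              "adversarial_debate", "memories"]

def route_finding(tp_score: float, run_probs: dict[str, float],
                  run_threshold: float = 0.5) -> dict:
    """Turn per-layer 'should run' probabilities into a routing decision."""
    run = [l for l in ALL_LAYERS if run_probs.get(l, 0.0) >= run_threshold]
    skip = [l for l in ALL_LAYERS if l not in run]
    return {
        "tp_score": tp_score,
        "layers_to_run": run,
        "layers_to_skip": skip,
        "auto_accept_if_score_above": 0.95,
        "auto_reject_if_score_below": 0.20,
    }

def triage(decision: dict, layer_fns: dict) -> str:
    """Short-circuit on the auto thresholds, else run the chosen layers.

    Aggregation rule assumed here: any 'reject' verdict rejects the finding.
    """
    if decision["tp_score"] >= decision["auto_accept_if_score_above"]:
        return "accepted"
    if decision["tp_score"] <= decision["auto_reject_if_score_below"]:
        return "rejected"
    verdicts = [layer_fns[l](decision) for l in decision["layers_to_run"]]
    return "rejected" if "reject" in verdicts else "accepted"
```

A finding with `tp_score` 0.10 never touches a layer (auto-reject), while a 0.72 finding runs exactly the layers the router selected — which is the whole point of per-finding granularity.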
## Training data
Comes from #112 once it lands. Each finding becomes a row of:
```json
{
  "features": [...45 numbers],
  "text": "Title: ...\nCategory: ...\n...",
  "ground_truth_label": 1,
  "layer_verdicts": [
    { "layer": "holding_it_wrong", "verdict": "pass", "duration_ms": 0.3 },
    { "layer": "evidence_gate", "verdict": "pass", "duration_ms": 0.1 },
    { "layer": "oracle", "verdict": "pass", "confidence": 0.95, "duration_ms": 4200 },
    { "layer": "structured_verify", "verdict": "pass", "confidence": 0.88, "duration_ms": 18400, "cost_usd": 0.12 },
    { "layer": "adversarial_debate", "verdict": "reject", "confidence": 0.62, "duration_ms": 31200, "cost_usd": 0.41 }
  ],
  "final_verdict": "rejected"
}
```
The router's learning objective: predict which subset of layers
would have produced the same final verdict at the lowest cost.
A layer that never changes the verdict for findings of a given
shape is dead weight for that shape — the router learns to skip it.
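Under two simplifying assumptions (verdicts aggregate as "any reject wins", and cost is dollars plus a small time penalty), the training label for one telemetry row can be brute-forced: with 11 layers there are only 2^11 subsets. Both assumptions are stand-ins for whatever pwnkit's real aggregation and cost model turn out to be.

```python
from itertools import combinations

def cheapest_sufficient_subset(layer_verdicts: list[dict],
                               final_verdict: str) -> list[str]:
    """Brute-force the training label: the lowest-cost subset of layers
    whose aggregated verdict reproduces the recorded final verdict.

    Assumes 'any reject wins' aggregation and a cost of dollars plus a
    tiny per-millisecond penalty -- both simplifications.
    """
    def cost(layers):
        return sum(v.get("cost_usd", 0.0) + v["duration_ms"] / 1e6
                   for v in layers)

    def verdict(layers):
        return ("rejected" if any(v["verdict"] == "reject" for v in layers)
                else "accepted")

    best = None
    for r in range(len(layer_verdicts) + 1):
        for subset in combinations(layer_verdicts, r):
            if verdict(subset) == final_verdict:
                if best is None or cost(subset) < cost(best):
                    best = subset
    return [v["layer"] for v in (best or [])]
```

On the example row above, the only layer that changed the outcome was `adversarial_debate`, so under these assumptions it alone is the cheapest sufficient subset — exactly the "dead weight" signal the router is supposed to learn.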
## Why this is novel
Existing hybrid vulnerability triage systems use static pipelines:
| System | Architecture | Routing |
| --- | --- | --- |
| Datadog Bits AI | LLM classifier → SAST findings | Static (one model on every finding) |
| GitHub Sec Lab Taskflow | GPT-4 + 7 YAML subtasks | Static (every subtask runs every time) |
| Endor Labs AI SAST | Rules + reachability + LLM | Static (rules → reachability → LLM, fixed order) |
| Semgrep Assistant | LLM + per-target memories | Static (memory check → LLM, fixed order) |
| SAST-Genius (arXiv 2509.15433) | Static analysis + LLM hybrid | Static |
| SecureFixAgent (arXiv 2509.16275) | Bandit + LoRA LLM iterative loop | Static |
| VulnBERT (Pebblebed) | CodeBERT + features + cross-attention | Static (one model, one verdict) |
Nobody routes findings to different verification stages based on a
learned function of the finding's content. That's the gap.
## The pwnkit-side analogy
pwnkit already has a precedent for this pattern, just not at the
triage layer:
- `payload_lookup` — the agent doesn't get the 22KB JSFuck XSS
  payload in its context by default. It calls `payload_lookup` when
  it detects a letter/digit filter. The agent decides when to
  unlock the capability.
- `dynamicPlaybooks` — vulnerability-specific playbooks (SQLi,
  SSTI, auth chain) get auto-injected at ~30% of the budget based
  on what the agent observed during recon. The injection is
  conditional on detected patterns, not static.
- EGATS — the agent expands attack-tree branches based on
  evidence scores, not a fixed schedule.
All three are existing dynamic-capability-unlock patterns. The
routing classifier is the same idea applied to the triage stage:
unlock layers based on what the finding looks like, not on what
the operator set in env vars.
## What this issue is asking for

Phase 1 — design doc only. No code yet. Specifically:

- The router architecture (a classifier over the 45 features? a
  small MLP head on the VulnBERT-web encoder? a per-category
  gating tree?). Defend the choice with the constraint that
  inference must be sub-millisecond per finding.
- The training objective. Multi-label classification over
  `(layer, would_produce_same_final_verdict)` is the
  obvious starting point but needs to be defended against
  simpler baselines.
- The telemetry schema from #112. It is the right shape but the
  units need pinning down.
- The rollout plan (behind the feature flag
  `PWNKIT_FEATURE_LEARNED_ROUTER`), the A/B harness against the
  existing static stack, and the comparison points (none, default,
  moat, router).
- What training data is needed and how much of it (are 100 labeled
  findings enough? 10k?).

Phase 2 (separate issue, blocked on this) — actually train a model
and run the A/B.

## Out of scope

- The paper framing itself, which is a separate call (research:
  joint paper plan — agentic exploitation as a labeling oracle #67).
- Changing the triage layers themselves: this issue is about
  routing the existing layers, not redesigning them.

## Related

- #72 per-profile ablation (the triage layers help on some
  findings, hurt on others)