
[research] Dynamic triage routing: learn per-finding layer selection (paper-novel contribution) #113

@peaktwilight


What

Train a small classifier that decides per finding which subset
of pwnkit's 11 triage layers to invoke, instead of statically
toggling them via env vars. The training signal is the per-layer
telemetry from #112.

This is the novel contribution worth publishing in the joint paper
with @quguanni (#67) — every existing hybrid system uses a static
pipeline. Nobody has shipped learned routing.

Motivation

Per #72's per-profile ablation:

Profile                   Solved   Findings   Cost
none (no triage at all)   4/14     14         $9.91
no-triage                 2/14     10         $8.95
moat-only                 0/14     5          $23.48
moat                      0/14     1          $17.22

The triage layers help on some findings and hurt on others. Static
toggling at the scan level is the wrong granularity:

  • For a high-confidence SQLi with a clear error-based signal, the
    oracle is free, deterministic, and sufficient — running the
    expensive adversarialDebate layer adds latency, cost, and
    removes a real finding (per the ablation).
  • For an ambiguous logic bug where the agent's confidence is 0.4,
    the structured 4-step verify and the PoV gate add real signal.
  • For a finding matching a known FP pattern in triageMemories,
    every other layer should be skipped — auto-reject.

The right granularity is per-finding, not per-scan. The routing
decision is itself a learnable function of the finding's features
and category.

Architecture

Finding arrives
    ↓
Layer 1 feature extractor (already shipped)
    ↓
45-dim vector + finding text
    ↓
Routing classifier (NEW)
    ↓
{
  "tp_score": 0.72,
  "layers_to_run": ["oracle", "structured_verify"],
  "layers_to_skip": ["pov_gate", "adversarial_debate", "memories"],
  "auto_accept_if_score_above": 0.95,
  "auto_reject_if_score_below": 0.20
}
    ↓
For each layer in layers_to_run: invoke
For each layer in layers_to_skip: don't invoke
Aggregate verdicts → final triage decision
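The dispatch at the bottom of the diagram can be sketched in a few lines. `apply_route`, the `invoke_layer` callback, and the any-reject aggregation rule are all assumptions for illustration, not pwnkit's actual API:

```python
def apply_route(route, finding, invoke_layer):
    """Dispatch a routing decision. `invoke_layer(name, finding)` is a
    hypothetical callback returning 'pass' or 'reject'; the any-reject
    aggregation rule is an assumed stand-in for the real verdict merge."""
    score = route["tp_score"]
    if score > route["auto_accept_if_score_above"]:
        return "accepted"  # confident TP: skip every layer
    if score < route["auto_reject_if_score_below"]:
        return "rejected"  # known-FP shape: skip every layer
    verdicts = [invoke_layer(name, finding) for name in route["layers_to_run"]]
    return "rejected" if "reject" in verdicts else "accepted"

route = {
    "tp_score": 0.72,
    "layers_to_run": ["oracle", "structured_verify"],
    "layers_to_skip": ["pov_gate", "adversarial_debate", "memories"],
    "auto_accept_if_score_above": 0.95,
    "auto_reject_if_score_below": 0.20,
}
print(apply_route(route, {}, lambda name, f: "pass"))  # accepted
```

The two thresholds come straight from the router's output, so the router, not the operator, decides when the whole stack is skipped.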

The routing head can be a multi-label classifier sitting alongside
the TP/FP head, both consuming the same 45-feature vector + text
embedding from the VulnBERT-style hybrid encoder.
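As a latency sanity check, a linear routing head over the 45 features is one dot product per layer per finding, comfortably sub-millisecond. The layer names and zero weights below are placeholders, and the text-embedding branch of the hybrid encoder is omitted:

```python
import math

# Hypothetical layer names; the real set comes from pwnkit's triage stack.
LAYERS = ["oracle", "structured_verify", "pov_gate", "adversarial_debate", "memories"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def router_forward(features, tp_weights, layer_weights):
    """One dot product for the TP head plus one per layer for the routing
    head. Trained weights would replace the placeholders used below."""
    tp_score = sigmoid(sum(w * f for w, f in zip(tp_weights, features)))
    run_probs = {
        name: sigmoid(sum(w * f for w, f in zip(ws, features)))
        for name, ws in zip(LAYERS, layer_weights)
    }
    return tp_score, run_probs

feats = [0.0] * 45  # stand-in for the layer-1 feature vector
tp, probs = router_forward(feats, [0.0] * 45, [[0.0] * 45 for _ in LAYERS])
```

An MLP head on the encoder would share the same output shape; only the forward pass gets deeper.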

Training data

Comes from #112 once it lands. Each finding becomes a row of:

{
  "features": [...45 numbers],
  "text": "Title: ...\nCategory: ...\n...",
  "ground_truth_label": 1,
  "layer_verdicts": [
    { "layer": "holding_it_wrong", "verdict": "pass", "duration_ms": 0.3 },
    { "layer": "evidence_gate", "verdict": "pass", "duration_ms": 0.1 },
    { "layer": "oracle", "verdict": "pass", "confidence": 0.95, "duration_ms": 4200 },
    { "layer": "structured_verify", "verdict": "pass", "confidence": 0.88, "duration_ms": 18400, "cost_usd": 0.12 },
    { "layer": "adversarial_debate", "verdict": "reject", "confidence": 0.62, "duration_ms": 31200, "cost_usd": 0.41 }
  ],
  "final_verdict": "rejected"
}

The router's learning objective: predict which subset of layers
would have produced the same final verdict at the lowest cost.
A layer that never changes the verdict for findings of a given
shape is dead weight for that shape — the router learns to skip it.
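One way to derive that supervision from a #112 telemetry row, sketched under two assumptions: any rejecting layer rejects the finding, and duration stands in for cost on free layers:

```python
from itertools import combinations

def aggregate(verdicts):
    # Assumed merge rule: any rejecting layer rejects the finding.
    return "rejected" if any(v["verdict"] == "reject" for v in verdicts) else "accepted"

def cost(verdicts):
    # Prefer dollar cost; fall back to a tiny duration-based proxy for free layers.
    return sum(v.get("cost_usd", v["duration_ms"] / 1e6) for v in verdicts)

def cheapest_equivalent_subset(layer_verdicts, final_verdict):
    """Brute-force the cheapest subset whose merged verdict matches the
    recorded one; fine for <= 11 layers (2^11 subsets per finding). The
    resulting layer list is the multi-label routing target for this row."""
    best, best_cost = None, float("inf")
    for r in range(len(layer_verdicts) + 1):
        for subset in combinations(layer_verdicts, r):
            c = cost(subset)
            if c < best_cost and aggregate(subset) == final_verdict:
                best, best_cost = subset, c
    return None if best is None else [v["layer"] for v in best]

row = [  # abbreviated version of the telemetry row above
    {"layer": "oracle", "verdict": "pass", "duration_ms": 4200},
    {"layer": "structured_verify", "verdict": "pass", "duration_ms": 18400, "cost_usd": 0.12},
    {"layer": "adversarial_debate", "verdict": "reject", "duration_ms": 31200, "cost_usd": 0.41},
]
print(cheapest_equivalent_subset(row, "rejected"))  # ['adversarial_debate']
```

Here the dead-weight intuition falls out directly: every passing layer drops from the target, and only the layer that flipped the verdict survives.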

Why this is novel

Existing hybrid vulnerability triage systems use static pipelines:

System                             Architecture                            Routing
Datadog Bits AI                    LLM classifier → SAST findings          Static (one model on every finding)
GitHub Sec Lab Taskflow            GPT-4 + 7 YAML subtasks                 Static (every subtask runs every time)
Endor Labs AI SAST                 Rules + reachability + LLM              Static (rules → reachability → LLM, fixed order)
Semgrep Assistant                  LLM + per-target memories               Static (memory check → LLM, fixed order)
SAST-Genius (arxiv 2509.15433)     Static analysis + LLM hybrid            Static
SecureFixAgent (arxiv 2509.16275)  Bandit + LoRA LLM iterative loop        Static
VulnBERT (Pebblebed)               CodeBERT + features + cross-attention   Static (one model, one verdict)

Nobody routes findings to different verification stages based on a
learned function of the finding's content. That's the gap.

The pwnkit-side analogy

pwnkit already has a precedent for this pattern, just not at the
triage layer:

  • payload_lookup — the agent doesn't get the 22KB JSFuck XSS
    payload in its context by default. It calls payload_lookup when
    it detects a letter/digit filter. The agent decides when to
    unlock the capability.
  • dynamicPlaybooks — vulnerability-specific playbooks (SQLi,
    SSTI, auth chain) get auto-injected at ~30% of the budget based
    on what the agent observed during recon. The injection is
    conditional on detected patterns, not static.
  • EGATS — the agent expands attack-tree branches based on
    evidence scores, not a fixed schedule.

All three are existing dynamic-capability-unlock patterns. The
routing classifier is the same idea applied to the triage stage:
unlock layers based on what the finding looks like, not on what
the operator set in env vars.

What this issue is asking for

Phase 1 — design doc only. No code yet. Specifically:

  • Pick the routing model class (multi-label LR over the 45
    features? small MLP head on the VulnBERT-web encoder? per-category
    gating tree?). Defend the choice with the constraint that
    inference must be sub-millisecond per finding.
  • Pick the training objective. Multi-label binary cross-entropy
    over (layer, would_produce_same_final_verdict) is the
    obvious starting point but needs to be defended against
    simpler baselines.
  • Define the evaluation metric. Cost-saved-per-recall-lost is
    the right shape but the units need pinning down.
  • Define the rollout: feature flag (PWNKIT_FEATURE_LEARNED_ROUTER),
    A/B harness against the existing static stack, the comparison
    points (none, default, moat, router).
  • Identify what training data we need beyond #112's per-layer
    telemetry and how much of it (are 100 labeled findings enough?
    10k?).
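One concrete pinning of the units for the evaluation-metric bullet: dollars saved per percentage point of recall lost, both measured against the static stack on the same labeled set. The numbers below are hypothetical:

```python
def cost_saved_per_recall_lost(baseline, routed):
    """baseline / routed: {'cost_usd': total triage spend, 'recall': 0..1}
    on the same labeled finding set. Returns dollars saved per percentage
    point of recall given up; inf when the router loses no recall."""
    saved = baseline["cost_usd"] - routed["cost_usd"]
    lost_points = (baseline["recall"] - routed["recall"]) * 100
    if lost_points <= 0:
        return float("inf")  # strictly dominant: cheaper at equal or better recall
    return saved / lost_points

# Hypothetical numbers: $8 saved at the price of 5 recall points.
print(cost_saved_per_recall_lost(
    {"cost_usd": 20.0, "recall": 0.90},
    {"cost_usd": 12.0, "recall": 0.85},
))  # ~1.6 dollars per recall point
```

The same function scores each A/B arm (none, default, moat, router) against the same baseline, which makes the four comparison points directly rankable.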

Phase 2 (separate issue, blocked on this) — actually train a model
and run the A/B.

Out of scope

Related
