research: joint paper plan — agentic exploitation as a labeling oracle #67

@peaktwilight

What

Tracking issue for the joint research paper between Doruk (pwnkit) and Guanni Qu (VulnBERT, Pebblebed Research Resident). Working title:

Agentic exploitation as a labeling oracle for vulnerability triage models

The thesis

The training data for a high-precision vulnerability triage classifier only exists if you have BOTH:

  1. an agentic exploit harness running at scale (pwnkit's role) that produces `(finding, attempted-PoC, real-or-fake)` tuples
  2. a labeled-classifier training pipeline (VulnBERT-class hybrid features-plus-embeddings architecture) that can consume those tuples and produce a small specialized model

Neither half exists meaningfully without the other. The dataset itself — not the model — is the moat. This is the inverse of the standard "dataset is given, model is the asset" framing in security ML.
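
To make the `(finding, attempted-PoC, real-or-fake)` tuple from item 1 concrete, here is a minimal sketch of the labeling step. Every type and field name below (`Finding`, `ExploitAttempt`, `LabeledTuple`, `labelFindings`) is hypothetical and not taken from pwnkit's code. It also assumes a finding counts as real when at least one PoC attempt succeeded and fake when every attempt failed, which is a simplification: a failed exploit does not prove a finding is fake.

```typescript
// Hypothetical shapes for illustration only; not pwnkit's actual schema.
type Label = "real" | "fake";

interface Finding {
  id: string;
  title: string;
  description: string;
}

interface ExploitAttempt {
  findingId: string;
  poc: string;        // the attempted proof-of-concept
  succeeded: boolean; // did the harness observe the exploit working?
}

interface LabeledTuple {
  finding: Finding;
  attemptedPoc: string;
  label: Label;
}

// Assumed oracle rule: "real" if any attempt succeeded, "fake" if all failed.
// Findings with no attempt at all are skipped rather than guessed at.
function labelFindings(
  findings: Finding[],
  attempts: ExploitAttempt[],
): LabeledTuple[] {
  // Group attempts by the finding they target.
  const byFinding = new Map<string, ExploitAttempt[]>();
  for (const attempt of attempts) {
    const list = byFinding.get(attempt.findingId) ?? [];
    list.push(attempt);
    byFinding.set(attempt.findingId, list);
  }

  const tuples: LabeledTuple[] = [];
  for (const finding of findings) {
    const tried = byFinding.get(finding.id) ?? [];
    const success = tried.find((a) => a.succeeded);
    // Keep the successful PoC if there is one, otherwise the last failed one.
    const chosen = success ?? tried[tried.length - 1];
    if (!chosen) continue; // never attempted: no label
    tuples.push({
      finding,
      attemptedPoc: chosen.poc,
      label: success ? "real" : "fake",
    });
  }
  return tuples;
}
```

Skipping unattempted findings keeps the label tied to an actual exploitation attempt, which is the property the thesis depends on.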

What's already shipped

  • `triage-data-collector.ts` produces JSONL with text + 45-feature vector + label + provenance (commit 5bc8503)
  • `feature-extractor.ts` computes a 45-feature handcrafted vector, explicitly inspired by VulnBERT's hybrid architecture (the file docstring says so)
  • npm-bench preserves raw findings so the data can be collected post-hoc (commit 5bc8503)
  • 18 vitest tests covering the collector and feature extraction (commit 5bc8503)
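
For readers who have not opened the collector, a rough sketch of what one JSONL row with text + 45-feature vector + label + provenance could look like, with a strict parser. The field names (`text`, `features`, `label`, `provenance.label_source`) and the 0/1 label encoding are assumptions for illustration; the real schema is whatever `triage-data-collector.ts` emits.

```typescript
// Assumed record layout; the real one lives in triage-data-collector.ts.
const FEATURE_COUNT = 45;

interface TriageRecord {
  text: string;
  features: number[]; // expected length: FEATURE_COUNT
  label: 0 | 1;       // assumed encoding: 1 = real, 0 = false positive
  provenance: { label_source: string };
}

// Serialize records as JSON Lines: one JSON object per line.
function toJsonl(records: TriageRecord[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n") + "\n";
}

// Parse JSONL and reject rows that do not match the assumed layout.
function parseJsonl(input: string): TriageRecord[] {
  const records: TriageRecord[] = [];
  input.split("\n").forEach((line, i) => {
    if (line.trim() === "") return; // tolerate blank lines and trailing newline
    const raw = JSON.parse(line) as Partial<TriageRecord>;
    const where = `line ${i + 1}`;
    if (typeof raw.text !== "string") {
      throw new Error(`${where}: missing text`);
    }
    if (
      !Array.isArray(raw.features) ||
      raw.features.length !== FEATURE_COUNT ||
      !raw.features.every((x) => typeof x === "number" && Number.isFinite(x))
    ) {
      throw new Error(`${where}: expected ${FEATURE_COUNT} finite features`);
    }
    if (raw.label !== 0 && raw.label !== 1) {
      throw new Error(`${where}: label must be 0 or 1`);
    }
    if (!raw.provenance || typeof raw.provenance.label_source !== "string") {
      throw new Error(`${where}: missing provenance.label_source`);
    }
    records.push(raw as TriageRecord);
  });
  return records;
}
```

Failing loudly on a wrong vector length is deliberate: a silently truncated feature vector would be hard to spot once the rows are in a training set.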

What's still needed for the paper

  • Grow npm-bench from 81 → ~250 packages so the labeled set is publication-scale
  • First full dataset dump committed under `packages/benchmark/results/triage-dataset-v1.jsonl` (or shipped to Hugging Face)
  • Baseline classifier (a CodeBERT-scale model, or XGBoost on the 45-feature vector alone) trained and reported
  • Hybrid classifier (handcrafted features + neural embeddings, VulnBERT architecture) trained and reported
  • Joint pipeline: pwnkit raw → classifier → triaged finding, benchmarked against pwnkit alone on `npm-bench` (currently F1 0.444)
  • arXiv preprint (cs.CR)
  • Hugging Face: dataset card + model card
  • Blog post on pwnkit.com / opensoar.app
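
A sketch of how the "pwnkit raw → classifier → triaged finding" comparison could be scored. It assumes finding-level ground truth and a classifier that outputs a score in [0, 1]; `ScoredFinding`, `f1`, and `compareTriage` are hypothetical names. It does not claim to reproduce how the current npm-bench F1 is computed, and recall here is measured only over findings pwnkit actually emitted.

```typescript
// Hypothetical evaluation shapes; not the npm-bench scoring code.
interface ScoredFinding {
  id: string;
  isReal: boolean; // ground-truth label
  score: number;   // assumed classifier output in [0, 1]
}

// F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives at all.
function f1(predicted: boolean[], actual: boolean[]): number {
  if (predicted.length !== actual.length) {
    throw new Error("predicted and actual must have the same length");
  }
  let tp = 0;
  let fp = 0;
  let fn = 0;
  predicted.forEach((p, i) => {
    const a = actual[i];
    if (p && a) tp += 1;
    else if (p && !a) fp += 1;
    else if (!p && a) fn += 1;
  });
  const denom = 2 * tp + fp + fn;
  return denom === 0 ? 0 : (2 * tp) / denom;
}

// Compare "report every raw finding" against "report only findings the
// classifier scores at or above the threshold".
function compareTriage(
  findings: ScoredFinding[],
  threshold: number,
): { rawOnly: number; triaged: number } {
  const actual = findings.map((f) => f.isReal);
  const rawOnly = f1(findings.map(() => true), actual);
  const triaged = f1(findings.map((f) => f.score >= threshold), actual);
  return { rawOnly, triaged };
}
```

The threshold would need to be chosen on a held-out split, otherwise the triaged number is optimistic.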

Honest negative results to include

  • Handcrafted features designed for web-exploit findings (SQL errors, payload reflection, stack traces) are mostly zero on npm supply-chain findings — a publishable insight on domain transferability
  • `label_source: package_verdict` is coarser than per-finding labels: legitimate findings on packages judged safe get mislabeled as false positives. Quantify the noise floor
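
Both negative results reduce to simple counts. The sketch below assumes the feature vectors are available as a numeric matrix and that a small sample of findings has been manually re-labeled per finding; `zeroFractionPerFeature`, `AuditedFinding`, and `labelNoise` are hypothetical names, and the 0/1 encoding (0 = false positive) is an assumption.

```typescript
// Fraction of rows in which each feature column is exactly zero.
// A column near 1.0 carries almost no signal on that dataset.
function zeroFractionPerFeature(rows: number[][]): number[] {
  const first = rows[0];
  if (first === undefined) return [];
  const width = first.length;
  const zeros = new Array<number>(width).fill(0);
  for (const row of rows) {
    if (row.length !== width) throw new Error("ragged feature matrix");
    row.forEach((value, j) => {
      if (value === 0) zeros[j] = (zeros[j] ?? 0) + 1;
    });
  }
  return zeros.map((z) => z / rows.length);
}

// Hypothetical audit row: the coarse package-level label next to a manual
// per-finding label for the same finding (assumed: 1 = real, 0 = false positive).
interface AuditedFinding {
  packageVerdictLabel: 0 | 1;
  manualLabel: 0 | 1;
}

// disagreementRate: share of audited findings where the two labels differ.
// falseFpRate: among findings the package verdict marks as false positives,
// the share the manual label says are actually real.
function labelNoise(sample: AuditedFinding[]): {
  disagreementRate: number;
  falseFpRate: number;
} {
  let disagreements = 0;
  let labeledFp = 0;
  let wronglyFp = 0;
  for (const s of sample) {
    if (s.packageVerdictLabel !== s.manualLabel) disagreements += 1;
    if (s.packageVerdictLabel === 0) {
      labeledFp += 1;
      if (s.manualLabel === 1) wronglyFp += 1;
    }
  }
  return {
    disagreementRate: sample.length === 0 ? 0 : disagreements / sample.length,
    falseFpRate: labeledFp === 0 ? 0 : wronglyFp / labeledFp,
  };
}
```

Reporting `falseFpRate` separately from the overall disagreement rate isolates the specific failure described above, where a safe package verdict overrides a legitimate finding.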

Sequencing

This issue intentionally has no deadline. The work is the moat and the research direction is the long game.

Why this issue exists

So the work is in the open and not in someone's notebook. Discoverable, citable, mergeable.
