Let autonomous AI agents hack you before attackers do.
Fully autonomous agentic pentesting framework.
Fully autonomous agentic pentesting for web apps, AI/LLM apps, package ecosystems, and source code.
This README is the fast path. The detailed command reference, configuration, architecture notes, recipes, and benchmark breakdowns live in the docs site.
docker run --rm -e OPENROUTER_API_KEY=$KEY \
  ghcr.io/peaktwilight/pwnkit:latest scan --target https://example.com

If you use Azure OpenAI instead, also pass AZURE_OPENAI_BASE_URL and AZURE_OPENAI_MODEL. For the Responses API, the Azure base URL should include /openai/v1.
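As a minimal sketch of the Azure note above, the helper below checks that the base URL contains /openai/v1 before the container starts. Both function names (check_azure_base_url, run_azure_scan) are hypothetical and not part of pwnkit. This README does not name the credential variable for Azure, so it is left out here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pwnkit): confirm the Azure base URL
# contains the /openai/v1 path the README asks for with the Responses API.
check_azure_base_url() {
  case "$1" in
    */openai/v1*) echo "ok" ;;
    *) echo "base URL should include /openai/v1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper, defined but not called here. `-e NAME` with no value
# forwards that variable from the calling shell into the container.
# Credentials are omitted: add whichever key variable your setup requires.
run_azure_scan() {
  check_azure_base_url "$AZURE_OPENAI_BASE_URL" || return 1
  docker run --rm \
    -e AZURE_OPENAI_BASE_URL -e AZURE_OPENAI_MODEL \
    ghcr.io/peaktwilight/pwnkit:latest scan --target "$1"
}
```

With both variables exported, run_azure_scan https://example.com would start the same scan as the command above, but only after the URL check passes.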
The image ships with Node 20, Playwright/Chromium, and the standard pentest toolbox (sqlmap, nmap, nikto, gobuster, ffuf, hydra, john, …) preinstalled.
# Scan an AI / LLM endpoint
npx pwnkit-cli scan --target https://example.com/api/chat
# Pentest a web app
npx pwnkit-cli scan --target https://example.com --mode web
# White-box scan with source code access
npx pwnkit-cli scan --target https://example.com --repo ./source
# Audit a package
npx pwnkit-cli audit lodash
# Review source code
npx pwnkit-cli review ./my-app
# Auto-detect — just give it a target
npx pwnkit-cli https://example.com

Prefer Bun? Swap npx for bunx: same commands, same flags, zero-install, and a noticeably faster cold start. pwnkit-cli is pure JS with a WASM SQLite core, so there are no native bindings to rebuild on either runtime.
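If you have several targets, the scan command can be wrapped in a small loop. This is a sketch only: scan_targets and the DRY_RUN switch are hypothetical conveniences, not pwnkit features, and the only CLI syntax used is the scan --target form shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one scan per non-empty, non-comment line in a file.
# Set DRY_RUN=1 to print the commands instead of executing them.
scan_targets() {
  local file="$1" target
  while IFS= read -r target || [ -n "$target" ]; do
    case "$target" in ''|'#'*) continue ;; esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "npx pwnkit-cli scan --target $target"
    else
      # </dev/null keeps the CLI from consuming the rest of the target list.
      npx pwnkit-cli scan --target "$target" </dev/null
    fi
  done < "$file"
}
```

Running DRY_RUN=1 scan_targets targets.txt first is a cheap way to confirm the list is what you expect before any traffic is sent.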
Global install:
npm i -g pwnkit-cli
# or
bun add -g pwnkit-cli

- scan targets AI / LLM apps, web apps, REST / OpenAPI APIs, and MCP servers.
- audit installs and inspects packages across npm, pypi, cargo, and oci with ecosystem-specific prep, static analysis, and AI review.
- review performs deep source-code security review on a local repo or Git URL.
- triage-data turns benchmark runs and verified findings into labeled JSONL for triage-model training.
- cloud-sink can stream findings and final reports to an orchestrator with PWNKIT_CLOUD_SINK + PWNKIT_CLOUD_SCAN_ID.
- dashboard, history, findings, and triage provide local persistence and review workflows.
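The cloud-sink setup relies on two environment variables. A pre-flight check like the sketch below can catch a missing one before a long scan starts. require_cloud_sink_env is a hypothetical helper, and treating both variables as mandatory is an assumption drawn from the description above; the expected value formats are covered in the docs, not here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when both variables named in the README
# are set and non-empty; otherwise list the missing names on stderr.
require_cloud_sink_env() {
  local missing=""
  [ -n "${PWNKIT_CLOUD_SINK:-}" ] || missing="$missing PWNKIT_CLOUD_SINK"
  [ -n "${PWNKIT_CLOUD_SCAN_ID:-}" ] || missing="$missing PWNKIT_CLOUD_SCAN_ID"
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
}
```

Call it at the top of a CI script, for example require_cloud_sink_env || exit 1, so a misconfigured job fails immediately.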
- Shell-first web pentesting. The agent uses bash, writes scripts, and chains tools like a human pentester instead of being trapped in a small HTTP-tool DSL.
- Blind verification. Findings are independently re-exploited before they are reported.
- Docs-backed benchmark transparency. The current benchmark details live in the docs and raw artifacts under packages/benchmark/results.
- Getting Started
- Adversarial evals
- Commands
- Configuration
- Recipes
- Architecture
- Triage Pipeline
- Benchmark
- XBOW retained artifact-backed aggregate: 99/104 = 95.2%
- XBOW retained artifact-backed black-box: 74/104 = 71.2%
- XBOW historical mixed local+CI publication: 95/104 aggregate and 90/104 black-box
- Cybench: 8/10 = 80%
- AI / LLM regression set: 10/10
The benchmark docs page is the canonical benchmark surface. It distinguishes the current retained artifact-backed tally from the older mixed local+CI publication line and lists the remaining challenge-set mismatches explicitly.
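The percentages above are solved-over-total ratios rounded to one decimal place. A small sketch for recomputing them from the raw counts follows; pct is a hypothetical helper and only the counts listed in this README are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print solved/total as a percentage with one decimal.
pct() {
  awk -v s="$1" -v t="$2" 'BEGIN { printf "%.1f%%\n", 100 * s / t }'
}

pct 99 104   # XBOW retained artifact-backed aggregate
pct 74 104   # XBOW retained artifact-backed black-box
pct 8 10     # Cybench
```

The same helper can be pointed at the historical 95/104 and 90/104 counts when comparing the two publication lines.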
- uses: PwnKit-Labs/pwnkit@main
  with:
    mode: review
    path: .
    format: sarif
  env:
    OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}

git clone https://github.com/PwnKit-Labs/pwnkit.git
cd pwnkit
pnpm install
pnpm lint
pnpm test

See CONTRIBUTING.md.
Open-source adversarial security for the agentic AI era. pwnkit is one piece of the open-source PwnKit Labs stack:
