Bernstein orchestrates short-lived CLI coding agents around a central task server. One command starts the whole orchestra.
- Python 3.12+ (macOS, Linux, Windows)
- At least one CLI coding agent installed and authenticated. Bernstein supports 18 adapters out of the box:
| Agent | Install |
|---|---|
| Aider | pip install aider-chat |
| Amp | brew install amp |
| Claude Code | npm install -g @anthropic-ai/claude-code |
| Codex CLI | npm install -g @openai/codex |
| Cody | Install Cody CLI |
| Continue | VS Code / JetBrains extension |
| Cursor | Cursor app |
| Gemini CLI | npm install -g @google/gemini-cli |
| Goose | Install Goose CLI |
| Kilo | npm install -g kilo |
| Kiro | Install Kiro CLI |
| Ollama | brew install ollama |
| OpenCode | Install OpenCode CLI |
| Qwen | npm install -g qwen-code |
| Roo Code | VS Code extension |
| Tabby | Install Tabby |
| Generic | Any CLI agent via generic adapter |
| IaC | Infrastructure-as-code adapter |
No agent yet? Run bernstein demo for a zero-config walkthrough.
# Fastest (Rust-based)
uv tool install bernstein
# Or any of these
pip install bernstein
pipx install bernstein
brew tap chernistry/bernstein && brew install bernstein
# Fedora / RHEL
sudo dnf copr enable alexchernysh/bernstein
sudo dnf install bernstein
# npm wrapper (requires Python 3.12+)
npx bernstein-orchestrator
# Verify
bernstein --version

Editor extensions: search "Bernstein" in VS Code or Cursor, or run code --install-extension alex-chernysh.bernstein.
git clone https://github.com/chernistry/bernstein && cd bernstein
uv venv && uv pip install -e ".[dev]"
uv run python scripts/run_tests.py -x

Run this once inside your project directory:
cd my-project
bernstein init

This creates .sdd/, a lightweight, file-based state directory. Nothing hidden, nothing magical:
.sdd/
├── backlog/
│ ├── open/ # YAML task specs waiting to be claimed
│ ├── claimed/ # Tasks currently being worked by an agent
│ ├── done/ # Completed tasks (automated sync)
│ └── closed/ # Completed tasks (manual scripts)
├── runtime/
│ ├── pids/ # PID metadata JSON files (for bernstein ps)
│ ├── signals/ # Agent signal files: WAKEUP, SHUTDOWN, HEARTBEAT
│ └── logs/ # Agent and server runtime logs
├── metrics/
│ ├── tasks.jsonl # Per-task timing and outcome data
│ ├── costs_*.json # Cost tracking by model
│ └── quality_scores.jsonl # Quality gate results
├── traces/ # Step-by-step agent decision traces
├── memory/ # Cross-session lessons and memory state
├── agents/ # Agent catalog (agency + custom sources)
├── caching/ # Prompt cache artifacts
└── config.yaml # Server port, model defaults, worker limits
The .sdd/ directory is your single source of truth. Back it up, inspect it, recover from it. No databases, no hidden state.
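Because state is plain files, scripting against it is straightforward. A minimal sketch that tallies results from .sdd/metrics/tasks.jsonl (the "outcome" field name is an assumption; check the schema in your own file):

```python
import json
from collections import Counter
from pathlib import Path

def tally_outcomes(path=".sdd/metrics/tasks.jsonl"):
    """Count task outcomes from the JSONL metrics file.

    Assumes one JSON object per line with an 'outcome' field --
    verify the field name against your own tasks.jsonl.
    """
    counts = Counter()
    for line in Path(path).read_text().splitlines():
        if line.strip():
            counts[json.loads(line).get("outcome", "unknown")] += 1
    return dict(counts)
```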
Pass a plain-English goal directly on the command line:
bernstein -g "Build a legal RAG system with hybrid retrieval and typed answers"

Bernstein starts a manager agent, decomposes the goal into tasks, spawns worker agents in parallel, and verifies each result before marking it done.
Pre-define goals, tasks, and role policies in a YAML file:
bernstein                 # auto-discovers bernstein.yaml

# bernstein.yaml
goal: "Build a legal RAG system with hybrid retrieval and typed answers"
tasks:
  - title: "Implement vector store"
    role: backend
    priority: 1
    scope: medium
    complexity: medium
  - title: "Add BM25 sparse index"
    role: backend
    priority: 2
    scope: small
    complexity: low
    depends_on: ["TSK-001"]

Full seed file reference at templates/bernstein.yaml.
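Before launching a long run, a quick sanity check of the seed file's dependency references can save a wasted session. This sketch assumes Bernstein assigns IDs in listing order (TSK-001, TSK-002, ...), which matches the example above but is an assumption, not documented behavior:

```python
def validate_seed(tasks):
    """Check that every depends_on entry refers to a known, non-self task.

    Assumes IDs are assigned in listing order (TSK-001, TSK-002, ...) --
    an assumption; check how your Bernstein version numbers tasks.
    """
    ids = {f"TSK-{i:03d}" for i in range(1, len(tasks) + 1)}
    errors = []
    for i, task in enumerate(tasks, start=1):
        for dep in task.get("depends_on", []):
            if dep not in ids:
                errors.append(f"task {i} depends on unknown id {dep}")
            elif dep == f"TSK-{i:03d}":
                errors.append(f"task {i} depends on itself")
    return errors
```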
For projects with known stages, write a plan file with stages and steps — like an Ansible playbook:
bernstein run plan.yaml

Plan files skip the manager decomposition step and go straight to execution. See templates/plan.yaml for the format.
By default, bernstein blocks in the foreground with a live terminal dashboard. Attach to a running session:
bernstein live # attach to running session
bernstein live --classic # legacy 3-column view

Open a real-time browser UI:
bernstein dashboard # opens http://localhost:8052/dashboard

bernstein status # task counts by status and role
bernstein ps # table of running agents (PID, role, model)
bernstein cost # spend breakdown by model and task
bernstein plan # show task backlog as a table
bernstein plan --export # export backlog to JSON

bernstein cost # human-readable spend summary
bernstein cost --json # machine-readable JSON output
bernstein cost --share # generate a shareable cost report link

bernstein logs # tail recent agent output
bernstein logs -f # follow mode (like tail -f)
bernstein logs -a claude # filter by agent name

bernstein stop # graceful shutdown (agents save work first)
bernstein stop --timeout 3 # shorter timeout (default: 10s)
bernstein stop --force # hard kill (no grace period)
bernstein checkpoint # snapshot session progress for later resume
bernstein checkpoint --goal # include current goal in snapshot
bernstein wrap-up # end session with structured summary and learnings
bernstein wrap-up --stop # wrap up and stop in one command

Graceful stop sends a SHUTDOWN signal via .sdd/runtime/signals/. Agents finish their current subtask, write their state, and exit. Use --force only when agents are stuck.
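Because signals are just files under .sdd/runtime/signals/, a graceful stop can be triggered from any script without the CLI. The file names come from the layout above; the timestamp payload is an assumption (the documented contract is only the file's presence):

```python
import time
from pathlib import Path

def send_signal(name, signals_dir=".sdd/runtime/signals"):
    """Drop a signal file (WAKEUP, SHUTDOWN, HEARTBEAT) for agents to pick up.

    The timestamp payload is an assumption; Bernstein documents only
    the signal file names, not their contents.
    """
    d = Path(signals_dir)
    d.mkdir(parents=True, exist_ok=True)
    path = d / name
    path.write_text(str(time.time()))
    return path
```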
bernstein debug # generate a diagnostic zip for bug reports

Collects logs, config (secrets redacted), and runtime state into a zip file suitable for attaching to GitHub issues.
bernstein doctor # check: adapters, API keys, ports, .sdd/ integrity
bernstein doctor --fix # auto-repair issues where possible
bernstein doctor --json # machine-readable output

Doctor checks:
- Python version (must be 3.12+)
- Installed CLI agents and their login status
- Server port availability
- .sdd/ directory structure and stale locks
- MCP server reachability
- Storage backend connectivity (if postgres/redis configured)
bernstein recap # tasks, pass/fail, cost, duration
bernstein recap --as-json # JSON output for automation

bernstein retro # full retrospective from all completed tasks
bernstein retro --since 24 # last 24 hours only
bernstein retro --print # print to stdout and write file
bernstein retro -o custom-report.md # custom output path

bernstein trace <task_id> # step-by-step agent decision trace
bernstein trace <task_id> --as-json # JSON output
bernstein replay <run_id> # deterministically re-run from a recorded trace
bernstein replay <run_id> --limit 5 # limit replay depth

bernstein diff <task_id> # git diff of what an agent changed
bernstein diff <task_id> --stat # diffstat summary only

Assign different roles to different tasks. Bernstein fans out to specialists:
tasks:
  - title: "Implement auth middleware"
    role: backend
    priority: 1
  - title: "Write integration tests"
    role: qa
    priority: 2
  - title: "Update API documentation"
    role: docs
    priority: 2

The manager decomposes and assigns backend, qa, and docs agents in parallel.
Cap spend to avoid surprise bills:
bernstein -g "Refactor the monolith" --budget 5.00 # stop after $5
bernstein --evolve --budget 2.00 # evolve mode with $2 cap

When the budget cap is reached, Bernstein stops spawning new agents and wraps up.
Run without the live dashboard — useful for CI pipelines and overnight runs:
bernstein --headless -g "Refactor the auth module"
bernstein --headless # auto-discovers bernstein.yaml

Output goes to .sdd/runtime/ logs instead of the terminal.
Preview the task plan without spawning agents or writing state:
bernstein --dry-run -g "Build a REST API with auth"
bernstein --dry-run # preview plan from bernstein.yaml

Shows the tasks, roles, priorities, and dependencies the manager would generate. Nothing is written to disk.
bernstein --plan-only # generate plan, wait for approval
bernstein --from-plan saved_plan.yaml # execute a previously saved plan

Tasks stay frozen until you approve them via POST /plans/{id}/approve.
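Approval is a plain HTTP POST, so it is easy to drive from a script or CI job. A sketch using only the standard library; the default port here matches the dashboard default above, but confirm it against your .sdd/config.yaml:

```python
import urllib.request

def build_approve_request(plan_id, base_url="http://localhost:8052"):
    """Build the POST /plans/{id}/approve request for the task server.

    The base URL/port is an assumption taken from the dashboard default.
    """
    return urllib.request.Request(
        f"{base_url}/plans/{plan_id}/approve", method="POST"
    )

def approve_plan(plan_id, base_url="http://localhost:8052"):
    """Send the approval and return the HTTP status code."""
    with urllib.request.urlopen(build_approve_request(plan_id, base_url)) as resp:
        return resp.status
```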
Orchestrate across multiple git repositories:
# bernstein.yaml
goal: "Build the microservices platform"
workspace:
  repos:
    - name: backend
      path: ./services/backend
    - name: frontend
      path: ./services/frontend

bernstein workspace # show status of all repos
bernstein workspace clone # clone missing repos
bernstein workspace validate # check workspace health

Bernstein can improve itself. Leave it running and it analyses its own metrics, proposes changes, sandboxes them, and auto-applies what passes:
bernstein --evolve # run indefinitely
bernstein --evolve --max-cycles 10 # stop after 10 cycles
bernstein --evolve --budget 5.00 # stop after $5 spent
bernstein --evolve --interval 120 # 2-minute cycles (default: 5 min)
bernstein --evolve --headless # unattended overnight

Low-risk proposals (L0/L1) apply automatically. Higher-risk ones (L2+) are saved to .sdd/evolution/deferred.jsonl for human review:
bernstein evolve review # list pending proposals
bernstein evolve approve <ID> # approve one
bernstein evolve run # run the evolution loop manually

bernstein doctor # check which agents are installed
bernstein agents discover # auto-detect installed CLI agents

Make sure at least one agent is installed and you've run its login/auth flow (e.g. claude login, codex login).
Check the agent logs:
bernstein logs -f # follow all agent output
bernstein logs -a claude # filter by agent
tail -f .sdd/runtime/logs/*.log # raw log files

Common causes: missing API key, expired auth token, or the agent's CLI returned an error on the prompt.
An agent likely crashed or was killed. Run janitor cleanup:
bernstein stop && bernstein start # restart with fresh state
bernstein doctor --fix # clear stale locks

Tasks in claimed/ that never completed will show up in the next bernstein recap.
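You can also spot abandoned claims yourself, since claimed tasks are just YAML files under .sdd/backlog/claimed/. This sketch uses file mtime as a proxy for activity, which is an assumption; Bernstein's janitor may consult its own heartbeat data instead:

```python
import time
from pathlib import Path

def stale_claims(claimed_dir=".sdd/backlog/claimed", max_age_s=3600):
    """List claimed task specs untouched for longer than max_age_s.

    mtime-as-activity is an assumption, as is the .yaml extension --
    adjust both to match your .sdd/ contents.
    """
    now = time.time()
    return sorted(
        p.name for p in Path(claimed_dir).glob("*.yaml")
        if now - p.stat().st_mtime > max_age_s
    )
```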
A previous Bernstein session is still running:
bernstein stop --force # kill it
# Or find the PID:
cat .sdd/runtime/pids/server.json
kill $(jq -r .pid .sdd/runtime/pids/server.json) # assumes a "pid" field and that jq is installed

bernstein cost # see spend by model
bernstein cost --json | python -m json.tool # detailed breakdown

Reduce cost by:
- Setting a budget: --budget 5.00
- Using cheaper models for simple tasks via role_model_policy
- Enabling plan mode to review tasks before spawning
- Mixing models: cheap agents for docs/tests, heavy models for architecture
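For automation, the per-model cost files can be summed directly. The "total_usd" field name here is an assumption; inspect one of your own costs_*.json files for the real schema:

```python
import json
from pathlib import Path

def total_spend(metrics_dir=".sdd/metrics"):
    """Sum per-model cost files (costs_*.json) into one dollar figure.

    Assumes each file is a JSON object with a numeric 'total_usd'
    field -- an assumption; check the real schema in your files.
    """
    total = 0.0
    for path in Path(metrics_dir).glob("costs_*.json"):
        total += json.loads(path.read_text()).get("total_usd", 0.0)
    return round(total, 2)
```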
bernstein logs -f # see what the agent produced
bernstein diff <task_id> # inspect the changes

Quality gates check lint, type-check, and tests. If your project has no tests, the test gate may pass trivially, or fail if the test runner can't find any. Add tests to get real signal.
SWE-Bench results in benchmarks/swe_bench/results/ are currently mock preview artifacts — not verified eval runs. To publish real numbers:
uv run python benchmarks/swe_bench/run.py eval --scenarios bernstein-sonnet --limit 50

The 1.78× speedup headline comes from the simulation harness in benchmarks/run_benchmark.py: it models scheduling, not real agent execution. Treat it as a capacity-planning estimate.