
# Technical Architecture

Marty McEnroe edited this page Feb 19, 2026 · 1 revision


AssemblyZero is a multi-agent orchestration platform built on LangGraph state machines, with cross-model adversarial review (Claude drafts, Gemini reviews) and RAG-like codebase intelligence.


## LLM Invocation Patterns

AssemblyZero uses four LLM providers across three distinct roles:

| Provider | Class | Use Case | Cost Model |
|---|---|---|---|
| Claude CLI (`claude -p`) | `ClaudeCLIProvider` | Drafting, implementation, all creative work | Free (Max subscription) |
| Anthropic API | `AnthropicProvider` | Automatic fallback when CLI fails | Per-token ($5-25/M output) |
| Fallback | `FallbackProvider` | CLI first (180s) → API (300s), transparent failover | Free first, paid if needed |
| Gemini | `GeminiProvider` | Adversarial review only (different model family) | Free (API quota) |
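
The failover row above can be sketched as a small wrapper. This is a minimal illustration, assuming each provider exposes a `call(prompt)` method; the real `FallbackProvider` interface may differ:

```python
import concurrent.futures

class FallbackProvider:
    """Sketch of CLI-first failover: try the free CLI path, then the paid API.

    `cli` and `api` are any objects with a `call(prompt) -> str` method; the
    timeouts mirror the 180 s / 300 s budgets described above.
    """

    def __init__(self, cli, api, cli_timeout=180, api_timeout=300):
        self.cli, self.api = cli, api
        self.cli_timeout, self.api_timeout = cli_timeout, api_timeout

    def call(self, prompt: str) -> str:
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
        try:
            try:
                # CLI first: free via the Max subscription.
                return pool.submit(self.cli.call, prompt).result(timeout=self.cli_timeout)
            except Exception:
                # Timeout or CLI failure: transparent failover to the paid API.
                return pool.submit(self.api.call, prompt).result(timeout=self.api_timeout)
        finally:
            pool.shutdown(wait=False, cancel_futures=True)
```

The point of the design is that callers never see the failover: one `call()` is free in the common case and paid only when the CLI path breaks.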

### Why `claude -p` Instead of the API?

The Claude CLI (`claude -p`) is invoked as a subprocess with strict flags:

```bash
claude -p --output-format json --tools "" --strict-mcp-config --model claude-opus-4-6
```

* `--tools ""` — Disables all built-in tools (no file ops, no web access)
* `--strict-mcp-config` — Disables MCP tool loading
* `--output-format json` — Returns structured response with token counts and cost

This gives us free LLM calls via the Max subscription, deterministic side-effect-free behavior, and structured token accounting. The API exists only as a paid fallback for resilience.
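
As an illustration, the subprocess invocation might look like the following. `build_argv` and `call_claude_cli` are hypothetical names, and the sketch assumes the prompt is passed on stdin; the project's actual wrapper may differ:

```python
import json
import subprocess

def build_argv(model: str = "claude-opus-4-6") -> list[str]:
    """Assemble the strict, side-effect-free flag set described above."""
    return [
        "claude", "-p",
        "--output-format", "json",   # structured reply with token counts and cost
        "--tools", "",               # disable all built-in tools
        "--strict-mcp-config",       # disable MCP tool loading
        "--model", model,
    ]

def call_claude_cli(prompt: str, model: str = "claude-opus-4-6") -> dict:
    """Run the CLI once and parse its JSON envelope (180 s budget, per the table)."""
    proc = subprocess.run(build_argv(model), input=prompt,
                          capture_output=True, text=True, timeout=180)
    proc.check_returncode()
    return json.loads(proc.stdout)
```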

### Cross-Model Review

The key architectural decision: Claude and Gemini are different model families. When Gemini reviews Claude's work, it brings genuinely different training biases and blind spots. This is adversarial verification, not self-review.

See: ADR 0208: LLM Invocation Strategy


## LangGraph State Machines

All workflows are implemented as LangGraph StateGraph instances with typed state, conditional edges, and SQLite checkpointing.

### Five Workflows

| Workflow | Nodes | Purpose | Key Conditional Edge |
|---|---|---|---|
| Issue | 7 | GitHub issue → structured brief | Review verdict routes to revision or filing |
| Requirements (LLD) | 10 | Issue → approved Low-Level Design | Gemini verdict routes to revision or approval |
| Implementation Spec | 7 | LLD → concrete implementation instructions | Completeness check routes to revision or review |
| TDD Implementation | 13 | Spec → code + tests + PR | Test results route to fix loop or report generation |
| Scout | Variable | External intelligence gathering | Search results route to deeper research or synthesis |

### Requirements Workflow Graph

The most complex workflow, showing the full draft-review-revise cycle:

```mermaid
graph TD
    N0["N0: Load Input<br/>(issue body, config)"]
    N0b["N0b: Analyze Codebase<br/>(file scanning, patterns)"]
    N1["N1: Generate Draft<br/>(Claude drafts LLD)"]
    N1_5["N1.5: Mechanical Validation<br/>(format, structure checks)"]
    N1b["N1b: Validate Test Plan<br/>(requirement refs in scenarios)"]
    N2["N2: Human Gate - Draft<br/>(optional human review)"]
    N3["N3: Review<br/>(Gemini adversarial review)"]
    N4["N4: Human Gate - Verdict<br/>(optional override)"]
    N5["N5: Finalize<br/>(write approved LLD)"]

    N0 --> N0b
    N0b --> N1
    N1 --> N1_5
    N1_5 --> N1b
    N1b --> N2
    N2 --> N3
    N3 --> N4
    N4 -->|"APPROVE"| N5
    N4 -->|"BLOCK/REVISE"| N1
```
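
The verdict edge out of N4 reduces to a routing function of the kind LangGraph's conditional edges take. A sketch with assumed state keys (`verdict`, `iteration`) and an illustrative revision cap; the real graph's names and cap behavior may differ:

```python
MAX_ITERATIONS = 3  # hypothetical revision cap

def route_on_verdict(state: dict) -> str:
    """Conditional edge out of N4: advance on APPROVE, loop back otherwise."""
    if state.get("verdict") == "APPROVE":
        return "N5"  # finalize: write the approved LLD
    if state.get("iteration", 0) >= MAX_ITERATIONS:
        return "N5"  # assumed behavior at the cap; the real graph may abort instead
    return "N1"      # BLOCK/REVISE: regenerate the draft
```

In LangGraph this function would be registered with `add_conditional_edges("N4", route_on_verdict)` so the verdict, not hard-coded wiring, decides the next node.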

### Checkpointing

Each workflow uses `SqliteSaver` from `langgraph.checkpoint.sqlite`:

```python
from langgraph.checkpoint.sqlite import SqliteSaver

with SqliteSaver.from_conn_string(str(db_path)) as memory:
    app = workflow.compile(checkpointer=memory)
    config = {"configurable": {"thread_id": f"issue-{issue_number}"}}
    result = app.invoke(initial_state, config)
```

* Thread ID = issue number → enables resume after interruption
* Per-issue databases prevent deadlocks during concurrent workflows
* Recursion limits scale with iteration count: `(max_iters × edges_per_loop) + buffer`
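
The recursion-limit formula in the last bullet, as a one-liner (the default buffer here is illustrative, not the project's configured value):

```python
def recursion_limit(max_iters: int, edges_per_loop: int, buffer: int = 10) -> int:
    """Scale LangGraph's recursion limit with the revision loop.

    Each revision traverses every edge in the loop once, so the limit must
    cover max_iters full passes plus a safety buffer for setup/teardown edges.
    """
    return max_iters * edges_per_loop + buffer
```

For example, with a 3-revision cap and roughly 6 edges per loop, `recursion_limit(3, 6)` yields 28.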

## Codebase Intelligence (RAG-Like)

AssemblyZero uses a RAG-like pattern for codebase understanding, but without vector embeddings. Instead, it uses deterministic techniques:

### How It Works

| Technique | What It Does | Where Used |
|---|---|---|
| AST Summarization | Extracts function signatures, class hierarchies, imports | `summarize_file_for_context()` in N0b node |
| Pattern Scanning | Finds similar implementations in codebase by structure | Spec workflow N1 (analyze codebase) |
| Token Budget Management | Trims context to fit model limits (~60KB target) | All draft nodes |
| Section-Aware Truncation | Preserves important sections, trims verbose ones | Implementation spec context injection |
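
AST summarization of the kind in the first row can be approximated with the standard-library `ast` module. This is a sketch, not the actual `summarize_file_for_context()`, whose output shape is not documented here:

```python
import ast

def summarize_source(source: str) -> dict:
    """Sketch of AST summarization: keep signatures and imports, drop bodies."""
    tree = ast.parse(source)
    summary = {"imports": [], "functions": [], "classes": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            summary["imports"] += [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            summary["imports"].append(node.module or "")
        elif isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            summary["functions"].append(f"{node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            summary["classes"].append(node.name)
    return summary
```

A summary like this is deterministic and orders of magnitude smaller than the file itself, which is what makes it fit inside a draft prompt's token budget.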

### The Flow

```mermaid
graph LR
    Issue["Issue/LLD"]
    Scan["Scan Codebase<br/>(files-to-modify table)"]
    AST["AST Summarize<br/>(signatures, imports)"]
    Similar["Find Similar<br/>(pattern matching)"]
    Budget["Token Budget<br/>(trim to ~60KB)"]
    Draft["Inject into<br/>Draft Prompt"]

    Issue --> Scan
    Scan --> AST
    Scan --> Similar
    AST --> Budget
    Similar --> Budget
    Budget --> Draft
```
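
The token-budget step (the ~60KB target) can be sketched as a greedy packer over prioritized sections. The section-pair input and the drop-whole-sections policy are simplifications of the section-aware truncation described above:

```python
def fit_to_budget(sections: list[tuple[str, str]], budget_bytes: int = 60_000) -> str:
    """Greedy sketch: keep whole sections in priority order until ~60 KB is used.

    `sections` is (name, text) pairs, highest priority first. A real
    implementation would trim within sections rather than drop them outright.
    """
    out, used = [], 0
    for name, text in sections:
        block = f"## {name}\n{text}\n"
        size = len(block.encode())
        if used + size > budget_bytes:
            continue  # skip sections that would blow the budget
        out.append(block)
        used += size
    return "".join(out)
```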

### Why Not Vector Embeddings?

* **Deterministic:** Same input always produces same context (auditable)
* **No infrastructure:** No vector database to deploy or maintain
* **Token-efficient:** AST summaries are compact; full file contents would exceed budgets
* **Sufficient:** For a single-repo workflow, structural analysis covers the use cases

## Full Pipeline Flow

```text
Idea → GitHub Issue → LLD (Design) → Implementation Spec → Code + Tests + PR
         │                │                   │                    │
         ▼                ▼                   ▼                    ▼
    Issue Workflow    Requirements        Spec Workflow      TDD Implementation
    (7 nodes)         Workflow            (7 nodes)          (13 nodes)
                      (10 nodes)
```

Each stage produces an artifact that feeds the next:

| Stage | Input | Output | Reviewed By |
|---|---|---|---|
| Issue | Idea description | Structured GitHub issue | Optional Gemini |
| Requirements | Issue body | Approved LLD | Gemini (mandatory) |
| Spec | Approved LLD + codebase analysis | Implementation spec with line-level instructions | Gemini (mandatory) |
| Implementation | Spec (or LLD) | Working code, passing tests, PR | Gemini (mandatory) |

### Governance Gates

Every stage transition requires a Gemini review verdict:

* **APPROVE** → advance to next stage
* **BLOCK** → loop back for revision (up to max iterations)
* **Human override** → orchestrator can waive gates for hotfixes

These gates are enforced by the LangGraph state machine — there is no edge that bypasses review.


## Key Source Files

| File | Contents |
|---|---|
| `assemblyzero/core/llm_provider.py` | All four LLM providers, `LLMCallResult`, `get_provider()` factory |
| `assemblyzero/workflows/requirements/graph.py` | Requirements workflow StateGraph (10 nodes) |
| `assemblyzero/workflows/implementation_spec/graph.py` | Spec workflow StateGraph (7 nodes) |
| `assemblyzero/workflows/issue/graph.py` | Issue workflow StateGraph (7 nodes) |
| `tools/run_requirements_workflow.py` | Requirements workflow CLI runner with `SqliteSaver` |
| `tools/run_implement_from_lld.py` | TDD implementation workflow CLI runner |
| `docs/adrs/0208-llm-invocation-strategy.md` | ADR for LLM invocation decisions |
