# Lecture Builder Agent

A local-first, multi-agent system that generates high‑quality, university‑grade lecture and workshop outlines (with supporting materials) from a single topic prompt. Agents are defined with Pydantic‑AI models and coordinated by a custom Python orchestrator. The system integrates OpenAI o4‑mini/o3 models, web search via Tavily, and a React‑based UX. Full state, citations, logs, and intermediates persist in SQLite or Postgres. Observability is handled by Logfire. Exports include Markdown, DOCX, and PDF.
- Multi-Agent Workflow: Researcher, Planner, Learning Advisor, Content Weaver, Editor, Final Reviewer and Exporter nodes coordinated by a lightweight custom orchestrator.
- Streaming UI: Token-level draft streaming with diff highlights; action/reasoning log streaming via SSE.
- Robust Citations: Tavily search, citation metadata stored in SQLite, Creative Commons and university domain filtering.
- Local-First: Operates offline using cached corpora and fallback to local dense retrieval.
- Flexible Session Model: Lectures and workshops share a composable schema with optional pedagogical styles and learning methods.
- Progressive Document Graph: Research results, modules, and slide details are captured in a DAG so individual nodes can be revised without regenerating the whole session.
- Flexible Exports: Markdown (canonical), DOCX (python-docx), PDF (WeasyPrint), with cover page, TOC, and bibliography.
- Audit & Governance: Immutable action logs, SHA‑256 state hashes, role‑based access, database encryption.
The system comprises:
- Custom Orchestrator: Manages typed `State` objects through nodes and edges defined with Pydantic models; handles checkpointing in SQLite.
- Agents:
  - Researcher-Web: Executes parallel Tavily searches, dedupes and ranks sources.
  - Curriculum Planner: Defines learning objectives and module structure.
  - Learning Advisor: Converts pedagogical intent into Bloom-aligned lesson plans.
  - Content Weaver: Generates slide decks and assessments from lesson plans.
  - Editor: Reviews narrative coherence and learning goals.
  - Final Reviewer: Scores overall consistency and completeness.
  - Exporter: Renders final deliverables.
- Web UX: React + Primer, SSE-driven, with panels for document, log, sources, and controls.
- Storage Layer: SQLite or Postgres for state, logs, and citations via a repository abstraction.
- Export Pipeline: Pandoc-ready Markdown, python-docx, WeasyPrint PDF.
Agents emit events over several orchestrator channels that the frontend can subscribe to via Server-Sent Events:
- `messages` – token-level content such as LLM outputs.
- `debug` – diagnostic information and warnings.
- `values` – structured state snapshots.
- `updates` – citation and progress updates.
All stream connections require a short-lived JWT passed as a query parameter.
Fetch a token from `GET /stream/token` and connect using `/stream/<channel>?token=<JWT>` or `/stream/<workspace>/<channel>?token=<JWT>`.
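For example, a script can consume the `messages` channel like this (a minimal sketch using `httpx`, which is an assumption; the `{"token": ...}` response shape for `GET /stream/token` is also assumed):

```python
# Sketch: subscribe to the `messages` SSE channel with a short-lived token.
# httpx and the token-response key "token" are assumptions, not documented API.
import asyncio
import httpx

async def stream_messages(base_url: str, jwt: str) -> None:
    async with httpx.AsyncClient(base_url=base_url, timeout=None) as client:
        resp = await client.get("/stream/token", headers={"Authorization": f"Bearer {jwt}"})
        token = resp.json()["token"]  # assumed response shape
        async with client.stream("GET", f"/stream/messages?token={token}") as events:
            async for line in events.aiter_lines():
                if line.startswith("data:"):
                    print(line.removeprefix("data:").strip())

# asyncio.run(stream_messages("http://localhost:8000", dev_jwt))
```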
Prerequisites:

- Python 3.11 or later
- Node.js 18+ (for the frontend)
- `poetry` (recommended) or `pipenv`
- OpenAI API key
- Tavily API key (optional)
- Clone the repository:

  ```bash
  git clone https://github.com/your-org/lecture-builder-agent.git
  cd lecture-builder-agent
  ```

- Install backend dependencies:

  ```bash
  poetry install
  ```

- Install frontend dependencies:

  ```bash
  cd frontend
  npm install
  ```

- (Optional) Build the frontend locally:

  ```bash
  ./scripts/build_frontend.sh
  ```

  Docker images build the frontend automatically, so this step is only required when running the app directly on your host. The command generates static assets in `frontend/dist` that the FastAPI server serves.
Configuration is managed with pydantic-settings and sourced from environment variables (e.g. a `.env` file):
```bash
cp .env.example .env
# Edit .env:
# OPENAI_API_KEY=sk-...
# TAVILY_API_KEY=...
# LOGFIRE_API_KEY=...
# LOGFIRE_PROJECT=...
# MODEL=openai:o4-mini
# DATA_DIR=./workspace
# OFFLINE_MODE=false
# ENABLE_TRACING=true
# ALLOWLIST_DOMAINS=["wikipedia.org",".edu",".gov"]
# ALERT_WEBHOOK_URL=https://example.com/hook
```

For a single-command startup that builds the frontend, waits for the database, applies migrations, and serves the app with built assets, use Docker Compose:

```bash
docker compose up --build
```

Then open your browser at http://localhost:8000.
The frontend seeds a development JWT (role: user) in localStorage so API
requests work without a login flow:
```js
localStorage.setItem(
  "jwt",
  "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyb2xlIjoidXNlciJ9.hM2ErG_O2e4d9YiCeVoTbiRDoo4ziDiIPfDFE40GUlg",
);
```

Replace the token with one signed using your `JWT_SECRET` for custom setups.
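An equivalent token can be minted in Python, for example with PyJWT (a sketch; the library choice is an assumption, and any HS256-capable JWT library works):

```python
# Sketch: sign a development token with your JWT_SECRET (PyJWT assumed).
import jwt

token = jwt.encode({"role": "user"}, "your-jwt-secret", algorithm="HS256")
print(token)  # paste into localStorage as shown above
```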
To run the services directly on your host for development:
- Start the backend (FastAPI + custom orchestrator):

  ```bash
  ./scripts/run.sh [--offline]
  ```

  This helper invokes Uvicorn with:

  ```bash
  poetry run uvicorn web.main:create_app --reload
  ```

- Start the frontend:

  ```bash
  cd frontend
  npm run dev
  ```

  The Vite dev server proxies `/api` and `/stream` requests to `http://localhost:8000`, allowing the frontend to call those paths without specifying a host.

- Open your browser at `http://localhost:3000` and enter a topic to begin.
- Custom orchestrator implemented in `src/core/orchestrator.py`, leveraging Pydantic‑AI models for agent interfaces.
- Checkpointing handled in `src/core/checkpoint.py` with SQLite or Postgres backends.
- Edge policies enforce confidence thresholds and retry loops (a simplified policy is sketched after this list).
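A simplified illustration of such a policy (names like `EdgePolicy` and `min_confidence` are assumptions, not the project's actual API):

```python
# Illustrative only: retry a node until its result clears a confidence threshold.
from dataclasses import dataclass
from typing import Awaitable, Callable

@dataclass
class EdgePolicy:
    min_confidence: float = 0.8
    max_retries: int = 3

    async def run(self, node: Callable[[], Awaitable[tuple[object, float]]]) -> object:
        for _ in range(self.max_retries):
            result, confidence = await node()
            if confidence >= self.min_confidence:
                return result
        raise RuntimeError(f"confidence stayed below {self.min_confidence} after {self.max_retries} attempts")
```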
- SearchClient abstraction in `src/agents/researcher_web.py` supporting Tavily.
- Citation objects stored in the `state.citations` table.
- Filtering by domain allowlist and SPDX license checks (see the sketch after this list).
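The allowlist check might look roughly like this (a sketch; the real logic in `researcher_web.py` may differ):

```python
# Illustrative domain-allowlist filter mirroring the ALLOWLIST_DOMAINS setting.
from urllib.parse import urlparse

ALLOWLIST = ["wikipedia.org", ".edu", ".gov"]

def is_allowed(url: str) -> bool:
    host = urlparse(url).hostname or ""
    # Naive suffix match; production code should anchor on dot boundaries.
    return any(host == domain or host.endswith(domain) for domain in ALLOWLIST)

assert is_allowed("https://en.wikipedia.org/wiki/Quantum_computing")
assert not is_allowed("https://example.com/blog")
```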
- OpenAIFunctionCaller uses the function-call pattern to produce outline JSON.
- Schema enforcement in `schemas/outline.json`.
- Token streaming via the orchestrator's `messages` channel.
- Editor records narrative and objective feedback.
- FinalReviewer outputs a `QAReport` object with an overall score (a hypothetical shape is sketched after this list).
- Integration tests in `tests/quality/`.
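The report might have roughly this shape (hypothetical; only the overall score is documented):

```python
# Hypothetical QAReport shape; fields other than the overall score are assumptions.
from pydantic import BaseModel, Field

class QAReport(BaseModel):
    overall_score: float = Field(ge=0.0, le=1.0)  # assumed 0–1 scale
    issues: list[str] = []
```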
- SQLite schema managed in `src/persistence/`.
- Parquet blobs for document versions.
- Optional Postgres: swap `storage/sqlite.py` with `storage/postgres.py`.
- React app (`frontend/src/`): panels for Document, Log, Sources.
- SSE client in `frontend/src/services/stream.ts`.
- Diff highlighting via `diff-match-patch`.
- Primer CSS provides the base styling for components and layout.
- Command palette (Cmd/Ctrl+K) for quick Run, Retry, and Export actions.
- Markdown: generated directly from the outline JSON → Markdown converter.
- DOCX: generated via `src/export/docx_exporter.py` (a minimal python-docx sketch follows this list).
- PDF: headless WeasyPrint configured in `src/export/pdf_exporter.py`.
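As a rough illustration of the python-docx step (the outline keys used here are assumptions; the real exporter is more elaborate):

```python
# Sketch: render a minimal outline dict to DOCX with python-docx.
from docx import Document

def export_docx(outline: dict, path: str) -> None:
    doc = Document()
    doc.add_heading(outline["title"], level=0)     # cover title
    for module in outline.get("modules", []):      # assumed key
        doc.add_heading(module["title"], level=1)
        for point in module.get("points", []):     # assumed key
            doc.add_paragraph(point, style="List Bullet")
    doc.save(path)
```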
All endpoints except `/healthz` and `/readyz` are namespaced under `/api` and require a JWT via the `Authorization: Bearer <token>` header. Tokens are signed using `JWT_SECRET` and validated with the HS256 algorithm by default.
| Code | Description |
|---|---|
| 401 | Missing token or failed signature check |
| 403 | Token valid but caller lacks required role |
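A request therefore looks like this (a sketch; the endpoint path shown is illustrative, not a documented route):

```python
# Sketch: authenticated API call; 401/403 surface via raise_for_status().
import httpx

token = "...signed with JWT_SECRET..."
resp = httpx.get(
    "http://localhost:8000/api/sessions",  # hypothetical endpoint
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
```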
- Researcher collects web snippets and extracts keywords.
- Planner turns research into learning outcomes and pedagogical intent.
- Learning Advisor expands topics into lesson plans aligned with Bloom's taxonomy.
- Content Weaver drafts slides, visualization notes, and speaker notes.
- Editor reviews the material and may trigger targeted rewrites.
- Final Reviewer scores overall consistency and completeness.

Each stage updates the `DocumentDAG` so specific nodes, such as a slide's copy or visualization notes, can be refined without regenerating the entire session.
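A highly simplified picture of that pattern (`DocumentDAG` internals and `revise_node` are assumptions, not the project's API):

```python
# Illustrative only: revise one node's content without regenerating the rest.
from dataclasses import dataclass, field

@dataclass
class DocNode:
    content: str
    parents: list[str] = field(default_factory=list)  # edges to upstream nodes

@dataclass
class DocumentDAG:
    nodes: dict[str, DocNode] = field(default_factory=dict)

    def revise_node(self, node_id: str, new_content: str) -> None:
        # Only the targeted node changes; downstream nodes can be re-run selectively.
        self.nodes[node_id].content = new_content
```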
Run the full pipeline programmatically:

```python
from core.orchestrator import GraphOrchestrator, build_main_flow
from core.state import State

state = State(prompt="Explain quantum computing basics")
orch = GraphOrchestrator(build_main_flow())
await orch.run(state)
```

Construct a typed session model directly:

```python
from agents.models import WeaveResult

model = WeaveResult(
    title="Intro to AI",
    learning_objectives=["Define artificial intelligence"],
    duration_min=60,
    session_type="workshop",
    pedagogical_styles=["flipped"],
    learning_methods=["case study"],
)
```

- SQLite (default): single `workspace.db` file in `DATA_DIR`.
- Postgres: set `DATABASE_URL` and install `psycopg2`; update config.
Schema changes are managed with Alembic. After pulling new code, apply migrations to your workspace database:
```bash
alembic upgrade head
```

To create a new migration after modifying models:

```bash
alembic revision --autogenerate -m "add new table"
```

The command reads `alembic.ini` and writes migration scripts to `migrations/versions/`.
Run the CLI to generate lecture material while persisting state to the workspace database. Each invocation creates a new workspace identified by a slug of the topic and timestamp:
```bash
poetry run python -m cli.generate_lecture "Intro to Quantum"
```

The resulting workspace data is stored in `DATA_DIR/workspace.db`, and exports are written alongside the generated Markdown file.
All runtime configuration is supplied via environment variables or a `.env` file. Secret tokens (API keys, project identifiers, etc.) must be sourced from your secret manager and never hard-coded. The legacy LangSmith variables have been replaced by Logfire's settings.
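The settings class likely resembles the standard pydantic-settings idiom (a sketch; the project's actual class and field set may differ):

```python
# Sketch of the pydantic-settings pattern; fields mirror the table below.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    openai_api_key: str
    tavily_api_key: str | None = None
    model: str = "openai:o4-mini"
    data_dir: str
    offline_mode: bool = False

settings = Settings()  # reads the environment, then .env
```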
| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | API key for OpenAI | (required) |
| `TAVILY_API_KEY` | API key for Tavily search | |
| `LOGFIRE_API_KEY` | API key for Logfire | |
| `LOGFIRE_PROJECT` | Logfire project identifier | |
| `MODEL` | LLM provider and model (e.g. `openai:o4-mini`) | `openai:o4-mini` |
| `DATA_DIR` | Path for SQLite DB, cache, logs | (required) |
| `FRONTEND_DIST` | Directory containing built frontend assets | `frontend/dist` |
| `DATABASE_URL` | SQLAlchemy connection string | `sqlite:///${DATA_DIR}/workspace.db` |
| `OFFLINE_MODE` | Run without external network calls | `false` |
| `ENABLE_TRACING` | Enable Logfire tracing instrumentation | `true` |
| `ALLOWLIST_DOMAINS` | JSON list of citation-allowed domains | `["wikipedia.org", ".edu", ".gov"]` |
| `ALERT_WEBHOOK_URL` | Optional webhook for alert notifications | |
| `JWT_SECRET` | HMAC secret for signing JWTs | (required) |
| `JWT_ALGORITHM` | JWT signing algorithm | `HS256` |
- Logfire: Handles structured JSON logs and spans. Use `core.logging.get_logger(job_id, user_id)` to bind contextual identifiers for correlation (usage sketched after this list).
- OpenTelemetry metrics track request counts, active SSE clients, and export durations, exposed at `/metrics` for Prometheus scraping.
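Typical usage of the contextual logger (a sketch; the event name and extra fields are illustrative, not documented):

```python
# Sketch: bind job/user identifiers so log lines correlate across agents.
from core.logging import get_logger

logger = get_logger("job-123", "user-456")
logger.info("export completed", extra={"format": "pdf"})  # illustrative event
```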
- Unit tests: `pytest` in `tests/`.
- Integration tests: mock orchestrator runs in CI.
- Performance tests: `k6` scripts in `performance/`.
- Benchmarks: `scripts/benchmark_pipeline.py` compares the current pipeline against the previous implementation to surface regressions.
- CLI run output: `./scripts/cli.sh "<topic>" [--portfolio <portfolio> ...]` writes the generated lecture in Markdown to `run_output_<portfolio>.md` for each portfolio.
- Accessibility: Lighthouse and axe-core audits configured in the CI pipeline.
- Metrics exposed via OpenTelemetry at `/metrics`.
- Alerts: configurable thresholds for latency, error rates, and unsupported-claim rate.
- Audit: verify hash-chain integrity using CLI tools (a simplified check is sketched below).
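Conceptually, the integrity check recomputes each entry's hash from its payload plus the previous hash (a sketch; the actual log schema and CLI are project-specific assumptions):

```python
# Illustrative hash-chain verification over an append-only action log.
import hashlib
import json

def verify_chain(entries: list[dict]) -> bool:
    prev_hash = ""
    for entry in entries:
        payload = json.dumps(entry["action"], sort_keys=True) + prev_hash
        digest = hashlib.sha256(payload.encode()).hexdigest()
        if digest != entry["hash"]:
            return False  # tampering detected at this entry
        prev_hash = digest
    return True
```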
Basic JWT authentication is in place for remote deployments; requests sent to `http://localhost` or `http://127.0.0.1` bypass authentication entirely. Role-based authorization is planned for v2.
Refer to ROADMAP.md for sprint plans and milestones.
We welcome contributions! Please review the documentation listed below before submitting changes.
This project is licensed under the MIT License.
- DESIGN.md — Detailed design decisions and component diagrams
- ARCHITECTURE.md — Data model, state graph definitions, and sequence diagrams
- SECURITY.md — Security posture, secrets management, and encryption options
- TEST_PLAN.md — Test cases, performance benchmarks, and QA checklist
- GOVERNANCE.md — SLOs, metrics dashboard configuration, and audit procedures
- CLI_REFERENCE.md — Commands for running, testing, and maintenance tasks
- MIGRATION_GUIDE.md — Rationale and API changes for the new orchestrator and models