The tutoring platform is the core product: learners pick a lesson, open a chat, and study through guided conversation. The tutor asks before it answers, uses diagrams and quizzes inline, and can search the web for evidence. There is also an open-ended Explore mode where learners can ask anything without a fixed lesson context.
This document covers everything except the course creator (see ARCHITECTURE-COURSE-CREATOR.md). For the project-wide reference, see PROJECT.md.
```
User sends message
  │
  ▼
POST /tutor/chat  (SSE streaming response)
  │
  ├── Load lesson, progress, sessions from DB
  ├── Build TutorContext (lesson content, reference_kb, images, curriculum)
  ├── Select system prompt (v2 for lessons, explore for open-ended)
  │
  ▼
Agent Loop (up to 5 tool-call rounds)
  │
  ├── LLM streams tokens ──► SSE text deltas
  │     ├── finish_reason: stop       → done
  │     ├── finish_reason: length     → truncation notice
  │     └── finish_reason: tool_calls → execute_tool()
  │           ├── search_web              (Perplexity)
  │           ├── get_lesson_context      (DB)
  │           ├── get_lesson_transcript   (DB)
  │           ├── get_lesson_reference_kb (DB)
  │           ├── get_curated_resources   (wiki)
  │           ├── get_relevant_images     (wiki)
  │           └── get_user_progress       (context)
  │     append tool result → next LLM call
  │
  ▼
_stream_with_save: persist user + assistant messages
Update TutorSession.updated_at
```
The response is a streaming HTTP body (`text/event-stream`). Each frame is `data: {json}\n\n`.
| Event | Payload | When |
|---|---|---|
| `text` | `{"delta": "..."}` | Each streamed LLM token |
| `tool_call` | `{"name": "search_web"}` | Before tool execution |
| `tool_result` | `{"name": "...", "result": {...}}` | After tool returns |
| `error` | `{"message": "..."}` | API or stream failure |
| `done` | `{}` | End of response |
The response header `X-Session-Id` carries the session ID so the client can continue the same thread.
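A minimal consumer sketch in Python, keyed off the payload shapes in the table above (the real client is the frontend's `useTutorStream` hook; the request body field names here are assumptions):

```python
# Sketch of a streaming client for POST /tutor/chat.
# Request field names ("message", "session_id") are assumptions for illustration.
import json
import httpx

def stream_chat(message: str, session_id: str | None = None) -> str | None:
    body = {"message": message}
    if session_id:
        body["session_id"] = session_id  # hypothetical request field name

    with httpx.stream("POST", "http://localhost:8000/tutor/chat", json=body, timeout=None) as resp:
        new_session_id = resp.headers.get("X-Session-Id")  # reuse to continue the same thread
        for line in resp.iter_lines():
            if not line.startswith("data: "):
                continue
            event = json.loads(line[len("data: "):])
            if "delta" in event:            # text token
                print(event["delta"], end="", flush=True)
            elif "result" in event:         # tool_result (ignored here)
                continue
            elif "name" in event:           # tool_call
                print(f"\n[calling {event['name']}]")
            elif "message" in event:        # error
                print(f"\n[error] {event['message']}")
            elif not event:                 # done frame ({})
                break
    return new_session_id
```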
`backend/app/agent/agent_loop.py`
- `MAX_STEPS = 5` — outer loop cap on LLM rounds. Each round ends in `stop`, `length`, or `tool_calls`.
- Prompt selection: `build_exploration_prompt(context)` when `context.exploration_mode` is true, otherwise `build_system_prompt_v2(context)`.
- Tool gating: the full `TUTOR_TOOLS` list unless `settings.search_enabled` is false, in which case `search_web` is stripped.
- API call: `client.chat.completions.create(stream=True)` with `model=settings.llm_model`, `max_tokens=settings.llm_max_tokens`, `temperature=settings.llm_temperature`.
- Tool call assembly: tool-call fragments are reassembled by `index` from streamed chunks, then each tool is executed and its result appended as a `role: tool` message before the next LLM call (see the sketch after this list).
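A condensed sketch of that control flow (illustrative only; SSE emission and error handling are omitted, and helper names and signatures such as `consume_stream` are stand-ins, not the real code):

```python
# Illustrative outline of the agent loop; consume_stream is a hypothetical helper.
def run_agent_loop(client, context, history, settings):
    prompt = (build_exploration_prompt(context) if context.exploration_mode
              else build_system_prompt_v2(context))
    tools = [t for t in TUTOR_TOOLS
             if settings.search_enabled or t["function"]["name"] != "search_web"]
    messages = [{"role": "system", "content": prompt}, *history]

    text = ""
    for _ in range(MAX_STEPS):
        stream = client.chat.completions.create(
            model=settings.llm_model,
            messages=messages,
            tools=tools,
            stream=True,
            max_tokens=settings.llm_max_tokens,
            temperature=settings.llm_temperature,
        )
        # consume_stream forwards text deltas as SSE frames and reassembles
        # tool-call fragments by index; it returns the full text plus tool calls.
        text, tool_calls = consume_stream(stream)
        if not tool_calls:  # finish_reason was "stop" or "length"
            break
        messages.append({"role": "assistant", "content": text or None, "tool_calls": tool_calls})
        for call in tool_calls:
            result = execute_tool(call, context)
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    return text
```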
Seven tools are available to the agent:
| Tool | What it does | Notes |
|---|---|---|
| `search_web` | Web search via inference API (Perplexity endpoint) or direct Perplexity API | Delegates to `course_enricher._search` for API routing; results capped to 5; arXiv URLs rewritten to ar5iv |
| `get_lesson_context` | Fetches content, summary, and concepts for any lesson by slug | For cross-lesson references |
| `get_lesson_transcript` | Returns the video transcript for the current or a specified lesson | Defaults to the current lesson if no ID is given |
| `get_lesson_reference_kb` | Returns the reference KB text for any lesson | The grounding article |
| `get_curated_resources` | Finds matching resources from the pedagogy wiki by concept overlap | Returns up to 8 entries with URLs, descriptions, and why-relevant notes |
| `get_relevant_images` | Concept-matched image lookup from wiki image metadata | Returns the top 6 with paths, captions, and teaching guidance |
| `get_user_progress` | Returns completed lesson titles and counts | Reads from the TutorContext built at request time; no DB call |
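Because the loop talks to an OpenAI-compatible chat API, the entries in `tools.py` follow the function-calling schema format. A hypothetical sketch of the `search_web` entry (the parameter names are assumptions):

```python
# Hypothetical shape of one TUTOR_TOOLS entry; the real schema lives in
# backend/app/agent/tools.py and its parameter names may differ.
SEARCH_WEB_TOOL = {
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web for up-to-date evidence and sources.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What to search for"},
            },
            "required": ["query"],
        },
    },
}
```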
The `_search_web` handler delegates to `course_enricher._search()`, which routes based on the available API keys:

- Inference API: when only `LLM_API_KEY` is configured (requests go to the Perplexity endpoint via the API gateway)
- Direct Perplexity: when `SEARCH_API_KEY` is configured separately

This ensures the tutor agent and the course enricher share identical search infrastructure.
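A sketch of that key-based routing (illustrative; the attribute names are assumptions, and the real decision lives in `course_enricher._search()`):

```python
# Illustrative key-based routing; settings attribute names are assumed.
def pick_search_backend(settings) -> tuple[str, str]:
    if getattr(settings, "search_api_key", None):
        return "https://api.perplexity.ai", settings.search_api_key  # direct Perplexity
    return settings.llm_base_url, settings.llm_api_key  # Perplexity endpoint via the gateway
```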
The active prompt is system_prompt_v2.py. It is a single string assembled at request time from the TutorContext:
- Role and teaching goal — Socratic, intuition-first, no premature formulas
- Current lesson — title, summary, key concepts, completed lessons
- Teaching mode — `default`, `eli5`, `analogy`, `code`, `deep_dive` (mode hints change tone)
- Teaching principles — Socratic method, compounding analogies, multimodal toolkit (images, code, quizzes, resources, tables, Mermaid), energy matching, notation rules
- Conversation rhythm — teach / anchor / check / connect pattern, modality diversity target
- Reference material — `lesson_content` markdown (when present)
- Detailed reference knowledge — `reference_kb` markdown (the grounding layer, typically 2K-4K words)
- Curriculum index — slugs, titles, and concepts for other lessons in the same course
- Available images — up to 15 images from the wiki with path, caption, concepts, when_to_show
- Tools available — short descriptions of when to use each tool
- Output formats — `<quiz>` JSON schema, `<resource>` JSON schema, `<image>` rules, Mermaid diagrams, `<flow>` / `<architecture>` interactive diagram specs
- Factual accuracy tiers — four levels, from reference KB (cite straight) to unsourced (say you're not sure)
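As a rough sketch of how those sections come together (the section constants and helper names here are placeholders, not the real template):

```python
# Rough shape of build_system_prompt_v2; section wording, order, and the
# constant/helper names (ROLE_AND_GOAL, format_images, ...) are illustrative.
def build_system_prompt_v2(ctx) -> str:
    sections = [
        ROLE_AND_GOAL,                                   # Socratic, intuition-first teaching goal
        f"## Current lesson\n{ctx.lesson_title}\n\n{ctx.lesson_summary}",
        MODE_HINTS[ctx.mode],                            # default / eli5 / analogy / code / deep_dive
        TEACHING_PRINCIPLES,
        CONVERSATION_RHYTHM,
        f"## Reference material\n{ctx.lesson_content}" if ctx.lesson_content else "",
        f"## Detailed reference knowledge\n{ctx.reference_kb}" if ctx.reference_kb else "",
        format_curriculum_index(ctx.curriculum_index),
        format_images(ctx.available_images[:15]),
        TOOL_DESCRIPTIONS,
        OUTPUT_FORMATS,                                  # <quiz>, <resource>, <image>, Mermaid, <flow>, <architecture>
        FACTUAL_ACCURACY_TIERS,
    ]
    return "\n\n".join(s for s in sections if s)
```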
`system_prompt_explore.py` is used for the Explore surface, where there is no current lesson:

- No embedded lesson body or reference KB
- Receives the full course catalog (`available_courses`) so the model can route questions
- Emphasizes proactive use of `get_lesson_context` and `get_lesson_reference_kb` when a curriculum topic is touched
- Same output format rules (quiz, resource, image, Mermaid, flow, architecture)
`backend/app/agent/context.py`
| Field | Type | Default | Source |
|---|---|---|---|
| `lesson_id` | `str` | `""` | DB |
| `lesson_title` | `str` | `""` | DB |
| `lesson_summary` | `str` | `""` | DB |
| `concepts` | `list[str]` | `[]` | DB (JSON parsed) |
| `lesson_content` | `str` | `""` | DB |
| `reference_kb` | `str` | `""` | DB |
| `completed_lesson_titles` | `list[str]` | `[]` | UserLessonProgress join |
| `curriculum_index` | `list[dict]` | `[]` | All lessons in the same course |
| `mode` | `str` | `"default"` | Request body |
| `domain` | `str` | `"technical"` | LearningPath title |
| `available_images` | `list[dict]` | `[]` | lesson.image_metadata or wiki lookup |
| `exploration_mode` | `bool` | `False` | True for /explore |
| `available_courses` | `list[dict]` | `[]` | All courses (explore only) |
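In code, the table corresponds to a dataclass along these lines (a sketch derived from the fields above; ordering and any extra helpers in `context.py` may differ):

```python
# Sketch of the TutorContext dataclass per the field table above.
from dataclasses import dataclass, field

@dataclass
class TutorContext:
    lesson_id: str = ""
    lesson_title: str = ""
    lesson_summary: str = ""
    concepts: list[str] = field(default_factory=list)
    lesson_content: str = ""
    reference_kb: str = ""
    completed_lesson_titles: list[str] = field(default_factory=list)
    curriculum_index: list[dict] = field(default_factory=list)
    mode: str = "default"
    domain: str = "technical"
    available_images: list[dict] = field(default_factory=list)
    exploration_mode: bool = False
    available_courses: list[dict] = field(default_factory=list)
```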
Next.js App Router with a persistent sidebar (course picker, modules, lesson links, progress indicators).
| Route | What it shows |
|---|---|
| `/learn` | Course list — My Courses / Shared with Me / Platform Courses sections |
| `/learn/[pathId]` | Modules and lesson cards with completion state |
| `/learn/[pathId]/[lessonId]` | Main lesson page: tutor chat (60%) + resizable lesson content panel (40%) |
| `/explore` | Open-ended chat, no lesson scope (Gemini-style centered empty state) |
| `/quiz` | Pick a topic; LLM-generated quiz with per-question feedback |
| `/concepts` | Browse rich concept pages (LLM-generated on first request, then cached) |
| `/projects` | Capstone-style project cards tied to courses |
| `/assess` | Skill assessment from background + progress + quiz scores |
| `/create` | Course creator (see ARCHITECTURE-COURSE-CREATOR.md) |
`TutorPanel` is the chat shell. It uses `useTutorStream` — a custom hook that opens a `fetch` request to the chat endpoint, reads the streaming body with `getReader()`, and parses SSE frames.
MessageBubble renders:
- Markdown via ReactMarkdown + remark-gfm + remark-math + rehype-katex + rehype-highlight
- Mermaid diagrams from fenced code blocks — theme-aware rendering (default/dark), re-renders via MutationObserver on theme toggle
- Inline quizzes from `<quiz>` XML tags — `InlineQuiz` component with light/dark mode colors
- Resource cards from `<resource>` tags — clickable, with auto-constructed YouTube URLs and an ExternalLink icon
- Image cards from `<image>` tags — `ImageCard` with caption, hover-zoom, and click-to-expand lightbox
- Flow diagrams from `<flow>` tags — `AnimatedFlowBlock` (SVG + framer-motion, step-by-step playback)
- Architecture diagrams from `<architecture>` tags — `ArchitectureBlock` (React Flow with drillable nodes)
Chat UX:
- Gemini-style empty state: centered input + suggested prompts, moves to sticky bottom bar when messages exist
- Auto-scroll only when user sends a new question (not during AI streaming)
- Teaching mode chips let the user switch modes mid-conversation
```
content/{course-slug}/
  course.json      # Course manifest (slug, title, modules, lessons with content/concepts/sources)
  enrichment/      # Tutor reference KBs ({lesson-slug}_reference_kb.md)
```
| Table | Key fields |
|---|---|
| `LearningPath` | slug, title, description, level, order_index, created_by, visibility (public/private), share_token |
| `CourseShare` | learning_path_id, user_id (tracks per-user shares) |
| `Module` | learning_path_id, title, order_index |
| `Lesson` | module_id, title, slug, content, summary, concepts (JSON), reference_kb, sources_used (JSON URLs), image_metadata (JSON), youtube_id, video_title, transcript |
| `UserLessonProgress` | user_id, lesson_id, completed, completed_at |
| `TutorSession` | user_id, lesson_id, created_at, updated_at |
| `ChatMessage` | user_id, lesson_id, session_id, role, content, created_at |
| `ExplorationSession` | user_id, created_at, updated_at |
| `ExplorationMessage` | user_id, session_id, role, content, created_at |
| `Project` | slug, vision, outcomes, challenges, architecture_mermaid, demo/repo URLs |
| `QuizSession` / `QuizQuestion` / `QuizAnswer` | Quiz threads, questions, answers |
| `ConceptPage` | topic, rich text, optional lesson_id |
`python seed.py` scans `content/*/course.json`, creates or updates LearningPath -> Module -> Lesson, and loads `enrichment/*_reference_kb.md` into `Lesson.reference_kb`. It syncs all lesson fields, including sources_used, image_metadata, youtube_id, and video_title. Platform courses get `visibility="public"`.
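A condensed sketch of that pass (illustrative; the `upsert_*` helpers and manifest key names are assumptions, and the real ORM logic lives in `seed.py`):

```python
# Illustrative outline of the seeding pass; upsert_* helpers and manifest keys are assumed.
import json
from pathlib import Path

def seed(db):
    for manifest in sorted(Path("content").glob("*/course.json")):
        course = json.loads(manifest.read_text())
        path = upsert_learning_path(db, course, visibility="public")  # platform courses are public
        for m_idx, module in enumerate(course["modules"]):
            mod = upsert_module(db, path, module, order_index=m_idx)
            for lesson in module["lessons"]:
                kb_file = manifest.parent / "enrichment" / f"{lesson['slug']}_reference_kb.md"
                reference_kb = kb_file.read_text() if kb_file.exists() else ""
                upsert_lesson(db, mod, lesson, reference_kb=reference_kb)
    db.commit()
```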
The eval system stress-tests tutor quality using simulated multi-turn conversations with the real agent loop, real tools, and real system prompt.
```
Scenario YAML (persona + scripted turns + expected behaviors)
  │
  ▼
StudentSimulator (scripted turns first, then LLM-generated replies)
  │
  ▼
run_tutor_turn()  (real agent loop, real tools, non-streaming)
  │
  ▼
Conversation transcript
  │
  ▼
Scoring pipeline
  │     Layer 0: Behavioral metrics
  │     Layer 1: Format safety
  │     Layer 2: Prompt alignment judge
  │     Layer 3: Outcome judge
  ▼
Results JSON
```
Layer 0 — Behavioral metrics (scoring/behavioral.py, deterministic):
- Question ratios (open/closed)
- Image usage (proactive vs reactive)
- Modality diversity (quiz, resource, mermaid, flow, architecture, code, math)
- Tool call counts
- Backward-reference phrases (conversation continuity)
Layer 1 — Format safety (scoring/heuristics.py, deterministic):
- No fabricated URLs (only when `search_web` was called; sketched after this list)
- Quiz JSON validity inside `<quiz>` tags
- Plaintext multiple-choice detected (should be `<quiz>` cards)
- `<resource>` blocks include a `url` field
- No duplicate resources across the conversation
- Structured block validity (mermaid, flow, architecture)
- Notation consistency with lesson content
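A minimal sketch of the fabricated-URL check mentioned above (illustrative; the real heuristic in `scoring/heuristics.py` may differ):

```python
# Sketch of one Layer 1 check: flag assistant-turn URLs when search_web was never called.
import re

URL_RE = re.compile(r"https?://\S+")

def fabricated_urls(transcript: list[dict]) -> list[str]:
    searched = any(turn.get("role") == "tool" and turn.get("name") == "search_web"
                   for turn in transcript)
    if searched:
        return []
    return [url
            for turn in transcript
            if turn.get("role") == "assistant"
            for url in URL_RE.findall(turn.get("content") or "")]
```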
Layer 2 — Prompt alignment judge (scoring/prompt_alignment.py, LLM-based):
- socratic_method — questions before answers, guided discovery
- multimodal_teaching — uses full toolkit (images, diagrams, code, quizzes)
- compound_analogies — builds on analogies across turns
- factual_grounding — cites reference KB, searches when unsure
- adaptive_mode — matches teaching mode and energy
- image_discipline — uses available images appropriately
Includes retry logic (up to 2 retries with backoff) and known-pattern rubric notes for code-heavy scenarios.
Layer 3 — Outcome judge (scoring/llm_judge.py + rubric.py, LLM-based):
- learning_arc — does understanding grow across the conversation?
- conversational_craft — does it feel like a mentor, not a chatbot?
- technical_accuracy — are facts, formulas, and citations correct?
- intellectual_engagement — would a real learner stay engaged?
- adaptive_responsiveness — does the tutor read the student's signals?
The judge sees the full transcript including tool calls and results.
40 YAML files in evals/scenarios/ covering 8 lessons across multiple persona types:
| Lesson | Scenarios | Coverage |
|---|---|---|
| `lora-parameter-efficient` | 17 | Beginner, intermediate, factual probe, wrong intuition, visual learner, cross-topic, code mode, analogy mode, eli5, deep dive, demands answer, compound analogy, proactive visuals, resource timing, tangent follow, confusion recovery, structured overview |
| `self-attention` | 7 | Format probes (quiz, sources, visual), explore mode probes, compound analogy |
| `building-first-langgraph-agent` | 3 | Beginner, compound analogy, proactive visuals |
| `from-chatbots-to-agents` | 3 | Beginner, compound analogy, proactive visuals |
| `code-execution-retrieval` | 3 | Beginner, compound analogy, proactive visuals |
| `sim-to-real-transfer` | 3 | Beginner, compound analogy, proactive visuals |
| `alignment-rlhf` | 2 | Beginner, technical deep dive |
| `attention-breakthrough` | 2 | Beginner, format probe visual |
```
cd backend
python -m evals.run_eval --lesson self-attention  # All scenarios
python -m evals.run_eval --lesson self-attention --scenario format_probe_quiz  # Single scenario
python -m evals.run_eval --lesson lora --prompt-version v2-explore  # Explore mode
python -m evals.run_eval --lesson lora --compare model-a model-b  # Side-by-side
```

Results are saved as JSON in `evals/results/`.
| What | Where | Notes |
|---|---|---|
| Non-secret settings | `backend/settings.yaml` | LLM base URL, model IDs (primary, fallback, fast), search URL, usage limits, eval defaults |
| Secrets | `backend/.env` | `LLM_API_KEY`, `JWT_SECRET`, optional `SEARCH_API_KEY` |
| Frontend API URL | `NEXT_PUBLIC_API_URL` env var | Defaults to `http://localhost:8000` |
| Wiki URL | `NEXT_PUBLIC_WIKI_URL` env var | Used in the AppHeader wiki link |
The LLM configuration is provider-agnostic: any OpenAI-compatible API works. Change `llm.base_url` and the model IDs in `settings.yaml`, and set the corresponding key in `.env`.
| Concern | Path |
|---|---|
| Agent loop | backend/app/agent/agent_loop.py |
| System prompt (lesson) | backend/app/agent/system_prompt_v2.py |
| System prompt (explore) | backend/app/agent/system_prompt_explore.py |
| Tool schemas | backend/app/agent/tools.py |
| Tool execution | backend/app/agent/tool_handlers.py |
| Context dataclass | backend/app/agent/context.py |
| Tutor chat endpoint | backend/app/routers/tutor.py |
| Explore chat endpoint | backend/app/routers/explore.py |
| Course sharing/visibility | backend/app/routers/learning_paths.py |
| DB models | backend/app/models/learning.py |
| Config | backend/app/config.py + backend/settings.yaml |
| Seeding | backend/seed.py |
| Eval runner | backend/evals/runner.py + run_eval.py |
| Eval scoring | backend/evals/scoring/ (behavioral, heuristics, prompt_alignment, llm_judge, rubric) |
| Eval scenarios | backend/evals/scenarios/*.yaml |
| Ablation testing | backend/evals/run_ablation.py |
| Frontend chat | frontend/components/tutor/TutorPanel.tsx |
| SSE streaming hook | frontend/lib/useTutorStream.ts |
| Message rendering | frontend/components/tutor/MessageBubble.tsx |
| Mermaid diagrams | frontend/components/tutor/MermaidBlock.tsx |
| Quiz cards | frontend/components/tutor/InlineQuiz.tsx |
| Navigation | frontend/components/AppHeader.tsx |