
Project Cornerstone: One Platform Backend Migration

Moving agent execution from the browser to the One Platform backend, with events streamed back over Server-Sent Events (SSE).

Architecture

┌─────────────────────────────────────────────────────────────┐
│  CLIENT (visual-editor, browser)                            │
│                                                             │
│   SCA Actions ──→ AgentService.startRun()                   │
│                       │                                     │
│              ┌────────┴────────┐                            │
│              │ AgentRunHandle  │                             │
│              │   .events      │ (AgentEventConsumer)        │
│              │   .abort()     │                             │
│              └────────┬───────┘                             │
│                       │                                     │
│              SSEAgentEventSource                            │
│                       │ fetch + iteratorFromStream           │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┼ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│                       ▼                                     │
│               OPAL_BACKEND_API_PREFIX                       │
│              (appcatalyst.pa.googleapis.com)                │
└─────────────────────────────────────────────────────────────┘
                        │
                ┌───────┴────────────────────────────────────┐
                │  ONE PLATFORM (production backend)          │
                │                                             │
                │  Wraps opal-backend-shared with             │
                │  One Platform API surface                   │
                │                                             │
                │  ← synced from packages/opal-backend-shared
                └─────────────────────────────────────────────┘

  LOCAL DEV:
  ┌─────────────────────────────────────────────────────────┐
  │  packages/opal-backend-dev (Python, FastAPI)            │
  │    - New APIs → wire to opal-backend-shared directly    │
  │    - Existing APIs → proxy to One Platform              │
  └─────────────────────────────────────────────────────────┘

  INTEGRATION TESTING:
  ┌─────────────────────────────────────────────────────────┐
  │  packages/opal-backend-fake (Python, FastAPI)           │
  │    - Canned scenarios, in-memory state                  │
  │    - No real API calls                                  │
  └─────────────────────────────────────────────────────────┘

Packages

| Package | Language | Purpose |
| --- | --- | --- |
| opal-backend-shared | Python | Shared agent logic (synced to prod) |
| opal-backend-dev | Python | Dev server (proxy + direct wiring) |
| opal-backend-fake | Python | Fake server (canned scenarios) |
| unified-server | TypeScript | Static content + blobs (unchanged) |
| visual-editor | TypeScript | Client (SSE consumer, unchanged) |

Wire Format

21 AgentEvent types defined in agent-event.ts and mirrored as typed Python models in opal-backend-shared (plain dataclasses since Phase 5.3).

| Event | Direction | Purpose |
| --- | --- | --- |
| start | → client | Loop began |
| thought | → client | Model reasoning |
| functionCall | → client | Tool invocation started |
| functionCallUpdate | → client | Tool status update |
| functionResult | → client | Tool result |
| subagentAddJson | → client | Nested progress |
| subagentError | → client | Nested error |
| subagentFinish | → client | Nested progress complete |
| content | → client | Model output |
| turnComplete | → client | Full turn finished |
| sendRequest | → client | Gemini request sent |
| waitForInput | ⇄ suspend | Needs user text |
| waitForChoice | ⇄ suspend | Needs user choice |
| readGraph | ⇄ suspend | Read graph data |
| inspectNode | ⇄ suspend | Inspect a node |
| applyEdits | ⇄ suspend | Confirmed edits |
| queryConsent | ⇄ suspend | User consent |
| graphEdit | → client | Fire-and-forget edits |
| complete | → client | Loop finished |
| error | → client | Loop error |
| finish | → client | Cleanup signal |
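
On the wire, each of these events travels as one SSE `data:` frame carrying a JSON object. A minimal sketch of the framing — the `type` field matches the table above, but the exact payload fields shown here are illustrative, not the real schema:

```python
import json

def to_sse_frame(event: dict) -> str:
    """Serialize one agent event as a Server-Sent Events data frame."""
    # One JSON object per frame; the blank line terminates the frame.
    return f"data: {json.dumps(event)}\n\n"

# e.g. a functionCall event announcing a tool invocation
frame = to_sse_frame({"type": "functionCall", "name": "generate_text"})
```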

Dev Backend Pipeline

How to run end-to-end: npm run dev:backend -w packages/unified-server

This starts both the unified-server (port 3000, serves the frontend) and the Python dev backend (port 8080, runs the agent loop). Open localhost:3000, create or open an Opal, and run it — the full wire protocol will be exercised.

Activation: The frontend enters remote mode when CLIENT_DEPLOYMENT_CONFIG.DEV_BACKEND_MODE is set. This calls agentService.configureRemote(OPAL_BACKEND_API_PREFIX, fetchWithCreds) in packages/visual-editor/src/sca/services/services.ts.

Data flow:

Frontend (browser)                          Dev backend (Python)
─────────────────────                       ────────────────────
1. User runs Opal
2. resolveToSegments(objective, params)
   → segments[] + flags
3. SSEAgentRun POSTs to /api/agent/run
   body: {kind, segments, flags}
                                    ──→  4. to_pidgin(segments, file_system)
                                            → pidgin text + capabilities
                                         5. Wrap: <objective>text</objective>
                                         6. Loop.run(objective) → Gemini
                                    ←──  7. SSE events stream back
8. AgentEventConsumer dispatches
   events to SCA controllers

Key files:

  • packages/visual-editor/src/a2/agent/resolve-to-segments.ts — template → segments
  • packages/visual-editor/src/a2/agent/sse-agent-run.ts — POST body construction
  • packages/opal-backend-dev/opal_backend_dev/main.py — receives and processes body
  • packages/opal-backend-shared/opal_backend_shared/pidgin.py — to_pidgin + from_pidgin_string
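
The step-3 POST body can be sketched from the diagram's `{kind, segments, flags}` shape. This is an assumption-laden illustration — the segment and flag fields shown are examples, not the full wire schema:

```python
import json

def build_run_request(segments: list[dict], flags: dict) -> str:
    """Build the start-run POST body for /api/agent/run (step 3 above)."""
    return json.dumps({"kind": "content", "segments": segments, "flags": flags})

body = build_run_request(
    [{"type": "text", "text": "Make a joke"}],
    {"useNotebookLM": False},
)
```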

Phases

How Objectives Work

Objectives (🎯) are the real milestones — concrete, executable tests that prove the system works. They go at the top of each phase. Everything below them is in service of reaching them.

Plan backward from the objective. Write the objective first as a specific action with an observable result ("run this command, see this output"). Then work backward: what items are needed to make that action succeed? If the items don't add up to a reachable objective, the plan is wrong — restructure it.

A checked-off list is not an objective. Individual items can be correct (code compiles, tests pass) without adding up to the objective. Before marking a phase complete, trace the full path from the user's action to the expected result. If any link is missing, the objective is not reached — add the missing work to the plan rather than redefining the objective.

Restructuring is progress. Discovering that the plan doesn't reach the objective is valuable information. Add new phases, split existing ones, move items — whatever makes the path to the objective honest.

Phase 1: Client-Side Event Coverage ✅

  • Event types, sink, consumer, bridge
  • buildHooksFromSink
  • AgentService + AgentRunHandle
  • Strangler-fig GraphEditingAgentService → Actions

Phase 2: Content Generation Agent ✅

  • Content generation agent uses AgentService
  • ConsoleProgressManager + RunStateManager as consumer handlers
  • Subagent reporter events + proxy ProgressReporter
  • FunctionCallEvent carries args for custom work item titles

Phase 3: Suspend/Resume via Events ✅

  • waitForInput / waitForChoice suspend events
  • Consumer handlers: requestInput(), ChoicePresenter
  • Graph-editing agent: sink.suspend() in chat-functions

Phase 3.5: Generalize Client Calls ✅

  • SuspendEvent union, widened suspend() signature
  • readGraph, inspectNode, applyEdits, queryConsent suspend events
  • Consumer handlers for each new suspend event

Phase 3.75: Client-Side SSE Transport ✅

  • SSEAgentEventSource — fetch + iteratorFromStream
  • SSEAgentRun / LocalAgentRun — split run implementations
  • AgentService.configureRemote(baseUrl, fetchFn)

Phase 4: Python Backend Packages

4.1: Scaffolding ✅

  • packages/opal-backend-shared/ — protocol primitives (events, sink, pending requests)
  • packages/opal-backend-fake/ — canned scenarios + FastAPI endpoints (absorbed mock-agent-server)
  • packages/opal-backend-dev/ — stub with proxy for existing APIs
  • Remove packages/mock-agent-server/
  • Migrate and verify all existing tests (13/13 passing)

4.2: Local Dev Workflow ✅

  • Root npm run setup:python (creates venvs for all Python packages)
  • PIP_INDEX_URL baked into all setup scripts
  • dev:fake starts fake Python backend alongside static server (with venv check)
  • BACKEND_API_ENDPOINT=http://localhost:8000 set in serve:fake env
  • Developer docs in opal-backend-dev/README.md

4.3: Proxy-First Backend ✅

  • opal_backend_shared/local/ — local-only shared API surface
  • api_surface.py — router factory with AgentBackend + ProxyBackend protocols
  • opal-backend-dev reverse proxy via httpx (forwards auth headers)
  • opal-backend-fake refactored to shared API surface (13 tests passing)
  • dev:backend wireit entry (serves at :3000, proxy at :8080)
  • start-dev-backend.sh with venv check

4.4: Port Agent Loop to Python

4.4a: Loop Core ✅
  • gemini_client.py — streaming Gemini API via httpx
  • function_definition.py — FunctionDefinition, FunctionGroup types
  • function_caller.py — async function dispatch + result collection
  • loop.py — while-loop orchestrator with LoopHooks
  • Unit tests with mocked Gemini responses (14 tests)
4.4b: Termination Functions ✅
  • Port system_objective_fulfilled (terminates loop with success)
  • Port system_failed_to_fulfill_objective (terminates loop with failure)
  • System instruction (meta-plan prompt — verbatim port)
  • Unit tests (13 tests)
4.4c: DevAgentBackend + End-to-End ✅
  • AgentEventSink + build_hooks_from_sink in opal-backend-shared
  • DevAgentBackend in opal-backend-dev (implements AgentBackend)
  • Agent endpoint wiring (always active, access token from request headers)
  • Unit tests (15 tests for event sink + hooks)
4.4d: Resumable Stream Protocol ✅

🎯 Objective: Backend pipeline works end-to-end without the frontend.

curl -X POST http://localhost:8080/api/agent/run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -d '{"kind":"content","objective":{"parts":[{"text":"Make a joke"}],"role":"user"}}' \
  --no-buffer

→ SSE stream of events → ends with system_objective_fulfilled.

Architecture pivot: Single POST /api/agent/run → SSE stream replaces the multi-endpoint pattern.

POST /api/agent/run  →  SSE stream (start or resume)
Body (start):  {kind, objective}
Body (resume): {interactionId, response}
  • Redesign api_surface.py — single POST /run → EventSourceResponse
  • Update DevAgentBackend — POST handler starts loop, streams inline
  • Update frontend SSEAgentEventSource — POST with body instead of GET
  • Update frontend AgentService — pass config into SSEAgentRun
  • Remove SSEAgentRun.resolveInput() side-channel
  • Auth: access token from Authorization header → Loop
  • Wire configureRemote() in app init via BACKEND_API_ENDPOINT
  • Fix proxy Content-Encoding header stripping
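
The start/resume distinction on the single endpoint reduces to a body-shape dispatch: a body with interactionId resumes a suspended run, anything else starts one. A sketch under that assumption (in the real server this feeds a FastAPI handler returning an EventSourceResponse; the return strings here are purely illustrative):

```python
def dispatch(body: dict) -> str:
    """Route a POST /api/agent/run body: resume if interactionId is present."""
    if "interactionId" in body:
        return f"resume:{body['interactionId']}"
    return f"start:{body['kind']}"
```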
4.4e: Wire Content Runs Through AgentService ✅

🎯 Objective: Run an opal through the dev backend and see the result.

npm run dev:backend -w packages/unified-server

Open the app, run an opal with a simple text task like "make a joke". The agent calls Gemini via the Python backend, streams events back, and the result appears in the UI.

  • DEV_BACKEND_MODE deploy-time flag (like FAKE_MODE)
  • configureRemote() gated on flag in services.ts
  • Early instanceof SSEAgentRun branch in invokeRemoteAgent() in main.ts
  • Lightweight ConsoleProgressManager for remote UI reporting
  • complete event carries AgentResult.outcomes (don't break on finish)
  • Tests: SSE event sequence, outcome extraction, error handling (11 tests)
4.4f: Agent File System + System Functions
  • Port AgentFileSystem (in-memory virtual FS)
  • Port TaskTreeManager (task tree schema + status tracking)
  • Port PidginTranslator.fromPidginString (resolve <file> tags → data parts from FS — needed by system_write_file and onSuccess callback)
  • Port remaining system functions (list/read/write files, task tree)
  • Add intermediate / FileData to AgentResult
  • Wire file system + task tree into loop setup

4.5: Wire Protocol + Objective Handling

Design: Structured segments, not raw LLMContent. The client sends semantic intent; the server owns the entire pidgin vocabulary.

toPidgin splits in two: the client resolves templates into typed segments (text, asset, input). The server walks segments, registers data parts in AgentFileSystem, and emits all pidgin tags. pidgin.py is the single source of truth for the pidgin language.

Capabilities (useMemory, useNotebookLM) are discovered by the client during template resolution — they emerge from encountering template chips, not from runtime flags. Custom tools run on the server: the client sends board URLs, the server loads and invokes them.

  • Define segment types: text (literal), asset (titled content group), input (agent-output content group), tool (routes, memory, NLM, custom)
  • Client-side pre-resolution: resolve-to-segments.ts extracts template resolution from toPidgin into a step that runs before startRun()
  • Wire metadata: flags.useNotebookLM sideband; useMemory, routes, and custom tools discovered from tool segments server-side
  • Server-side to_pidgin(segments): walk segments, register data parts in FS, emit <asset>, <input>, <file>, <content>, <objective> tags
  • Server-side onSuccess callback: from_pidgin_string (done ✅) + intermediate file collection (done ✅)
  • End-to-end: npm run dev:backend → frontend resolves segments → POST → to_pidgin → Loop → SSE stream back to browser
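
The server-side walk described above can be sketched as follows. The segment types and tag names come from the bullets; the file-system interface, tag attributes, and the `<objective>` wrapping are assumptions for illustration only:

```python
def to_pidgin(segments: list[dict], register_file) -> str:
    """Walk typed segments, register data parts in the FS, emit pidgin tags."""
    out = []
    for seg in segments:
        if seg["type"] == "text":
            out.append(seg["text"])
        elif seg["type"] == "asset":
            # data part goes into the AgentFileSystem; the tag references it
            path = register_file(seg["data"])
            out.append(f'<asset title="{seg["title"]}"><file src="{path}"/></asset>')
        elif seg["type"] == "input":
            out.append(f'<input name="{seg["name"]}"/>')
    return f"<objective>{''.join(out)}</objective>"
```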

4.6: Data Transform Plumbing

The shared substrate: resolving storedData/fileData references to Gemini-consumable formats. All media generation and content functions depend on this.

Key insight: the D2F (Drive → Gemini File) and B2F (Blob → Gemini File) transforms already go through backend endpoints (/v1beta1/uploadGeminiFile, /v1beta1/uploadBlobFile). The dev backend proxies to the same One Platform server. The Python agent loop can call these endpoints directly.

  • conform_body on the server: conform_body.py walks LLMContent parts, resolves storedData/fileData to Gemini File API URLs via /v1beta1/uploadGeminiFile (HTTP calls to One Platform)
  • json parts → {text: json.dumps()} (inline transform)
  • NotebookLM storedData → {text: url} passthrough
  • _upload_gemini_file helper: authenticated POST to One Platform, BackendClient protocol injected via Loop.__init__
  • Tests: 21 tests covering all 6 transforms, error handling, mixed content
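
The per-part dispatch at the heart of conform_body, sketched with simplified part shapes. `upload_gemini_file` stands in for the BackendClient call; the field names mirror LLMContent but are assumptions, not the verified schema:

```python
import asyncio
import json

async def conform_part(part: dict, upload_gemini_file) -> dict:
    """Resolve one LLMContent part to a Gemini-consumable form."""
    if "storedData" in part:
        # storedData → Gemini File API URI via the backend upload endpoint
        uri = await upload_gemini_file(part["storedData"]["handle"])
        return {"fileData": {"fileUri": uri}}
    if "json" in part:
        return {"text": json.dumps(part["json"])}  # inline transform
    return part  # plain text parts pass through untouched
```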

4.7: Function Groups

With the wire protocol and data plumbing in place, function groups are thin handlers on top.

4.7a: Text Generation

🎯 Objective: Send an image into the agent and ask it to describe it. The image flows through segments → pidgin → from_pidgin_string → conform_body (resolves storedData to Gemini File API) → generate_text → text description streams back over SSE. Full multimodal pipeline end-to-end.

  • Port generate_text function (pidgin → conformBody → streamContent → merge text)
  • Grounding tools: Google Search, Google Maps, URL context
  • Wire get_generate_function_group into dev backend main.py
  • Tests: 20 tests — handler, grounding, model resolution, error handling
4.7b: Image Generation

🎯 Objective: Ask the agent to "generate an image of a cat" through the dev backend. The agent calls generate_imageexecuteStep with ai_image_tool → One Platform returns inline image data → saved to agent FS.

  • Port executeStep client (POST to /v1beta1/executeStep, collect output chunks)
  • Port generate_image function (prompt + optional input images + aspect ratio → executeStep → save to FS)
  • Wire into dev backend main.py
  • Tests
4.7c: Video Generation

🎯 Objective: Ask the agent to "make a short video of waves crashing." The agent calls generate_videoexecuteStep with Veo model → storedData part saved to agent FS.

  • Port generate_video function (prompt + optional reference images → executeStep with generate_video API + Veo model selection)
  • Port expandVeoError safety-code mapping
  • Wire into dev backend main.py
  • Tests
4.7d: Speech Generation

🎯 Objective: Ask the agent to "read this paragraph aloud." The agent calls generate_speech_from_textexecuteStep → audio storedData saved to agent FS.

  • Port generate_speech_from_text function (text + voice selection → executeStep → save audio to FS)
  • Wire into dev backend main.py
  • Tests
4.7e: Music Generation

🎯 Objective: Ask the agent to "compose upbeat background music." The agent calls generate_music_from_textexecuteStep → audio storedData saved to agent FS.

  • Port generate_music_from_text function (prompt → executeStep → save audio to FS)
  • Wire into dev backend main.py
  • Tests
4.7f: Code Generation & Execution

🎯 Objective: Ask the agent to "calculate the first 20 Fibonacci numbers." The agent calls generate_and_execute_codeconformBodystreamContent with codeExecution tool → streams text + inline file results back.

  • Port generate_and_execute_code function (prompt → conformBodystreamContent with code execution tool → merge text + file results)
  • Wire into dev backend main.py
  • Tests

4.8: Suspend/Resume for Interactive Agents

🎯 Objective: Open graph editor, use AI chat to edit a graph through the dev backend. Each interaction round-trips as: POST → stream → suspend → POST → stream → complete.

Design decision: reconnect, not keepalive. The SSE stream closes when the loop suspends. The client POSTs again with {interactionId, response} to resume. This is the only viable approach — suspends can last seconds, hours, or days. A keepalive stream cannot stay open that long, and the production backend is stateless. The dev backend must match this model.

4.8a: Suspend/Resume Protocol ✅
  • SuspendError + SuspendResult in opal-backend-shared
  • InteractionStore — in-memory state store keyed by interactionId
  • Loop catches SuspendError, returns SuspendResult
  • Dev backend: _start() / _resume() / _stream_loop() with state save/load
  • Client SSEAgentEventSource: reconnect loop — suspend → await handler → POST resume → new stream
  • Tests: 11 Python (suspend/resume round-trip) + TS reconnect test
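
The reconnect protocol reduces to a park-and-resume state machine: suspend parks state under an interactionId and closes the stream; the next POST picks it up. A toy model — the function and field names here are illustrative, not the real InteractionStore API:

```python
import uuid

class SuspendError(Exception):
    """Raised by a function to suspend the loop with a suspend event."""
    def __init__(self, event):
        super().__init__(event)
        self.event = event

store: dict[str, dict] = {}  # stands in for InteractionStore

def start_run(run_state: dict, step) -> dict:
    try:
        return {"type": "complete", "result": step(run_state)}
    except SuspendError as err:
        interaction_id = str(uuid.uuid4())
        store[interaction_id] = run_state       # save state, close the stream
        return {"type": err.event, "interactionId": interaction_id}

def resume_run(interaction_id: str, response, step) -> dict:
    run_state = store.pop(interaction_id)       # load state on reconnect
    run_state["response"] = response            # inject the user's answer
    return {"type": "complete", "result": step(run_state)}
```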
4.8b: Chat Suspend Functions
  • Port chat_request_user_input + chat_present_choices to Python (raises SuspendError with waitForInput / waitForChoice)
  • waitForInput handler in invokeRemoteAgent — prompt display + input
  • Loop: suppress on_finish on suspend, suppress on_start on resume
  • End-to-end: Generate step (Agent mode, prompt: "ask user to provide their name") through dev:backend — suspend → user types → resume → complete
4.8c: Unified Suspend Handler Rendering ✅

🎯 Objective: Both waitForInput and waitForChoice work identically through local and remote paths. The rendering logic (prompt display, input collection, choice presentation) is shared — not duplicated between AgentUI and invokeRemoteAgent.

  • Extract #addChatOutput pattern into a shared utility (chat-output.ts) usable by both local (AgentUI) and remote (invokeRemoteAgent) paths
  • waitForInput remote handler uses the shared utility (deduplicate)
  • waitForChoice remote handler with ChoicePresenter + A2UIInteraction (proper choice buttons / checkboxes)
  • A2UIInteraction extracted as shared rendering core — AgentUI delegates to it, adding only pidgin translation on top
  • Server-side pidgin resolution for choice labels (from_pidgin_string in chat.py) — images and file refs render correctly in choices
  • End-to-end: choice-based interaction through dev:backend
  • Port fidelity audit — all function groups (image, video, audio, generate, chat, system) verified against TS source; descriptions restored; shared_schemas.py centralizes statusUpdateSchema, taskIdSchema, fileNameSchema

Phase 5: Python Consolidation & Copybara Prep

🎯 Objective: opal-backend-shared code is pure Python (no httpx, fastapi, or pydantic imports in synced files). The shared code can be copybara'd to third_party/py/opal_backend and the production backend can inject its own HTTP transport.

# Verify: no transport deps in synced code
grep -r "import httpx\|from fastapi\|from sse_starlette" \
  packages/opal-backend/opal_backend/*.py \
  packages/opal-backend/opal_backend/functions/
# → no results

5.1: Package Consolidation ✅

  • Merge opal-backend-shared, opal-backend-dev, opal-backend-fake into single packages/opal-backend/ with one pyproject.toml and one .venv
  • Directory structure: opal_backend/ (synced), opal_backend/local/ (not synced), opal_backend/dev/, opal_backend/fake/
  • Move fake-server artifacts (events.py, sse_sink.py, pending_requests.py) to local/ (not synced to google3)
  • Update all imports across source and test files
  • Update wireit scripts (setup:python, test:python, dev:backend, dev:fake)

5.2: HTTP Transport Abstraction ✅

  • http_client.py — HttpClient protocol (synced, no deps)
  • local/http_client_impl.py — httpx-based implementation (not synced)
  • Update gemini_client.py — replace import httpx with HttpClient
  • Update conform_body.py — replace import httpx with HttpClient
  • Update step_executor.py — replace import httpx with HttpClient
  • Thread HttpClient through Loop and function group factories
  • Update all tests to inject HttpClient
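
The seam above is a structural protocol: synced code depends only on the interface, while the httpx implementation stays in local/. A sketch — the method name `post_json` and its signature are assumptions, not the actual protocol surface:

```python
import asyncio
from typing import Protocol

class HttpClient(Protocol):
    """Transport seam: synced code sees only this protocol."""
    async def post_json(self, url: str, body: dict) -> dict: ...

class FakeHttpClient:
    """Test double — the httpx-backed twin would live in local/ (not synced)."""
    async def post_json(self, url: str, body: dict) -> dict:
        return {"url": url, "echo": body}

result = asyncio.run(FakeHttpClient().post_json("https://example.test", {"a": 1}))
```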

5.3: Typed Event Models ✅

  • opal_backend/events.py — dataclass models for all 22 AgentEvent types, AgentResult, FileData, request/response bodies (StartRunRequest, ResumeRunRequest), segment types — all with to_dict() producing camelCase JSON (no pydantic dependency)
  • agent_events.py — AgentEventSink queue and build_hooks_from_sink emit typed events instead of dict[str, Any]
  • suspend.py — SuspendError takes typed SuspendEvent + explicit function_call_part parameter (moved out of event dict)
  • loop.py — imports wire-format types from events.py
  • functions/chat.py — constructs WaitForInputEvent/WaitForChoiceEvent
  • dev/main.py — SSE serialization via event.to_dict() + json.dumps()
  • local/sse_sink.py — uses to_dict() instead of Pydantic model_dump_json()
  • Delete local/events.py (replaced by synced events.py)
  • Update all tests (test_agent_events.py, test_chat_functions.py, test_suspend_resume.py)
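
The dataclass-plus-to_dict() pattern described above keeps synced code Pydantic-free while still emitting camelCase JSON. A sketch with illustrative field names (the real event fields may differ):

```python
from dataclasses import dataclass

@dataclass
class FunctionCallUpdateEvent:
    """Typed event model: plain dataclass, no pydantic dependency."""
    call_id: str
    status: str

    def to_dict(self) -> dict:
        # snake_case fields map to camelCase JSON keys on the wire
        return {"type": "functionCallUpdate",
                "callId": self.call_id,
                "status": self.status}
```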

5.4: BackendClient Protocol ✅

  • backend_client.py — BackendClient protocol (execute_step, upload_gemini_file, upload_blob_file) + HttpBackendClient impl
  • step_executor.py — delegates to BackendClient, removed HTTP logic
  • conform_body.py — delegates to BackendClient.upload_gemini_file
  • loop.py — upstream_base: str → backend: BackendClient | None
  • All function groups (image, audio, video, generate) updated
  • dev/main.py — per-request HttpBackendClient with origin
  • Tests updated (step_executor, conform_body, suspend_resume)

5.5: High-Level Entry Point ✅

5.5a: Credential Internalization ✅

🎯 Objective: access_token and origin are no longer parameters of Loop, function group factories, or conform_body. Credentials are a transport concern — baked into HttpClient and BackendClient.

  • HttpClient protocol — access_token property
  • HttpxClient — accepts and exposes access_token
  • GeminiClient — reads client.access_token
  • BackendClient protocol — removed access_token from all methods; HttpBackendClient reads from self._client.access_token
  • conform_body — removed access_token param
  • step_executor — removed access_token from functions
  • All 4 function group factories — removed access_token param
  • Loop.__init__ — removed access_token and origin
  • dev/main.py — per-request HttpxClient with baked-in token; removed access_token/origin from InteractionState
  • Tests updated (all 322 pass)
5.5b: run() / resume() Entry Point ✅

🎯 Objective: opal_backend.run() is a single async iterator that takes an objective + injected deps and yields AgentEvents. Consumers provide only what varies by environment. Everything else is internal.

async for event in opal_backend.run(
    objective=objective,
    client=http_client,
    backend=backend_client,
    store=interaction_store,
):
    yield event
  • opal_backend/run.py — run() async generator: creates AgentFileSystem, TaskTreeManager, AgentEventSink, LoopController, builds function groups internally, runs loop, dispatches CompleteEvent/ErrorEvent, closes sink. Accepts extensible flags dict.
  • opal_backend/run.py — resume() async generator: loads state from InteractionStore, injects function response, rebuilds function groups, runs loop, dispatches events
  • dev/main.py — delegates to opal_backend.run() / resume(); deleted _build_function_groups, _stream_loop (~130 lines removed)
  • opal_backend/__init__.py — re-export run, resume
  • Tests: test_run.py — 6 tests (complete, error, suspend, failed, unknown ID, full suspend→resume round-trip). All 328 pass.
5.5c: Protocol Boundary Extraction ✅

🎯 Objective: Core modules depend only on protocols. All transport-specific implementations live in local/.

  • backend_client.py — protocol-only (removed HttpBackendClient, HttpClient import, logging)
  • local/backend_client_impl.py — HttpBackendClient moved here
  • interaction_store.py — converted InteractionStore class to Protocol
  • local/interaction_store_impl.py — InMemoryInteractionStore moved here
  • Updated imports: dev/main.py, test_step_executor.py, test_suspend_resume.py, test_run.py (328 pass)

5.6: Memory & Storage ✅

🎯 Objective: Agents running on the Python backend can use persistent Google Sheets-backed memory. Chat conversations are persisted and recalled across sessions. The graph identity is required and anchors all storage.

  • DriveOperationsClient protocol (9 methods: Drive CRUD + Sheets read/write/batch) + HttpDriveOperationsClient impl (#8089)
  • InteractionStore protocol made async (save, load, has, clear); InMemoryInteractionStore updated; flags/graph folded into stored interaction state (#8089)
  • AgentFileSystem made async (get, list_files, read_text, get_many, from_pidgin_string) — prep for database-backed storage (#8090)
  • SheetManager — ported from TS SheetManager + memorySheetGetter; resolves spreadsheet IDs via DriveOperationsClient.query_files, creates / reads / updates / deletes sheets. 21 tests. (#8091)
  • 5 memory functions (create_sheet, read_sheet, update_sheet, delete_sheet, get_metadata) ported to Python with JSON schemas matching TS exactly. 12 tests. Wired into run(). (#8091)
  • AgentFileSystem/mnt/memory/ path resolution via set_sheet_manager(). 4 tests. (#8091)
  • ChatLogManager — seeds from existing sheets, persists new entries, registers chat log as system file. Wired into run() and resume(). (#8092)
  • graph promoted to required sibling field (out of flags) across full stack; every run anchored to graph identity (#8092)
  • session_id in InteractionState for cross-suspend chat log continuity (#8092)
  • _process_chat_response — transforms {input: LLMContent}{user_input: text} on resume, matching TS handler logic (#8092)
  • .on("error") handler in invokeRemoteAgent — server errors now surface to users (#8092)
  • Tests: 435 Python + 31 TS agent tests pass
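
The /mnt/memory/ routing bullet above amounts to a mount-point check inside the file system. A toy sketch, assuming a `read`-style SheetManager method and a plain dict for regular files (both are illustrative, not the real interfaces):

```python
import asyncio

MEMORY_PREFIX = "/mnt/memory/"

class AgentFileSystem:
    """Toy mount routing: memory paths resolve through the SheetManager."""
    def __init__(self):
        self._files: dict[str, str] = {}
        self._sheets = None

    def set_sheet_manager(self, sheets) -> None:
        self._sheets = sheets

    async def read_text(self, path: str) -> str:
        if path.startswith(MEMORY_PREFIX) and self._sheets is not None:
            # resolved via Sheets instead of the in-memory map
            return await self._sheets.read(path[len(MEMORY_PREFIX):])
        return self._files[path]

class FakeSheetManager:
    async def read(self, name: str) -> str:
        return f"sheet:{name}"
```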

5.7: Port Fidelity & Error Handling ✅

Systematic audit-and-fix pass across all function groups to ensure the Python backend matches TS behavior exactly. Also adds fatal error classification so unrecoverable errors (quota-exhausted) terminate the loop immediately instead of being retried by the LLM.

  • error_classifier.py — to_error_or_response() classifies errors by parsing structured JSON (RESOURCE_EXHAUSTED) or fuzzy keyword matching; promotes quota-exhausted to $error outcomes. 16 tests. (#8071)
  • function_caller.py + loop.py — propagate and handle $error from function results (#8071)
  • All generation handlers (video, image, audio, generate) route errors through to_error_or_response (#8071)
  • generate.py port fidelity — mergeTextParts utility, statusUpdater TODOs, model resolution alignment. 20 tests. (#8070, #8081, #8087)
  • image.py — batch resolution via file_system.get_many(), output-processing drift fix (get_many + 5 tests) (#8082, #8088)
  • chat.py — computeFormat (inputType → icon name mapping) ported; from_pidgin_string resolution for choice labels (#8095)
  • system.py — 3 drift fixes: route resolution via get_original_route(href), intermediate file LLMContent format, href in failed_to_fulfill. 5 tests. (#8096)
  • BackendClient.stream_generate_content — consolidated all Gemini streaming into BackendClient, eliminated HttpClient protocol entirely (#8074, #8075)
  • ENABLE_GEMINI_BACKEND flag — routes Gemini calls through backend (#8076)
  • Tests: 441 Python tests pass
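
The classification order described above — structured JSON first, fuzzy keywords as fallback — can be sketched as below. The payload shapes and the `$error` outcome format are assumptions based on the bullets, not the verified implementation:

```python
import json

def to_error_or_response(raw: str) -> dict:
    """Classify an error string: fatal quota errors vs. retryable responses."""
    # 1. Structured path: parse the JSON error envelope if there is one
    try:
        status = json.loads(raw).get("error", {}).get("status", "")
        if status == "RESOURCE_EXHAUSTED":
            return {"$error": {"kind": "quota-exhausted", "message": raw}}
    except (ValueError, AttributeError):
        pass
    # 2. Fuzzy fallback: keyword match on unstructured text
    if "quota" in raw.lower():
        return {"$error": {"kind": "quota-exhausted", "message": raw}}
    # 3. Recoverable: hand the text back so the LLM can retry
    return {"response": raw}
```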

Phase 6: Production Readiness ← we are here

Full parity with the in-process agent. Everything that works locally works identically through the dev backend.

  • Status metadata plumbing — StatusUpdateOptions (expectedDurationInSec, isThought) flows through StatusUpdateCallback → FunctionCallUpdateEvent.opts → SSE wire format. All 4 TODOs in generate.py resolved.
  • url_context consent flow — FunctionDefinition gains an optional precondition handler, run by FunctionCaller before the main handler. The consent precondition raises SuspendError(QueryConsentEvent) with is_precondition_check=True. On resume, _resume_precondition records the grant and re-dispatches the function call — the model never sees the consent round-trip.
  • Cancel concurrent function caller tasks on suspend — loop.py now cancels sibling asyncio.Tasks before saving state, eliminating the "emit() called on closed sink" warning.
  • content event — investigation result: only consumed in local mode (RunStateManager.pushContent), ignored by remote client. Removed server-side emission from build_hooks_from_sink and loop. The event can be fully removed if LocalAgentRun is removed.
  • Segment data-parts transfer — moved to_pidgin() inside run() so segments are converted using the loop's own AgentFileSystem. Data parts from asset/input segments now survive into the loop. run() accepts segments (not objective) as its primary input; dev/main.py passes segments straight through.
  • Chat resume multimodal parts — _process_chat_response uses content_to_pidgin_string to register binary parts (images, file uploads) in AgentFileSystem and produce pidgin text with <file> tags, matching the TS local path behavior.
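
The precondition flow described above can be sketched as a toy: the caller runs an optional precondition gate before the handler, a consent suspend raised there is marked as a precondition check, and resume records the grant and re-dispatches so the model never sees the round-trip. All names and shapes here are illustrative:

```python
class SuspendError(Exception):
    def __init__(self, event, is_precondition_check=False):
        super().__init__(event)
        self.event = event
        self.is_precondition_check = is_precondition_check

def call(fn: dict, args: dict, granted: set):
    """Run the precondition gate, then the handler."""
    if fn.get("precondition") and fn["name"] not in granted:
        raise SuspendError({"type": "queryConsent", "fn": fn["name"]},
                           is_precondition_check=True)
    return fn["handler"](args)

def resume_precondition(err: SuspendError, fn: dict, args: dict, granted: set):
    granted.add(err.event["fn"])   # record the user's grant
    return call(fn, args, granted) # re-dispatch the original function call
```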