Skip to content

[Agent refactor] Event-driven agent loop with hooks, interrupts, and steering #1316

@alexhoshina

Description

@alexhoshina
中文

当前的 agent loop(pkg/agent/loop.go)是一个黑盒——外部代码无法观察内部执行状态,无法 hook 执行过程,无法打断正在进行的处理,也无法在处理过程中追加消息。本提案将 agent loop 重新设计为事件驱动、可 hook、可中断、可追加的系统。

问题是什么?

现有的 runAgentLooprunLLMIteration 处理流水线存在四个架构层面的限制:

  1. 无可观测性 — Loop 不发射任何事件。外部消费者(UI、日志、自动化脚本)无法得知 loop 处于哪个阶段、正在执行哪个工具、或 Turn 为何结束。
  2. 无 Hook 点 — 外部代码无法拦截或修改 LLM 请求,无法审批/拒绝工具执行,也无法在运行时改变 loop 行为。
  3. 无中断机制 — Turn 一旦开始就无法停止。没有优雅中断(跳过剩余工具让 LLM 生成总结),也没有硬中断(立即取消一切)。
  4. 无消息注入 — 外部代码无法向正在进行的 Turn 注入消息(steering),也无法排队等待 Turn 结束后处理的消息(follow-up)。

此外,工具执行通过 WaitGroup 完全并行,工具之间没有检查点,无法在工具执行间检查中断或新的 steering 消息。


重构目标与非目标

目标

  1. 事件透明: loop 内部的每个关键阶段都向外发射事件
  2. Hook 可控: 外部可以在关键阶段拦截和修改行为
  3. 可打断: 支持 Graceful Interrupt 和 Hard Abort
  4. 可追加: 支持 Steering(影响当前 Turn)和 FollowUp(排队等待)
  5. 工具智能并行: 只读工具并行,有副作用的工具顺序执行

非目标

  • 不改变 MessageBus 的 inbound/outbound 模型
  • 不改变 Provider 接口(LLMProvider.Chat
  • 不改变 Tool 接口的核心签名(Execute / ExecuteAsync
  • 不改变 ContextBuilder.BuildMessages 的输入输出契约
  • 不改变 session 的存储格式(JSON)
  • 不引入外部依赖

边界定义

什么在 Loop 内部,什么在外部

graph TD
    subgraph AgentLoop["AgentLoop (外壳)"]
        Run["Run()<br/>消费 MessageBus,发送最终回复<br/>这一层不变,只增加 EventBus 初始化"]

        subgraph ProcessMessage["processMessage() — 前置处理层"]
            Transcribe["transcribeAudio<br/><i>保持在外部</i>"]
            Route["resolveMessageRoute<br/><i>保持在外部</i>"]
            Command["handleCommand<br/><i>保持在外部</i>"]
        end

        subgraph RunTurn["runTurn() — 重构后的核心 (原 runAgentLoop)"]
            subgraph TurnBoundary["Turn 边界 — 事件 + Hook 覆盖范围"]
                Context["初始上下文组装<br/>ContextBuilder.BuildMessages() 仍为纯函数<br/>resolveMediaRefs() 在此处调用<br/>产出: []providers.Message"]

                subgraph LLMLoop["LLM 迭代循环"]
                    DrainSteering["drain steering queue"]
                    BeforeLLM["Hook: BeforeLLMRequest"]
                    CallLLM["调用 Provider.Chat()"]
                    AfterLLM["Hook: AfterLLMResponse"]
                    ParseTools["解析 tool calls"]

                    subgraph ToolExec["工具执行 (智能并行)"]
                        BeforeTool["Hook: BeforeToolExecute"]
                        Approval["Hook: RequestApproval"]
                        ExecTool["Execute tool"]
                        AfterTool["Hook: AfterToolExecute"]
                    end

                    CheckInterrupt["检查 steering / interrupt"]
                end

                Compress["上下文压缩 (retry 分支)<br/>forceCompression()<br/>Hook: BeforeContextCompress"]
                SessionSave["Session 保存<br/>中间: AddFullMessage (只写内存)<br/>结束: AddMessage + Save (落盘)<br/>中断: 回滚内存中的中间状态"]
                Summarize["摘要触发 (Turn 结束后异步)"]
            end
        end

        API1["InjectSteering() ← 公开 API"]
        API2["InjectFollowUp() ← 公开 API"]
        API3["InterruptGraceful() ← 公开 API"]
        API4["InterruptHard() ← 公开 API"]
        API5["SubscribeEvents() ← 公开 API"]
        API6["RegisterHook() ← 公开 API"]
        API7["GetActiveTurn() ← 公开 API"]
    end

    Run --> ProcessMessage
    ProcessMessage --> RunTurn
    Context --> LLMLoop
    DrainSteering --> BeforeLLM --> CallLLM --> AfterLLM --> ParseTools --> ToolExec
    BeforeTool --> Approval --> ExecTool --> AfterTool
    ToolExec --> CheckInterrupt
    CheckInterrupt -->|"继续迭代"| DrainSteering
    LLMLoop --> Compress
    LLMLoop --> SessionSave
    SessionSave --> Summarize
Loading

核心概念

Turn 与 Iteration

术语 定义
Turn 一次完整的「用户输入 → LLM 迭代 → 最终回复」处理过程
Iteration Turn 内的一次 LLM 调用(可能产生工具调用,触发下一次 Iteration)
Steering 在 Turn 进行中注入的消息,影响当前 Turn 的下一次 Iteration
FollowUp 在 Turn 进行中排队的消息,当前 Turn 结束后作为新的 Inbound 处理
Graceful Interrupt 跳过剩余工具,可注入最后一条 steering 提示,让 LLM 生成总结后结束 Turn
Hard Abort 立即取消 Provider 调用和所有工具执行,Turn 以错误结束
SubTurn 由父 Turn 内部 spawn 的子 Turn,走完整 runTurn 路径,拥有独立 turnID
Ephemeral Session SubTurn 使用的纯内存 session,不落盘,SubTurn 结束后清理

EventBus 是什么?

EventBus 是一个多订阅者广播系统,agent loop 通过它在每个关键执行阶段发射结构化事件。

可以理解为:「让任何人都能实时观察 loop 在做什么,而不影响其行为。」

核心设计要点:

  • 非阻塞 fan-out:Emit 遍历订阅者并以非阻塞方式发送;channel 满时丢弃事件(永远不阻塞 loop)
  • Per-EventKind 丢弃计数[EventKindCount]atomic.Int64 数组,可精确诊断哪类事件正在丢失
  • 无独立 goroutine:Emit 在 loop 自身的 goroutine 中运行(减少 goroutine 数量,适合 $10 开发板、<10MB 内存的场景)
  • 每个订阅者缓冲容量 16:考虑到 LLM 调用延迟主导迭代时间,此容量足够

事件类型覆盖完整的 Turn 生命周期:

分类 事件
Turn 生命周期 TurnStart, TurnEnd
LLM 调用 LLMRequest, LLMDelta, LLMResponse, LLMRetry
上下文管理 ContextCompress, SessionSummarize
工具执行 ToolExecStart, ToolExecEnd, ToolExecSkipped
外部交互 SteeringInjected, FollowUpQueued, InterruptReceived
子 Agent 生命周期 SubTurnSpawn, SubTurnEnd, SubTurnResultDelivered
错误 Error

Hook 系统是什么?

Hook 系统允许外部代码在明确定义的执行节点上观察和拦截 loop 的执行。Hook 是同步的、按优先级排序的、且有超时保护。

共五个 hook 接口,均为可选——hook 只需实现它需要的接口:

接口 用途 可修改?
EventObserver 被动接收事件
LLMInterceptor 在 LLM 调用前后拦截 是——可修改消息、模型、工具定义;可中止 Turn
ToolInterceptor 在工具执行前后拦截 是——可修改参数、拒绝工具、中止 Turn
ToolApprover 暂停等待外部审批(如 UI 确认) 否——审批或拒绝
ContextCompressInterceptor 在上下文压缩前拦截 否——可中止 Turn

Hook 动作:ContinueModifyAbortTurnHardAbortDenyTool

超时分级

  • 普通 Hook(Observer、LLMInterceptor、ToolInterceptor):默认 5s,超时 → Continue
  • 审批 Hook(ToolApprover):默认 60s(可配置),超时 → deny(安全优先)

审批在独立 goroutine 中运行,使用 channel+timeout 模式,因此即使 hook 实现者行为不当也无法永久阻塞 loop。


中断与消息注入机制是什么?

中断

两种中断模式,作为 AgentLoop 的公开 API 暴露:

优雅中断(Graceful Interrupt) 硬中断(Hard Abort)
行为 设置 interrupted=true,跳过剩余工具,注入 steering 提示,允许再做一轮 LLM 迭代生成总结 设置 aborted=true,立即取消 provider context 和 turn context
Session 保留已完成的工具结果,正常保存 回滚到快照点(仅内存回滚,不写盘)
子 Turn 不级联——让子 Turn 自然完成 通过 cascadeAbort 递归级联到所有子 Turn

Steering 与 Follow-up

Steering Follow-up
时机 影响当前 Turn 的下一次 Iteration 排队等待;在当前 Turn 结束后作为新的 Inbound 处理
消费方式 在每次迭代循环顶部 drain 作为 runTurn 的返回值,由调用者在所有清理完成后投递
边界情况 如果 LLM 返回最终内容(无工具调用)但 steering 队列非空:若迭代余量充足则继续迭代,若已达 MaxIterations 则降级为 follow-up 通过 bus.PublishInboundrunTurn 完全返回后投递——避免与 defer 清理产生竞态

工具执行策略

当前基于 WaitGroup 的并行执行替换为智能分组

工具调用: [read_file, list_dir, write_file, web_search, exec]

分组:
  → {read_file, list_dir}  parallel=true   (均为只读)
  → {write_file}           parallel=false   (有副作用)
  → {web_search}           parallel=true    (只读)
  → {exec}                 parallel=false   (有副作用)
  • 只读工具(通过可选的 ReadOnlyIndicator 接口声明):连续的只读工具并行执行
  • 有副作用的工具:顺序执行,每个工具之间有 steering/中断检查点
  • 未声明的工具默认为有副作用(保守策略——只会降低并行度,不会导致正确性问题)
  • IsReadOnly() 是静态的(无参数)——有意不支持依赖参数的只读判断,以保持简单和安全

多 Agent 协作(子 Turn)

当前的 SubagentManager + RunToolLoop 架构存在关键缺陷:

  1. RunToolLoop 完全脱离 EventBus/Hook/中断体系
  2. 异步子 agent 结果路由到 agent:main:main session,丢失原始对话上下文
  3. 无父子 Turn 关系——无法级联中断
  4. 子 agent 不可被外部中断

设计:通过 runTurn 实现子 Turn

子 agent 重构为使用与顶层 Turn 相同的 runTurn 路径:

  • spawnSubTurn(parentTS, config) — 同步和异步子 Turn 的统一入口
  • Turn 树:通过 turnState 中的 parentTurnID / childTurnIDs 追踪父子关系
  • Ephemeral Session:子 Turn 使用纯内存 session(无磁盘 I/O,无需清理)
  • 资源控制:深度限制(默认 3),per-agent 并发信号量(buffered channel,非阻塞获取)
  • 结果投递:同步 → 直接返回 ToolResult;异步 → deliverSubTurnResult 投递到父 Turn 的 pendingResults 队列(在下次 Iteration 前 drain)或写入 session 历史(如果父 Turn 已结束)

与现有代码的关系

当前 重构后
runAgentLoop() 重命名为 runTurn(),使用 turnState + 事件发射重构
runLLMIteration() 合并到 runTurn() 的迭代循环中
工具执行(runLLMIteration 中的 WaitGroup) 提取到 tool_exec.go,使用智能分组
SubagentToolRunToolLoop SubagentToolspawnSubTurn(Sync)runTurn
SpawnToolRunToolLoop(异步) SpawnToolspawnSubTurn(Async) → goroutine 中 runTurn
无可观测性 EventBus,17 种事件类型
无 Hook HookManager,5 种 Hook 接口
无中断 InterruptGraceful / InterruptHard,含 session 回滚
无消息注入 InjectSteering / InjectFollowUp,基于队列的投递

新增文件

文件 职责
pkg/agent/events.go 事件类型、EventKind 枚举、Payload 结构体
pkg/agent/eventbus.go EventBus:Subscribe/Emit/Close
pkg/agent/hooks.go Hook 接口族、HookManager
pkg/agent/turn_state.go turnState、steering/followup 队列、中断标记
pkg/agent/tool_exec.go 带 Hook 的智能工具执行
pkg/agent/subturn.go spawnSubTurn、deliverSubTurnResult

修改文件

文件 范围
pkg/agent/loop.go 大——新增字段,runTurn 重写,公开 API
pkg/tools/base.go 极小——新增 ReadOnlyIndicator 接口
pkg/tools/*.go 小——为已知只读工具添加 IsReadOnly()
pkg/agent/instance.go 小——新增子 Turn 配置字段
pkg/tools/subagent.go 大——重写为调用 spawnSubTurn
pkg/tools/spawn.go 大——重写为调用 spawnSubTurn

并发安全

核心不变量:

  • 锁顺序turnState.mueventBus.mu (RLock)sessionManager.mu——永远不逆序
  • 锁外发射事件:EventBus.Emit 在释放 turnState.mu 后调用,最小化持锁时间
  • providerCancel 双向检查:Loop 设置 providerCancel 后检查 abortedInterruptHard 设置 aborted 后调用 providerCancel——在 ts.mu 保护下互为冗余
  • FollowUp 作为返回值runTurn 返回 follow-up 而非在 defer 中投递——消除清理与新 Turn 启动之间的竞态
  • 审批 goroutine 隔离:审批在独立 goroutine 中运行并带超时——即使 Hook 行为不当也无法阻塞 loop
  • Session 快照修正forceCompression() 后立即更新 ts.sessionSnapshot,防止回滚目标过时

开放问题

  1. EventBus 是否应支持过滤订阅(仅订阅特定 EventKind),还是应由订阅者自行过滤?
  2. 是否需要内置一个「debug hook」将所有事件写入文件,通过配置开关启用?
  3. 如果异步子 Turn 的结果在父 Turn 结束后到达,是否默认通知用户,还是需要用户 opt-in?
  4. ReadOnlyIndicator 是否应扩展到 MCP 工具(远程工具通过 MCP 元数据声明只读)?
The current agent loop (`pkg/agent/loop.go`) is a black box — external code cannot observe internal execution state, hook into the execution process, interrupt ongoing processing, or inject messages mid-turn. This proposal redesigns the agent loop as an event-driven, hookable, interruptible, and appendable system.

What is the problem?

The existing runAgentLooprunLLMIteration pipeline has four architectural limitations:

  1. No observability — The loop emits no events. External consumers (UI, logging, automation) cannot know what phase the loop is in, which tool is executing, or why a turn ended.
  2. No hook points — There is no way for external code to intercept or modify LLM requests, approve/deny tool executions, or alter loop behavior at runtime.
  3. No interrupt mechanism — Once a turn starts, it cannot be stopped. There is no graceful interrupt (skip remaining tools, let LLM summarize) or hard abort (cancel everything immediately).
  4. No message injection — External code cannot inject messages into an ongoing turn (steering) or queue messages for after the turn ends (follow-up).

Additionally, tool execution is fully parallel via WaitGroup with no checkpoints between tools, making it impossible to check for interrupts or new steering messages between tool executions.


Goals & Non-goals

Goals

  1. Event transparency: Every critical phase inside the loop emits events to external consumers
  2. Hook control: External code can intercept and modify behavior at key phases
  3. Interruptible: Support both Graceful Interrupt and Hard Abort
  4. Appendable: Support Steering (affects current Turn) and FollowUp (queued for later)
  5. Smart tool parallelism: Read-only tools run in parallel, mutating tools run sequentially

Non-goals

  • Do not change the MessageBus inbound/outbound model
  • Do not change the Provider interface (LLMProvider.Chat)
  • Do not change the core Tool interface signatures (Execute / ExecuteAsync)
  • Do not change the ContextBuilder.BuildMessages input/output contract
  • Do not change the session storage format (JSON)
  • Do not introduce external dependencies

Boundary definitions

What is inside the Loop vs. outside

graph TD
    subgraph AgentLoop["AgentLoop (outer shell)"]
        Run["Run()<br/>Consume MessageBus, send final reply<br/>This layer unchanged, only adds EventBus init"]

        subgraph ProcessMessage["processMessage() — preprocessing layer"]
            Transcribe["transcribeAudio<br/><i>stays outside</i>"]
            Route["resolveMessageRoute<br/><i>stays outside</i>"]
            Command["handleCommand<br/><i>stays outside</i>"]
        end

        subgraph RunTurn["runTurn() — refactored core (formerly runAgentLoop)"]
            subgraph TurnBoundary["Turn boundary — event + hook coverage"]
                Context["Initial context assembly<br/>ContextBuilder.BuildMessages() remains a pure function<br/>resolveMediaRefs() called here<br/>Output: []providers.Message"]

                subgraph LLMLoop["LLM iteration loop"]
                    DrainSteering["drain steering queue"]
                    BeforeLLM["Hook: BeforeLLMRequest"]
                    CallLLM["call Provider.Chat()"]
                    AfterLLM["Hook: AfterLLMResponse"]
                    ParseTools["parse tool calls"]

                    subgraph ToolExec["Tool execution (smart parallel)"]
                        BeforeTool["Hook: BeforeToolExecute"]
                        Approval["Hook: RequestApproval"]
                        ExecTool["Execute tool"]
                        AfterTool["Hook: AfterToolExecute"]
                    end

                    CheckInterrupt["check steering / interrupt"]
                end

                Compress["Context compression (retry branch)<br/>forceCompression()<br/>Hook: BeforeContextCompress"]
                SessionSave["Session save<br/>Intermediate: AddFullMessage (memory only)<br/>End: AddMessage + Save (disk)<br/>Abort: rollback in-memory state"]
                Summarize["Summary trigger (async after Turn ends)"]
            end
        end

        API1["InjectSteering() ← public API"]
        API2["InjectFollowUp() ← public API"]
        API3["InterruptGraceful() ← public API"]
        API4["InterruptHard() ← public API"]
        API5["SubscribeEvents() ← public API"]
        API6["RegisterHook() ← public API"]
        API7["GetActiveTurn() ← public API"]
    end

    Run --> ProcessMessage
    ProcessMessage --> RunTurn
    Context --> LLMLoop
    DrainSteering --> BeforeLLM --> CallLLM --> AfterLLM --> ParseTools --> ToolExec
    BeforeTool --> Approval --> ExecTool --> AfterTool
    ToolExec --> CheckInterrupt
    CheckInterrupt -->|"continue iteration"| DrainSteering
    LLMLoop --> Compress
    LLMLoop --> SessionSave
    SessionSave --> Summarize
Loading

Core concepts

Turn & Iteration

Term Definition
Turn One complete "user input → LLM iterations → final reply" processing cycle
Iteration One LLM call within a Turn (may produce tool calls, triggering the next Iteration)
Steering A message injected during a Turn that affects the next Iteration
FollowUp A message queued during a Turn, processed as a new inbound after the Turn ends
Graceful Interrupt Skip remaining tools, optionally inject a steering hint, let LLM generate a summary, then end the Turn
Hard Abort Immediately cancel the Provider call and all tool executions; Turn ends with an error
SubTurn A child Turn spawned inside a parent Turn, running the full runTurn path with its own turnID
Ephemeral Session An in-memory-only session used by SubTurns — never persisted to disk

What is the EventBus?

The EventBus is a multi-subscriber broadcast system that the agent loop uses to emit structured events at every critical phase of execution.

You can think of it as: "a way for anyone to watch what the loop is doing, in real time, without affecting its behavior."

Key design points:

  • Non-blocking fan-out: Emit iterates subscribers with non-blocking sends; full channels drop events (never block the loop)
  • Per-EventKind drop counters: [EventKindCount]atomic.Int64 array enables precise diagnostics of which event types are being lost
  • No dedicated goroutine: Emit runs in the loop's own goroutine (fewer goroutines, suitable for $10 boards with <10MB RAM)
  • Buffer capacity 16 per subscriber: Sufficient given LLM call latency dominates iteration timing

Event types cover the full turn lifecycle:

Category Events
Turn lifecycle TurnStart, TurnEnd
LLM calls LLMRequest, LLMDelta, LLMResponse, LLMRetry
Context management ContextCompress, SessionSummarize
Tool execution ToolExecStart, ToolExecEnd, ToolExecSkipped
External interaction SteeringInjected, FollowUpQueued, InterruptReceived
Sub-agent lifecycle SubTurnSpawn, SubTurnEnd, SubTurnResultDelivered
Errors Error

What is the Hook system?

The Hook system allows external code to observe and intercept the loop's execution at well-defined points. Hooks are synchronous, priority-ordered, and timeout-protected.

There are five hook interfaces, each optional — a hook implements only the ones it needs:

Interface Purpose Can modify?
EventObserver Passively receive events No
LLMInterceptor Intercept before/after LLM calls Yes — can modify messages, model, tools; can abort turn
ToolInterceptor Intercept before/after tool execution Yes — can modify args, deny tool, abort turn
ToolApprover Pause for external approval (e.g., UI confirmation) No — approve or deny
ContextCompressInterceptor Intercept before context compression No — can abort turn

Hook actions: Continue, Modify, AbortTurn, HardAbort, DenyTool.

Timeout tiers:

  • Normal hooks (Observer, LLMInterceptor, ToolInterceptor): 5s default, timeout → Continue
  • Approval hooks (ToolApprover): 60s default (configurable), timeout → deny (safety-first)

Approval runs in a separate goroutine with a channel+timeout pattern, so a misbehaving hook cannot block the loop indefinitely.


What is the Interrupt & Steering mechanism?

Interrupts

Two interrupt modes, exposed as public APIs on AgentLoop:

Graceful Interrupt Hard Abort
Behavior Sets interrupted=true, skips remaining tools, injects steering hint, allows one more LLM iteration for summary Sets aborted=true, cancels provider context + turn context immediately
Session Keeps completed tool results, saves normally Rolls back to snapshot (memory only, no disk write)
Sub-turns Does NOT cascade — lets children finish naturally Cascades recursively to all children via cascadeAbort

Steering & Follow-up

Steering Follow-up
Timing Affects the current turn's next iteration Queued; processed as new inbound after current turn ends
Consumption Drained at the top of each iteration loop Returned from runTurn, published by caller after all cleanup
Edge case If LLM returns final content (no tools) but steering is pending: continues iteration if budget remains, degrades to follow-up at MaxIterations Published via bus.PublishInbound after runTurn fully returns — avoids race with defer cleanup

Tool execution strategy

The current WaitGroup-based parallel execution is replaced with smart grouping:

Tool calls: [read_file, list_dir, write_file, web_search, exec]

Groups:
  → {read_file, list_dir}  parallel=true   (both ReadOnly)
  → {write_file}           parallel=false   (mutating)
  → {web_search}           parallel=true    (ReadOnly)
  → {exec}                 parallel=false   (mutating)
  • ReadOnly tools (declared via optional ReadOnlyIndicator interface): consecutive read-only tools run in parallel
  • Mutating tools: run sequentially with steering/interrupt checkpoints between each
  • Undeclared tools default to mutating (conservative — only reduces parallelism, never causes correctness issues)
  • IsReadOnly() is static (no args parameter) — args-dependent read-only is intentionally not supported for simplicity and safety

Multi-agent coordination (Sub-turns)

The current SubagentManager + RunToolLoop architecture has critical flaws:

  1. RunToolLoop is completely outside the EventBus/Hook/interrupt system
  2. Async sub-agent results route to agent:main:main session, losing the original conversation context
  3. No parent-child Turn relationship — no cascading interrupts
  4. Sub-agents cannot be externally interrupted

Design: Sub-turns via runTurn

Sub-agents are refactored to use the same runTurn path as top-level turns:

  • spawnSubTurn(parentTS, config) — single entry point for both sync and async sub-turns
  • Turn tree: parent/child relationship tracked via parentTurnID / childTurnIDs in turnState
  • Ephemeral sessions: sub-turns use in-memory-only sessions (no disk I/O, no cleanup needed)
  • Resource control: depth limit (default 3), per-agent concurrency semaphore (buffered channel, non-blocking acquire)
  • Result delivery: sync → direct ToolResult; async → deliverSubTurnResult to parent's pendingResults queue (drained at next iteration) or session history (if parent finished)

Relationship to existing code

Current After refactor
runAgentLoop() Renamed to runTurn(), restructured with turnState + event emission
runLLMIteration() Merged into runTurn() iteration loop
Tool execution (WaitGroup in runLLMIteration) Extracted to tool_exec.go with smart grouping
SubagentToolRunToolLoop SubagentToolspawnSubTurn(Sync)runTurn
SpawnToolRunToolLoop (async) SpawnToolspawnSubTurn(Async)runTurn in goroutine
No observability EventBus with 17 event types
No hooks HookManager with 5 hook interfaces
No interrupts InterruptGraceful / InterruptHard with session rollback
No steering InjectSteering / InjectFollowUp with queue-based delivery

New files

File Responsibility
pkg/agent/events.go Event types, EventKind enum, payload structs
pkg/agent/eventbus.go EventBus: Subscribe/Emit/Close
pkg/agent/hooks.go Hook interfaces, HookManager
pkg/agent/turn_state.go turnState, steering/followup queues, interrupt flags
pkg/agent/tool_exec.go Smart tool execution with hooks
pkg/agent/subturn.go spawnSubTurn, deliverSubTurnResult

Modified files

File Scope
pkg/agent/loop.go Major — new fields, runTurn rewrite, public APIs
pkg/tools/base.go Minimal — add ReadOnlyIndicator interface
pkg/tools/*.go Small — add IsReadOnly() to known read-only tools
pkg/agent/instance.go Small — add sub-turn config fields
pkg/tools/subagent.go Large — rewrite to call spawnSubTurn
pkg/tools/spawn.go Large — rewrite to call spawnSubTurn

Concurrency safety

Key invariants:

  • Lock ordering: turnState.mueventBus.mu (RLock)sessionManager.mu — never reversed
  • Emit outside locks: EventBus.Emit is called after releasing turnState.mu to minimize lock hold time
  • Double-check on providerCancel: Loop sets providerCancel then checks aborted; InterruptHard sets aborted then calls providerCancel — mutual redundancy under ts.mu
  • FollowUp as return value: runTurn returns follow-ups instead of publishing in defer — eliminates race between cleanup and new turn startup
  • Approval goroutine isolation: Approval runs in separate goroutine with timeout — cannot block loop even if hook misbehaves
  • Session snapshot correction: ts.sessionSnapshot is updated after forceCompression() to prevent stale rollback targets

Open questions

  1. Should EventBus support filtered subscriptions (subscribe to specific EventKinds only), or should filtering be subscriber-side?
  2. Should there be a built-in "debug hook" that logs all events to a file, enabled via config flag?
  3. should async sub-turn results that arrive after parent turn ends trigger a user notification by default, or should this be opt-in?
  4. Should ReadOnlyIndicator be extended to MCP tools (remote tools declaring read-only via MCP metadata)?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions