
feat(agent): mid-turn message injection for responsive follow-ups #2985

Merged
chengyongru merged 2 commits into nightly from feat/mid-turn-injection
Apr 10, 2026

Conversation

@chengyongru (Collaborator) commented Apr 9, 2026

Summary

  • Allow user messages sent during an active agent turn to be injected into the running LLM context instead of waiting behind the per-session lock
  • Between iterations, queued messages are drained from the pending queue and injected as user messages in the current turn

Motivation

Currently, nanobot uses a per-session asyncio.Lock that serializes message processing. When a task takes a long time (e.g. web_search, long exec), new messages from the user must wait until the entire turn completes. This makes the agent feel unresponsive. See #1609 for the full discussion.
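To make the problem concrete, here is a minimal sketch of the pre-PR behavior under a per-session asyncio.Lock. The names (session_locks, handle_message) are illustrative, not nanobot's actual API; the point is that a second message cannot even start until the first turn releases the lock:

```python
import asyncio

# Hypothetical sketch: one asyncio.Lock per session serializes the whole
# turn, so a follow-up message waits behind the first message's latency.
session_locks: dict[str, asyncio.Lock] = {}
order: list[str] = []

async def handle_message(session_id: str, text: str, work_seconds: float) -> None:
    lock = session_locks.setdefault(session_id, asyncio.Lock())
    async with lock:                       # entire turn runs under the lock
        order.append(f"start:{text}")
        await asyncio.sleep(work_seconds)  # stands in for LLM/tool latency
        order.append(f"end:{text}")

async def main() -> None:
    # The second message arrives mid-turn but cannot start until the
    # first turn finishes, which is what makes the agent feel unresponsive.
    await asyncio.gather(
        handle_message("s1", "hello", 0.05),
        handle_message("s1", "what time is it?", 0.0),
    )

asyncio.run(main())
```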

Previous attempt (#1233, closed) tried to interrupt tool execution mid-way, which introduced significant complexity (~500 lines) with cancellation edge cases. This PR takes a simpler approach: don't cancel tools, just inject messages between iterations.

How It Works

User sends "hello" → agent starts processing (holds session lock)
User sends "what time is it?" → routed to pending queue (not a new task)
  ↓
Agent iteration: LLM call → tool execution → [DRAIN] → next LLM call
                                              ↑ injected as user message
  ↓
LLM sees both messages naturally and responds to both

Two drain checkpoints in the agent loop:

  1. After tool execution (before next LLM call) — tools run to completion, then new messages are appended
  2. After final response ("last-mile") — if the user sent a follow-up while the LLM was generating its final answer, continue the loop instead of breaking
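The loop with its two checkpoints can be sketched as follows. This is a simplified stand-in for nanobot's runner, not its real API: the queue holds raw strings, the llm callable and message dicts are illustrative, and drain() plays the role of _drain_injections():

```python
import asyncio

# Sketch of an agent loop with the two drain checkpoints described above.
async def run_turn(history, pending: asyncio.Queue, llm, max_iterations=10):
    def drain():
        drained = []
        while not pending.empty():
            drained.append({"role": "user", "content": pending.get_nowait()})
        return drained

    for _ in range(max_iterations):
        reply = llm(history)
        history.append({"role": "assistant", "content": reply["content"]})
        if reply.get("tool_calls"):
            history.append({"role": "tool", "content": "tool result"})
            history.extend(drain())  # Checkpoint 1: after tool execution
            continue
        injected = drain()           # Checkpoint 2: "last-mile" check
        if injected:
            history.extend(injected)
            continue                 # keep looping instead of breaking
        break
    return history
```

Because drained messages are plain user messages appended to the history, the next LLM call sees them with no special prompting.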

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Inject as natural user messages | No special system prompt or [Follow-up] markers needed; the LLM handles multi-turn conversation naturally |
| No tool cancellation | Avoids the complexity that caused #1233 to be closed (partial writes, inconsistent state) |
| _MAX_INJECTIONS_PER_TURN = 3 | Prevents context-window pressure from rapid message accumulation |
| _MAX_INJECTION_CYCLES = 5 | Prevents injection loops from consuming the iteration budget |
| had_injections bypasses _sent_in_turn | When follow-ups are injected, the final response is new content the user has not seen, so it is always delivered |
| Pending queue lifecycle via try/finally | Prevents memory leaks; the queue is registered before lock acquisition and cleaned up after |
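The per-drain cap can be sketched as a small helper. The constant name mirrors the PR, but the implementation below is illustrative, not the actual _drain_injections() code; it drains at most _MAX_INJECTIONS_PER_TURN messages per checkpoint and logs a warning when more are left queued:

```python
import asyncio
import logging

logger = logging.getLogger("agent")
_MAX_INJECTIONS_PER_TURN = 3

def drain_injections(pending: "asyncio.Queue[str]") -> list[dict]:
    """Drain up to the per-checkpoint cap; leave the rest for later drains."""
    drained: list[dict] = []
    while not pending.empty() and len(drained) < _MAX_INJECTIONS_PER_TURN:
        drained.append({"role": "user", "content": pending.get_nowait()})
    if not pending.empty():
        # Capping here limits context-window pressure; the leftover
        # messages are picked up at the next checkpoint.
        logger.warning("injection batch capped at %d; %d still queued",
                       _MAX_INJECTIONS_PER_TURN, pending.qsize())
    return drained
```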

Edge Cases Handled

  • Last-mile: Messages arriving after the LLM's final response but before turn end are caught by Checkpoint 2
  • _sent_in_turn conflict: Follow-up responses bypass the MessageTool suppression check
  • Queue cleanup: finally block in _dispatch() ensures no dangling queues
  • Bounded accumulation: Both a per-drain limit (3 messages) and a per-turn limit (5 injection cycles)
  • Graceful degradation: If injection_callback throws, the error is logged and injection is skipped

Changes

  • nanobot/agent/runner.py: injection_callback on AgentRunSpec, _drain_injections() helper, two drain checkpoints in run(), had_injections on AgentRunResult
  • nanobot/agent/loop.py: _pending_queues dict, message routing in run(), queue lifecycle in _dispatch(), _drain_pending callback, _sent_in_turn bypass
  • Test files: updated _run_agent_loop return value unpacking (3-tuple → 5-tuple)

Related

Allow user messages sent during an active agent turn to be injected
into the running LLM context instead of being queued behind a
per-session lock. Inspired by Claude Code's mid-turn queue drain
mechanism (query.ts:1547-1643).

Key design decisions:
- Messages are injected as natural user messages between iterations,
  no tool cancellation or special system prompt needed
- Two drain checkpoints: after tool execution and after final LLM
  response ("last-mile" to prevent dropping late arrivals)
- Bounded by MAX_INJECTION_CYCLES (5) to prevent consuming the
  iteration budget on rapid follow-ups
- had_injections flag bypasses _sent_in_turn suppression so follow-up
  responses are always delivered

Closes #1609
@chengyongru chengyongru marked this pull request as ready for review April 10, 2026 15:26
…ue, and message safety

- Fix streaming protocol violation: Checkpoint 2 now checks for injections
  BEFORE calling on_stream_end, passing resuming=True when injections found
  so streaming channels (Feishu) don't prematurely finalize the card
- Bound pending queue to maxsize=20 with QueueFull handling
- Add warning log when injection batch exceeds _MAX_INJECTIONS_PER_TURN
- Re-publish leftover queue messages to bus in _dispatch finally block to
  prevent silent message loss on early exit (max_iterations, tool_error, cancel)
- Fix PEP 8 blank line before dataclass and logger.info indentation
- Add 12 new tests covering drain, checkpoints, cycle cap, queue routing,
  cleanup, and leftover re-publish
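The bounded-queue routing from this hardening commit can be sketched as follows. The function and the "fallback" string are illustrative assumptions; only the maxsize=20 bound and the QueueFull handling come from the commit message:

```python
import asyncio

# Sketch of bounded message routing: a full pending queue raises
# asyncio.QueueFull on put_nowait, and the message falls back to normal
# handling instead of being silently dropped.
PENDING_MAXSIZE = 20

def route_message(pending: asyncio.Queue, text: str) -> str:
    try:
        pending.put_nowait(text)
        return "queued"      # will be drained mid-turn
    except asyncio.QueueFull:
        return "fallback"    # e.g. handled as a normal message instead
```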
@chengyongru chengyongru merged commit bc4cc49 into nightly Apr 10, 2026
3 checks passed
jhkim43 added a commit to jhkim43/nanobot that referenced this pull request Apr 11, 2026
chengyongru added a commit that referenced this pull request Apr 11, 2026
Re-bin pushed a commit that referenced this pull request Apr 11, 2026
xzq-xu added a commit to xzq-xu/nanobot that referenced this pull request Apr 13, 2026
Upstream's pending_queue injection (PR HKUDS#2985) fully replaces the
SteeringHook mechanism. Gateway now passes pending_queue directly
to _process_message, so the per-call extra_hooks parameter and
the steering.py / messages.py files are no longer needed.

- Delete nanobot/agent/steering.py (InterruptionChecker + SteeringHook)
- Delete nanobot/agent/messages.py (AgentMessage dual-layer model)
- Remove extra_hooks parameter from _process_message and _run_agent_loop
- Restore original hook merging in _run_agent_loop

Made-with: Cursor
@chengyongru chengyongru deleted the feat/mid-turn-injection branch April 14, 2026 03:43