
refactor(agent): context boundary detection, proactive budget check, and safe compression#1490

Merged
afjcjsbx merged 13 commits into sipeed:refactor/agent from is-Xiaoen:refactor/context-boundary
Mar 17, 2026

@is-Xiaoen
Contributor

@is-Xiaoen is-Xiaoen commented Mar 13, 2026

Summary

Addresses track 6 of the agent refactor (#1439): context management boundaries, compression triggers, and token budgeting.

This PR fixes five concrete problems in the current context management:

  1. ContextWindow defaults to MaxTokens — conflates input capacity with output generation limit, causing premature summarization or missed compression triggers (fix(agent): decouple context_window from max_tokens #556)
  2. forceCompression can orphan tool pairs — slices at len/2 without checking whether the cut falls inside an assistant+ToolCalls → tool_result sequence (fix(agent): prevent history compression from orphaning tool_call/tool_result pairs #665)
  3. forceCompression assumes history[0] is a system prompt — session history only stores user/assistant/tool messages; the system prompt is built dynamically by BuildMessages. The old code incorrectly skipped the first user message and appended a compression note to it.
  4. Compression is reactive only: forceCompression runs after the LLM has already rejected the request with a 400, wasting a billed call
  5. Token estimation undercounts: estimateTokens only counted m.Content, ignoring ToolCalls arguments, ReasoningContent, and Media items

Changes

New file: docs/agent-refactor/context.md

Design document for Track 6, as called for in the agent-refactor README (suggested document split, item 5: "context scope, history, summary, compression"). Documents:

  • Context window region definitions and history budget formula
  • ContextWindow vs MaxTokens distinction
  • Session history contents (no system prompt stored)
  • Turn as the atomic compression unit ([Agent refactor] Event-driven agent loop with hooks, interrupts, and steering #1316)
  • Three compression paths and their ordering
  • Token estimation approach and limitations
  • Interface boundaries between budget functions and BuildMessages
  • Known gaps

New file: pkg/agent/context_budget.go

Uses the Turn concept from the agent refactor design (#1316) as the atomic unit for compression. A Turn is a complete "user input → LLM iterations → final response" cycle.

  • parseTurnBoundaries(history) — identifies each Turn start index in the session history
  • isSafeBoundary(history, index) — checks whether an index is at a Turn boundary
  • findSafeBoundary(history, targetIndex) — finds the nearest Turn boundary to a target index (prefers backward to keep more recent context)
  • estimateMessageTokens(msg): counts Content + ReasoningContent + ToolCalls (ID, type, name, arguments) + ToolCallID with a 2.5 chars/token heuristic, plus a fixed 256-token overhead per Media item
  • estimateToolDefsTokens(defs) — estimates token cost of tool definitions (name + description + JSON schema)
  • isOverContextBudget(contextWindow, messages, toolDefs, maxTokens) — proactive budget check

All are pure functions with no AgentLoop dependency — forward-compatible with the agent refactor.
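For illustration, here is a minimal sketch of the three Turn helpers. It assumes a simplified, hypothetical Message type in place of providers.Message and follows the behavior described in this PR (prefer the nearest Turn start at or before the target, otherwise look forward, and return 0 when the history is a single Turn); the actual code in pkg/agent/context_budget.go may differ.

```go
package main

import "fmt"

// Message is a simplified, hypothetical stand-in for providers.Message;
// only Role matters for Turn detection.
type Message struct {
	Role    string // "user", "assistant", or "tool"
	Content string
}

// parseTurnBoundaries returns the index of every message that starts a Turn.
// A Turn begins at a user message and runs until the next user message.
func parseTurnBoundaries(history []Message) []int {
	starts := make([]int, 0, len(history))
	for i, m := range history {
		if m.Role == "user" {
			starts = append(starts, i)
		}
	}
	return starts
}

// isSafeBoundary reports whether index is the start of a Turn, i.e. cutting
// there cannot separate an assistant tool call from its tool results.
func isSafeBoundary(history []Message, index int) bool {
	return index >= 0 && index < len(history) && history[index].Role == "user"
}

// findSafeBoundary returns the Turn start nearest to targetIndex, preferring
// one at or before it. It returns 0 when no Turn start other than index 0
// exists (e.g. the whole history is one Turn), so callers can skip the cut.
func findSafeBoundary(history []Message, targetIndex int) int {
	turns := parseTurnBoundaries(history)
	// Backward: the closest Turn start in (0, targetIndex].
	for i := len(turns) - 1; i >= 0; i-- {
		if turns[i] > 0 && turns[i] <= targetIndex {
			return turns[i]
		}
	}
	// Forward: the first Turn start after targetIndex.
	for _, t := range turns {
		if t > targetIndex {
			return t
		}
	}
	return 0
}

func main() {
	history := []Message{
		{Role: "user"}, {Role: "assistant"}, {Role: "tool"}, {Role: "assistant"},
		{Role: "user"}, {Role: "assistant"},
	}
	fmt.Println(parseTurnBoundaries(history))     // [0 4]
	fmt.Println(findSafeBoundary(history, 3))     // 4: nothing in (0, 3], so look forward
	fmt.Println(findSafeBoundary(history[:4], 2)) // 0: single Turn, nothing safe to cut
}
```

Returning 0 rather than targetIndex lets callers treat mid <= 0 as "no safe cut".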

pkg/agent/loop.go

  • Proactive budget check (step 1.5 in runAgentLoop): before calling the LLM, estimate total token cost of assembled messages + tool definitions + output reserve. If over budget, run forceCompression and rebuild messages. The reactive path stays as a fallback for estimation undershoots.
  • forceCompression: rewritten to drop the oldest half of Turns (not arbitrary messages). Uses parseTurnBoundaries to find Turn-aligned cut points. Fixed to work with actual session data (no system prompt in history). Compression note goes into session summary, not into history messages.
  • summarizeSession: use findSafeBoundary to align cut to nearest Turn boundary instead of hardcoded history[:len-4]
  • estimateTokens: delegate to estimateMessageTokens — counts ToolCalls, ReasoningContent, Media, not just Content
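A rough sketch of the proactive check (step 1.5). It assumes simplified, hypothetical Message and ToolDefinition types and estimates only Content and ReasoningContent; the real functions operate on []providers.Message and the provider's tool definitions. It applies the 2.5 chars/token heuristic (chars * 2 / 5) and compares messages + tool definitions + output reserve against the context window.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// Hypothetical, simplified stand-ins for the provider types.
type Message struct {
	Role             string
	Content          string
	ReasoningContent string
}

type ToolDefinition struct {
	Name        string
	Description string
	Parameters  map[string]any // JSON schema
}

// charsToTokens applies the 2.5 chars/token heuristic (tokens = chars * 2 / 5).
func charsToTokens(chars int) int { return chars * 2 / 5 }

func estimateMessageTokens(m Message) int {
	return charsToTokens(utf8.RuneCountInString(m.Content) + utf8.RuneCountInString(m.ReasoningContent))
}

// estimateToolDefsTokens counts name + description + serialized JSON schema.
func estimateToolDefsTokens(defs []ToolDefinition) int {
	chars := 0
	for _, d := range defs {
		chars += utf8.RuneCountInString(d.Name) + utf8.RuneCountInString(d.Description)
		if schema, err := json.Marshal(d.Parameters); err == nil {
			chars += len(schema)
		}
	}
	return charsToTokens(chars)
}

// isOverContextBudget reports whether messages + tool definitions + the
// output reserve (maxTokens) would exceed the context window.
func isOverContextBudget(contextWindow int, messages []Message, toolDefs []ToolDefinition, maxTokens int) bool {
	total := estimateToolDefsTokens(toolDefs) + maxTokens
	for _, m := range messages {
		total += estimateMessageTokens(m)
	}
	return total > contextWindow
}

func main() {
	msgs := []Message{{Role: "user", Content: "0123456789"}} // 10 chars -> 4 tokens
	fmt.Println(isOverContextBudget(10, msgs, nil, 6))       // false: 4 + 6 = 10
	fmt.Println(isOverContextBudget(10, msgs, nil, 7))       // true: 4 + 7 = 11
}
```

When the check returns true, the loop would run forceCompression and rebuild the messages before calling the LLM; the reactive retry remains a fallback for estimates that undershoot.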

pkg/agent/instance.go

  • Resolve ContextWindow independently from MaxTokens with a 4x heuristic default. This gives 131K for the default 32K max_tokens — reasonable for modern models. The reactive forceCompression handles any overshoot.
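The default resolution can be sketched as follows. resolveContextWindow and contextWindowMultiplier are hypothetical names, and the sketch assumes an unset context_window is represented as 0.

```go
package main

import "fmt"

// contextWindowMultiplier is the 4x heuristic described in this PR.
const contextWindowMultiplier = 4

// resolveContextWindow (hypothetical) uses the configured context_window when
// it is set, and otherwise falls back to 4x max_tokens.
func resolveContextWindow(configured, maxTokens int) int {
	if configured > 0 {
		return configured
	}
	return maxTokens * contextWindowMultiplier
}

func main() {
	fmt.Println(resolveContextWindow(0, 32768))      // 131072: heuristic default
	fmt.Println(resolveContextWindow(200000, 32768)) // 200000: explicit config wins
}
```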

pkg/config/config.go

  • Add context_window field to AgentDefaults (with env var PICOCLAW_AGENTS_DEFAULTS_CONTEXT_WINDOW)

config/config.example.json

  • Add context_window field with default 131072

Web UI (web/frontend/)

  • Add context_window input field to the configuration page (form model, section, save handler)
  • Add i18n strings (en/zh). Optional field — leaving it empty falls back to the 4x heuristic.

Design decisions

  • Turn as the atomic unit — compression operates on complete Turns ([Agent refactor] Event-driven agent loop with hooks, interrupts, and steering #1316), not individual messages. parseTurnBoundaries identifies Turn starts; forceCompression drops "the oldest half of Turns." This naturally prevents splitting tool-call sequences since each Turn is atomic.
  • Pure functions, no new types — follows the refactor's "minimum concepts" rule. No ContextBudget struct, no ContextManager interface.
  • Compression note in summary, not history — session history stores only real conversation messages. The compression note goes into the session summary, which BuildMessages already injects into the system prompt. This avoids corrupting the stored history.
  • 4x heuristic for default context_window — conservative lower bound. A follow-up improvement could auto-detect from the provider, but the reactive path covers any mismatch.
  • No dependency on other refactor tracks: operates on []providers.Message and integer token counts. Independent of the agent abstraction ([Agent refactor] what an Agent is #1218), the event model ([Agent refactor] Event-driven agent loop with hooks, interrupts, and steering #1316), or the capability model.

Issue mapping

| Proposal in #1439 | Status |
| --- | --- |
| A. Separate context_window from max_tokens | Done |
| B. Explicit history budget | Partial: the proactive pre-call check uses the full formula (messages + tool defs + output reserve); maybeSummarize still uses a history-only percentage threshold against the now-correct ContextWindow base. A fully budget-aware summarization trigger is a follow-up. |
| C. Proactive pre-call check | Done |
| D. Tool-pair-aware truncation | Done (Turn-based, aligned with #1316) |
| E. ToolCalls in token estimation | Done (+ ReasoningContent, Media) |

Test plan

  • 54 test cases in context_budget_test.go covering all pure functions
  • parseTurnBoundaries tests: simple exchange, tool calls, chained tools, no user messages, leading non-user
  • Realistic session-shaped tests (no system in history, chained tools, reasoning content, media)
  • Single-Turn regression: findSafeBoundary returns 0 when only one Turn exists
  • Context retry integration test with realistic session data (no system message in history)
  • go build ./pkg/... — no compilation errors
  • golangci-lint run — no new lint issues
  • No changes to session storage format or API signatures — backward compatible

Closes #556
Closes #665
Ref #1439

cc @alexhoshina @yinwm

@sipeed-bot sipeed-bot bot added type: bug domain: agent domain: config go labels Mar 13, 2026
@alexhoshina
Collaborator

hi @is-Xiaoen, I've looked at this PR and have a simple idea.
In #1316, we used the concept of a Turn to define a complete iteration of the agent. Could we use the Turn as the cut-off point? It may be simpler and more intuitive than a heuristic search.

@is-Xiaoen
Contributor Author

Good call — using the Turn as the atomic unit is cleaner than the raw heuristic scan. I've updated the implementation:

New function: parseTurnBoundaries(history)

Returns the starting index of each Turn in the session history. A Turn begins at a user message and extends through all subsequent assistant/tool messages until the next user message — matching the Turn definition from #1316.

How it's used:

  • findSafeBoundary now uses parseTurnBoundaries internally to locate the nearest Turn boundary, instead of scanning for Role == "user" directly. Same result, but the intent is explicit.

  • forceCompression drops the oldest half of Turns (not arbitrary messages):

turns := parseTurnBoundaries(history)
if len(turns) >= 2 {
    mid = turns[len(turns)/2]  // drop oldest half of Turns
}

This reads as "drop the oldest N Turns" rather than "find nearest user message to midpoint" — simpler and self-documenting.

The Turn-based approach also naturally handles chained tool calls within a single Turn (user → assistantTC → tool → assistantTC → tool → assistant), since the entire chain lives inside one Turn and is never split.

See commit 4eaa2ec for the full change.
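A sketch of the Turn-based cut combined with the summary note, assuming hypothetical Message and Session types and a made-up note format. It mirrors the snippet above: mid is the first message of the oldest Turn that is kept, so every dropped Turn is removed whole and no kept message is modified.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins.
type Message struct {
	Role    string
	Content string
}

type Session struct {
	History []Message
	Summary string // BuildMessages injects this into the system prompt (per this PR)
}

func parseTurnBoundaries(history []Message) []int {
	starts := make([]int, 0, len(history))
	for i, m := range history {
		if m.Role == "user" {
			starts = append(starts, i)
		}
	}
	return starts
}

// dropOldestHalfOfTurns (hypothetical) removes the oldest half of Turns,
// rounded down, and appends a note to the summary instead of editing history.
// It returns the number of dropped messages (0 when there are fewer than 2 Turns).
func dropOldestHalfOfTurns(s *Session) int {
	turns := parseTurnBoundaries(s.History)
	if len(turns) < 2 {
		return 0
	}
	mid := turns[len(turns)/2] // first message of the oldest Turn we keep
	s.History = append([]Message(nil), s.History[mid:]...)
	if s.Summary != "" {
		s.Summary += "\n"
	}
	s.Summary += fmt.Sprintf("[context compressed: %d older messages dropped]", mid)
	return mid
}

func main() {
	s := &Session{History: []Message{
		{Role: "user"}, {Role: "assistant"}, {Role: "tool"}, {Role: "assistant"}, // Turn 1
		{Role: "user"}, {Role: "assistant"}, // Turn 2
		{Role: "user"}, {Role: "assistant"}, // Turn 3
	}}
	fmt.Println(dropOldestHalfOfTurns(s), len(s.History)) // 4 4
	fmt.Println(s.Summary)
}
```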

@is-Xiaoen
Contributor Author

@alexhoshina Saw your mention in #1218 about combining this with the new context builder. I've added docs/agent-refactor/context.md covering the context window regions, Turn as the compression unit, how the three compression paths relate, and the interface boundaries between budget functions and the builder — should make alignment easier going forward.

The budget-related functions (parseTurnBoundaries, isOverContextBudget, etc.) are all pure: they take []providers.Message and int parameters with no AgentLoop dependency, so the new builder can call them directly without interface changes. Remaining gaps (e.g. budget-aware summarization trigger) are documented in context.md and can be followed up once the builder direction stabilizes.

If anything needs adjusting to better fit the architecture you have in mind — how the proactive check integrates, the Turn boundary detection logic, interface signatures — happy to iterate. Will rebase onto refactor/agent once the branch is up.

xuwei-xy pushed a commit to xuwei-xy/picoclaw that referenced this pull request Mar 14, 2026
…and safe compression

Separate context_window from max_tokens — they serve different purposes
(input capacity vs output generation limit). The previous conflation caused
premature summarization or missed compression triggers.

Changes:
- Add context_window field to AgentDefaults config (default: 4x max_tokens)
- Extract boundary-safe truncation helpers (isSafeBoundary, findSafeBoundary)
  into context_budget.go — pure functions with no AgentLoop dependency
- forceCompression: align split to safe boundary so tool-call sequences
  (assistant+ToolCalls → tool results) are never torn apart
- summarizeSession: use findSafeBoundary instead of hardcoded keep-last-4
- estimateTokens: count ToolCalls arguments and ToolCallID metadata,
  not just Content — fixes systematic undercounting in tool-heavy sessions
- Add proactive context budget check before LLM call in runAgentLoop,
  preventing 400 context-length errors instead of reacting to them
- Add estimateToolDefsTokens for tool definition token cost

Closes sipeed#556, closes sipeed#665
Ref sipeed#1439

Session history (GetHistory) contains only user/assistant/tool messages.
The system prompt is built dynamically by BuildMessages and is never
stored in session. The previous code incorrectly treated history[0] as
a system prompt, skipping the first user message and appending a
compression note to it.

Fix: operate on the full history slice, and record the compression
note in the session summary (which BuildMessages already injects into
the system prompt) rather than modifying any history message.

estimateMessageTokens now counts ReasoningContent (extended thinking /
chain-of-thought) which can be substantial and is persisted in session
history. Media items get a fixed per-item overhead (256 tokens) since
actual cost depends on provider-specific image tokenization.

Add context_window to config.example.json, the web configuration page
(form model, input field, save handler), and i18n strings (en/zh).
The field is optional — leaving it empty falls back to the 4x max_tokens
heuristic.

Add tests that reflect actual session data shape: history starts with
user messages (no system prompt), includes chained tool-call sequences,
reasoning content, and media items. Exercises the proactive budget check
path with BuildMessages-style assembled messages.

Fixes prealloc lint warning by using make() with capacity hint.

Introduce parseTurnBoundaries() which identifies each Turn start index
in the session history. A Turn is a complete "user input → LLM iterations
→ final response" cycle (as defined in the agent refactor design sipeed#1316).

findSafeBoundary now uses Turn boundaries instead of raw role-scanning,
making the intent explicit: "find the nearest Turn boundary."

forceCompression drops the oldest half of Turns (not arbitrary messages),
which is simpler and more intuitive. The Turn-based approach naturally
prevents splitting tool-call sequences since each Turn is atomic.

Two estimation bugs fixed:

1. Media tokens were added to the chars accumulator before the chars*2/5
   conversion, resulting in 256*2/5=102 tokens per item instead of 256.
   Fix: add media tokens directly to the final token count, bypassing
   the character-based heuristic.

2. estimateMessageTokens counted both tc.Name and tc.Function.Name for
   tool calls, but providers only send one (OpenAI-compat uses
   function.name, Anthropic uses tc.Name). Fix: count tc.Function.Name
   when Function is present, fall back to tc.Name only otherwise.
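A sketch of estimateMessageTokens with both fixes applied, using hypothetical stand-in types for the message and tool-call structs (field names follow this description, not necessarily providers.Message). Character counts are converted with chars * 2 / 5, and media cost is added after that conversion so each item stays at 256 tokens.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Hypothetical, simplified stand-ins for the provider message types.
type FunctionCall struct {
	Name      string
	Arguments string // JSON-encoded arguments
}

type ToolCall struct {
	ID       string
	Type     string
	Name     string        // used when Function is absent
	Function *FunctionCall // OpenAI-compatible shape
}

type Message struct {
	Role             string
	Content          string
	ReasoningContent string
	ToolCalls        []ToolCall
	ToolCallID       string
	Media            []string
}

// mediaTokensPerItem is the fixed per-item overhead described in this PR.
const mediaTokensPerItem = 256

func estimateMessageTokens(m Message) int {
	runes := utf8.RuneCountInString
	chars := runes(m.Content) + runes(m.ReasoningContent) + runes(m.ToolCallID)
	for _, tc := range m.ToolCalls {
		chars += runes(tc.ID) + runes(tc.Type)
		if tc.Function != nil {
			// Count the name once: Function.Name, not tc.Name as well.
			chars += runes(tc.Function.Name) + runes(tc.Function.Arguments)
		} else {
			chars += runes(tc.Name)
		}
	}
	tokens := chars * 2 / 5 // 2.5 chars/token heuristic
	// Added after conversion: 256 tokens per item, not 256*2/5 = 102.
	tokens += len(m.Media) * mediaTokensPerItem
	return tokens
}

func main() {
	call := Message{Role: "assistant", ToolCalls: []ToolCall{{
		Name:     "read_file",
		Function: &FunctionCall{Name: "read_file", Arguments: "{}"},
	}}}
	fmt.Println(estimateMessageTokens(call))                                      // 4: (9 + 2) * 2 / 5
	fmt.Println(estimateMessageTokens(Message{Media: []string{"a.png", "b.png"}})) // 512
}
```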

Also fix i18n hint text: "auto-detect" was misleading — the backend
uses a 4x max_tokens heuristic, not actual model detection.

When the entire history is a single Turn (one user message followed by
tool calls and responses, no subsequent user message), the only Turn
boundary is at index 0. Previously the fallback returned targetIndex,
which could land on a tool or assistant message — splitting the Turn.

Return 0 instead, so callers (forceCompression, summarizeSession) see
mid <= 0 and skip compression rather than cutting inside the Turn.

Session history only stores user/assistant/tool messages: the system
prompt is built dynamically by BuildMessages. Remove the incorrect
system message from TestAgentLoop_ContextExhaustionRetry test data
to match the real data model that forceCompression operates on.

Document the semantic boundaries of context management as called for
in the agent-refactor README (suggested document split, item 5):

- context window region definitions and history budget formula
- ContextWindow vs MaxTokens distinction
- session history contents (no system prompt stored)
- Turn as the atomic compression unit (sipeed#1316)
- three compression paths and their ordering
- token estimation approach and its limitations
- interface boundaries between budget functions and BuildMessages

Also documents known gaps: summarization trigger not using the full
budget formula, heuristic-only token estimation, and reactive retry
not preserving media references.

Ref sipeed#1439
@is-Xiaoen is-Xiaoen force-pushed the refactor/context-boundary branch from e0aad04 to 08259d7 March 16, 2026 06:49
@is-Xiaoen is-Xiaoen changed the base branch from main to refactor/agent March 16, 2026 06:49
@is-Xiaoen
Contributor Author

Rebased onto refactor/agent now that the branch is up — sits cleanly on top of steering (#1517), no conflicts. Updated the PR base branch accordingly.

CI should stay green. Ready for review when you get a chance.

}
}

// No Turn boundary after targetIndex either. The only boundary is at
Collaborator

If an LLM or user generates a single massive message (or a massive tool response) that exceeds the context window on its own, the entire history is technically a single "Turn" (index 0). findSafeBoundary will correctly identify that there are no safe boundaries to split the sequence and return 0. By aborting compression entirely, the agent gets permanently stuck in a "Context Window Exceeded" loop because it can never shrink the context.
If a Turn boundary cannot be found, we might consider falling back to a hard split so the system can recover, or resetting the context to its initial empty state. WDYT?

Contributor Author

Good catch — you're right, the agent would get stuck retrying if a single Turn exceeds the window.

Fixed in c63c644: when mid <= 0 (no safe Turn boundary), forceCompression now falls back to keeping only the most recent user message. This breaks Turn atomicity as a last resort, but guarantees recovery instead of looping.

The reactive path especially needs this, since it passes "" for UserMessage when rebuilding: without at least the latest user message in history, the LLM would get no user context at all.

@afjcjsbx thanks for the review!
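A minimal sketch of the recovery path, assuming a hypothetical compressHistory helper and a simplified Message type: cut at a Turn boundary when one exists, otherwise keep only the most recent user message. What should happen when the history contains no user message at all is not covered in this thread; the sketch returns an empty history as an assumption.

```go
package main

import "fmt"

// Message is a hypothetical, simplified stand-in for providers.Message.
type Message struct {
	Role    string
	Content string
}

func parseTurnBoundaries(history []Message) []int {
	starts := make([]int, 0, len(history))
	for i, m := range history {
		if m.Role == "user" {
			starts = append(starts, i)
		}
	}
	return starts
}

// compressHistory (hypothetical) drops the oldest half of Turns; when there is
// no usable Turn boundary (mid <= 0) it falls back to the latest user message.
func compressHistory(history []Message) []Message {
	turns := parseTurnBoundaries(history)
	mid := 0
	if len(turns) >= 2 {
		mid = turns[len(turns)/2]
	}
	if mid > 0 {
		return append([]Message(nil), history[mid:]...)
	}
	// Last resort: break Turn atomicity so the retry can make progress.
	for i := len(history) - 1; i >= 0; i-- {
		if history[i].Role == "user" {
			return []Message{history[i]}
		}
	}
	return nil // assumption: no user message means nothing to keep
}

func main() {
	single := []Message{
		{Role: "user", Content: "summarize the logs"},
		{Role: "assistant", Content: "(tool call)"},
		{Role: "tool", Content: "(huge tool output)"},
	}
	kept := compressHistory(single)
	fmt.Println(len(kept), kept[0].Content) // 1 summarize the logs
}
```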

@afjcjsbx
Collaborator

I reviewed the PR and I like both the ideas and the clean, clear implementation. I left you a note, but from my side we can merge. Thank you!

When the entire session history is a single Turn (e.g. one user message
followed by a massive tool response), findSafeBoundary returns 0 and
forceCompression previously did nothing — leaving the agent stuck in
a context-exceeded retry loop.

Now falls back to keeping only the most recent user message when no
safe Turn boundary exists. This breaks Turn atomicity as a last resort
but guarantees the agent can recover.

Also updates docs/agent-refactor/context.md to document this behavior.

Ref sipeed#1490
@afjcjsbx afjcjsbx merged commit 5e92a38 into sipeed:refactor/agent Mar 17, 2026
4 checks passed
andressg79 pushed a commit to andressg79/picoclaw that referenced this pull request Mar 30, 2026
andressg79 pushed a commit to andressg79/picoclaw that referenced this pull request Mar 30, 2026
refactor(agent): context boundary detection, proactive budget check, and safe compression
ra1phdd pushed a commit to ra1phdd/picoclaw-pkg that referenced this pull request Apr 12, 2026
armmer016 pushed a commit to armmer016/khunquant that referenced this pull request Apr 14, 2026