refactor(agent): context boundary detection, proactive budget check, and safe compression#1490
Conversation
|
hi @is-Xiaoen, I've looked at this PR and have a simple idea. |
|
Good call — using the Turn as the atomic unit is cleaner than the raw heuristic scan. I've updated the implementation: New function: Returns the starting index of each Turn in the session history. A Turn begins at a user message and extends through all subsequent assistant/tool messages until the next user message — matching the Turn definition from #1316. How it's used:
turns := parseTurnBoundaries(history)
if len(turns) >= 2 {
mid = turns[len(turns)/2] // drop oldest half of Turns
}This reads as "drop the oldest N Turns" rather than "find nearest user message to midpoint" — simpler and self-documenting. The Turn-based approach also naturally handles chained tool calls within a single Turn (user → assistantTC → tool → assistantTC → tool → assistant), since the entire chain lives inside one Turn and is never split. See commit |
|
@alexhoshina Saw your mention in #1218 about combining this with the new context builder. I've added The budget-related functions ( If anything needs adjusting to better fit the architecture you have in mind — how the proactive check integrates, the Turn boundary detection logic, interface signatures — happy to iterate. Will rebase onto |
…and safe compression Separate context_window from max_tokens — they serve different purposes (input capacity vs output generation limit). The previous conflation caused premature summarization or missed compression triggers. Changes: - Add context_window field to AgentDefaults config (default: 4x max_tokens) - Extract boundary-safe truncation helpers (isSafeBoundary, findSafeBoundary) into context_budget.go — pure functions with no AgentLoop dependency - forceCompression: align split to safe boundary so tool-call sequences (assistant+ToolCalls → tool results) are never torn apart - summarizeSession: use findSafeBoundary instead of hardcoded keep-last-4 - estimateTokens: count ToolCalls arguments and ToolCallID metadata, not just Content — fixes systematic undercounting in tool-heavy sessions - Add proactive context budget check before LLM call in runAgentLoop, preventing 400 context-length errors instead of reacting to them - Add estimateToolDefsTokens for tool definition token cost Closes sipeed#556, closes sipeed#665 Ref sipeed#1439
Session history (GetHistory) contains only user/assistant/tool messages. The system prompt is built dynamically by BuildMessages and is never stored in session. The previous code incorrectly treated history[0] as a system prompt, skipping the first user message and appending a compression note to it. Fix: operate on the full history slice, and record the compression note in the session summary (which BuildMessages already injects into the system prompt) rather than modifying any history message.
estimateMessageTokens now counts ReasoningContent (extended thinking / chain-of-thought) which can be substantial and is persisted in session history. Media items get a fixed per-item overhead (256 tokens) since actual cost depends on provider-specific image tokenization.
Add context_window to config.example.json, the web configuration page (form model, input field, save handler), and i18n strings (en/zh). The field is optional — leaving it empty falls back to the 4x max_tokens heuristic.
Add tests that reflect actual session data shape: history starts with user messages (no system prompt), includes chained tool-call sequences, reasoning content, and media items. Exercises the proactive budget check path with BuildMessages-style assembled messages.
Fixes prealloc lint warning by using make() with capacity hint.
Introduce parseTurnBoundaries() which identifies each Turn start index in the session history. A Turn is a complete "user input → LLM iterations → final response" cycle (as defined in the agent refactor design sipeed#1316). findSafeBoundary now uses Turn boundaries instead of raw role-scanning, making the intent explicit: "find the nearest Turn boundary." forceCompression drops the oldest half of Turns (not arbitrary messages), which is simpler and more intuitive. The Turn-based approach naturally prevents splitting tool-call sequences since each Turn is atomic.
Two estimation bugs fixed: 1. Media tokens were added to the chars accumulator before the chars*2/5 conversion, resulting in 256*2/5=102 tokens per item instead of 256. Fix: add media tokens directly to the final token count, bypassing the character-based heuristic. 2. estimateMessageTokens counted both tc.Name and tc.Function.Name for tool calls, but providers only send one (OpenAI-compat uses function.name, Anthropic uses tc.Name). Fix: count tc.Function.Name when Function is present, fall back to tc.Name only otherwise. Also fix i18n hint text: "auto-detect" was misleading — the backend uses a 4x max_tokens heuristic, not actual model detection.
When the entire history is a single Turn (one user message followed by tool calls and responses, no subsequent user message), the only Turn boundary is at index 0. Previously the fallback returned targetIndex, which could land on a tool or assistant message — splitting the Turn. Return 0 instead, so callers (forceCompression, summarizeSession) see mid <= 0 and skip compression rather than cutting inside the Turn.
Session history only stores user/assistant/tool messages — the system prompt is built dynamically by BuildMessages. Remove the incorrect system message from TestAgentLoop_ContextExhaustionRetry test data to match the real data model that forceCompression operates on.
Document the semantic boundaries of context management as called for in the agent-refactor README (suggested document split, item 5): - context window region definitions and history budget formula - ContextWindow vs MaxTokens distinction - session history contents (no system prompt stored) - Turn as the atomic compression unit (sipeed#1316) - three compression paths and their ordering - token estimation approach and its limitations - interface boundaries between budget functions and BuildMessages Also documents known gaps: summarization trigger not using the full budget formula, heuristic-only token estimation, and reactive retry not preserving media references. Ref sipeed#1439
e0aad04 to
08259d7
Compare
|
Rebased onto CI should stay green. Ready for review when you get a chance. |
| } | ||
| } | ||
|
|
||
| // No Turn boundary after targetIndex either. The only boundary is at |
There was a problem hiding this comment.
If an LLM or user generates a single massive message (or a massive tool response) that exceeds the context window on its own, the entire history is technically a single "Turn" (index 0). findSafeBoundary will correctly identify that there are no safe boundaries to split the sequence and return 0. By aborting compression entirely, the agent gets permanently stuck in a "Context Window Exceeded" loop because it can never shrink the context.
If a Turn boundary cannot be found, we might think about fall back to a hard split to ensure the system can recover, or return the context to the initial empty state. WDYT??
There was a problem hiding this comment.
Good catch — you're right, the agent would get stuck retrying if a single Turn exceeds the window.
Fixed in c63c644: when mid <= 0 (no safe Turn boundary), forceCompression now falls back to keeping only the most recent user message. This breaks Turn atomicity as a last resort, but guarantees recovery instead of looping.
The reactive path needs this especially since it passes "" for UserMessage when rebuilding — without at least the latest user message in history, the LLM would get no user context at all.
@afjcjsbx thanks for the review!
|
I reviewed the PR and I like both the ideas and the clear and clean implementation, I left you a note, but for me we can merge, thank you! |
When the entire session history is a single Turn (e.g. one user message followed by a massive tool response), findSafeBoundary returns 0 and forceCompression previously did nothing — leaving the agent stuck in a context-exceeded retry loop. Now falls back to keeping only the most recent user message when no safe Turn boundary exists. This breaks Turn atomicity as a last resort but guarantees the agent can recover. Also updates docs/agent-refactor/context.md to document this behavior. Ref sipeed#1490
When the entire session history is a single Turn (e.g. one user message followed by a massive tool response), findSafeBoundary returns 0 and forceCompression previously did nothing — leaving the agent stuck in a context-exceeded retry loop. Now falls back to keeping only the most recent user message when no safe Turn boundary exists. This breaks Turn atomicity as a last resort but guarantees the agent can recover. Also updates docs/agent-refactor/context.md to document this behavior. Ref sipeed#1490
refactor(agent): context boundary detection, proactive budget check, and safe compression
When the entire session history is a single Turn (e.g. one user message followed by a massive tool response), findSafeBoundary returns 0 and forceCompression previously did nothing — leaving the agent stuck in a context-exceeded retry loop. Now falls back to keeping only the most recent user message when no safe Turn boundary exists. This breaks Turn atomicity as a last resort but guarantees the agent can recover. Also updates docs/agent-refactor/context.md to document this behavior. Ref sipeed#1490
When the entire session history is a single Turn (e.g. one user message followed by a massive tool response), findSafeBoundary returns 0 and forceCompression previously did nothing — leaving the agent stuck in a context-exceeded retry loop. Now falls back to keeping only the most recent user message when no safe Turn boundary exists. This breaks Turn atomicity as a last resort but guarantees the agent can recover. Also updates docs/agent-refactor/context.md to document this behavior. Ref sipeed#1490
Summary
Addresses track 6 of the agent refactor (#1439): context management boundaries, compression triggers, and token budgeting.
This PR fixes four concrete problems in the current context management:
ContextWindowdefaults toMaxTokens— conflates input capacity with output generation limit, causing premature summarization or missed compression triggers (fix(agent): decouple context_window from max_tokens #556)forceCompressioncan orphan tool pairs — slices atlen/2without checking whether the cut falls inside anassistant+ToolCalls → tool_resultsequence (fix(agent): prevent history compression from orphaning tool_call/tool_result pairs #665)forceCompressionassumeshistory[0]is a system prompt — session history only stores user/assistant/tool messages; the system prompt is built dynamically byBuildMessages. The old code incorrectly skipped the first user message and appended a compression note to it.forceCompressionruns after the LLM already rejected the request with a 400, wasting a billed callestimateTokensonly countedm.Content, ignoringToolCallsarguments,ReasoningContent, andMediaitemsChanges
New file:
docs/agent-refactor/context.mdDesign document for Track 6, as called for in the agent-refactor README (suggested document split, item 5: "context scope, history, summary, compression"). Documents:
New file:
pkg/agent/context_budget.goUses the Turn concept from the agent refactor design (#1316) as the atomic unit for compression. A Turn is a complete "user input → LLM iterations → final response" cycle.
parseTurnBoundaries(history)— identifies each Turn start index in the session historyisSafeBoundary(history, index)— checks whether an index is at a Turn boundaryfindSafeBoundary(history, targetIndex)— finds the nearest Turn boundary to a target index (prefers backward to keep more recent context)estimateMessageTokens(msg)— counts Content + ReasoningContent + ToolCalls (ID, type, name, arguments) + ToolCallID + Media items with 2.5 chars/token heuristicestimateToolDefsTokens(defs)— estimates token cost of tool definitions (name + description + JSON schema)isOverContextBudget(contextWindow, messages, toolDefs, maxTokens)— proactive budget checkAll are pure functions with no
AgentLoopdependency — forward-compatible with the agent refactor.pkg/agent/loop.gorunAgentLoop): before calling the LLM, estimate total token cost of assembled messages + tool definitions + output reserve. If over budget, runforceCompressionand rebuild messages. The reactive path stays as a fallback for estimation undershoots.forceCompression: rewritten to drop the oldest half of Turns (not arbitrary messages). UsesparseTurnBoundariesto find Turn-aligned cut points. Fixed to work with actual session data (no system prompt in history). Compression note goes into session summary, not into history messages.summarizeSession: usefindSafeBoundaryto align cut to nearest Turn boundary instead of hardcodedhistory[:len-4]estimateTokens: delegate toestimateMessageTokens— counts ToolCalls, ReasoningContent, Media, not just Contentpkg/agent/instance.goContextWindowindependently fromMaxTokenswith a 4x heuristic default. This gives 131K for the default 32K max_tokens — reasonable for modern models. The reactiveforceCompressionhandles any overshoot.pkg/config/config.gocontext_windowfield toAgentDefaults(with env varPICOCLAW_AGENTS_DEFAULTS_CONTEXT_WINDOW)config/config.example.jsoncontext_windowfield with default 131072Web UI (
web/frontend/)Design decisions
parseTurnBoundariesidentifies Turn starts;forceCompressiondrops "the oldest half of Turns." This naturally prevents splitting tool-call sequences since each Turn is atomic.ContextBudgetstruct, noContextManagerinterface.BuildMessagesalready injects into the system prompt. This avoids corrupting the stored history.[]providers.Messageand integer token counts. Independent of agent abstraction ([Agent refactor]what an Agent is #1218), event model ([Agent refactor] Event-driven agent loop with hooks, interrupts, and steering #1316), or capability model.Issue mapping
maybeSummarizestill uses history-only percentage threshold against the now-correctContextWindowbase. Full budget-aware summarization trigger is a follow-up.Test plan
context_budget_test.gocovering all pure functionsparseTurnBoundariestests: simple exchange, tool calls, chained tools, no user messages, leading non-userfindSafeBoundaryreturns 0 when only one Turn existsgo build ./pkg/...— no compilation errorsgolangci-lint run— no new lint issuesCloses #556
Closes #665
Ref #1439
cc @alexhoshina @yinwm