refactor(agent): context boundary detection, proactive budget check, and safe compression #1490
Merged: afjcjsbx merged 13 commits into sipeed:refactor/agent from is-Xiaoen:refactor/context-boundary on Mar 17, 2026
9c82b0b refactor(agent): context boundary detection, proactive budget check, … (is-Xiaoen)
9c65d78 fix(agent): forceCompression must not assume history[0] is system prompt (is-Xiaoen)
d5fdd5e fix(agent): include ReasoningContent and Media in token estimation (is-Xiaoen)
e35906b feat(config): expose context_window in example config and web UI (is-Xiaoen)
b7f1c2b test(agent): add realistic session-shaped tests for context budget (is-Xiaoen)
efd4032 fix(agent): preallocate messages slice in budget test (is-Xiaoen)
639739c refactor(agent): use Turn as the atomic unit for compression cut-off (is-Xiaoen)
8034ee7 fix(agent): correct media token arithmetic and tool call double-counting (is-Xiaoen)
edbdc3b fix(agent): findSafeBoundary returns 0 for single-Turn history (is-Xiaoen)
7c1a1c2 style(agent): fix gci comment alignment in test (is-Xiaoen)
b768dab test(agent): use realistic session data in context retry test (is-Xiaoen)
08259d7 docs(agent-refactor): add context.md for Track 6 boundary clarification (is-Xiaoen)
c63c644 fix(agent): forceCompression recovers from single oversized Turn (is-Xiaoen)
@@ -0,0 +1,162 @@
# Context

## What this document covers

This document makes explicit the boundaries of context management in the agent loop:

- what fills the context window and how space is divided
- what is stored in session history vs. built at request time
- when and how context compression happens
- how token budgets are estimated

These are existing concepts. This document clarifies their boundaries rather than introducing new ones.

---

## Context window regions

The context window is the model's total input capacity. Four regions fill it:

| Region | Assembled by | Stored in session? |
|---|---|---|
| System prompt | `BuildMessages()` (static + dynamic parts) | No |
| Summary | `SetSummary()` stores it; `BuildMessages()` injects it | Separate from history |
| Session history | User / assistant / tool messages | Yes |
| Tool definitions | Provider adapter injects at call time | No |

`MaxTokens` (the output generation limit) must also be reserved from the total budget.

The available space for history is therefore:

```
history_budget = ContextWindow - system_prompt - summary - tool_definitions - MaxTokens
```

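As a worked illustration of the formula above, the following minimal sketch computes the history budget from the other terms. The `budget` struct, its field names, the clamp at zero, and the sample token counts are assumptions made for this example; they are not taken from the PicoClaw code.

```go
package main

import "fmt"

// budget holds the terms of the formula above, all measured in tokens.
// The type and field names are hypothetical.
type budget struct {
	ContextWindow   int
	SystemPrompt    int
	Summary         int
	ToolDefinitions int
	MaxTokens       int
}

// historyBudget returns the tokens left for session history.
// Clamping at zero is a choice made for this sketch.
func historyBudget(b budget) int {
	left := b.ContextWindow - b.SystemPrompt - b.Summary - b.ToolDefinitions - b.MaxTokens
	if left < 0 {
		return 0
	}
	return left
}

func main() {
	b := budget{
		ContextWindow:   131072,
		SystemPrompt:    3000,
		Summary:         500,
		ToolDefinitions: 2000,
		MaxTokens:       32768,
	}
	fmt.Println(historyBudget(b)) // 92804
}
```

With these illustrative numbers, roughly 70% of the window remains for history; the output reserve is by far the largest deduction.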

---

## ContextWindow vs MaxTokens

These serve different purposes:

- **MaxTokens**: maximum tokens the LLM may generate in one response. Sent as the `max_tokens` request parameter.
- **ContextWindow**: the model's total input context capacity.

These were previously set to the same value, which caused the summarization threshold to fire either far too early (at the default 32K) or not at all (when a user raised `max_tokens`).

Current default when not explicitly configured: `ContextWindow = MaxTokens * 4`.

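The default rule can be sketched as a small resolver. This is a minimal sketch: the function name `resolveContextWindow` and the convention that a non-positive value means "not configured" are assumptions for illustration, not the actual config code.

```go
package main

import "fmt"

// resolveContextWindow returns the configured context window, or
// MaxTokens * 4 when none is set. Treating a non-positive value as
// "not configured" is an assumption of this sketch.
func resolveContextWindow(configured, maxTokens int) int {
	if configured > 0 {
		return configured
	}
	return maxTokens * 4
}

func main() {
	fmt.Println(resolveContextWindow(0, 8192))      // 32768
	fmt.Println(resolveContextWindow(200000, 8192)) // 200000
}
```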

---

## Session history

Session history stores only conversation messages:

- `user`: user input
- `assistant`: LLM response (may include `ToolCalls`)
- `tool`: tool execution results

Session history does **not** contain:

- System prompts: assembled at request time by `BuildMessages`
- Summary content: stored separately via `SetSummary`, injected by `BuildMessages`

This distinction matters: any code that operates on session history (compression, boundary detection, token estimation) must not assume a system message is present.

---

## Turn

A **Turn** is one complete cycle:

> user message -> LLM iterations (possibly including tool calls) -> final assistant response

This definition comes from the agent loop design (#1316). In session history, Turn boundaries are identified by `user`-role messages.

Turn is the atomic unit for compression. Cutting inside a Turn can orphan tool-call sequences, that is, leave an assistant message with `ToolCalls` separated from its corresponding `tool` results. Compressing at Turn boundaries avoids this by construction.

- `parseTurnBoundaries(history)` returns the starting index of each Turn.
- `findSafeBoundary(history, targetIndex)` snaps a target cut point to the nearest Turn boundary.

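To make the snapping behaviour concrete, here is a small self-contained sketch that runs both functions on a three-Turn history. The function bodies follow the logic of the Go file in this PR; the local `Message` type (a stand-in for `providers.Message`) and the `roles` helper are assumptions added so the example runs on its own.

```go
package main

import "fmt"

// Message is a stand-in for providers.Message with only the field needed here.
type Message struct {
	Role string
}

// parseTurnBoundaries returns the index of every user message.
func parseTurnBoundaries(history []Message) []int {
	var starts []int
	for i, m := range history {
		if m.Role == "user" {
			starts = append(starts, i)
		}
	}
	return starts
}

// findSafeBoundary prefers the last boundary at or before target (unless
// that is index 0), then the first boundary after target.
func findSafeBoundary(history []Message, target int) int {
	if len(history) == 0 || target <= 0 {
		return 0
	}
	if target >= len(history) {
		return len(history)
	}
	turns := parseTurnBoundaries(history)
	if len(turns) == 0 {
		return target
	}
	backward := -1
	for _, t := range turns {
		if t <= target {
			backward = t
		}
	}
	if backward > 0 {
		return backward
	}
	for _, t := range turns {
		if t > target {
			return t
		}
	}
	return 0 // single Turn: no safe cut exists
}

// roles is a hypothetical helper that builds a history from role names.
func roles(rs ...string) []Message {
	out := make([]Message, 0, len(rs))
	for _, r := range rs {
		out = append(out, Message{Role: r})
	}
	return out
}

func main() {
	h := roles("user", "assistant", "tool", "assistant",
		"user", "assistant",
		"user", "assistant", "tool", "assistant")
	fmt.Println(parseTurnBoundaries(h)) // [0 4 6]
	fmt.Println(findSafeBoundary(h, 2)) // 4: index 0 would drop nothing, so snap forward
	fmt.Println(findSafeBoundary(h, 5)) // 4: snap back to the start of the second Turn
	fmt.Println(findSafeBoundary(h, 8)) // 6: never cut between a tool call and its result

	single := roles("user", "assistant", "tool", "assistant")
	fmt.Println(findSafeBoundary(single, 2)) // 0: a single Turn has no safe cut
}
```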

---

## Compression paths

Three compression paths exist, in order of preference:

### 1. Async summarization

`maybeSummarize` runs after each Turn completes.

Triggers when message count exceeds a threshold, or when estimated history tokens exceed a percentage of `ContextWindow`. If triggered, a background goroutine calls the LLM to produce a summary of the oldest messages. The summary is stored via `SetSummary`; `BuildMessages` injects it into the system prompt on the next call.

Cut point uses `findSafeBoundary` so no Turn is split.

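The trigger condition described above can be sketched as a pure predicate. The function name `shouldSummarize`, its parameters, and the threshold values used in `main` are hypothetical; the document does not state the actual thresholds.

```go
package main

import "fmt"

// shouldSummarize reports whether either trigger fires: too many messages,
// or estimated history tokens above a percentage of the context window.
// All parameter names and thresholds are assumptions of this sketch.
func shouldSummarize(msgCount, historyTokens, contextWindow, maxMessages, percent int) bool {
	if msgCount > maxMessages {
		return true
	}
	return historyTokens > contextWindow*percent/100
}

func main() {
	const contextWindow, maxMessages, percent = 100000, 20, 75
	fmt.Println(shouldSummarize(10, 1000, contextWindow, maxMessages, percent))  // false
	fmt.Println(shouldSummarize(25, 1000, contextWindow, maxMessages, percent))  // true: message count
	fmt.Println(shouldSummarize(10, 80000, contextWindow, maxMessages, percent)) // true: token share
}
```

Note that this predicate looks only at history tokens, which matches the limitation listed under "Known gaps": system prompt, tool definitions, and the `MaxTokens` reserve are not part of it.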

### 2. Proactive budget check

`isOverContextBudget` runs before each LLM call.

Uses the full budget formula: `message_tokens + tool_def_tokens + MaxTokens > ContextWindow`. If over budget, triggers `forceCompression` and rebuilds messages before calling the LLM.

This prevents wasted (and billed) LLM calls that would otherwise fail with a context-window error.

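A worked instance of the inequality, with illustrative numbers, shows how the output reserve can push a request over budget even when the input alone would fit. The helper `overBudget` is a hypothetical reduction of `isOverContextBudget` that takes pre-computed token counts instead of messages.

```go
package main

import "fmt"

// overBudget applies the budget formula to already-estimated token counts.
// It is a simplified stand-in for isOverContextBudget.
func overBudget(messageTokens, toolDefTokens, maxTokens, contextWindow int) bool {
	return messageTokens+toolDefTokens+maxTokens > contextWindow
}

func main() {
	const contextWindow, maxTokens, toolDefs = 32768, 8192, 1500

	// 24000 + 1500 + 8192 = 33692 > 32768: compress before calling the LLM.
	fmt.Println(overBudget(24000, toolDefs, maxTokens, contextWindow)) // true

	// 20000 + 1500 + 8192 = 29692 <= 32768: safe to call.
	fmt.Println(overBudget(20000, toolDefs, maxTokens, contextWindow)) // false
}
```

In the first case the input (25500 tokens) fits in the window by itself; it is the 8192-token reserve for the response that makes the request too large.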

### 3. Emergency compression (reactive)

`forceCompression` runs when the LLM returns a context-window error despite the proactive check.

Drops the oldest ~50% of Turns. Stores a compression note in the session summary (not in history messages) so `BuildMessages` can include it in the next system prompt.

This is the fallback for when the token estimate undershoots reality.

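The following is a minimal sketch of the drop step, not the actual `forceCompression` code. It assumes the cut is chosen by Turn count (the real function may instead snap a message-index midpoint with `findSafeBoundary`), and it includes the last-resort fallback discussed in this PR's review thread: when no safe boundary exists, keep only the most recent user message. The `Message` type and the name `compress` are hypothetical, and storing the compression note in the summary is omitted.

```go
package main

import "fmt"

// Message is a stand-in for providers.Message.
type Message struct {
	Role    string
	Content string
}

// compress drops the oldest half of the Turns (rounded down). With fewer
// than two Turns it keeps only the most recent user message. It returns the
// kept messages and the number of messages dropped.
func compress(history []Message) ([]Message, int) {
	var starts []int
	for i, m := range history {
		if m.Role == "user" {
			starts = append(starts, i)
		}
	}
	if len(starts) >= 2 {
		cut := starts[len(starts)/2]
		return history[cut:], cut
	}
	// Fallback: breaks Turn atomicity, but guarantees the context shrinks.
	for i := len(history) - 1; i >= 0; i-- {
		if history[i].Role == "user" {
			return []Message{history[i]}, len(history) - 1
		}
	}
	return history, 0 // no user message at all: left unchanged in this sketch
}

func main() {
	h := []Message{
		{Role: "user"}, {Role: "assistant"}, {Role: "tool"}, {Role: "assistant"},
		{Role: "user"}, {Role: "assistant"},
		{Role: "user"}, {Role: "assistant"}, {Role: "tool"}, {Role: "assistant"},
	}
	kept, dropped := compress(h)
	fmt.Println(len(kept), dropped) // 6 4

	single := []Message{
		{Role: "user", Content: "huge request"}, {Role: "assistant"},
		{Role: "tool"}, {Role: "assistant"},
	}
	kept, dropped = compress(single)
	fmt.Println(len(kept), dropped, kept[0].Content) // 1 3 huge request
}
```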

---

## Token estimation

Estimation uses a heuristic of ~2.5 characters per token (`chars * 2 / 5`).

`estimateMessageTokens` counts:

- `Content` (rune count, for multibyte correctness)
- `ReasoningContent` (extended thinking / chain-of-thought)
- `ToolCalls`: ID, type, function name, arguments
- `ToolCallID` (tool result metadata)
- Per-message overhead (role label, JSON structure)
- `Media` items: flat per-item token estimate, added directly to the final count (not through the character heuristic, since actual cost depends on resolution and provider-specific image tokenization)

`estimateToolDefsTokens` counts tool definition overhead: name, description, JSON schema of parameters.

These are deliberately heuristic. The proactive check handles the common case; the reactive path catches estimation errors.


---

## Interface boundaries

Context budget functions (`parseTurnBoundaries`, `findSafeBoundary`, `estimateMessageTokens`, `isOverContextBudget`) are **pure functions**. They take `[]providers.Message`, `[]providers.ToolDefinition`, and integer parameters. They have no dependency on `AgentLoop` or any other runtime struct.

`BuildMessages` is the sole assembler of the final message array sent to the LLM. Budget functions inform compression decisions but do not construct messages.

`forceCompression` and `summarizeSession` mutate session state (history and summary). `BuildMessages` reads that state to construct context. The flow is:

```
budget check --> compression decision --> mutate session --> BuildMessages reads session --> LLM call
```

---

## Known gaps

These are recognized limitations in the current implementation, documented here for visibility:

- **Summarization trigger does not use the full budget formula.** `maybeSummarize` compares estimated history tokens against a percentage of `ContextWindow`. It does not account for system prompt size, tool definition overhead, or `MaxTokens` reserve. The proactive check covers the critical path (preventing 400 errors), but the summarization trigger could be aligned with the same budget model for more accurate early compression.

- **Token estimation is heuristic.** It does not account for provider-specific tokenization, exact system prompt size (assembled separately), or variable image token costs. The two-path design (proactive + reactive) is intended to tolerate this imprecision.

- **Reactive retry does not preserve media.** When the reactive path rebuilds context after compression, it currently passes empty values for media references. This is a pre-existing issue in the main loop, not introduced by the budget system.

---

## What this document does not cover

- How `AGENT.md` frontmatter configures context parameters; that is part of the Agent definition work
- How the context builder assembles context in the new architecture; that is upcoming work
- How compression events surface through the event system; that is part of the event model (#1316)
- Subagent context isolation; that is a separate track

@@ -0,0 +1,176 @@
// PicoClaw - Ultra-lightweight personal AI agent
// License: MIT
//
// Copyright (c) 2026 PicoClaw contributors

package agent

import (
	"encoding/json"
	"unicode/utf8"

	"github.com/sipeed/picoclaw/pkg/providers"
)

// parseTurnBoundaries returns the starting index of each Turn in the history.
// A Turn is a complete "user input → LLM iterations → final response" cycle
// (as defined in #1316). Each Turn begins at a user message and extends
// through all subsequent assistant/tool messages until the next user message.
//
// Cutting at a Turn boundary guarantees that no tool-call sequence
// (assistant+ToolCalls → tool results) is split across the cut.
func parseTurnBoundaries(history []providers.Message) []int {
	var starts []int
	for i, msg := range history {
		if msg.Role == "user" {
			starts = append(starts, i)
		}
	}
	return starts
}

// isSafeBoundary reports whether index is a valid Turn boundary, i.e.,
// a position where the kept portion (history[index:]) begins at a user
// message, so no tool-call sequence is torn apart.
func isSafeBoundary(history []providers.Message, index int) bool {
	if index <= 0 || index >= len(history) {
		return true
	}
	return history[index].Role == "user"
}

// findSafeBoundary locates the nearest Turn boundary to targetIndex.
// It prefers the last boundary after index 0 and at or before targetIndex
// (keeping more recent context), then the first boundary after targetIndex.
// It returns 0 for a single-Turn history and targetIndex if no Turn exists.
func findSafeBoundary(history []providers.Message, targetIndex int) int {
	if len(history) == 0 {
		return 0
	}
	if targetIndex <= 0 {
		return 0
	}
	if targetIndex >= len(history) {
		return len(history)
	}

	turns := parseTurnBoundaries(history)
	if len(turns) == 0 {
		return targetIndex
	}

	// Find the last Turn boundary at or before targetIndex.
	// Prefer backward: keeps more recent messages.
	backward := -1
	for _, t := range turns {
		if t <= targetIndex {
			backward = t
		}
	}
	if backward > 0 {
		return backward
	}

	// No valid Turn boundary before target (or only at index 0 which
	// would keep everything). Use the first Turn after targetIndex.
	for _, t := range turns {
		if t > targetIndex {
			return t
		}
	}

	// No Turn boundary after targetIndex either. The only boundary is at
	// index 0, meaning the entire history is a single Turn. Return 0 to
	// signal that safe compression is not possible; callers check for
	// mid <= 0 and skip compression in that case.
	return 0
}

// estimateMessageTokens estimates the token count for a single message,
// including Content, ReasoningContent, ToolCalls arguments, ToolCallID
// metadata, and Media items. Uses a heuristic of 2.5 characters per token.
func estimateMessageTokens(msg providers.Message) int {
	chars := utf8.RuneCountInString(msg.Content)

	// ReasoningContent (extended thinking / chain-of-thought) can be
	// substantial and is stored in session history via AddFullMessage.
	if msg.ReasoningContent != "" {
		chars += utf8.RuneCountInString(msg.ReasoningContent)
	}

	for _, tc := range msg.ToolCalls {
		chars += len(tc.ID) + len(tc.Type)
		if tc.Function != nil {
			// Count function name + arguments (the wire format for most providers).
			// tc.Name mirrors tc.Function.Name; count only once to avoid double-counting.
			chars += len(tc.Function.Name) + len(tc.Function.Arguments)
		} else {
			// Fallback: some provider formats use top-level Name without Function.
			chars += len(tc.Name)
		}
	}

	if msg.ToolCallID != "" {
		chars += len(msg.ToolCallID)
	}

	// Per-message overhead for role label, JSON structure, separators.
	const messageOverhead = 12
	chars += messageOverhead

	tokens := chars * 2 / 5

	// Media items (images, files) are serialized by provider adapters into
	// multipart or image_url payloads. Add a fixed per-item token estimate
	// directly (not through the chars heuristic) since actual cost depends
	// on resolution and provider-specific image tokenization.
	const mediaTokensPerItem = 256
	tokens += len(msg.Media) * mediaTokensPerItem

	return tokens
}

// estimateToolDefsTokens estimates the total token cost of tool definitions
// as they appear in the LLM request. Each tool's name, description, and
// JSON schema parameters contribute to the context window budget.
func estimateToolDefsTokens(defs []providers.ToolDefinition) int {
	if len(defs) == 0 {
		return 0
	}

	totalChars := 0
	for _, d := range defs {
		totalChars += len(d.Function.Name) + len(d.Function.Description)

		if d.Function.Parameters != nil {
			if paramJSON, err := json.Marshal(d.Function.Parameters); err == nil {
				totalChars += len(paramJSON)
			}
		}

		// Per-tool overhead: type field, JSON structure, separators.
		totalChars += 20
	}

	return totalChars * 2 / 5
}

// isOverContextBudget checks whether the assembled messages plus tool definitions
// and output reserve would exceed the model's context window. This enables
// proactive compression before calling the LLM, rather than reacting to 400 errors.
func isOverContextBudget(
	contextWindow int,
	messages []providers.Message,
	toolDefs []providers.ToolDefinition,
	maxTokens int,
) bool {
	msgTokens := 0
	for _, m := range messages {
		msgTokens += estimateMessageTokens(m)
	}

	toolTokens := estimateToolDefsTokens(toolDefs)
	total := msgTokens + toolTokens + maxTokens

	return total > contextWindow
}
If an LLM or user generates a single massive message (or a massive tool response) that exceeds the context window on its own, the entire history is technically a single "Turn" (index 0). `findSafeBoundary` will correctly identify that there are no safe boundaries to split the sequence and return 0. By aborting compression entirely, the agent gets permanently stuck in a "Context Window Exceeded" loop because it can never shrink the context.
If a Turn boundary cannot be found, we might consider falling back to a hard split to ensure the system can recover, or resetting the context to the initial empty state. WDYT?
Good catch, you're right: the agent would get stuck retrying if a single Turn exceeds the window.
Fixed in c63c644: when `mid <= 0` (no safe Turn boundary), `forceCompression` now falls back to keeping only the most recent user message. This breaks Turn atomicity as a last resort, but guarantees recovery instead of looping.
The reactive path needs this especially since it passes `""` for UserMessage when rebuilding; without at least the latest user message in history, the LLM would get no user context at all.
@afjcjsbx thanks for the review!