refactor(agent): context boundary detection, proactive budget check, and safe compression by is-Xiaoen · Pull Request #1490 · sipeed/picoclaw

is-Xiaoen · 2026-03-13T06:21:31Z

Summary

Addresses track 6 of the agent refactor (#1439): context management boundaries, compression triggers, and token budgeting.

This PR fixes five concrete problems in the current context management:

ContextWindow defaults to MaxTokens — conflates input capacity with output generation limit, causing premature summarization or missed compression triggers (fix(agent): decouple context_window from max_tokens #556)
forceCompression can orphan tool pairs — slices at len/2 without checking whether the cut falls inside an assistant+ToolCalls → tool_result sequence (fix(agent): prevent history compression from orphaning tool_call/tool_result pairs #665)
forceCompression assumes history[0] is a system prompt — session history only stores user/assistant/tool messages; the system prompt is built dynamically by BuildMessages. The old code incorrectly skipped the first user message and appended a compression note to it.
Compression is reactive only — forceCompression runs after the LLM already rejected the request with a 400, wasting a billed call
Token estimation undercounts — estimateTokens only counted m.Content, ignoring ToolCalls arguments, ReasoningContent, and Media items

Changes

New file: docs/agent-refactor/context.md

Design document for Track 6, as called for in the agent-refactor README (suggested document split, item 5: "context scope, history, summary, compression"). Documents:

Context window region definitions and history budget formula
ContextWindow vs MaxTokens distinction
Session history contents (no system prompt stored)
Turn as the atomic compression unit ([Agent refactor] Event-driven agent loop with hooks, interrupts, and steering #1316)
Three compression paths and their ordering
Token estimation approach and limitations
Interface boundaries between budget functions and BuildMessages
Known gaps

New file: pkg/agent/context_budget.go

Uses the Turn concept from the agent refactor design (#1316) as the atomic unit for compression. A Turn is a complete "user input → LLM iterations → final response" cycle.

parseTurnBoundaries(history) — identifies each Turn start index in the session history
isSafeBoundary(history, index) — checks whether an index is at a Turn boundary
findSafeBoundary(history, targetIndex) — finds the nearest Turn boundary to a target index (prefers backward to keep more recent context)
estimateMessageTokens(msg) counts Content + ReasoningContent + ToolCalls (ID, type, name, arguments) + ToolCallID using a 2.5 chars/token heuristic; Media items get a fixed per-item overhead instead
estimateToolDefsTokens(defs) — estimates token cost of tool definitions (name + description + JSON schema)
isOverContextBudget(contextWindow, messages, toolDefs, maxTokens) — proactive budget check
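As a rough illustration of how the three boundary helpers fit together, here is a minimal Go sketch. It is not the PR's code. It assumes a hypothetical one-field `Message` type standing in for `providers.Message` and encodes one reading of the rules described in this PR. A Turn starts at every user message. `findSafeBoundary` prefers the nearest boundary at or before the target, but never index 0, since cutting there drops nothing. Failing that, it takes the first boundary after the target, and it returns 0 for a single-Turn history.

```go
package main

// Message is a hypothetical stand-in for providers.Message; only Role
// matters for Turn detection.
type Message struct {
	Role string // "user", "assistant", or "tool"
}

// parseTurnBoundaries returns the index of every message that starts a Turn.
// A Turn begins at a user message and runs until the next user message.
func parseTurnBoundaries(history []Message) []int {
	var starts []int
	for i, m := range history {
		if m.Role == "user" {
			starts = append(starts, i)
		}
	}
	return starts
}

// isSafeBoundary reports whether index is a Turn start, i.e. cutting the
// history there cannot split an assistant+ToolCalls -> tool sequence.
func isSafeBoundary(history []Message, index int) bool {
	return index >= 0 && index < len(history) && history[index].Role == "user"
}

// findSafeBoundary returns the Turn boundary nearest to targetIndex,
// preferring one at or before it (keeps more recent context).
func findSafeBoundary(history []Message, targetIndex int) int {
	starts := parseTurnBoundaries(history)
	// Closest boundary at or before the target, excluding index 0.
	for i := len(starts) - 1; i >= 0; i-- {
		if starts[i] <= targetIndex && starts[i] > 0 {
			return starts[i]
		}
	}
	// Otherwise, the first boundary after the target.
	for _, s := range starts {
		if s > targetIndex {
			return s
		}
	}
	// Single Turn: no safe cut exists, so 0 tells callers to skip.
	return 0
}
```

With a history such as user → assistant → tool → assistant → user → ..., a target that lands inside the tool chain resolves to the next Turn start rather than a mid-chain position.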

All are pure functions with no AgentLoop dependency — forward-compatible with the agent refactor.

pkg/agent/loop.go

Proactive budget check (step 1.5 in runAgentLoop): before calling the LLM, estimate total token cost of assembled messages + tool definitions + output reserve. If over budget, run forceCompression and rebuild messages. The reactive path stays as a fallback for estimation undershoots.
forceCompression: rewritten to drop the oldest half of Turns (not arbitrary messages). Uses parseTurnBoundaries to find Turn-aligned cut points. Fixed to work with actual session data (no system prompt in history). Compression note goes into session summary, not into history messages.
summarizeSession: use findSafeBoundary to align cut to nearest Turn boundary instead of hardcoded history[:len-4]
estimateTokens: delegate to estimateMessageTokens — counts ToolCalls, ReasoningContent, Media, not just Content
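The proactive check in step 1.5 reduces to one comparison. It adds estimated message tokens, tool-definition tokens, and the output reserve (max_tokens), then checks the sum against the context window. The sketch below shows that shape with hypothetical simplified `Message` and `ToolDef` types. It counts only `Content` for messages and applies the same 2.5 chars/token heuristic (chars * 2 / 5) to rune counts. The function names echo the PR's, but the types and internals here are assumptions.

```go
package main

import "unicode/utf8"

// Hypothetical simplified message type (the real estimator also counts
// ToolCalls, ReasoningContent, and Media).
type Message struct {
	Role    string
	Content string
}

// Hypothetical tool definition: name, description, and serialized JSON schema.
type ToolDef struct {
	Name        string
	Description string
	Parameters  string
}

// estimateMessagesTokens sums per-message estimates at 2.5 chars/token.
func estimateMessagesTokens(msgs []Message) int {
	total := 0
	for _, m := range msgs {
		total += utf8.RuneCountInString(m.Content) * 2 / 5
	}
	return total
}

// estimateToolDefsTokens estimates the cost of name + description + schema.
func estimateToolDefsTokens(defs []ToolDef) int {
	chars := 0
	for _, d := range defs {
		chars += utf8.RuneCountInString(d.Name) +
			utf8.RuneCountInString(d.Description) +
			utf8.RuneCountInString(d.Parameters)
	}
	return chars * 2 / 5
}

// isOverContextBudget reports whether input + tools + output reserve
// would exceed the model's context window.
func isOverContextBudget(contextWindow int, messages []Message, toolDefs []ToolDef, maxTokens int) bool {
	total := estimateMessagesTokens(messages) + estimateToolDefsTokens(toolDefs) + maxTokens
	return total > contextWindow
}
```

When the predicate returns true, the loop in this PR runs forceCompression and rebuilds the messages before the call. The reactive 400 handler remains as a backstop for estimation undershoots.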

pkg/agent/instance.go

Resolve ContextWindow independently from MaxTokens, with a 4x heuristic default when context_window is unset. For the default 32K max_tokens this gives 131,072 tokens (4 × 32,768), which is reasonable for modern models. If the heuristic overshoots a model's real limit, the reactive forceCompression path handles it.
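The context_window resolution is small enough to show in full. This sketch assumes an unset field arrives as 0 and uses a hypothetical `resolveContextWindow` helper; the PR's instance.go code may be structured differently.

```go
package main

// contextWindowMultiplier is the 4x heuristic applied when context_window is unset.
const contextWindowMultiplier = 4

// resolveContextWindow returns the configured context window, or
// maxTokens * 4 when the field is unset (zero or negative).
func resolveContextWindow(configured, maxTokens int) int {
	if configured > 0 {
		return configured
	}
	return maxTokens * contextWindowMultiplier
}
```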

pkg/config/config.go

Add context_window field to AgentDefaults (with env var PICOCLAW_AGENTS_DEFAULTS_CONTEXT_WINDOW)

config/config.example.json

Add context_window field with default 131072

Web UI (web/frontend/)

Add context_window input field to the configuration page (form model, section, save handler)
Add i18n strings (en/zh). Optional field — leaving it empty falls back to the 4x heuristic.

Design decisions

Turn as the atomic unit — compression operates on complete Turns ([Agent refactor] Event-driven agent loop with hooks, interrupts, and steering #1316), not individual messages. parseTurnBoundaries identifies Turn starts; forceCompression drops "the oldest half of Turns." This naturally prevents splitting tool-call sequences since each Turn is atomic.
Pure functions, no new types — follows the refactor's "minimum concepts" rule. No ContextBudget struct, no ContextManager interface.
Compression note in summary, not history — session history stores only real conversation messages. The compression note goes into the session summary, which BuildMessages already injects into the system prompt. This avoids corrupting the stored history.
4x heuristic for default context_window — conservative lower bound. A follow-up improvement could auto-detect from the provider, but the reactive path covers any mismatch.
No dependency on other refactor tracks — operates on []providers.Message and integer token counts. Independent of agent abstraction ([Agent refactor]what an Agent is #1218), event model ([Agent refactor] Event-driven agent loop with hooks, interrupts, and steering #1316), or capability model.

Issue mapping

Proposal in #1439	Status
A. Separate context_window from max_tokens	Done
B. Explicit history budget	Partial — proactive pre-call check uses full formula (messages + tool defs + output reserve); `maybeSummarize` still uses history-only percentage threshold against the now-correct `ContextWindow` base. Full budget-aware summarization trigger is a follow-up.
C. Proactive pre-call check	Done
D. Tool-pair-aware truncation	Done — Turn-based, aligned with #1316
E. ToolCalls in token estimation	Done (+ ReasoningContent, Media)

Test plan

54 test cases in context_budget_test.go covering all pure functions
parseTurnBoundaries tests: simple exchange, tool calls, chained tools, no user messages, leading non-user
Realistic session-shaped tests (no system in history, chained tools, reasoning content, media)
Single-Turn regression: findSafeBoundary returns 0 when only one Turn exists
Context retry integration test with realistic session data (no system message in history)
go build ./pkg/... — no compilation errors
golangci-lint run — no new lint issues
No changes to session storage format or API signatures — backward compatible

Closes #556
Closes #665
Ref #1439

cc @alexhoshina @yinwm

alexhoshina · 2026-03-13T07:36:14Z

hi @is-Xiaoen, I've looked at this PR and have a simple idea.
In #1316, we used the concept of a Turn to define a complete iteration of the agent. Could we use the Turn as a cut-off point? Would using the Turn as a cut-off point be simpler and more intuitive than heuristic searching?

is-Xiaoen · 2026-03-13T07:55:17Z

Good call — using the Turn as the atomic unit is cleaner than the raw heuristic scan. I've updated the implementation:

New function: parseTurnBoundaries(history)

Returns the starting index of each Turn in the session history. A Turn begins at a user message and extends through all subsequent assistant/tool messages until the next user message — matching the Turn definition from #1316.

How it's used:

findSafeBoundary now uses parseTurnBoundaries internally to locate the nearest Turn boundary, instead of scanning for Role == "user" directly. Same result, but the intent is explicit.
forceCompression drops the oldest half of Turns (not arbitrary messages):

turns := parseTurnBoundaries(history)
if len(turns) >= 2 {
    mid = turns[len(turns)/2]  // drop oldest half of Turns
}

This reads as "drop the oldest N Turns" rather than "find nearest user message to midpoint" — simpler and self-documenting.

The Turn-based approach also naturally handles chained tool calls within a single Turn (user → assistantTC → tool → assistantTC → tool → assistant), since the entire chain lives inside one Turn and is never split.
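To make "drop the oldest half of Turns" concrete, the sketch below applies the snippet above to a history whose first Turn contains a chained tool sequence. `dropOldestHalfOfTurns` and the one-field `Message` type are hypothetical. The real forceCompression also records a compression note in the session summary, which is omitted here.

```go
package main

// Message is a hypothetical stand-in for providers.Message.
type Message struct {
	Role string
}

// parseTurnBoundaries returns the index of each user message (Turn start).
func parseTurnBoundaries(history []Message) []int {
	var starts []int
	for i, m := range history {
		if m.Role == "user" {
			starts = append(starts, i)
		}
	}
	return starts
}

// dropOldestHalfOfTurns cuts at the start of Turn len(turns)/2, so the
// cut always lands on a user message and no tool chain is split.
func dropOldestHalfOfTurns(history []Message) (kept []Message, droppedTurns int) {
	turns := parseTurnBoundaries(history)
	if len(turns) < 2 {
		return history, 0 // single Turn: no safe cut point
	}
	half := len(turns) / 2
	return history[turns[half]:], half
}
```

For four Turns where the first is user → assistantTC → tool → assistantTC → tool → assistant, the cut lands on the third user message: the whole tool chain is dropped together and the two most recent Turns are kept intact.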

See commit 4eaa2ec for the full change.

is-Xiaoen · 2026-03-14T03:11:05Z

@alexhoshina Saw your mention in #1218 about combining this with the new context builder. I've added docs/agent-refactor/context.md covering the context window regions, Turn as the compression unit, how the three compression paths relate, and the interface boundaries between budget functions and the builder — should make alignment easier going forward.

The budget-related functions (parseTurnBoundaries, isOverContextBudget, etc.) are all pure: they take []providers.Message and int parameters with no AgentLoop dependency, so the new builder can call them directly without interface changes. Remaining gaps (e.g. a budget-aware summarization trigger) are documented in context.md and can be followed up once the builder direction stabilizes.

If anything needs adjusting to better fit the architecture you have in mind — how the proactive check integrates, the Turn boundary detection logic, interface signatures — happy to iterate. Will rebase onto refactor/agent once the branch is up.

…and safe compression

Separate context_window from max_tokens. They serve different purposes (input capacity vs output generation limit), and the previous conflation caused premature summarization or missed compression triggers.

Changes:
- Add context_window field to AgentDefaults config (default: 4x max_tokens)
- Extract boundary-safe truncation helpers (isSafeBoundary, findSafeBoundary) into context_budget.go as pure functions with no AgentLoop dependency
- forceCompression: align split to safe boundary so tool-call sequences (assistant+ToolCalls → tool results) are never torn apart
- summarizeSession: use findSafeBoundary instead of hardcoded keep-last-4
- estimateTokens: count ToolCalls arguments and ToolCallID metadata, not just Content, fixing systematic undercounting in tool-heavy sessions
- Add proactive context budget check before LLM call in runAgentLoop, preventing 400 context-length errors instead of reacting to them
- Add estimateToolDefsTokens for tool definition token cost

Closes sipeed#556, closes sipeed#665
Ref sipeed#1439

Session history (GetHistory) contains only user/assistant/tool messages. The system prompt is built dynamically by BuildMessages and is never stored in session. The previous code incorrectly treated history[0] as a system prompt, skipping the first user message and appending a compression note to it. Fix: operate on the full history slice, and record the compression note in the session summary (which BuildMessages already injects into the system prompt) rather than modifying any history message.

estimateMessageTokens now counts ReasoningContent (extended thinking / chain-of-thought) which can be substantial and is persisted in session history. Media items get a fixed per-item overhead (256 tokens) since actual cost depends on provider-specific image tokenization.

Add context_window to config.example.json, the web configuration page (form model, input field, save handler), and i18n strings (en/zh). The field is optional — leaving it empty falls back to the 4x max_tokens heuristic.

Add tests that reflect actual session data shape: history starts with user messages (no system prompt), includes chained tool-call sequences, reasoning content, and media items. Exercises the proactive budget check path with BuildMessages-style assembled messages.

Fixes prealloc lint warning by using make() with capacity hint.

Introduce parseTurnBoundaries() which identifies each Turn start index in the session history. A Turn is a complete "user input → LLM iterations → final response" cycle (as defined in the agent refactor design sipeed#1316). findSafeBoundary now uses Turn boundaries instead of raw role-scanning, making the intent explicit: "find the nearest Turn boundary." forceCompression drops the oldest half of Turns (not arbitrary messages), which is simpler and more intuitive. The Turn-based approach naturally prevents splitting tool-call sequences since each Turn is atomic.

Two estimation bugs fixed:

1. Media tokens were added to the chars accumulator before the chars*2/5 conversion, resulting in 256*2/5 = 102 tokens per item instead of 256. Fix: add media tokens directly to the final token count, bypassing the character-based heuristic.
2. estimateMessageTokens counted both tc.Name and tc.Function.Name for tool calls, but providers only send one (OpenAI-compat uses function.name, Anthropic uses tc.Name). Fix: count tc.Function.Name when Function is present, and fall back to tc.Name only otherwise.

Also fix i18n hint text: "auto-detect" was misleading, since the backend uses a 4x max_tokens heuristic, not actual model detection.
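Both estimation fixes show up in a single function. This Go sketch assumes hypothetical mirrors of the provider types: `ToolCall` with an optional `*FunctionCall`, and `Media` as a string slice. It also counts runes rather than bytes. Only the 2.5 chars/token ratio, the 256-token media overhead, and the name-selection rule come from the commit message.

```go
package main

import "unicode/utf8"

// Hypothetical mirrors of provider types; field names are assumptions.
type FunctionCall struct {
	Name      string
	Arguments string
}

type ToolCall struct {
	ID       string
	Type     string
	Name     string        // used when Function is nil
	Function *FunctionCall // OpenAI-compatible shape
}

type Message struct {
	Role             string
	Content          string
	ReasoningContent string
	ToolCalls        []ToolCall
	ToolCallID       string
	Media            []string
}

const mediaTokensPerItem = 256

func estimateMessageTokens(m Message) int {
	chars := utf8.RuneCountInString(m.Content) +
		utf8.RuneCountInString(m.ReasoningContent) +
		utf8.RuneCountInString(m.ToolCallID)
	for _, tc := range m.ToolCalls {
		chars += utf8.RuneCountInString(tc.ID) + utf8.RuneCountInString(tc.Type)
		if tc.Function != nil {
			// Count the name once: prefer Function.Name when present.
			chars += utf8.RuneCountInString(tc.Function.Name) +
				utf8.RuneCountInString(tc.Function.Arguments)
		} else {
			chars += utf8.RuneCountInString(tc.Name)
		}
	}
	tokens := chars * 2 / 5 // 2.5 chars per token
	// Media bypasses the char heuristic so each item costs the full 256.
	tokens += len(m.Media) * mediaTokensPerItem
	return tokens
}
```

With 10 characters of content and two media items this yields 4 + 2 × 256 = 516 tokens. Routing the media allowance through the character heuristic, as the pre-fix code did, would have counted only about 102 tokens per item.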

When the entire history is a single Turn (one user message followed by tool calls and responses, no subsequent user message), the only Turn boundary is at index 0. Previously the fallback returned targetIndex, which could land on a tool or assistant message — splitting the Turn. Return 0 instead, so callers (forceCompression, summarizeSession) see mid <= 0 and skip compression rather than cutting inside the Turn.

Session history only stores user/assistant/tool messages — the system prompt is built dynamically by BuildMessages. Remove the incorrect system message from TestAgentLoop_ContextExhaustionRetry test data to match the real data model that forceCompression operates on.

Document the semantic boundaries of context management as called for in the agent-refactor README (suggested document split, item 5):
- context window region definitions and history budget formula
- ContextWindow vs MaxTokens distinction
- session history contents (no system prompt stored)
- Turn as the atomic compression unit (sipeed#1316)
- three compression paths and their ordering
- token estimation approach and its limitations
- interface boundaries between budget functions and BuildMessages

Also documents known gaps: summarization trigger not using the full budget formula, heuristic-only token estimation, and reactive retry not preserving media references.

Ref sipeed#1439

is-Xiaoen · 2026-03-16T06:50:31Z

Rebased onto refactor/agent now that the branch is up — sits cleanly on top of steering (#1517), no conflicts. Updated the PR base branch accordingly.

CI should stay green. Ready for review when you get a chance.

afjcjsbx · 2026-03-16T21:12:03Z

+		}
+	}
+
+	// No Turn boundary after targetIndex either. The only boundary is at


If an LLM or user generates a single massive message (or a massive tool response) that exceeds the context window on its own, the entire history is technically a single "Turn" (index 0). findSafeBoundary will correctly identify that there are no safe boundaries to split the sequence and return 0. By aborting compression entirely, the agent gets permanently stuck in a "Context Window Exceeded" loop because it can never shrink the context.
If a Turn boundary cannot be found, we could fall back to a hard split so the system can recover, or reset the context to the initial empty state. WDYT?

Good catch — you're right, the agent would get stuck retrying if a single Turn exceeds the window.

Fixed in c63c644: when mid <= 0 (no safe Turn boundary), forceCompression now falls back to keeping only the most recent user message. This breaks Turn atomicity as a last resort, but guarantees recovery instead of looping.

The reactive path needs this especially since it passes "" for UserMessage when rebuilding — without at least the latest user message in history, the LLM would get no user context at all.
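The recovery rule from c63c644 can be sketched as a post-processing step on whatever cut index the Turn logic produced. `applyCompressionCut` is a hypothetical helper. Returning nil when the history contains no user message at all is an assumption of this sketch, not behavior described in the PR.

```go
package main

// Message is a hypothetical stand-in for providers.Message.
type Message struct {
	Role    string
	Content string
}

// applyCompressionCut keeps history[mid:] when mid is a usable Turn
// boundary. Otherwise (single oversized Turn) it falls back to keeping only
// the most recent user message, breaking Turn atomicity so the agent can
// still recover instead of retrying forever.
func applyCompressionCut(history []Message, mid int) []Message {
	if mid > 0 && mid < len(history) {
		return history[mid:]
	}
	for i := len(history) - 1; i >= 0; i-- {
		if history[i].Role == "user" {
			return []Message{history[i]}
		}
	}
	return nil // assumption: no user message at all, reset to empty
}
```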

@afjcjsbx thanks for the review!

afjcjsbx · 2026-03-16T21:21:04Z

I reviewed the PR and I like both the ideas and the clear, clean implementation. I left you a note, but as far as I'm concerned we can merge. Thank you!

When the entire session history is a single Turn (e.g. one user message followed by a massive tool response), findSafeBoundary returns 0 and forceCompression previously did nothing — leaving the agent stuck in a context-exceeded retry loop. Now falls back to keeping only the most recent user message when no safe Turn boundary exists. This breaks Turn atomicity as a last resort but guarantees the agent can recover. Also updates docs/agent-refactor/context.md to document this behavior. Ref sipeed#1490


sipeed-bot added labels type: bug, domain: agent, domain: config, and go on Mar 13, 2026

alexhoshina mentioned this pull request Mar 13, 2026

[Agent refactor]what an Agent is #1218

Closed

xuwei-xy pushed a commit to xuwei-xy/picoclaw that referenced this pull request Mar 14, 2026

Merge PR sipeed#1490

1c26c79

xuwei-xy mentioned this pull request Mar 14, 2026

fix: merge PR #1500 #1490 #1488 #1487 #1485 #1545

Closed

is-Xiaoen added 12 commits March 16, 2026 14:48

fix(agent): preallocate messages slice in budget test

efd4032

Fixes prealloc lint warning by using make() with capacity hint.

style(agent): fix gci comment alignment in test

7c1a1c2

is-Xiaoen force-pushed the refactor/context-boundary branch from e0aad04 to 08259d7 on March 16, 2026 06:49

is-Xiaoen changed the base branch from main to refactor/agent March 16, 2026 06:49

afjcjsbx reviewed Mar 16, 2026


afjcjsbx approved these changes Mar 16, 2026


afjcjsbx merged commit 5e92a38 into sipeed:refactor/agent Mar 17, 2026
4 checks passed

This was referenced Mar 18, 2026

[Agent refactor] Context management: boundaries, compression, and token budgeting #1439

Closed

Meta: Agent refactor #1216

Closed

refactor(agent): consolidate Agent model - Phase 1 complete #1894

Merged

andressg79 pushed a commit to andressg79/picoclaw that referenced this pull request Mar 30, 2026

Merge pull request sipeed#1490 from is-Xiaoen/refactor/context-boundary

d78ac5e

refactor(agent): context boundary detection, proactive budget check, and safe compression

is-Xiaoen mentioned this pull request Apr 14, 2026

Phase 2 Implementation Plan: Agent Discovery → Delegation #2148

Open

afjcjsbx merged 13 commits into sipeed:refactor/agent from is-Xiaoen:refactor/context-boundary

3 participants
