fix(agent): prevent history compression from orphaning tool_call/tool_result pairs by nikolasdehor · Pull Request #665 · sipeed/picoclaw

nikolasdehor · 2026-02-23T00:45:12Z

Summary

Adds sanitizeToolPairs() helper that removes orphaned tool_call/tool_result messages after history compression
Applied in both forceCompression() and summarizeSession() paths
Prevents Anthropic API 400 errors caused by orphaned tool pairs (tool_use ids were provided that do not have a tool_use block)

Root Cause

When conversation history exceeds context limits, forceCompression() splits at the midpoint (mid := len(conversation) / 2) and summarizeSession() keeps only the last 4 messages via TruncateHistory(key, 4). Neither checks whether the cut point falls between a tool_call and its tool_result, creating orphaned pairs that violate Anthropic's API contract.

The existing partial fix by Diegox-17 in context.go only strips orphaned tool messages from the very start of history, missing orphaned assistant messages with ToolCalls at the end or broken pairs created by compression.

Fix

sanitizeToolPairs() scans for tool_call IDs and tool_result IDs, then filters:

Orphaned tool results (no matching tool_call) are removed
Orphaned assistant tool_calls (missing results) with text content keep the text but drop the tool_calls
Orphaned assistant tool_calls without text content are removed entirely
Complete pairs are preserved unchanged

Test Plan

Closes #475

Copilot

Pull request overview

This PR aims to prevent provider (notably Anthropic) 400s during history compression/truncation by sanitizing conversation history so tool_call ↔ tool_result pairs aren’t orphaned after forceCompression() and summarizeSession() modify session history.

Changes:

Added sanitizeToolPairs() helper to remove/adjust orphaned tool-related messages after history cuts.
Integrated sanitization into forceCompression() and the summarizeSession() truncation path.
Added unit tests covering a variety of tool-pair boundary scenarios.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
pkg/agent/sanitize.go	Introduces `sanitizeToolPairs()` and logging around removed/modified messages.
pkg/agent/sanitize_test.go	Adds unit tests for sanitization behavior across multiple tool-pair scenarios.
pkg/agent/loop.go	Calls sanitization after forced compression and after truncating history post-summary.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-23T00:48:04Z

pkg/agent/loop.go

+		if len(sanitized) != len(remaining) {
+			agent.Sessions.SetHistory(sessionKey, sanitized)
+		}


In summarizeSession, the sanitized history is only written back when the slice length changes. sanitizeToolPairs can modify messages without changing length (e.g., stripping ToolCalls but keeping assistant text), so this condition can silently keep invalid history and still trigger provider 400s. Consider always calling SetHistory after sanitization, or have sanitizeToolPairs return a (sanitized, changed) flag and check that instead of len().

Suggested change

if len(sanitized) != len(remaining) {

agent.Sessions.SetHistory(sessionKey, sanitized)

}

agent.Sessions.SetHistory(sessionKey, sanitized)

You are right -- the len() comparison misses in-place modifications. sanitizeToolPairs can strip ToolCalls from an assistant message while preserving the text content, which changes the message without changing the slice length. Will fix this to always call SetHistory after sanitization, or track a modified flag alongside the length check.

Copilot · 2026-02-23T00:48:04Z

pkg/agent/sanitize.go

+	// Build set of tool_call IDs present in assistant messages
+	toolCallIDs := make(map[string]bool)
+	for _, m := range messages {
+		if m.Role == "assistant" && len(m.ToolCalls) > 0 {
+			for _, tc := range m.ToolCalls {
+				if tc.ID != "" {
+					toolCallIDs[tc.ID] = true
+				}
+			}
+		}
+	}
+
+	// Build set of tool_result IDs present
+	toolResultIDs := make(map[string]bool)
+	for _, m := range messages {
+		if m.Role == "tool" && m.ToolCallID != "" {
+			toolResultIDs[m.ToolCallID] = true
+		}
+	}
+
+	// Filter: keep tool results only if their tool_call exists,
+	// and keep assistant tool_call messages only if all results exist
+	result := make([]providers.Message, 0, len(messages))
+	removed := 0
+
+	for _, m := range messages {
+		switch {
+		case m.Role == "tool" && m.ToolCallID != "":
+			// Keep tool result only if its tool_call is present
+			if toolCallIDs[m.ToolCallID] {
+				result = append(result, m)
+			} else {
+				removed++
+				logger.DebugCF("agent", "sanitizeToolPairs: removing orphaned tool result",
+					map[string]interface{}{"tool_call_id": m.ToolCallID})
+			}


sanitizeToolPairs currently decides whether to keep a tool result by checking if the ToolCallID exists anywhere in the input history (toolCallIDs), not whether the corresponding assistant ToolCalls message is retained and appears before the tool result in the sanitized output. This can keep tool results that come before their tool_call (after truncation) and can also keep tool results for an assistant tool_call message that the function later drops/strips, producing new orphans mid-history and still violating the provider contract. Recommend making the sanitization order-aware (pair tool results only with a preceding retained assistant tool_calls message) and ensuring that if an assistant tool_calls message is removed/stripped, any tool results for its IDs are removed as well (or alternatively keep only the subset of ToolCalls that actually have results and drop the rest consistently).

This is the most substantive feedback -- thank you. You are correct that the current implementation is not fully order-aware: it builds a set of all tool_call IDs across the entire history and then uses membership to decide whether to keep tool results. After truncation, a tool result could appear before its corresponding tool_call, or the tool_call message could be removed while its results are kept.

I will refactor sanitizeToolPairs to be a single forward pass that:

Tracks which tool_call IDs have been seen from retained assistant messages

Only keeps tool results whose IDs match a previously-seen, retained assistant tool_call

If an assistant tool_call message is stripped/dropped, also drops any subsequent tool results for those IDs

This makes the algorithm order-aware and eliminates the category of bugs you described.

Copilot · 2026-02-23T00:48:04Z

pkg/agent/sanitize.go

+			} else if m.Content != "" {
+				// Keep the text content but strip the tool calls
+				removed++
+				logger.DebugCF("agent", "sanitizeToolPairs: stripping orphaned tool_calls from assistant message, keeping text content",
+					map[string]interface{}{"tool_call_count": len(m.ToolCalls)})
+				result = append(result, providers.Message{
+					Role:    "assistant",
+					Content: m.Content,
+				})


The removed_count bookkeeping is inaccurate: in the branch that strips ToolCalls but keeps the assistant message text, removed++ is incremented even though no message is removed (only modified). This makes the warning log misleading and complicates debugging/monitoring. Consider tracking removed vs modified separately (or only incrementing removed when the message is actually dropped).

Fair point. The removed_count is misleading when the message is modified in-place rather than dropped. Will split the logging into removed (message actually dropped from the slice) vs modified (message kept but ToolCalls stripped) for clearer diagnostics.

Copilot · 2026-02-23T00:48:05Z

pkg/agent/sanitize_test.go

+	// The assistant msg is dropped (not all tool_calls have results).
+	// call_1's tool result stays because toolCallIDs includes call_1
+	// from the original scan. This is a secondary orphan handled by
+	// the existing Diegox-17 fix in BuildMessages (strips leading tool msgs).
+	if len(result) != 3 {
+		t.Errorf("expected 3 messages, got %d", len(result))
+	}
+	if result[0].Role != "user" {
+		t.Errorf("expected first message to be user, got %s", result[0].Role)
+	}


TestSanitizeToolPairs_MultiToolCallPartialResults encodes behavior where the assistant tool_call message is dropped but tool results for its IDs are still kept (comment says it relies on the older BuildMessages leading-tool-strip). This leaves an orphaned tool result in the middle of history, which is exactly what sanitizeToolPairs is intended to prevent, so the test currently allows the core bug to persist/regress. After fixing sanitizeToolPairs to remove tool results when their corresponding tool_call message is removed/stripped (and to be order-aware), update this test to assert the fully sanitized output rather than depending on BuildMessages’ unrelated leading-tool cleanup.

Suggested change

// The assistant msg is dropped (not all tool_calls have results).

// call_1's tool result stays because toolCallIDs includes call_1

// from the original scan. This is a secondary orphan handled by

// the existing Diegox-17 fix in BuildMessages (strips leading tool msgs).

if len(result) != 3 {

t.Errorf("expected 3 messages, got %d", len(result))

}

if result[0].Role != "user" {

t.Errorf("expected first message to be user, got %s", result[0].Role)

}

// The assistant msg is dropped because not all tool_calls have results.

// Any tool results corresponding to that assistant's tool_calls should

// also be dropped, so no orphaned tool messages remain.

if len(result) != 2 {

t.Errorf("expected 2 messages after sanitization, got %d", len(result))

}

if result[0].Role != "user" || result[0].Content != "Do two things" {

t.Errorf("unexpected first message: role=%s content=%q", result[0].Role, result[0].Content)

}

if result[1].Role != "user" || result[1].Content != "OK" {

t.Errorf("unexpected second message: role=%s content=%q", result[1].Role, result[1].Content)

}

Agreed -- this test currently encodes the "leaky" behavior where orphaned tool results are left for BuildMessages to clean up. After refactoring sanitizeToolPairs to be order-aware (per the comment above), this test should assert that the tool result is also dropped when its corresponding assistant tool_call message is removed, rather than relying on a downstream cleanup layer.

Will update the test expectations as part of the refactor.

Copilot · 2026-02-23T00:48:05Z

pkg/agent/sanitize_test.go

+		t.Errorf("expected 4 messages, got %d", len(result))
+	}


TestSanitizeToolPairs_CompletePairs only asserts the resulting length, so it would still pass if sanitizeToolPairs accidentally strips ToolCalls or drops the tool result while keeping four messages. Adding assertions that the assistant ToolCalls and matching tool ToolCallID are preserved would make this test actually validate the intended invariant.

Suggested change

t.Errorf("expected 4 messages, got %d", len(result))

}

t.Fatalf("expected 4 messages, got %d", len(result))

}

// Verify roles and contents are preserved in order

if result[0].Role != "user" || result[0].Content != "What is the weather?" {

t.Errorf("unexpected first message: %#v", result[0])

}

// Assistant tool_call message should still have the same ToolCalls

assistantWithTool := result[1]

if assistantWithTool.Role != "assistant" {

t.Errorf("expected second message role=assistant, got %s", assistantWithTool.Role)

}

if len(assistantWithTool.ToolCalls) != 1 {

t.Fatalf("expected second message to have 1 ToolCall, got %d", len(assistantWithTool.ToolCalls))

}

if assistantWithTool.ToolCalls[0].ID != "call_1" {

t.Errorf("expected ToolCall ID 'call_1', got %q", assistantWithTool.ToolCalls[0].ID)

}

if assistantWithTool.ToolCalls[0].Name != "get_weather" {

t.Errorf("expected ToolCall Name 'get_weather', got %q", assistantWithTool.ToolCalls[0].Name)

}

// Tool result should still be present and linked via ToolCallID

toolMsg := result[2]

if toolMsg.Role != "tool" {

t.Errorf("expected third message role=tool, got %s", toolMsg.Role)

}

if toolMsg.ToolCallID != "call_1" {

t.Errorf("expected ToolCallID 'call_1', got %q", toolMsg.ToolCallID)

}

if toolMsg.Content != `{"temp": 72}` {

t.Errorf("unexpected tool content: %q", toolMsg.Content)

}

// Final assistant response should be preserved

finalAssistant := result[3]

if finalAssistant.Role != "assistant" {

t.Errorf("expected fourth message role=assistant, got %s", finalAssistant.Role)

}

if finalAssistant.Content != "The temperature is 72 degrees." {

t.Errorf("unexpected final assistant content: %q", finalAssistant.Content)

}

Good point -- length-only assertions are brittle. Will add full content assertions (role, content, ToolCalls, ToolCallID) to TestSanitizeToolPairs_CompletePairs to verify the complete-pair scenario preserves all fields correctly.

nikolasdehor · 2026-02-28T03:51:47Z

Update: PR #730 (merged by winterfx) addresses the same class of bug in sanitizeHistoryForProvider() at the context.go layer. Our fix remains valuable because:

Defense in depth: PR fix: multi-tool-call results incorrectly dropped causing LLM API 400 #730 sanitizes at query time (BuildMessages), but corrupted history is still persisted to the session store. Our fix prevents the corruption from being stored in the first place.
Covers summarizeSession(): Neither fix: multi-tool-call results incorrectly dropped causing LLM API 400 #730 nor the new PR fix: preserve tool call/response pairing in forceCompression #871 addresses the TruncateHistory(key, 4) path in summarizeSession(), which can equally orphan tool pairs.
Handles partial assistant messages: Our sanitizeToolPairs() correctly handles assistant messages with both text content AND tool_calls (strips tool_calls, preserves text). PR fix: preserve tool call/response pairing in forceCompression #871's removeOrphanedToolMessages() only strips leading tool messages and does not handle orphaned assistant messages with ToolCalls.

This PR and #730 are complementary layers — one prevents corruption at write time, the other at read time.

…_result pairs When forceCompression() or summarizeSession() truncates conversation history, the cut point can fall between a tool_call and its tool_result, creating orphaned pairs that cause Anthropic API 400 errors. Add sanitizeToolPairs() to remove orphaned tool pairs after compression: - Tool results without matching tool_calls are removed - Assistant messages with tool_calls missing results are stripped or removed - Assistant messages with both text content and orphaned tool_calls keep the text but drop the tool_calls Applied in both forceCompression() and summarizeSession() paths. Closes sipeed#475

- Track retained tool_call IDs to eliminate secondary orphans: when an assistant message is dropped, its tool results are also dropped - Split removed/modified counters for accurate logging - Always call SetHistory after sanitization (not conditional on len) - Add full field assertions to CompletePairs test - Fix MultiToolCallPartialResults test: secondary orphans now cleaned

CLAassistant · 2026-03-05T15:07:22Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

…and safe compression Separate context_window from max_tokens — they serve different purposes (input capacity vs output generation limit). The previous conflation caused premature summarization or missed compression triggers. Changes: - Add context_window field to AgentDefaults config (default: 4x max_tokens) - Extract boundary-safe truncation helpers (isSafeBoundary, findSafeBoundary) into context_budget.go — pure functions with no AgentLoop dependency - forceCompression: align split to safe boundary so tool-call sequences (assistant+ToolCalls → tool results) are never torn apart - summarizeSession: use findSafeBoundary instead of hardcoded keep-last-4 - estimateTokens: count ToolCalls arguments and ToolCallID metadata, not just Content — fixes systematic undercounting in tool-heavy sessions - Add proactive context budget check before LLM call in runAgentLoop, preventing 400 context-length errors instead of reacting to them - Add estimateToolDefsTokens for tool definition token cost Closes sipeed#556, closes sipeed#665 Ref sipeed#1439

Copilot AI review requested due to automatic review settings February 23, 2026 00:45

Copilot started reviewing on behalf of nikolasdehor February 23, 2026 00:45 View session

Copilot AI reviewed Feb 23, 2026

View reviewed changes

nikolasdehor mentioned this pull request Feb 28, 2026

fix: preserve tool call/response pairing in forceCompression #871

Open

sipeed-bot bot added type: bug Something isn't working domain: agent labels Mar 3, 2026

nikolasdehor added 2 commits March 3, 2026 01:20

nikolasdehor force-pushed the fix/tool-pair-orphan-475 branch from 67409f3 to 52e510b Compare March 3, 2026 04:24

This was referenced Mar 3, 2026

🦞 OpenClaw 生态日报 2026-03-03 duanyytop/agents-radar#51

Closed

🦞 OpenClaw 生态日报 2026-03-03 duanyytop/agents-radar#66

Open

is-Xiaoen mentioned this pull request Mar 12, 2026

[Agent refactor] Context management: boundaries, compression, and token budgeting #1439

Closed

is-Xiaoen mentioned this pull request Mar 13, 2026

refactor(agent): context boundary detection, proactive budget check, and safe compression #1490

Merged

8 tasks

yinwm closed this in 9c82b0b Mar 22, 2026

-	// The assistant msg is dropped (not all tool_calls have results).
-	// call_1's tool result stays because toolCallIDs includes call_1
-	// from the original scan. This is a secondary orphan handled by
-	// the existing Diegox-17 fix in BuildMessages (strips leading tool msgs).
-	if len(result) != 3 {
-		t.Errorf("expected 3 messages, got %d", len(result))
-	}
-	if result[0].Role != "user" {
-		t.Errorf("expected first message to be user, got %s", result[0].Role)
-	}
+	// The assistant msg is dropped because not all tool_calls have results.
+	// Any tool results corresponding to that assistant's tool_calls should
+	// also be dropped, so no orphaned tool messages remain.
+	if len(result) != 2 {
+		t.Errorf("expected 2 messages after sanitization, got %d", len(result))
+	}
+	if result[0].Role != "user" || result[0].Content != "Do two things" {
+		t.Errorf("unexpected first message: role=%s content=%q", result[0].Role, result[0].Content)
+	}
+	if result[1].Role != "user" || result[1].Content != "OK" {
+		t.Errorf("unexpected second message: role=%s content=%q", result[1].Role, result[1].Content)
+	}

-		t.Errorf("expected 4 messages, got %d", len(result))
-	}
+		t.Fatalf("expected 4 messages, got %d", len(result))
+	}
+	// Verify roles and contents are preserved in order
+	if result[0].Role != "user" || result[0].Content != "What is the weather?" {
+		t.Errorf("unexpected first message: %#v", result[0])
+	}
+	// Assistant tool_call message should still have the same ToolCalls
+	assistantWithTool := result[1]
+	if assistantWithTool.Role != "assistant" {
+		t.Errorf("expected second message role=assistant, got %s", assistantWithTool.Role)
+	}
+	if len(assistantWithTool.ToolCalls) != 1 {
+		t.Fatalf("expected second message to have 1 ToolCall, got %d", len(assistantWithTool.ToolCalls))
+	}
+	if assistantWithTool.ToolCalls[0].ID != "call_1" {
+		t.Errorf("expected ToolCall ID 'call_1', got %q", assistantWithTool.ToolCalls[0].ID)
+	}
+	if assistantWithTool.ToolCalls[0].Name != "get_weather" {
+		t.Errorf("expected ToolCall Name 'get_weather', got %q", assistantWithTool.ToolCalls[0].Name)
+	}
+	// Tool result should still be present and linked via ToolCallID
+	toolMsg := result[2]
+	if toolMsg.Role != "tool" {
+		t.Errorf("expected third message role=tool, got %s", toolMsg.Role)
+	}
+	if toolMsg.ToolCallID != "call_1" {
+		t.Errorf("expected ToolCallID 'call_1', got %q", toolMsg.ToolCallID)
+	}
+	if toolMsg.Content != `{"temp": 72}` {
+		t.Errorf("unexpected tool content: %q", toolMsg.Content)
+	}
+	// Final assistant response should be preserved
+	finalAssistant := result[3]
+	if finalAssistant.Role != "assistant" {
+		t.Errorf("expected fourth message role=assistant, got %s", finalAssistant.Role)
+	}
+	if finalAssistant.Content != "The temperature is 72 degrees." {
+		t.Errorf("unexpected final assistant content: %q", finalAssistant.Content)
+	}

Conversation

nikolasdehor commented Feb 23, 2026

Summary

Root Cause

Fix

Test Plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

nikolasdehor Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

nikolasdehor Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

nikolasdehor Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

nikolasdehor Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

Uh oh!

nikolasdehor Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

nikolasdehor commented Feb 28, 2026

Uh oh!

CLAassistant commented Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants