Bug Report: Parallel Tool Execution History Cleanup Mismatch #761

@luxunxiansheng

Summary

The aworld framework raises a tool_calls mismatch error because of overly aggressive memory cleanup logic in LLMAgent. When a tool or sub-agent returns an Observation that is not explicitly flagged as a tool result (is_tool_result=True), the calling agent deletes its last assistant message, which contains the tool_calls, before the framework can reconcile the results.

Symptoms

  • AWorldRuntimeException: tool_calls mismatch! <call_id> not found in {}
  • Agent execution crashes immediately after a tool call or agent handoff completes.
  • Logs show "deleted tool call messages from memory" immediately following a tool/agent action completion.

Root Cause Analysis

In aworld/agents/llm_agent.py, the _clean_redundant_tool_call_messages method is triggered during async_messages_transform if observation.is_tool_result is False.

# aworld/agents/llm_agent.py
def _clean_redundant_tool_call_messages(self, histories: List[MemoryItem]) -> None:
    try:
        for i in range(len(histories) - 1, -1, -1):
            his = histories[i]
            if his.metadata and "tool_calls" in his.metadata and his.metadata['tool_calls']:
                logger.info(f"Agent {self.id()} deleted tool call messages from memory: {his}")
                MemoryFactory.instance().delete(his.id)
            else:
                break
    # Excerpt ends here; the except clause that pairs with the try above is not shown.

This cleanup logic assumes that if the current Observation is not a tool result, any pending tool_calls in the most recent history are "leaked" or redundant and should be removed.

However, several framework-level interactions do not consistently set is_tool_result=True:

  1. Custom Tools: Many tool implementations return Observation(content=...) without setting the is_tool_result flag.
  2. WorkflowRunner Handoffs: When an agent delegates to another agent via WorkflowRunner._agent, the resulting observation of the sub-task completion is not flagged as a tool result.

This causes the caller agent to delete the memory of the call it just made, leading to a mismatch when the framework attempts to append the result to the (now deleted) call.
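The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not aworld code: Item is a hypothetical stand-in for MemoryItem, and clean_trailing_tool_calls and pending_call_ids are illustrative helpers that only mimic the behavior described in this report.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    """Hypothetical stand-in for aworld's MemoryItem (assumption, not the real class)."""
    id: str
    metadata: Dict = field(default_factory=dict)


def clean_trailing_tool_calls(histories: List[Item]) -> List[Item]:
    """Mimics the cleanup loop: drop trailing items that carry tool_calls."""
    kept = list(histories)
    for i in range(len(kept) - 1, -1, -1):
        if kept[i].metadata.get("tool_calls"):
            del kept[i]
        else:
            break
    return kept


def pending_call_ids(histories: List[Item]) -> Dict[str, Item]:
    """Maps each tool call id still in memory to the message that issued it."""
    return {
        call["id"]: item
        for item in histories
        for call in item.metadata.get("tool_calls", [])
    }


# The assistant has just issued one tool call.
history = [
    Item("m1"),
    Item("m2", {"tool_calls": [{"id": "call_1"}]}),
]

is_tool_result = False  # what an unflagged Observation reports
if not is_tool_result:
    history = clean_trailing_tool_calls(history)

remaining = pending_call_ids(history)
print("call_1" in remaining, remaining)  # False {}
```

The empty mapping at the end corresponds to the `not found in {}` part of the exception message: by the time the result arrives, no pending call is left to attach it to.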

Suggested Technical Fixes

1. WorkflowRunner Update

Ensure that agent handoffs are correctly identified as tool results when they return to the caller.

  • Location: aworld/runners/call_driven_runner.py
  • Change: Set is_tool_result=True in the Observation created after a sub-agent task completes.

2. Tool Implementation Pattern

The documentation of the Observation class should state explicitly that is_tool_result must be set to True by any tool whose result answers a pending tool call, whether the calls are issued in parallel or in sequence.
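One way to make the flag hard to forget is to route every tool result through a small helper. This is a sketch under assumptions: the Observation below is a hypothetical stand-in that models only the two fields named in this report (content and is_tool_result, with the flag assumed to default to False), and tool_observation is an illustrative helper, not an existing aworld function.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Observation:
    """Hypothetical stand-in with only the two fields named in this report."""
    content: Any = None
    is_tool_result: bool = False  # assumed default, based on the behavior described


def tool_observation(content: Any) -> Observation:
    """Hypothetical helper: builds an Observation that is always flagged as a tool result."""
    return Observation(content=content, is_tool_result=True)


unflagged = Observation(content="42")  # the pattern that triggers the cleanup
flagged = tool_observation("42")       # the pattern this report recommends
print(unflagged.is_tool_result, flagged.is_tool_result)  # False True
```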

3. Cleanup Logic Guard

The _clean_redundant_tool_call_messages logic in LLMAgent should be more conservative: before deleting a message, it should run a secondary check confirming that the framework is not still waiting on results for that message's tool_calls.
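A possible shape for such a guard is sketched below. It assumes the agent can obtain a set of call ids that are still awaiting results; that set (in_flight), the Item class, and clean_with_guard are all hypothetical names for illustration and do not exist in aworld.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Item:
    """Hypothetical stand-in for aworld's MemoryItem (assumption, not the real class)."""
    id: str
    metadata: Dict = field(default_factory=dict)


def clean_with_guard(histories: List[Item], in_flight: Set[str]) -> List[str]:
    """Returns ids of trailing tool-call messages that are safe to delete.

    Messages whose tool_calls include an id in `in_flight` are kept,
    because a result for them is still expected.
    """
    to_delete: List[str] = []
    for item in reversed(histories):
        calls = item.metadata.get("tool_calls") or []
        if not calls:
            break  # same stop condition as the original loop
        if any(call["id"] in in_flight for call in calls):
            continue  # guard: do not delete a call that is being processed
        to_delete.append(item.id)
    return to_delete


history = [
    Item("m1"),
    Item("m2", {"tool_calls": [{"id": "call_0"}]}),  # stale call
    Item("m3", {"tool_calls": [{"id": "call_1"}]}),  # still awaiting its result
]
print(clean_with_guard(history, in_flight={"call_1"}))  # ['m2']
```

With an empty in_flight set the function selects the same messages as the unguarded loop, so the guard only changes behavior while a result is pending.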

Verification

The mismatch error is resolved by ensuring that all components (tools and runners) that return results back to an LLMAgent explicitly set is_tool_result=True.
