# Bug Report: Parallel Tool Execution History Cleanup Mismatch

## Summary
The `aworld` framework raises a `tool_calls mismatch` error because of aggressive memory-cleanup logic in `LLMAgent`. When a tool or sub-agent returns an `Observation` that is not explicitly flagged with `is_tool_result=True`, the calling agent deletes its last assistant message (the one containing the `tool_calls`) before the framework can reconcile the results.

## Symptoms
- `AWorldRuntimeException: tool_calls mismatch! <call_id> not found in {}`
- Agent execution crashes immediately after a tool call or agent handoff completes.
- Logs show `"deleted tool call messages from memory"` immediately following a tool/agent action completion.

## Root Cause Analysis
In `aworld/agents/llm_agent.py`, the `_clean_redundant_tool_call_messages` method is triggered during `async_messages_transform` if `observation.is_tool_result` is `False`.

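```python
# aworld/agents/llm_agent.py (excerpt)
def _clean_redundant_tool_call_messages(self, histories: List[MemoryItem]) -> None:
    try:
        # Walk the history backwards, starting from the most recent message.
        for i in range(len(histories) - 1, -1, -1):
            his = histories[i]
            # Delete every trailing message that still carries tool_calls.
            if his.metadata and "tool_calls" in his.metadata and his.metadata['tool_calls']:
                logger.info(f"Agent {self.id()} deleted tool call messages from memory: {his}")
                MemoryFactory.instance().delete(his.id)
            else:
                # Stop at the first message without tool_calls.
                break
    # (the matching except clause is not shown in this excerpt)
```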

This cleanup logic assumes that if the current `Observation` is not a tool result, any pending `tool_calls` in the most recent history are "leaked" or redundant and should be removed.

However, at least two kinds of interaction do not consistently set `is_tool_result=True`:
1.  **Custom Tools**: Many tool implementations return `Observation(content=...)` without setting the `is_tool_result` flag.
2.  **WorkflowRunner Handoffs**: When an agent delegates to another agent via `WorkflowRunner._agent`, the resulting observation of the sub-task completion is not flagged as a tool result.

This causes the caller agent to delete the memory of the call it just made, leading to a mismatch when the framework attempts to append the result to the (now deleted) call.
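
The failure sequence described above can be reproduced in isolation. The following is a minimal sketch, not `aworld` code: `Message`, `clean_redundant_tool_calls`, `pending_calls`, and `handle_observation` are hypothetical stand-ins, and `RuntimeError` stands in for `AWorldRuntimeException`. It only models the assumption that results are matched against `tool_calls` still present in history.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Message:
    """Hypothetical stand-in for a memory item; only `metadata` matters here."""
    id: str
    metadata: Dict[str, List[str]] = field(default_factory=dict)


def clean_redundant_tool_calls(history: List[Message]) -> List[str]:
    """Mimic the cleanup: drop trailing messages that carry tool_calls."""
    deleted = []
    while history and history[-1].metadata.get("tool_calls"):
        deleted.append(history.pop().id)
    return deleted


def pending_calls(history: List[Message]) -> Dict[str, str]:
    """Map each call id still in history to the message that issued it."""
    return {
        call_id: msg.id
        for msg in history
        for call_id in msg.metadata.get("tool_calls", [])
    }


def handle_observation(history: List[Message], call_id: str, is_tool_result: bool) -> str:
    """Return the id of the message the result attaches to, or raise on mismatch."""
    if not is_tool_result:
        # The problematic path: cleanup runs before results are reconciled.
        clean_redundant_tool_calls(history)
    pending = pending_calls(history)
    if call_id not in pending:
        raise RuntimeError(f"tool_calls mismatch! {call_id} not found in {pending}")
    return pending[call_id]
```

With a history of one user message followed by an assistant message carrying `call_1`, calling `handle_observation(..., is_tool_result=True)` finds the call, while the same call with `is_tool_result=False` deletes the assistant message first and then fails to find `call_1`.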

## Suggested Technical Fixes

### 1. WorkflowRunner Update
Ensure that agent handoffs are correctly identified as tool results when they return to the caller.
- **Location**: `aworld/runners/call_driven_runner.py`
- **Change**: Set `is_tool_result=True` in the `Observation` created after a sub-agent task completes.

### 2. Tool Implementation Pattern
The `Observation` class should document that `is_tool_result=True` is required for any tool whose result is returned as part of a parallel or sequential tool-call sequence.
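
One way a tool author might enforce this pattern is a small decorator that forces the flag on whatever the tool returns. This is a sketch under assumptions: the `Observation` dataclass below is a stand-in that models only `content` and `is_tool_result`, and the `tool_result` decorator and `word_count` tool are hypothetical names, not part of `aworld`.

```python
import functools
from dataclasses import dataclass, replace
from typing import Any, Callable


@dataclass(frozen=True)
class Observation:
    """Stand-in for the framework's Observation; only two fields are modelled."""
    content: Any = None
    is_tool_result: bool = False


def tool_result(func: Callable[..., Any]) -> Callable[..., Observation]:
    """Wrap a tool so its return value is always flagged as a tool result."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Observation:
        result = func(*args, **kwargs)
        if isinstance(result, Observation):
            # Keep the content, force the flag on.
            return replace(result, is_tool_result=True)
        # Plain return values are wrapped in a flagged Observation.
        return Observation(content=result, is_tool_result=True)
    return wrapper


@tool_result
def word_count(text: str) -> Observation:
    # Forgets the flag, as the custom tools described in this report do.
    return Observation(content=len(text.split()))
```

The decorator keeps the flag out of each tool body, so a tool that omits it still returns a correctly flagged result.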

### 3. Cleanup Logic Guard
The `_clean_redundant_tool_call_messages` logic in `LLMAgent` should perform a secondary check before deleting, so that it never removes tool-call messages whose results the framework is still waiting to process.
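
A guard of this kind could look roughly like the sketch below. It assumes the agent can supply a set of call ids that are still awaiting results; that `in_flight` set, along with the `Message` stand-in and `clean_with_guard`, is hypothetical and does not correspond to an existing `aworld` API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Message:
    """Hypothetical stand-in for a memory item."""
    id: str
    metadata: Dict[str, List[str]] = field(default_factory=dict)


def clean_with_guard(history: List[Message], in_flight: Set[str]) -> List[str]:
    """Delete trailing tool-call messages unless one of their calls is in flight."""
    deleted: List[str] = []
    while history:
        calls = history[-1].metadata.get("tool_calls")
        if not calls:
            break  # first message without tool calls: stop, as the original loop does
        if in_flight.intersection(calls):
            break  # results for these calls are still expected: keep the message
        deleted.append(history.pop().id)
    return deleted
```

With `in_flight={"call_1"}` the assistant message that issued `call_1` survives the cleanup; with an empty set the behaviour matches the current logic.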

## Verification
The mismatch error is resolved by ensuring that all components (tools and runners) that return results back to an `LLMAgent` explicitly set `is_tool_result=True`.


Bug Report: Parallel Tool Execution History Cleanup Mismatch #761
