
Extend _convert_call_tool_result helper function in tools.py #272

@danielefuselli-project

Description


Problem

I'm currently using LangGraph's ReAct agent (via create_react_agent), which calls MCP tools. The helper function _convert_call_tool_result in tools.py splits the MCP tool response into:

  • tool_content: only the text outputs (instances of TextContent)
  • non_text_contents: everything else the MCP protocol permits (ImageContent, AudioContent, etc.)

As a result, if an MCP tool returns ImageContent or AudioContent, the ReAct agent ignores it, because only tool_content is included in the message sent to the LLM.
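To make the split concrete, here is a minimal, self-contained replica of the logic in the current helper. TextContent and ImageContent below are hypothetical dataclass stand-ins for the mcp.types classes (so the sketch runs without mcp installed), and split_contents is an illustrative name, not part of the library.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for mcp.types.TextContent / ImageContent,
# so this sketch runs without the mcp package installed.
@dataclass
class TextContent:
    text: str
    type: str = "text"


@dataclass
class ImageContent:
    data: str
    mimeType: str
    type: str = "image"


def split_contents(contents):
    """Mirror the text / non-text split done by _convert_call_tool_result."""
    text_contents = [c for c in contents if isinstance(c, TextContent)]
    non_text_contents = [c for c in contents if not isinstance(c, TextContent)]
    tool_content = [c.text for c in text_contents]
    if not text_contents:
        tool_content = ""
    elif len(text_contents) == 1:
        tool_content = tool_content[0]
    return tool_content, non_text_contents or None


content, artifact = split_contents(
    [
        TextContent(text="Here is the chart"),
        ImageContent(data="iVBORw0KGgo=", mimeType="image/png"),
    ]
)
print(content)   # Here is the chart
print(artifact)  # [ImageContent(data='iVBORw0KGgo=', mimeType='image/png', type='image')]
```

Only the text lands in the first element of the tuple; the image is left in the second element, which the agent does not forward to the model.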

Current Behavior

def _convert_call_tool_result(
    call_tool_result: CallToolResult,
) -> tuple[str | list[str], list[NonTextContent] | None]:
    text_contents: list[TextContent] = []
    non_text_contents = []
    for content in call_tool_result.content:
        if isinstance(content, TextContent):
            text_contents.append(content)
        else:
            non_text_contents.append(content)
    tool_content: str | list[str] = [content.text for content in text_contents]
    if not text_contents:
        tool_content = ""
    elif len(text_contents) == 1:
        tool_content = tool_content[0]

    if call_tool_result.isError:
        raise ToolException(tool_content)

    return tool_content, non_text_contents or None

Because ImageContent and AudioContent are classified as “non-text,” the ReAct agent never includes them in the message it sends back to the LLM.

Questions

  1. Would it make sense to also add ImageContent and AudioContent to the tool_content structure?
  2. Would it be useful to put non-text content into tool_content already converted to the OpenAI Chat Completions JSON format, so that the agent can pass it through to the LLM? If not, how can I intercept the result and perform the OpenAI conversion myself? A prototype is below:

Example Behavior

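from __future__ import annotations

from urllib.parse import urlparse

from langchain_core.tools import ToolException
from mcp.types import AudioContent, CallToolResult, ImageContent, TextContent

# NonTextContent is the type alias already defined in tools.py.


def _convert_call_tool_result(
    call_tool_result: CallToolResult,
) -> tuple[str | list[dict], list[NonTextContent] | None]:
    non_text_contents = []
    # Content parts in OpenAI Chat Completions format
    parts: list[dict] = []
    for content in call_tool_result.content:
        if isinstance(content, TextContent):
            parts.append({"type": "text", "text": content.text})
        elif isinstance(content, ImageContent):
            non_text_contents.append(content)
            # Pass URLs through unchanged; wrap base64 payloads in a data URI
            url = (
                content.data
                if urlparse(content.data).scheme
                else f"data:{content.mimeType};base64,{content.data}"
            )
            parts.append({"type": "image_url", "image_url": {"url": url}})
        elif isinstance(content, AudioContent):
            non_text_contents.append(content)
            parts.append(
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": content.data,
                        # e.g. "audio/wav" -> "wav"; OpenAI accepts only "wav" or "mp3",
                        # so "audio/mpeg" would still need mapping to "mp3"
                        "format": content.mimeType.split("/")[-1],
                    },
                }
            )
        else:
            non_text_contents.append(content)

    # Keep the old plain-string result for empty and single-text responses
    tool_content: str | list[dict] = parts
    if not parts:
        tool_content = ""
    elif len(parts) == 1 and parts[0]["type"] == "text":
        tool_content = parts[0]["text"]

    if call_tool_result.isError:
        raise ToolException(tool_content)

    return tool_content, non_text_contents or None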
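The per-block conversion in the prototype can be checked in isolation. The sketch below factors it into two hypothetical helpers, image_part and audio_part, that work on plain strings instead of mcp types. It assumes OpenAI's input_audio "format" field accepts only "wav" or "mp3", so a few common MIME subtypes are mapped onto those names and anything else raises ValueError.

```python
from urllib.parse import urlparse

# Assumption: OpenAI's input_audio "format" accepts only "wav" or "mp3",
# so common MIME subtypes are mapped onto those two names.
_AUDIO_FORMATS = {"wav": "wav", "x-wav": "wav", "wave": "wav", "mpeg": "mp3", "mp3": "mp3"}


def image_part(data: str, mime_type: str) -> dict:
    """Build an OpenAI image_url content part from MCP image data."""
    # Base64 never contains ":", so a non-empty scheme means data is already a URL.
    url = data if urlparse(data).scheme else f"data:{mime_type};base64,{data}"
    return {"type": "image_url", "image_url": {"url": url}}


def audio_part(data: str, mime_type: str) -> dict:
    """Build an OpenAI input_audio content part from base64 MCP audio data."""
    subtype = mime_type.split("/")[-1].lower()
    if subtype not in _AUDIO_FORMATS:
        raise ValueError(f"unsupported audio MIME type for input_audio: {mime_type}")
    return {
        "type": "input_audio",
        "input_audio": {"data": data, "format": _AUDIO_FORMATS[subtype]},
    }


print(image_part("iVBORw0KGgo=", "image/png")["image_url"]["url"])
# data:image/png;base64,iVBORw0KGgo=
print(audio_part("SUQz", "audio/mpeg")["input_audio"]["format"])
# mp3
```

Raising on unknown audio types, rather than passing the subtype through as the prototype does, surfaces the mismatch at conversion time instead of as a rejected request from the model API.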


Labels: enhancement (New feature or request)
