Closed as duplicate
Labels
enhancement: New feature or request
Description
Problem
I'm using LangGraph's ReAct agent (via create_react_agent), which calls MCP tools. The helper function _convert_call_tool_result in tools.py splits the MCP tool response into:
- tool_content: only the text outputs (instances of TextContent)
- non_text_contents: everything else that MCP protocol permits (ImageContent, AudioContent, etc.)
As a result, if an MCP tool returns an ImageContent or AudioContent, the ReAct agent ignores it, because only tool_content is included in the message sent to the LLM.
Current Behavior
def _convert_call_tool_result(
    call_tool_result: CallToolResult,
) -> tuple[str | list[str], list[NonTextContent] | None]:
    text_contents: list[TextContent] = []
    non_text_contents = []
    for content in call_tool_result.content:
        if isinstance(content, TextContent):
            text_contents.append(content)
        else:
            non_text_contents.append(content)

    tool_content: str | list[str] = [content.text for content in text_contents]
    if not text_contents:
        tool_content = ""
    elif len(text_contents) == 1:
        tool_content = tool_content[0]

    if call_tool_result.isError:
        raise ToolException(tool_content)

    return tool_content, non_text_contents or None

Because ImageContent and AudioContent are classified as "non-text", the ReAct agent never includes them in the message it sends back to the LLM.
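To make the effect concrete, here is a minimal, self-contained sketch of the same text / non-text split. The two dataclasses are simplified stand-ins for the MCP types (only the fields used here), and split_contents is a hypothetical name, not the adapter's function.

```python
from dataclasses import dataclass


# Simplified stand-ins for the MCP content types (assumption: only these fields matter here).
@dataclass
class TextContent:
    text: str


@dataclass
class ImageContent:
    data: str       # base64-encoded payload
    mimeType: str


def split_contents(contents):
    """Mirror the text / non-text split described above (hypothetical helper)."""
    texts = [c.text for c in contents if isinstance(c, TextContent)]
    others = [c for c in contents if not isinstance(c, TextContent)]
    if not texts:
        tool_content = ""
    elif len(texts) == 1:
        tool_content = texts[0]
    else:
        tool_content = texts
    return tool_content, others or None


result = [TextContent("Here is the chart."), ImageContent("aGVsbG8=", "image/png")]
tool_content, artifact = split_contents(result)
print(tool_content)  # only the text part survives in the message content
print(artifact)      # the image ends up in the second tuple element
```

With a tool that returns only an image, tool_content is the empty string, so the model sees nothing at all from that call.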
Questions
- Would it make sense to also add ImageContent and AudioContent to the tool_content structure?
- Would it be useful to put non-text content into tool_content already converted to the OpenAI Chat Completions JSON format, so that the agent can pass it on to the LLM? If not, how can I intercept the result and perform the OpenAI conversion myself? Below is a prototype:
Example Behavior
from urllib.parse import urlparse

def _convert_call_tool_result(
    call_tool_result: CallToolResult,
) -> tuple[list[str | dict], list[NonTextContent] | None]:
    non_text_contents = []
    # Mixed list: plain strings for text, OpenAI-style dicts for media.
    tool_content: list[str | dict] = []
    for content in call_tool_result.content:
        if isinstance(content, TextContent):
            tool_content.append(content.text)
        elif isinstance(content, ImageContent):
            non_text_contents.append(content)
            # Pass URLs through unchanged; wrap raw base64 data in a data URL.
            url = (
                content.data
                if urlparse(content.data).scheme
                else f"data:{content.mimeType};base64,{content.data}"
            )
            tool_content.append({"type": "image_url", "image_url": {"url": url}})
        elif isinstance(content, AudioContent):
            non_text_contents.append(content)
            tool_content.append(
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": content.data,
                        # e.g. "audio/wav" -> "wav"
                        "format": content.mimeType.split("/")[-1],
                    },
                }
            )
        else:
            non_text_contents.append(content)

    # Error handling and return, assumed unchanged from the current implementation.
    if call_tool_result.isError:
        raise ToolException(tool_content)

    return tool_content, non_text_contents or None
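On the interception question, one option that avoids patching the adapter is to convert the non-text items after the tool call. The sketch below assumes the second tuple element is reachable on the caller's side (for example as the tool message's artifact, if the tool is declared with a content-and-artifact response format). The dataclasses are stand-ins for the MCP types, and artifact_to_openai_parts and the MIME mapping are hypothetical. Whether a given model accepts image or audio parts in a tool message is a separate question and is not checked here.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


# Simplified stand-ins for the MCP content types (assumption: only these fields are used).
@dataclass
class ImageContent:
    data: str       # base64 payload, or a URL
    mimeType: str


@dataclass
class AudioContent:
    data: str       # base64 payload
    mimeType: str


# Hypothetical mapping from MIME type to the short format name used in the audio part.
_AUDIO_FORMATS = {
    "audio/wav": "wav",
    "audio/x-wav": "wav",
    "audio/mpeg": "mp3",
    "audio/mp3": "mp3",
}


def artifact_to_openai_parts(artifact):
    """Convert non-text MCP items into Chat Completions style content parts."""
    parts = []
    for item in artifact or []:
        if isinstance(item, ImageContent):
            # Base64 text never contains ':', so a recognised scheme means a real URL.
            is_url = urlparse(item.data).scheme in ("http", "https", "data")
            url = item.data if is_url else f"data:{item.mimeType};base64,{item.data}"
            parts.append({"type": "image_url", "image_url": {"url": url}})
        elif isinstance(item, AudioContent):
            fmt = _AUDIO_FORMATS.get(item.mimeType, item.mimeType.split("/")[-1])
            parts.append(
                {"type": "input_audio", "input_audio": {"data": item.data, "format": fmt}}
            )
        # Other content types (e.g. embedded resources) are skipped in this sketch.
    return parts


parts = artifact_to_openai_parts(
    [ImageContent("aGVsbG8=", "image/png"), AudioContent("UklGRg==", "audio/mpeg")]
)
print(parts)
```

The resulting parts could then be appended to a follow-up message before the next model call, which leaves the adapter's return value untouched.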