fix(agent_toolset): read images/PDFs as content blocks instead of crashing #1674

Open
Zawwarsami16 wants to merge 2 commits into anthropics:main from Zawwarsami16:fix/agent-read-binary-files

Conversation

@Zawwarsami16

Closes #1637.

Problem

The self-hosted agent_toolset_20260401 read tool decodes every file as UTF-8 (target.read_text()). Reading a binary image or PDF raises UnicodeDecodeError. That exception is a ValueError subclass, so the existing ToolError/OSError handling does not catch it, and it surfaces to the model as a raw tool error:

UnicodeDecodeError('utf-8', b'\xff\xd8\xff\xe0...', 0, 1, 'invalid start byte')
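
The failure mode can be reproduced without the toolset. The following is a minimal standard-library sketch, not the tool's code: `jpeg_head` and `outcome` are illustrative names, and the `except OSError` clause only stands in for the tool's existing handler, whose exact form is not shown in this description.

```python
# Leading bytes of a JPEG file (SOI marker followed by an APP0 marker).
jpeg_head = b"\xff\xd8\xff\xe0"

try:
    # Decoding binary data as UTF-8 fails on the very first byte.
    jpeg_head.decode("utf-8")
    outcome = "decoded"
except OSError:
    # Stand-in for the existing handler (assumption): never reached here.
    outcome = "caught as OSError"
except ValueError as exc:
    # UnicodeDecodeError subclasses ValueError, so it lands here instead.
    outcome = type(exc).__name__
```

Because the exception is not an OSError, a handler written for file-system failures lets it through unchanged.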

So an agent running under SessionToolRunner / the self-hosted environment worker can't read an image or PDF — which breaks the document skills that render pages/slides to images for visual QA. The hosted product and Claude Code's Read both handle images, so this gap is specific to the open-source self-hosted toolset.

Fix

read now sniffs the leading magic bytes and, for a recognized image/PDF, returns the matching base64 image / document content block instead of decoding as text. The plumbing for this already exists:

  • BetaFunctionToolResultType = Union[str, Iterable[BetaContent]], and BetaContent already includes BetaImageBlockParam / BetaRequestDocumentBlockParam.
  • _beta_session_runner._to_session_content already forwards image / document / search_result blocks to the session.
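
For reference, the base64 image and document content blocks mentioned above have roughly the following shape when written as plain dicts. This is a sketch, not the code in this PR: `image_block` and `document_block` are hypothetical helper names, and the SDK's typed params (BetaImageBlockParam / BetaRequestDocumentBlockParam) are represented here by untyped dicts.

```python
import base64


def _b64(data: bytes) -> str:
    # Content blocks carry the payload as a base64 string, not raw bytes.
    return base64.standard_b64encode(data).decode("ascii")


def image_block(data: bytes, media_type: str) -> dict:
    # Hypothetical helper: wraps image bytes in a base64 image block.
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": _b64(data)},
    }


def document_block(data: bytes) -> dict:
    # Hypothetical helper: wraps PDF bytes in a base64 document block.
    return {
        "type": "document",
        "source": {"type": "base64", "media_type": "application/pdf", "data": _b64(data)},
    }
```

A tool result built this way would be a list such as `[image_block(raw, "image/png")]`, which fits the `Iterable[BetaContent]` side of the result type.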

Detected types: JPEG, PNG, GIF, WebP, and PDF (content-sniffed, not extension-trusted). view_range isn't meaningful for binary, so it's rejected with a clear message. Any other non-UTF-8 file now raises a clean ToolError ("not a UTF-8 text file and not a supported binary…") rather than letting the UnicodeDecodeError propagate uncaught.
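
The detection and error rules described above could be sketched as follows. This is an illustration under stated assumptions, not the implementation in this PR: `sniff_media_type` and `classify` are hypothetical names, the local `ToolError` class stands in for the toolset's own, and both error messages are illustrative wording rather than the messages the tool emits.

```python
from typing import Optional, Sequence


class ToolError(Exception):
    """Stand-in for the toolset's ToolError (assumption)."""


def sniff_media_type(head: bytes) -> Optional[str]:
    # Match well-known leading signatures; the file extension is never consulted.
    if head.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if head.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"
    if head.startswith(b"%PDF-"):
        return "application/pdf"
    return None


def classify(data: bytes, view_range: Optional[Sequence[int]] = None) -> str:
    # Returns a media type for a recognized binary file, or "text" for UTF-8.
    media_type = sniff_media_type(data[:16])
    if media_type is not None:
        if view_range is not None:
            # Line ranges have no meaning for binary content.
            raise ToolError("view_range is not supported for binary files")
        return media_type
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Convert the raw decode failure into a clean tool-level error.
        raise ToolError("neither UTF-8 text nor a recognized image/PDF") from exc
    return "text"
```

Sniffing content rather than trusting the extension means a PNG saved as `notes.txt` is still returned as an image, while a text file named `photo.png` still goes down the text path.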

The text path is unchanged except that decoding is now explicitly UTF-8 (was already UTF-8 via read_text()).

Tests

  • Parametrized read of JPEG/PNG/GIF/WebP/PDF returns the right block type + media type, with base64 that round-trips to the original bytes.
  • view_range on a binary file raises.
  • Arbitrary non-text/non-image bytes raise a clear ToolError instead of UnicodeDecodeError.

pytest tests/lib/tools/test_agent_toolset.py tests/lib/tools/test_session_runner.py → 80 passed. ruff check + ruff format --check clean; pyright reports no errors on the changed file.

…shing

The self-hosted agent_toolset read tool decoded every file as UTF-8, so
reading a binary image or PDF raised an uncaught UnicodeDecodeError that
surfaced to the model as a raw tool error. Sniff image/PDF magic bytes and
return the matching base64 image/document content block (the tool-result
contract and session runner already forward these). Other non-UTF-8 files
now raise a clear ToolError instead of an uncaught UnicodeDecodeError.
Parametrized JPEG/PNG/GIF/WebP/PDF read returning base64 image/document
blocks, plus rejecting view_range on binary and a clear error on
non-text/non-binary bytes.
@Zawwarsami16 Zawwarsami16 requested a review from a team as a code owner June 13, 2026 12:05

Development

Successfully merging this pull request may close these issues.

Self-hosted agent_toolset read tool raises UnicodeDecodeError on binary files (images/PDFs) instead of returning content blocks
