fix(agent_toolset): read images/PDFs as content blocks instead of crashing #1674
Open
Zawwarsami16 wants to merge 2 commits into
…shing The self-hosted agent_toolset read tool decoded every file as UTF-8, so reading a binary image or PDF raised an uncaught UnicodeDecodeError that surfaced to the model as a raw tool error. Sniff image/PDF magic bytes and return the matching base64 image/document content block (the tool-result contract and session runner already forward these). Other non-UTF-8 files now raise a clear ToolError instead of an uncaught UnicodeDecodeError.
Parametrized JPEG/PNG/GIF/WebP/PDF read returning base64 image/document blocks, plus rejecting view_range on binary and a clear error on non-text/non-binary bytes.
Closes #1637.
Problem
The self-hosted `agent_toolset_20260401` `read` tool decodes every file as UTF-8 (`target.read_text()`). Reading a binary image or PDF raises an uncaught `UnicodeDecodeError` (a `ValueError`, so it is not caught alongside `ToolError` / `OSError`), which surfaces to the model as a raw tool error.
So an agent running under `SessionToolRunner` / the self-hosted environment worker can't `read` an image or PDF, which breaks the document skills that render pages/slides to images for visual QA. The hosted product and Claude Code's `Read` both handle images, so this gap is specific to the open-source self-hosted toolset.
Fix
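For context on why the old handler let this error through, here is a small standalone illustration. It is not the toolset's code: `ToolError` and `old_style_read` are hypothetical stand-ins, and the only assumption taken from the description above is that the handler caught `ToolError` and `OSError` but not `ValueError`.

```python
class ToolError(Exception):
    """Hypothetical stand-in for the toolset's ToolError."""


def old_style_read(data: bytes) -> str:
    """Decode as UTF-8, handling only ToolError/OSError (assumed old behavior)."""
    try:
        return data.decode("utf-8")
    except (ToolError, OSError) as exc:
        return f"handled: {exc}"


# UnicodeDecodeError derives from ValueError, not OSError,
# so the except clause above never sees it.
assert issubclass(UnicodeDecodeError, ValueError)
assert not issubclass(UnicodeDecodeError, OSError)

escaped = None
try:
    # 0x89 (the first byte of a PNG header) is not a valid UTF-8 start byte.
    old_style_read(b"\x89PNG\r\n\x1a\n")
except UnicodeDecodeError as exc:
    escaped = exc  # the error propagates past the handler
```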
`read` now sniffs the leading magic bytes and, for a recognized image/PDF, returns the matching base64 `image` / `document` content block instead of decoding as text. The plumbing for this already exists:
- `BetaFunctionToolResultType = Union[str, Iterable[BetaContent]]`, and `BetaContent` already includes `BetaImageBlockParam` / `BetaRequestDocumentBlockParam`.
- `_beta_session_runner._to_session_content` already forwards `image` / `document` / `search_result` blocks to the session.
Detected types: JPEG, PNG, GIF, WebP, and PDF (content-sniffed, not extension-trusted).
`view_range` isn't meaningful for binary, so it's rejected with a clear message. Any other non-UTF-8 file now raises a clean `ToolError` ("not a UTF-8 text file and not a supported binary…") rather than letting the `UnicodeDecodeError` propagate uncaught.
The text path is unchanged except that decoding is now explicitly UTF-8 (was already UTF-8 via `read_text()`).
Tests
- `view_range` on a binary file raises.
- Other non-UTF-8 bytes raise `ToolError` instead of `UnicodeDecodeError`.
`pytest tests/lib/tools/test_agent_toolset.py tests/lib/tools/test_session_runner.py` → 80 passed. `ruff check` + `ruff format --check` clean; pyright reports no errors on the changed file.
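The approach described under Fix can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in this PR: `ToolError`, `sniff_media_type`, and `read_file` are hypothetical names, the content blocks are plain dicts in the base64 `image` / `document` shape rather than the SDK's typed params, and `view_range` handling for text files is omitted.

```python
import base64
from pathlib import Path
from typing import Optional


class ToolError(Exception):
    """Hypothetical stand-in for the toolset's ToolError."""


def sniff_media_type(head: bytes) -> Optional[str]:
    """Map leading magic bytes to a media type, or None if unrecognized."""
    if head.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if head.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"
    if head.startswith(b"%PDF-"):
        return "application/pdf"
    return None


def read_file(path, view_range=None):
    """Return text for UTF-8 files, or a one-item list with a base64 block."""
    data = Path(path).read_bytes()
    media_type = sniff_media_type(data[:16])
    if media_type is not None:
        if view_range is not None:
            raise ToolError("view_range is not supported for binary files")
        block_type = "document" if media_type == "application/pdf" else "image"
        return [
            {
                "type": block_type,
                "source": {
                    "type": "base64",
                    "media_type": media_type,
                    "data": base64.b64encode(data).decode("ascii"),
                },
            }
        ]
    try:
        # Text path: explicit UTF-8 (view_range slicing omitted in this sketch).
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ToolError(
            f"{path}: not a UTF-8 text file and not a supported binary type"
        ) from exc
```

Sniffing the content rather than trusting the extension means a PNG saved as `notes.txt` still comes back as an image block, and a text file named `photo.png` still takes the text path.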