web_fetch can't fetch JSON APIs — fails with HTTP 415 because it only sends text/* Accept headers #5611

Description

@tanzhenxin

What happened?

web_fetch cannot retrieve JSON REST APIs. Any request to an endpoint that only serves JSON (for example the GitHub REST API) fails with:

Error during fetch for https://api.github.com/repos/openai/codex/releases/tags/rust-v0.92.0: Request failed with status code 415 Unsupported Media Type

This happens regardless of the format argument, because web_fetch always sends an Accept header that lists only text/* media types. The default (auto) sends text/markdown, text/html, text/plain, and each explicit format maps to a single text/* value. A JSON-only API has no text/* representation to offer, so it rejects the request.

This came up in real use. When the assistant was asked to look up a package's release date, web_fetch could read the HTML release page, but every attempt to hit the cleaner JSON API endpoint (which contains the exact published_at timestamp) returned 415. The assistant had to fall back to scraping a relative date with no year out of the HTML.

Reproduction

qwen -p "Use the web_fetch tool to fetch https://api.github.com/repos/openai/codex/releases/tags/rust-v0.92.0 and report the published date."

Observed: the tool errors with 415 Unsupported Media Type. Expected: it returns the release info.

The 415 is purely about the Accept header — confirmed with curl against the same URL:

| Accept header sent | HTTP status |
| --- | --- |
| text/markdown | 415 |
| text/html | 415 |
| text/plain | 415 |
| text/markdown, text/html, text/plain (current auto) | 415 |
| */* | 200 |
| application/vnd.github+json | 200 |
| (no Accept header) | 200 |
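
The curl check above can also be scripted. The following is a minimal sketch using only Python's standard library; the helper names `build_request`, `probe_status`, and `main` are illustrative and not part of web_fetch, and the status codes printed depend on how the server behaves when the script is run.

```python
import urllib.error
import urllib.request
from typing import Optional

URL = "https://api.github.com/repos/openai/codex/releases/tags/rust-v0.92.0"

# Accept values from the table above; None means "send no Accept header".
ACCEPT_VALUES = [
    "text/markdown",
    "text/html",
    "text/plain",
    "text/markdown, text/html, text/plain",
    "*/*",
    "application/vnd.github+json",
    None,
]


def build_request(url: str, accept: Optional[str]) -> urllib.request.Request:
    """Build a GET request, setting Accept only when a value is given."""
    headers = {} if accept is None else {"Accept": accept}
    return urllib.request.Request(url, headers=headers, method="GET")


def probe_status(url: str, accept: Optional[str]) -> int:
    """Send the request and return the HTTP status code of the response."""
    try:
        with urllib.request.urlopen(build_request(url, accept), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is what we want.
        return err.code


def main() -> None:
    """Print one row per Accept value (performs real network requests)."""
    for accept in ACCEPT_VALUES:
        label = accept if accept is not None else "(no Accept header)"
        print(f"{label:45} {probe_status(URL, accept)}")
```

Calling `main()` issues one request per row. Nothing is sent until then, so the request-building helper can be inspected offline.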

What did you expect to happen?

web_fetch should be able to fetch JSON APIs (and any other content type it doesn't explicitly enumerate), while still preferring markdown when a server can provide it (the token-saving path). A general-purpose fetch tool shouldn't hard-fail on a whole class of endpoints just because of an over-narrow Accept header.

Login information

OpenAI-compatible API configuration (DashScope). Note: the bug is in the HTTP request web_fetch makes to the target URL, and is independent of how the model provider is configured or authenticated — it reproduces under any login.

Anything else we need to know?

Proposed fix

Add a low-priority catch-all to the Accept header using standard HTTP quality values — the same single-request content-negotiation pattern browsers use (e.g. Chrome sends …,*/*;q=0.8). The preferred type still wins whenever the server can serve it; */* only activates as a fallback.

| format | Current Accept | Proposed Accept |
| --- | --- | --- |
| auto (default) | text/markdown, text/html, text/plain | text/markdown, text/html;q=0.9, text/plain;q=0.8, */*;q=0.1 |
| markdown | text/markdown | text/markdown, */*;q=0.1 |
| html | text/html | text/html, */*;q=0.1 |
| text | text/plain | text/plain, */*;q=0.1 |
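
As an illustration of the proposed mapping, here is a minimal sketch that builds the Accept value for each format. It is not web_fetch's actual code: the names `PREFERRED_TYPES`, `FALLBACK_Q`, and `build_accept_header` are hypothetical, and only the resulting header strings come from the table above.

```python
from typing import Dict, List, Tuple

# Preferred media types per format, highest preference first.
# The values mirror the "Proposed Accept" column; the structure is an assumption.
PREFERRED_TYPES: Dict[str, List[Tuple[str, float]]] = {
    "auto": [("text/markdown", 1.0), ("text/html", 0.9), ("text/plain", 0.8)],
    "markdown": [("text/markdown", 1.0)],
    "html": [("text/html", 1.0)],
    "text": [("text/plain", 1.0)],
}

# Low-priority catch-all so servers with no text/* representation can still answer.
FALLBACK_Q = 0.1


def _format_entry(media_type: str, q: float) -> str:
    """Render one Accept entry; q=1 is the default and is left implicit."""
    return media_type if q == 1.0 else f"{media_type};q={q:g}"


def build_accept_header(fmt: str = "auto") -> str:
    """Return the Accept header value for a format, ending with */*;q=0.1."""
    entries = [_format_entry(t, q) for t, q in PREFERRED_TYPES[fmt]]
    entries.append(_format_entry("*/*", FALLBACK_Q))
    return ", ".join(entries)
```

Keeping the catch-all as a single appended entry means every format gets the same fallback, and the preferred types keep their relative order.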

Behavior for common requests (auto mode) — only the JSON-API case changes:

| Request target | Current | Proposed |
| --- | --- | --- |
| Normal HTML page | 200, HTML | 200, HTML (unchanged) |
| Markdown-capable server | 200, markdown (token-saving) | 200, markdown (unchanged) |
| Raw text/markdown file | 200, file | 200, file (unchanged) |
| Server that ignores Accept | 200 | 200 (unchanged) |
| JSON REST API (e.g. GitHub) | 415, fails | 200, JSON |
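
To see why only the JSON row changes, it helps to model how a server that honors quality values would choose a representation. The sketch below is a simplified negotiation routine written for illustration. It ignores media-type parameters other than q, and real servers (including GitHub) may implement negotiation differently. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_range, q) pairs; q defaults to 1.0."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        ranges.append((fields[0].lower(), q))
    return ranges


def quality_for(media_type: str, ranges: List[Tuple[str, float]]) -> float:
    """Return q of the most specific matching range, or 0.0 if nothing matches."""
    main = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2  # exact match, e.g. text/html
        elif media_range == f"{main}/*":
            specificity = 1  # subtype wildcard, e.g. text/*
        elif media_range == "*/*":
            specificity = 0  # catch-all
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(accept: str, offered: List[str]) -> Optional[str]:
    """Pick the offered type with the highest q; None means nothing is acceptable."""
    ranges = parse_accept(accept)
    scored = [(quality_for(t, ranges), t) for t in offered]
    best_q = max((q for q, _ in scored), default=0.0)
    if best_q <= 0:
        return None
    # Ties go to the type the server listed first.
    return next(t for q, t in scored if q == best_q)
```

Under this model, a JSON-only server returns nothing acceptable for the current auto header but selects application/json through the catch-all in the proposed one. A server offering both HTML and markdown still selects text/markdown.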

Confirmed with curl: the proposed auto header returns 200 on example.com, a raw GitHub README, and the GitHub JSON API. The JSON response includes the full timestamp, published_at: 2026-01-27T11:50:52Z, which the HTML page couldn't provide.

Notes:

  • For context, the 415 here is a content-negotiation rejection of the Accept header (the spec-correct status for an unsatisfiable Accept is 406 Not Acceptable; GitHub's use of 415 is idiosyncratic) — so the Accept header is the right lever to adjust.
  • web_fetch already normalizes all fetched content to plain text regardless of format, so receiving JSON/HTML via the fallback is harmless to downstream processing.

Context

  • Separate from the GLM content-drop fix (PR fix(core): prevent GLM on DashScope from dropping web_fetch content #5599); this was found while verifying that PR. The two are independent — one is about how content is sent to the model, this is about how content is fetched from the server.
  • Scope is limited to the Accept header in web_fetch's direct-fetch path; no change to how fetched content is processed.

Metadata

Assignees: No one assigned

Labels: category/tools (Tool integration and execution), priority/P2 (Medium - Moderately impactful, noticeable problem), scope/web-search (Web search functionality), type/bug (Something isn't working as expected)
