Skip to content

feat(agent): add web_fetch local tool with tests#2799

Closed
wwwillchen-bot wants to merge 9 commits intomainfrom
agent--1772051885862-1772051170
Closed

feat(agent): add web_fetch local tool with tests#2799
wwwillchen-bot wants to merge 9 commits intomainfrom
agent--1772051885862-1772051170

Conversation

@wwwillchen-bot
Copy link
Copy Markdown
Collaborator

@wwwillchen-bot wwwillchen-bot commented Feb 25, 2026

Summary

  • add a new web_fetch local agent tool that fetches HTTP(S) content with format conversion, size/time limits, and binary/text handling
  • register web_fetch in the local agent tool definitions so it is available to the toolset
  • add unit tests covering schema defaults, URL validation, response handling, limits, and error paths
  • fix local_agent_handler import/type issues so type checks pass

Test plan

  • npm run fmt && npm run lint:fix && npm run ts
  • npm test

🤖 Generated with Claude Code


Open with Devin

@wwwillchen
Copy link
Copy Markdown
Collaborator

@BugBot run

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @wwwillchen-bot, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the agent's capabilities by adding a versatile web fetching tool. This allows the agent to programmatically retrieve and process web content, expanding its ability to gather information from external sources. The change also includes necessary refactoring to maintain code quality and ensure the new functionality is well-tested and integrated.

Highlights

  • New web_fetch Local Agent Tool: A new web_fetch local agent tool has been added, enabling agents to fetch HTTP(S) content. This tool supports format conversion (text, markdown, html), enforces size and time limits, and handles both binary and text content.
  • Tool Registration: The newly created web_fetch tool has been registered within the local agent tool definitions, making it available for use by the agent's toolset.
  • Comprehensive Unit Tests: Extensive unit tests have been included for the web_fetch tool, covering various aspects such as schema defaults, URL validation, response handling, content size and time limits, and different error paths to ensure robustness.
  • Dependency and Type Fixes: Import paths and type annotations within local_agent_handler.ts were adjusted to resolve type check issues and ensure correct dependencies.
Changelog
  • src/pro/main/ipc/handlers/local_agent/local_agent_handler.ts
    • Updated the import path for ensureDyadGitignored to gitignoreUtils.
    • Added an unknown type annotation to the error caught in the ensureDyadGitignored call.
  • src/pro/main/ipc/handlers/local_agent/tool_definitions.ts
    • Imported the new webFetchTool.
    • Added webFetchTool to the TOOL_DEFINITIONS array, making it available to the agent.
  • src/pro/main/ipc/handlers/local_agent/tools/web_fetch.spec.ts
    • Added a new file containing unit tests for the web_fetch tool.
    • Tests cover URL validation, default format, HTML to text/markdown conversion, content-length limits, binary content handling, and error responses.
  • src/pro/main/ipc/handlers/local_agent/tools/web_fetch.ts
    • Added a new file implementing the web_fetch tool.
    • Defined the input schema with URL, format (text, markdown, html), and timeout parameters.
    • Implemented logic for URL validation, building Accept headers, MIME type checking, HTML entity decoding, whitespace collapsing, and removing non-content HTML tags.
    • Included functions to extract plain text from HTML and convert HTML to Markdown.
    • Implemented the execute method to perform web requests, handle timeouts, check response status, enforce size limits, and process content based on the requested format.
Activity
  • No human activity has been recorded on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@wwwillchen
Copy link
Copy Markdown
Collaborator

@BugBot run

chatgpt-codex-connector[bot]

This comment was marked as resolved.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Feb 25, 2026

Greptile Summary

This PR adds a new web_fetch local agent tool for fetching HTTP(S) content with comprehensive security protections and format conversion capabilities.

Key Changes:

  • Implements web_fetch tool with SSRF protections: DNS resolution validation to block domains resolving to private IPs, manual redirect handling with per-redirect validation, IP blocking for private ranges (10.x, 172.16.x, 192.168.x, 127.x, 169.254.x, IPv6 link-local/unique-local)
  • Enforces security limits: 5MB response size limit (enforced via streaming), configurable timeout (max 120s, propagates to stream reading), max 5 redirects with validation
  • Supports multiple output formats (markdown, text, html) with regex-based HTML conversion
  • Includes comprehensive test suite (400 lines) covering all security scenarios
  • Registers tool in tool_definitions.ts with "ask" consent default
  • Fixes type issues in local_agent_handler.ts (import path and error type annotation)

Security Assessment:

  • DNS rebinding vulnerability remains theoretically possible (TOCTOU between validation and fetch) but requires sophisticated attack with low-TTL DNS and precise timing
  • Previous review comments on SSRF, redirect handling, and timeout enforcement have been addressed
  • One minor gap: Alibaba Cloud metadata endpoint (100.100.100.200) not blocked, though only accessible from within Alibaba Cloud VMs

Confidence Score: 4/5

  • This PR is safe to merge with noted security limitations
  • Strong implementation with comprehensive SSRF protections, good test coverage, and proper integration. Score reflects one minor security gap (Alibaba Cloud metadata) and inherent DNS rebinding limitation (acknowledged TOCTOU issue that's difficult to fully mitigate). Previous review concerns have been addressed.
  • Pay attention to web_fetch.ts security implementation, particularly the cloud metadata endpoint gap

Important Files Changed

Filename Overview
src/pro/main/ipc/handlers/local_agent/tools/web_fetch.ts Implements new web_fetch tool with comprehensive SSRF protections including DNS resolution validation, manual redirect handling, timeout enforcement, and streaming size limits. Minor gap: Alibaba Cloud metadata endpoint (100.100.100.200) not blocked.
src/pro/main/ipc/handlers/local_agent/tools/web_fetch.spec.ts Comprehensive test suite covering schema validation, SSRF protections (IPv4, IPv6, DNS resolution, redirects), size limits, timeouts, and content type handling. All security scenarios well-tested.
src/pro/main/ipc/handlers/local_agent/tool_definitions.ts Added import and registration of webFetchTool to tool definitions array. Clean integration with no issues.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Start([web_fetch called]) --> Validate[validateHttpUrl: Check protocol & private IP]
    Validate -->|Invalid| Error1[Throw error]
    Validate -->|Valid| DNS[resolveAndValidateHost: DNS lookup]
    DNS -->|Resolves to private IP| Error2[Throw error]
    DNS -->|Public IP| Fetch[Fetch with timeout & redirect:manual]
    Fetch --> CheckStatus{Status code?}
    CheckStatus -->|301-308 redirect| CheckRedirCount{Redirect count < 5?}
    CheckRedirCount -->|No| Error3[Throw: Too many redirects]
    CheckRedirCount -->|Yes| ValidateRedirect[Validate redirect URL & DNS]
    ValidateRedirect -->|Private IP| Error4[Throw error]
    ValidateRedirect -->|Valid| Fetch
    CheckStatus -->|Non-2xx| Error5[Throw: Request failed]
    CheckStatus -->|2xx| CheckSize{Content-Length > 5MB?}
    CheckSize -->|Yes| Error6[Throw: Response too large]
    CheckSize -->|No| Stream[Stream body with size limit]
    Stream -->|Exceeds 5MB| Error7[Cancel & throw error]
    Stream -->|Success| CheckMime{Binary content?}
    CheckMime -->|Image/Binary| ReturnBinary[Return binary summary]
    CheckMime -->|Text-like| Convert[Convert HTML if needed]
    Convert --> Truncate[Truncate to 60K chars]
    Truncate --> Success([Return content])
Loading

Last reviewed commit: 6b234fb

devin-ai-integration[bot]

This comment was marked as resolved.

gemini-code-assist[bot]

This comment was marked as resolved.

- Move &amp; entity replacement to last position in decodeHtmlEntities
  to prevent double-unescaping (e.g. &amp;lt; → &lt; → <)
- Add \s* before > in closing tag regexes in removeNonContentTags
  to match tags with whitespace like </script >

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@wwwillchen
Copy link
Copy Markdown
Collaborator

@BugBot run

@github-actions
Copy link
Copy Markdown
Contributor

✅ Claude Code completed successfully

Summary

  • Fixed 1 CI failure (CodeQL — 2 high-severity security findings in src/pro/main/ipc/handlers/local_agent/tools/web_fetch.ts)
  • No review comments from trusted authors to address (2 threads from untrusted author github-advanced-security were skipped per policy)
Details

CodeQL Fix 1: Double HTML entity unescaping

  • decodeHtmlEntities() was replacing &amp; before other entities, causing double-unescaping (e.g., &amp;lt;&lt;<)
  • Fix: Moved &amp; replacement to last position so other entities are decoded first

CodeQL Fix 2: Bad HTML filtering regexp

  • removeNonContentTags() used regexes like <\/script> that didn't match closing tags with whitespace (e.g., </script >)
  • Fix: Added \s* before > in all closing tag regexes (e.g., <\/script\s*>)

Workflow run

cubic-dev-ai[bot]

This comment was marked as resolved.


function removeNonContentTags(html: string): string {
return html
.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, " ")

Check failure

Code scanning / CodeQL

Bad HTML filtering regexp High

This regular expression does not match script end tags like </script\t\n bar>.

Copilot Autofix

AI about 2 months ago

In general, the problem is that the regexes in removeNonContentTags assume that closing tags like </script> have only optional whitespace before the >. Browsers accept </script ...> and similar forms as end tags, so content between <script> and such a malformed closing tag will not be removed. To fix this, the end-tag patterns should be relaxed to allow any characters except > between the tag name and the closing >, not just whitespace. We should apply the same fix consistently to all the non-content tag removals using the same pattern shape.

The best targeted fix is to change each closing-tag fragment from </tag\s*> to </tag[^>]*>. This preserves the current behavior (non-greedy match between start and end, [\s\S]*?, and case-insensitivity) while broadening the recognized end-tag forms to include attributes, tabs, newlines, and other characters up to the >. Concretely, in src/pro/main/ipc/handlers/local_agent/tools/web_fetch.ts inside removeNonContentTags, update lines 211–216:

  • Change /<script\b[^>]*>[\s\S]*?<\/script\s*>/gi to /<script\b[^>]*>[\s\S]*?<\/script[^>]*>/gi
  • Change /<style\b[^>]*>[\s\S]*?<\/style\s*>/gi to /<style\b[^>]*>[\s\S]*?<\/style[^>]*>/gi
  • Change /<noscript\b[^>]*>[\s\S]*?<\/noscript\s*>/gi to /<noscript\b[^>]*>[\s\S]*?<\/noscript[^>]*>/gi
  • Change /<iframe\b[^>]*>[\s\S]*?<\/iframe\s*>/gi to /<iframe\b[^>]*>[\s\S]*?<\/iframe[^>]*>/gi
  • Change /<object\b[^>]*>[\s\S]*?<\/object\s*>/gi to /<object\b[^>]*>[\s\S]*?<\/object[^>]*>/gi
  • Change /<embed\b[^>]*>[\s\S]*?<\/embed\s*>/gi to /<embed\b[^>]*>[\s\S]*?<\/embed[^>]*>/gi

No new imports or dependencies are needed; we are only adjusting the regex literals already in use.

Suggested changeset 1
src/pro/main/ipc/handlers/local_agent/tools/web_fetch.ts

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/src/pro/main/ipc/handlers/local_agent/tools/web_fetch.ts b/src/pro/main/ipc/handlers/local_agent/tools/web_fetch.ts
--- a/src/pro/main/ipc/handlers/local_agent/tools/web_fetch.ts
+++ b/src/pro/main/ipc/handlers/local_agent/tools/web_fetch.ts
@@ -208,12 +208,12 @@
 // Handles common tags; approximate conversion is acceptable for this tool's use case.
 function removeNonContentTags(html: string): string {
   return html
-    .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, " ")
-    .replace(/<style\b[^>]*>[\s\S]*?<\/style\s*>/gi, " ")
-    .replace(/<noscript\b[^>]*>[\s\S]*?<\/noscript\s*>/gi, " ")
-    .replace(/<iframe\b[^>]*>[\s\S]*?<\/iframe\s*>/gi, " ")
-    .replace(/<object\b[^>]*>[\s\S]*?<\/object\s*>/gi, " ")
-    .replace(/<embed\b[^>]*>[\s\S]*?<\/embed\s*>/gi, " ");
+    .replace(/<script\b[^>]*>[\s\S]*?<\/script[^>]*>/gi, " ")
+    .replace(/<style\b[^>]*>[\s\S]*?<\/style[^>]*>/gi, " ")
+    .replace(/<noscript\b[^>]*>[\s\S]*?<\/noscript[^>]*>/gi, " ")
+    .replace(/<iframe\b[^>]*>[\s\S]*?<\/iframe[^>]*>/gi, " ")
+    .replace(/<object\b[^>]*>[\s\S]*?<\/object[^>]*>/gi, " ")
+    .replace(/<embed\b[^>]*>[\s\S]*?<\/embed[^>]*>/gi, " ");
 }
 
 function extractTextFromHtml(html: string): string {
EOF
@@ -208,12 +208,12 @@
// Handles common tags; approximate conversion is acceptable for this tool's use case.
function removeNonContentTags(html: string): string {
return html
.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, " ")
.replace(/<style\b[^>]*>[\s\S]*?<\/style\s*>/gi, " ")
.replace(/<noscript\b[^>]*>[\s\S]*?<\/noscript\s*>/gi, " ")
.replace(/<iframe\b[^>]*>[\s\S]*?<\/iframe\s*>/gi, " ")
.replace(/<object\b[^>]*>[\s\S]*?<\/object\s*>/gi, " ")
.replace(/<embed\b[^>]*>[\s\S]*?<\/embed\s*>/gi, " ");
.replace(/<script\b[^>]*>[\s\S]*?<\/script[^>]*>/gi, " ")
.replace(/<style\b[^>]*>[\s\S]*?<\/style[^>]*>/gi, " ")
.replace(/<noscript\b[^>]*>[\s\S]*?<\/noscript[^>]*>/gi, " ")
.replace(/<iframe\b[^>]*>[\s\S]*?<\/iframe[^>]*>/gi, " ")
.replace(/<object\b[^>]*>[\s\S]*?<\/object[^>]*>/gi, " ")
.replace(/<embed\b[^>]*>[\s\S]*?<\/embed[^>]*>/gi, " ");
}

function extractTextFromHtml(html: string): string {
Copilot is powered by AI and may make mistakes. Always verify output.
devin-ai-integration[bot]

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

@wwwillchen
Copy link
Copy Markdown
Collaborator

@BugBot run

@github-actions
Copy link
Copy Markdown
Contributor

✅ Claude Code completed successfully

Summary

  • Addressed 16 of 19 unresolved review threads
  • 3 remaining threads are CodeQL code scanning alerts (from github-advanced-security) that auto-resolve on re-scan and cannot be manually resolved via the API
  • Updated 3 E2E snapshot files to include the new web_fetch tool definition
  • Added 9 new unit tests for comprehensive coverage

Review Comments Addressed

Thread Author Issue Resolution
#4, #5, #6 greptile-apps SSRF via DNS rebinding, hostname bypass, redirect bypass Added resolveAndValidateHost() with DNS resolution, manual redirect handling with MAX_REDIRECTS=5
#7, #8 chatgpt-codex-connector Redirect revalidation, IPv6 private ranges Redirect loop validates each hop; isPrivateIp() now covers IPv6
#9, #10 devin-ai-integration IPv4-mapped IPv6, redirect SSRF Added IPv4-mapped IPv6 detection (dotted + hex forms), :: detection
#11 greptile-apps Entity decode order suggestion Replied explaining current order is correct (decode &amp; last prevents double-unescaping)
#12 greptile-apps Stream timeout concern Added comment clarifying abort signal propagates to stream reading
#13, #14 chatgpt-codex-connector Duplicates of #4-7 Resolved (same fixes apply)
#15 github-actions Nested try-catch with dead AbortError branch Flattened try-catch, removed dead code
#16 github-actions collapseWhitespace mangling code blocks Changed \s+ to [ \t]+ to preserve newlines
#17 github-actions Missing comment explaining regex choice Added comment explaining why regex is used over DOM parser
#18 github-actions Missing test coverage Added 9 new tests (IPv6, redirects, DNS, timeout, truncation, binary)
#19 github-actions Transient rule in local-agent-tools.md Removed the transient bug fix line

Unresolvable Threads (CodeQL Alerts)

Thread Issue Status
#1 Double-escaping in sanitizeHtml Code is correct; alert will auto-resolve on re-scan
#2 HTML regex usage Intentional for performance; comment added explaining choice
#3 Double-escaping variant Same as #1; auto-resolves
Key Code Changes

web_fetch.ts:

  • Added import { lookup } from "node:dns/promises" for DNS resolution
  • Added MAX_REDIRECTS = 5 constant
  • Extracted isPrivateIpv4() helper from isPrivateIp()
  • Enhanced isPrivateIp() with full IPv6 support (fc00::/7, fe80::/10, IPv4-mapped, unspecified)
  • Added resolveAndValidateHost() for pre-fetch DNS validation
  • Rewrote execute() with redirect: "manual" and validation loop
  • Fixed collapseWhitespace regex (\s+[ \t]+)
  • Tightened closing tag regexes ([^>]*>\s*>)

web_fetch.spec.ts:

  • Added mock for node:dns/promises
  • New tests: IPv6 private addresses, IPv4-mapped IPv6, non-image binary, truncation, timeout, redirect blocking, safe redirect following, DNS resolution blocking, streaming body size limit

E2E Snapshots:

  • local_agent_auto.spec.ts snapshot (flat format)
  • local_agent_basic.spec.ts snapshot (nested function format)
  • local_agent_ask.spec.ts snapshot (nested function format)

Workflow run

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e07088d1ba

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +310 to +312
response = await fetch(currentUrl, {
signal: abortController.signal,
redirect: "manual",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Bind fetch to validated address to prevent DNS rebinding

The SSRF guard validates hostnames with resolveAndValidateHost(), but the subsequent fetch(currentUrl, ...) still resolves DNS again from the hostname, so an attacker-controlled domain can return a public IP during validation and then rebind to a private/internal IP before the actual request. In that scenario the tool can still reach blocked networks despite the private-IP checks. Please pin requests to the validated address (or enforce a resolver/connection policy that reuses the checked IP) for each hop.

Useful? React with 👍 / 👎.

@wwwillchen
Copy link
Copy Markdown
Collaborator

@BugBot run

Cast vi.mocked(lookup) to any for the array-returning overload
used with { all: true } option, fixing TS2739 errors in CI.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@wwwillchen
Copy link
Copy Markdown
Collaborator

@BugBot run

@github-actions
Copy link
Copy Markdown
Contributor

✅ Claude Code completed successfully

Summary

  • Review comments: No unresolved threads from trusted authors (all 33 trusted-author threads were already resolved). 3 unresolved threads from github-advanced-security (untrusted, skipped).
  • CI failure fixed: TypeScript build error (TS2739) in web_fetch.spec.tsvi.mocked(lookup) typed as single-result overload but mocked with array for { all: true } variant. Fixed by casting to any.
  • Formatting: Applied oxfmt formatting fixes to web_fetch.spec.ts, web_fetch.ts, and rules/local-agent-tools.md.
Details

CI Failure (macOS build):

src/pro/main/ipc/handlers/local_agent/tools/web_fetch.spec.ts(45,41): error TS2739: Type '{ address: string; family: number; }[]' is missing the following properties from type 'LookupAddress': address, family
src/pro/main/ipc/handlers/local_agent/tools/web_fetch.spec.ts(355,41): error TS2739: Type '{ address: string; family: number; }[]' is missing the following properties from type 'LookupAddress': address, family

Fix: Changed vi.mocked(lookup).mockResolvedValue(...) to (vi.mocked(lookup) as any).mockResolvedValue(...) at lines 45 and 355. The lookup function from node:dns/promises has overloads — when called with { all: true } it returns LookupAddress[], but vi.mocked() resolves to the single-result overload type.

Untrusted commenters skipped: github-advanced-security (3 threads, all outdated)


Workflow run

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6b234fb5ca

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +310 to +313
response = await fetch(currentUrl, {
signal: abortController.signal,
redirect: "manual",
headers: {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Pin outbound request to validated address

The SSRF guard can be bypassed via DNS rebinding because resolveAndValidateHost() validates one DNS answer, but fetch(currentUrl, ...) resolves the hostname again when opening the socket. For attacker-controlled domains that return a public IP during validation and a private/internal IP on the subsequent fetch resolution, this code will still issue the request to the internal target. To make the private-network block effective, the request needs to be bound to the validated IP (or use a custom lookup/connection policy that enforces the checked addresses).

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 13 additional findings in Devin Review.

Open in Devin Review

Comment on lines +58 to +94
function isPrivateIp(hostname: string): boolean {
// IPv6 loopback
if (hostname === "[::1]" || hostname === "::1") return true;

// Strip brackets for IPv6
const ip = hostname.replace(/^\[|\]$/g, "");
const lowerIp = ip.toLowerCase();

// Unspecified address
if (lowerIp === "::" || /^0(:0){7}$/.test(lowerIp)) return true;

// Unique local addresses (fc00::/7)
if (/^f[cd]/i.test(lowerIp)) return true;

// Link-local addresses (fe80::/10)
if (/^fe[89ab]/i.test(lowerIp)) return true;

// IPv4-mapped IPv6 in dotted form (::ffff:x.x.x.x)
const v4MappedMatch = lowerIp.match(/^::ffff:(\d+\.\d+\.\d+\.\d+)$/);
if (v4MappedMatch) {
return isPrivateIpv4(v4MappedMatch[1]);
}

// IPv4-mapped IPv6 in hex form (e.g. ::ffff:7f00:1 = ::ffff:127.0.0.1)
const v4MappedHexMatch = lowerIp.match(
/^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/,
);
if (v4MappedHexMatch) {
const high = parseInt(v4MappedHexMatch[1], 16);
const low = parseInt(v4MappedHexMatch[2], 16);
const reconstructed = `${(high >> 8) & 0xff}.${high & 0xff}.${(low >> 8) & 0xff}.${low & 0xff}`;
return isPrivateIpv4(reconstructed);
}

// IPv4 patterns
return isPrivateIpv4(ip);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 IPv6 prefix checks in isPrivateIp falsely block legitimate hostnames

The isPrivateIp function applies IPv6 ULA and link-local prefix regex checks to raw hostnames without first verifying the input is actually an IPv6 address. This causes legitimate domains to be rejected as "private or internal network addresses".

Root Cause and Impact

The function is called with parsed.hostname in validateHttpUrl (web_fetch.ts:146), which is a regular domain name for most URLs. The IPv6 checks at lines 70 and 73 use regexes that match common hostname prefixes:

  • /^f[cd]/i (line 70, intended for fc00::/7 ULA addresses) matches any hostname starting with "fc" or "fd" — e.g. fcc.gov, fdic.gov, fcbarcelona.com, fca.com
  • /^fe[89ab]/i (line 73, intended for fe80::/10 link-local addresses) matches any hostname starting with "fe" + 8/9/a/b — e.g. feather.io, features.example.com, feast.dev

Trace for https://fcc.gov/:

  1. validateHttpUrlparsed.hostname = "fcc.gov"
  2. isPrivateIp("fcc.gov") → line 70: /^f[cd]/i.test("fcc.gov")true
  3. Error thrown: "URL points to a private or internal network address"

The fix is to only apply IPv6 prefix checks when the input actually contains : (a character present in all IPv6 addresses but not in domain names).

Impact: Users cannot fetch content from any website whose domain starts with fc, fd, fe8, fe9, fea, or feb. This includes US government sites like fcc.gov and fdic.gov.

Suggested change
function isPrivateIp(hostname: string): boolean {
// IPv6 loopback
if (hostname === "[::1]" || hostname === "::1") return true;
// Strip brackets for IPv6
const ip = hostname.replace(/^\[|\]$/g, "");
const lowerIp = ip.toLowerCase();
// Unspecified address
if (lowerIp === "::" || /^0(:0){7}$/.test(lowerIp)) return true;
// Unique local addresses (fc00::/7)
if (/^f[cd]/i.test(lowerIp)) return true;
// Link-local addresses (fe80::/10)
if (/^fe[89ab]/i.test(lowerIp)) return true;
// IPv4-mapped IPv6 in dotted form (::ffff:x.x.x.x)
const v4MappedMatch = lowerIp.match(/^::ffff:(\d+\.\d+\.\d+\.\d+)$/);
if (v4MappedMatch) {
return isPrivateIpv4(v4MappedMatch[1]);
}
// IPv4-mapped IPv6 in hex form (e.g. ::ffff:7f00:1 = ::ffff:127.0.0.1)
const v4MappedHexMatch = lowerIp.match(
/^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/,
);
if (v4MappedHexMatch) {
const high = parseInt(v4MappedHexMatch[1], 16);
const low = parseInt(v4MappedHexMatch[2], 16);
const reconstructed = `${(high >> 8) & 0xff}.${high & 0xff}.${(low >> 8) & 0xff}.${low & 0xff}`;
return isPrivateIpv4(reconstructed);
}
// IPv4 patterns
return isPrivateIpv4(ip);
}
function isPrivateIp(hostname: string): boolean {
// IPv6 loopback
if (hostname === "[::1]" || hostname === "::1") return true;
// Strip brackets for IPv6
const ip = hostname.replace(/^\[|\]$/g, "");
const lowerIp = ip.toLowerCase();
// Only apply IPv6-specific checks if the value looks like an IPv6 address
if (lowerIp.includes(":")) {
// Unspecified address
if (lowerIp === "::" || /^0(:0){7}$/.test(lowerIp)) return true;
// Unique local addresses (fc00::/7)
if (/^f[cd]/i.test(lowerIp)) return true;
// Link-local addresses (fe80::/10)
if (/^fe[89ab]/i.test(lowerIp)) return true;
// IPv4-mapped IPv6 in dotted form (::ffff:x.x.x.x)
const v4MappedMatch = lowerIp.match(/^::ffff:(\d+\.\d+\.\d+\.\d+)$/);
if (v4MappedMatch) {
return isPrivateIpv4(v4MappedMatch[1]);
}
// IPv4-mapped IPv6 in hex form (e.g. ::ffff:7f00:1 = ::ffff:127.0.0.1)
const v4MappedHexMatch = lowerIp.match(
/^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/,
);
if (v4MappedHexMatch) {
const high = parseInt(v4MappedHexMatch[1], 16);
const low = parseInt(v4MappedHexMatch[2], 16);
const reconstructed = `${(high >> 8) & 0xff}.${high & 0xff}.${(low >> 8) & 0xff}.${low & 0xff}`;
return isPrivateIpv4(reconstructed);
}
}
// IPv4 patterns
return isPrivateIpv4(ip);
}
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@github-actions
Copy link
Copy Markdown
Contributor

🔍 Dyadbot Code Review Summary

Verdict: ✅ YES - Ready to merge

Reviewed by 3 independent agents: Correctness Expert, Code Health Expert, UX Wizard.

Note: Previous review's issues have been largely addressed. This review covers the current state of the diff.

Issues Summary

Severity File Issue
🟡 MEDIUM web_fetch.ts:44-56 isPrivateIpv4 missing 100.64.0.0/10 (Carrier-Grade NAT) range
🟡 MEDIUM web_fetch.ts:341 HTTP error message only shows status code, not status text
🟢 Low Priority Notes (9 items)
  • Content-Length NaN fallthrough - web_fetch.ts:344: Malformed content-length header causes NaN > limit to be false, silently skipping the pre-check. Streaming limit still protects.
  • IPv4-compatible IPv6 bypass - web_fetch.ts:58-94: Deprecated ::127.0.0.1 form (without ffff:) may normalize to ::7f00:1 which isn't caught by current patterns.
  • Unreachable default branch - web_fetch.ts:164: buildAcceptHeader default case is unreachable since all enum values are covered.
  • Redundant text/xml check - web_fetch.ts:179: text/xml is already covered by mime.startsWith("text/").
  • DESCRIPTION duplicates constants - web_fetch.ts:29-40: Hardcoded limits in description string could drift from actual constants.
  • Binary image message confusing - web_fetch.ts:393: "Use a URL-accessible image endpoint" is unclear guidance for the agent.
  • Tool description missing truncation limit - web_fetch.ts:29-40: No mention of 60K character output truncation in the tool description.
  • Truncation lacks total size - web_fetch.ts:260-267: Truncation message doesn't indicate total document size.
  • Unrecognized 3xx status codes - web_fetch.ts:340: Status 300 (Multiple Choices) falls through to generic "Request failed" error.
🚫 Dropped False Positives (6 items)
  • Duplicated HTML converter tail logic - Dropped: The two functions share minor structural similarity (~5 lines) but extracting a shared helper would hurt readability for marginal DRY benefit.
  • Mock AgentContext duplication - Dropped: Project-wide test pattern concern, not specific to this PR.
  • Non-HTML text not wrapped for markdown format - Dropped: The format parameter controls HTML conversion behavior, not generic content wrapping. Passing through JSON/plain text as-is is correct behavior.
  • Consent preview shows full URL - Dropped: Showing the full URL is expected behavior for a consent prompt.
  • Missing buildXml for streaming UI - Dropped: Feature enhancement, not a review finding.
  • Redirect missing Location error message too technical - Dropped: Technical message is appropriate for an HTTP protocol violation.

Generated by Dyadbot multi-agent code review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-agent review: 2 issue(s) found

if (a === 0) return true; // 0.0.0.0/8
}
return false;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM | security

isPrivateIpv4 does not block 100.64.0.0/10 (Carrier-Grade NAT) range

The function blocks RFC 1918 ranges, loopback, link-local, and 0.0.0.0/8, but not 100.64.0.0/10 (RFC 6598 Carrier-Grade NAT). Cloud providers like AWS use addresses in this range for internal VPC endpoints. Similarly, 198.18.0.0/15 (benchmark testing) and 240.0.0.0/4 (reserved) are not blocked.

Note: Lower risk in a desktop Electron context than in a server-side environment.

💡 Suggestion:

if (a === 100 && b >= 64 && b <= 127) return true; // 100.64.0.0/10 Carrier-Grade NAT
if (a === 198 && (b === 18 || b === 19)) return true; // 198.18.0.0/15 benchmark
if (a >= 240) return true; // 240.0.0.0/4 reserved

}

if (!response.ok) {
throw new Error(`Request failed with status code: ${response.status}`);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM | error-messages

HTTP error message only shows status code, not status text

When a fetch returns a non-2xx status (e.g., 403, 429, 500), the error message is Request failed with status code: 404. The agent gets no indication of why the request failed. Common cases like 403 (blocked by site), 429 (rate limited), and 503 (service unavailable) would benefit from human-readable context so the agent can decide whether to retry.

💡 Suggestion: Include response.statusText:

throw new Error(`Request failed: ${response.status} ${response.statusText}`);

@github-actions github-actions bot added cc:request:1 needs-human:review-issue ai agent flagged an issue that requires human review labels Feb 25, 2026
@wwwillchen wwwillchen closed this Feb 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cc:failed cc:request:1 cc:request:2 needs-human:review-issue ai agent flagged an issue that requires human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants