
fix(web_fetch): include fetched page content in ForLLM response#512

Closed
CrisisAlpha wants to merge 1 commit into sipeed:main from CrisisAlpha:fix/web-fetch-forllm-content
Conversation

@CrisisAlpha
Contributor

📝 Description

The web_fetch tool's ForLLM field only returned a metadata summary line (byte count, extractor type, truncation status), omitting the actual extracted page content. This meant the LLM could never read fetched web pages, causing it to loop calling web_fetch repeatedly until max_tool_iterations was hit.

The fix appends the extracted text content to the ForLLM response so the LLM can read and reason about fetched pages.

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

Fixes #388

📚 Technical Context (Skip for Docs)

🧪 Test Environment

  • Hardware: Mac (x86_64)
  • OS: macOS
  • Model/Provider: N/A (unit tests only, no API keys needed)
  • Channels: N/A

📸 Evidence (Optional)


Before (broken): ForLLM returns only:

Fetched 5000 bytes from https://example.com (extractor: text, truncated: false)

After (fixed): ForLLM returns metadata + actual content:

Fetched 5000 bytes from https://example.com (extractor: text, truncated: false)

[actual page content here]

All tests pass: make fmt, make vet, make test

☑️ Checklist

  • My code/docs follow the style of this project.
  • I have performed a self-review of my own changes.
  • I have updated the documentation accordingly.

Made with Cursor

The web_fetch tool's ForLLM field only returned metadata (byte count,
extractor type, truncation status) but omitted the actual extracted text.
This caused the LLM to never see fetched page content, making the tool
effectively useless and causing repeated fetch loops.

Append the extracted text content to ForLLM so the LLM can read and
reason about fetched web pages.

Fixes sipeed#388

Co-authored-by: Cursor <[email protected]>

@nikolasdehor nikolasdehor left a comment


Critical bug fix — a one-line change that makes the web_fetch tool actually useful. Without this, the LLM saw only metadata and would loop until hitting max iterations.

The fix is correct: appending the actual text content to ForLLM with a double-newline separator.

Also good: the test fixes change && to || in multiple assertions of the form !strings.Contains(x, "a") && !strings.Contains(x, "b"). By De Morgan's laws, the old && version means "neither substring is present", so the test only failed when BOTH substrings were missing and could pass even if one was missing. The corrected || version fails if EITHER substring is missing, which is the right behavior.

One thing to keep in mind: the text variable can be quite large (up to the truncation limit). Since this now goes into ForLLM which is sent as a tool result to the LLM provider, make sure the content is already bounded by the existing truncation logic upstream. Looking at the existing code, text is already truncated by maxOutputLen, so this should be fine.

LGTM.

@lxowalle
Collaborator

Hi @CrisisAlpha, could you try again on the latest branch? I've tested and confirmed that this issue has been fixed on the latest branch.

@CrisisAlpha
Contributor Author

Hey @lxowalle, thanks for checking! I just synced with the latest main (cb0c870) and the issue is still present. In pkg/tools/web.go L486-492, ForLLM contains only the metadata summary (byte count, extractor, truncated flag). The actual fetched text is only included in ForUser as JSON.

Since the agent loop sends ForLLM (not ForUser) to the LLM as the tool result, the LLM never receives the page content. The user sees it in the chat, but the LLM can't reason about it.

Could you double-check? Happy to rebase if needed!

@nikolasdehor

@lxowalle I can confirm CrisisAlpha's analysis is correct -- the bug is still present on main.

Looking at the code path:

  1. In pkg/tools/web.go L486-492, ForLLM is set to only the metadata summary line. The actual text content is only serialized into ForUser as part of the JSON result.

  2. In pkg/agent/loop.go L706-713, the agent loop sends ForLLM (not ForUser) to the LLM as the tool result. This is the correct design -- ForUser is for display to the human, ForLLM is what the model reasons about.

The net effect is that the LLM receives a metadata-only string with zero page content. The LLM has no way to read the fetched page, which causes it to loop calling web_fetch repeatedly until max_tool_iterations is exhausted.

This PR's fix (appending text to ForLLM) is the correct approach. The ForUser/ForLLM split is intentional architecture, and the content simply needs to be in both.

@lxowalle
Collaborator

Okay, I'll double-check.

@lxowalle
Collaborator

Hi @CrisisAlpha, I tested and confirmed that the message for the user is passed to the channel but not to the LLM. I suggest modifying the method to:

	return &ToolResult{
		ForLLM: fmt.Sprintf(
			"Fetched %d bytes from %s (extractor: %s, truncated: %v): %s",
			len(text),
			urlStr,
			extractor,
			truncated,
			string(resultJSON),
		),
		ForUser: "",
	}

This ensures that the message will not be sent to the channel repeatedly.

@sipeed-bot sipeed-bot bot added type: bug Something isn't working domain: tool labels Mar 3, 2026
@lxowalle
Collaborator

lxowalle commented Mar 5, 2026

Fixed by #833.

@lxowalle lxowalle closed this Mar 5, 2026



Development

Successfully merging this pull request may close these issues.

web_fetch tool: ForLLM response missing actual page content
