feat: add browser automation tool via agent-browser CLI by Danieldd28 · Pull Request #318 · sipeed/picoclaw

Danieldd28 · 2026-02-16T15:38:35Z

Summary

Integrate the agent-browser CLI as a lightweight browser automation tool for PicoClaw. This replaces the previous PR #308 (ActionBook approach) with a much leaner design that wraps an external CLI binary instead of embedding browser dependencies.

Design

Instead of pulling in heavy Go browser libraries (chromedp, etc.), this delegates all browser complexity to the external agent-browser binary via exec.Command. PicoClaw stays lean:

Zero new Go dependencies (all stdlib: os/exec, bytes, fmt, strings, time)
~200 bytes RAM overhead when enabled, zero when disabled
Browser engine runs in a separate process (OS-level memory isolation)
Binary size increase: ~5KB
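
As a rough illustration of the delegation model described above, the following sketch runs an external binary with a timeout using only the stdlib packages listed. The function name runCLI and its signature are hypothetical and are not taken from pkg/tools/browser.go; the PR's actual timeout handling may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// runCLI is a hypothetical helper: it starts an external binary, captures
// its output, and kills the process if it exceeds the timeout.
func runCLI(binary string, args []string, timeout time.Duration) (string, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(binary, args...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr

	// Start fails immediately if the binary cannot be found or executed.
	if err := cmd.Start(); err != nil {
		return "", fmt.Errorf("start %s: %w", binary, err)
	}

	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	select {
	case err := <-done:
		if err != nil {
			return stdout.String(), fmt.Errorf("%s failed: %w: %s",
				binary, err, strings.TrimSpace(stderr.String()))
		}
		return stdout.String(), nil
	case <-time.After(timeout):
		_ = cmd.Process.Kill()
		<-done // reap the killed process before reading the buffers
		return stdout.String(), fmt.Errorf("%s timed out after %s", binary, timeout)
	}
}

func main() {
	out, err := runCLI("agent-browser", []string{"open", "https://example.com"}, 60*time.Second)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Because the browser lives in the child process, a hang or crash there surfaces in PicoClaw only as an error value, which is the isolation property the design relies on.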

Changes

New files

pkg/tools/browser.go - BrowserTool wrapping agent-browser CLI
pkg/tools/browser_test.go - 11 unit tests

Modified files

pkg/config/config.go - Add BrowserConfig (enabled, session, headless, timeout, cdp_port)
pkg/agent/loop.go - Register browser tool conditionally

Configuration

{
  "tools": {
    "browser": {
      "enabled": true,
      "headless": false,
      "timeout": 60,
      "cdp_port": 9222
    }
  }
}
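
For orientation, the JSON above could decode into a Go struct along these lines. This is a sketch only: the field names follow the keys listed in the PR description (enabled, session, headless, timeout, cdp_port), while the struct layout, the wrapper types, and the parseConfig helper are assumptions and not the actual pkg/config/config.go code. The defaults (headless true, timeout 30, CDP port 9222) are the ones mentioned in the review comments.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BrowserConfig mirrors the "tools.browser" keys from the PR description.
// The layout and JSON tags are assumptions for illustration.
type BrowserConfig struct {
	Enabled  bool   `json:"enabled"`
	Session  string `json:"session"`
	Headless bool   `json:"headless"`
	Timeout  int    `json:"timeout"` // seconds
	CDPPort  int    `json:"cdp_port"`
}

// toolsConfig and config are hypothetical wrappers matching the JSON nesting.
type toolsConfig struct {
	Browser BrowserConfig `json:"browser"`
}

type config struct {
	Tools toolsConfig `json:"tools"`
}

// parseConfig decodes the document on top of defaults, so keys missing
// from the JSON keep their default values.
func parseConfig(data []byte) (BrowserConfig, error) {
	cfg := config{Tools: toolsConfig{Browser: BrowserConfig{
		Headless: true,
		Timeout:  30,
		CDPPort:  9222,
	}}}
	if err := json.Unmarshal(data, &cfg); err != nil {
		return BrowserConfig{}, err
	}
	return cfg.Tools.Browser, nil
}

func main() {
	raw := []byte(`{"tools":{"browser":{"enabled":true,"headless":false,"timeout":60,"cdp_port":9222}}}`)
	b, err := parseConfig(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", b)
}
```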

How it works

The integration exposes a single tool, browser, which takes a command parameter. The LLM constructs agent-browser subcommands directly:

browser open https://example.com
browser snapshot -i
browser click @e2
browser fill @e3 "text"
browser close

Global flags (--cdp, --headed, --session, --json) are added automatically based on config.
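
A minimal sketch of how config-driven global flags might be prepended to the subcommand. The four flag names come from the description above; the flag order, the value format (for example `--cdp 9222`), the browserOptions type, and the use of strings.Fields in place of the PR's splitCommand are all assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// browserOptions is a hypothetical subset of the tool's settings.
type browserOptions struct {
	Session  string
	Headless bool
	CDPPort  int
}

// buildArgs prepends global flags, then appends the subcommand tokens.
func buildArgs(opts browserOptions, command string) []string {
	var args []string
	if opts.CDPPort > 0 {
		args = append(args, "--cdp", strconv.Itoa(opts.CDPPort))
	}
	if !opts.Headless {
		args = append(args, "--headed")
	}
	if opts.Session != "" {
		args = append(args, "--session", opts.Session)
	}
	args = append(args, "--json")
	// strings.Fields is a stand-in tokenizer with no quote handling.
	return append(args, strings.Fields(command)...)
}

func main() {
	opts := browserOptions{Headless: false, CDPPort: 9222}
	fmt.Println(buildArgs(opts, "click @e2"))
}
```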

Testing

All 11 unit tests pass
Full test suite passes with zero regressions
Manual integration tested with headed browser on CDP port 9222

Prerequisite

Requires agent-browser CLI installed: npm install -g @anthropic/agent-browser

Integrate agent-browser CLI as a lightweight browser automation tool. Instead of embedding browser dependencies, this wraps the external agent-browser binary via exec.Command, keeping PicoClaw lean.

Changes:
- Add BrowserTool (pkg/tools/browser.go) wrapping agent-browser CLI
- Add BrowserConfig to config with enabled, session, headless, timeout, cdp_port
- Register browser tool conditionally in agent loop
- Add unit tests for argument building, command splitting, error handling

The tool accepts a single 'command' parameter and delegates to agent-browser. Default CDP port is 9222. Zero new Go dependencies - all stdlib imports.

Copilot

Pull request overview

Integrates the external agent-browser CLI as an optional PicoClaw tool to enable browser automation without adding embedded browser dependencies.

Changes:

Added a new browser tool that shells out to agent-browser with config-driven global flags.
Introduced tools.browser configuration (enabled/session/headless/timeout/cdp_port) with defaults.
Conditionally registered the browser tool in the agent tool registry when enabled.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
pkg/tools/browser.go	Implements `BrowserTool` wrapper around the `agent-browser` binary, including arg building and output handling.
pkg/tools/browser_test.go	Adds unit tests for tool metadata, parameter schema, and command/arg parsing helpers.
pkg/config/config.go	Adds `BrowserConfig` under `ToolsConfig` and wires defaults.
pkg/agent/loop.go	Registers the browser tool when `cfg.Tools.Browser.Enabled` is true.


Copilot · 2026-02-16T15:42:15Z

pkg/tools/browser.go

+type BrowserToolOptions struct {
+	Session  string // Session name for isolation
+	Headless bool   // Run in headless mode (default true)
+	Timeout  int    // Command timeout in seconds (default 30)
+	CDPPort  int    // Chrome DevTools Protocol port (default 9222)
+}


BrowserToolOptions says Headless has a default of true, but NewBrowserTool currently uses the bool zero-value (false) when opts.Headless isn’t explicitly set, which makes the tool run in headed mode by default (because buildArgs adds --headed when !t.headless). Either implement an explicit default-to-headless behavior (e.g., tri-state/pointer bool) or update the option comment/tests/docs so the default behavior is unambiguous and consistent.
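
One way to get the tri-state behavior mentioned here is a pointer bool, where nil means "not set". The sketch below is an assumption about how that could look; the resolveHeadless helper is hypothetical and the options struct is reduced to the one field under discussion.

```go
package main

import "fmt"

// BrowserToolOptions is reduced to the field under discussion.
type BrowserToolOptions struct {
	Headless *bool // nil means "not set"; fall back to headless
}

// resolveHeadless applies the documented default (headless) when the
// caller did not set the option explicitly.
func resolveHeadless(opts BrowserToolOptions) bool {
	if opts.Headless == nil {
		return true
	}
	return *opts.Headless
}

func main() {
	headed := false
	fmt.Println(resolveHeadless(BrowserToolOptions{}))                  // true
	fmt.Println(resolveHeadless(BrowserToolOptions{Headless: &headed})) // false
}
```

With this shape, BrowserToolOptions{} no longer silently selects headed mode, so the comment and the behavior agree.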

Copilot · 2026-02-16T15:42:16Z

pkg/tools/browser.go

+	}
+	if current.Len() > 0 {
+		args = append(args, current.String())
+	}


splitCommand currently drops empty quoted arguments. For example, fill @e3 "" will produce no argument for the empty string because the final append is gated on current.Len() > 0. This breaks commands where an empty string is a valid parameter; consider tracking whether an argument was quoted so empty quoted args are preserved, and add a unit test for this case.
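
A sketch of the fix suggested here: track whether a token has started, so that a quoted empty string still yields an argument. This is not the PR's splitCommand; it assumes only double quotes and space/tab separators are significant.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCommand splits on unquoted whitespace and keeps empty quoted
// arguments such as "".
func splitCommand(s string) []string {
	var args []string
	var cur strings.Builder
	inQuote := false  // currently inside double quotes
	hasToken := false // a token has started, even if it is still empty
	for _, r := range s {
		switch {
		case r == '"':
			inQuote = !inQuote
			hasToken = true // seeing a quote starts a token, possibly empty
		case (r == ' ' || r == '\t') && !inQuote:
			if hasToken {
				args = append(args, cur.String())
				cur.Reset()
				hasToken = false
			}
		default:
			cur.WriteRune(r)
			hasToken = true
		}
	}
	if hasToken {
		args = append(args, cur.String())
	}
	return args
}

func main() {
	fmt.Printf("%q\n", splitCommand(`fill @e3 ""`)) // ["fill" "@e3" ""]
}
```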

Copilot · 2026-02-16T15:42:16Z

pkg/tools/browser.go

+	}
+
+	// Build the full agent-browser command line
+	cmdArgs := t.buildArgs(command)


Execute() only validates that the raw command string is non-empty, but buildArgs/splitCommand can still return an empty subcommand (e.g., command set to "" or just quotes). In that case this will invoke agent-browser with only global flags, which is likely to fail with a confusing error. Consider validating that the parsed cmdArgs has at least 1 token and returning a clear ErrorResult if not.

Suggested change

-	cmdArgs := t.buildArgs(command)
+	cmdArgs := t.buildArgs(command)
+	if len(cmdArgs) == 0 {
+		return ErrorResult("parsed command is empty; provide an agent-browser subcommand (e.g. 'open https://example.com')")
+	}

Copilot · 2026-02-16T15:42:16Z

pkg/tools/browser.go

+}
+
+func (t *BrowserTool) Description() string {
+	return `Automate a headless browser via agent-browser CLI. Pass the subcommand as 'command'.


Description() starts with "Automate a headless browser…", but the tool can run headed mode when configured (via --headed when Headless=false). Consider adjusting the wording so it doesn’t promise headless operation unconditionally.

Suggested change

-	return `Automate a headless browser via agent-browser CLI. Pass the subcommand as 'command'.
+	return `Automate a browser (headless by default) via the agent-browser CLI. Pass the subcommand as 'command'.

Leeaandrob · 2026-02-16T19:27:37Z

@Zepan This PR addresses roadmap issue #293 (Autonomous Browser Operations — priority: high). It uses agent-browser CLI as a subprocess — lightweight approach that avoids embedding a full browser engine.

Note: PR #187 also targets browser automation but uses playwright-go (+1055 lines, heavier dependency). For PicoClaw's ultra-lightweight philosophy, the CLI subprocess approach in this PR is arguably more aligned.

Recommendation: Review both #318 and #187, pick one approach. The CLI subprocess model (this PR) is more consistent with PicoClaw's existing pattern (see: Codex CLI provider, Claude CLI provider). Playwright-go would add significant binary size.

nikolasdehor

Clean and well-scoped design. Delegating to an external binary is a pragmatic approach that keeps picoclaw lightweight. A few items to address:

1. Command injection via splitCommand

The splitCommand function is custom-written and does not handle escape characters (e.g., backslash-escaped quotes). While picoclaw's threat model assumes a trusted LLM, if the model hallucinates or is prompt-injected, arguments with embedded quotes would be mishandled. The function strips quotes but does not handle escaped quotes within strings. Consider using shlex-style parsing or at minimum documenting this limitation.

2. No binary existence check

If agent-browser is not installed, the tool will return a confusing exec: "agent-browser": executable file not found in $PATH error. Consider checking for the binary at registration time (during startup) and either logging a warning or skipping registration entirely, similar to how Whisper checks IsAvailable().

3. stderr filtering is fragile

if !strings.Contains(errOut, "Daemon started") {

This hard-codes a specific string from the agent-browser daemon. Any change in the external tool's output format will break this filter. Consider suppressing all stderr unless the exit code is non-zero, which is a more robust heuristic.

4. Missing config.example.json update

The browser config section is not added to the example config file, unlike all other tools in the codebase.

5. Test coverage

Good unit tests for arg parsing. The Execute tests are missing -- even a test that verifies the error path when agent-browser is not installed would be valuable.
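
Items 2 and 3 could be sketched roughly as follows: a startup check with exec.LookPath, and an output policy that surfaces stderr only when the command failed. The helper names checkBinary and formatOutput and the sentinel errExit are hypothetical and do not come from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// errExit stands in for the error a failed command would return.
var errExit = errors.New("exit status 1")

// checkBinary reports a descriptive error when name is not on PATH, so
// registration can warn or skip instead of failing on every Execute call.
func checkBinary(name string) error {
	if _, err := exec.LookPath(name); err != nil {
		return fmt.Errorf("%s not found in PATH; install it before enabling the browser tool: %w", name, err)
	}
	return nil
}

// formatOutput returns stdout on success and ignores stderr entirely;
// stderr is included only when the run failed.
func formatOutput(stdout, stderr string, runErr error) string {
	if runErr == nil {
		return strings.TrimSpace(stdout)
	}
	return fmt.Sprintf("error: %v\n%s", runErr, strings.TrimSpace(stderr))
}

func main() {
	if err := checkBinary("agent-browser"); err != nil {
		fmt.Println("browser tool disabled:", err)
	}
	fmt.Println(formatOutput("ok\n", "Daemon started", nil))
	fmt.Println(formatOutput("", "connection refused", errExit))
}
```

Keying the decision on the run error rather than on a string such as "Daemon started" means a wording change in the external tool's stderr does not affect the filter.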

Overall a solid addition. The items above are minor. Would like to see the config example updated and the binary check added.

nikolasdehor

Well-designed browser automation tool. The exec.Command delegation to agent-browser keeps PicoClaw lean (zero Go dependencies for browser) while providing full browser control. The splitCommand parser, timeout handling, and output truncation are all solid.

Issues and observations:

Command injection via splitCommand: The splitCommand function splits user input into arguments but does not handle all shell escaping edge cases. For example, a command like 'eval "rm -rf /"' would be split and passed to agent-browser as-is. While agent-browser is the actual binary being invoked (not a shell), the user-provided command string is fully trusted. Since the LLM constructs these commands, and agent-browser is a separate binary, the real risk is low, but worth noting.
splitCommand does not handle escaped quotes: Input like 'fill @e3 "she said "hello""' will not parse correctly because there is no backslash-escape handling inside quoted strings. This could cause issues with URLs containing quotes or form fields with special characters.
No binary existence check: If agent-browser is not installed, every Execute call will fail with a cryptic exec error. Consider checking for the binary in NewBrowserTool (via exec.LookPath) and returning a helpful error message in the tool description or at registration time.
Session isolation: The session flag is configurable but defaults to empty string, meaning all concurrent chats share the same browser state. For multi-user gateway deployments, this could cause cross-session contamination. Consider using the chat ID as a session identifier by default.
The test for buildArgs uses Headless: false (default from BrowserToolOptions{}) which adds --headed. But the default in config is Headless: true. The tests should mirror the default config to catch regressions.
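
To make the escaped-quote point concrete, here is a sketch of a splitter that honors backslash escapes and rejects unterminated input. It is an illustration under assumptions (double quotes only, backslash escapes any following character), not the PR's implementation. One consequence is that a literal backslash must be written as `\\`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitCommand splits on unquoted whitespace, treats \x as a literal x,
// and reports unterminated quotes or a trailing backslash.
func splitCommand(s string) ([]string, error) {
	var args []string
	var cur strings.Builder
	inQuote, hasToken, escaped := false, false, false
	for _, r := range s {
		switch {
		case escaped:
			cur.WriteRune(r) // take the escaped character literally
			escaped = false
		case r == '\\':
			escaped = true
			hasToken = true
		case r == '"':
			inQuote = !inQuote
			hasToken = true
		case (r == ' ' || r == '\t') && !inQuote:
			if hasToken {
				args = append(args, cur.String())
				cur.Reset()
				hasToken = false
			}
		default:
			cur.WriteRune(r)
			hasToken = true
		}
	}
	if inQuote || escaped {
		return nil, errors.New("unterminated quote or trailing backslash")
	}
	if hasToken {
		args = append(args, cur.String())
	}
	return args, nil
}

func main() {
	args, err := splitCommand(`fill @e3 "she said \"hello\""`)
	fmt.Printf("%q %v\n", args, err)
}
```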

Good feature. The design decision to wrap an external CLI is the right one for keeping the binary small.

CLAassistant · 2026-03-05T15:11:19Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

amirmamaghani · 2026-03-21T14:25:25Z

I made a simpler version which only needs a skill file. Tested and working smoothly.

#1861

Copilot AI review requested due to automatic review settings · Feb 16, 2026 15:38

Copilot started reviewing on behalf of Danieldd28 · Feb 16, 2026 15:39

Copilot AI reviewed · Feb 16, 2026

Leeaandrob mentioned this pull request · Feb 16, 2026
feat: add browser tool powered by playwright-go #187 (Open, 5 tasks)

lxowalle mentioned this pull request · Feb 22, 2026
[Task] Refactor the tools system #634 (Closed, 5 tasks)

nikolasdehor reviewed · Feb 23, 2026

This was referenced · Feb 27, 2026
Feat/refactor tools system #841 (Closed)
Feat/refactor tools system #846 (Closed)

sipeed-bot added the labels type: enhancement, domain: tool, domain: config · Mar 3, 2026

nikolasdehor reviewed · Mar 3, 2026

Danieldd28 wants to merge 1 commit into main from feat/agent-browser-tool
