feat(browser): compositor-level coordinate click with persistent-WS dispatch (23x faster)#19189
Open
kshitijk4poor wants to merge 5 commits intomainfrom
Open
feat(browser): compositor-level coordinate click with persistent-WS dispatch (23x faster)#19189kshitijk4poor wants to merge 5 commits intomainfrom
kshitijk4poor wants to merge 5 commits intomainfrom
Conversation
61263bc to
15c75b1
Compare
Add optional x/y parameters to browser_click for viewport-coordinate clicking via CDP Input.dispatchMouseEvent. When coordinates are provided, clicks are dispatched at the browser compositor level — Chrome does its own hit-testing, bypassing DOM selectors entirely. Use cases where ref-based click fails but coordinate click works: - Cross-origin iframes (OOPIFs) - Closed shadow DOM - Canvas/WebGL elements - Dynamic overlays where the snapshot may be stale Implementation: - CDP path (preferred): Input.dispatchMouseEvent via WebSocket (Target.getTargets + mousePressed + mouseReleased) - agent-browser fallback: mouse move/down/up when no CDP endpoint available - ref is no longer required — either ref OR x+y must be provided Benchmark (real Lightpanda WS at ws://127.0.0.1:63372, 200 iterations): CDP coord click: 3.71ms mean (2.97ms median, 2.61ms min, 7.01ms p95) Single WS conn baseline: 1.57ms mean (cost per connection open+call) agent-browser IPC: 0.20ms mean per HTTP call The 3.71ms per CDP click comes from 3 sequential fresh WS connections (pre-existing architecture in browser_cdp_tool.py). A persistent WS connection pool would bring this to ~3.1ms (just the 2 mouse events). Both paths are well under the 100ms human perception threshold. Files: - tools/browser_tool.py: schema update (x/y, ref no longer required), _cdp_coordinate_click(), _coordinate_click_via_agent_browser(), updated browser_click() with validation and dispatch - tests/tools/test_browser_coordinate_click.py: 21 tests covering validation, CDP path, fallback path, ref preservation, schema, registry - scripts/benchmark_click_paths.py: real-browser latency benchmark
15c75b1 to
ff8c6f2
Compare
Contributor
🔎 Lint report:
|
| Rule | Count |
|---|---|
invalid-argument-type |
11 |
invalid-assignment |
4 |
unresolved-import |
3 |
unresolved-attribute |
1 |
First entries
scripts/benchmark_click_paths.py:151: [invalid-assignment] invalid-assignment: Object of type `() -> Literal[False]` is not assignable to attribute `_is_camofox_mode` of type `def is_camofox_mode() -> bool`
tools/browser_tool.py:2398: [unresolved-attribute] unresolved-attribute: Attribute `connect` is not defined on `None` in union `Unknown | None`
tests/tools/test_browser_coordinate_click.py:18: [unresolved-import] unresolved-import: Cannot resolve imported module `websockets.asyncio.server`
tests/tools/test_browser_coordinate_click.py:428: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["required"]` on object of type `str`
tests/tools/test_browser_coordinate_click.py:421: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["y"]` on object of type `str`
scripts/benchmark_click_paths.py:197: [invalid-assignment] invalid-assignment: Object of type `bound method _SupervisorRegistry.get(task_id: str) -> CDPSupervisor | None` is not assignable to attribute `get` of type `def get(self, task_id: str) -> CDPSupervisor | None`
tools/browser_tool.py:3720: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `bound method dict[str, str | dict[str, dict[str, str]]].__getitem__(key: str, /) -> str | dict[str, dict[str, str]]` cannot be called with key of type `slice[None, Literal[60], None]` on object of type `dict[str, str | dict[str, dict[str, str]]]`
tests/tools/test_browser_coordinate_click.py:421: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> str, (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[str]]` cannot be called with key of type `Literal["y"]` on object of type `list[str]`
tests/tools/test_browser_coordinate_click.py:17: [unresolved-import] unresolved-import: Cannot resolve imported module `websockets`
tests/tools/test_browser_coordinate_click.py:420: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> str, (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[str]]` cannot be called with key of type `Literal["x"]` on object of type `list[str]`
tests/tools/test_browser_coordinate_click.py:177: [invalid-argument-type] invalid-argument-type: Argument to function `browser_click` is incorrect: Expected `int | float | None`, found `Literal["abc"]`
tests/tools/test_browser_coordinate_click.py:434: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["properties"]` on object of type `str`
tests/tools/test_browser_coordinate_click.py:420: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> Unknown, (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[Unknown]]` cannot be called with key of type `Literal["x"]` on object of type `list[Unknown]`
scripts/benchmark_click_paths.py:177: [invalid-assignment] invalid-assignment: Object of type `() -> Literal["ws://127.0.0.1:63372/"]` is not assignable to attribute `_resolve_cdp_endpoint` of type `def _resolve_cdp_endpoint() -> str`
tests/tools/test_browser_coordinate_click.py:177: [invalid-argument-type] invalid-argument-type: Argument to function `browser_click` is incorrect: Expected `int | float | None`, found `Literal["def"]`
tests/tools/test_browser_coordinate_click.py:420: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> LiteralString, (key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str]` cannot be called with key of type `Literal["x"]` on object of type `str`
tests/tools/test_browser_coordinate_click.py:15: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_browser_coordinate_click.py:421: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> Unknown, (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[Unknown]]` cannot be called with key of type `Literal["y"]` on object of type `list[Unknown]`
scripts/benchmark_click_paths.py:181: [invalid-assignment] invalid-assignment: Object of type `(tid) -> None` is not assignable to attribute `get` of type `def get(self, task_id: str) -> CDPSupervisor | None`
✅ Fixed issues: none
Unchanged: 3913 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
Replace the 3-separate-_cdp_call() approach (one WS connection per message) with a single _cdp_coordinate_click_async() coroutine that opens the WebSocket once and sequences all CDP messages on it: 1. Target.getTargets 2. Target.attachToTarget (if page target found) 3. Input.dispatchMouseEvent (mousePressed) } pipelined — both sent 4. Input.dispatchMouseEvent (mouseReleased) } before awaiting either Benchmark vs real Lightpanda WS at ws://127.0.0.1:63372/ (300 iters): Baseline (current main, 3 connections): 3.14ms mean, 2.97ms median Optimized (this commit, 1 connection): 1.30ms mean, 1.11ms median Speedup: 2.42x mean, 2.68x median, 1.62x p95 The savings come entirely from eliminating 2 TCP+WS handshakes. mousePressed + mouseReleased are pipelined on the same connection, so they travel in the same network burst. 21/21 tests pass.
…ywright patterns) Two additional optimizations from researching Playwright, Puppeteer, and browser-harness source: SESSION ID CACHING (browser-harness daemon pattern) Target.getTargets + Target.attachToTarget are stable across clicks on the same page. Cache the resolved session_id keyed by CDP endpoint URL. Subsequent clicks skip straight to mousePressed+mouseReleased with no session negotiation overhead. Self-healing: on 'Session with given id not found' (stale after navigation), the cache is invalidated and session resolution runs once before retrying. This matches the exact retry pattern from browser-harness's daemon.handle(). SKIP mousePressed ACK (Playwright Promise.all pattern) Browser processes CDP messages sequentially within a session. If mouseReleased is acknowledged, mousePressed was already processed. We skip waiting for the press ack entirely, saving one RTT. This is the same pattern as Playwright's Mouse.click() using Promise.all and Puppeteer's concurrent down+up dispatch. COMPRESSION=NONE (Puppeteer NodeWebSocketTransport pattern) Small CDP messages (Input.dispatchMouseEvent payloads are ~80 bytes) don't benefit from per-message compression. Disable it explicitly. Puppeteer uses perMessageDeflate: false for the same reason. Benchmark vs real Lightpanda WS (300 iterations): Baseline (3 connections): 3.28ms mean Optimized cold cache (1 conn): 1.17ms mean (2.79x speedup) Optimized warm cache (1 conn): 1.17ms mean (2.82x speedup) The cold/warm delta is <0.01ms because getTargets+attachToTarget on an already-open socket costs almost nothing on localhost — the dominant cost is WS connection setup, which we eliminated in the previous commit. The session cache still removes real work (2 CDP round-trips) and prevents accumulating latency on remote/higher-latency CDP endpoints. Tests: 24 passed (21 existing + 3 new session caching tests)
…edup) The CDPSupervisor (browser_supervisor.py) already maintains a persistent WebSocket connection per task_id for dialog detection and frame tracking. After browser_navigate(), a supervisor is always running with an open WS. Instead of opening a new connection per click, dispatch directly on it. Changes: - browser_supervisor.py: add CDPSupervisor.dispatch_mouse_click() — sync bridge onto the supervisor's asyncio loop via run_coroutine_threadsafe. Pipelines mousePressed + mouseReleased via asyncio.gather (Playwright Promise.all pattern), no serial round-trips. - browser_tool.py: _cdp_coordinate_click() now checks SUPERVISOR_REGISTRY.get(task_id) first; falls back to per-click WS connect if no supervisor is running (e.g. raw CDP without navigate). Dispatch priority (fastest first): 1. Supervisor path — zero WS connection cost (supervisor WS already open) 2. Warm-cache path — 1 WS open + 2 mouse events (session cached) 3. Cold-cache path — 1 WS open + getTargets + attachToTarget + 2 events 4. agent-browser — 3 subprocess IPC calls (no CDP endpoint configured) Benchmark vs real Lightpanda WS at ws://127.0.0.1:63372/ (300 iterations): Baseline (3 connections): 4.86ms mean Warm cache (1 conn + cache): 1.30ms mean (3.74x) Supervisor (persistent WS): 0.20ms mean (23.75x) Ref-click IPC baseline: 0.14ms mean (parity) The supervisor path is 1.5x ref-click (0.07ms overhead) — essentially the cost of one cross-thread future dispatch. 27/27 tests pass (+3 new TestSupervisorPath tests).
…enchmark path Self-review findings addressed: - browser_tool.py: log swallowed supervisor error at DEBUG instead of bare 'pass' (was silent, triggered F841 for unused 'exc' variable). Renamed to '_exc' to signal intentional discard. - browser_tool.py: rename unused 'press_id' to '_press_id' in both normal and retry paths (mouseReleased-only wait is intentional; press_id is never used after send). - browser_tool.py: get_event_loop() → get_running_loop() in 3 locations inside _cdp_resolve_session and _cdp_coordinate_click_async. Both are async functions and get_event_loop() is deprecated in async context in Python 3.10+. - browser_supervisor.py: ensure_future → create_task in dispatch_mouse_click. create_task is the correct modern API when already inside a running coroutine; ensure_future is deprecated for coroutines in Python 3.10+. Also consistent with the rest of browser_supervisor.py which uses create_task exclusively everywhere else. - scripts/benchmark_click_paths.py: replace hardcoded /private/tmp/hermes- coord-click sys.path hack with __file__-relative repo root detection so the script works from any checkout location. 27/27 tests pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this PR does
Adds optional
x/yparameters tobrowser_clickfor viewport-coordinate clicking via CDPInput.dispatchMouseEvent. The browser's own compositor handles hit-testing, so clicks bypass all DOM selector machinery — they pass through iframes, shadow DOM, canvas elements, and anything else the accessibility tree can't reach.This PR also introduces a 4-tier dispatch architecture that, on the common path (after
browser_navigate), brings coordinate click latency to parity with ref-based clicking.Problem
Ref-based clicking (
browser_click(ref="@e5")) fails or is unreliable for:API
refandx+yare mutually exclusive. Providing both, or partial coordinates, returns a clear error. Neither is now an error too (previouslyrefwas required).Architecture: 4-tier dispatch
Researched Playwright, Puppeteer, browser-harness (browser-use), and Vercel's agent-browser source to inform every decision here.
Tier 1 — Supervisor path (new)
CDPSupervisor(already inbrowser_supervisor.py) maintains a persistent, self-healing WebSocket pertask_id. It exists for dialog detection and frame tracking — this PR extends it withdispatch_mouse_click().Dispatch model:
asyncio.run_coroutine_threadsafeonto the supervisor's background loop.mousePressed+mouseReleasedare submitted as concurrent futures viaasyncio.gather— both sent before either is awaited (PlaywrightPromise.allpattern). No serial round-trips.Tier 2+3 — Per-click WS (session-cached)
When no supervisor is running (e.g. raw CDP without
browser_navigate), opens a single WebSocket for the entire click sequence and reuses it for all CDP messages. Three optimizations derived from Playwright/Puppeteer/browser-harness research:Target.getTargets+Target.attachToTargetresults cached in_CDP_SESSION_CACHE(keyed by endpoint URL). Second+ clicks skip session negotiation entirely. Self-heals on stale-session errors after navigation.mousePressedack — fires both mouse events before awaiting either response. Since the browser processes CDP messages sequentially within a session, ifmouseReleasedis acknowledged thenmousePressedhas already been applied. Saves one RTT. Same pattern as Playwright'sMouse.click()and Puppeteer's concurrent down+up dispatch.Benchmark
Real Lightpanda WebSocket at
ws://127.0.0.1:63372/, 300 iterations:23.75× speedup over the original 3-connection approach. The supervisor path (the common case after
browser_navigate) is 1.5× ref-click — 0.07ms overhead, which is one cross-thread future dispatch.Benchmark script:
scripts/benchmark_click_paths.py(runnable against any live CDP endpoint).Optimizations sourced from research
Studied: Playwright (
crConnection.ts,input.ts), Puppeteer (Connection.ts,Input.ts,TargetManager.ts), browser-harness (daemon.py,helpers.py,_ipc.py), Vercel agent-browser (cdp/client.rs,interaction.rs,browser.rs).mousePressedackPromise.all, Puppeteer concurrent down+upsession_id+ retry oncecompression=Noneon WSperMessageDeflate: falsemouseMovedbefore clickFiles changed
tools/browser_tool.py_CDP_SESSION_CACHE,_cdp_resolve_session(),_cdp_coordinate_click_async()(all 4 optimizations),_cdp_coordinate_click()(3-tier dispatch), updatedbrowser_click()signature + validation, schema update (x/y params, ref no longer required)tools/browser_supervisor.pyCDPSupervisor.dispatch_mouse_click()— sync bridge withasyncio.gatherpipeliningtests/tools/test_browser_coordinate_click.pyscripts/benchmark_click_paths.pyCommit history