test(app): add manual performance diagnostics by Hona · Pull Request #32937 · anomalyco/opencode

Hona · 2026-06-19T04:48:21Z

Summary

- add an opt-in Playwright performance suite for session tab switching, cached repaint sampling, and streaming timeline diagnostics
- run benchmarks serially against a production build while keeping them excluded from normal local and CI E2E discovery
- use fixed-work streaming completion, explicit metric capabilities/units, and observation terminology that does not claim compositor presentation
- keep custom probes focused on product-specific measurements while using standard Chrome Performance traces for browser internals
- stream traces atomically through Chrome's official CDP lifecycle, detect data loss, and associate artifacts/results with a schema version and run ID
- document durable benchmark invariants under packages/app/AGENTS.md and e2e/performance/AGENTS.md
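
The atomic trace finalization mentioned above can be illustrated with a write-then-rename pattern. This is only a minimal sketch, not the PR's helper: `writeTraceAtomically` and the `.partial` suffix are hypothetical names, and the sketch assumes the temporary file sits on the same filesystem as the final path so that the rename replaces the destination in one step.

```typescript
import { randomUUID } from "node:crypto"
import { rename, rm, writeFile } from "node:fs/promises"

// Hypothetical helper: write the payload under a unique temporary name, then
// rename it into place so readers never observe a half-written trace.
export async function writeTraceAtomically(finalPath: string, payload: string): Promise<void> {
  const partialPath = `${finalPath}.${randomUUID()}.partial`
  try {
    await writeFile(partialPath, payload, "utf8")
    // The final name only ever points at a fully written file.
    await rename(partialPath, finalPath)
  } catch (error) {
    // Leave no partial artifact behind that could be mistaken for a trace.
    await rm(partialPath, { force: true })
    throw error
  }
}
```

A caller would pass the final trace path and the serialized trace; a failed write then leaves either the previous file or no file at all, never a truncated one.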

Usage

bun run test:bench

CPU and high-volume visual profiling are opt-in through TIMELINE_CPU_PROFILE=1. The streaming scenario's 30x CPU throttle is a deterministic stress profile, not a simulated device.

Metric Integrity

- schema-v2 streaming measures completion and final settlement of the same 160-delta workload on every default run
- completion and throughput use renderer-observed DOM progress, including progress after tracked-part remounts
- RAF values are named callback-gap/frame-budget diagnostics, not presented or dropped frames
- observer entries are admitted by occurrence time inside the measurement window and drained during teardown
- disabled visual or geometry probes report null, never an unmeasured zero
- visual mounts, unmounts, replacements, and dropouts require CSS-rendered viewport visibility and tracked logical subtree identity
- tab values are sampled DOM observations admitted from a stable source state; mixed source/destination content is not correct or stable
- repaint uses an explicit one-second click-relative upper bound and mutually exclusive source/blank/unknown/wrong/correct states
- final BENCHMARK records include schema, run ID, retry identity, and test outcome after page/trace teardown, including missing-report failures
- trace files use unique identities and atomic finalization; partial/lossy captures cannot masquerade as valid JSON traces
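
Two of the rules above, window-based observer admission and null for disabled probes, can be sketched as small pure functions. The names `admitEntries` and `probeValue` are hypothetical, and treating both window bounds as inclusive is an assumption; the PR's own harness may define the window differently.

```typescript
// Anything with an occurrence timestamp, e.g. a PerformanceEntry-like record.
export interface TimedEntry {
  startTime: number
}

// Admit entries by when they occurred, not by when the observer delivered them.
// Inclusive bounds are an assumption of this sketch.
export function admitEntries<T extends TimedEntry>(entries: T[], windowStart: number, windowEnd: number): T[] {
  return entries.filter((entry) => entry.startTime >= windowStart && entry.startTime <= windowEnd)
}

// A disabled probe reports null so that "not measured" is never confused
// with a genuine measurement of zero.
export function probeValue(enabled: boolean, measure: () => number): number | null {
  return enabled ? measure() : null
}
```

Keeping `null` distinct from `0` lets a report consumer tell a skipped probe apart from a run that genuinely observed nothing.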

Validation

- app typecheck passed
- performance unit contracts: 10 passed
- normal Playwright discovery: 11 existing tests, no manual benchmarks
- standard and uncapped benchmark discovery: exactly 3 performance tests each
- normal dev E2E suite: 11/11 passed
- full production benchmark suite: 3/3 passed using one worker
- production default, minimal, visual-profile, failed-result, missing-result, rendered-marker, timestamp, CSS-visibility, part-remount, and trace scratch evaluations completed
- production Chrome trace captured atomically without data loss and parsed through bunx devtools-tracing stats
- push hook monorepo typecheck: 23/23 passed

The committed smoke and regression suites remain the source of correctness coverage. Benchmarks assert scenario and metric collection completion, not machine-dependent performance budgets.

Fixed-Work Production Comparison

The benchmark tree and shared E2E support files from f543c8720 were copied byte-for-byte onto a clean worktree at the June 5 product revision 83dca45dd. Both revisions ran the same production build, schema-v2 Playwright config, 30x CPU stress profile, 320 history turns, 160-delta workload, viewport, and one-worker execution. Every run delivered all 160 deltas, observed final progress index 160, settled final geometry, and passed. Results are medians from three alternating runs per revision. Later commits 6409b9212 and 7d035307c tighten observer admission and preserve reporting through remount/mixed-content regressions; they do not alter the listed fields for these non-remount comparison runs.

| 30x streaming metric | June 5 product (`83dca45dd`) | PR product, benchmark source `f543c8720` | Change |
| --- | --- | --- | --- |
| Completion time | 215.7s | 59.1s | 73% lower |
| Throughput | 0.742 deltas/s | 2.709 deltas/s | 3.65x |
| RAF callback rate | 3.75/s | 37.43/s | 9.97x |
| RAF-gap p95 | 917.6ms | 115.9ms | 87% lower |
| Missed-frame-budget equivalents | 12,304 | 1,354 | 89% lower |
| Longest >33ms RAF-gap streak | 26 | 3 | 88% lower |
| Maximum bottom drift | 64px | 0px | eliminated |
| Bottom-drift transitions | 5 | 0 | eliminated |
| Blank samples | 0 | 0 | unchanged |
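
The Change column can be reproduced from the rounded table values with two small formulas. The helper names `percentLower` and `ratio` are hypothetical and only illustrate the arithmetic. Recomputing from rounded inputs can differ in the last digit (for example, 37.43 / 3.75 gives about 9.98 rather than 9.97), presumably because the table was derived from unrounded measurements.

```typescript
// Percentage reduction from `before` to `after`, e.g. completion time.
export function percentLower(before: number, after: number): number {
  return (1 - after / before) * 100
}

// Multiplicative improvement from `before` to `after`, e.g. throughput.
export function ratio(before: number, after: number): number {
  return after / before
}

// Completion time: 215.7s -> 59.1s
console.log(`${Math.round(percentLower(215.7, 59.1))}% lower`) // prints "73% lower"

// Throughput: 0.742 -> 2.709 deltas/s
console.log(`${ratio(0.742, 2.709).toFixed(2)}x`) // prints "3.65x"
```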

The June 5 product predates the titlebar-tab surface, so only the streaming scenario is directly comparable there.

Copilot

Pull request overview

Adds an opt-in Playwright E2E “performance diagnostics” suite for the app, intended for manual benchmarking/profiling (tab switching, cached repaint tracing, and streaming timeline diagnostics) while keeping it out of normal CI/local test discovery unless explicitly enabled.

Changes:

- Gate packages/app/e2e/performance/** from Playwright discovery unless OPENCODE_PERFORMANCE=1 is set.
- Add performance-focused Playwright specs plus deterministic stress fixtures and shared page/setup helpers.
- Add Chrome trace capture helpers/config and pin devtools-tracing with package scripts for analyzing emitted traces.
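
The discovery gate described in the first change could look roughly like the following. This is a hedged sketch rather than the PR's actual config: `performanceTestIgnore` is a hypothetical helper, and the glob pattern is an assumption about how the directory would be matched.

```typescript
// Hypothetical helper: returns the ignore patterns for Playwright discovery.
// Performance specs stay hidden unless the opt-in variable is exactly "1".
export function performanceTestIgnore(env: Record<string, string | undefined>): string[] {
  return env.OPENCODE_PERFORMANCE === "1" ? [] : ["**/e2e/performance/**"]
}
```

In a Playwright config, the result would be passed as the `testIgnore` option, for example `testIgnore: performanceTestIgnore(process.env)`, so that a plain test run never discovers the benchmark specs.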

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| perf/manual-app-performance.md | Documents how to run the opt-in performance suite and trace tooling. |
| packages/app/playwright.config.ts | Adds `testIgnore` gating so performance specs are excluded unless opted-in. |
| packages/app/package.json | Adds `devtools-tracing` + trace analysis scripts. |
| packages/app/e2e/performance/timeline/timeline-test-helpers.ts | Shared helpers for timeline-focused performance tests (settings, storage, mocks). |
| packages/app/e2e/performance/timeline/session-timeline-stress.fixture.ts | Deterministic stress fixture to generate large/varied timeline content. |
| packages/app/e2e/performance/timeline/session-timeline-benchmark.spec.ts | Streaming timeline benchmark + geometry/rendering diagnostics and optional profiling/trace capture. |
| packages/app/e2e/performance/timeline/session-tab-switch-benchmark.spec.ts | Cold/hot session-tab switching benchmark + optional Chrome tracing. |
| packages/app/e2e/performance/timeline/session-tab-flash.spec.ts | Captures cached repaint/mutation/layout-shift traces during tab switching. |
| packages/app/e2e/performance/playwright.uncapped.config.ts | Provides an uncapped Chromium configuration for explicit benchmarking runs. |
| packages/app/e2e/performance/performance-page.ts | Small shared helpers for session URLs + localStorage project setup. |
| packages/app/e2e/performance/chrome-trace.ts | Helper to emit standard Chrome DevTools traces to disk. |
| bun.lock | Locks the added `devtools-tracing` dependency. |


+import type { Page } from "@playwright/test"
+import path from "node:path"
+import { writeFile } from "node:fs/promises"
+
+export async function startChromeTrace(page: Page, name: string) {
+  const directory = process.env.OPENCODE_PERFORMANCE_TRACE_DIR
+  if (!directory) return
+
+  const session = await page.context().newCDPSession(page)
+  const events: unknown[] = []
+  session.on("Tracing.dataCollected", (event) => events.push(...event.value))
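
The excerpt above buffers `Tracing.dataCollected` events in memory, whereas the PR summary describes streaming the trace and detecting data loss. A streamed variant might be sketched as follows, assuming the trace was started with `Tracing.start` and `transferMode: "ReturnAsStream"`. `CdpLike` and `collectStreamedTrace` are hypothetical names for this illustration; `Tracing.end`, `Tracing.tracingComplete`, `IO.read`, and `IO.close` are Chrome DevTools Protocol methods and events.

```typescript
// Minimal slice of a CDP session (hypothetical interface for this sketch).
export interface CdpLike {
  send(method: string, params?: Record<string, unknown>): Promise<any>
  once(event: string, listener: (payload: any) => void): unknown
}

export interface StreamedTrace {
  text: string
  dataLossOccurred: boolean
}

// Ends a trace that was started in stream transfer mode and drains the
// resulting IO stream chunk by chunk.
export async function collectStreamedTrace(session: CdpLike): Promise<StreamedTrace> {
  // Subscribe before sending Tracing.end so the completion event cannot be missed.
  const complete = new Promise<{ dataLossOccurred: boolean; stream?: string }>((resolve) => {
    session.once("Tracing.tracingComplete", resolve)
  })
  await session.send("Tracing.end")
  const { dataLossOccurred, stream } = await complete
  if (!stream) throw new Error("tracingComplete carried no stream handle")

  const parts: Buffer[] = []
  try {
    for (;;) {
      const chunk = await session.send("IO.read", { handle: stream })
      // Collect raw bytes so multi-byte characters split across chunks survive.
      parts.push(Buffer.from(chunk.data, chunk.base64Encoded ? "base64" : "utf8"))
      if (chunk.eof) break
    }
  } finally {
    // Always release the browser-side stream handle.
    await session.send("IO.close", { handle: stream })
  }
  return { text: Buffer.concat(parts).toString("utf8"), dataLossOccurred }
}
```

A caller can then reject the capture when `dataLossOccurred` is true instead of writing a trace that looks valid but is incomplete.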


+import config from "../../playwright.config"
+
+export default {
+  ...config,
+  testDir: "..",
+  outputDir: "../test-results-uncapped",
+  use: {
+    ...config.use,
+    launchOptions: {
+      args: ["--disable-frame-rate-limit", "--disable-gpu-vsync"],
+    },
+  },
+}


+- streaming timeline FPS, frame-gap, long-task, geometry, and remount diagnostics
+
+CPU and high-volume visual profiling are disabled by default. Set `TIMELINE_CPU_PROFILE=1` to enable both, or additionally set `TIMELINE_VISUAL_PROFILE=0` for CPU-only profiling.
+
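
The flag semantics described in this excerpt can be sketched as a small resolver. `resolveProfileFlags` is a hypothetical name, and the handling of `TIMELINE_VISUAL_PROFILE` when CPU profiling is off is an assumption of this sketch: the guide only states that both are disabled by default, that `TIMELINE_CPU_PROFILE=1` enables both, and that `TIMELINE_VISUAL_PROFILE=0` narrows it to CPU-only.

```typescript
export interface ProfileFlags {
  cpu: boolean
  visual: boolean
}

// Hypothetical resolver for the two environment variables named in the guide.
export function resolveProfileFlags(env: Record<string, string | undefined>): ProfileFlags {
  const cpu = env.TIMELINE_CPU_PROFILE === "1"
  // Assumption: visual profiling follows CPU profiling unless explicitly set to "0".
  const visual = cpu && env.TIMELINE_VISUAL_PROFILE !== "0"
  return { cpu, visual }
}
```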


+The repository pins `devtools-tracing`, which uses Chrome DevTools' Trace Engine:
+
+```sh
+bun trace:stats /tmp/opencode-performance-traces/session-tab-switch-cold.json
+bun trace:inp /tmp/opencode-performance-traces/session-tab-switch-cold.json
+bun trace:selectors /tmp/opencode-performance-traces/session-tab-switch-cold.json
+```


+function base64Encode(value: string) {
+  return Buffer.from(value, "utf8").toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=/g, "")
+}
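
Although named `base64Encode`, the helper above produces the URL-safe, unpadded variant (base64url). As a point of comparison, and assuming a runtime whose `Buffer` supports the `"base64url"` encoding (recent Node.js versions do), the same output can be obtained without the manual replacements. `base64UrlManual` and `base64UrlEncode` are hypothetical names for this sketch.

```typescript
// Hand-rolled variant, mirroring the excerpt above.
export function base64UrlManual(value: string): string {
  return Buffer.from(value, "utf8").toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=/g, "")
}

// Built-in variant: "base64url" uses the URL-safe alphabet and omits padding.
export function base64UrlEncode(value: string): string {
  return Buffer.from(value, "utf8").toString("base64url")
}
```

Naming the function after the encoding it actually emits makes it harder to pair it with a standard base64 decoder by mistake.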


+Set `OPENCODE_PERFORMANCE_TRACE_DIR` to emit standard Chrome DevTools traces where supported:
+


test(app): add manual performance diagnostics

ff8dde8

Hona requested a review from Brendonovich as a code owner June 19, 2026 04:48

Copilot AI review requested due to automatic review settings June 19, 2026 04:48

Copilot started reviewing on behalf of Hona June 19, 2026 04:48

Copilot AI reviewed Jun 19, 2026


Hona added 17 commits June 19, 2026 07:00

test(app): stabilize streaming benchmark readiness

49a992c

docs(app): colocate performance suite guide

4d4724f

test(app): harden performance diagnostics

5884be2

test(app): sample blank transition frames

3f6cf5b

refactor(app): modularize performance harness

1f6146a

refactor(app): standardize benchmark reporting

9c9e4a7

refactor(app): simplify benchmark composition

ee969d4

test(app): preserve benchmark diagnostics

e08418b

refactor(app): align benchmark tracing

459331f

fix(app): stabilize production benchmarks

58aaa93

docs(app): fix benchmark trace commands

995f3f9

docs(app): define engineering priorities

1ec55c3

fix(app): count logical subtree replacements

7ab08e2

fix(app): make benchmark metrics explicit

1069221

fix(app): tighten benchmark metric integrity

f543c87

fix(app): bound benchmark observer metrics

6409b92

fix(app): preserve benchmark regressions

7d03530

Hona merged commit c6083a4 into anomalyco:dev Jun 19, 2026
9 of 10 checks passed

thdxr pushed a commit that referenced this pull request Jun 21, 2026

test(app): add manual performance diagnostics (#32937)

9c0018a

Hona merged 18 commits into anomalyco:dev from Hona:test/manual-app-performance
