test(app): add manual performance diagnostics #32937

Merged
Hona merged 18 commits into anomalyco:dev from Hona:test/manual-app-performance
Jun 19, 2026
Hona commented Jun 19, 2026

Summary

  • add an opt-in Playwright performance suite for session tab switching, cached repaint sampling, and streaming timeline diagnostics
  • run benchmarks serially against a production build while keeping them excluded from normal local and CI E2E discovery
  • use fixed-work streaming completion, explicit metric capabilities/units, and observation terminology that does not claim compositor presentation
  • keep custom probes focused on product-specific measurements while using standard Chrome Performance traces for browser internals
  • stream traces atomically through Chrome's official CDP lifecycle, detect data loss, and associate artifacts/results with a schema version and run ID
  • document durable benchmark invariants under packages/app/AGENTS.md and e2e/performance/AGENTS.md
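
The data-loss and validity checks named in the trace bullet can be sketched as a pure function. This is an illustration under assumptions, not the PR's code: `finalizeTrace`, `TraceCapture`, and the reason strings are hypothetical names. The only protocol facts relied on are that Chrome's `Tracing.tracingComplete` event carries a `dataLossOccurred` flag and that a streamed trace arrives as ordered text chunks ending with an end-of-stream signal.

```ts
// Hypothetical shape: what a caller might collect from one streamed capture.
interface TraceCapture {
  dataLossOccurred: boolean // as reported by Tracing.tracingComplete
  chunks: string[] // text chunks read from the trace stream, in order
  eof: boolean // whether the last read reported end-of-stream
}

type TraceResult =
  | { ok: true; json: string; eventCount: number }
  | { ok: false; reason: "data-loss" | "incomplete-stream" | "invalid-json" | "missing-trace-events" }

// Accepts a capture only if it is complete, lossless, and parses as a trace.
function finalizeTrace(capture: TraceCapture): TraceResult {
  if (capture.dataLossOccurred) return { ok: false, reason: "data-loss" }
  if (!capture.eof) return { ok: false, reason: "incomplete-stream" }

  const json = capture.chunks.join("")
  let parsed: unknown
  try {
    parsed = JSON.parse(json)
  } catch {
    return { ok: false, reason: "invalid-json" }
  }

  // Assumption: a trace is either a bare event array or an object with `traceEvents`.
  const events = Array.isArray(parsed)
    ? parsed
    : (parsed as { traceEvents?: unknown } | null)?.traceEvents
  if (!Array.isArray(events)) return { ok: false, reason: "missing-trace-events" }

  return { ok: true, json, eventCount: events.length }
}
```

Under this sketch, only an `ok: true` result would be written to a temporary path and renamed to its final `.json` name, so a rejected capture never appears under a name that trace tooling would pick up.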

Usage

```sh
bun run test:bench
```

CPU and high-volume visual profiling are opt-in through TIMELINE_CPU_PROFILE=1. The streaming scenario's 30x CPU throttle is a deterministic stress profile, not a simulated device.
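
A minimal sketch of how such opt-in flags might be read, assuming plain `"1"`/`"0"` string comparisons. `readProfileFlags` and `ProfileFlags` are hypothetical names, and `TIMELINE_VISUAL_PROFILE` is taken from an outdated documentation excerpt quoted later in this review, so its final behavior may differ.

```ts
interface ProfileFlags {
  cpuProfile: boolean
  visualProfile: boolean
}

// Reads profiling opt-ins from an environment-like record (for example process.env).
function readProfileFlags(env: Record<string, string | undefined>): ProfileFlags {
  const cpuProfile = env.TIMELINE_CPU_PROFILE === "1"
  // Assumption: visual profiling follows the CPU flag unless explicitly set to "0".
  const visualProfile = cpuProfile && env.TIMELINE_VISUAL_PROFILE !== "0"
  return { cpuProfile, visualProfile }
}
```

With no variables set, both flags are `false`, which matches the description of profiling as opt-in.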

Metric Integrity

  • schema-v2 streaming measures completion and final settlement of the same 160-delta workload on every default run
  • completion and throughput use renderer-observed DOM progress, including progress after tracked-part remounts
  • RAF values are named callback-gap/frame-budget diagnostics, not presented or dropped frames
  • observer entries are admitted by occurrence time inside the measurement window and drained during teardown
  • disabled visual or geometry probes report null, never an unmeasured zero
  • visual mounts, unmounts, replacements, and dropouts require CSS-rendered viewport visibility and tracked logical subtree identity
  • tab values are sampled DOM observations admitted from a stable source state; mixed source/destination content is never counted as correct or stable
  • repaint uses an explicit one-second click-relative upper bound and mutually exclusive source/blank/unknown/wrong/correct states
  • final BENCHMARK records include schema, run ID, retry identity, and test outcome after page/trace teardown, including missing-report failures
  • trace files use unique identities and atomic finalization; partial/lossy captures cannot masquerade as valid JSON traces
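
The RAF callback-gap diagnostics above can be illustrated with a small summary function. This is a sketch under stated assumptions rather than the suite's implementation: `summarizeRafGaps` is a hypothetical name, the 60 Hz budget and nearest-rank p95 are choices made here, and "missed-frame-budget equivalents" is assumed to mean the number of whole frame budgets a gap spans beyond the first.

```ts
interface RafGapSummary {
  callbackRatePerSecond: number
  gapP95Ms: number
  missedBudgetEquivalents: number
  longestSlowStreak: number
}

// Summarizes requestAnimationFrame callback timestamps (milliseconds, ascending).
function summarizeRafGaps(
  timestampsMs: number[],
  budgetMs: number = 1000 / 60, // assumed 60 Hz frame budget
  slowGapMs: number = 33, // threshold for the ">33ms" streak
): RafGapSummary | null {
  // Fewer than two callbacks means nothing was measured: report null, not zero.
  if (timestampsMs.length < 2) return null

  const gaps: number[] = []
  for (let i = 1; i < timestampsMs.length; i++) gaps.push(timestampsMs[i] - timestampsMs[i - 1])

  // Nearest-rank percentile over the sorted gaps.
  const sorted = [...gaps].sort((a, b) => a - b)
  const gapP95Ms = sorted[Math.ceil(0.95 * sorted.length) - 1]

  let missedBudgetEquivalents = 0
  let streak = 0
  let longestSlowStreak = 0
  for (const gap of gaps) {
    missedBudgetEquivalents += Math.max(0, Math.round(gap / budgetMs) - 1)
    streak = gap > slowGapMs ? streak + 1 : 0
    longestSlowStreak = Math.max(longestSlowStreak, streak)
  }

  const durationMs = timestampsMs[timestampsMs.length - 1] - timestampsMs[0]
  const callbackRatePerSecond = durationMs > 0 ? (gaps.length / durationMs) * 1000 : 0
  return { callbackRatePerSecond, gapP95Ms, missedBudgetEquivalents, longestSlowStreak }
}
```

The values describe gaps between callbacks only; nothing in this calculation says whether a frame was actually presented, which is why the names avoid "dropped frames".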

Validation

  • app typecheck passed
  • performance unit contracts: 10 passed
  • normal Playwright discovery: 11 existing tests, no manual benchmarks
  • standard and uncapped benchmark discovery: exactly 3 performance tests each
  • normal dev E2E suite: 11/11 passed
  • full production benchmark suite: 3/3 passed using one worker
  • production default, minimal, visual-profile, failed-result, missing-result, rendered-marker, timestamp, CSS-visibility, part-remount, and trace scratch evaluations completed
  • production Chrome trace captured atomically without data loss and parsed through bunx devtools-tracing stats
  • push hook monorepo typecheck: 23/23 passed

The committed smoke and regression suites remain the source of correctness coverage. Benchmarks assert scenario and metric collection completion, not machine-dependent performance budgets.

Fixed-Work Production Comparison

The benchmark tree and shared E2E support files from f543c8720 were copied byte-for-byte onto a clean worktree at the June 5 product revision 83dca45dd. Both revisions ran the same production build, schema-v2 Playwright config, 30x CPU stress profile, 320 history turns, 160-delta workload, viewport, and one-worker execution. Every run delivered all 160 deltas, observed final progress index 160, settled final geometry, and passed. Results are medians from three alternating runs per revision. Later commits 6409b9212 and 7d035307c tighten observer admission and preserve reporting through remount/mixed-content regressions; they do not alter the listed fields for these non-remount comparison runs.

| 30x streaming metric | June 5 product (83dca45dd) | PR product, benchmark source f543c8720 | Change |
| --- | --- | --- | --- |
| Completion time | 215.7s | 59.1s | 73% lower |
| Throughput | 0.742 deltas/s | 2.709 deltas/s | 3.65x |
| RAF callback rate | 3.75/s | 37.43/s | 9.97x |
| RAF-gap p95 | 917.6ms | 115.9ms | 87% lower |
| Missed-frame-budget equivalents | 12,304 | 1,354 | 89% lower |
| Longest >33ms RAF-gap streak | 26 | 3 | 88% lower |
| Maximum bottom drift | 64px | 0px | eliminated |
| Bottom-drift transitions | 5 | 0 | eliminated |
| Blank samples | 0 | 0 | unchanged |
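
For readers checking the Change column, the arithmetic can be reproduced from the rounded values in the table. The helper names `percentLower` and `ratio` are hypothetical; because the inputs here are the already-rounded medians, a last-digit difference from figures computed on unrounded data is possible.

```ts
// Percentage reduction from `before` to `after`, rounded to a whole percent.
function percentLower(before: number, after: number): number {
  return Math.round((1 - after / before) * 100)
}

// Multiplicative change from `before` to `after`, formatted like "3.65x".
function ratio(before: number, after: number): string {
  return `${(after / before).toFixed(2)}x`
}

console.log(percentLower(215.7, 59.1)) // completion time: 73
console.log(ratio(0.742, 2.709)) // throughput: 3.65x
console.log(percentLower(917.6, 115.9)) // RAF-gap p95: 87
console.log(percentLower(12304, 1354)) // missed-frame-budget equivalents: 89
console.log(percentLower(26, 3)) // longest >33ms streak: 88
```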

The June 5 product predates the titlebar-tab surface, so only the streaming scenario is directly comparable there.

Hona requested a review from Brendonovich as a code owner June 19, 2026 04:48
Copilot AI review requested due to automatic review settings June 19, 2026 04:48

Copilot AI left a comment

Pull request overview

Adds an opt-in Playwright E2E “performance diagnostics” suite for the app, intended for manual benchmarking/profiling (tab switching, cached repaint tracing, and streaming timeline diagnostics) while keeping it out of normal CI/local test discovery unless explicitly enabled.

Changes:

  • Gate packages/app/e2e/performance/** from Playwright discovery unless OPENCODE_PERFORMANCE=1 is set.
  • Add performance-focused Playwright specs plus deterministic stress fixtures and shared page/setup helpers.
  • Add Chrome trace capture helpers/config and pin devtools-tracing with package scripts for analyzing emitted traces.
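
The discovery gate in the first bullet could look roughly like the following. This is a sketch, not the config in the PR: `performanceTestIgnore` is a hypothetical helper and the glob is an assumed pattern for `packages/app/e2e/performance/**`. The only Playwright facts relied on are that `testIgnore` is a config option and that it accepts an array of glob strings.

```ts
// Returns glob patterns for Playwright's `testIgnore` option.
// Performance specs are ignored unless OPENCODE_PERFORMANCE is exactly "1".
function performanceTestIgnore(env: Record<string, string | undefined>): string[] {
  // Assumed glob; adjust to the real directory layout.
  const performanceGlob = "**/e2e/performance/**"
  return env.OPENCODE_PERFORMANCE === "1" ? [] : [performanceGlob]
}
```

In a Playwright config, the result would be passed as `testIgnore: performanceTestIgnore(process.env)`, so a plain test run never discovers the benchmarks.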

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| perf/manual-app-performance.md | Documents how to run the opt-in performance suite and trace tooling. |
| packages/app/playwright.config.ts | Adds testIgnore gating so performance specs are excluded unless opted-in. |
| packages/app/package.json | Adds devtools-tracing + trace analysis scripts. |
| packages/app/e2e/performance/timeline/timeline-test-helpers.ts | Shared helpers for timeline-focused performance tests (settings, storage, mocks). |
| packages/app/e2e/performance/timeline/session-timeline-stress.fixture.ts | Deterministic stress fixture to generate large/varied timeline content. |
| packages/app/e2e/performance/timeline/session-timeline-benchmark.spec.ts | Streaming timeline benchmark + geometry/rendering diagnostics and optional profiling/trace capture. |
| packages/app/e2e/performance/timeline/session-tab-switch-benchmark.spec.ts | Cold/hot session-tab switching benchmark + optional Chrome tracing. |
| packages/app/e2e/performance/timeline/session-tab-flash.spec.ts | Captures cached repaint/mutation/layout-shift traces during tab switching. |
| packages/app/e2e/performance/playwright.uncapped.config.ts | Provides an uncapped Chromium configuration for explicit benchmarking runs. |
| packages/app/e2e/performance/performance-page.ts | Small shared helpers for session URLs + localStorage project setup. |
| packages/app/e2e/performance/chrome-trace.ts | Helper to emit standard Chrome DevTools traces to disk. |
| bun.lock | Locks the added devtools-tracing dependency. |

Comment on lines +1 to +11
```ts
import type { Page } from "@playwright/test"
import path from "node:path"
import { writeFile } from "node:fs/promises"

export async function startChromeTrace(page: Page, name: string) {
  const directory = process.env.OPENCODE_PERFORMANCE_TRACE_DIR
  if (!directory) return

  const session = await page.context().newCDPSession(page)
  const events: unknown[] = []
  session.on("Tracing.dataCollected", (event) => events.push(...event.value))
```

Comment on lines +1 to +13
```ts
import config from "../../playwright.config"

export default {
  ...config,
  testDir: "..",
  outputDir: "../test-results-uncapped",
  use: {
    ...config.use,
    launchOptions: {
      args: ["--disable-frame-rate-limit", "--disable-gpu-vsync"],
    },
  },
}
```

Comment thread perf/manual-app-performance.md Outdated
Comment on lines +23 to +26
- streaming timeline FPS, frame-gap, long-task, geometry, and remount diagnostics

CPU and high-volume visual profiling are disabled by default. Set `TIMELINE_CPU_PROFILE=1` to enable both, or additionally set `TIMELINE_VISUAL_PROFILE=0` for CPU-only profiling.

Comment thread perf/manual-app-performance.md Outdated
Comment on lines +40 to +46
The repository pins `devtools-tracing`, which uses Chrome DevTools' Trace Engine:

```sh
bun trace:stats /tmp/opencode-performance-traces/session-tab-switch-cold.json
bun trace:inp /tmp/opencode-performance-traces/session-tab-switch-cold.json
bun trace:selectors /tmp/opencode-performance-traces/session-tab-switch-cold.json
```
Comment on lines +1010 to +1012
```ts
function base64Encode(value: string) {
  return Buffer.from(value, "utf8").toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=/g, "")
}
```

Comment thread perf/manual-app-performance.md Outdated
Comment on lines +31 to +32
Set `OPENCODE_PERFORMANCE_TRACE_DIR` to emit standard Chrome DevTools traces where supported:

Hona merged commit c6083a4 into anomalyco:dev on Jun 19, 2026
9 of 10 checks passed