test(app): add manual performance diagnostics #32937
Merged
Pull request overview
Adds an opt-in Playwright E2E “performance diagnostics” suite for the app, intended for manual benchmarking/profiling (tab switching, cached repaint tracing, and streaming timeline diagnostics) while keeping it out of normal CI/local test discovery unless explicitly enabled.
Changes:
- Gate `packages/app/e2e/performance/**` from Playwright discovery unless `OPENCODE_PERFORMANCE=1` is set.
- Add performance-focused Playwright specs plus deterministic stress fixtures and shared page/setup helpers.
- Add Chrome trace capture helpers/config and pin `devtools-tracing` with package scripts for analyzing emitted traces.
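The discovery gate in the first bullet can be pictured as a small helper that computes Playwright's `testIgnore` value from the environment. This is a hypothetical sketch, not the PR's `playwright.config.ts`: the helper name and the `**/e2e/performance/**` glob are assumptions.

```typescript
// Hypothetical helper (not from the PR): decides whether the performance suite is discoverable.
// Playwright matches string `testIgnore` globs against each test file's absolute path.
function performanceTestIgnore(env: Record<string, string | undefined>): string[] {
  // Only the exact value "1" opts in; anything else keeps the suite hidden.
  return env.OPENCODE_PERFORMANCE === "1" ? [] : ["**/e2e/performance/**"]
}
```

In a config this would be used as `testIgnore: performanceTestIgnore(process.env)` inside `defineConfig({ ... })`. Treating only `"1"` as opt-in keeps values like `"0"` or `"false"` from accidentally enabling the suite.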
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| perf/manual-app-performance.md | Documents how to run the opt-in performance suite and trace tooling. |
| packages/app/playwright.config.ts | Adds `testIgnore` gating so performance specs are excluded unless opted-in. |
| packages/app/package.json | Adds `devtools-tracing` and trace analysis scripts. |
| packages/app/e2e/performance/timeline/timeline-test-helpers.ts | Shared helpers for timeline-focused performance tests (settings, storage, mocks). |
| packages/app/e2e/performance/timeline/session-timeline-stress.fixture.ts | Deterministic stress fixture to generate large/varied timeline content. |
| packages/app/e2e/performance/timeline/session-timeline-benchmark.spec.ts | Streaming timeline benchmark + geometry/rendering diagnostics and optional profiling/trace capture. |
| packages/app/e2e/performance/timeline/session-tab-switch-benchmark.spec.ts | Cold/hot session-tab switching benchmark + optional Chrome tracing. |
| packages/app/e2e/performance/timeline/session-tab-flash.spec.ts | Captures cached repaint/mutation/layout-shift traces during tab switching. |
| packages/app/e2e/performance/playwright.uncapped.config.ts | Provides an uncapped Chromium configuration for explicit benchmarking runs. |
| packages/app/e2e/performance/performance-page.ts | Small shared helpers for session URLs + localStorage project setup. |
| packages/app/e2e/performance/chrome-trace.ts | Helper to emit standard Chrome DevTools traces to disk. |
| bun.lock | Locks the added `devtools-tracing` dependency. |
Comment on lines +1 to +11
```ts
import type { Page } from "@playwright/test"
import path from "node:path"
import { writeFile } from "node:fs/promises"

export async function startChromeTrace(page: Page, name: string) {
  const directory = process.env.OPENCODE_PERFORMANCE_TRACE_DIR
  if (!directory) return

  const session = await page.context().newCDPSession(page)
  const events: unknown[] = []
  session.on("Tracing.dataCollected", (event) => events.push(...event.value))
```
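The excerpt stops right after registering the `Tracing.dataCollected` listener. The rest of the DevTools protocol lifecycle could plausibly look like the sketch below: start tracing in `ReportEvents` mode, buffer the chunks, call `Tracing.end`, and wait for `Tracing.tracingComplete` before returning a `{ traceEvents }` object. `recordTrace` and the `TracingSession` interface are hypothetical; the CDP method and event names come from the Chrome DevTools Protocol, and passing a Playwright `CDPSession` may need a cast to fit the structural type.

```typescript
// Minimal structural view of the CDP session methods this sketch uses (an assumption,
// not Playwright's own type), so the logic can run against a fake session.
interface TracingSession {
  send(method: string, params?: Record<string, unknown>): Promise<unknown>
  on(event: string, listener: (payload: any) => void): unknown
}

// Hypothetical recorder: starts tracing, buffers chunks, and returns a stop function
// that resolves with a Trace Event Format object once Chrome reports completion.
async function recordTrace(session: TracingSession, categories: string[]) {
  const events: unknown[] = []
  session.on("Tracing.dataCollected", (payload: { value: unknown[] }) => {
    // Push one event at a time; spreading a very large chunk into push() can exceed
    // the engine's argument-count limit.
    for (const event of payload.value) events.push(event)
  })
  await session.send("Tracing.start", {
    transferMode: "ReportEvents",
    traceConfig: { includedCategories: categories },
  })

  return async function stop() {
    // Listen before ending: tracingComplete fires after the final dataCollected chunk.
    const complete = new Promise<void>((resolve) => session.on("Tracing.tracingComplete", () => resolve()))
    await session.send("Tracing.end")
    await complete
    return { traceEvents: events }
  }
}
```

Waiting for `Tracing.tracingComplete` rather than for the `Tracing.end` response matters because Chrome may still be flushing buffered chunks after acknowledging the command.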
Comment on lines +1 to +13
```ts
import config from "../../playwright.config"

export default {
  ...config,
  testDir: "..",
  outputDir: "../test-results-uncapped",
  use: {
    ...config.use,
    launchOptions: {
      args: ["--disable-frame-rate-limit", "--disable-gpu-vsync"],
    },
  },
}
```
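The uncapped config replaces `launchOptions` wholesale, so any launch options the base config might set would be dropped. If that ever matters, a merge along these lines keeps them. `withUncappedChromium` and the `UseOptions` shape are hypothetical, and fitting this to Playwright's own `use` types may need a cast.

```typescript
// Minimal shape of the `use` block this sketch touches (an assumption, not Playwright's full type).
type UseOptions = {
  launchOptions?: { args?: string[]; [key: string]: unknown }
  [key: string]: unknown
}

const UNCAPPED_ARGS = ["--disable-frame-rate-limit", "--disable-gpu-vsync"]

// Hypothetical helper: adds the uncapped Chromium switches without discarding
// launch options or args that the base config already defines.
function withUncappedChromium(use: UseOptions = {}): UseOptions {
  const baseArgs = use.launchOptions?.args ?? []
  return {
    ...use,
    launchOptions: {
      ...use.launchOptions,
      // De-duplicate in case the base config already passes one of the switches.
      args: [...new Set([...baseArgs, ...UNCAPPED_ARGS])],
    },
  }
}
```

Used as `use: withUncappedChromium(config.use)`, the uncapped run would inherit every base setting and only append the two frame-pacing switches.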
Comment on lines +23 to +26
- streaming timeline FPS, frame-gap, long-task, geometry, and remount diagnostics

CPU and high-volume visual profiling are disabled by default. Set `TIMELINE_CPU_PROFILE=1` to enable both, or additionally set `TIMELINE_VISUAL_PROFILE=0` for CPU-only profiling.
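One reading of how the two switches combine is sketched below. The variable names come from the doc; the helper and its exact logic (visual profiling only ever running alongside CPU profiling) are assumptions.

```typescript
// Hypothetical interpretation of the documented switches:
// TIMELINE_CPU_PROFILE=1 enables CPU and visual profiling together;
// TIMELINE_VISUAL_PROFILE=0 then turns the visual half off for CPU-only runs.
function profilingFlags(env: Record<string, string | undefined>) {
  const cpu = env.TIMELINE_CPU_PROFILE === "1"
  // Visual profiling is never enabled on its own in this reading.
  const visual = cpu && env.TIMELINE_VISUAL_PROFILE !== "0"
  return { cpu, visual }
}
```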
Comment on lines +40 to +46
The repository pins `devtools-tracing`, which uses Chrome DevTools' Trace Engine:

```sh
bun trace:stats /tmp/opencode-performance-traces/session-tab-switch-cold.json
bun trace:inp /tmp/opencode-performance-traces/session-tab-switch-cold.json
bun trace:selectors /tmp/opencode-performance-traces/session-tab-switch-cold.json
```
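For a quick sanity check on an emitted trace without the Trace Engine, a dependency-free pass over the raw events can count long main-thread tasks. This is a hypothetical sketch, not what the `trace:*` scripts do; it assumes Chrome names top-level tasks `RunTask` and the renderer main thread `CrRendererMain`, and that the file uses the Trace Event Format object form (`{ traceEvents: [...] }`).

```typescript
// Subset of Trace Event Format fields used by this check.
interface TraceEvent {
  name: string
  ph: string // phase: "X" = complete event with a duration, "M" = metadata
  pid: number
  tid: number
  dur?: number // microseconds; present on "X" events
  args?: { name?: string }
}

// Hypothetical check: counts renderer main-thread tasks longer than `thresholdMs` and sums
// the time each spends beyond the threshold (in the spirit of total blocking time).
function longTaskSummary(trace: { traceEvents: TraceEvent[] }, thresholdMs = 50) {
  const key = (e: TraceEvent) => `${e.pid}:${e.tid}`
  // Thread names arrive as "thread_name" metadata events.
  const mainThreads = new Set(
    trace.traceEvents
      .filter((e) => e.ph === "M" && e.name === "thread_name" && e.args?.name === "CrRendererMain")
      .map(key),
  )
  const thresholdUs = thresholdMs * 1000
  const longTasks = trace.traceEvents.filter(
    (e) => e.ph === "X" && e.name === "RunTask" && mainThreads.has(key(e)) && (e.dur ?? 0) > thresholdUs,
  )
  const blockingMs = longTasks.reduce((sum, e) => sum + ((e.dur ?? 0) - thresholdUs) / 1000, 0)
  return { count: longTasks.length, blockingMs }
}
```

Filtering by thread matters because `RunTask` events are emitted on every thread, and only main-thread tasks block input and rendering.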
Comment on lines +1010 to +1012
```ts
function base64Encode(value: string) {
  return Buffer.from(value, "utf8").toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=/g, "")
}
```
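The helper above produces unpadded base64url (RFC 4648, section 5). Current Node.js releases and Bun support that encoding directly as `"base64url"`, which omits padding when encoding, so the replace chain could likely be collapsed into one call. The comparison below is a sketch for checking that equivalence; both function names are hypothetical and nothing here is a change from the PR.

```typescript
// Same transformation as the reviewed helper: standard base64, then the URL-safe
// alphabet ("+" -> "-", "/" -> "_"), then padding stripped.
function base64UrlManual(value: string) {
  return Buffer.from(value, "utf8").toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=/g, "")
}

// Built-in encoding: URL-safe alphabet, no padding when encoding to a string.
function base64UrlBuiltin(value: string) {
  return Buffer.from(value, "utf8").toString("base64url")
}
```

Inputs such as `"~~~"` and `"?>?"` are useful test cases because their standard base64 forms contain `+` and `/` respectively.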
Comment on lines +31 to +32
Set `OPENCODE_PERFORMANCE_TRACE_DIR` to emit standard Chrome DevTools traces where supported:
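The opt-in mirrors the early `return` in `startChromeTrace`: with no directory, no trace is written. A hypothetical helper for choosing the output file might look like this; the name `chromeTracePath` and the filename sanitization are assumptions, though the `<directory>/<name>.json` layout matches the example paths used with the `trace:*` scripts.

```typescript
import path from "node:path"

// Hypothetical helper: returns where a named trace would be written, or null when tracing is off.
function chromeTracePath(env: Record<string, string | undefined>, name: string): string | null {
  const directory = env.OPENCODE_PERFORMANCE_TRACE_DIR
  if (!directory) return null
  // Keep test-derived names filesystem-safe: collapse unusual characters into "-".
  const safe = name.replace(/[^a-z0-9_-]+/gi, "-").replace(/^-+|-+$/g, "")
  return path.join(directory, `${safe}.json`)
}
```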
thdxr pushed a commit that referenced this pull request on Jun 21, 2026
Summary
`packages/app/AGENTS.md` and `e2e/performance/AGENTS.md`
Usage
CPU and high-volume visual profiling are opt-in through `TIMELINE_CPU_PROFILE=1`. The streaming scenario's 30x CPU throttle is a deterministic stress profile, not a simulated device.
Metric Integrity
- `null`, never an unmeasured zero
- `BENCHMARK` records include schema, run ID, retry identity, and test outcome after page/trace teardown, including missing-report failures
Validation
- `bunx devtools-tracing stats`

The committed smoke and regression suites remain the source of correctness coverage. Benchmarks assert that the scenario and metric collection complete; they do not enforce machine-dependent performance budgets.
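The metric-integrity rule (an unmeasured value is `null`, never `0`) and a completion-only assertion could look roughly like this. `BenchmarkRecord`, `meanOrNull`, and `assertMetricsCollected` are hypothetical names for illustration, not the PR's schema-v2 format.

```typescript
// Hypothetical record shape; field names are illustrative, not the PR's schema.
interface BenchmarkRecord {
  schema: number
  runId: string
  retry: number
  outcome: "passed" | "failed"
  metrics: Record<string, number | null>
}

// Report "no samples" as null rather than 0, so a missing measurement can't read as "fast".
function meanOrNull(samples: number[]): number | null {
  if (samples.length === 0) return null
  return samples.reduce((sum, x) => sum + x, 0) / samples.length
}

// Completion check: every expected metric must have been collected (non-null).
// It deliberately compares against no timing budget, since those vary by machine.
function assertMetricsCollected(record: BenchmarkRecord, expected: string[]) {
  const missing = expected.filter((key) => record.metrics[key] == null)
  if (missing.length > 0) throw new Error(`metrics not collected: ${missing.join(", ")}`)
}
```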
Fixed-Work Production Comparison
The benchmark tree and shared E2E support files from `f543c8720` were copied byte-for-byte onto a clean worktree at the June 5 product revision `83dca45dd`. Both revisions ran the same production build, schema-v2 Playwright config, 30x CPU stress profile, 320 history turns, 160-delta workload, viewport, and one-worker execution. Every run delivered all 160 deltas, observed final progress index 160, settled final geometry, and passed. Results are medians from three alternating runs per revision. Later commits `6409b9212` and `7d035307c` tighten observer admission and preserve reporting through remount/mixed-content regressions; they do not alter the listed fields for these non-remount comparison runs.
[Results table: `83dca45dd` vs. `f543c8720`]
The June 5 product predates the titlebar-tab surface, so only the streaming scenario is directly comparable there.
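The "medians from three alternating runs per revision" methodology can be sketched as two small helpers. The names are hypothetical, and the strict A, B, A, B ordering is an assumption about what "alternating" means here.

```typescript
// Interleave revisions (A, B, A, B, ...) so slow drift such as thermal throttling
// or background load is spread across both revisions instead of biasing one.
function alternatingSchedule<T>(revisions: T[], runsPerRevision: number): T[] {
  const order: T[] = []
  for (let run = 0; run < runsPerRevision; run++) order.push(...revisions)
  return order
}

// Median of one metric across a revision's runs; null when nothing was measured.
function median(values: number[]): number | null {
  if (values.length === 0) return null
  const sorted = [...values].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}
```

With three runs per revision the median is simply the middle run, so a single outlier cannot move the reported value.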