ci: parallelize coverage, stop running tests twice per PR #377
Merged
santoshkumarradha merged 2 commits into main on Apr 9, 2026
The Coverage Summary job was running every test suite in the repo a second time, serially, in one ~5.5min job. Tests ran once in the per-surface workflows (control-plane.yml, sdk-go.yml, ...) and then again inside coverage-summary.sh just to produce coverage numbers.

This restructures coverage.yml as a 5-job parallel matrix (one per surface) plus an aggregator that downloads their artifacts and runs the existing coverage-gate.py + patch-coverage-gate.sh unchanged.

Changes:

* scripts/coverage-surface.sh (new)
  Single source of truth for "what commands run for surface X with coverage". Takes one arg (control-plane|sdk-go|sdk-python|sdk-typescript|web-ui), runs that surface's tests, writes all the expected filenames under test-reports/coverage/.

* scripts/coverage-aggregate.py (new)
  Extracted from the trailing Python block of coverage-summary.sh so the aggregation can run independently after CI flattens per-surface artifacts into one directory. Byte-equivalent output to the old inline block, verified against coverage-baseline.json.

* scripts/coverage-summary.sh (refactored)
  Now a thin orchestrator that calls coverage-surface.sh for each of the five surfaces, then runs coverage-aggregate.py. Local behavior is unchanged.

* .github/workflows/coverage.yml (rewritten)
  Matrix of 5 parallel per-surface jobs, each running coverage-surface.sh and uploading a coverage-<surface> artifact. Aggregator job (needs: per-surface) downloads them, flattens into test-reports/coverage/, runs coverage-aggregate.py, then the existing coverage-gate.py + patch-coverage-gate.sh + sticky PR comments + badge gist steps, all unchanged. Job name stays `coverage-summary` so branch protection rules targeting that check continue to match without reconfiguration.

* .github/workflows/control-plane.yml
  Delete the `Run tests` step from linux-tests. Tests now run only in coverage.yml; linux-tests keeps building the binary (needed for compile-matrix's needs: dependency) and linting.

* .github/workflows/sdk-go.yml
  Delete the `Test` step from build-and-test for the same reason.

sdk-python.yml and sdk-typescript.yml are intentionally unchanged: their matrices test cross-version compatibility (Python 3.8-3.12, Node 18+20), which is independent of the coverage measurement (3.11 / Node 20 only).

The coverage workflow remains a required status check, so a regression on any surface still blocks merge.
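The "flatten per-surface artifacts into one directory" step described above might look roughly like the following. This is a minimal Python sketch, not the workflow's actual step: it assumes each downloaded artifact lands in its own `coverage-<surface>/` directory, and the function name `flatten_artifacts` and the file names in the demo are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def flatten_artifacts(download_root: Path, dest: Path) -> List[str]:
    """Copy every file found under download_root/coverage-*/ into dest, flat.

    Assumption: surfaces write disjoint filenames, so a name collision is
    treated as an error instead of being silently overwritten.
    """
    dest.mkdir(parents=True, exist_ok=True)
    copied: List[str] = []
    for artifact_dir in sorted(download_root.glob("coverage-*")):
        if not artifact_dir.is_dir():
            continue
        for src in sorted(p for p in artifact_dir.rglob("*") if p.is_file()):
            target = dest / src.name
            if target.exists():
                raise FileExistsError(f"{src.name} provided by more than one artifact")
            shutil.copy2(src, target)
            copied.append(src.name)
    return copied


if __name__ == "__main__":
    # Build two fake artifacts (hypothetical file names) and flatten them.
    root = Path(tempfile.mkdtemp())
    for surface, filename in [("sdk-go", "sdk-go.out"), ("web-ui", "web-ui.json")]:
        artifact = root / "artifacts" / f"coverage-{surface}"
        artifact.mkdir(parents=True)
        (artifact / filename).write_text("placeholder")
    print(flatten_artifacts(root / "artifacts", root / "test-reports" / "coverage"))
```

Failing on a duplicate filename is a design choice of this sketch: if two surfaces ever wrote the same report name, a silent overwrite would feed the aggregator incomplete data.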
`go tool cover -func=<coverprofile>` resolves package paths in the coverprofile against the nearest go.mod. When it's invoked from the repo root (no go.mod) it fails with:

    cover: no required module provides package <pkg>: go.mod file not found in current directory or any parent directory

The original coverage-summary.sh extract_go_total() took the module dir as an arg and cd'd into it inside a subshell. I dropped that arg when I extracted the function into coverage-surface.sh, so the control-plane and sdk-go matrix jobs ran the `go test` with coverage successfully but then failed at the total-extraction step.

Restore the two-argument form and cd into the module dir before running `go tool cover -func`.
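The two-argument shape of the fix can be illustrated in Python. The real `extract_go_total()` is a shell function in coverage-surface.sh, so this is only a sketch of the same idea: run `go tool cover -func` with the module directory as the working directory, then read the `total:` line. The helper name `parse_go_total` is hypothetical.

```python
import re
import subprocess
from pathlib import Path
from typing import Optional

# `go tool cover -func` ends its report with a line such as:
#   total:    (statements)    85.7%
_TOTAL_RE = re.compile(r"^total:\s+\(statements\)\s+([0-9.]+)%\s*$", re.MULTILINE)


def parse_go_total(func_output: str) -> Optional[float]:
    """Return the total statement coverage percentage, or None if absent."""
    match = _TOTAL_RE.search(func_output)
    return float(match.group(1)) if match else None


def extract_go_total(module_dir: Path, coverprofile: Path) -> Optional[float]:
    """Run `go tool cover -func` from inside the module that owns go.mod.

    cwd=module_dir plays the role of the subshell `cd` in the shell version;
    the profile path is made absolute so changing directory cannot break it.
    """
    result = subprocess.run(
        ["go", "tool", "cover", f"-func={coverprofile.resolve()}"],
        cwd=module_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_go_total(result.stdout)
```

Keeping the parsing separate from the subprocess call means the total-extraction logic can be checked against captured output without a Go toolchain installed.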
📊 Coverage gate
Thresholds from
✅ Gate passed
No surface regressed past the allowed threshold and the aggregate stayed above the floor.
📐 Patch coverage gate
Threshold: 80% on lines this PR touches vs
✅ Patch gate passed
Every surface whose lines were touched by this PR has patch coverage at or above the threshold.
Summary
- The `Coverage Summary` required check runs ~5m34s on every PR, and most of that wall-clock is spent re-running test suites that the per-surface workflows (`control-plane.yml`, `sdk-go.yml`, `sdk-python.yml`, `sdk-typescript.yml`) already executed on the same commit. Tests were effectively running twice per PR, serially, in one job.
- Restructure `coverage.yml` as a 5-entry parallel matrix, one per surface, plus an aggregator job that downloads the matrix artifacts and runs the existing gates unchanged. Delete the duplicate `go test` steps from `control-plane.yml` and `sdk-go.yml`.
- Wall-clock for `control-plane.yml → linux-tests` also drops since it no longer runs tests. Full test coverage is preserved: `coverage-summary` stays a required status check, so any regression still blocks merge.

What changed

New files

- `scripts/coverage-surface.sh`: single source of truth for "what commands run for surface X with coverage". Takes one arg (`control-plane | sdk-go | sdk-python | sdk-typescript | web-ui`), runs that surface's tests, writes all the expected filenames under `test-reports/coverage/`. Called by both the local runner and CI.
- `scripts/coverage-aggregate.py`: extracted from the trailing Python block of the old `coverage-summary.sh` so aggregation can run independently after CI flattens per-surface artifacts into one directory.

Refactored

- `scripts/coverage-summary.sh`: thin orchestrator that loops over all five surfaces calling `coverage-surface.sh`, then runs `coverage-aggregate.py`. Local behavior unchanged.
- `.github/workflows/coverage.yml`: rewritten as a `per-surface` matrix (5 parallel jobs, each running `coverage-surface.sh` and uploading a `coverage-<surface>` artifact) + a `coverage-summary` aggregator job with `needs: per-surface`. The aggregator downloads all 5, flattens into `test-reports/coverage/`, runs `coverage-aggregate.py`, then the existing `coverage-gate.py` + `patch-coverage-gate.sh` + sticky PR comments + badge gist steps unchanged. Job name stays `coverage-summary` so branch protection rules targeting that check name continue to match with no reconfiguration.

Modified

- `.github/workflows/control-plane.yml`: delete the `Run tests` step from `linux-tests`. Tests now live only in `coverage.yml`. `linux-tests` keeps building the binary (needed for `compile-matrix`'s `needs:` dependency) and linting.
- `.github/workflows/sdk-go.yml`: delete the `Test` step from `build-and-test` for the same reason.

Intentionally unchanged
Design notes
Test plan
Validation reference
Baseline run used for comparison: PR #373 — `coverage-summary` = 5m34s (the single serialized job this PR splits up).