13 changes: 12 additions & 1 deletion .github/actions/verify-metrics-snapshot/action.yaml
@@ -47,6 +47,15 @@ runs:
 restore-keys: |
 ${{ inputs.artifact_key }}

+# Create an empty stub so the diff artifact is always uploaded on PRs.
+# The fan-in uses 1-to-1 presence of diff artifacts to detect infra failures
+# (a missing diff_* artifact means this action never ran for that snapshot).
+# The stub is overwritten by the compare step when a baseline exists.
+- name: Create diff file stub
+if: github.ref_name != 'main'
+shell: bash
+run: touch ./.metrics/diff_${{ inputs.snapshot }}.txt
+
 - name: Calculate diff between the snapshots
 id: compare-snapshots
 if: ${{ (github.ref_name != 'main') && (steps.download-release-snapshot.outputs.cache-matched-key != '') }}
@@ -61,8 +70,10 @@ runs:
 echo "has_diff=true" >> $GITHUB_OUTPUT
 fi

+# Always upload the diff artifact on PRs (even when empty / no baseline yet).
+# Presence of this artifact in the fan-in proves this action ran for the snapshot.
 - name: Upload the diff artifact
-if: ${{ (github.ref_name != 'main') && (steps.compare-snapshots.outputs.has_diff == 'true') }}
+if: github.ref_name != 'main'
 uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
 with:
 name: diff_${{ inputs.artifact_key }}
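The stub step and the now-unconditional upload together guarantee that every non-`main` run of this action leaves a `diff_<artifact_key>` artifact. That lets the fan-in treat a missing one as an infrastructure failure. Below is a minimal sketch of such a presence check. It assumes the fan-in already knows which artifact keys to expect. How `ci-summary-report.yml` actually derives that set is not visible here, because its diff is not rendered. `missing_diff_artifacts` and the sample names are hypothetical.

```python
"""Sketch of a 1-to-1 diff-artifact presence check (hypothetical helper)."""

from collections.abc import Iterable


def missing_diff_artifacts(expected_keys: Iterable[str], artifact_names: Iterable[str]) -> list[str]:
    """Return the expected artifact keys that have no matching diff_<key> artifact.

    Assumes each snapshot run uploads exactly one artifact named diff_<artifact_key>,
    which the stub step now guarantees on non-main branches.
    """
    present = set(artifact_names)
    return sorted(key for key in expected_keys if f"diff_{key}" not in present)


if __name__ == "__main__":
    # Hypothetical keys and artifact names, for illustration only.
    expected = ["snapshot_cassandra", "snapshot_memory"]
    uploaded = ["diff_snapshot_memory", "coverage-unittests"]
    missing = missing_diff_artifacts(expected, uploaded)
    if missing:
        print(f"infra failure: no diff artifact for {missing}")
```

Before this change, a snapshot with no baseline or no diff uploaded nothing. In that situation the absence of a `diff_*` artifact could not be distinguished from a job that never ran.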
14 changes: 0 additions & 14 deletions .github/workflows/ci-e2e-all.yml
@@ -39,17 +39,3 @@ jobs:

 tailsampling:
 uses: ./.github/workflows/ci-e2e-tailsampling.yml
-
-upload_pr_number:
-name: Save and Upload PR Number as Artifact
-runs-on: ubuntu-latest
-steps:
-- name: Save PR number as artifact
-if: github.event_name == 'pull_request'
-run: echo "${{ github.event.number }}" > pr_number.txt
-- name: Upload PR number artifact
-if: github.event_name == 'pull_request'
-uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6
-with:
-name: pr_number
-path: pr_number.txt
237 changes: 102 additions & 135 deletions .github/workflows/ci-summary-report.yml

Large diffs are not rendered by default.

14 changes: 6 additions & 8 deletions .github/workflows/label-check.yml
@@ -24,12 +24,10 @@ jobs:
 egress-policy: audit

 - name: Check PR label
-if: ${{ github.event.pull_request.user.login != 'dependabot' && github.event_name != 'merge_group' }}
+# Only fail if NOT merge_group, AND labels DO NOT contain 'changelog:'
+if: |
+github.event_name != 'merge_group' &&
+contains(join(github.event.pull_request.labels.*.name, ','), 'changelog:') == false
 run: |
-LABEL_NAME="changelog:"
-if [[ $(curl -s "https://api.github.com/repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}" | jq -r '.labels[].name' | grep -c "^$LABEL_NAME") -eq 0 ]]; then
-echo "Error: Pull request is missing a required label of the form '${LABEL_NAME}***'."
-exit 1
-fi
-env:
-GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+echo "::error::Pull request is missing a required 'changelog:' label. Found labels: ${{ join(github.event.pull_request.labels.*.name, ', ') }}"
+exit 1
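The new `if:` condition replaces an API call plus `grep` with a check on the event payload. The sketch below mirrors both checks in Python so their semantics can be compared directly. It rests on two assumptions. First, GitHub's `contains()` compares strings case-insensitively. Second, the old check's behaviour is fully captured by `grep -c "^changelog:"` over one label per line. The old step-level guard for `dependabot` is not modelled. The function names and sample labels are illustrative only.

```python
"""Compare the removed and the new changelog-label checks (illustrative sketch)."""


def new_check_fails(event_name: str, labels: list[str]) -> bool:
    """Mirror the new `if:` condition; the step runs (and exits 1) when this is True.

    GitHub's contains() is case-insensitive for strings, hence the lower().
    """
    joined = ",".join(labels)
    return event_name != "merge_group" and "changelog:" not in joined.lower()


def old_check_fails(labels: list[str]) -> bool:
    """Mirror the removed shell test: `grep -c "^changelog:"` over one label per line."""
    return not any(label.startswith("changelog:") for label in labels)


if __name__ == "__main__":
    for labels in (["changelog:bugfix"], ["Changelog:Bugfix"], ["no-changelog:yet"], []):
        print(labels, "old fails:", old_check_fails(labels),
              "new fails:", new_check_fails("pull_request", labels))
```

Under these assumptions, the new check is more permissive than the old one. It accepts `Changelog:Bugfix` and any label that merely contains `changelog:`, and the old prefix match rejected both.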
117 changes: 48 additions & 69 deletions docs/adr/004-migrating-coverage-gating-to-github-actions.md
@@ -2,113 +2,92 @@

## Status

-Accepted
+Accepted (implemented)

## Context

### Current Architecture

Jaeger uses [Codecov](https://codecov.io) for two functions:

1. **Long-term trend tracking**: Coverage is uploaded after each CI run via the Codecov Action.
2. **PR gating**: Codecov's GitHub status check blocks merges when coverage drops below a threshold.

-Coverage is collected across 11 CI jobs and uploaded through `.github/actions/upload-codecov/action.yml`. The jobs and their coverage files are:
-
-| Workflow | Coverage files | Codecov flag |
-|----------|---------------|--------------|
-| `ci-unit-tests.yml` | `cover.out` | `unittests` |
-| `ci-e2e-badger.yaml` | `cover.out` | `badger_<version>` |
-| `ci-e2e-cassandra.yml` | `cover.out` | `cassandra-<major>-<jaeger>-<schema>` |
-| `ci-e2e-clickhouse.yml` | `cover.out` | `clickhouse` |
-| `ci-e2e-elasticsearch.yml` | `cover.out`, `cover-index-cleaner.out`, `cover-index-rollover.out` | `elasticsearch-<major>-<jaeger>` |
-| `ci-e2e-grpc.yml` | `cover.out` | `grpc_<version>` |
-| `ci-e2e-kafka.yml` | `cover.out` | `kafka-<version>-v2` |
-| `ci-e2e-memory.yaml` | `cover.out` | `memory_v2` |
-| `ci-e2e-opensearch.yml` | `cover.out`, `cover-index-cleaner.out`, `cover-index-rollover.out` | `opensearch-<major>-<jaeger>` |
-| `ci-e2e-query.yml` | `cover.out` | `query` |
-| `ci-e2e-tailsampling.yml` | `cover.out` | `tailsampling-processor` |
-
-After all CI stages complete, `.github/workflows/ci-summary-report.yml` runs as a `workflow_run` fan-in triggered by the `"CI Orchestrator"` workflow. It downloads all artifacts from the triggering run via the GitHub REST API, generates a metrics comparison report, and posts it as a sticky PR comment.
+Coverage is collected across 11 CI jobs (unit tests + E2E), uploaded through `.github/actions/upload-codecov/action.yml`.

### Problem

Codecov's PR status checks suffer from latency (results lag behind CI completion) and intermittent rate-limit failures that block PRs even when coverage is healthy. The gating logic should run entirely within GitHub Actions for faster, more reliable feedback.

## Decision

-Extend the existing fan-in workflow pattern to add coverage aggregation and gating alongside the existing metrics comparison. This maximizes reuse of the established `github-script` artifact download infrastructure.
-
-The change is additive: Codecov uploads are retained for long-term historical trending and per-flag breakdown views.
-
-Key design choices:
-
-- **Trigger on the CI Orchestrator**: the `workflow_run` trigger fires on `["CI Orchestrator"]` completion, ensuring the fan-in has access to unit test coverage artifacts as well as E2E coverage. The CI Orchestrator (`ci-orchestrator.yml`) completes only after all stages (lint, unit tests, E2E) finish.
-
-- **Artifacts for cross-run data sharing**: `workflow_run` jobs run with write permissions (required to post PR comments from fork PRs) but cannot access artifacts from the triggering run directly — they must use the GitHub REST API. The existing `github-script` download loop already handles this and automatically picks up `coverage-*` artifacts alongside metrics artifacts.
-
-- **Single job for both PR analysis and baseline updates**: the job runs for `pull_request` events and for pushes to `main`. PR-specific steps (metrics comparison, coverage gate, PR comment, check runs) are conditioned on `pr_number` being set; baseline-save steps are conditioned on `head_branch == 'main'`. Coverage computation runs unconditionally so both flows share the same merge-and-measure logic. This follows the same pattern as the existing metrics snapshot baseline.
-
-- **Coverage policy**: two gates matching `.codecov.yml`:
-1. Absolute floor: fail if total coverage drops below 95%.
-2. No regression: fail if total coverage dropped compared to the `main` baseline.
+Extend the existing `CI Summary Report` fan-in workflow to add coverage aggregation and gating alongside the existing metrics comparison. Codecov uploads are retained for long-term historical trending and per-flag breakdown views.

-## Implementation
+### Requirements

-### `upload-codecov` action (`upload-codecov/action.yml`)
+1. Coverage must be merged from all CI jobs (unit tests and E2E) into a single profile.
+2. Two independent gates must be applied:
+- **Absolute floor**: total coverage ≥ 95%, matching the Codecov project target.
+- **No regression**: total coverage must not drop compared to the `main` baseline.
+3. The merged profile must be filtered using the same exclusions as `.codecov.yml` (generated files, mocks, integration test infrastructure) so both tools report from a single source of truth.
+4. A `Coverage Gate` check-run must always be posted to the PR — even when no coverage data is available — so it can be used as a required status check in branch protection.
+5. The workflow must run for `pull_request`, `merge_group`, and `push` (to `main`) events triggered through the CI Orchestrator, as well as via manual `workflow_dispatch`.
+6. On `main`-branch runs, the coverage baseline must be cached for future PR comparisons.

-- Rename the `flags` input to `flag` (singular — all callers pass exactly one value).
-- After staging the coverage files, upload them as a `coverage-<flag>` artifact (7-day retention) **before** the Codecov upload step, so the artifact is available to the fan-in even if the Codecov upload fails (e.g. rate-limit).
-- The artifact name is derived as `coverage-<flag>`, removing the need for a separate `artifact-name` input.
+### Success Criteria

-### Caller workflows (11 files)
+- `Coverage Gate` and `Metrics Comparison` check-runs appear on every PR and merge-queue run.
+- Coverage regressions block PRs when `Coverage Gate` is added to required status checks.
+- Manual re-runs via `workflow_dispatch` allow re-posting checks from any branch.

-Update every `upload-codecov` call site to use `flag:` (singular) instead of `flags:`. No other change is needed — artifact naming is derived automatically from the flag value.
+## Implementation Overview

-### `gocovmerge` tool (`internal/tools/`)
+### Coverage Artifact Pipeline

-Add `github.com/wadey/gocovmerge` as a pinned blank import in `internal/tools/tools.go` and a corresponding `install-coverage-tools` Make target in `scripts/makefiles/Tools.mk`. Coverage percentage is computed with `go tool cover -func` from the standard Go toolchain — no additional binary is required.
+Each CI job uploads its coverage profile as a `coverage-<flag>` artifact (7-day retention) via `.github/actions/upload-codecov/action.yml`, alongside the existing Codecov upload.

-### Fan-in workflow (`ci-compare-metrics.yml` → `ci-summary-report.yml`)
+### Fan-in Workflow (`ci-summary-report.yml`)

-Rename the workflow. The single `summary-report` job runs for both `pull_request` events and pushes to `main`:
+The single `summary-report` job:

-- Downloads all artifacts to `.artifacts/` via the existing `github-script` loop. On `main`-branch runs no PR number is found; a warning is logged and the step succeeds so subsequent steps can continue.
-- Runs `scripts/e2e/metrics_summary.sh` for metrics comparison (PR runs only, gated on `pr_number`).
-- Unconditionally merges all `coverage-*/**.out` profiles with `gocovmerge` and computes total percentage with `go tool cover -func`.
-- On PR runs: restores baseline from `actions/cache`, applies the two coverage gates, appends a coverage section to `combined_summary.md`, posts a sticky PR comment, and creates `Metrics Comparison` and `Coverage Gate` check runs.
-- On main-branch runs: saves the computed coverage percentage to `actions/cache` under `coverage-baseline_<run_id>` (prefix `coverage-baseline`) for future PR comparisons.
+1. **Resolves the source run** — determines the CI Orchestrator run ID (from `workflow_run` event or `workflow_dispatch` input), validates it succeeded, and extracts PR metadata (number + head SHA) via the GitHub API.
+2. **Downloads all artifacts** — uses `gh run download` to fetch all artifacts from the source run.
+3. **Merges and gates coverage** — merges all `coverage-*/*.out` profiles with `gocovmerge`, filters excluded paths, and applies the two coverage gates.
+4. **Posts results** — creates `Metrics Comparison` and `Coverage Gate` check-runs on the PR. When no coverage data exists, `Coverage Gate` reports success with a "skipped" note to satisfy branch protection.
+5. **Saves baseline on `main`** — caches the coverage percentage for future PR comparisons.

-### Branch protection (optional)
+### Key Files

-Add `Coverage Gate` to the required status checks alongside the existing `Metrics Comparison` check.
+| File | Role |
+|------|------|
+| `.github/workflows/ci-summary-report.yml` | Fan-in workflow |
+| `.github/actions/upload-codecov/action.yml` | Coverage artifact upload |
+| `.github/workflows/ci-orchestrator.yml` | Triggers the fan-in |
+| `scripts/e2e/filter_coverage.py` | Applies `.codecov.yml` exclusions |
+| `internal/tools/tools.go` | `gocovmerge` tool dependency |
+| `.codecov.yml` | Single source of truth for ignore patterns |

## Consequences

### Positive

-- **Faster feedback**: coverage gate result appears as soon as the CI Orchestrator completes, without waiting for Codecov's external processing pipeline.
-- **Reliability**: eliminates Codecov rate-limit failures and network timeouts blocking PRs.
-- **Consolidated reporting**: performance metrics and coverage appear in a single sticky PR comment, reducing comment noise.
-- **Minimal new infrastructure**: the `github-script` artifact download loop and `actions/cache` baseline pattern are reused directly from existing workflows.
+- **Faster feedback**: coverage gate result appears as soon as the CI Orchestrator completes.
+- **Reliability**: eliminates Codecov rate-limit failures blocking PRs.
+- **Consolidated reporting**: performance metrics and coverage appear in a single sticky PR comment.
+- **Required status check safe**: `Coverage Gate` is always created, even when coverage is skipped.

### Negative

-- **Artifact storage cost**: `coverage-*` artifacts are ~1–5 MB each × ~20 matrix jobs = ~50–100 MB per CI run, retained 7 days. GitHub-hosted storage is generally within free-tier limits for open-source projects.
-- **Longer summary workflow**: `make install-coverage-tools`, `gocovmerge`, and `go tool cover` add steps to the fan-in job.
-- **One new tool dependency**: `github.com/wadey/gocovmerge` is added to `internal/tools/go.mod`, version-pinned like all other tools.
-- **Trigger change latency**: triggering on `"CI Orchestrator"` means the fan-in waits for all three stages, which is consistent with the existing Codecov reporting latency.
+- **Artifact storage cost**: `coverage-*` artifacts add ~50–100 MB per CI run (7-day retention).
+- **One tool dependency**: `github.com/wadey/gocovmerge` in `internal/tools/go.mod`.

### Neutral

-- Codecov remains active; removing it can be a follow-up decision once the new gate has been validated.
+- Codecov remains active for long-term trending; removing it can be a follow-up decision.

## References

-- Reusable coverage action: `.github/actions/upload-codecov/action.yml`
-- CI Summary Report workflow: `.github/workflows/ci-summary-report.yml`
-- Metrics snapshot reusable action (cache pattern): `.github/actions/verify-metrics-snapshot/action.yaml`
-- Main CI orchestrator: `.github/workflows/ci-orchestrator.yml`
-- Tool registry: `internal/tools/tools.go`, `internal/tools/go.mod`
-- Tool install targets: `scripts/makefiles/Tools.mk`
-- Coverage policy: `.codecov.yml`
+- [CI Summary Report workflow](/.github/workflows/ci-summary-report.yml)
+- [Coverage upload action](/.github/actions/upload-codecov/action.yml)
+- [CI Orchestrator](/.github/workflows/ci-orchestrator.yml)
+- [Coverage filter script](/scripts/e2e/filter_coverage.py)
+- [Tool registry](/internal/tools/tools.go)
+- [Coverage policy](/.codecov.yml)
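The ADR's fan-in merges every `coverage-*` profile and computes one total. It then applies the absolute floor and the no-regression gate. It posts a `Coverage Gate` result even when there is no data. The sketch below expresses that flow in plain Python under stated assumptions. It is not the workflow's code, which uses `gocovmerge`, `go tool cover -func`, and `filter_coverage.py`. The assumptions are:

- Duplicate blocks are OR-ed in `set` mode and summed otherwise. This is a simplified view of how gocovmerge combines them.
- The total is covered statements over all statements. This is intended to approximate the `total:` line of `go tool cover -func`.
- A result equal to the baseline passes.
- Values are compared unrounded.

`merge_profiles`, `total_percent`, and `coverage_gate` are hypothetical names.

```python
"""Sketch: merge Go coverage profiles, compute a total, and apply the two ADR gates."""

from typing import Optional

FLOOR_PERCENT = 95.0  # absolute floor stated in the ADR


def merge_profiles(profiles: list[str]) -> tuple[str, dict[tuple[str, str], list[int]]]:
    """Merge coverage profiles (given as text) keyed by (file, block span).

    Returns (mode, blocks) where blocks maps (file, span) -> [num_stmts, count].
    """
    mode: Optional[str] = None
    blocks: dict[tuple[str, str], list[int]] = {}
    for text in profiles:
        for line in text.splitlines():
            if not line.strip():
                continue
            if line.startswith("mode:"):
                this_mode = line.split(":", 1)[1].strip()
                if mode is not None and this_mode != mode:
                    raise ValueError(f"mode mismatch: {mode} vs {this_mode}")
                mode = this_mode
                continue
            # "path/file.go:startLine.startCol,endLine.endCol numStmts count"
            location, stmts, count = line.rsplit(" ", 2)
            file_name, span = location.rsplit(":", 1)
            key = (file_name, span)
            if key not in blocks:
                blocks[key] = [int(stmts), int(count)]
            elif mode == "set":
                blocks[key][1] |= int(count)  # covered if any job covered it
            else:
                blocks[key][1] += int(count)  # 'count'/'atomic': hit counts add up
    return mode or "set", blocks


def total_percent(blocks: dict[tuple[str, str], list[int]]) -> float:
    """Statement-weighted coverage: covered statements / all statements, in percent."""
    total = sum(stmts for stmts, _ in blocks.values())
    covered = sum(stmts for stmts, count in blocks.values() if count > 0)
    return 100.0 * covered / total if total else 0.0


def coverage_gate(current: Optional[float], baseline: Optional[float]) -> tuple[str, str]:
    """Return (conclusion, summary) for a 'Coverage Gate' check run."""
    if current is None:
        # No coverage data: report success so a required check never stays pending.
        return "success", "Coverage skipped: no coverage data"
    if current < FLOOR_PERCENT:
        return "failure", f"{current:.2f}% is below the {FLOOR_PERCENT:.0f}% floor"
    if baseline is not None and current < baseline:
        return "failure", f"coverage dropped from {baseline:.2f}% to {current:.2f}%"
    return "success", f"coverage {current:.2f}%"


if __name__ == "__main__":
    unit = "mode: set\nexample.com/m/a.go:1.1,2.2 3 1\nexample.com/m/a.go:3.1,4.2 1 0\n"
    e2e = "mode: set\nexample.com/m/a.go:3.1,4.2 1 1\n"
    _, blocks = merge_profiles([unit, e2e])
    pct = total_percent(blocks)  # 100.0: the e2e profile covers the block unit tests missed
    print(coverage_gate(pct, baseline=99.5))
```

Keying blocks by file and span is what lets an E2E run cover statements the unit tests missed. In the example, the second profile flips the `3.1,4.2` block to covered, so the merged total is 100% rather than 75%.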
107 changes: 107 additions & 0 deletions scripts/e2e/filter_coverage.py
@@ -0,0 +1,107 @@
#!/usr/bin/env python3
# Copyright (c) 2026 The Jaeger Authors.
# SPDX-License-Identifier: Apache-2.0
#
# Filters a Go coverage profile in-place by applying the same exclusions defined
# in .codecov.yml so coverage metrics stay in sync between this gate and Codecov.
#
# Usage:
# python3 scripts/e2e/filter_coverage.py <coverage.out> [path/to/.codecov.yml]

import fnmatch
import os
import sys


def load_exclusions(codecov_path: str) -> list[str]:
    """Return raw glob patterns from the ignore: section of .codecov.yml."""
    patterns = []
    in_ignore = False
    with open(codecov_path) as f:
        for line in f:
            stripped = line.strip()
            if stripped == 'ignore:':
                in_ignore = True
            elif in_ignore:
                if stripped.startswith('#'):
                    continue
                if stripped.startswith('- '):
                    patterns.append(stripped[2:].strip('"').strip("'"))
                elif stripped and not line[0].isspace():
                    # A new top-level key ends the ignore: section.
                    in_ignore = False
    return patterns


def read_module_path(codecov_path: str) -> str:
    """
    Read the Go module path so we can strip it from coverage import paths
    to produce repo-relative paths that match the .codecov.yml patterns.
    """
    # go.mod is expected in the same directory as .codecov.yml.
    go_mod_path = os.path.join(os.path.dirname(codecov_path), 'go.mod')
    with open(go_mod_path) as f:
        for line in f:
            if line.startswith('module '):
                return line.split()[1].strip()
    raise ValueError(f'no module directive found in {go_mod_path}')


def should_exclude(path: str, patterns: list[str]) -> bool:
    """Return True if path matches any exclusion pattern.

    Patterns with wildcards are matched via fnmatch. Patterns without
    wildcards are treated as plain path prefixes.
    """
    for pattern in patterns:
        if '*' in pattern or '?' in pattern:
            if fnmatch.fnmatch(path, pattern):
                return True
        else:
            if path.startswith(pattern):
                return True
    return False


def main() -> None:
    if len(sys.argv) < 2:
        print(f'usage: {sys.argv[0]} <coverage.out> [.codecov.yml]', file=sys.stderr)
        sys.exit(1)

    coverage_path = sys.argv[1]
    codecov_path = sys.argv[2] if len(sys.argv) > 2 else '.codecov.yml'

    try:
        exclusions = load_exclusions(codecov_path)
    except FileNotFoundError:
        print(f'error: {codecov_path} not found', file=sys.stderr)
        sys.exit(1)

    # read_module_path looks for go.mod next to .codecov.yml.
    module_prefix = read_module_path(codecov_path) + '/'
    kept = skipped = 0
    kept_lines = []

    with open(coverage_path) as f:
        for line in f:
            if line.startswith('mode:'):
                kept_lines.append(line)
                continue
            # Coverage lines: "github.com/.../file.go:line.col,line.col stmts count"
            # Extract the file path (everything before the first colon).
            import_path = line.split(':')[0]
            # Strip module prefix to get a repo-relative path for matching.
            if import_path.startswith(module_prefix):
                path = import_path[len(module_prefix):]
            else:
                path = import_path
            if should_exclude(path, exclusions):
                skipped += 1
            else:
                kept_lines.append(line)
                kept += 1

    with open(coverage_path, 'w') as f:
        f.writelines(kept_lines)

    print(f'filter_coverage: kept {kept}, excluded {skipped} lines', file=sys.stderr)


if __name__ == '__main__':
    main()
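`should_exclude` matches wildcard patterns with `fnmatch` and all other patterns with `str.startswith`. Both rules are looser than per-segment globbing. With `fnmatch`, `*` also matches `/`. A plain prefix such as `examples` also matches `examples2/`. The check below illustrates this with hypothetical patterns and paths, which are not taken from Jaeger's `.codecov.yml`. This diff does not establish whether Codecov applies the same semantics to its `ignore:` entries.

```python
"""Illustrate how should_exclude's two matching rules treat a few paths."""

import fnmatch

# Hypothetical repo-relative paths and patterns, for illustration only.
CASES = [
    # (path, pattern, expected match)
    ("cmd/query/app/mocks/reader.go", "**/mocks/*", True),  # '*' also crosses '/'
    ("mocks/reader.go", "**/mocks/*", False),               # needs a '/' before 'mocks'
    ("examples/hotrod/main.go", "examples/*.go", True),     # one '*' spans two segments
]

for path, pattern, expected in CASES:
    assert fnmatch.fnmatch(path, pattern) is expected, (path, pattern)

# Wildcard-free patterns go through str.startswith, which ignores segment boundaries.
assert "examples2/main.go".startswith("examples")
print("all matching cases behave as described")
```

Because the script strips the module prefix first, these patterns are compared against repo-relative paths such as `cmd/...`, not full import paths.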