Fix OKE demo Prometheus upgrade #8183

Merged
yurishkuro merged 3 commits into jaegertracing:main from danish9039:oke-demo-fix on Mar 16, 2026

danish9039 (Member) commented Mar 15, 2026

Fixes #8179.

Problem

The scheduled OKE demo deployment was upgrading kube-prometheus-stack without pinning a chart version. When the chart moved to 82.10.4, the cluster still had older Prometheus Operator CRDs, so the upgrade failed on Alertmanager.spec.hostNetwork.

Fix

This pins the kube-prometheus-stack chart to 82.10.4 and enables the chart's CRD upgrade job with forced conflict takeover, so the CRDs are updated before the Prometheus resources are upgraded.
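
A minimal sketch of what the pinned upgrade could look like, not the actual diff from this PR. The release name `prometheus`, the `monitoring` namespace, the `upgrade_prometheus` function, and the `DRY_RUN` switch are assumptions made for illustration. The `crds.upgradeJob.enabled` and `crds.upgradeJob.forceConflicts` value names are believed to match the kube-prometheus-stack chart values, but should be checked against the chart at the pinned version.

```shell
#!/usr/bin/env bash
set -eu

# Chart version pinned by this PR.
PROMETHEUS_CHART_VERSION="82.10.4"

# Assumed for illustration; the real script may use different names.
RELEASE_NAME="prometheus"
NAMESPACE="monitoring"

upgrade_prometheus() {
  # Build the helm arguments as this function's positional parameters.
  set -- upgrade --install "$RELEASE_NAME" \
    prometheus-community/kube-prometheus-stack \
    --namespace "$NAMESPACE" \
    --version "$PROMETHEUS_CHART_VERSION" \
    --set crds.upgradeJob.enabled=true \
    --set crds.upgradeJob.forceConflicts=true

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: print the command instead of touching a cluster.
    echo "helm $*"
  else
    helm "$@"
  fi
}
```

Calling `upgrade_prometheus` prints the command by default; setting `DRY_RUN=0` would run it against the current kube context. Pinning `--version` keeps a scheduled job from picking up a newer chart whose resources no longer match the CRDs installed in the cluster.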

Verification

  • bash -n examples/oci/deploy-all.sh
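
For context, `bash -n` only parses a script without executing it, so it catches syntax errors but says nothing about whether the Helm flags are valid or the upgrade succeeds. A small sketch of wrapping that check for several scripts follows; the `check_syntax` helper is hypothetical and not part of this PR.

```shell
#!/usr/bin/env bash
set -eu

# Syntax-check each script passed as an argument without executing it.
# Returns non-zero if any script fails to parse.
check_syntax() {
  status=0
  for script in "$@"; do
    if bash -n "$script"; then
      echo "OK: $script"
    else
      echo "SYNTAX ERROR: $script" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_syntax examples/oci/deploy-all.sh` would report whether the deploy script parses cleanly.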

danish9039 requested a review from a team as a code owner March 15, 2026 21:27
Copilot AI review requested due to automatic review settings March 15, 2026 21:27
Copilot AI (Contributor) left a comment

Pull request overview

Pins the OKE demo’s kube-prometheus-stack Helm upgrade to a known-good chart version and enables CRD upgrade handling to prevent Prometheus Operator CRD/schema drift from breaking the scheduled OKE deployment workflow.

Changes:

  • Pin prometheus-community/kube-prometheus-stack to 82.10.4 in examples/oci/deploy-all.sh.
  • Enable the chart’s CRD upgrade job and force conflict takeover during CRD apply.
  • Add a small regression test script for the Helm invocation and a runbook documenting the incident and recovery steps.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| examples/oci/deploy-all.sh | Pins the Prometheus chart version and enables CRD upgrade job with forced conflict resolution. |
| examples/oci/deploy-all-test.sh | Adds a lightweight regression test that asserts the expected Helm command is invoked. |
| docs/oke-prometheus-upgrade-runbook.md | Documents the failure mode, recovery steps, and rationale for the pinned/CRD-upgrade fix. |
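
One way such a regression test might work is to put a stub `helm` on `PATH` that records its arguments, run the deploy logic, and then assert on the recorded call. The sketch below illustrates that technique only; it is not the content of examples/oci/deploy-all-test.sh, and the helper names `make_helm_stub` and `assert_helm_called_with` and the `HELM_LOG` variable are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Create a stub `helm` in a temp dir that appends its arguments to the file
# named by $HELM_LOG instead of talking to a cluster. Prints the temp dir.
make_helm_stub() {
  stub_dir="$(mktemp -d)"
  cat > "$stub_dir/helm" <<'EOF'
#!/bin/sh
echo "helm $*" >> "$HELM_LOG"
EOF
  chmod +x "$stub_dir/helm"
  echo "$stub_dir"
}

# Assert that the recorded helm calls contain the given literal text.
assert_helm_called_with() {
  if grep -qF -- "$1" "$HELM_LOG"; then
    echo "PASS: $1"
  else
    echo "FAIL: expected helm call containing: $1" >&2
    return 1
  fi
}
```

A test would export `HELM_LOG`, prepend the stub directory to `PATH`, run the deploy script, and then call `assert_helm_called_with "--version 82.10.4"`. Because the stub never contacts a cluster, the check can run in CI without credentials.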

Signed-off-by: danish9039 <[email protected]>
github-actions Bot commented Mar 15, 2026

CI Summary Report

Metrics Comparison

❌ 72 metric change(s) detected

View changed metrics

metrics_snapshot_cassandras_4.x_v004_v2_auto

metrics_snapshot_cassandras_4.x_v004_v2_manual

metrics_snapshot_cassandras_5.x_v004_v2_auto

metrics_snapshot_cassandras_5.x_v004_v2_manual

metrics_snapshot_elasticsearch_8.x_v2
3 added

  • jaeger_storage_latency_seconds
  • jaeger_storage_requests
  • rpc_server_call_duration_seconds

metrics_snapshot_elasticsearch_9.x_v2
3 removed

  • jaeger_storage_latency_seconds
  • jaeger_storage_requests
  • rpc_server_call_duration_seconds

Code Coverage

✅ Coverage 96.8% (baseline 96.8%)

2026-03-15 22:28:28 UTC

codecov Bot commented Mar 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.66%. Comparing base (6bf0ed8) to head (3f3c82c).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8183   +/-   ##
=======================================
  Coverage   95.66%   95.66%           
=======================================
  Files         317      317           
  Lines       16747    16747           
=======================================
  Hits        16021    16021           
  Misses        572      572           
  Partials      154      154           
Flag Coverage Δ
badger_v1 9.05% <ø> (ø)
badger_v2 1.04% <ø> (ø)
cassandra-4.x-v1-manual 13.25% <ø> (ø)
cassandra-4.x-v2-auto 1.03% <ø> (ø)
cassandra-4.x-v2-manual 1.03% <ø> (ø)
cassandra-5.x-v1-manual 13.25% <ø> (ø)
cassandra-5.x-v2-auto 1.03% <ø> (ø)
cassandra-5.x-v2-manual 1.03% <ø> (ø)
clickhouse 1.16% <ø> (ø)
elasticsearch-6.x-v1 16.83% <ø> (ø)
elasticsearch-7.x-v1 16.86% <ø> (ø)
elasticsearch-8.x-v1 17.01% <ø> (ø)
elasticsearch-8.x-v2 1.04% <ø> (-0.05%) ⬇️
elasticsearch-9.x-v2 1.09% <ø> (+0.04%) ⬆️
grpc_v1 7.79% <ø> (ø)
grpc_v2 1.04% <ø> (ø)
kafka-3.x-v2 1.04% <ø> (ø)
memory_v2 1.04% <ø> (ø)
opensearch-1.x-v1 16.91% <ø> (ø)
opensearch-2.x-v1 16.91% <ø> (ø)
opensearch-2.x-v2 1.04% <ø> (ø)
opensearch-3.x-v2 1.04% <ø> (ø)
query 1.04% <ø> (ø)
tailsampling-processor 0.52% <ø> (ø)
unittests 94.35% <ø> (ø)

Signed-off-by: danish9039 <[email protected]>
yurishkuro added the changelog:ci (Change related to continuous integration / testing) label Mar 16, 2026
yurishkuro merged commit 25baa25 into jaegertracing:main Mar 16, 2026
66 of 69 checks passed

Labels

changelog:ci (Change related to continuous integration / testing), documentation

Development

Successfully merging this pull request may close these issues.

[chore]: OKE Deployment Failure: Alertmanager CRD incompatibility with kube-prometheus-stack

3 participants