Fix OKE demo Prometheus upgrade #8183
Pull request overview
Pins the OKE demo’s kube-prometheus-stack Helm upgrade to a known-good chart version and enables CRD upgrade handling to prevent Prometheus Operator CRD/schema drift from breaking the scheduled OKE deployment workflow.
Changes:
- Pin `prometheus-community/kube-prometheus-stack` to `82.10.4` in `examples/oci/deploy-all.sh`.
- Enable the chart’s CRD upgrade job and force conflict takeover during CRD apply.
- Add a small regression test script for the Helm invocation and a runbook documenting the incident and recovery steps.
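The changes listed above might translate into a Helm invocation along the following lines. This is only a sketch, not the PR's actual diff: the release name `prometheus`, the `monitoring` namespace, and the helper `build_prometheus_helm_args` are hypothetical, and the `crds.upgradeJob.enabled` / `crds.upgradeJob.forceConflicts` value names are assumed from the chart's values file and should be checked against the pinned chart version. The script prints the command instead of running it, so it is safe to execute without a cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Chart version pinned by this PR.
PROMETHEUS_CHART_VERSION="82.10.4"

# Populate HELM_ARGS with the pinned upgrade arguments.
# Release name and namespace are hypothetical placeholders.
build_prometheus_helm_args() {
  HELM_ARGS=(
    upgrade --install prometheus prometheus-community/kube-prometheus-stack
    --namespace monitoring
    --version "${PROMETHEUS_CHART_VERSION}"
    # Assumed chart values: run the CRD upgrade job and let it take
    # ownership of conflicting fields when applying CRDs.
    --set crds.upgradeJob.enabled=true
    --set crds.upgradeJob.forceConflicts=true
  )
}

build_prometheus_helm_args

# Dry run: print the command rather than executing it.
printf 'helm'
printf ' %q' "${HELM_ARGS[@]}"
printf '\n'
```

To perform the upgrade for real, the printed command could be replaced with `helm "${HELM_ARGS[@]}"`. Keeping the version in a single variable makes a later, deliberate chart bump a one-line change.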
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| examples/oci/deploy-all.sh | Pins the Prometheus chart version and enables CRD upgrade job with forced conflict resolution. |
| examples/oci/deploy-all-test.sh | Adds a lightweight regression test that asserts the expected Helm command is invoked. |
| docs/oke-prometheus-upgrade-runbook.md | Documents the failure mode, recovery steps, and rationale for the pinned/CRD-upgrade fix. |
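A regression test of the kind described for `examples/oci/deploy-all-test.sh` can be approximated by shadowing `helm` with a shell function that records its arguments. The sketch below is an assumption about the approach, not the contents of the actual test file: `deploy_prometheus` is a hypothetical stand-in for the upgrade step in `deploy-all.sh`, and the `--set` value names are assumed rather than taken from the PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stub: shadows the real helm binary and records each call's arguments.
HELM_CALLS=()
helm() {
  HELM_CALLS+=("$*")
}

# Hypothetical stand-in for the Prometheus upgrade step in deploy-all.sh.
deploy_prometheus() {
  helm upgrade --install prometheus prometheus-community/kube-prometheus-stack \
    --version 82.10.4 \
    --set crds.upgradeJob.enabled=true \
    --set crds.upgradeJob.forceConflicts=true
}

# Succeed if any recorded helm call contains the expected argument sequence.
assert_helm_called_with() {
  local expected="$1" call
  for call in "${HELM_CALLS[@]}"; do
    if [[ " ${call} " == *" ${expected} "* ]]; then
      return 0
    fi
  done
  echo "FAIL: no helm call contained: ${expected}" >&2
  return 1
}

deploy_prometheus
assert_helm_called_with "--version 82.10.4"
assert_helm_called_with "--set crds.upgradeJob.enabled=true"
assert_helm_called_with "--set crds.upgradeJob.forceConflicts=true"
echo "PASS"
```

Because the stub never contacts a cluster, a check like this can run in CI alongside `bash -n`, and it fails as soon as the version pin or the CRD upgrade flags are dropped from the invocation.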
Signed-off-by: danish9039 <[email protected]>
Force-pushed from 624cfbb to 1b0694d.
CI Summary Report
Metrics Comparison
❌ 72 metric change(s) detected
Changed metrics:
- metrics_snapshot_cassandras_4.x_v004_v2_auto
- metrics_snapshot_cassandras_4.x_v004_v2_manual
- metrics_snapshot_cassandras_5.x_v004_v2_auto
- metrics_snapshot_cassandras_5.x_v004_v2_manual
- metrics_snapshot_elasticsearch_8.x_v2
- metrics_snapshot_elasticsearch_9.x_v2
Code Coverage
✅ Coverage 96.8% (baseline 96.8%)
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff           @@
##             main    #8183   +/-   ##
=======================================
  Coverage   95.66%   95.66%
=======================================
  Files         317      317
  Lines       16747    16747
=======================================
  Hits        16021    16021
  Misses        572      572
  Partials      154      154
Fixes #8179.
Problem
The scheduled OKE demo deployment was upgrading `kube-prometheus-stack` without pinning a chart version. When the chart moved to `82.10.4`, the cluster still had older Prometheus Operator CRDs, so the upgrade failed on `Alertmanager.spec.hostNetwork`.
Fix
This pins the Prometheus chart to `82.10.4` and enables the chart CRD upgrade path with forced conflict takeover, so the CRDs are updated before the Prometheus resources are upgraded.
Verification
bash -n examples/oci/deploy-all.sh