-
Notifications
You must be signed in to change notification settings - Fork 7.4k
Introduce a context for canceling wait operations #31435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d471fea to
27f6766
Compare
|
@joejulian I think you mentioned in the dev meeting that some changes are required? Can you please clarify? 🙏 Thanks! |
27f6766 to
eb2c833
Compare
|
@joejulian @gjenkins8 @mattfarina I added tests, please review and let me know if anything else is missing 🙏 |
eb2c833 to
a592695
Compare
|
@gjenkins8 Fixed both comments, thanks! PTAL one more time 🙏 |
|
@joejulian I was reflecting about what you said in the last dev meeting and I think I understood what you meant. Upon a wait timeout, the release is left in the pending state i.e. Flux has dealt with this a long time ago: we detect this pending state and "unlock" the release to keep moving forward. A wait timeout is a very common issue and Flux couldn't leave this unattended for so long. Here's how we do it: Detection:
So, as I said before, from the perspective of us SDK users, the feature being introduced in this PR is merely "a faster/smarter/customized way for producing timeouts on wait operations". It's beyond the scope of this PR dealing with timeouts. I think with this we can lift the embargo here, what do you think? @joejulian @gjenkins8 @mattfarina @scottrigby @stealthybox @stefanprodan Furthermore, I have backported the legacy waiter changes here to a fork from - apiVersion: v1
count: 1
eventTime: null
firstTimestamp: "2025-11-01T12:13:44Z"
involvedObject:
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
name: podinfo
namespace: frontend
resourceVersion: "92444307"
uid: 29c2a513-5662-42b6-afe4-ed157a5c60f2
kind: Event
lastTimestamp: "2025-11-01T12:13:44Z"
message: |-
Helm upgrade failed for release frontend/podinfo with chart [email protected]+971fef0d04d5: New reconciliation triggered by HelmRelease/frontend/podinfo
Last Helm logs:
2025-11-01T12:12:50.46219424Z: Looks like there are no changes for Service "podinfo"
2025-11-01T12:12:50.477013785Z: Patch Deployment "podinfo" in namespace frontend
2025-11-01T12:12:50.494020691Z: Looks like there are no changes for Ingress "podinfo"
2025-11-01T12:12:50.508865623Z: Patch ServiceMonitor "podinfo" in namespace frontend
2025-11-01T12:12:50.526101352Z: waiting for release podinfo resources (created: 0 updated: 4 deleted: 0)
2025-11-01T12:12:50.526105791Z: beginning wait for 4 resources with timeout of 3m0s
2025-11-01T12:12:50.550119838Z: Deployment is not ready: frontend/podinfo. observedGeneration (22) does not match spec generation (23).
2025-11-01T12:12:52.547186135Z: Deployment is not ready: frontend/podinfo. 0 out of 1 expected pods are ready
2025-11-01T12:13:42.540396144Z: Deployment is not ready: frontend/podinfo. 0 out of 1 expected pods are ready (24 duplicate lines omitted)
2025-11-01T12:13:44.349242962Z: wait for resources failed after 54s: context canceled
2025-11-01T12:13:44.367689523Z: warning: Upgrade "podinfo" failed: context canceled
metadata:
annotations:
helm.toolkit.fluxcd.io/app-version: 6.9.2
helm.toolkit.fluxcd.io/oci-digest: sha256:971fef0d04d5b3d03d035701dad59411ea0f60e28d16190f02469ddfe5587588
helm.toolkit.fluxcd.io/revision: 6.9.2+971fef0d04d5
helm.toolkit.fluxcd.io/token: sha256:1bb60c0d9974502e0cbea5486437617fcec597d9c99f034796135fdabf91b062
creationTimestamp: "2025-11-01T12:13:44Z"
name: podinfo.1873e0f58808d7bd
namespace: frontend
resourceVersion: "92444900"
uid: 2a338055-6054-4038-8072-44c389311d4b
reason: HealthCheckCanceled
reportingComponent: helm-controller
reportingInstance: ""
source:
component: helm-controller
type: WarningAnd immediately after this event appears, helm-controller reconciles the Flux HelmRelease once again, this time successfully: This shows that Flux is correctly dealing with the release left on the pending state, i.e. the same code path that deals with wait timeouts is kicking in. |
|
Sure, that's fine with me. I just wanted to make sure we had an expected outcome that can be documented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes here -- introducing a context for the wait operation (that can be canceled by the caller) -- look good to me.
Additional behavior around what to do with the release once the wait has been canceled, remains/is the callers responsibility (and hopefully Helm can improve here over time too)
Authors have tested the changes above (via flux)
Signed-off-by: Matheus Pimenta <[email protected]>
a592695 to
efc1702
Compare
scottrigby
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
joejulian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already been approved and merged, but I'll still add my posthumous (not sure if that's a good word for this) approval.
Closes: #31382
The behavior expected for a release with a custom wait context that gets done during a wait operation is expected to be the same as the behavior for a timeout (which is the default behavior).