feat(health): Add custom healthcheck for NATS resources #23167 #23180
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| hs = {} | ||
| local healthy_cons = { Created=true, Updated=true, Noop=true, SkipCreate=true, SkipUpdate=true } | ||
| if obj.status ~= nil then | ||
| if obj.status.conditions ~= nil then | ||
| for i, condition in ipairs(obj.status.conditions) do | ||
| if condition.status == "False" then | ||
| hs.status = "Degraded" | ||
| hs.message = condition.message | ||
| return hs | ||
| end | ||
| if condition.type == "Ready" and healthy_cons[condition.reason] then | ||
| hs.status = "Healthy" | ||
| hs.message = condition.message | ||
| return hs | ||
| end | ||
| if condition.type == "Ready" and not healthy_cons[condition.reason] then | ||
| hs.status = "Progressing" | ||
| hs.message = condition.message | ||
| local pattern = "(%d+)%-(%d+)%-(%d+)%a(%d+)%:(%d+)%:([%d]+)%.%d+Z" | ||
| local year, month, day, hour, minute, seconds = condition.lastTransitionTime:match(pattern) | ||
| local event_time = os.time{year = year, month = month, day = day, hour = hour, | ||
| min = minute, sec = seconds} | ||
| if os.difftime(os.time(), event_time) > 30 then | ||
| Comment on lines +19 to +23: Argo CD is not the place to decide that. If the resource is unhealthy after 30 seconds and this has an impact, then the resource should have a feature similar to progressDeadlineSeconds and update its status/conditions accordingly. Reply: The resource doesn't have a "progressDeadlineSeconds" definition. I could check whether it exists, in case they add it in the future, and keep the 30 seconds as a reasonable default? | ||
| hs.status = "Degraded" | ||
| hs.message = "Trying to create resource for more than 30 seconds" | ||
| Comment: What happens later? Would it keep trying or give up? Reply: It is not defined in their documentation. I think it depends on why it didn't work. I have seen it retry multiple times before giving up, but restarting the Nack controller or interacting with the NATS server resources directly can also put it in an inconsistent state, and I am not sure how it untangles itself. | ||
| end | ||
| return hs | ||
| end | ||
| end | ||
| end | ||
| end | ||
| hs.status = "Progressing" | ||
| hs.message = "Waiting for Nacks" | ||
| return hs | ||
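
The review thread on the 30-second threshold floats one option: read a deadline from the resource if one is ever added, and otherwise fall back to 30 seconds. A minimal sketch of that idea follows. The field name `spec.progressDeadlineSeconds` and both helper names are hypothetical: per the reply in the thread, the NACK resources do not define such a field today.

```lua
-- Hypothetical sketch, not part of the PR.
-- Fallback taken from the hard-coded value in the script above.
local DEFAULT_DEADLINE_SECONDS = 30

-- Returns the deadline in seconds for this object.
-- `spec.progressDeadlineSeconds` is an assumed field name; it is only
-- honoured when present and numeric, otherwise the default applies.
local function deadline_seconds(obj)
  local spec = obj.spec
  if spec ~= nil and type(spec.progressDeadlineSeconds) == "number" then
    return spec.progressDeadlineSeconds
  end
  return DEFAULT_DEADLINE_SECONDS
end

-- True when the resource has been progressing for longer than its deadline.
local function is_stuck(obj, elapsed_seconds)
  return elapsed_seconds > deadline_seconds(obj)
end
```

With helpers like these, the comparison against the literal `30` would become `is_stuck(obj, os.difftime(os.time(), event_time))`. Whether Argo CD should own this decision at all is the open question in the thread.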
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| tests: | ||
| - healthStatus: | ||
| status: Progressing | ||
| message: Waiting for Nacks | ||
| inputPath: test_data/progressing.yaml | ||
| - healthStatus: | ||
| status: Degraded | ||
| message: 'failed to create account' | ||
| inputPath: test_data/degraded_failure.yaml | ||
| - healthStatus: | ||
| status: Degraded | ||
| message: 'Trying to create resource for more than 30 seconds' | ||
| inputPath: test_data/degraded_stuck.yaml | ||
| - healthStatus: | ||
| status: Healthy | ||
| message: 'Account successfully updated' | ||
| inputPath: test_data/healthy_updated.yaml | ||
| - healthStatus: | ||
| status: Healthy | ||
| message: 'Account successfully created' | ||
| inputPath: test_data/healthy_created.yaml |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| apiVersion: jetstream.nats.io/v1beta2 | ||
| kind: Account | ||
| metadata: | ||
| name: account-name | ||
| spec: | ||
| name: a | ||
| servers: | ||
| - nats://nats:4222 | ||
| tls: | ||
| secret: | ||
| name: nack-a-tls | ||
| ca: "ca.crt" | ||
| cert: "tls.crt" | ||
| key: "tls.key" | ||
| status: | ||
| conditions: | ||
| - lastTransitionTime: '2025-05-14T14:25:20.89689315Z' | ||
| message: >- | ||
| failed to create account | ||
| reason: Errored | ||
| status: 'False' | ||
| type: Ready | ||
| observedGeneration: 0 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| apiVersion: jetstream.nats.io/v1beta2 | ||
| kind: Account | ||
| metadata: | ||
| name: account-name | ||
| spec: | ||
| name: a | ||
| servers: | ||
| - nats://nats:4222 | ||
| tls: | ||
| secret: | ||
| name: nack-a-tls | ||
| ca: "ca.crt" | ||
| cert: "tls.crt" | ||
| key: "tls.key" | ||
| status: | ||
| conditions: | ||
| - lastTransitionTime: '2025-05-14T14:22:01.962295009Z' | ||
| message: Creating account | ||
| reason: Creating | ||
| status: 'True' | ||
| type: Ready | ||
| Comment on lines +19 to +21: It sounds weird to me to have a resource that is "creating" while its Ready status is already true. Are you sure that this is the behavior of the controller? This seems incorrect. Reply: It is. I think their idea is that "the controller knows about this resource" makes it ready. The controller then needs to read the properties from the resource and make the corresponding calls to the NATS server to create whatever the resource represents. | ||
| observedGeneration: 0 | ||
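
Because the controller reports `Ready` as `'True'` while the reason is still `Creating`, the health script has to key on `reason` rather than on `status` alone. The sketch below isolates that mapping, leaving out the time-based check. The function name `classify_ready` is hypothetical; the set of healthy reasons is copied from the script in this PR.

```lua
-- Reasons the PR's script treats as healthy.
local healthy_reasons = {
  Created = true, Updated = true, Noop = true,
  SkipCreate = true, SkipUpdate = true,
}

-- Hypothetical helper: map a single Ready condition to a health status.
-- A "False" status wins first; otherwise the reason decides, so a
-- condition with status "True" and reason "Creating" is still Progressing.
local function classify_ready(condition)
  if condition.status == "False" then
    return "Degraded"
  end
  if healthy_reasons[condition.reason] then
    return "Healthy"
  end
  return "Progressing"
end
```

For the fixture above (`status: 'True'`, `reason: Creating`), this returns `Progressing`. The full script then downgrades it to `Degraded` once the 30-second window has passed.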
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| apiVersion: jetstream.nats.io/v1beta2 | ||
| kind: Account | ||
| metadata: | ||
| name: account-name | ||
| spec: | ||
| name: a | ||
| servers: | ||
| - nats://nats:4222 | ||
| tls: | ||
| secret: | ||
| name: nack-a-tls | ||
| ca: "ca.crt" | ||
| cert: "tls.crt" | ||
| key: "tls.key" | ||
| status: | ||
| conditions: | ||
| - lastTransitionTime: '2025-05-14T14:22:01.962295009Z' | ||
| message: Account successfully created | ||
| reason: Created | ||
| status: 'True' | ||
| type: Ready | ||
| observedGeneration: 1 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| apiVersion: jetstream.nats.io/v1beta2 | ||
| kind: Account | ||
| metadata: | ||
| name: account-name | ||
| spec: | ||
| name: a | ||
| servers: | ||
| - nats://nats:4222 | ||
| tls: | ||
| secret: | ||
| name: nack-a-tls | ||
| ca: "ca.crt" | ||
| cert: "tls.crt" | ||
| key: "tls.key" | ||
| status: | ||
| conditions: | ||
| - lastTransitionTime: '2025-05-14T14:22:01.962295009Z' | ||
| message: Account successfully updated | ||
| reason: Updated | ||
| status: 'True' | ||
| type: Ready | ||
| observedGeneration: 1 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| apiVersion: jetstream.nats.io/v1beta2 | ||
| kind: Account | ||
| metadata: | ||
| name: account-name | ||
| spec: | ||
| name: a | ||
| servers: | ||
| - nats://nats:4222 | ||
| tls: | ||
| secret: | ||
| name: nack-a-tls | ||
| ca: "ca.crt" | ||
| cert: "tls.crt" | ||
| key: "tls.key" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| hs = {} | ||
| local healthy_cons = { Created=true, Updated=true, Noop=true, SkipCreate=true, SkipUpdate=true } | ||
| if obj.status ~= nil then | ||
| if obj.status.conditions ~= nil then | ||
| for i, condition in ipairs(obj.status.conditions) do | ||
| if condition.status == "False" then | ||
| hs.status = "Degraded" | ||
| hs.message = condition.message | ||
| return hs | ||
| end | ||
| if condition.type == "Ready" and healthy_cons[condition.reason] then | ||
| hs.status = "Healthy" | ||
| hs.message = condition.message | ||
| return hs | ||
| end | ||
| if condition.type == "Ready" and not healthy_cons[condition.reason] then | ||
| hs.status = "Progressing" | ||
| hs.message = condition.message | ||
| local pattern = "(%d+)%-(%d+)%-(%d+)%a(%d+)%:(%d+)%:([%d]+)%.%d+Z" | ||
| local year, month, day, hour, minute, seconds = condition.lastTransitionTime:match(pattern) | ||
| local event_time = os.time{year = year, month = month, day = day, hour = hour, | ||
| min = minute, sec = seconds} | ||
| if os.difftime(os.time(), event_time) > 30 then | ||
| hs.status = "Degraded" | ||
| hs.message = "Trying to create resource for more than 30 seconds" | ||
| end | ||
| return hs | ||
| end | ||
| end | ||
| end | ||
| end | ||
| hs.status = "Progressing" | ||
| hs.message = "Waiting for Nacks" | ||
| return hs |
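
The timeout branch in the script above calls `:match` on `lastTransitionTime` directly, so a missing field raises an error, and a timestamp without fractional seconds yields `nil` captures. A more defensive parser might look like the sketch below. Both helper names are hypothetical, and the sketch assumes the timestamps are always UTC with a trailing `Z`, as in the fixtures.

```lua
-- Hypothetical helper, not part of the PR: parse a UTC timestamp such as
-- "2025-05-14T14:22:01.962295009Z" into numeric date fields.
-- Returns nil when the input is missing or does not match.
local function parse_timestamp(ts)
  if type(ts) ~= "string" then
    return nil
  end
  -- Fractional seconds are optional here ("%.?%d*").
  local pattern = "^(%d+)%-(%d+)%-(%d+)T(%d+):(%d+):(%d+)%.?%d*Z$"
  local year, month, day, hour, min, sec = ts:match(pattern)
  if year == nil then
    return nil
  end
  return {
    year = tonumber(year), month = tonumber(month), day = tonumber(day),
    hour = tonumber(hour), min = tonumber(min), sec = tonumber(sec),
  }
end

-- Seconds elapsed since the timestamp, or nil if it cannot be parsed.
-- os.time() interprets the table as local time, so the result is only
-- exact when the process runs in UTC (an assumption of this sketch).
-- `now` is optional and exists mainly to make the function testable.
local function seconds_since(ts, now)
  local fields = parse_timestamp(ts)
  if fields == nil then
    return nil
  end
  return os.difftime(now or os.time(), os.time(fields))
end
```

A caller can then treat a `nil` result as "unknown age" and stay in `Progressing` instead of failing the whole health check.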
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| tests: | ||
| - healthStatus: | ||
| status: Progressing | ||
| message: Waiting for Nacks | ||
| inputPath: test_data/progressing.yaml | ||
| - healthStatus: | ||
| status: Degraded | ||
| message: 'failed to create consumer' | ||
| inputPath: test_data/degraded_failure.yaml | ||
| - healthStatus: | ||
| status: Degraded | ||
| message: 'Trying to create resource for more than 30 seconds' | ||
| inputPath: test_data/degraded_stuck.yaml | ||
| - healthStatus: | ||
| status: Healthy | ||
| message: 'Consumer successfully updated' | ||
| inputPath: test_data/healthy_updated.yaml | ||
| - healthStatus: | ||
| status: Healthy | ||
| message: 'Consumer successfully created' | ||
| inputPath: test_data/healthy_created.yaml |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| apiVersion: jetstream.nats.io/v1beta2 | ||
| kind: Consumer | ||
| metadata: | ||
| name: consumer-name | ||
| spec: | ||
| streamName: mystream | ||
| durableName: my-pull-consumer | ||
| deliverPolicy: all | ||
| filterSubject: orders.received | ||
| maxDeliver: 20 | ||
| ackPolicy: explicit | ||
| status: | ||
| conditions: | ||
| - lastTransitionTime: '2025-05-14T14:25:20.89689315Z' | ||
| message: >- | ||
| failed to create consumer | ||
| reason: Errored | ||
| status: 'False' | ||
| type: Ready | ||
| observedGeneration: 0 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| apiVersion: jetstream.nats.io/v1beta2 | ||
| kind: Consumer | ||
| metadata: | ||
| name: consumer-name | ||
| spec: | ||
| streamName: mystream | ||
| durableName: my-pull-consumer | ||
| deliverPolicy: all | ||
| filterSubject: orders.received | ||
| maxDeliver: 20 | ||
| ackPolicy: explicit | ||
| status: | ||
| conditions: | ||
| - lastTransitionTime: '2025-05-14T14:22:01.962295009Z' | ||
| message: Creating consumer | ||
| reason: Creating | ||
| status: 'True' | ||
| type: Ready | ||
| observedGeneration: 0 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| apiVersion: jetstream.nats.io/v1beta2 | ||
| kind: Consumer | ||
| metadata: | ||
| name: consumer-name | ||
| spec: | ||
| streamName: mystream | ||
| durableName: my-pull-consumer | ||
| deliverPolicy: all | ||
| filterSubject: orders.received | ||
| maxDeliver: 20 | ||
| ackPolicy: explicit | ||
| status: | ||
| conditions: | ||
| - lastTransitionTime: '2025-05-14T14:22:01.962295009Z' | ||
| message: Consumer successfully created | ||
| reason: Created | ||
| status: 'True' | ||
| type: Ready | ||
| observedGeneration: 1 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| apiVersion: jetstream.nats.io/v1beta2 | ||
| kind: Consumer | ||
| metadata: | ||
| name: consumer-name | ||
| spec: | ||
| streamName: mystream | ||
| durableName: my-pull-consumer | ||
| deliverPolicy: all | ||
| filterSubject: orders.received | ||
| maxDeliver: 20 | ||
| ackPolicy: explicit | ||
| status: | ||
| conditions: | ||
| - lastTransitionTime: '2025-05-14T14:22:01.962295009Z' | ||
| message: Consumer successfully updated | ||
| reason: Updated | ||
| status: 'True' | ||
| type: Ready | ||
| observedGeneration: 1 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| apiVersion: jetstream.nats.io/v1beta2 | ||
| kind: Consumer | ||
| metadata: | ||
| name: consumer-name | ||
| spec: | ||
| streamName: mystream | ||
| durableName: my-pull-consumer | ||
| deliverPolicy: all | ||
| filterSubject: orders.received | ||
| maxDeliver: 20 | ||
| ackPolicy: explicit |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| hs = {} | ||
| local healthy_cons = { Created=true, Updated=true, Noop=true, SkipCreate=true, SkipUpdate=true } | ||
| if obj.status ~= nil then | ||
| if obj.status.conditions ~= nil then | ||
| for i, condition in ipairs(obj.status.conditions) do | ||
| if condition.status == "False" then | ||
| hs.status = "Degraded" | ||
| hs.message = condition.message | ||
| return hs | ||
| end | ||
| if condition.type == "Ready" and healthy_cons[condition.reason] then | ||
| hs.status = "Healthy" | ||
| hs.message = condition.message | ||
| return hs | ||
| end | ||
| if condition.type == "Ready" and not healthy_cons[condition.reason] then | ||
| hs.status = "Progressing" | ||
| hs.message = condition.message | ||
| local pattern = "(%d+)%-(%d+)%-(%d+)%a(%d+)%:(%d+)%:([%d]+)%.%d+Z" | ||
| local year, month, day, hour, minute, seconds = condition.lastTransitionTime:match(pattern) | ||
| local event_time = os.time{year = year, month = month, day = day, hour = hour, | ||
| min = minute, sec = seconds} | ||
| if os.difftime(os.time(), event_time) > 30 then | ||
| hs.status = "Degraded" | ||
| hs.message = "Trying to create resource for more than 30 seconds" | ||
| end | ||
| return hs | ||
| end | ||
| end | ||
| end | ||
| end | ||
| hs.status = "Progressing" | ||
| hs.message = "Waiting for Nacks" | ||
| return hs |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| tests: | ||
| - healthStatus: | ||
| status: Progressing | ||
| message: Waiting for Nacks | ||
| inputPath: test_data/progressing.yaml | ||
| - healthStatus: | ||
| status: Degraded | ||
| message: 'failed to create keyvalue' | ||
| inputPath: test_data/degraded_failure.yaml | ||
| - healthStatus: | ||
| status: Degraded | ||
| message: 'Trying to create resource for more than 30 seconds' | ||
| inputPath: test_data/degraded_stuck.yaml | ||
| - healthStatus: | ||
| status: Healthy | ||
| message: 'KeyValue successfully updated' | ||
| inputPath: test_data/healthy_updated.yaml | ||
| - healthStatus: | ||
| status: Healthy | ||
| message: 'KeyValue successfully created' | ||
| inputPath: test_data/healthy_created.yaml |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| apiVersion: jetstream.nats.io/v1beta2 | ||
| kind: KeyValue | ||
| metadata: | ||
| name: keyvalue-name | ||
| spec: | ||
| bucket: my-key-value | ||
| history: 20 | ||
| storage: file | ||
| maxBytes: 2048 | ||
| compression: true | ||
| status: | ||
| conditions: | ||
| - lastTransitionTime: '2025-05-14T14:25:20.89689315Z' | ||
| message: >- | ||
| failed to create keyvalue | ||
| reason: Errored | ||
| status: 'False' | ||
| type: Ready | ||
| observedGeneration: 0 |