Skip to content

Conversation

@Koalk
Copy link

@Koalk Koalk commented May 28, 2025

NATS uses a controller called nacks to deploy resources in a deployed server based on CRDs created. The controller provides kubernetes events giving the status updates of the processing of the CRDs.
Closes #23167

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

@Koalk Koalk requested a review from a team as a code owner May 28, 2025 09:25
@bunnyshell
Copy link

bunnyshell bot commented May 28, 2025

❌ Preview Environment undeployed from Bunnyshell

Available commands (reply to this comment):

  • 🚀 /bns:deploy to deploy the environment

@codecov
Copy link

codecov bot commented May 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (master@97a38b4). Learn more about missing BASE report.
Report is 102 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #23180   +/-   ##
=========================================
  Coverage          ?   59.98%           
=========================================
  Files             ?      343           
  Lines             ?    57890           
  Branches          ?        0           
=========================================
  Hits              ?    34725           
  Misses            ?    20384           
  Partials          ?     2781           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@crenshaw-dev crenshaw-dev changed the title feat: Add custom healthcheck for NATS resources #23167 feat(health): Add custom healthcheck for NATS resources #23167 Jun 4, 2025
@crenshaw-dev crenshaw-dev added this to the v3.1 milestone Jun 4, 2025
min = minute, sec = seconds}
if os.difftime(os.time(), event_time) > 30 then
hs.status = "Degraded"
hs.message = "Trying to create resource for more than 30 seconds"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens later? Would it keep trying or give up?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not defined in their documentation. I think it will depend on the reason it didn't work. I have seen it retry multiple times before giving up but also restarting the Nack controller or interacting with the NATS server resources directly can put it in an inconsistent state and I am not sure how it untangles itself.

Comment on lines +19 to +23
local pattern = "(%d+)%-(%d+)%-(%d+)%a(%d+)%:(%d+)%:([%d]+)%.%d+Z"
local year, month, day, hour, minute, seconds = condition.lastTransitionTime:match(pattern)
local event_time = os.time{year = year, month = month, day = day, hour = hour,
min = minute, sec = seconds}
if os.difftime(os.time(), event_time) > 30 then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argo CD is not the place to decide that. If the resource is unhealthy after 30 seconds and this has an impact, then it should have a feature similar to a progressDeadlineSeconds and update the status/conditions accordingly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resource doesn't have a "progressDeadlineSeconds" definition. I could check if it exists in case they add it in the future and keep the 30 seconds as a reasonable default?

Comment on lines +19 to +21
reason: Creating
status: 'True'
type: Ready
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds weird to me to have a resource that is "creating" and have the Ready status be already true. Are you sure that it is the behavior of the controller? This seems incorrect.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is. I think their idea is that "the controller knows about this resource" makes it ready. The controller then needs to read the properties from the resource and make the corresponding calls to the NATS server to create whatever the resource is representing.

@agaudreault agaudreault modified the milestones: v3.1, v3.2 Jun 16, 2025
@Koalk Koalk force-pushed the nats-nack-healthcheck branch from eaf54bd to 28a9694 Compare June 17, 2025 09:49
@Koalk Koalk force-pushed the nats-nack-healthcheck branch from 28a9694 to dbb11e7 Compare August 27, 2025 10:07
@crenshaw-dev crenshaw-dev added the component:health-check Issue related to built-in Health Check customizations label Sep 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component:health-check Issue related to built-in Health Check customizations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NATS custom healthchecks for CRDs

4 participants