Skip to content

Conversation

@jsolana
Copy link
Contributor

@jsolana jsolana commented Feb 25, 2025

Signed-off-by: Javier Solana [email protected]

Part of #21899

Add a "dry_run" label to distinguish dryrun activity from real ones to argocd_app_info metric.

The main issue is that if there are alerts based on these metrics, and the dry-run execution identifies an error (e.g., a change that violates a Kyverno policy or an invalid CRD schema) updates the metric, which can potentially trigger alerts based on it.

Adding the dry_run label allow distinguish real activity from dryrun. Eg:

# alert manager alert definition
- alert: ArgoCDApplicationUnknown
  expr: sum by (cluster) (argocd_app_info{sync_status="Unknown", dry_run!="true"}) > 0
  for: 15m
...

Note: This situation has been occurring since the origins of the argocd_app_info metric.

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

@jsolana jsolana requested a review from a team as a code owner February 25, 2025 19:36
@bunnyshell
Copy link

bunnyshell bot commented Feb 25, 2025

❌ Preview Environment deleted from Bunnyshell

Available commands (reply to this comment):

  • 🚀 /bns:deploy to deploy the environment

@codecov
Copy link

codecov bot commented Feb 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.94%. Comparing base (a1f90b5) to head (475226d).
Report is 449 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #22010      +/-   ##
==========================================
+ Coverage   55.92%   55.94%   +0.01%     
==========================================
  Files         343      343              
  Lines       57429    57432       +3     
==========================================
+ Hits        32119    32128       +9     
+ Misses      22657    22652       -5     
+ Partials     2653     2652       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@agaudreault agaudreault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dry-run is a temporary parameter on the operation. It is not meant to be persisted in the application information since it is a transient attribute. It would make sense to add it to "operations" metrics, because it is the operation that is in dry-run. This way you could know how many dry-run vs normal operations are made.

I think this is not the correct way to address the linked issue

@jsolana
Copy link
Contributor Author

jsolana commented Feb 25, 2025

It would make sense to add it to "operations" metrics, because it is the operation that is in dry-run. This way you could know how many dry-run vs normal operations are made.

I think both approaches make sense.

On one hand, adding the dry_run label to metrics more related to operations, such as argocd_app_k8s_request_total and argocd_app_sync_total, makes sense. However, I also believe it is reasonable to include it in argocd_app_info, despite its transient nature. This is because, at the moment metrics are collected, the application's reflected state could correspond to a dry-run operation. With this label, it can be ignored at the alert definition level.

The main idea of this issue is not only solve questions like how many dry-run vs normal operations are made but the status of my apps are real or based in a dryrun?

Copy link
Member

@nitishfy nitishfy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the failing tests that you asked on slack, have you tried running make codegen-local command before pushing changes?

…_total and argocd_app_sync_total

Signed-off-by: Javier Solana <[email protected]>
@jsolana
Copy link
Contributor Author

jsolana commented Feb 26, 2025

For the failing tests that you asked on slack, have you tried running make codegen-local command before pushing changes?

Already done locally and there is no change. With a new commit it is solved then, It seems was a transient error. Thanks for your help!

Copy link

@harshitSkyscanner harshitSkyscanner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@harshitSkyscanner
Copy link

Should we also add dry_run label to logAppEvent (https://github.com/argoproj/argo-cd/blob/master/controller/appcontroller.go#L2540) ?

@jsolana jsolana requested a review from agaudreault February 27, 2025 17:39
@jsolana
Copy link
Contributor Author

jsolana commented Feb 27, 2025

Should we also add dry_run label to logAppEvent (https://github.com/argoproj/argo-cd/blob/master/controller/appcontroller.go#L2540) ?

Mm maybe is interesting for logs perspective to add dryrun info but I think it already add this info in logs.

Anyway in this case I'll prefer to move it in a different PR and issue cause is not as easy as modify the contract of logAppEvent (internally it uses ctrl.auditLogger.LogAppEvent that is used extensively throughout the project)

@nitishfy nitishfy self-assigned this Mar 4, 2025
@nitishfy
Copy link
Member

nitishfy commented Mar 4, 2025

Thanks for passing the failing checks, I'll be reviewing this shortly.

Copy link
Member

@nitishfy nitishfy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jsolana
Copy link
Contributor Author

jsolana commented Mar 7, 2025

Hi!

Kyverno just merged a similar approach here.

Anything I can do to push for this PR? Thanks!

@jsolana
Copy link
Contributor Author

jsolana commented Mar 11, 2025

Friendly reminder @agaudreault ! Anything I can do to push for this change? Thanks!

@jsolana jsolana requested a review from agaudreault March 25, 2025 15:15
@jsolana
Copy link
Contributor Author

jsolana commented Mar 26, 2025

@agaudreault let me know if it is ok. Im gonna start taking a look to persist dryrun under Status. Thanks!

@agaudreault agaudreault changed the title chore: add "dry_run" label to argocd_app_info metric chore: add "dry_run" label to sync metrics Mar 26, 2025
@agaudreault agaudreault merged commit 75098e5 into argoproj:master Mar 26, 2025
29 checks passed
Hapshanko pushed a commit to Hapshanko/argo-cd that referenced this pull request Apr 29, 2025
Co-authored-by: Javier Solana <[email protected]>
Co-authored-by: Pasha Kostohrys <[email protected]>
Signed-off-by: Hapshanko <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants