Summary
Add a Prometheus counter argocd_cluster_events_ignored_total that increments each time skipResourceUpdate() filters out a resource event due to ignoreResourceUpdates rules. Currently there is zero observability into whether these rules are working — the only signal is a debug-level log line.
Motivation
The ignoreResourceUpdates feature (introduced in v2.8) suppresses unnecessary reconciliation when watched Kubernetes resources change in fields that operators have deemed irrelevant (e.g., /status, /metadata/managedFields). However, there is no metric to observe how many events are being filtered. The only signal is a debug-level log line in controller/cache/cache.go:
```go
log.WithFields(log.Fields{...}).Debugf("Ignoring change of object ...")
```
This makes it impossible to measure the effectiveness of ignoreResourceUpdates rules without enabling debug logging on the application controller — which is prohibitively expensive at scale.
The cost of debug logging (the only current alternative)
We operate a large ArgoCD deployment (10 controller shards, 300+ clusters). When we temporarily enabled debug logging on the application controller to observe ignoreResourceUpdates behavior, we measured the following impact over 10-minute windows:
| Level | Log lines / 10 min | Bytes / 10 min | Log lines / hour | Bytes / hour |
|---|---|---|---|---|
| info | ~454K | ~170 MB | ~2.7M | ~1.0 GB |
| debug | ~7.8M | ~1.4 GB | ~46.7M | ~8.0 GB |
| multiplier (debug / info) | 17x | 8x | 17x | 8x |
Extrapolated: debug logging costs an additional ~169 GB/day in log volume. This makes it impractical to run debug logging for any extended period to tune ignoreResourceUpdates rules, yet without it there is zero observability into whether the rules are working or how much load they're shedding.
Use cases
- Measure effectiveness: Compare argocd_cluster_events_ignored_total against argocd_cluster_events_total to see what percentage of events are being filtered per resource type.
- Tune rules: Identify high-frequency resource types that aren't yet covered by ignore rules.
- Detect regressions: Alert if the ratio suddenly changes, indicating a misconfiguration or upstream behavior change.
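The first use case boils down to a per-label ratio of counter increases. In Prometheus this would normally be a PromQL expression dividing the increase() or rate() of the two counters over the same window; the Go sketch below only illustrates that arithmetic. It assumes every ignored event is also counted in argocd_cluster_events_total, so the result stays between 0% and 100%. The names resourceKey and ignoredPercent, the server URL, and the numbers are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// resourceKey mirrors the (server, group, kind) labels shared by
// argocd_cluster_events_total and argocd_cluster_events_ignored_total.
type resourceKey struct{ Server, Group, Kind string }

// ignoredPercent returns, per resource key, the share of events filtered by
// ignoreResourceUpdates. Both maps hold counter increases over the same
// window. Assumption: ignored events are also counted in total.
func ignoredPercent(total, ignored map[resourceKey]float64) map[resourceKey]float64 {
	out := make(map[resourceKey]float64, len(total))
	for k, t := range total {
		if t == 0 {
			continue // no events in the window, so the ratio is undefined
		}
		out[k] = 100 * ignored[k] / t
	}
	return out
}

func main() {
	// Illustrative numbers only, not measurements.
	total := map[resourceKey]float64{
		{"https://cluster-a.example", "apps", "Deployment"}: 200,
		{"https://cluster-a.example", "", "Pod"}:            1000,
	}
	ignored := map[resourceKey]float64{
		{"https://cluster-a.example", "apps", "Deployment"}: 150,
	}
	pct := ignoredPercent(total, ignored)

	// Sort by kind so the output order is deterministic.
	keys := make([]resourceKey, 0, len(pct))
	for k := range pct {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i].Kind < keys[j].Kind })
	for _, k := range keys {
		fmt.Printf("group=%q kind=%s: %.1f%% ignored\n", k.Group, k.Kind, pct[k])
	}
	// group="apps" kind=Deployment: 75.0% ignored
	// group="" kind=Pod: 0.0% ignored
}
```

A kind with a high event count and a near-zero ignored share, like the illustrative Pod entry, is exactly what the "Tune rules" use case is meant to surface.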
Proposal
Add a new counter using the same labels as the existing argocd_cluster_events_total counter (server, group, kind):
```
argocd_cluster_events_ignored_total{server="...", group="apps", kind="Deployment"} 42
```
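In the controller, a labeled counter like this would normally be a client_golang prometheus.CounterVec, created with prometheus.NewCounterVec and incremented with WithLabelValues(server, group, kind).Inc(), with the text output produced by the Prometheus HTTP handler. To keep the sketch dependency-free, a small mutex-guarded map stands in for it; labeledCounter, Expose, and the server URL are hypothetical. It shows how one series per (server, group, kind) accumulates into an exposition line like the one above, and what a reset on expiration does to it.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// labeledCounter is a dependency-free stand-in for a client_golang
// CounterVec with the labels (server, group, kind).
type labeledCounter struct {
	name   string
	mu     sync.Mutex
	values map[[3]string]float64
}

func newLabeledCounter(name string) *labeledCounter {
	return &labeledCounter{name: name, values: make(map[[3]string]float64)}
}

// Inc adds one to the series identified by the given label values.
func (c *labeledCounter) Inc(server, group, kind string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[[3]string{server, group, kind}]++
}

// Reset drops every series, analogous to CounterVec.Reset(); the proposal
// resets the real counter when the metrics cache expires.
func (c *labeledCounter) Reset() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values = make(map[[3]string]float64)
}

// Expose renders each series as a text exposition line, sorted for stable
// output. Go's %q quoting matches Prometheus escaping for plain label values.
func (c *labeledCounter) Expose() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	lines := make([]string, 0, len(c.values))
	for k, v := range c.values {
		lines = append(lines, fmt.Sprintf("%s{server=%q,group=%q,kind=%q} %g",
			c.name, k[0], k[1], k[2], v))
	}
	sort.Strings(lines)
	return strings.Join(lines, "\n")
}

func main() {
	ignored := newLabeledCounter("argocd_cluster_events_ignored_total")
	for i := 0; i < 42; i++ {
		ignored.Inc("https://cluster-a.example", "apps", "Deployment")
	}
	fmt.Println(ignored.Expose())
	// argocd_cluster_events_ignored_total{server="https://cluster-a.example",group="apps",kind="Deployment"} 42
}
```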
Changes required
- controller/metrics/metrics.go: Define clusterEventsIgnoredCounter, add it as a struct field, register it, expose an IncClusterEventsIgnoredCount() method, and reset it on expiration.
- controller/cache/cache.go: Call IncClusterEventsIgnoredCount() in the skipResourceUpdate early-return path.
- controller/metrics/metrics_test.go: Add a test for the new counter.
- docs/operator-manual/metrics.md: Document the new metric.
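A minimal sketch of the cache.go change, assuming the existing argocd_cluster_events_total increment sees every watch event before the ignore check, so ignored events are a subset of it. Only the names skipResourceUpdate and IncClusterEventsIgnoredCount and the early-return placement come from this proposal; fakeMetrics, resource, watchedHash, handleEvent, eventKey, and the method signatures are hypothetical. Modelling skipResourceUpdate as a comparison of fingerprints of the non-ignored fields is an illustration, not necessarily how Argo CD implements it.

```go
package main

import (
	"fmt"
	"sync"
)

// eventKey mirrors the (server, group, kind) labels of both counters.
type eventKey struct{ Server, Group, Kind string }

// fakeMetrics is a hypothetical stand-in for the controller's metrics server.
type fakeMetrics struct {
	mu      sync.Mutex
	events  map[eventKey]int
	ignored map[eventKey]int
}

func newFakeMetrics() *fakeMetrics {
	return &fakeMetrics{events: map[eventKey]int{}, ignored: map[eventKey]int{}}
}

// IncClusterEventsCount stands in for the existing argocd_cluster_events_total increment.
func (m *fakeMetrics) IncClusterEventsCount(k eventKey) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.events[k]++
}

// IncClusterEventsIgnoredCount is the proposed new method (signature assumed).
func (m *fakeMetrics) IncClusterEventsIgnoredCount(k eventKey) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.ignored[k]++
}

// resource is a hypothetical trimmed view of a watched object; watchedHash
// fingerprints only the fields that ignoreResourceUpdates does not exclude.
type resource struct{ watchedHash string }

// skipResourceUpdate reports true when an update touched only ignored fields.
func skipResourceUpdate(oldRes, newRes *resource) bool {
	return oldRes != nil && newRes != nil && oldRes.watchedHash == newRes.watchedHash
}

// handleEvent returns true if the event should go on to trigger reconciliation.
func handleEvent(m *fakeMetrics, k eventKey, oldRes, newRes *resource) bool {
	// Assumption: the existing counter sees every watch event.
	m.IncClusterEventsCount(k)
	if skipResourceUpdate(oldRes, newRes) {
		// Early-return path: today only a debug log line; the proposal adds this increment.
		m.IncClusterEventsIgnoredCount(k)
		return false
	}
	return true
}

func main() {
	m := newFakeMetrics()
	k := eventKey{"https://cluster-a.example", "apps", "Deployment"}
	handleEvent(m, k, &resource{"h1"}, &resource{"h1"}) // only ignored fields changed
	handleEvent(m, k, &resource{"h1"}, &resource{"h2"}) // a watched field changed
	fmt.Printf("events=%d ignored=%d\n", m.events[k], m.ignored[k])
	// events=2 ignored=1
}
```

Placing the increment in the branch itself, rather than next to a level-gated debug log call, means the counter works at any log level.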
I have a working implementation ready and can submit a PR.