@mmckeen mmckeen commented Oct 28, 2025

Description

With advanced metrics, high-cardinality labels can bloat the metrics export, leading to unbounded memory and resource usage.

This PR adds an optional TTL for advanced metrics, configured through the MetricsConfiguration CRD. By default, the TTL is infinite and no cleanup is tracked or performed.

When a TTL is defined, a cleanup runs on a period equal to the TTL and removes from the metrics export any metrics that have not been updated within the last TTL duration.

For counters, and for gauges that behave like counters, Prometheus treats a removed metric like any other missing metric (for example, after an application restart). As long as functions like rate or increase are used, calculations will remain accurate.
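The reasoning about rate and increase can be illustrated with a simplified model of counter-reset handling: any drop in value is treated as a restart from zero, so the new value counts entirely as growth. This is only a sketch of the idea. The `increase` function below is a hypothetical helper, and the real Prometheus functions also extrapolate to the edges of the query window.

```go
package main

import "fmt"

// increase returns the total growth of a counter across samples,
// treating any drop in value as a counter reset (the series restarted from zero).
// Simplified: Prometheus' real increase() also extrapolates to the window edges.
func increase(samples []float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	total := 0.0
	prev := samples[0]
	for _, cur := range samples[1:] {
		if cur < prev {
			// Reset: the counter restarted, so everything it now holds is new growth.
			total += cur
		} else {
			total += cur - prev
		}
		prev = cur
	}
	return total
}

func main() {
	// The series is removed by the TTL cleanup after reaching 120 and re-created at 3.
	fmt.Println(increase([]float64{100, 110, 120, 3, 8})) // prints 28
}
```

The drop from 120 to 3 is handled the same way as an application restart: the growth before removal (20) and after re-creation (8) are both counted.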

Related Issue

#1692

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Screenshots (if applicable) or Testing Completed

Deployed the change and modified the CRD to enable and disable the TTL and to change its value.

Metrics are re-initialized as expected.

Made sure that CRD validation rejects invalid TTL values.


Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.

@mmckeen mmckeen requested a review from a team as a code owner October 28, 2025 19:12
@mmckeen mmckeen force-pushed the removeStaleMetrics branch from 25a4fc2 to af3a22c Compare October 29, 2025 15:24