What did you do?
Deployed postgres-exporter and ran a large number of queries.
What did you expect to see?
Everything works and nothing crashes.
What did you see instead? Under which circumstances?
The number of time series created by postgres-exporter increased rapidly. Prometheus was OOM killed soon after.
Additional comment
I understand the common agreement that Prometheus metrics should have reasonable cardinality and avoid ID-like labels such as trace IDs or query IDs. This best practice has been discussed in this post and in various community issues.
Of course, users can always disable or drop these metrics, as I already have. But the information is still relevant, and the behavior can confuse anyone who hasn't investigated closely (it did on my team). The data should be organized into more reasonable metrics that either sum over the different queries or put them into histograms.
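To make the suggestion concrete, here is a rough sketch of the kind of aggregation I have in mind. It is not the exporter's code: the row shape `(query_id, calls, total_seconds)`, the bucket bounds, and the `aggregate` helper are all made up for illustration, and bucketing by each query's mean latency is only an approximation of a real per-call histogram.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical per-query rows: (query_id, calls, total_seconds).
# These field names are illustrative, not the exporter's actual schema.
rows = [
    ("q1", 10, 0.5),
    ("q2", 3, 4.5),
    ("q3", 1, 12.0),
]

# Hypothetical upper bounds (in seconds) for mean-latency buckets.
BUCKETS = [0.1, 1.0, 10.0, float("inf")]


def aggregate(rows, buckets=BUCKETS):
    """Collapse per-query rows into a fixed number of series.

    Returns (total_calls, total_seconds, cumulative_bucket_counts).
    The query ID is dropped, so the number of series no longer
    depends on how many distinct queries were run.
    """
    total_calls = sum(calls for _, calls, _ in rows)
    total_seconds = sum(seconds for _, _, seconds in rows)

    per_bucket = Counter()
    for _, calls, seconds in rows:
        mean = seconds / calls if calls else 0.0
        # Index of the first bucket whose upper bound is >= the mean latency.
        idx = bisect_left(buckets, mean)
        per_bucket[buckets[idx]] += calls

    # Make the counts cumulative, as Prometheus histogram buckets are.
    cumulative = {}
    running = 0
    for bound in buckets:
        running += per_bucket[bound]
        cumulative[bound] = running
    return total_calls, total_seconds, cumulative


total_calls, total_seconds, histogram = aggregate(rows)
```

With this shape the output is always `len(BUCKETS) + 2` series, no matter how many distinct queries the database has seen.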
Environment
System information:
insert output of uname -srm here
postgres_exporter version:
0.8.0
postgres_exporter flags:
PostgreSQL version:
insert PostgreSQL version here
Logs: