Add subscription collector and CLI flag for PVE subscription info #370
    labels=["node", "level"],
)

for node in nodes:
This is iterating through nodes. Therefore it should go into node metrics, not cluster metrics.
Totally fair. I moved the implementation to node.py and reset cluster.py to its previous state.
Testing this on a degraded cluster (one node missing). I get the following trace: The relevant frame is: This is a well known issue (in the context of this project), and also very much non-intuitive. This call is resulting in (at least) two HTTP(S) requests. One from So please remove the |
… used in other collectors and mitigate scrape error on degraded clusters
Removed the |
src/pve_exporter/cli.py
Outdated
clusterflags.add_argument('--collector.subscription', dest='collector_subscription',
                          action=BooleanOptionalAction, default=True,
                          help='Exposes PVE subscription info')
This block should be moved to nodeflags (further down).
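As a rough illustration of the requested move, the flag could be registered on the node argument group as sketched below. This is only a sketch: it uses the standard library `argparse.BooleanOptionalAction` (Python 3.9+), which may differ from the `BooleanOptionalAction` imported in the project's cli.py, and the group title and the `build_parser` helper are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with the subscription flag on the node group (sketch)."""
    parser = argparse.ArgumentParser(description="sketch of the exporter CLI")
    # Hypothetical group title; the real cli.py defines its own nodeflags group.
    nodeflags = parser.add_argument_group("node collectors")
    nodeflags.add_argument(
        "--collector.subscription",
        dest="collector_subscription",
        action=argparse.BooleanOptionalAction,  # also registers --no-collector.subscription
        default=True,
        help="Exposes PVE subscription info",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--no-collector.subscription"])
    print(args.collector_subscription)
```

With this action the collector stays enabled by default and can be switched off with `--no-collector.subscription`.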
info_metric = GaugeMetricFamily(
    "pve_subscription_info",
    "Proxmox VE subscription info (1 if present)",
    labels=["node", "level", "status"],
)
If you want to be able to alert on the subscription status, then it shouldn't be a label on an *_info metric. Take a look at the pve_ha_state and pve_lock_state metrics (#302 and #303). With that metric design, I can add an alert which triggers if pve_lock_state != 0 remains for more than, e.g. 5 minutes. And I can have more relaxed alerts for pve_lock_state{state="backup"} != 0 (because backups can take longer).
It looks like the subscription status is an enum with the following options: new, notfound, active, invalid, expired, suspended.
Thus, the metrics could maybe look more like this:
pve_subscription_status{id="node/proxmox",status="new"} 0.0
pve_subscription_status{id="node/proxmox",status="notfound"} 0.0
pve_subscription_status{id="node/proxmox",status="active"} 1.0
pve_subscription_status{id="node/proxmox",status="invalid"} 0.0
pve_subscription_status{id="node/proxmox",status="expired"} 0.0
Alerting could then be done on pve_subscription_status{status!="active"} != 0
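A minimal sketch of that state-metric layout, written in plain Python so it runs without the exporter: one sample per known status, with value 1.0 only for the status the node currently reports. The helper names `subscription_status_samples` and `render` are hypothetical; in the collector the same label/value pairs would presumably be passed to a `GaugeMetricFamily` instead of being rendered as text.

```python
# Status values as listed in the review comment above.
SUBSCRIPTION_STATUSES = ("new", "notfound", "active", "invalid", "expired", "suspended")


def subscription_status_samples(node, current_status):
    """Return one (labels, value) pair per known status for a node."""
    return [
        (
            {"id": f"node/{node}", "status": status},
            1.0 if status == current_status else 0.0,
        )
        for status in SUBSCRIPTION_STATUSES
    ]


def render(samples, name="pve_subscription_status"):
    """Format samples roughly in the Prometheus text exposition style."""
    lines = []
    for labels, value in samples:
        label_text = ",".join(f'{key}="{val}"' for key, val in labels.items())
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(subscription_status_samples("proxmox", "active")))
```

Note that a status outside the known list yields all zeros in this sketch; whether that is the desired behaviour is a design decision for the collector.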
src/pve_exporter/collector/node.py
Outdated
)

next_due_metric = GaugeMetricFamily(
    "pve_subscription_next_due_timestamp",
As per the Prometheus metric naming recommendations, the metric name should end in its unit (i.e. with a _seconds suffix):
pve_subscription_next_due_timestamp_seconds
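For illustration, a value for such a `_seconds` metric could be derived as sketched below. This assumes the next due date arrives as a `YYYY-MM-DD` string and that midnight UTC is an acceptable interpretation; neither is confirmed in this thread, and the helper name is hypothetical.

```python
from datetime import datetime, timezone


def next_due_timestamp_seconds(next_due_date):
    """Convert a YYYY-MM-DD date string to a Unix timestamp in seconds.

    Assumption: the date is interpreted as midnight UTC.
    """
    parsed = datetime.strptime(next_due_date, "%Y-%m-%d")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


if __name__ == "__main__":
    print(next_due_timestamp_seconds("1970-01-02"))
```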
…atus metric. added id to subscription labels
I have moved the flag to the I have added the I have added the I have added |
This PR adds two new metrics to the PVE exporter exposing Proxmox subscription information.
pve_subscription_info: gauge with labels for level (community, etc.), node, and status (value is 1 when a subscription is present).
pve_subscription_next_due_timestamp: gauge containing the Unix timestamp for the next subscription renewal, with labels node and level.
Example: