
Conversation

@nicololuescher commented Nov 27, 2025

This PR adds two new metrics to the PVE exporter exposing Proxmox subscription information.

pve_subscription_info: gauge with labels for level (community, etc.), node, and status (value is 1 when a subscription is present).
pve_subscription_next_due_timestamp: gauge containing the Unix timestamp for the next subscription renewal, with labels node and level.

Example:

# HELP pve_subscription_info Proxmox VE subscription info (1 if present)
# TYPE pve_subscription_info gauge
pve_subscription_info{level="c",node="node-001",status="active"} 1.0
# HELP pve_subscription_next_due_timestamp Subscription next due date as Unix timestamp
# TYPE pve_subscription_next_due_timestamp gauge
pve_subscription_next_due_timestamp{level="c",node="node-001"} 1.788756e+09
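
For context, a minimal sketch of the API call these metrics are derived from. The host, credentials, and the exact response field names ("status", "level", "nextduedate") are assumptions here, not part of this PR:

from proxmoxer import ProxmoxAPI

# Sketch only: host and credentials are placeholders.
pve = ProxmoxAPI("node-001.example.com", user="monitoring@pve",
                 token_name="exporter", token_value="REDACTED",
                 verify_ssl=True)

# The exporter reads the per-node subscription endpoint.
subscription = pve.nodes("node-001").subscription.get()
print(subscription.get("status"),
      subscription.get("level"),
      subscription.get("nextduedate"))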

    labels=["node", "level"],
)

for node in nodes:
Member

This is iterating through nodes. Therefore it should go into node metrics, not cluster metrics.

Author

Totally fair. I moved the implementation to node.py and reset cluster.py to its previous state.

@znerol (Member) commented Nov 28, 2025

Testing this on a degraded cluster (one node missing), I get the following trace:

Traceback (most recent call last):
  File "prometheus-pve-exporter/src/pve_exporter/http.py", line 101, in view
    return view_registry[endpoint](**params)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "prometheus-pve-exporter/src/pve_exporter/http.py", line 37, in on_pve
    output = collect_pve(
        self._config[module],
    ...<3 lines>...
        self._collectors
    )
  File "prometheus-pve-exporter/src/pve_exporter/collector/__init__.py", line 58, in collect_pve
    return generate_latest(registry)
  File "prometheus-pve-exporter/.venv/lib/python3.13/site-packages/prometheus_client/exposition.py", line 289, in generate_latest
    for metric in registry.collect():
                  ~~~~~~~~~~~~~~~~^^
  File "prometheus-pve-exporter/.venv/lib/python3.13/site-packages/prometheus_client/registry.py", line 97, in collect
    yield from collector.collect()
  File "prometheus-pve-exporter/src/pve_exporter/collector/node.py", line 155, in collect
    subscription = self._pve.nodes(node['name']).subscription.get()
  File "prometheus-pve-exporter/.venv/lib/python3.13/site-packages/proxmoxer/core.py", line 167, in get
    return self(args)._request("GET", params=params)
           ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "prometheus-pve-exporter/.venv/lib/python3.13/site-packages/proxmoxer/core.py", line 147, in _request
    raise ResourceException(
    ...<6 lines>...
    )
proxmoxer.core.ResourceException: 595 Errors during connection establishment, proxy handshake: No route to host - {'errors': b''}

The relevant frame is:

  File "prometheus-pve-exporter/src/pve_exporter/collector/node.py", line 155, in collect
    subscription = self._pve.nodes(node['name']).subscription.get()

This is a well-known issue (in the context of this project), and also very much non-intuitive. This call results in (at least) two HTTP(S) requests: one from pve_exporter to the target node, and at least one additional one from the target node to node['name']. If node['name'] is down, the whole scrape will fail. See #156 (and also #55 and #58) for more details.

So please remove the for loop and instead implement the same approach as NodeConfigCollector and NodeReplicationCollector:

        node = None
        for entry in self._pve.cluster.status.get():
            if entry['type'] == 'node' and entry['local']:
                node = entry['name']
                break

        subscription = self._pve.nodes(node).subscription.get()
        [...]
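
For reference, a minimal sketch of a collector following that pattern. The class name and the use of subscription.get() defaults are illustrative, not the actual implementation:

from prometheus_client.core import GaugeMetricFamily

class NodeSubscriptionCollector:
    """Illustrative sketch: query subscription info for the local node only."""

    def __init__(self, pve):
        self._pve = pve

    def collect(self):
        # Determine the node answering the API request; this avoids the
        # proxied request to other (possibly unreachable) cluster members.
        node = None
        for entry in self._pve.cluster.status.get():
            if entry['type'] == 'node' and entry['local']:
                node = entry['name']
                break

        subscription = self._pve.nodes(node).subscription.get()

        info_metric = GaugeMetricFamily(
            "pve_subscription_info",
            "Proxmox VE subscription info (1 if present)",
            labels=["node", "level", "status"],
        )
        info_metric.add_metric(
            [node, subscription.get("level", ""), subscription.get("status", "")],
            1,
        )
        yield info_metric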

… used in other collectors and mitigate scrape errors on degraded clusters
@nicololuescher (Author)

Removed the for loop and implemented the approach used by the other collectors.

Comment on lines 31 to 33
clusterflags.add_argument('--collector.subscription', dest='collector_subscription',
                          action=BooleanOptionalAction, default=True,
                          help='Exposes PVE subscription info')
Member

This block should be moved to nodeflags (further down).
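
For illustration, the same option registered under the node flag group might look like this (assuming a nodeflags argument group exists alongside clusterflags, as the comment above implies):

nodeflags.add_argument('--collector.subscription', dest='collector_subscription',
                       action=BooleanOptionalAction, default=True,
                       help='Exposes PVE subscription info')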

Comment on lines 140 to 144
info_metric = GaugeMetricFamily(
    "pve_subscription_info",
    "Proxmox VE subscription info (1 if present)",
    labels=["node", "level", "status"],
)
Member

If you want to be able to alert on the subscription status, then it shouldn't be a label on an *_info metric. Take a look at the pve_ha_state and pve_lock_state metrics (#302 and #303). With that metric design, I can add an alert which fires if pve_lock_state != 0 persists for more than, e.g., 5 minutes, and I can have more relaxed alerts for pve_lock_state{state="backup"} != 0 (because backups can take longer).

It looks like the subscription status is an enum with the following options: new, notfound, active, invalid, expired, suspended.

Thus, the metrics could maybe look more like this:

pve_subscription_status{id="node/proxmox",status="new"} 0.0
pve_subscription_status{id="node/proxmox",status="notfound"} 0.0
pve_subscription_status{id="node/proxmox",status="active"} 1.0
pve_subscription_status{id="node/proxmox",status="invalid"} 0.0
pve_subscription_status{id="node/proxmox",status="expired"} 0.0

Alerting could then be done on pve_subscription_status{status!="active"} != 0
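
A minimal sketch of how such an enum-style metric could be emitted. The helper name is hypothetical and the status set is taken from the list above; this is not the PR's final code:

from prometheus_client.core import GaugeMetricFamily

# Illustrative sketch: one series per known status, 1.0 for the current
# value and 0.0 for all others, mirroring pve_ha_state / pve_lock_state.
KNOWN_STATUSES = ("new", "notfound", "active", "invalid", "expired", "suspended")

def subscription_status_metric(node, subscription):
    metric = GaugeMetricFamily(
        "pve_subscription_status",
        "Proxmox VE subscription status",
        labels=["id", "status"],
    )
    current = subscription.get("status", "")
    for status in KNOWN_STATUSES:
        metric.add_metric([f"node/{node}", status],
                          1.0 if status == current else 0.0)
    return metric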

)

next_due_metric = GaugeMetricFamily(
"pve_subscription_next_due_timestamp",
Member

As per the Prometheus metric naming recommendations, this should end in the unit (i.e. with a _seconds suffix).

pve_subscription_next_due_timestamp_seconds
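
A possible sketch of the renamed metric, assuming the API reports the due date as a "nextduedate" string in YYYY-MM-DD format (that field name and format are assumptions):

from datetime import datetime, timezone

from prometheus_client.core import GaugeMetricFamily

def next_due_metric(node, subscription):
    # Hypothetical helper: convert the assumed "nextduedate" date string
    # into a Unix timestamp so the metric can carry the _seconds suffix.
    metric = GaugeMetricFamily(
        "pve_subscription_next_due_timestamp_seconds",
        "Subscription next due date as Unix timestamp in seconds",
        labels=["id"],
    )
    nextdue = subscription.get("nextduedate")
    if nextdue:
        due = datetime.strptime(nextdue, "%Y-%m-%d").replace(tzinfo=timezone.utc)
        metric.add_metric([f"node/{node}"], due.timestamp())
    return metric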

…atus metric. added id to subscription labels
@nicololuescher (Author)

I have moved the flag to the nodeflags argument group.

I have added the _seconds suffix to the timestamp metric to indicate the unit.

I have added the pve_subscription_status metric as per your suggestion, with the status label indicating the current state.

I have added id as a label to be consistent with the other metrics.
