Skip to content

Conversation

@ShravaniVangur
Copy link
Contributor

@ShravaniVangur ShravaniVangur commented Nov 4, 2025

This pr introduces two new metrics- the total number of cephfs pv and the total number of subvolumes in a cephfilesystem.
This pr also introduces an alert which gets triggered when stale subvolumes are present.
Ref: https://issues.redhat.com/browse/RHSTOR-7531

//Todo: Addition of test files.

This commit introduces two new metrics- the total number of cephfs
pv and the total number of subvolumes in a cephfilesystem.

Signed-off-by: ShravaniVangur <[email protected]>
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 4, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 4, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 4, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ShravaniVangur
Once this PR has been reviewed and has the lgtm label, please assign leelavg for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

If total number of pvs and subvolumes do not match,
the alert gets triggered.

Signed-off-by: ShravaniVangur <[email protected]>
@ShravaniVangur ShravaniVangur changed the title (WIP) metrics-exporter: add new metrics to check for stale subvolumes metrics-exporter: add new metrics and alert to check for stale subvolumes Nov 10, 2025
@ShravaniVangur ShravaniVangur marked this pull request as ready for review November 10, 2025 11:48
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 10, 2025
return nil, fmt.Errorf("no CephFilesystem CRs found")
}

fsName := cephFilesystems.Items[0].Name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the fsName is of the first filesystem, do we need to check if there is only one filesystem or a comment with reasoning for Items[0]

rookClient: rookclient.NewForConfigOrDie(opts.Kubeconfig),
cephClusterNamespace: opts.AllowedNamespaces[0],
cephAuthNamespace: opts.CephAuthNamespace,
monitorConfig: cephMonitorConfig{},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think cephMonitorConfig is never initialised, we need to run initCeph for it to be initialised

Comment on lines +487 to +488
ocs_cephfs_pv_count != ocs_cephfs_subvolume_count and on()
(hour() == 0 and day_of_week() == 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want this alert to only run on Monday 0:00?
this might also not run, because the condition of hour()==0 will only be true for 60 minutes
but our requirements for the alert is the condition should be true for 6h

Comment on lines +172 to +177
subvolGroupCounts, err := c.runCephfsSubvolumeCountFn(c.monitorConfig, c.rookClient, c.cephClusterNamespace)
if err != nil {
klog.Errorf("failed to get CephFS subvolume counts during Replace: %v", err)
} else {
klog.Infof("CephFS subvolumegroup counts during Replace: %v", subvolGroupCounts)
c.CephFSSubvolumeCountMap = subvolGroupCounts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collect function readsCephFSSubvolumeCountMap with RLock but we don't write with a lock acquired

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants