metrics-exporter: add new metrics and alert to check for stale subvolumes #3570
Conversation
This commit introduces two new metrics: the total number of CephFS PVs and the total number of subvolumes in a CephFilesystem. Signed-off-by: ShravaniVangur <[email protected]>
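For context, a minimal sketch of how such gauges could be defined with client_golang. The metric names are taken from the alert expression later in this PR; the variable names and help strings are illustrative, and the actual PR may use a custom collector instead:

```go
import "github.com/prometheus/client_golang/prometheus"

var (
	// Hypothetical gauge definitions; the names match the alert
	// expression in this PR, the help strings are illustrative.
	cephFSPVCount = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "ocs_cephfs_pv_count",
		Help: "Total number of CephFS PersistentVolumes",
	})
	cephFSSubvolumeCount = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "ocs_cephfs_subvolume_count",
		Help: "Total number of subvolumes in the CephFilesystem",
	})
)
```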
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: ShravaniVangur. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
If the total number of PVs and subvolumes do not match, the alert is triggered. Signed-off-by: ShravaniVangur <[email protected]>
```go
	return nil, fmt.Errorf("no CephFilesystem CRs found")
}

fsName := cephFilesystems.Items[0].Name
```
Since fsName is taken from the first filesystem, do we need to check that there is only one filesystem, or add a comment with the reasoning for Items[0]?
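A minimal sketch of the guard being suggested, reusing the names from the diff above (the error message for the multi-filesystem case is illustrative):

```go
// Hypothetical guard: fail explicitly instead of silently picking
// Items[0] when more than one CephFilesystem CR exists.
if len(cephFilesystems.Items) == 0 {
	return nil, fmt.Errorf("no CephFilesystem CRs found")
}
if len(cephFilesystems.Items) > 1 {
	return nil, fmt.Errorf("expected exactly one CephFilesystem CR, found %d", len(cephFilesystems.Items))
}
fsName := cephFilesystems.Items[0].Name
```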
```go
rookClient:           rookclient.NewForConfigOrDie(opts.Kubeconfig),
cephClusterNamespace: opts.AllowedNamespaces[0],
cephAuthNamespace:    opts.CephAuthNamespace,
monitorConfig:        cephMonitorConfig{},
```
I think cephMonitorConfig is never initialised; we need to run initCeph for it to be initialised.
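One possible shape for that fix, sketched under the assumption that initCeph takes the Rook client and namespace and returns a populated config (its real signature in this repo may differ):

```go
// Sketch: lazily initialise the monitor config before first use.
// initCeph's signature here is an assumption, not the repo's actual API.
if (c.monitorConfig == cephMonitorConfig{}) {
	var err error
	c.monitorConfig, err = initCeph(c.rookClient, c.cephClusterNamespace)
	if err != nil {
		return fmt.Errorf("failed to initialise ceph monitor config: %v", err)
	}
}
```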
```promql
ocs_cephfs_pv_count != ocs_cephfs_subvolume_count and on()
(hour() == 0 and day_of_week() == 1)
```
Do we want this alert to only run on Monday at 0:00?
It might also never fire, because the hour() == 0 condition is only true for 60 minutes,
but our requirement for the alert is that the condition should be true for 6h.
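If the requirement is simply "the counts disagree continuously for 6h", one option is to drop the time gating and express the duration with a for clause instead. A sketch of a PrometheusRule entry (the alert name, severity, and description are illustrative; the metric names come from the diff above):

```yaml
# Sketch: require the mismatch to persist for 6h instead of gating on
# hour()/day_of_week(). Alert name and labels are illustrative.
- alert: CephFSStaleSubvolumes
  expr: ocs_cephfs_pv_count != ocs_cephfs_subvolume_count
  for: 6h
  labels:
    severity: warning
  annotations:
    description: CephFS PV count and subvolume count have differed for 6h, which may indicate stale subvolumes.
```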
```go
subvolGroupCounts, err := c.runCephfsSubvolumeCountFn(c.monitorConfig, c.rookClient, c.cephClusterNamespace)
if err != nil {
	klog.Errorf("failed to get CephFS subvolume counts during Replace: %v", err)
} else {
	klog.Infof("CephFS subvolumegroup counts during Replace: %v", subvolGroupCounts)
	c.CephFSSubvolumeCountMap = subvolGroupCounts
}
```
The Collect function reads CephFSSubvolumeCountMap with an RLock, but we don't acquire the lock when writing to it.
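A sketch of the write guarded by the same mutex that Collect read-locks (the mutex field name c.mu is an assumption):

```go
// Sketch: take the write lock before replacing the map so Collect's
// RLock-protected reads stay safe. The field name c.mu is assumed.
c.mu.Lock()
c.CephFSSubvolumeCountMap = subvolGroupCounts
c.mu.Unlock()
```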
This PR introduces two new metrics: the total number of CephFS PVs and the total number of subvolumes in a CephFilesystem.
It also introduces an alert which gets triggered when stale subvolumes are present.
Ref: https://issues.redhat.com/browse/RHSTOR-7531
// TODO: addition of test files.