Add provisioner-aware VolumeSnapshotClass selection and RWO access mode for DataImportCron#3991
akalenyu left a comment:
Apologies for not being present in the technical discussion; I really appreciate the effort on this! And of course I understand that I am outnumbered here.
It just feels wrong to me to have all sorts of (essentially) backwards APIs in CDI to work around this.
// DataImportCronAccessModesByProvisionerKey defines required access modes for DataImportCron PVCs
// Some provisioners require specific access modes for DataImportCron-created PVCs
var DataImportCronAccessModesByProvisionerKey = map[string][]v1.PersistentVolumeAccessMode{
	"pd.csi.storage.gke.io":           {rwo},
	"pd.csi.storage.gke.io/hyperdisk": {rwo},
}

// DataImportCronSnapshotClassParametersByProvisionerKey defines required VolumeSnapshotClass parameters for DataImportCron.
// Some provisioners require specific parameters in the VolumeSnapshotClass for DataImportCron snapshots.
var DataImportCronSnapshotClassParametersByProvisionerKey = map[string]map[string]string{
	// https://docs.cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/backup-pd-volume-snapshots#restore-snapshot
	"pd.csi.storage.gke.io": {
		"snapshot-type": "images",
	},
	"pd.csi.storage.gke.io/hyperdisk": {
		"snapshot-type": "images",
	},
}
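The two maps above key DataImportCron requirements by provisioner. As a rough sketch of how a controller could consume the parameters map when auto-discovering a VolumeSnapshotClass, assuming a class qualifies if its driver matches and it carries every required parameter with the same value: `snapshotClass`, `hasParams`, and `pickSnapshotClass` are hypothetical names, and the local struct stands in for the external-snapshotter `VolumeSnapshotClass` type, so this is not CDI's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// snapshotClass is a hypothetical, trimmed-down stand-in for a VolumeSnapshotClass.
type snapshotClass struct {
	Name       string
	Driver     string
	Parameters map[string]string
}

// requiredParamsByProvisionerKey mirrors DataImportCronSnapshotClassParametersByProvisionerKey.
var requiredParamsByProvisionerKey = map[string]map[string]string{
	"pd.csi.storage.gke.io":           {"snapshot-type": "images"},
	"pd.csi.storage.gke.io/hyperdisk": {"snapshot-type": "images"},
}

// hasParams reports whether params contains every required key with the same value.
func hasParams(params, required map[string]string) bool {
	for k, v := range required {
		if got, ok := params[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// pickSnapshotClass (hypothetical) returns the first class, in name order, whose driver
// matches and whose parameters satisfy the requirements for key; "" if none qualifies.
// An unknown key has no requirements, so any class with the right driver qualifies.
func pickSnapshotClass(key, driver string, classes []snapshotClass) string {
	required := requiredParamsByProvisionerKey[key]
	sorted := append([]snapshotClass(nil), classes...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	for _, c := range sorted {
		if c.Driver == driver && hasParams(c.Parameters, required) {
			return c.Name
		}
	}
	return ""
}

func main() {
	classes := []snapshotClass{
		{Name: "gke-default", Driver: "pd.csi.storage.gke.io"},
		{Name: "gke-images", Driver: "pd.csi.storage.gke.io", Parameters: map[string]string{"snapshot-type": "images"}},
	}
	fmt.Println(pickSnapshotClass("pd.csi.storage.gke.io", "pd.csi.storage.gke.io", classes)) // gke-images
}
```

Sorting by name only makes the choice deterministic when several classes qualify; how the real controller breaks ties is not stated in this PR.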
To me, the fact that this degree of handling is required in CDI screams that this provider needs a separate storage class for cron purposes.
That provider would then represent itself internally as
"pd.csi.storage.gke.io/hyperdisk-crons": {{rwo, block}}, somewhat similar to
but I think this can remain vague, like "rwo", since that is really the only capability of their golden image storage class. And we would be using the "correct" VolumeSnapshotClass via storageProfile.Status.SnapshotClass
(the special storage class would hint which one to choose via a snapclass parameter or something).
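This alternative can be sketched with a capabilities table in which the dedicated golden-image storage class gets its own provisioner key that only advertises RWO, so the access mode falls out of the existing lookup with no cron-specific map. The `-crons` key comes from the comment above; the other entry, the types, and `preferredCapability` are illustrative assumptions, not CDI's real table or helpers.

```go
package main

import "fmt"

type accessMode string
type volumeMode string

const (
	rwo   accessMode = "ReadWriteOnce"
	rwx   accessMode = "ReadWriteMany"
	block volumeMode = "Block"
)

// capability pairs an access mode with a volume mode, like the {{rwo, block}} entries above.
type capability struct {
	AccessMode accessMode
	VolumeMode volumeMode
}

// capabilitiesByProvisionerKey is a hypothetical excerpt. The "-crons" key represents the
// dedicated golden-image storage class suggested in review; it is not an existing CDI entry.
var capabilitiesByProvisionerKey = map[string][]capability{
	"pd.csi.storage.gke.io/hyperdisk":       {{rwx, block}, {rwo, block}},
	"pd.csi.storage.gke.io/hyperdisk-crons": {{rwo, block}},
}

// preferredCapability returns the first advertised capability for a key, if any.
func preferredCapability(key string) (capability, bool) {
	caps := capabilitiesByProvisionerKey[key]
	if len(caps) == 0 {
		return capability{}, false
	}
	return caps[0], true
}

func main() {
	c, _ := preferredCapability("pd.csi.storage.gke.io/hyperdisk-crons")
	fmt.Println(c.AccessMode, c.VolumeMode) // ReadWriteOnce Block
}
```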
Thanks Alex, I completely agree with that approach. TBH it would eliminate the need for most of these changes.
Trying to avoid adding an extra SC by forcing two VSCs to work with a single SC is too problematic.
We need to see how to reflect that need to create that dedicated SC for golden images.
Regardless, working on your suggested changes ASAP :)
> we need to see how to reflect that need to create that dedicated SC for golden images.

Yeah, that's the tough bit. I am not against adding convenience APIs under HCO, for example, if that helps them
(an hco.spec field instead of going over each cron and adding a storage class name).

/cc @awels @arnongilboa @Acedus

If we absolutely MUST go this path, then I think we should start discussing an API extension for StorageProfiles.

+1. CDI should remain as oblivious as possible to the quirks of the storage providers it has to work with, and this is no exception.

/approve

[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: akalenyu

/retest

@arnongilboa please lgtm if you're happy 🙏
		return false, err
	}
	actualVSC := snapshot.Spec.VolumeSnapshotClassName
	if desiredVSC == nil || ptr.Equal(actualVSC, desiredVSC) {
Why ptr.Equal() and not reflect.DeepEqual()?
// getSnapshotClassForDataImportCron returns the VolumeSnapshotClass name to use for DataImportCron snapshots.
func (r *DataImportCronReconciler) getSnapshotClassForDataImportCron(pvc *corev1.PersistentVolumeClaim, storageProfile *cdiv1.StorageProfile) (*string, error) {
	if storageProfile.Annotations != nil {
Don't need to check != nil; indexing a nil map in Go is safe and returns the zero value.
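In Go, indexing a nil map yields the zero value and ok == false, so the guard adds nothing. A minimal sketch, assuming the annotation value is simply the VolumeSnapshotClass name; `snapshotClassFromAnnotations` is a hypothetical helper, not the function from the diff.

```go
package main

import "fmt"

// snapshotClassAnnotation is the annotation key named in this PR's commit message.
const snapshotClassAnnotation = "cdi.kubevirt.io/snapshotClassForDataImportCron"

// snapshotClassFromAnnotations (hypothetical) reads the class name without a nil check:
// a lookup on a nil map is safe and simply reports ok == false.
func snapshotClassFromAnnotations(annotations map[string]string) *string {
	if name, ok := annotations[snapshotClassAnnotation]; ok && name != "" {
		return &name
	}
	return nil
}

func main() {
	var nilMap map[string]string
	fmt.Println(snapshotClassFromAnnotations(nilMap) == nil) // true
	got := snapshotClassFromAnnotations(map[string]string{snapshotClassAnnotation: "gke-images"})
	fmt.Println(*got) // gke-images
}
```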
	Expect(dv.Spec.Storage.AccessModes).To(BeEmpty())
})

It("Should apply RWO access mode to PVC spec when using PVC instead of Storage", func() {
Missed that point. Are you sure we want this behavior? AFAIK we shouldn't allow a PVC without an accessMode, and we also shouldn't override it when it's set.
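The reviewer's point can be made concrete with a small sketch: default to RWO only when the cron uses the `storage` API and leaves access modes empty, and leave an explicit `pvc` spec untouched. `dvSpec`, `applyCronAccessMode`, and the `useRWO` flag (standing in for the `cdi.kubevirt.io/useReadWriteOnceForDataImportCron` annotation) are hypothetical simplifications, not CDI's types.

```go
package main

import "fmt"

const rwo = "ReadWriteOnce"

// dvSpec is a hypothetical, trimmed-down DataVolume spec: which access-mode slice
// matters depends on whether the Storage API or a raw PVC spec is used.
type dvSpec struct {
	UsesStorageAPI     bool
	StorageAccessModes []string
	PVCAccessModes     []string
}

// applyCronAccessMode sets RWO only for the Storage API and only when nothing was
// requested explicitly; a PVC spec is always returned unchanged.
func applyCronAccessMode(spec dvSpec, useRWO bool) dvSpec {
	if useRWO && spec.UsesStorageAPI && len(spec.StorageAccessModes) == 0 {
		spec.StorageAccessModes = []string{rwo}
	}
	return spec
}

func main() {
	s := applyCronAccessMode(dvSpec{UsesStorageAPI: true}, true)
	fmt.Println(s.StorageAccessModes) // [ReadWriteOnce]
	p := applyCronAccessMode(dvSpec{PVCAccessModes: []string{"ReadWriteMany"}}, true)
	fmt.Println(p.PVCAccessModes) // [ReadWriteMany]
}
```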

/retest
	if snapshot.Spec.VolumeSnapshotClassName != nil {
		actualVSC = *snapshot.Spec.VolumeSnapshotClassName
	}
	if desiredVSC == "" || actualVSC == desiredVSC {
It shouldn't return true when actualVSC != "" && desiredVSC == "", should it?
Yes, but why delete a working snapshot when no VSC is available?
When no VSC is available, it says you can't create a snapshot, so I'm not sure about keeping an existing one. @akalenyu?
Yeah, it probably won't be able to restore from the VolumeSnapshot, but we didn't handle this case before this PR either. I don't think it's much of a concern as long as everything eventually converges once it becomes available.
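The rule the thread converged on can be written as a small predicate. This sketch mirrors the diff above, where an empty desired class stands for "no VolumeSnapshotClass available"; `snapshotClassUpToDate` is a hypothetical name, and a false result is assumed to mean the controller recreates the snapshot.

```go
package main

import "fmt"

// snapshotClassUpToDate (hypothetical) reports whether an existing snapshot can be kept.
// An empty desired class means no VolumeSnapshotClass is available, in which case a
// working snapshot is kept rather than deleted, as agreed in the review thread.
func snapshotClassUpToDate(actual *string, desired string) bool {
	actualVSC := ""
	if actual != nil {
		actualVSC = *actual
	}
	return desired == "" || actualVSC == desired
}

func main() {
	images := "gke-images"
	fmt.Println(snapshotClassUpToDate(&images, ""))           // true: nothing to converge to yet
	fmt.Println(snapshotClassUpToDate(&images, "gke-images")) // true: already correct
	fmt.Println(snapshotClassUpToDate(nil, "gke-images"))     // false: recreate with the desired class
}
```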
… annotations

Some storage provisioners have specific requirements when creating boot source snapshots. For example, GKE Persistent Disk needs a VolumeSnapshotClass with the snapshot-type: images parameter, and snapshots must be taken from an RWO PVC. This change makes CDI automatically detect these provisioner-specific needs and configure DataImportCron accordingly.

The StorageProfile controller now reconciles two new annotations based on the underlying provisioner: cdi.kubevirt.io/useReadWriteOnceForDataImportCron signals that RWO should be used as the access mode, and cdi.kubevirt.io/snapshotClassForDataImportCron specifies which VolumeSnapshotClass to use for boot source snapshots. The DataImportCron controller watches these annotations and ensures the correct VolumeSnapshotClass and access mode are used.

Signed-off-by: Noam Assouline <[email protected]>
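A minimal sketch of the StorageProfile side described in this commit message. The annotation keys come from the message; the value "true" for the RWO annotation, clearing stale annotations when requirements go away, and the `needsRWOForCron` table are assumptions for illustration rather than CDI's actual implementation.

```go
package main

import "fmt"

// Annotation keys as named in the commit message.
const (
	annUseRWO        = "cdi.kubevirt.io/useReadWriteOnceForDataImportCron"
	annSnapshotClass = "cdi.kubevirt.io/snapshotClassForDataImportCron"
)

// needsRWOForCron is a hypothetical stand-in for the per-provisioner table in the PR.
var needsRWOForCron = map[string]bool{
	"pd.csi.storage.gke.io":           true,
	"pd.csi.storage.gke.io/hyperdisk": true,
}

// reconcileCronAnnotations (hypothetical) sets or clears both annotations. vscName is the
// VolumeSnapshotClass found to satisfy the provisioner's parameters, or "" if none was found.
// Using "true" as the RWO annotation value is an assumption.
func reconcileCronAnnotations(annotations map[string]string, provisionerKey, vscName string) map[string]string {
	if annotations == nil {
		annotations = map[string]string{}
	}
	if needsRWOForCron[provisionerKey] {
		annotations[annUseRWO] = "true"
	} else {
		delete(annotations, annUseRWO)
	}
	if vscName != "" {
		annotations[annSnapshotClass] = vscName
	} else {
		delete(annotations, annSnapshotClass)
	}
	return annotations
}

func main() {
	ann := reconcileCronAnnotations(nil, "pd.csi.storage.gke.io", "gke-images")
	fmt.Println(ann[annUseRWO], ann[annSnapshotClass]) // true gke-images
}
```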

/test pull-containerized-data-importer-e2e-upg

/test pull-containerized-data-importer-e2e-nfs
I believe we're waiting for @arnongilboa's lgtm.

/lgtm

/test pull-containerized-data-importer-e2e-ceph-wffc

/override pull-containerized-data-importer-e2e-ceph-wffc

@akalenyu: Overrode contexts on behalf of akalenyu: pull-containerized-data-importer-e2e-ceph-wffc

/cherrypick release-v1.64

@akalenyu: new pull request created: #4018
1. Thanks to kubevirt/containerized-data-importer#3991, sp-balanced-storage-rwo is no longer needed.
2. Adding onlineResize and WFFC capabilities.

Signed-off-by: Noam Assouline <[email protected]>
Add support for provisioner-specific requirements when creating snapshots and PVCs for DataImportCron. Some provisioners have specific needs:
What this PR does / why we need it:
Standard GCP snapshots (using pd.csi.storage.gke.io) are limited to 6 restores per hour per snapshot. Using a VolumeSnapshotClass with snapshot-type: images enables unlimited restores, but these snapshots cannot be created from ReadWriteMany (RWX) PVCs. More details can be found here.

This PR adds provisioner-aware DataImportCron configuration via StorageProfile annotations:
- StorageProfile Controller: Automatically detects provisioner requirements and sets:
  - cdi.kubevirt.io/useReadWriteOnceForDataImportCron: Signals RWO access mode for DataImportCron PVCs when not explicitly configured. Automatically populated for the pd.csi.storage.gke.io/* driver.
  - cdi.kubevirt.io/snapshotClassForDataImportCron: Specifies the VolumeSnapshotClass name (auto-discovers a matching VSC with the required parameters).
- DataImportCron Controller:
This ensures snapshot creation succeeds without restore rate limits for GKE, while still allowing the final volume to be restored with the desired access mode. The approach is extensible: new provisioner requirements can be added by updating the storage capabilities configuration.
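To put the restore limit quoted in this description in numbers, here is a back-of-the-envelope sketch. It assumes, hypothetically, that N VMs are provisioned from one standard snapshot and that the 6-restores-per-hour-per-snapshot limit is the only throttle; `hoursToRestore` is an illustrative helper, not part of CDI.

```go
package main

import "fmt"

// restoresPerHour is the per-snapshot limit for standard GCP snapshots quoted in the PR description.
const restoresPerHour = 6

// hoursToRestore (hypothetical) returns how many hourly windows are needed to restore
// n volumes from a single snapshot, i.e. ceil(n / restoresPerHour).
func hoursToRestore(n int) int {
	return (n + restoresPerHour - 1) / restoresPerHour
}

func main() {
	for _, n := range []int{6, 7, 100} {
		fmt.Printf("%d VMs -> %d hour(s)\n", n, hoursToRestore(n)) // 100 VMs -> 17 hour(s)
	}
}
```

Under that assumption, a golden image shared by 100 VMs would need 17 hourly windows on a standard snapshot, which is the bottleneck the snapshot-type: images class is meant to avoid.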
Which issue(s) this PR fixes:
Jira: https://issues.redhat.com/browse/CNV-73302
Release note: