Add provisioner-aware VolumeSnapshotClass selection and RWO access mode for DataImportCron #3991

Merged
kubevirt-bot merged 1 commit into kubevirt:main from noamasu:provisioner-aware-vsc-selection-for-dic
Jan 26, 2026

Conversation

@noamasu
Collaborator

@noamasu noamasu commented Dec 23, 2025

Add support for provisioner-specific requirements when creating snapshots and PVCs for DataImportCron. Some provisioners have specific needs:

  • GKE Persistent Disk requires the snapshot-type: images parameter in the VolumeSnapshotClass (VSC) used for DataImportCron snapshots
  • GKE Persistent Disk requires the RWO access mode for DataImportCron PVCs

What this PR does / why we need it:

Standard GCP snapshots (using pd.csi.storage.gke.io) are limited to 6 restores per hour per snapshot. Using a VolumeSnapshotClass with snapshot-type: images enables unlimited restores, but these snapshots cannot be created from ReadWriteMany (RWX) PVCs. More details can be found here

This PR adds provisioner-aware DataImportCron configuration via StorageProfile annotations:

  • StorageProfile Controller: Automatically detects provisioner requirements and sets:

    • cdi.kubevirt.io/useReadWriteOnceForDataImportCron: Signals RWO access mode for DataImportCron PVCs when not explicitly configured.
      Automatically populated for pd.csi.storage.gke.io/* driver.
    • cdi.kubevirt.io/snapshotClassForDataImportCron: Specifies the VolumeSnapshotClass name (auto-discovers matching VSC with required parameters)
  • DataImportCron Controller:

    • Applies RWO access mode to DataVolume PVCs when annotation is set and no access modes are configured (preserves existing configurations)
    • Uses the specified VolumeSnapshotClass from annotation when creating snapshots
    • edit: The Controller monitors these annotations for changes and triggers re-creation automatically to ensure the correct VSC is used.

This ensures snapshot creation succeeds without restore rate limits for GKE, while still allowing the final volume to be restored with the desired access mode. The approach is extensible - new provisioner requirements can be added by updating the storage capabilities configuration.
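
The access-mode rule described above (apply RWO only when the annotation is set and nothing was configured explicitly) can be sketched as follows. This is a minimal illustration, not the controller code from this PR: `defaultAccessModes` is a hypothetical helper, access modes are modelled as plain strings instead of the Kubernetes types, and treating the annotation value `"true"` as "enabled" is an assumption.

```go
package main

import "fmt"

// Annotation key taken from the PR description.
const annUseRWO = "cdi.kubevirt.io/useReadWriteOnceForDataImportCron"

// defaultAccessModes is a hypothetical helper. It returns the access modes a
// DataImportCron PVC would end up with:
//   - explicitly configured modes are always preserved;
//   - otherwise, if the StorageProfile annotation is "true" (assumed value),
//     ReadWriteOnce is applied;
//   - otherwise the (empty) configuration is returned unchanged.
func defaultAccessModes(profileAnnotations map[string]string, configured []string) []string {
	if len(configured) > 0 {
		return configured
	}
	// Reading from a nil map is safe in Go and yields the zero value.
	if profileAnnotations[annUseRWO] == "true" {
		return []string{"ReadWriteOnce"}
	}
	return configured
}

func main() {
	ann := map[string]string{annUseRWO: "true"}
	fmt.Println(defaultAccessModes(ann, nil))                       // annotation applies
	fmt.Println(defaultAccessModes(ann, []string{"ReadWriteMany"})) // explicit config wins
	fmt.Println(defaultAccessModes(nil, nil))                       // nothing to apply
}
```

Keeping the explicit configuration as the first check is what makes the behaviour non-breaking for existing DataImportCrons.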

Which issue(s) this PR fixes:

Jira: https://issues.redhat.com/browse/CNV-73302

Release note:

Add provisioner-aware VolumeSnapshotClass selection and RWO access mode for DataImportCron

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Dec 23, 2025
@kubevirt-bot kubevirt-bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L labels Dec 23, 2025
@noamasu noamasu force-pushed the provisioner-aware-vsc-selection-for-dic branch from 00f6e20 to 19bd66f Compare December 23, 2025 13:30
@noamasu
Collaborator Author

noamasu commented Dec 23, 2025

/cc @arnongilboa @akalenyu

@noamasu noamasu changed the title Add provisioner-aware VolumeSnapshotClass selection for DataImportCron WIP: Add provisioner-aware VolumeSnapshotClass selection for DataImportCron Dec 23, 2025
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 23, 2025
Collaborator

@akalenyu akalenyu left a comment

Apologies for not being present in the technical discussion, really appreciate the effort on this! and ofc I understand that I am outnumbered here.

It just feels wrong to me to have all sorts of backwards (essentially) APIs in CDI to work around this

Comment on lines +210 to +227
// DataImportCronAccessModesByProvisionerKey defines required access modes for DataImportCron PVCs
// Some provisioners require specific access modes for DataImportCron-created PVCs
var DataImportCronAccessModesByProvisionerKey = map[string][]v1.PersistentVolumeAccessMode{
	"pd.csi.storage.gke.io":           {rwo},
	"pd.csi.storage.gke.io/hyperdisk": {rwo},
}

// DataImportCronSnapshotClassParametersByProvisionerKey defines required VolumeSnapshotClass parameters for DataImportCron.
// Some provisioners require specific parameters in the VolumeSnapshotClass for DataImportCron snapshots.
var DataImportCronSnapshotClassParametersByProvisionerKey = map[string]map[string]string{
	// https://docs.cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/backup-pd-volume-snapshots#restore-snapshot
	"pd.csi.storage.gke.io": {
		"snapshot-type": "images",
	},
	"pd.csi.storage.gke.io/hyperdisk": {
		"snapshot-type": "images",
	},
}
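
For readers following the thread, the "auto-discover a matching VSC with required parameters" step from the PR description might look roughly like the sketch below. It is an illustration under assumptions: `snapshotClass` is a simplified stand-in for the real VolumeSnapshotClass type, `findSnapshotClass` and `hasAll` are hypothetical helpers, and the class names in `main` are made up.

```go
package main

import "fmt"

// snapshotClass is a simplified stand-in for a VolumeSnapshotClass:
// only the fields needed for matching are modelled.
type snapshotClass struct {
	Name       string
	Driver     string
	Parameters map[string]string
}

// hasAll reports whether every key/value pair in want is present in have.
// An empty want matches anything.
func hasAll(have, want map[string]string) bool {
	for k, v := range want {
		if got, ok := have[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// findSnapshotClass (hypothetical) returns the name of the first class that
// belongs to the given driver and carries all required parameters.
func findSnapshotClass(driver string, required map[string]string, classes []snapshotClass) (string, bool) {
	for _, c := range classes {
		if c.Driver == driver && hasAll(c.Parameters, required) {
			return c.Name, true
		}
	}
	return "", false
}

func main() {
	// Class names below are invented for the example.
	classes := []snapshotClass{
		{Name: "pd-standard", Driver: "pd.csi.storage.gke.io"},
		{Name: "pd-images", Driver: "pd.csi.storage.gke.io",
			Parameters: map[string]string{"snapshot-type": "images"}},
	}
	required := map[string]string{"snapshot-type": "images"}
	name, ok := findSnapshotClass("pd.csi.storage.gke.io", required, classes)
	fmt.Println(name, ok)
}
```

The sketch returns the first match; how the real controller breaks ties between several matching classes is not shown in this conversation.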
Collaborator

@akalenyu akalenyu Dec 23, 2025

To me, the fact that this degree of handling is required in CDI screams
that for that provider, there is a need for a separate storage class for cron purposes.

That provider would then represent itself internally as

"pd.csi.storage.gke.io/hyperdisk-crons": {{rwo, block}},

somewhat similar to

if sc.Parameters["migratable"] == "true" {
but I think this can remain vague like "rwo" since that is really the only capability of their golden image storage class

And we would be using the "correct" volumesnapshotclass via storageProfile.Status.SnapshotClass
(the special storage class would hint us which one to choose via snapclass parameter or something)

Collaborator Author

Thanks Alex, I completely agree with that approach. TBH, it would eliminate the need for most of these changes.
Trying to avoid adding an extra SC by forcing two VSCs to work with a single SC is too problematic.

we need to see how to reflect that need to create that dedicated SC for golden images.

regardless, working on your suggested changes asap :)

Collaborator

we need to see how to reflect that need to create that dedicated SC for golden images.

yeah that's the tough bit. I am not against adding convenience APIs under HCO for example if that helps them
(hco.spec field instead of going over each cron and adding a storage class name)

@akalenyu
Collaborator

/cc @awels @arnongilboa @Acedus
I am totally ready to get challenged on this 🙏

@kubevirt-bot kubevirt-bot requested review from Acedus and awels December 23, 2025 13:56
@akalenyu
Collaborator

If we absolutely MUST go this path then I think we should start discussing an API extension for StorageProfiles;
Either we introduce a new overload for dataImportCronSourceFormat or a new field entirely, something along the lines of
dataImportCronSourceSpec

@Acedus
Contributor

Acedus commented Dec 23, 2025

If we absolutely MUST go this path then I think we should start discussing an API extension for StorageProfiles; Either we introduce a new overload for dataImportCronSourceFormat or a new field entirely, something along the lines of dataImportCronSourceSpec

+1, CDI should remain as oblivious as possible to quirks of the storage providers it has to work with, this is no exception.

@noamasu noamasu force-pushed the provisioner-aware-vsc-selection-for-dic branch from 19bd66f to d406174 Compare January 14, 2026 12:29
@noamasu noamasu force-pushed the provisioner-aware-vsc-selection-for-dic branch 2 times, most recently from 5c9a8c0 to 432b4e8 Compare January 14, 2026 14:34
@kubevirt-bot kubevirt-bot added size/L and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XL labels Jan 14, 2026
@noamasu noamasu force-pushed the provisioner-aware-vsc-selection-for-dic branch from 432b4e8 to dff9273 Compare January 14, 2026 14:36
@coveralls

coveralls commented Jan 14, 2026

Coverage Status

coverage: 49.485% (+0.03%) from 49.454%
when pulling b75c27d on noamasu:provisioner-aware-vsc-selection-for-dic
into f064590 on kubevirt:main.

@noamasu noamasu changed the title WIP: Add provisioner-aware VolumeSnapshotClass selection for DataImportCron Add provisioner-aware VolumeSnapshotClass selection and RWO access mode for DataImportCron Jan 14, 2026
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 14, 2026
@akalenyu
Collaborator

/approve

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akalenyu

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 21, 2026
@akalenyu
Collaborator

/retest
Unrelated: the NFS lane is really a pain, with OOMs in CI.

@akalenyu
Collaborator

@arnongilboa please lgtm if you're happy 🙏

return false, err
}
actualVSC := snapshot.Spec.VolumeSnapshotClassName
if desiredVSC == nil || ptr.Equal(actualVSC, desiredVSC) {
Collaborator

why ptr.Equal() and not reflect.DeepEqual()?
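
As background for this question, the sketch below compares the two approaches for `*string` values. `equalPtr` is a local helper written to mirror the documented behaviour of `ptr.Equal` from `k8s.io/utils/ptr` (true when both pointers are nil or both point to equal values); it is not the library function itself, and the class name used is made up.

```go
package main

import (
	"fmt"
	"reflect"
)

// equalPtr is a local stand-in that mirrors the documented semantics of
// ptr.Equal: both nil, or both non-nil with equal pointees.
func equalPtr[T comparable](a, b *T) bool {
	if (a == nil) != (b == nil) {
		return false
	}
	if a == nil {
		return true
	}
	return *a == *b
}

func main() {
	x, y := "images-vsc", "images-vsc" // hypothetical class name
	var none *string

	// For same-typed pointers, both approaches agree.
	fmt.Println(equalPtr(&x, &y), reflect.DeepEqual(&x, &y))
	fmt.Println(equalPtr(none, &y), reflect.DeepEqual(none, &y))
	fmt.Println(equalPtr(none, none), reflect.DeepEqual(none, none))

	// The difference is type safety: DeepEqual takes interface values, so a
	// *string vs string mix-up compiles and quietly reports false, whereas
	// equalPtr(&x, x) would be rejected by the compiler.
	fmt.Println(reflect.DeepEqual(&x, x))
}
```

For two `*string` arguments the results are the same, so the choice is mostly about compile-time type checking and avoiding reflection.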


// getSnapshotClassForDataImportCron returns the VolumeSnapshotClass name to use for DataImportCron snapshots.
func (r *DataImportCronReconciler) getSnapshotClassForDataImportCron(pvc *corev1.PersistentVolumeClaim, storageProfile *cdiv1.StorageProfile) (*string, error) {
if storageProfile.Annotations != nil {
Collaborator

don't need to check != nil
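
The Go behaviour behind this remark: indexing a nil map is safe and returns the zero value, so a nil guard before a read is redundant (writes to a nil map still panic). A small sketch, with `lookup` as a hypothetical helper:

```go
package main

import "fmt"

// lookup reads a key from an annotations map without any nil guard.
// Indexing a nil map yields the zero value and ok == false.
func lookup(annotations map[string]string, key string) (string, bool) {
	v, ok := annotations[key]
	return v, ok
}

func main() {
	var annotations map[string]string // nil, as on an object with no annotations
	v, ok := lookup(annotations, "cdi.kubevirt.io/snapshotClassForDataImportCron")
	fmt.Printf("%q %v %d\n", v, ok, len(annotations))
}
```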

Expect(dv.Spec.Storage.AccessModes).To(BeEmpty())
})

It("Should apply RWO access mode to PVC spec when using PVC instead of Storage", func() {
Collaborator

Missed that point. Are you sure we want this behavior? AFAIK we shouldn't allow a PVC without an access mode, and we also shouldn't override it when it's set.

@noamasu noamasu force-pushed the provisioner-aware-vsc-selection-for-dic branch from 84c2375 to 56086e4 Compare January 22, 2026 13:02
@noamasu
Collaborator Author

noamasu commented Jan 22, 2026

/retest

@noamasu noamasu force-pushed the provisioner-aware-vsc-selection-for-dic branch from 56086e4 to d923fb0 Compare January 22, 2026 13:41
if snapshot.Spec.VolumeSnapshotClassName != nil {
actualVSC = *snapshot.Spec.VolumeSnapshotClassName
}
if desiredVSC == "" || actualVSC == desiredVSC {
Collaborator

shouldn't return true when actualVSC != "" && desiredVSC == "" ?

Collaborator Author

yes, but why delete a working snapshot when no VSC is available?

Collaborator

When a VSC is not available it says you can't create a snapshot, so I'm not sure about keeping an existing one. @akalenyu ?

Collaborator

yeah it probably won't be able to restore from the volumesnap, but we don't handle this case today prior to this PR. I don't think it's much of a concern if everything converges eventually when it becomes available
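
The condition debated in this thread can be isolated as a small predicate. The sketch below is based on the snippet quoted above; `snapshotClassMatches` is a hypothetical name, and the surrounding controller logic (what happens on a mismatch) is not reproduced here.

```go
package main

import "fmt"

// snapshotClassMatches (hypothetical) reports whether an existing snapshot
// can be kept. An empty desired class means "no VSC is currently selected",
// in which case the existing snapshot is left alone, as discussed above.
func snapshotClassMatches(actual *string, desired string) bool {
	actualVSC := ""
	if actual != nil {
		actualVSC = *actual
	}
	return desired == "" || actualVSC == desired
}

func main() {
	images := "pd-images" // made-up class name
	fmt.Println(snapshotClassMatches(&images, "pd-images")) // same class: keep
	fmt.Println(snapshotClassMatches(&images, "pd-other"))  // different class: mismatch
	fmt.Println(snapshotClassMatches(&images, ""))          // no desired class: keep
	fmt.Println(snapshotClassMatches(nil, "pd-images"))     // snapshot has no class: mismatch
}
```

Treating an empty desired class as a match is the design choice defended in the replies: a working snapshot is not removed just because no VSC is resolvable at the moment, and things converge once one becomes available.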

… annotations

Some storage provisioners have specific requirements when creating boot source snapshots. For example, GKE Persistent Disk needs a VolumeSnapshotClass with the snapshot-type: images parameter, and snapshots must be taken from an RWO PVC.
This change makes CDI automatically detect these provisioner-specific needs and configure DataImportCron accordingly.
The StorageProfile controller now reconciles two new annotations based on the underlying provisioner: cdi.kubevirt.io/useReadWriteOnceForDataImportCron signals that RWO should be used for access mode, and cdi.kubevirt.io/snapshotClassForDataImportCron specifies which VolumeSnapshotClass to use for boot source snapshots.
The DataImportCron controller watches these annotations and ensures the correct VolumeSnapshotClass and access mode are used.

Signed-off-by: Noam Assouline <[email protected]>
@noamasu noamasu force-pushed the provisioner-aware-vsc-selection-for-dic branch from d923fb0 to b75c27d Compare January 22, 2026 15:32
@noamasu
Collaborator Author

noamasu commented Jan 25, 2026

/test pull-containerized-data-importer-e2e-upg

@akalenyu
Collaborator

/test pull-containerized-data-importer-e2e-nfs
also unrelated

I believe we're waiting for @arnongilboa lgtm

@arnongilboa
Collaborator

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 26, 2026
@noamasu
Collaborator Author

noamasu commented Jan 26, 2026

/test pull-containerized-data-importer-e2e-ceph-wffc
[FAIL] Clone Populator tests Clone from Snapshot Fallback to host assisted [It] should finish the clone after creating the source snapshot
this looks like a cluster-level infrastructure problem.
[It] [test_id:10033] with snapshot sources
looks like this one is historically flaky. Also, the setDataImportCronResourceLabels and CopyAllowedLabels calls remain unchanged, so I don't see how it is related to my change.

@akalenyu
Collaborator

/override pull-containerized-data-importer-e2e-ceph-wffc

@kubevirt-bot
Contributor

@akalenyu: Overrode contexts on behalf of akalenyu: pull-containerized-data-importer-e2e-ceph-wffc

@kubevirt-bot kubevirt-bot merged commit 6d28b9a into kubevirt:main Jan 26, 2026
21 checks passed
@akalenyu
Collaborator

/cherrypick release-v1.64

@kubevirt-bot
Contributor

@akalenyu: new pull request created: #4018

noamasu added a commit to noamasu/ocp-virt-validation-checkup that referenced this pull request Mar 17, 2026
1. thanks to kubevirt/containerized-data-importer#3991 sp-balanced-storage-rwo is no longer needed.
2. adding onlineResize and WFFC capabilities

Signed-off-by: Noam Assouline <[email protected]>
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/ocp-virt-validation-checkup that referenced this pull request Mar 17, 2026
1. thanks to kubevirt/containerized-data-importer#3991 sp-balanced-storage-rwo is no longer needed.
2. adding onlineResize and WFFC capabilities

Signed-off-by: Noam Assouline <[email protected]>
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L

6 participants