
Conversation

@zgzzbws commented Nov 5, 2025

What changes were proposed in this pull request?

This PR adds support for mounting volumes using CSI drivers.

Why are the changes needed?

As noted in this issue, the Kubernetes cluster operates in a multi-tenant environment and cluster-wide changes cannot be made when deploying applications. In addition, the application requires a static shared file system, but solutions like hostPath are unavailable due to the lack of control over the underlying VMs, and PersistentVolumeClaim (PVC) is unsuitable because provisioning a PersistentVolume (PV) requires cluster-wide changes. Furthermore, security policies preclude the use of NFS. Therefore, a CSI-based solution is necessary.

Additionally, with CSI driver support, Spark running on Kubernetes can dynamically leverage, on demand, the diverse storage services provided by the underlying infrastructure (such as high-performance SSDs or low-cost HDDs). This enhances platform independence and simplifies operations and maintenance.

Does this PR introduce any user-facing change?

Users can now mount volumes via a CSI driver by adding configs like:

spark-submit \
--conf spark.kubernetes.executor.volumes.csiVolumeClaim.[VolumeName].mount.path=/mnt/disk  \
--conf spark.kubernetes.executor.volumes.csiVolumeClaim.[VolumeName].csiDriverName=file.csi.azure.com \
--conf spark.kubernetes.executor.volumes.csiVolumeClaim.[VolumeName].options.shareName=EXISTING_SHARE_NAME \
--conf spark.kubernetes.executor.volumes.csiVolumeClaim.[VolumeName].options.secretName=azure-secret \
...
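
For context, here is a sketch of the CSI inline volume I would expect these configs to produce on the executor pod spec. The field mapping (in particular, that options.* entries become volumeAttributes) is my assumption for illustration, not something taken verbatim from the PR diff:

# Hypothetical executor pod snippet; the volume name and the
# options-to-volumeAttributes mapping are assumptions, not confirmed by this PR.
spec:
  containers:
    - name: spark-kubernetes-executor
      volumeMounts:
        - name: myVolume            # the [VolumeName] placeholder above
          mountPath: /mnt/disk
  volumes:
    - name: myVolume
      csi:
        driver: file.csi.azure.com
        volumeAttributes:
          shareName: EXISTING_SHARE_NAME
          secretName: azure-secret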

How was this patch tested?

Added unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

@dongjoon-hyun (Member) left a comment

Thank you for making a PR, @zgzzbws.

May I ask why you don't use the existing PVC features by defining a StorageClass instead of this PR? I'm wondering if the existing PVC feature is insufficient for your use cases.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-storage
provisioner: your-csi-name
parameters:
  type: ssd
  iops: "1000"
reclaimPolicy: Delete
volumeBindingMode: Immediate
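
For example, with a StorageClass like the one above, the existing on-demand PVC feature can already provision and mount a volume per executor with something roughly like this (the volume name data and the 100Gi size are placeholders picked for illustration):

spark-submit \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName=OnDemand \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.storageClass=csi-storage \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.sizeLimit=100Gi \
  --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path=/mnt/disk \
  ...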

@zgzzbws (Author) commented Nov 6, 2025

Thank you for reviewing the PR. @dongjoon-hyun

The issue describes a case that requires using CSI to mount volumes.

Additionally, the existing PVC feature relies on pre-provisioned PersistentVolumeClaims. Pre-creating PVCs for each job introduces operational complexities (cleanup, naming collisions, etc.). This PR enables Spark to handle provisioning dynamically during pod startup.

@dongjoon-hyun (Member) commented Nov 6, 2025

According to your point, do you mean the following? If it's correct, please put this into your PR description. It will become a permanent part of the commit log, which helps everyone downstream understand the main contribution of your commit.

In our case, the Kubernetes cluster is multi-tenant and we cannot make cluster-wide changes when deploying our application to the Kubernetes cluster. Our application requires a static shared file system, so we cannot use hostPath (we don't have control of the hosting VMs) or persistentVolumeClaim (deploying a PV requires cluster-wide changes). Our security department does not allow NFS.

@zgzzbws (Author) commented Nov 7, 2025

Got it, I’ve updated the PR description as suggested. @dongjoon-hyun


@dongjoon-hyun (Member) commented

Thank you. BTW, since this item requires more extensive tests for all public cloud services and internal custom storage, I added this item as a subtask of Apache Spark 4.2.0 (SPARK-54137) to give the community enough time to validate.

@dongjoon-hyun (Member) commented

Personally, I'm a little busy preparing Apache Spark 4.1.0 as the release manager. I'll revisit this as soon as I can. Maybe other committers can help you first.

@zgzzbws (Author) commented Nov 7, 2025

Could you review this PR when you have some time, @viirya?

