[SPARK-47010][K8S] Support csi driver for volume type #52896
base: master
dongjoon-hyun
left a comment
Thank you for making a PR, @zgzzbws .
May I ask why you don't use the existing PVC feature by defining a storageClass instead of this PR? I'm wondering whether the existing PVC feature is insufficient for your use cases.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-storage
provisioner: your-csi-name
parameters:
  type: ssd
  iops: "1000"
reclaimPolicy: Delete
volumeBindingMode: Immediate
```
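For context, the existing PVC feature can already ask Spark to create a claim per pod at startup: the Spark on Kubernetes documentation shows setting `claimName` to `OnDemand` together with the `storageClass` and `sizeLimit` options for executor volumes. The sketch below only assembles those documented `spark.kubernetes.*.volumes.persistentVolumeClaim.*` keys for the `csi-storage` class above; it is not code from this PR, and the volume name `data`, size `100Gi`, and mount path `/data` are illustrative assumptions.

```python
from typing import Dict


def on_demand_pvc_conf(role: str, volume: str, storage_class: str,
                       size_limit: str, mount_path: str) -> Dict[str, str]:
    """Build Spark conf entries for a PVC that Spark creates at pod startup.

    With claimName=OnDemand, Spark creates the claim itself from the given
    StorageClass and size, so no PVC has to be pre-created per job.
    """
    if role not in ("driver", "executor"):
        raise ValueError(f"role must be 'driver' or 'executor', got {role!r}")
    prefix = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim.{volume}"
    return {
        f"{prefix}.options.claimName": "OnDemand",
        f"{prefix}.options.storageClass": storage_class,
        f"{prefix}.options.sizeLimit": size_limit,
        f"{prefix}.mount.path": mount_path,
        f"{prefix}.mount.readOnly": "false",
    }


# Illustrative values: "data", "100Gi", and "/data" are assumptions.
conf = on_demand_pvc_conf("executor", "data", "csi-storage", "100Gi", "/data")
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

This still relies on a StorageClass whose provisioner can create PersistentVolumes dynamically, which is the cluster-level dependency the author's reply addresses.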
Thank you for reviewing the PR, @dongjoon-hyun. The issue describes a case that requires mounting volumes through CSI. Additionally, the existing PVC feature relies on pre-provisioned PersistentVolumeClaims, and pre-creating PVCs for each job introduces operational complexity (cleanup, naming collisions, etc.). This PR lets Spark handle provisioning dynamically during pod startup.
According to your point, do you mean this? If so, please put this into your PR description. It will permanently become part of the commit log, which helps everyone downstream understand the main contribution of your commit.
Got it, I’ve updated the PR description as suggested. @dongjoon-hyun
Thank you. BTW, since this item requires more extensive testing across all public cloud services and internal custom storage, I added it as a subtask of Apache Spark 4.2.0 (SPARK-54137) to give the community enough time to validate it.
Personally, I'm a little busy preparing Apache Spark 4.1.0 as the release manager. I'll revisit this as soon as I can. Maybe other committers can help you in the meantime.
Could you review this PR when you have some time, @viirya?
What changes were proposed in this pull request?
This PR adds support for mounting volumes using CSI drivers.
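At the Kubernetes level, a CSI volume can be declared inline in the pod spec under `volumes` with a `csi` source, so no PersistentVolume or PVC object is involved. The sketch below builds such a volume and its matching `volumeMounts` entry as plain dicts to show the shape of the pod spec. The field names (`driver`, `readOnly`, `fsType`, `volumeAttributes`) come from the Kubernetes `CSIVolumeSource` API; the helper `csi_volume`, the driver name `csi.example.com`, the volume name, mount path, container name, and attributes are hypothetical placeholders, and this is not the PR's actual implementation.

```python
from typing import Dict, Optional, Tuple


def csi_volume(name: str, driver: str, mount_path: str,
               read_only: bool = False, fs_type: Optional[str] = None,
               attributes: Optional[Dict[str, str]] = None) -> Tuple[dict, dict]:
    """Return (volume, volumeMount) dicts for an inline CSI volume."""
    csi = {"driver": driver, "readOnly": read_only}
    if fs_type is not None:
        csi["fsType"] = fs_type
    if attributes:
        # volumeAttributes is a string map passed through to the CSI driver.
        csi["volumeAttributes"] = {k: str(v) for k, v in attributes.items()}
    volume = {"name": name, "csi": csi}
    mount = {"name": name, "mountPath": mount_path, "readOnly": read_only}
    return volume, mount


# Hypothetical driver and attributes; use what your cluster actually provides.
volume, mount = csi_volume("shared-data", "csi.example.com", "/mnt/shared",
                           attributes={"type": "ssd"})
pod_spec = {
    "volumes": [volume],
    "containers": [{"name": "executor", "volumeMounts": [mount]}],
}
print(pod_spec)
```

Inline CSI volumes of this kind only work with drivers whose `CSIDriver` object declares support for the `Ephemeral` volume lifecycle mode, so availability depends on the drivers installed in the cluster.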
Why are the changes needed?
As noted in this issue, the Kubernetes cluster operates in a multi-tenant environment where cluster-wide changes cannot be made when deploying applications. In addition, the application requires a static shared file system, but hostPath is unavailable because there is no control over the underlying VMs, and a PersistentVolumeClaim (PVC) is unsuitable because provisioning a PersistentVolume (PV) requires cluster-wide changes. Security policies also preclude the use of NFS. Therefore, a CSI-based solution is necessary.
Additionally, with CSI driver support, Spark on Kubernetes can use, dynamically and on demand, the diverse storage services (such as high-performance SSDs or low-cost HDDs) provided by the underlying infrastructure. This improves platform independence and simplifies operations and maintenance.
Does this PR introduce any user-facing change?
Users can now use a CSI driver by adding configs like:
How was this patch tested?
Added unit tests.
Was this patch authored or co-authored using generative AI tooling?
No