What happened:
We're running into issues with pods being stuck in the ContainerCreating state with the following FailedMount event:
corporate-ohayg 11m Warning FailedMount pod/web-6c5dd4d7b7-g2k2w MountVolume.MountDevice failed for volume "pv-corporate-ohayg-nfs-web" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name nfs.csi.k8s.io not found in the list of registered CSI drivers
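(For reference, the event above can be pulled with something like the following; filtering on reason=FailedMount is our assumption that it's the only relevant event type:)
kubectl get events -A --field-selector reason=FailedMount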
We checked the logs of the csi-nfs-node-xxxx pod running on the same node as the pod above is scheduled on:
thenuja.viknarajah@LA-1215 ~ (qa-london:kube-system) ❯ kubectl logs csi-nfs-node-pkhbh --all-containers
I0122 15:22:11.223866 1 main.go:137] "Calling CSI driver to discover driver name"
I0122 15:22:11.226265 1 main.go:145] "CSI driver name" driver="nfs.csi.k8s.io"
I0122 15:22:11.226309 1 main.go:174] "ServeMux listening" address="localhost:29653"
E0122 15:22:55.667666 1 main.go:68] "Failed to establish connection to CSI driver" err="context deadline exceeded"
E0122 15:23:25.667140 1 main.go:68] "Failed to establish connection to CSI driver" err="context deadline exceeded"
E0122 15:23:55.668176 1 main.go:68] "Failed to establish connection to CSI driver" err="context deadline exceeded"
E0122 15:24:25.667039 1 main.go:68] "Failed to establish connection to CSI driver" err="context deadline exceeded"
E0122 15:24:55.668143 1 main.go:68] "Failed to establish connection to CSI driver" err="context deadline exceeded"
I0122 15:22:07.346639 1 main.go:154] "Version" version="v2.15.0"
I0122 15:22:07.346713 1 main.go:155] "Running node-driver-registrar" mode=""
I0122 15:22:07.346717 1 main.go:176] "Attempting to open a gRPC connection" csiAddress="/csi/csi.sock"
I0122 15:22:11.205084 1 main.go:184] "Calling CSI driver to discover driver name"
I0122 15:22:11.207641 1 main.go:193] "CSI driver name" csiDriverName="nfs.csi.k8s.io"
I0122 15:22:11.207679 1 node_register.go:56] "Starting Registration Server" socketPath="/registration/nfs.csi.k8s.io-reg.sock"
I0122 15:22:11.207834 1 node_register.go:66] "Registration Server started" socketPath="/registration/nfs.csi.k8s.io-reg.sock"
I0122 15:22:11.207909 1 node_register.go:96] "Skipping HTTP server"
I0122 15:24:55.935771 1 nfs.go:90] Driver: nfs.csi.k8s.io version: v4.12.1
I0122 15:24:55.936331 1 nfs.go:147]
DRIVER INFORMATION:
-------------------
Build Date: "2025-10-13T14:06:17Z"
Compiler: gc
Driver Name: nfs.csi.k8s.io
Driver Version: v4.12.1
Git Commit: ""
Go Version: go1.24.6
Platform: linux/amd64
Streaming logs below:
I0122 15:24:55.940258 1 mount_linux.go:334] Detected umount with safe 'not mounted' behavior
I0122 15:24:55.940612 1 server.go:117] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
The following log lines, which the registrar normally emits once the kubelet registers the plugin, are missing:
I0122 10:45:05.376383 1 main.go:99] "Received GetInfo call" request=""
I0122 10:45:05.407652 1 main.go:111] "Received NotifyRegistrationStatus call" status="plugin_registered:true"
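Those two lines are the kubelet calling back into the registrar, so the kubelet's side can be checked by grepping its journal from the Bottlerocket admin container (a sketch; the grep patterns are guesses and the exact kubelet wording varies by version):
# From the Bottlerocket admin container, drop to a root shell on the host:
sudo sheltie
# Look for plugin-watcher / registration activity for the NFS driver:
journalctl -u kubelet | grep -i -e 'nfs.csi.k8s.io' -e 'plugin'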
On the Bottlerocket nodes in the affected cases, we're seeing that:
csi.sock exists in the plugin dir:
bash-5.1# ls -la /var/lib/kubelet/plugins/csi-nfsplugin/
total 0
drwxr-xr-x. 2 root root 22 Jan 22 10:45 .
drwxr-xr-x. 7 root root 108 Jan 22 10:13 ..
srwxr-xr-x. 1 root root 0 Jan 22 10:45 csi.sock
however, nfs.csi.k8s.io-reg.sock is missing from the plugin registry dir:
bash-5.1# ls -la /var/lib/kubelet/plugins_registry/
total 4
drwxr-x---. 2 root root 38 Jan 22 09:51 .
drwxr-xr-x. 11 root root 4096 Jan 22 10:10 ..
srwx------. 1 root root 0 Jan 22 09:51 ebs.csi.aws.com-reg.sock
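The registrar logs above claim the registration server started on /registration/nfs.csi.k8s.io-reg.sock, so it would be worth comparing the container's view of that directory with the host's. The sig-storage registrar image is distroless, so one option is an ephemeral debug container that peeks through the target's /proc (a sketch; the container name node-driver-registrar is an assumption based on the upstream manifests, and it requires ephemeral containers plus a root target):
kubectl -n kube-system debug csi-nfs-node-pkhbh -it --image=busybox \
  --target=node-driver-registrar -- ls -la /proc/1/root/registration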
It works fine after restarting the csi-nfs-node pod, but that restart should be automated somehow, and node_register shouldn't keep running as if nothing is wrong when the plugin was never successfully registered. We're unsure why this is happening in our QA EKS cluster; it doesn't happen at all in our prod clusters, and both run the same NFS CSI driver version, the same EKS version, and the same Helm configuration.
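Until the root cause is understood, one way to automate the restart is to let the kubelet do it: node-driver-registrar can expose a /healthz endpoint (the "Skipping HTTP server" line above suggests it isn't enabled here) that a livenessProbe can hit, so the container gets restarted when the registration socket stops answering. Below is a minimal sketch of the registrar container in the csi-nfs-node DaemonSet; the port is an arbitrary pick (29653 is already taken by the driver's own livenessprobe sidecar, per the logs), and any args your chart already sets should be kept. Note the probe checks the registrar's own registration socket, so it's worth confirming it actually fails in the broken state before relying on it.
# Hypothetical fragment of the csi-nfs-node DaemonSet pod spec, not a full manifest.
containers:
- name: node-driver-registrar
  args:
  - --csi-address=/csi/csi.sock
  - --kubelet-registration-path=/var/lib/kubelet/plugins/csi-nfsplugin/csi.sock
  - --http-endpoint=:29663      # assumption: any free port; enables the built-in /healthz
  livenessProbe:
    httpGet:
      path: /healthz
      port: 29663
    initialDelaySeconds: 30
    timeoutSeconds: 15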
Environment:
- CSI Driver version: v4.12.1
- Kubernetes version (use kubectl version):
Client Version: v1.34.1
Kustomize Version: v5.7.1
Server Version: v1.33.5-eks-3025e55
- OS (e.g. from /etc/os-release): bottlerocket (bottlerocket-aws-k8s-1.33-x86_64-v1.41.0-bc3ad241)