Summary
When creating an NFS-based PVC under load, the driver may successfully create the shared folder on DSM but fail to configure the NFS permissions. The error from DSM (Error code:2370) is logged but not propagated to the CSI consumer. As a result, CreateVolume returns success, the PV is bound, but pods cannot mount the volume because the share has no NFS export rules.
Pods then fail repeatedly with MountVolume.SetUp failed: ... reason given by server: No such file or directory. The message is misleading: the folder exists on DSM, but it is not exported via NFS.
Environment
- synology-csi version: v1.2.1
- DSM version: 7.x
- Kubernetes version:
- CSI sidecars:
  - csi-provisioner: v3.0.0
  - csi-attacher: v3.3.0
  - csi-resizer: v1.3.0
  - csi-snapshotter: v8.2.1
  - csi-node-driver-registrar: v2.3.0
- Storage protocol: NFS (csi.storage.k8s.io/fstype: nfs)
- Workload: Kasten K10 backup/restore operations + Helm chart deployments creating multiple PVCs concurrently
Reproduction
- Configure a StorageClass using protocol nfs against a Synology NAS
- Trigger creation of multiple PVCs in rapid succession (e.g. Helm chart with 5 PVCs, or Kasten restore with multiple volumes)
- Observe controller logs
Observed behavior
Logs from csi-plugin show the share creation succeeds but the privilege configuration fails:
[ERROR] [service/dsm.go:544] [10.20.58.200] Failed to create Volume: rpc error: code = Internal desc = Failed to create share, err: Share system is temporary busy
[ERROR] [driver/utils.go:126] GRPC error: rpc error: code = Internal desc = Couldn't find any host available to create Volume
...
[ERROR] [service/share_volume.go:208] [10.20.58.200] Failed to load share nfs privilege: DSM Api error. Error code:2370
[INFO] [driver/utils.go:128] GRPC response: {"volume":{"capacity_bytes":...,"volume_context":{"baseDir":"/volume1/k8s-csi-pvc-...","protocol":"nfs",...}}}
The GRPC response is a successful CreateVolume reply, even though the privilege setting failed.
On the NAS side, showmount -e <nas-ip> confirms that some shares are not exported (their NFS Permissions tab in DSM is empty), while their folders exist on /volume1/.
The corresponding pod fails to mount with:
MountVolume.SetUp failed for volume "pvc-XXXX" : rpc error: code = Internal desc = mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs -o nfsvers=4.1 <nas-ip>:/volume1/k8s-csi-pvc-XXXX /var/lib/kubelet/...
Output: mount.nfs: mounting <nas-ip>:/volume1/k8s-csi-pvc-XXXX failed, reason given by server: No such file or directory
Expected behavior
If setSharePrivilege (or equivalent) fails, CreateVolume should:
- Return an error to the CSI client so that the external-provisioner can retry
- Roll back by deleting the orphaned shared folder, so that retries don't accumulate ghost folders on DSM
Impact
- Kasten K10 restores fail unpredictably (some PVCs unmountable)
- Helm chart deployments with multiple PVCs partially fail
- Manual remediation required: editing each affected share on DSM and adding NFS permissions by hand
- DSM accumulates ghost shared folders that exist but are not exported
Workaround applied
We added the following args to the csi-provisioner sidecar to reduce concurrency:
args:
- --worker-threads=1
- --retry-interval-start=10s
- --retry-interval-max=300s
- --timeout=180s
This reduces the frequency of the issue (by serializing CreateVolume calls) but does not eliminate it: Error 2370 still occurs sporadically, and the error-handling bug remains.
Suspected location in code
Based on log line service/share_volume.go:208, the error from setSharePrivilege (or whatever function configures NFS permissions on a newly created share) appears to be logged with [ERROR] but the function returns nil/success to the caller.
A correct fix would either:
- (a) Return the error to abort CreateVolume, and roll back the share creation
- (b) Implement a retry loop with backoff specifically for DSM Error 2370 (which is transient: "share system busy"), and only fail after exhausting retries
Happy to provide additional logs or test a proposed fix.