Fix volume reconstruction and add e2e tests #75071
|
Hi @mkimuram. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/ok-to-test |
|
BTW, CSI Reconstruction is flaky in general: #72500 |
|
/retest |
|
Is it possible to manually arrange tests in this PR, because they aren't tested by regular CI, due to the fact that these tests have Note that the three tests below are added for both filesystem and block volumes in this PR. For filesystem, all tests passed in my environment (they mostly just call existing tests from the framework). Can this PR go ahead with these failing tests disabled, so that another PR that fixes #74914 can make use of this test? |
|
/test pull-kubernetes-e2e-gce-csi-serial Is an optional job that runs serial CSI tests |
I found that the path is eventually deleted when checked manually after the test failure. Therefore, we might at least reduce the flakiness by adding a longer wait before this point. (In my environment, the test with an iSCSI block volume fails more often in the non-force delete case than in the force delete case, so we might need a wait of about 30 seconds in general, and even longer for CSI, as a workaround?) |
|
In the graceful delete case, kubelet should wait until volume manager finishes unmounting the volume before deleting the Pod object. So we shouldn't need to add a timeout for the graceful deletion case. |
|
/test pull-kubernetes-e2e-gce-csi-serial |
|
/test pull-kubernetes-e2e-gce-csi-serial |
|
/assign @bswartz |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jsafrane, mkimuram, msau42. The full list of commands accepted by this bot can be found here. The pull request process is described here. Details: Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| var volumeMapper volumepkg.BlockVolumeMapper | ||
| if utilfeature.DefaultFeatureGate.Enabled(features.BlockVolume) && volume.volumeMode == v1.PersistentVolumeBlock { | ||
| var newMapperErr error | ||
| if mapperPlugin != nil { |
This is not your change, but I don't see how mapperPlugin could be nil here.
Thank you for your comment.
You're right that this path won't be reached, because ReconstructVolumeOperation returns an error when mapperPlugin == nil, right after FindMapperPluginByName returns nil for mapperPlugin when the plugin doesn't support block volumes.
However, that isn't obvious, because it depends on another function's implementation.
In this case, do you mean that this check should be deleted here, or do you have other ideas for improving this?
It just seems strange to check this quite late here.
I guess @jingxu97's point is that checkPath would always be set if we removed the if mapperPlugin != nil { condition, which would make the CheckVolumeExistenceOperation() call safe.
| } | ||
|
|
||
| // Check existence of mount point for filesystem volume or symbolic link for block volume | ||
| isExist, checkErr := rc.operationExecutor.CheckVolumeExistenceOperation(volumeSpec, volume.volumePath, volumeSpec.Name(), rc.mounter, uniqueVolumeName, volume.podName, pod.UID, attachablePlugin) |
volume.volumePath is used incorrectly here. Before this PR, volumeMounter.GetPath() was used. This breaks CSI reconstruction, as CheckVolumeExistenceOperation checks the pod directory [1] and not the directory where the volume should be mounted [2]
1: /var/lib/kubelet/pods/4ec8e3aa-f83e-4d49-96ea-3859f09f7dd7/volumes/kubernetes.io~csi/pvc-873e54a8-a2c1-4872-8a3c-b68b71a81e42
2: /var/lib/kubelet/pods/4ec8e3aa-f83e-4d49-96ea-3859f09f7dd7/volumes/kubernetes.io~csi/pvc-873e54a8-a2c1-4872-8a3c-b68b71a81e42/mount
|
Oh, I should have read through the original code, too. Sorry for that. I've re-organized the commits to move the fix into the right commits. |
Maybe in the MountedVolume definition, we can add a comment to make it clear that Mounter is required for filesystem volumes but not for block volumes, and that BlockVolumeMapper is the opposite.
| return nil, fmt.Errorf("Volume: %q is not mounted", uniqueVolumeName) | ||
| } | ||
| var volumeMapper volumepkg.BlockVolumeMapper | ||
| var volumeMounter volumepkg.Mounter |
Can you reflect this change in the PR's title and comment, since this is a code change and not just a test change?
Thank you for your review.
I've changed the PR's title and comments and added comments to the MountedVolume definition.
(The new code change is merged into the "Don't create mounter when reconstructing block volume" commit.)
PTAL
Why do we need to change here?
I cannot find the reason from the PR message.
framework.CreateNginxPod can't handle block volumes and inline volumes, but framework.CreateSecPod can.
The current tests in generic_persistent_volume-disruptive.go don't actually test them, but it would be good to use logic similar to testsuites/disruptive.go to prepare for future improvement and maintenance.
test/e2e/storage/utils/utils.go
Outdated
We can replace here with framework.ExpectEqual(result.Code, 0, fmt.Sprintf("Expected grep exit code of 0, got %d", result.Code))
Fixed as suggested.
test/e2e/storage/utils/utils.go
Outdated
We can remove if err != nil { because of framework.ExpectNoError(err, "Expected pod to be not found.")
Fixed as suggested.
CSI mounter will create a new directory + json for a filesystem volume, leading to even more orphaned files/directories.
4cd0029 to c130b77
mkimuram
left a comment
Thank you for your comment. I've fixed as suggested. PTAL
|
/retest |
|
thanks for the work! /lgtm |
|
Just reformatting Jing's lgtm |
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR fixes a bug where volume reconstruction does not work properly for CSI, because filesystem volumes and block volumes were not each handled with the proper logic. To get the path used to check whether a volume exists, Mounter's GetPath() should be used for filesystem volumes and BlockVolumeMapper's GetPodDeviceMapPath() should be used for block volumes. In addition, Mounter should be created only for filesystem volumes and BlockVolumeMapper only for block volumes, because the CSI plugin creates a new directory + json on Mounter and BlockVolumeMapper creation, leading to even more orphaned files/directories.
(This PR was originally intended to add e2e tests for block volumes as discussed in #74545; the bug was found and fixed along the way.)
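The path-selection rule described above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: the Mounter and BlockVolumeMapper interfaces here are local stand-ins that declare only the one method each needs, and checkPathFor, volumeMode, fakeMounter, and fakeMapper are hypothetical names. It also assumes the first return value of GetPodDeviceMapPath() is the path to check.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the kubelet volume interfaces; only the methods
// needed for this sketch are declared.
type Mounter interface {
	GetPath() string
}

type BlockVolumeMapper interface {
	GetPodDeviceMapPath() (string, string)
}

type volumeMode string

const (
	filesystemMode volumeMode = "Filesystem"
	blockMode      volumeMode = "Block"
)

// checkPathFor (hypothetical) picks the path whose existence should be
// verified: the mount path for filesystem volumes, the pod device map
// path for block volumes. Only the object matching the mode is required.
func checkPathFor(mode volumeMode, m Mounter, bm BlockVolumeMapper) (string, error) {
	switch mode {
	case blockMode:
		if bm == nil {
			return "", errors.New("block volume requires a BlockVolumeMapper")
		}
		dir, _ := bm.GetPodDeviceMapPath()
		return dir, nil
	case filesystemMode:
		if m == nil {
			return "", errors.New("filesystem volume requires a Mounter")
		}
		return m.GetPath(), nil
	default:
		return "", fmt.Errorf("unknown volume mode %q", mode)
	}
}

// Fakes used only to exercise the sketch.
type fakeMounter struct{ path string }

func (f fakeMounter) GetPath() string { return f.path }

type fakeMapper struct{ dir, name string }

func (f fakeMapper) GetPodDeviceMapPath() (string, string) { return f.dir, f.name }

func main() {
	fsPath, _ := checkPathFor(filesystemMode, fakeMounter{path: "/example/volumes/vol/mount"}, nil)
	blkPath, _ := checkPathFor(blockMode, nil, fakeMapper{dir: "/example/volumeDevices", name: "vol"})
	fmt.Println("filesystem check path:", fsPath)
	fmt.Println("block check path:     ", blkPath)
}
```

Keeping the two cases in separate branches also mirrors the second half of the description: nothing in the block branch needs a Mounter, and nothing in the filesystem branch needs a BlockVolumeMapper, so neither has to be created for the other mode.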
Which issue(s) this PR fixes:
Special notes for your reviewer:
/sig storage
cc @bswartz @wongma7 @msau42
Does this PR introduce a user-facing change?: