Skip to content

Conversation

@mkimuram
Copy link
Contributor

@mkimuram mkimuram commented Mar 6, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
This PR fixes a bug that volume reconstruction does not work properly for csi, by failure in handling filesystem volume and block volume with a proper logic. To get the path to check for existence of volume, Mounter 's GetPath() should be used for file system and BlockVolumeMapper's GetPodDeviceMapPath() should be used for block volume. In addition, Mounter should be created only for file system volume and BlockVolumeMapper should be created only for block volume, because csi plugin creates a new directory + json on Mounter and BlockVolumeMapper creation, leading to even more orphaned files/directories.
(This PR was originally intended to add e2e tests for block volume as discussed in #74545 , and found the bug and fix it.)

Which issue(s) this PR fixes:

Special notes for your reviewer:
/sig storage
cc @bswartz @wongma7 @msau42

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 6, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @mkimuram. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 6, 2019
@k8s-ci-robot k8s-ci-robot requested review from copejon and janetkuo March 6, 2019 23:42
@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Mar 6, 2019
@mkimuram
Copy link
Contributor Author

mkimuram commented Mar 6, 2019

@msau42 @wongma7

Some of the fixes and additions to the existing test framework are borrowed from #74693. Also, changes that are discussed in #65248 are also included.

@msau42
Copy link
Member

msau42 commented Mar 6, 2019

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 6, 2019
@msau42
Copy link
Member

msau42 commented Mar 6, 2019

BTW, CSI Reconstruction is flaky in general: #72500

@mkimuram
Copy link
Contributor Author

mkimuram commented Mar 7, 2019

/retest

@mkimuram
Copy link
Contributor Author

mkimuram commented Mar 7, 2019

@msau42

Is it possible to manually arrange tests in this PR, because they aren't tested by regular CI, due to the fact that these tests have [Disruptive] tag?

Note that below three tests are added for both filesystem and block volume in this PR.

"Should test that pv written before kubelet restart is readable after restart."
"Should test that pv used in a pod that is deleted while the kubelet is down cleans up when the kubelet returns."
"Should test that pv used in a pod that is force deleted while the kubelet is down cleans up when the kubelet returns."

For filesystem, all tests passed in my environment. (As they almost just call existing tests from framework.)
However, for block, only first test passes, and rest of the tests fail. (Maybe due to the issue discussed #74914)

Can this PR go ahead by disabling these failing tests for another PR that fix #74914 to utilize this test?
Or should we need to include fix to this PR?

@msau42
Copy link
Member

msau42 commented Mar 7, 2019

/test pull-kubernetes-e2e-gce-csi-serial

Is an optional job that runs serial CSI tests

@mkimuram
Copy link
Contributor Author

mkimuram commented Mar 7, 2019

@msau42

BTW, CSI Reconstruction is flaky in general: #72500

I found that the path is eventually deleted if manually checked after the test failure. Therefore, we might at least reduce the flake by adding rather longer wait before here.

(In my environment, test with iscsi block volume fails more often in non-force delete case than in force delete case, so we might just need ~30sec wait in general and even more for CSI as a workaround?)

@msau42
Copy link
Member

msau42 commented Mar 7, 2019

In the graceful delete case, kubelet should wait until volume manager finishes unmounting the volume before deleting the Pod object. So we shouldn't need to add a timeout for the graceful deletion case.

@mkimuram
Copy link
Contributor Author

mkimuram commented Mar 7, 2019

/test pull-kubernetes-e2e-gce-csi-serial

@mkimuram
Copy link
Contributor Author

mkimuram commented Mar 8, 2019

/test pull-kubernetes-e2e-gce-csi-serial

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 31, 2019
@msau42
Copy link
Member

msau42 commented May 6, 2019

/assign @bswartz

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 17, 2019
@msau42
Copy link
Member

msau42 commented Jul 22, 2019

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsafrane, mkimuram, msau42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 22, 2019
var volumeMapper volumepkg.BlockVolumeMapper
if utilfeature.DefaultFeatureGate.Enabled(features.BlockVolume) && volume.volumeMode == v1.PersistentVolumeBlock {
var newMapperErr error
if mapperPlugin != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not your change, but I don't see the possibility of mapperPlugin is nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jingxu97

Thank you for your comment.

You're right that this path won't be reached, beause ReconstructVolumeOperation returns error when mapperPlugin == nil just after FindMapperPluginByName returns nil for mapperPlugin when plugin doesn't support block volume.

However, it doesn't seem to be so obvious, because it depends on other function's implementation.
In this case, do you mean to delete this check here or any other ideas for improve this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just seems strange to check this quite late here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess @jingxu97 points checkPath is always set if we can remove if mapperPlugin != nil { condition and that would make CheckVolumeExistenceOperation() call safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jingxu97 @oomichi

I've moved nil check for mapperPlugin earlier in reconstructVolume in "Move nil check for mapperPlugin earlier in reconstructVolume" commit. PTAL

(I will merge this change with the previous commit if this looks good.)

}

// Check existence of mount point for filesystem volume or symbolic link for block volume
isExist, checkErr := rc.operationExecutor.CheckVolumeExistenceOperation(volumeSpec, volume.volumePath, volumeSpec.Name(), rc.mounter, uniqueVolumeName, volume.podName, pod.UID, attachablePlugin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

volume.volumePath is used incorrectly here. Before this PR, volumeMounter.GetPath() was used. This breaks CSI reconstruction, as CheckVolumeExistenceOperation checks the pod directory [1] and not the directory where the volume should be mounted [2]

1: /var/lib/kubelet/pods/4ec8e3aa-f83e-4d49-96ea-3859f09f7dd7/volumes/kubernetes.io~csi/pvc-873e54a8-a2c1-4872-8a3c-b68b71a81e42
2: /var/lib/kubelet/pods/4ec8e3aa-f83e-4d49-96ea-3859f09f7dd7/volumes/kubernetes.io~csi/pvc-873e54a8-a2c1-4872-8a3c-b68b71a81e42/mount

@jsafrane
Copy link
Member

@mkimuram, I fixed CSI volume reconstruction in the last commit of this PR: #80743 (running tests there).

Can you please include it in this PR? I am sorry I broke it in the first place.

@mkimuram
Copy link
Contributor Author

mkimuram commented Jul 30, 2019

@jsafrane

Oh, I should have read through the original code, too. Sorry for that.
Previous code seems to revert #74653 work (@jingxu97 was right) and now it is taken into account.
Also, checking GetPodDeviceMapPath for block volume should be the right place to check.

I've re-organized the commits to move the fix into the right commits.
(Deleted the test disable commit and merged the fix commit into one.)

PTAL @jsafrane @msau42 @jingxu97

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in MountedVolume definition, we can add the comment to make it clear that mounter is required for file system volumes, but not required for BlockVolumes, and BlockVolumeMapper is the opposite.

return nil, fmt.Errorf("Volume: %q is not mounted", uniqueVolumeName)
}
var volumeMapper volumepkg.BlockVolumeMapper
var volumeMounter volumepkg.Mounter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you indicate this change into the PR's title and comment since this is a code change, not just test change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jingxu97

Thank you for your review.

I've changed the PR's title and comments and added comments in MountedVolume definition.
(New code change is merged to "Don't create mounter when reconstructing block volume" commit.)

PTAL

@mkimuram mkimuram changed the title Move disruptive tests to testsuites and add ones for block volume Fix volume reconstruction and add e2e tests Jul 30, 2019
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 30, 2019
var volumeMapper volumepkg.BlockVolumeMapper
if utilfeature.DefaultFeatureGate.Enabled(features.BlockVolume) && volume.volumeMode == v1.PersistentVolumeBlock {
var newMapperErr error
if mapperPlugin != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess @jingxu97 points checkPath is always set if we can remove if mapperPlugin != nil { condition and that would make CheckVolumeExistenceOperation() call safe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to change here?
I cannot find the reason from the PR message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

framework.CreateNginxPod can't handle block volume and inline volume, but framework.CreateSecPod can.
Current tests in generic_persistent_volume-disruptive.go doesn't actually test them, but it would be good to use the similar logic to testsuites/disruptive.go to prepare for future imporvement and maintenance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can replace here with framework.ExpectEqual(result.Code, 0, fmt.Sprintf("Expected grep exit code of 0, got %d", result.Code))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed as suggested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove if err != nil { because of framework.ExpectNoError(err, "Expected pod to be not found.")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed as suggested.

Copy link
Contributor Author

@mkimuram mkimuram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oomichi

Thank you for your comment. I've fixed as suggested. PTAL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

framework.CreateNginxPod can't handle block volume and inline volume, but framework.CreateSecPod can.
Current tests in generic_persistent_volume-disruptive.go doesn't actually test them, but it would be good to use the similar logic to testsuites/disruptive.go to prepare for future imporvement and maintenance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed as suggested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed as suggested.

@jsafrane
Copy link
Member

jsafrane commented Aug 2, 2019

/retest

@jingxu97
Copy link
Contributor

jingxu97 commented Aug 5, 2019

thanks for the work! /lgtm

@jsafrane
Copy link
Member

jsafrane commented Aug 6, 2019

Just reformating Jing's lgtm
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 6, 2019
@k8s-ci-robot k8s-ci-robot merged commit 345e58b into kubernetes:master Aug 6, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Aug 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants