Bug 2002834: lib/resourcemerge/core: Remove unrecognized volumes and mounts #654
@wking: This pull request references Bugzilla bug 2002834, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Requesting review from QA contact:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed 44618c3 to b5cba93.
Since this package was created in d9f6718 (lib: add lib for applying objects, 2018-08-14, openshift#7), the volume(mount) merge logic has required manifest entries to exist, but has allowed in-cluster entries to persist without removal. That hadn't been a problem until [1]:

1. In 4.3, the autoscaler asked for a ca-cert volume mount, based on the cluster-autoscaler-operator-ca config map.
2. In 4.4, the autoscaler dropped those manifest entries [2].
3. In 4.9, the autoscaler asked the CVO to remove the config map [3].

That led some born-in 4.3 clusters to have crash-looping autoscalers, because the mount attempts kept failing on the missing config map.

We couldn't think of a plausible reason why cluster admins would want to inject additional volume mounts in a CVO-managed pod configuration, so this commit removes that ability and begins clearing away any volume(mount) configuration that is not present in the reconciling manifest. Cluster administrators who do need to add additional mounts in an emergency are free to use ClusterVersion's spec.overrides to take control of a particular CVO-managed resource.

This joins a series of similar previous tightenings, including 02bb9ba (lib/resourcemerge/core: Clear env and envFrom if unset in manifest, 2021-04-20, openshift#549) and ca299b8 (lib/resourcemerge: remove ports which are no longer required, 2020-02-13, openshift#322).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2002834
[2]: openshift/cluster-autoscaler-operator@f08589d#diff-547486373183980619528df695869ed32b80c18383bc16b57a5ee931bf0edd39L89
[3]: openshift/cluster-autoscaler-operator@9a7b3be#diff-d0cf785e044c611986a4d9bdd65bb373c86f9eb1c97bd3f105062184342a872dR4
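To make the new rule concrete, here is a minimal sketch of name-keyed reconciliation that also prunes unrecognized entries. It assumes a hypothetical, trimmed Volume type and a hypothetical helper ensureVolumesSketch; the real code in lib/resourcemerge/core operates on corev1.Volume and may differ in detail. The "cert" volume and its source string are illustrative values, not taken from the autoscaler manifests.

```go
package main

import "fmt"

// Volume is a hypothetical, trimmed stand-in for corev1.Volume.
// Source is a placeholder for the real volume-source fields.
type Volume struct {
	Name   string
	Source string
}

// ensureVolumesSketch reconciles existing against required, keyed by Name:
// required entries overwrite matching existing ones, required entries that
// are missing get appended, and existing entries whose name is absent from
// required are dropped. It reports whether anything changed.
func ensureVolumesSketch(existing *[]Volume, required []Volume) bool {
	modified := false
	requiredByName := make(map[string]Volume, len(required))
	for _, v := range required {
		requiredByName[v.Name] = v
	}

	kept := make([]Volume, 0, len(required))
	seen := make(map[string]bool, len(required))
	for _, v := range *existing {
		want, ok := requiredByName[v.Name]
		if !ok {
			modified = true // unrecognized by the manifest: remove it
			continue
		}
		if v != want {
			modified = true // recognized but drifted: take the manifest value
		}
		kept = append(kept, want)
		seen[v.Name] = true
	}
	for _, v := range required {
		if !seen[v.Name] {
			kept = append(kept, v) // required but missing in-cluster
			modified = true
		}
	}
	*existing = kept
	return modified
}

func main() {
	// In-cluster state left over from 4.3, still carrying the ca-cert volume.
	existing := []Volume{
		{Name: "cert", Source: "secret/example"},
		{Name: "ca-cert", Source: "configmap/cluster-autoscaler-operator-ca"},
	}
	// The post-4.4 manifest no longer mentions ca-cert.
	required := []Volume{{Name: "cert", Source: "secret/example"}}

	changed := ensureVolumesSketch(&existing, required)
	fmt.Println(changed, existing)
}
```

Under the old behavior the ca-cert entry would have been left in place; with pruning it is removed, so the pod no longer references the deleted config map.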
Force-pushed b5cba93 to 83faa6e.
vrutkovs left a comment:
/lgtm
/retest
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vrutkovs, wking The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/cherrypick release-4.9

@wking: once the present PR merges, I will cherry-pick it on top of release-4.9 in a new PR and assign it to you.
/test e2e-agnostic-upgrade
/retest-required
Please review the full test history for this PR and help us cut down flakes.
Sandbox issue is orthogonal.
/override ci/prow/e2e-agnostic-upgrade
@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade
/retest-required
Please review the full test history for this PR and help us cut down flakes.
Sandbox issue is still orthogonal.
/override ci/prow/e2e-agnostic-upgrade
@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade
I dunno what this
/override ci/prow/e2e-agnostic-operator
@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-operator
@wking: All pull requests linked via external trackers have merged: Bugzilla bug 2002834 has been moved to the MODIFIED state.
@wking: new pull request created: #657
…-api-access

This content is injected by an admission webhook [1,2]. When we started removing not-in-manifest volumes in 83faa6e (lib/resourcemerge/core: Remove unrecognized volumes and mounts, 2021-09-14, openshift#654), the cluster-version operator started removing the webhook-injected volume, leading to the cluster-version operator crash-looping on updates from 4.8 to 4.9 with messages like [3]:

    F0920 13:23:23.565439 1 start.go:24] error: error creating clients: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

With this commit, we follow the precedent of the Kubernetes API server's own manifest [4,5].

[1]: https://github.com/kubernetes/kubernetes/blob/2f68346fbb6246961ce0a3176418630950aea500/plugin/pkg/admission/serviceaccount/admission.go#L53-L54
[2]: https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#bound-service-account-token-volume
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=2005581
[4]: openshift/cluster-kube-apiserver-operator#1142
[5]: https://bugzilla.redhat.com/show_bug.cgi?id=1946479
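The failure mode follows directly from pruning by name: a volume that exists in the running pod only because an admission plugin added it is, from the manifest's point of view, unrecognized. The sketch below uses a hypothetical names-only helper, pruneUnrecognized, and a made-up generated suffix ("kube-api-access-abcde"). It assumes, based on the "…-api-access" title fragment and the kube-apiserver precedent, that the remedy is to declare the token volume in the manifest itself; that assumption is not confirmed by the text above.

```go
package main

import "fmt"

// pruneUnrecognized keeps only the existing volume names that also appear
// in the manifest, mirroring the "remove unrecognized volumes" rule.
func pruneUnrecognized(existing, manifest []string) []string {
	want := make(map[string]bool, len(manifest))
	for _, name := range manifest {
		want[name] = true
	}
	var kept []string
	for _, name := range existing {
		if want[name] {
			kept = append(kept, name)
		}
	}
	return kept
}

func main() {
	// Hypothetical pod state: the admission plugin injected a generated
	// kube-api-access-<suffix> volume that the manifest never mentions.
	inCluster := []string{"etc-ssl-certs", "kube-api-access-abcde"}

	// Manifest without the token volume: the injected volume is stripped,
	// and the container loses its service-account credentials.
	fmt.Println(pruneUnrecognized(inCluster, []string{"etc-ssl-certs"}))

	// Manifest that declares the volume itself (assumed remedy): when the
	// pod carries exactly that name, reconciliation keeps it.
	declared := []string{"etc-ssl-certs", "kube-api-access"}
	fmt.Println(pruneUnrecognized(declared, declared))
}
```

Without a mounted token, client construction falls back to looking for other configuration, which matches the "no configuration has been provided" error quoted above.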
We had been merging by name since ensureVolumeMounts landed in 83faa6e (lib/resourcemerge/core: Remove unrecognized volumes and mounts, 2021-09-14, openshift#654). But as pointed out in [1], a single volume may be mounted at multiple paths. With this commit, I'm pivoting to merge by mountPath, which is the patchMergeKey [2] (and it makes sense that you wouldn't have multiple volumes mounted at the same path).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2026560
[2]: https://github.com/kubernetes/api/blob/1d6faf224f146dd002553f55cd9fcaaaa0dc00cb/core/v1/types.go#L2367
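Why the merge key matters can be shown with one volume mounted at two paths. This is a sketch, not the CVO's ensureVolumeMounts: VolumeMount is a hypothetical trimmed type, reconcileMounts is a hypothetical helper that keeps existing entries the manifest recognizes, drops the rest, and appends missing ones, and the volume name and paths are made up.

```go
package main

import "fmt"

// VolumeMount is a hypothetical, trimmed stand-in for corev1.VolumeMount.
type VolumeMount struct {
	Name      string
	MountPath string
}

// reconcileMounts keeps existing mounts whose key appears in required
// (replacing them with the required value), drops the rest, and appends
// required mounts not yet present. Required entries that share a key
// collapse into one, with the last one winning.
func reconcileMounts(existing, required []VolumeMount, key func(VolumeMount) string) []VolumeMount {
	want := make(map[string]VolumeMount, len(required))
	for _, m := range required {
		want[key(m)] = m
	}
	var out []VolumeMount
	seen := make(map[string]bool, len(want))
	for _, m := range existing {
		k := key(m)
		if r, ok := want[k]; ok && !seen[k] {
			out = append(out, r)
			seen[k] = true
		}
	}
	for _, m := range required {
		k := key(m)
		if !seen[k] {
			out = append(out, want[k])
			seen[k] = true
		}
	}
	return out
}

func main() {
	// One volume mounted at two paths, as described in the bug report.
	required := []VolumeMount{
		{Name: "config", MountPath: "/etc/app"},
		{Name: "config", MountPath: "/var/run/app"},
	}
	byName := func(m VolumeMount) string { return m.Name }
	byPath := func(m VolumeMount) string { return m.MountPath }

	fmt.Println(reconcileMounts(nil, required, byName)) // only one mount survives
	fmt.Println(reconcileMounts(nil, required, byPath)) // both mounts survive
}
```

Keying on mountPath matches the API's patchMergeKey, and since two volumes cannot sensibly share a mount path, the key stays unique where the name does not.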
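Since this package was created in d9f6718 (#7), the volume(mount) merge logic has required manifest entries to exist, but has allowed in-cluster entries to persist without removal. That hadn't been a problem until:

1. In 4.3, the autoscaler asked for a ca-cert volume mount, based on the cluster-autoscaler-operator-ca config map.
2. In 4.4, the autoscaler dropped those manifest entries.
3. In 4.9, the autoscaler asked the CVO to remove the config map.

That led some born-in 4.3 clusters to have crash-looping autoscalers, because the mount attempts kept failing on the missing config map.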
We couldn't think of a plausible reason why cluster admins would want to inject additional volume mounts in a CVO-managed pod configuration, so this commit removes that ability and begins clearing away any volume(mount) configuration that is not present in the reconciling manifest. Cluster administrators who do need to add additional mounts in an emergency are free to use ClusterVersion's spec.overrides to take control of a particular CVO-managed resource.

This joins a series of similar previous tightenings, including 02bb9ba (#549) and ca299b8 (#322).
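The escape hatch mentioned above works by marking one resource as unmanaged. The sketch below redeclares a ComponentOverride struct with the field names used by ClusterVersion's spec.overrides entries in openshift/api (Kind, Group, Namespace, Name, Unmanaged) so it has no dependencies; the lookup function isUnmanaged is a hypothetical illustration of how a reconciler might consult the list, not the CVO's actual code, and the example resource is illustrative.

```go
package main

import "fmt"

// ComponentOverride mirrors the fields of a ClusterVersion spec.overrides
// entry; it is redeclared here so the sketch is self-contained.
type ComponentOverride struct {
	Kind      string
	Group     string
	Namespace string
	Name      string
	Unmanaged bool
}

// isUnmanaged reports whether an override marks the identified resource as
// unmanaged, in which case a reconciler would skip it (and so would not
// prune any volumes an administrator added to it).
func isUnmanaged(overrides []ComponentOverride, kind, group, namespace, name string) bool {
	for _, o := range overrides {
		if o.Kind == kind && o.Group == group && o.Namespace == namespace && o.Name == name {
			return o.Unmanaged
		}
	}
	return false
}

func main() {
	overrides := []ComponentOverride{{
		Kind:      "Deployment",
		Group:     "apps",
		Namespace: "openshift-machine-api",
		Name:      "cluster-autoscaler-operator",
		Unmanaged: true,
	}}
	fmt.Println(isUnmanaged(overrides, "Deployment", "apps", "openshift-machine-api", "cluster-autoscaler-operator"))
	fmt.Println(isUnmanaged(overrides, "Deployment", "apps", "openshift-machine-api", "machine-api-operator"))
}
```

Matching on the full kind/group/namespace/name tuple keeps the override narrow: only the one named resource leaves CVO management.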