Bug 1755073: docs/user/*/install_upi: explicitly-set-control-plane-unschedulable #2440
|
@wking: This pull request references Bugzilla bug 1755073, which is valid. The bug has been updated to refer to the pull request using the external bug tracker. |
|
/approve for another review |
|
/lgtm |
Force-pushed from 7daf18e to 55caef5.
|
I'd left off the meat of the commit-message subject 🤦♂️. Fixed with 7daf18eb8 -> 55caef592. Can I get a fresh |
|
/lgtm |
docs/user/aws/install_upi.md
routers are now ingress controllers :-)
> routers are now ingress controllers :-)
The pod is still called router. But is "ingress controller pod" the preferred English? Do we have plans to rename the pods to match?
We have "Ingress router pods" in the official docs, so if any rephrasing is needed we'd probably want to bump those too.
docs/user/gcp/install_upi.md
|
/hold |
We grew replicas-zeroing in c22d042 (docs/user/aws/install_upi: Add 'sed' call to zero compute replicas, 2019-05-02, openshift#1649) to set the stage for changing the 'replicas: 0' semantics from "we'll make you some dummy MachineSets" to "we won't make you MachineSets". But that hasn't happened yet, and since 64f96df (scheduler: Use schedulable masters if no compute hosts defined, 2019-07-16, openshift#2004) 'replicas: 0' for compute has also meant "add the 'worker' role to control-plane nodes".

That leads to racy problems when ingress comes through a load balancer, because Kubernetes load balancers exclude control-plane nodes from their target set [1,2] (although this may get relaxed soonish [3]). If the router pods get scheduled on the control plane machines due to the 'worker' role, they are not reachable from the load balancer and ingress routing breaks [4].

Seth says:

> pod nodeSelectors are not like taints/tolerations. They only have
> effect at scheduling time. They are not continually enforced.

which means that attempting to address this issue as a day-2 operation would mean removing the 'worker' role from the control-plane nodes and then manually evicting the router pods to force rescheduling.

So until we get the changes from [3], we can either drop the zeroing [5] or adjust the scheduler configuration to remove the effect of the zeroing. In both cases, this is a change we'll want to revert later once we bump Kubernetes to pick up a fix for the service load-balancer targets.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1671136#c1
[2]: kubernetes/kubernetes#65618
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1744370#c6
[4]: https://bugzilla.redhat.com/show_bug.cgi?id=1755073
[5]: openshift#2402
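To make the "adjust the scheduler configuration" option concrete, here is a minimal sketch in Python. It assumes the generated scheduler manifest carries a `mastersSchedulable: true` line once compute replicas are zeroed; the manifest text, the field layout, and the helper name `set_masters_unschedulable` are illustrative assumptions, not something quoted from this pull request.

```python
import re

# Hypothetical manifest content, standing in for the scheduler manifest the
# installer generates. The exact layout is an assumption for this sketch.
EXAMPLE_MANIFEST = """\
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
spec:
  mastersSchedulable: true
status: {}
"""

# Matches a whole `mastersSchedulable: true` line, keeping indentation in group 1.
_TRUE_LINE = re.compile(r"^([ \t]*mastersSchedulable:[ \t]*)true[ \t]*$", re.MULTILINE)
_FALSE_LINE = re.compile(r"^[ \t]*mastersSchedulable:[ \t]*false[ \t]*$", re.MULTILINE)


def set_masters_unschedulable(manifest_text: str) -> str:
    """Return the manifest with `mastersSchedulable: true` flipped to false.

    Raises ValueError if the field is absent, so a silently unchanged
    manifest is never mistaken for a fixed one.
    """
    updated, count = _TRUE_LINE.subn(r"\g<1>false", manifest_text)
    if count == 0 and not _FALSE_LINE.search(manifest_text):
        raise ValueError("no mastersSchedulable field found in manifest")
    return updated


if __name__ == "__main__":
    print(set_masters_unschedulable(EXAMPLE_MANIFEST))
```

The substitution is idempotent, so running it against a manifest that is already set to `false` leaves the text unchanged.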
Force-pushed from 55caef5 to 485057a.
|
/hold cancel

Still need to work out #2440 (comment), but with the |
|
/lgtm /hold |
|
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, sdodson, wking |
|
/hold cancel |
|
I don't think the |
We grew replicas-zeroing in c22d042 (#1649) to set the stage for changing the `replicas: 0` semantics from "we'll make you some dummy MachineSets" to "we won't make you MachineSets". But that hasn't happened yet, and since 64f96df (#2004) `replicas: 0` for compute has also meant "add the `worker` role to control-plane nodes". That leads to racy problems when ingress comes through a load balancer, because Kubernetes load balancers exclude control-plane nodes from their target set (see here and here, although this may get relaxed soonish). If the router pods get scheduled on the control plane machines due to the `worker` role, they are not reachable from the load balancer and ingress routing breaks. @sjenning says:

> pod nodeSelectors are not like taints/tolerations. They only have effect at scheduling time. They are not continually enforced.

which means that attempting to address this issue as a day-2 operation would mean removing the `worker` role from the control-plane nodes and then manually evicting the router pods to force rescheduling. So until we get the changes from here, we can either drop the zeroing (#2402) or adjust the scheduler configuration to remove the effect of the zeroing. In both cases, this is a change we'll want to revert later once we bump Kubernetes to pick up a fix for the service load-balancer targets.
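As a rough illustration of the day-2 cleanup described above, the following Python sketch picks out router pods that landed on control-plane nodes and would therefore need manual eviction after the `worker` role is removed. The pod names, node names, the `master` role string, and the helper `router_pods_on_control_plane` are all hypothetical; a real cluster would supply this data through its own tooling.

```python
from typing import Dict, List, Set


def router_pods_on_control_plane(
    pod_nodes: Dict[str, str],
    node_roles: Dict[str, Set[str]],
    control_plane_role: str = "master",  # assumed role name for control-plane nodes
) -> List[str]:
    """Return the sorted names of pods running on control-plane nodes.

    pod_nodes maps pod name -> node name; node_roles maps node name -> roles.
    Because nodeSelectors only take effect at scheduling time, these pods
    stay where they are until something evicts them.
    """
    return sorted(
        pod
        for pod, node in pod_nodes.items()
        if control_plane_role in node_roles.get(node, set())
    )


if __name__ == "__main__":
    # Hypothetical snapshot: one router pod was scheduled onto a control-plane
    # node that also carries the 'worker' role.
    pods = {"router-default-abc": "master-0", "router-default-def": "worker-1"}
    roles = {"master-0": {"master", "worker"}, "worker-1": {"worker"}}
    print(router_pods_on_control_plane(pods, roles))
```

Nodes missing from `node_roles` are treated as having no roles, so pods on unknown nodes are not reported.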