-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[release-1.11] 🐛 Do not overwrite global http.DefaultClient TLSConfig #13062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release-1.11] 🐛 Do not overwrite global http.DefaultClient TLSConfig #13062
Conversation
The current TLS configuration was overriding the TLSConfig for the global `http.DefaultClient`. This call is being used by controllers such as the `ExtensionConfig` controller which calls this function from multiple concurrent workers. This leads to a race where the TLS `ServerName` is configured differently to that of the URL it is trying to call and X509 validation fails. An example can be seen from the CAPI logs below: ``` E1126 12:43:22.449064 1 controller.go:347] "Reconciler error" err="failed to discover ExtensionConfig extension-config-a: failed to discover extension \"extension-config-a\": http call failed: Post \"https://extension-config-a-runtimehooks.extension-config-a-system.svc:443/hooks.runtime.cluster.x-k8s.io/v1alpha1/discovery?timeout=10s\": tls: failed to verify certificate: x509: certificate is valid for extension-config-a-runtimehooks.extension-config-a-system.svc, extension-config-a-runtimehooks.extension-config-a-system.svc.cluster.local, not extension-config-b.extension-config-b-system.svc" controller="extensionconfig" controllerGroup="runtime.cluster.x-k8s.io" controllerKind="ExtensionConfig" ExtensionConfig="extension-config-a" namespace="" name="extension-config-a" reconcileID="dfd00b69-3666-4818-b4a0-52eb1c391848" E1126 12:53:42.919995 1 controller.go:347] "Reconciler error" err="failed to discover ExtensionConfig extension-config-b: failed to discover extension \"extension-config-b\": http call failed: Post \"https://extension-config-b.extension-config-b-system.svc:443/hooks.runtime.cluster.x-k8s.io/v1alpha1/discovery?timeout=10s\": tls: failed to verify certificate: x509: certificate is valid for extension-config-b.extension-config-b-system.svc, extension-config-b.extension-config-b-system.svc.cluster.local, not extension-config-a-runtimehooks.extension-config-a-system.svc" controller="extensionconfig" controllerGroup="runtime.cluster.x-k8s.io" controllerKind="ExtensionConfig" ExtensionConfig="extension-config-b" namespace="" name="extension-config-b" reconcileID="4cc93a96-cfcf-49f8-8276-b3725fc8e1b8" ``` Notice how the URL and the expected hostname are swapped in each log indicating a race (TLSConfig being reconfigured in the middle of the call by different worker threads.
|
/area clusterclass |
|
/lgtm |
|
LGTM label has been added. Git tree hash: 098bf6ef63ee02f17631703f87bd3c9f1289c3c7
|
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: neolit123 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@neolit123 please give maintainers time to chime in before merging PRs. |
|
💯 , that would be great! We are close to the release and should be careful on the changes we do. |
This is an automated cherry-pick of #13058
/assign jimmidyson