Skip to content

Conversation

@cmurphy
Copy link
Contributor

@cmurphy cmurphy commented Oct 22, 2021

Without this change, if a cert is updated (e.g. to add CNs) while the
listener is in the middle of Accept()ing a new connection, the
connection gets dropped, we'll see a message like this in the server
logs:

http: TLS handshake error from 127.0.0.1:51232: write tcp 127.0.7.1:8443->127.0.0.1:51232: use of closed network connection

and the client (like a browser) won't necessarily reconnect. This change
modifies the GetCertificate routine in the listener's tls.Config to
keep track of the state of the incoming connections and only close
connections that have completed GetCertificate and therefore are
finished with their TLS handshake, so that only old established
connections are closed.

Issue: rancher/rancher#34038

Related context:

@cmurphy cmurphy marked this pull request as ready for review October 22, 2021 20:47
@cmurphy
Copy link
Contributor Author

cmurphy commented Oct 22, 2021

@ibuildthecloud and @brandond could you review this? (I can't add reviewers)

I've tried to verify that this change doesn't cause a regression to the three related issues mentioned above and seems to be working fine, but I could be missing something.

@brandond
Copy link
Member

brandond commented Oct 22, 2021

See the comment at https://github.com/rancher/dynamiclistener/pull/28/files#r468192991 as to why I moved it.

If you can confirm that this doesn't regress the issue that moving it fixed then I'm good.

@brandond
Copy link
Member

brandond commented Oct 22, 2021

Is the http: TLS handshake error from 127.0.0.1:51232: write tcp 127.0.7.1:8443->127.0.0.1:51232: use of closed network connection error perhaps caused by the fact that it's closing the connection that it's in the middle of completing a TLS handshake on? In that case, the correct fix might be to not add the connection to dynamicListener.conns until after the handshake is complete, so that it doesn't get closed when reloading the cert? We haven't even completed the TLS handshake yet so we don't need to close it in order to get the right cert loaded.

@brandond
Copy link
Member

brandond commented Oct 22, 2021

You might also be using dynamiclistener in an environment where you don't need to or shouldn't close existing connections when the cert changes?

@cmurphy
Copy link
Contributor Author

cmurphy commented Oct 22, 2021

Is the http: TLS handshake error from 127.0.0.1:51232: write tcp 127.0.7.1:8443->127.0.0.1:51232: use of closed network connection error perhaps caused by the fact that it's closing the connection that it's in the middle of completing a TLS handshake on? In that case, the correct fix might be to not add the connection to dynamicListener.conns until after the handshake is complete, so that it doesn't get closed when reloading the cert?

That's what seems to be happening, so moving

dynamiclistener/listener.go

Lines 254 to 256 in 94e2249

if l.conns != nil {
conn = l.wrap(conn)
}
somewhere else would fix it if I can find the right place to put it.

You might also be using dynamiclistener in an environment where you don't need to or shouldn't close existing connections when the cert changes?

Based on rancher/rancher#25386 closing existing connections is intentional and necessary for websockets to work but I'm not sure if it's needed otherwise.

@cmurphy
Copy link
Contributor Author

cmurphy commented Oct 22, 2021

There's nowhere to move the dynamicListener.wrap to that would help because the problem occurs during the dynamiclistener.Listener.Accept routine which is mostly a black box apart from GetCertificate, so I made the change there to look at the client hello data and skip closing the connection that was in the middle of its handshake.

@cmurphy
Copy link
Contributor Author

cmurphy commented Oct 22, 2021

This unfortunately doesn't help if there are multiple requests made at once and the cert is updated for any of them, since all the connections are wrapped and then all except one are closed 🤔

@brandond
Copy link
Member

Ah yeah, that doesn't help the other connections does it. Is there some way to add a flag to the wrapped connections to indicate whether or not we've finished the handshake (or at least returned a cert to be used for the handshake) yet? We only want to close connections that were completed before the cert was updated, right?

@cmurphy cmurphy changed the title Force close connections immediately on cert update Skip closing an initializing connection Oct 25, 2021
@cmurphy
Copy link
Contributor Author

cmurphy commented Oct 25, 2021

Alright now the wrapper tracks whether the connection has ever gotten a cert and uses that to know whether it's safe to close

@cmurphy cmurphy requested a review from brandond October 25, 2021 17:53
Without this change, if a cert is updated (e.g. to add CNs) while the
listener is in the middle of Accept()ing a new connection, the
connection gets dropped, we'll see a message like this in the server
logs:

  http: TLS handshake error from 127.0.0.1:51232: write tcp 127.0.7.1:8443->127.0.0.1:51232: use of closed network connection

and the client (like a browser) won't necessarily reconnect. This change
modifies the GetCertificate routine in the listener's tls.Config to
keep track of the state of the incoming connections and only close
connections that have completed GetCertificate and therefore are
finished with their TLS handshake, so that only old established
connections are closed.
@ibuildthecloud ibuildthecloud merged commit 6b37dc1 into rancher:master Oct 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants