Skip to content

feat(relay): don't close connections upon errors in relay server#4718

Merged
mergify[bot] merged 17 commits intomasterfrom
feat/remove-connection-close-relayed
Nov 1, 2023
Merged

feat(relay): don't close connections upon errors in relay server#4718
mergify[bot] merged 17 commits intomasterfrom
feat/remove-connection-close-relayed

Conversation

@thomaseizinger
Copy link
Copy Markdown
Contributor

@thomaseizinger thomaseizinger commented Oct 24, 2023

Description

To remove the usages of ConnectionHandlerEvent::Close from the relay-server, we unify what used to be called CircuitFailedReason and FatalUpgradeError. Whilst the errors may be fatal for the particular circuit, they are not necessarily fatal for the entire connection.

Related: #3591.
Resolves: #4716.

Notes & open questions

  • Should we do some kind of "smart" connection management upon failures on the streams further up? At the moment, we don't expose the details of which connection a stream failed on. I am leaning towards saying "no" here and instead relying more on fix(swarm): keep connections alive while active streams exist #4595. Once we do more automated keep-alive tracking, bad connections will close automatically much more aggressively. That is because any error on a stream will lead to the user dropping the stream which means we will automatically return KeepAlive::No.

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@mergify

This comment was marked as resolved.

@mergify

This comment was marked as resolved.

Copy link
Copy Markdown
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the work.

Can you add a test that ensures #4752 is fixed with this pull request, i.e. that a connection is not closed even though the remote does not support the relay protocol as one expected?

@thomaseizinger
Copy link
Copy Markdown
Contributor Author

Can you add a test that ensures #4752 is fixed with this pull request, i.e. that a connection is not closed even though the remote does not support the relay protocol as one expected?

To properly test this, I think we need to also merge the fix for the client side (#4745) otherwise, the client will simply close the connection. Happy to add a test once both PRs are landed.

@thomaseizinger thomaseizinger added the internal-change Pull requests that make internal changes to crates and thus don't need to include a changelog entry. label Oct 29, 2023
@thomaseizinger thomaseizinger force-pushed the feat/remove-connection-close-relayed branch from 14ee70d to 34c8f71 Compare October 29, 2023 23:12
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Oct 30, 2023

This pull request has merge conflicts. Could you please resolve them @thomaseizinger? 🙏

mergify bot pushed a commit that referenced this pull request Oct 31, 2023
To make a reservation with a relay, a user calls `Swarm::listen_on` with an address of the relay, suffixed with a `/p2pcircuit` protocol. Similarly, to establish a circuit to another peer, a user needs to call `Swarm::dial` with such an address. Upon success, the `Swarm` then issues a `SwarmEvent::NewListenAddr` event in case of a successful reservation or a `SwarmEvent::ConnectionEstablished` in case of a successful connect.

The story is different for errors. Somewhat counterintuitively, the actual reason of an error during these operations are only reported as `relay::Event`s without a direct correlation to the user's `Swarm::listen_on` or `Swarm::dial` calls.

With this PR, we send these errors back "into" the `Transport` and report them as `SwarmEvent::ListenerClosed` or `SwarmEvent::OutgoingConnectionError`. This is conceptually more correct. Additionally, by sending these errors back to the transport, we no longer use `ConnectionHandlerEvent::Close` which entirely closes the underlying relay connection. In case the connection is not used for something else, it will be closed by the keep-alive algorithm.

Resolves: #4717.
Related: #3591.
Related: #4718.

Pull-Request: #4745.
@mergify mergify bot merged commit 823d0b2 into master Nov 1, 2023
@mergify mergify bot deleted the feat/remove-connection-close-relayed branch November 1, 2023 01:51
mergify bot pushed a commit that referenced this pull request Nov 2, 2023
This PR implements the long-awaited design of disallowing `ConnectionHandler`s to close entire connections. Instead, users should close connections via `ToSwarm::CloseConnection` from a `NetworkBehaviour` or - even better - from the `Swarm` via `close_connection`. A `NetworkBehaviour` also does not have a "full" view onto how a connection is used but at least it can correlate whether it created the connection via the `ConnectionId`. In general, the more modular and friendly approach is to stop "using" a connection if a particular protocol no longer needs it. As a result of the keep-alive algorithm, such a connection is then closed automatically.

Depends-on: #4745.
Depends-on: #4718.
Depends-on: #4749.
Related: #3353.
Related: #4714.
Resolves: #3591.

Pull-Request: #4755.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal-change Pull requests that make internal changes to crates and thus don't need to include a changelog entry. send-it

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove usage of ConnectionHandlerEvent::Close from libp2p_relay

2 participants