TransportService: Improve connection stability by downgrading connections on substream inactivity #253
Description
Context
The transport service handler is intended to downgrade connections (possibly leading to their closure) if no substreams were opened within a given time frame:
litep2p/src/protocol/transport_service.rs
Lines 126 to 130 in d50ec10
    /// Close the connection if no substreams are open within this time frame.
    keep_alive_timeout: Duration,

    /// Pending keep-alive timeouts.
    pending_keep_alive_timeouts: FuturesUnordered<BoxFuture<'static, (PeerId, ConnectionId)>>,
When a connection is established, the timeout is properly tracked:
litep2p/src/protocol/transport_service.rs
Lines 213 to 216 in d50ec10
    self.pending_keep_alive_timeouts.push(Box::pin(async move {
        tokio::time::sleep(keep_alive_timeout).await;
        (peer, connection_id)
    }));
When the timeout expires, the connection is downgraded:
litep2p/src/protocol/transport_service.rs
Lines 414 to 427 in d50ec10
    while let Poll::Ready(Some((peer, connection_id))) =
        self.pending_keep_alive_timeouts.poll_next_unpin(cx)
    {
        if let Some(context) = self.connections.get_mut(&peer) {
            tracing::trace!(
                target: LOG_TARGET,
                ?peer,
                ?connection_id,
                "keep-alive timeout over, downgrade connection",
            );
            context.downgrade(&connection_id);
        }
    }
Issue
Opened substreams are not taken into account when downgrading connections:
litep2p/src/protocol/transport_service.rs
Line 410 in d50ec10
    Some(event) => return Poll::Ready(Some(event.into())),
Here we should match on the InnerTransportEvent::SubstreamOpened variant and push the timeout future forward.
With the current implementation, connections are downgraded once the 5-second timeout expires, regardless of substream activity.
Implications
We need to double-check whether the protocol name matches the protocol of the opened substream.
Further, InnerTransportEvent::SubstreamOpened needs to be extended with a connection ID because the TransportService holds up to 2 connections (primary and secondary). We should advance the timeout of the connection identified by that ID only.
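
One possible shape for this change is sketched below. It is a simplified, std-only model rather than the actual litep2p code: `KeepAliveTracker`, `Event`, the `PeerId`/`ConnectionId` type aliases, and all method names are hypothetical, and explicit deadlines stand in for the `tokio::time::sleep` futures. The point it illustrates is that a substream open resets only the deadline of the connection it arrived on, and only when the protocol name matches.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical stand-ins for litep2p's `PeerId` and `ConnectionId`.
type PeerId = u64;
type ConnectionId = usize;

/// Hypothetical, simplified event. The `connection_id` field is the
/// extension proposed in this issue; it is an assumption, not current API.
enum Event {
    SubstreamOpened {
        peer: PeerId,
        connection_id: ConnectionId,
        protocol: String,
    },
    Other,
}

/// Tracks one keep-alive deadline per (peer, connection) pair.
struct KeepAliveTracker {
    protocol: String,
    keep_alive_timeout: Duration,
    deadlines: HashMap<(PeerId, ConnectionId), Instant>,
}

impl KeepAliveTracker {
    fn new(protocol: String, keep_alive_timeout: Duration) -> Self {
        Self { protocol, keep_alive_timeout, deadlines: HashMap::new() }
    }

    /// Start tracking a newly established connection.
    fn on_connection_established(&mut self, peer: PeerId, connection_id: ConnectionId, now: Instant) {
        self.deadlines.insert((peer, connection_id), now + self.keep_alive_timeout);
    }

    /// Push the deadline forward when a substream of our protocol is opened
    /// on a tracked connection. Other events leave the deadlines untouched.
    fn on_event(&mut self, event: &Event, now: Instant) {
        if let Event::SubstreamOpened { peer, connection_id, protocol } = event {
            if *protocol == self.protocol {
                if let Some(deadline) = self.deadlines.get_mut(&(*peer, *connection_id)) {
                    *deadline = now + self.keep_alive_timeout;
                }
            }
        }
    }

    /// Remove and return the connections whose deadline has passed;
    /// these are the ones that would be downgraded.
    fn expired(&mut self, now: Instant) -> Vec<(PeerId, ConnectionId)> {
        let mut expired: Vec<(PeerId, ConnectionId)> = self
            .deadlines
            .iter()
            .filter(|(_, deadline)| **deadline <= now)
            .map(|(key, _)| *key)
            .collect();
        expired.sort();
        for key in &expired {
            self.deadlines.remove(key);
        }
        expired
    }
}
```

With a 5-second timeout and two connections established at the same instant, a matching substream opened on the primary connection after 3 seconds would leave the secondary to expire at 5 seconds and the primary at 8 seconds. In the real handler the same effect would need the pending sleep future for that connection to be replaced or ignored when it fires early.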