Skip to content

req-resp: Fix panic on connection closed for substream open failure#291

Merged
lexnv merged 3 commits intomasterfrom
lexnv/investigate-mismatch
Nov 27, 2024
Merged

req-resp: Fix panic on connection closed for substream open failure#291
lexnv merged 3 commits intomasterfrom
lexnv/investigate-mismatch

Conversation

@lexnv
Copy link
Copy Markdown
Collaborator

@lexnv lexnv commented Nov 26, 2024

When a new connection is reported to the request-response protocol, the protocol tracks the state of the connection if and only if it could open a substream with the remote peer.

We do not track the state of the connection in cases where the substream cannot be opened because the connection is closed.

A further ConnectionClosed event triggered a debug_assert assumption for this case. The assumption is wrong since opening substreams can fail which informs us that the connection is closed.

Connection details not tracked:

Err(error) => {
tracing::warn!(
target: LOG_TARGET,
?peer,
protocol = %self.protocol,
request_id = ?context.request_id,
?error,
"failed to open substream",
);
return self
.report_request_failure(
peer,
context.request_id,
RequestResponseError::Rejected(error.into()),
)
.await;
}

Which later leads to panics:

let Some(context) = self.peers.remove(&peer) else {
tracing::error!(
target: LOG_TARGET,
?peer,
"state mismatch: peer doesn't exist",
);
debug_assert!(false);

Related to: #290

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv changed the title req-resp: Investigate state mismatch req-resp: Fix panic on substream open failure Nov 27, 2024
@lexnv lexnv changed the title req-resp: Fix panic on substream open failure req-resp: Fix panic on connection closed for substream open failure Nov 27, 2024
@lexnv lexnv self-assigned this Nov 27, 2024
@lexnv lexnv merged commit eb0d6f2 into master Nov 27, 2024
@lexnv lexnv deleted the lexnv/investigate-mismatch branch November 27, 2024 13:11
lexnv added a commit that referenced this pull request Nov 27, 2024
## [0.8.2] - 2024-11-27

This release ensures that the provided peer identity is verified at the
crypto/noise protocol level, enhancing security and preventing potential
misuses.
The release also includes a fix that caused `TransportService` component
to panic on debug builds.

### Fixed

- req-resp: Fix panic on connection closed for substream open failure
([#291](#291))
- crypto/noise: Verify crypto/noise signature payload
([#278](#278))

### Changed

- transport_service/logs: Provide less details for trace logs
([#292](#292))


cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Nov 27, 2024
This includes a critical fix for debug release versions of litep2p
(which are running in Kusama as validators).

While at it, have stopped the oncall pain of alerts around
`incoming_connections_total`. We can rethink the metric expose of
litep2p in Q1.





## [0.8.2] - 2024-11-27

This release ensures that the provided peer identity is verified at the
crypto/noise protocol level, enhancing security and preventing potential
misuses.
The release also includes a fix that caused `TransportService` component
to panic on debug builds.

### Fixed

- req-resp: Fix panic on connection closed for substream open failure
([#291](paritytech/litep2p#291))
- crypto/noise: Verify crypto/noise signature payload
([#278](paritytech/litep2p#278))

### Changed

- transport_service/logs: Provide less details for trace logs
([#292](paritytech/litep2p#292))


## Testing Done

This has been extensively tested in Kusama on all validators, that are
now running litep2p.

Deployed PR: #6638

### Litep2p Dashboards
![Screenshot 2024-11-26 at 19 19
41](https://github.com/user-attachments/assets/e00b2b2b-7e64-4d96-ab26-165e2b8d0dc9)


### Libp2p vs Litep2p CPU usage

After deploying litep2p we have reduced CPU usage from around 300-400%
to 200%, this is a significant boost in performance, freeing resources
for other subcomponents to function more optimally.


![image(1)](https://github.com/user-attachments/assets/fa793df5-4d58-4601-963d-246e56dd2a26)



cc @paritytech/sdk-node

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: GitHub Action <action@github.com>
github-merge-queue bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Nov 28, 2024
This includes a critical fix for debug release versions of litep2p
(which are running in Kusama as validators).

While at it, have stopped the oncall pain of alerts around
`incoming_connections_total`. We can rethink the metric expose of
litep2p in Q1.





## [0.8.2] - 2024-11-27

This release ensures that the provided peer identity is verified at the
crypto/noise protocol level, enhancing security and preventing potential
misuses.
The release also includes a fix that caused `TransportService` component
to panic on debug builds.

### Fixed

- req-resp: Fix panic on connection closed for substream open failure
([#291](paritytech/litep2p#291))
- crypto/noise: Verify crypto/noise signature payload
([#278](paritytech/litep2p#278))

### Changed

- transport_service/logs: Provide less details for trace logs
([#292](paritytech/litep2p#292))


## Testing Done

This has been extensively tested in Kusama on all validators, that are
now running litep2p.

Deployed PR: #6638

### Litep2p Dashboards
![Screenshot 2024-11-26 at 19 19
41](https://github.com/user-attachments/assets/e00b2b2b-7e64-4d96-ab26-165e2b8d0dc9)


### Libp2p vs Litep2p CPU usage

After deploying litep2p we have reduced CPU usage from around 300-400%
to 200%, this is a significant boost in performance, freeing resources
for other subcomponents to function more optimally.


![image(1)](https://github.com/user-attachments/assets/fa793df5-4d58-4601-963d-246e56dd2a26)



cc @paritytech/sdk-node

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: GitHub Action <action@github.com>
github-actions bot pushed a commit to paritytech/polkadot-sdk that referenced this pull request Nov 28, 2024
This includes a critical fix for debug release versions of litep2p
(which are running in Kusama as validators).

While at it, have stopped the oncall pain of alerts around
`incoming_connections_total`. We can rethink the metric expose of
litep2p in Q1.

## [0.8.2] - 2024-11-27

This release ensures that the provided peer identity is verified at the
crypto/noise protocol level, enhancing security and preventing potential
misuses.
The release also includes a fix that caused `TransportService` component
to panic on debug builds.

### Fixed

- req-resp: Fix panic on connection closed for substream open failure
([#291](paritytech/litep2p#291))
- crypto/noise: Verify crypto/noise signature payload
([#278](paritytech/litep2p#278))

### Changed

- transport_service/logs: Provide less details for trace logs
([#292](paritytech/litep2p#292))

## Testing Done

This has been extensively tested in Kusama on all validators, that are
now running litep2p.

Deployed PR: #6638

### Litep2p Dashboards
![Screenshot 2024-11-26 at 19 19
41](https://github.com/user-attachments/assets/e00b2b2b-7e64-4d96-ab26-165e2b8d0dc9)

### Libp2p vs Litep2p CPU usage

After deploying litep2p we have reduced CPU usage from around 300-400%
to 200%, this is a significant boost in performance, freeing resources
for other subcomponents to function more optimally.

![image(1)](https://github.com/user-attachments/assets/fa793df5-4d58-4601-963d-246e56dd2a26)

cc @paritytech/sdk-node

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: GitHub Action <action@github.com>
(cherry picked from commit 51c3e95)
Krayt78 pushed a commit to Krayt78/polkadot-sdk that referenced this pull request Dec 18, 2024
This includes a critical fix for debug release versions of litep2p
(which are running in Kusama as validators).

While at it, have stopped the oncall pain of alerts around
`incoming_connections_total`. We can rethink the metric expose of
litep2p in Q1.





## [0.8.2] - 2024-11-27

This release ensures that the provided peer identity is verified at the
crypto/noise protocol level, enhancing security and preventing potential
misuses.
The release also includes a fix that caused `TransportService` component
to panic on debug builds.

### Fixed

- req-resp: Fix panic on connection closed for substream open failure
([paritytech#291](paritytech/litep2p#291))
- crypto/noise: Verify crypto/noise signature payload
([paritytech#278](paritytech/litep2p#278))

### Changed

- transport_service/logs: Provide less details for trace logs
([paritytech#292](paritytech/litep2p#292))


## Testing Done

This has been extensively tested in Kusama on all validators, that are
now running litep2p.

Deployed PR: paritytech#6638

### Litep2p Dashboards
![Screenshot 2024-11-26 at 19 19
41](https://github.com/user-attachments/assets/e00b2b2b-7e64-4d96-ab26-165e2b8d0dc9)


### Libp2p vs Litep2p CPU usage

After deploying litep2p we have reduced CPU usage from around 300-400%
to 200%, this is a significant boost in performance, freeing resources
for other subcomponents to function more optimally.


![image(1)](https://github.com/user-attachments/assets/fa793df5-4d58-4601-963d-246e56dd2a26)



cc @paritytech/sdk-node

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: GitHub Action <action@github.com>
dudo50 pushed a commit to paraspell-research/polkadot-sdk that referenced this pull request Jan 4, 2025
This includes a critical fix for debug release versions of litep2p
(which are running in Kusama as validators).

While at it, have stopped the oncall pain of alerts around
`incoming_connections_total`. We can rethink the metric expose of
litep2p in Q1.





## [0.8.2] - 2024-11-27

This release ensures that the provided peer identity is verified at the
crypto/noise protocol level, enhancing security and preventing potential
misuses.
The release also includes a fix that caused `TransportService` component
to panic on debug builds.

### Fixed

- req-resp: Fix panic on connection closed for substream open failure
([paritytech#291](paritytech/litep2p#291))
- crypto/noise: Verify crypto/noise signature payload
([paritytech#278](paritytech/litep2p#278))

### Changed

- transport_service/logs: Provide less details for trace logs
([paritytech#292](paritytech/litep2p#292))


## Testing Done

This has been extensively tested in Kusama on all validators, that are
now running litep2p.

Deployed PR: paritytech#6638

### Litep2p Dashboards
![Screenshot 2024-11-26 at 19 19
41](https://github.com/user-attachments/assets/e00b2b2b-7e64-4d96-ab26-165e2b8d0dc9)


### Libp2p vs Litep2p CPU usage

After deploying litep2p we have reduced CPU usage from around 300-400%
to 200%, this is a significant boost in performance, freeing resources
for other subcomponents to function more optimally.


![image(1)](https://github.com/user-attachments/assets/fa793df5-4d58-4601-963d-246e56dd2a26)



cc @paritytech/sdk-node

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: GitHub Action <action@github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants