Commit 21afdb7

paritytech-release-backport-bot[bot], alexggh, bkchr, and github-actions[bot] authored
[stable2506] Backport #9264 (#9276)
Backport #9264 into `stable2506` from alexggh.

See the [documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md) on how to use this bot.

<!--
  # To be used by other automation, do not modify:
  original-pr-number: #${pull_number}
-->

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Bastian Köcher <[email protected]>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
1 parent 861c780 commit 21afdb7

2 files changed

Lines changed: 32 additions & 7 deletions

polkadot/node/network/gossip-support/src/lib.rs

Lines changed: 11 additions & 7 deletions
@@ -90,13 +90,15 @@ const TRY_RERESOLVE_AUTHORITIES: Duration = Duration::from_secs(2);
 const LOW_CONNECTIVITY_WARN_DELAY: Duration = Duration::from_secs(600);
 
 /// If connectivity is lower than this in percent, issue warning in logs.
-const LOW_CONNECTIVITY_WARN_THRESHOLD: usize = 90;
+const LOW_CONNECTIVITY_WARN_THRESHOLD: usize = 85;
 
 /// The Gossip Support subsystem.
 pub struct GossipSupport<AD> {
     keystore: KeystorePtr,
 
     last_session_index: Option<SessionIndex>,
+    /// Whether we are currently an authority or not.
+    is_authority_now: bool,
     /// The minimum known session we build the topology for.
     min_known_session: SessionIndex,
     // Some(timestamp) if we failed to resolve
@@ -163,6 +165,7 @@ where
             min_known_session: u32::MAX,
             authority_discovery,
             finalized_needed_session: None,
+            is_authority_now: false,
             metrics,
         }
     }
@@ -282,6 +285,9 @@ where
                 "New session detected",
             );
             self.last_session_index = Some(session_index);
+            self.is_authority_now =
+                ensure_i_am_an_authority(&self.keystore, &session_info.discovery_keys)
+                    .is_ok();
         }
 
         // Connect to authorities from the past/present/future.
@@ -705,13 +711,11 @@ where
             .resolved_authorities
             .iter()
             .filter(|(a, _)| !self.connected_authorities.contains_key(a));
-        // TODO: Make that warning once connectivity issues are fixed (no point in warning, if
-        // we already know it is broken.
-        // https://github.com/paritytech/polkadot/issues/3921
-        if connected_ratio <= LOW_CONNECTIVITY_WARN_THRESHOLD {
-            gum::debug!(
+        if connected_ratio <= LOW_CONNECTIVITY_WARN_THRESHOLD && self.is_authority_now {
+            gum::error!(
                 target: LOG_TARGET,
-                "Connectivity seems low, we are only connected to {}% of available validators (see debug logs for details)", connected_ratio
+                session_index = self.last_session_index.as_ref().map(|s| *s).unwrap_or_default(),
+                "Connectivity seems low, we are only connected to {connected_ratio}% of available validators (see debug logs for details), if this persists more than a session action needs to be taken"
             );
         }
         let pretty = PrettyAuthorities(unconnected_authorities);
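
The net effect of the hunks above is that the low-connectivity message is emitted only when both conditions hold: the connected percentage is at or below the (now 85%) threshold, and the node was found to be an authority when the last session change was processed. A minimal standalone sketch of that gating logic follows. The helper names `connected_ratio` and `should_log_low_connectivity` are hypothetical and do not appear in the commit, and the way the percentage is derived here (integer division, 100 when nothing is resolved) is an assumption, since the diff does not show how `connected_ratio` is computed.

```rust
/// Threshold value taken from the diff: report at or below 85% connectivity.
const LOW_CONNECTIVITY_WARN_THRESHOLD: usize = 85;

/// Hypothetical helper (not in the commit): percentage of resolved
/// authorities that we are currently connected to.
/// Assumption: an empty resolved set counts as fully connected.
fn connected_ratio(connected: usize, resolved: usize) -> usize {
    if resolved == 0 {
        return 100;
    }
    connected * 100 / resolved
}

/// Hypothetical helper mirroring the condition introduced by the diff:
/// low connectivity is reported only while the node is an authority.
fn should_log_low_connectivity(connected_ratio: usize, is_authority_now: bool) -> bool {
    connected_ratio <= LOW_CONNECTIVITY_WARN_THRESHOLD && is_authority_now
}
```

Under these assumptions, a validator connected to 850 of 1000 resolved authorities would trigger the error, while a non-authority node with the same ratio would stay silent.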

prdoc/pr_9264.prdoc

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+title: 'gossip-support: make low connectivity message an error'
+doc:
+- audience: Node Dev
+  description: |-
+    All is not well when a validator is not properly connected. Examples of things that might happen:
+    - Finality might be slightly delayed because the validator will be a no-show, as it can't retrieve PoVs to validate approval work: https://github.com/paritytech/polkadot-sdk/issues/8915.
+    - When they author blocks they won't back things, because gossiping of backing statements happens using the grid topology, e.g. blocks authored by validators with a low number of peers:
+    https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Frpc-polkadot.helixstreet.io#/explorer/query/26931262
+    https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Frpc-polkadot.helixstreet.io#/explorer/query/26931260
+    https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Fpolkadot.api.onfinality.io%2Fpublic-ws#/explorer/query/26931334
+    https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Fpolkadot-public-rpc.blockops.network%2Fws#/explorer/query/26931314
+    https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Fpolkadot-public-rpc.blockops.network%2Fws#/explorer/query/26931292
+    https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Fpolkadot-public-rpc.blockops.network%2Fws#/explorer/query/26931447
+
+
+    The problem is seen in the `polkadot_parachain_peer_count` metrics, but it seems people are not monitoring that well enough, so let's make it more visible that nodes with low connectivity are not working in good conditions.
+
+    I also reduced the threshold to 85%, so that we don't trigger this error too eagerly.
+crates:
+- name: polkadot-gossip-support
+  bump: patch
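
The description notes that the problem is already visible in the `polkadot_parachain_peer_count` metric. As a rough illustration of how an operator-side check might read it, the sketch below sums every sample of a named metric from a Prometheus text-format scrape. The function name `sum_metric` is hypothetical, the label names used in any sample input are assumptions (the commit does not show the metric's labels), and the parser is deliberately simple: it does not handle `}` inside quoted label values.

```rust
/// Hypothetical helper (not part of the commit): sum all samples of `metric`
/// found in a Prometheus text-format scrape.
/// Assumption: label values never contain a '}' character.
fn sum_metric(scrape: &str, metric: &str) -> f64 {
    scrape
        .lines()
        .map(str::trim)
        // Skip blank lines and `# HELP` / `# TYPE` comment lines.
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| {
            let rest = line.strip_prefix(metric)?;
            // The name must be followed by a label set or whitespace, so that
            // longer metric names sharing the same prefix are not counted.
            let after_labels = if rest.starts_with('{') {
                &rest[rest.find('}')? + 1..]
            } else if rest.starts_with(char::is_whitespace) {
                rest
            } else {
                return None;
            };
            // The sample value is the first token after the optional labels.
            after_labels.split_whitespace().next()?.parse::<f64>().ok()
        })
        .sum()
}
```

Comparing the returned total against the expected validator count is one possible way to alert on low connectivity outside the node logs; the right comparison depends on how the metric is labeled in a given deployment.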
