Skip to content

Commit d3204cb

Browse files
alexgghjpserrat
authored andcommitted
[1 / 5] Optimize logic for gossiping assignments (paritytech#4848)
This is part of the work to further optimize the approval subsystems, if you want to understand the full context start with reading paritytech#4849 (comment), however that's not necessary, as this change is self-contained and nodes would benefit from it regardless of subsequent changes landing or not. While testing with 1000 validators I found out that the logic for determining the validators an assignment should be gossiped to is taking a lot of time, because it always iterated through all the peers, to determine which are X and Y neighbours and to which we should randomly gossip(4 samples). This could be actually optimised, so we don't have to iterate through all peers for each new assignment, by fetching the list of X and Y peer ids from the topology first and then stopping the loop once we took the 4 random samples. With this improvements we reduce the total CPU time spent in approval-distribution with 15% on networks with 500 validators and 20% on networks with 1000 validators. ## Test coverage: `propagates_assignments_along_unshared_dimension` and `propagates_locally_generated_assignment_to_both_dimensions` cover already logic and they passed, confirm that there is no breaking change. Additionally, the approval voting benchmark measure the traffic sent to other peers, so I confirmed that for various network size there is no difference in the size of the traffic sent to other peers. --------- Signed-off-by: Alexandru Gheorghe <[email protected]>
1 parent e8af1d7 commit d3204cb

4 files changed

Lines changed: 65 additions & 13 deletions

File tree

polkadot/node/network/approval-distribution/src/lib.rs

Lines changed: 21 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1431,6 +1431,21 @@ impl State {
14311431
let required_routing = topology.map_or(RequiredRouting::PendingTopology, |t| {
14321432
t.local_grid_neighbors().required_routing_by_index(validator_index, local)
14331433
});
1434+
// Peers that we will send the assignment to.
1435+
let mut peers = HashSet::new();
1436+
1437+
let peers_to_route_to = topology
1438+
.as_ref()
1439+
.map(|t| t.peers_to_route(required_routing))
1440+
.unwrap_or_default();
1441+
1442+
for peer in peers_to_route_to {
1443+
if !entry.known_by.contains_key(&peer) {
1444+
continue
1445+
}
1446+
1447+
peers.insert(peer);
1448+
}
14341449

14351450
// All the peers that know the relay chain block.
14361451
let peers_to_filter = entry.known_by();
@@ -1456,20 +1471,13 @@ impl State {
14561471
let n_peers_total = self.peer_views.len();
14571472
let source_peer = source.peer_id();
14581473

1459-
// Peers that we will send the assignment to.
1460-
let mut peers = Vec::new();
1461-
14621474
// Filter destination peers
14631475
for peer in peers_to_filter.into_iter() {
14641476
if Some(peer) == source_peer {
14651477
continue
14661478
}
14671479

1468-
if let Some(true) = topology
1469-
.as_ref()
1470-
.map(|t| t.local_grid_neighbors().route_to_peer(required_routing, &peer))
1471-
{
1472-
peers.push(peer);
1480+
if peers.contains(&peer) {
14731481
continue
14741482
}
14751483

@@ -1485,7 +1493,11 @@ impl State {
14851493

14861494
if route_random {
14871495
approval_entry.routing_info_mut().mark_randomly_sent(peer);
1488-
peers.push(peer);
1496+
peers.insert(peer);
1497+
}
1498+
1499+
if approval_entry.routing_info().random_routing.is_complete() {
1500+
break
14891501
}
14901502
}
14911503

polkadot/node/network/approval-distribution/src/tests.rs

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2404,7 +2404,7 @@ fn propagates_locally_generated_assignment_to_both_dimensions() {
24042404
let assignments = vec![(cert.clone(), candidate_index)];
24052405
let approvals = vec![approval.clone()];
24062406

2407-
let assignment_sent_peers = assert_matches!(
2407+
let mut assignment_sent_peers = assert_matches!(
24082408
overseer_recv(overseer).await,
24092409
AllMessages::NetworkBridgeTx(NetworkBridgeTxMessage::SendValidationMessage(
24102410
sent_peers,
@@ -2428,12 +2428,14 @@ fn propagates_locally_generated_assignment_to_both_dimensions() {
24282428
assert_matches!(
24292429
overseer_recv(overseer).await,
24302430
AllMessages::NetworkBridgeTx(NetworkBridgeTxMessage::SendValidationMessage(
2431-
sent_peers,
2431+
mut sent_peers,
24322432
Versioned::V1(protocol_v1::ValidationProtocol::ApprovalDistribution(
24332433
protocol_v1::ApprovalDistributionMessage::Approvals(sent_approvals)
24342434
))
24352435
)) => {
24362436
// Random sampling is reused from the assignment.
2437+
sent_peers.sort();
2438+
assignment_sent_peers.sort();
24372439
assert_eq!(sent_peers, assignment_sent_peers);
24382440
assert_eq!(sent_approvals, approvals);
24392441
}
@@ -2678,7 +2680,7 @@ fn propagates_to_required_after_connect() {
26782680
let assignments = vec![(cert.clone(), candidate_index)];
26792681
let approvals = vec![approval.clone()];
26802682

2681-
let assignment_sent_peers = assert_matches!(
2683+
let mut assignment_sent_peers = assert_matches!(
26822684
overseer_recv(overseer).await,
26832685
AllMessages::NetworkBridgeTx(NetworkBridgeTxMessage::SendValidationMessage(
26842686
sent_peers,
@@ -2702,12 +2704,14 @@ fn propagates_to_required_after_connect() {
27022704
assert_matches!(
27032705
overseer_recv(overseer).await,
27042706
AllMessages::NetworkBridgeTx(NetworkBridgeTxMessage::SendValidationMessage(
2705-
sent_peers,
2707+
mut sent_peers,
27062708
Versioned::V1(protocol_v1::ValidationProtocol::ApprovalDistribution(
27072709
protocol_v1::ApprovalDistributionMessage::Approvals(sent_approvals)
27082710
))
27092711
)) => {
27102712
// Random sampling is reused from the assignment.
2713+
sent_peers.sort();
2714+
assignment_sent_peers.sort();
27112715
assert_eq!(sent_peers, assignment_sent_peers);
27122716
assert_eq!(sent_approvals, approvals);
27132717
}

polkadot/node/network/protocol/src/grid_topology.rs

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -313,6 +313,23 @@ impl SessionGridTopologyEntry {
313313
self.topology.is_validator(peer)
314314
}
315315

316+
/// Returns the list of peers to route based on the required routing.
317+
pub fn peers_to_route(&self, required_routing: RequiredRouting) -> Vec<PeerId> {
318+
match required_routing {
319+
RequiredRouting::All => self.topology.peer_ids.iter().copied().collect(),
320+
RequiredRouting::GridX => self.local_neighbors.peers_x.iter().copied().collect(),
321+
RequiredRouting::GridY => self.local_neighbors.peers_y.iter().copied().collect(),
322+
RequiredRouting::GridXY => self
323+
.local_neighbors
324+
.peers_x
325+
.iter()
326+
.chain(self.local_neighbors.peers_y.iter())
327+
.copied()
328+
.collect(),
329+
RequiredRouting::None | RequiredRouting::PendingTopology => Vec::new(),
330+
}
331+
}
332+
316333
/// Updates the known peer ids for the passed authorities ids.
317334
pub fn update_authority_ids(
318335
&mut self,
@@ -524,6 +541,11 @@ impl RandomRouting {
524541
pub fn inc_sent(&mut self) {
525542
self.sent += 1
526543
}
544+
545+
/// Returns `true` if we already took all the necessary samples.
546+
pub fn is_complete(&self) -> bool {
547+
self.sent >= self.target
548+
}
527549
}
528550

529551
/// Routing mode

prdoc/pr_4848.prdoc

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
title: Optimize logic for gossiping assignments
2+
3+
doc:
4+
- audience: Node Dev
5+
description: |
6+
Optimize the logic for gossiping assignments by obtaining the list of peer ids
7+
from the topology instead of iterating through all connected validators, this
8+
gives us a 15% to 20% reduction in cpu usage.
9+
10+
crates:
11+
- name: polkadot-approval-distribution
12+
bump: minor
13+
- name: polkadot-node-network-protocol
14+
bump: minor

0 commit comments

Comments
 (0)