Share peers between syncing strategies #2814
For block and state requests. No justification requests yet.
3664ba2 to 02259bc
Co-authored-by: Aaro Altonen <[email protected]>
This reverts commit 4bb9a7b.
altonen left a comment
Did a first pass and left some comments but I will have to go over it again.
I'm not super excited about `peer_best_blocks`, or about the fact that we store peers both in `PeerPool` and in each strategy and then deal with all the possible inconsistencies that follow from that at runtime. Ideally `PeerPool` would store strategy-specific data, but I don't know how feasible that is and I'll do some testing. I also think we may experience yet unknown issues if `GapSync`, `ChainSync` and Sync2, or whatever its name will be, have independent (and potentially differing) views of peers. If peer best and common numbers were stored in `PeerPool`, all strategies could query and update them and we'd get rid of `peer_best_blocks`. What do you think?
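A minimal sketch of that idea, assuming `PeerPool` owns the shared per-peer data; `PeerData`, `add_peer`, `best_number` and `update_common_number` are all hypothetical names, with `PeerId` and block numbers simplified to plain integers:

```rust
use std::collections::HashMap;

// Hypothetical simplifications of the real network types.
type PeerId = u64;
type BlockNumber = u64;

struct PeerData {
    best_number: BlockNumber,
    common_number: BlockNumber,
}

/// A `PeerPool` that owns the per-peer view shared by all strategies,
/// which would make a separate `peer_best_blocks` map redundant.
#[derive(Default)]
struct PeerPool {
    peers: HashMap<PeerId, PeerData>,
}

impl PeerPool {
    fn add_peer(&mut self, id: PeerId, best_number: BlockNumber) {
        self.peers.insert(id, PeerData { best_number, common_number: 0 });
    }

    /// Any strategy queries the shared view instead of keeping its own copy.
    fn best_number(&self, id: &PeerId) -> Option<BlockNumber> {
        self.peers.get(id).map(|p| p.best_number)
    }

    /// Any strategy updates the shared view, e.g. on a block announcement.
    fn update_common_number(&mut self, id: &PeerId, new: BlockNumber) {
        if let Some(p) = self.peers.get_mut(id) {
            p.common_number = p.common_number.max(new);
        }
    }
}
```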
I also think `allowed_requests` has to go, because `ChainSync` now checks the availability of a peer three times, and the likelihood that at some point during future refactorings one of them will go out of sync with the others is non-zero.
```rust
state: Option<StateStrategy<B>>,
chain_sync: Option<ChainSync<B, Client>>,
peer_pool: Arc<Mutex<PeerPool>>,
peer_best_blocks: HashMap<PeerId, (B::Hash, NumberFor<B>)>,
```
It's needed to seed the peers when switching between the strategies. Otherwise, for example, the state strategy won't be aware of the best hash/number (and won't know what peer to request a state from) until a block announcement is received from the peer.
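A rough illustration of the seeding, with all names hypothetical and types simplified (`PeerId` as an integer, `Hash` as a byte array): on a strategy switch, the new strategy is constructed from `peer_best_blocks` instead of starting with an empty peer view.

```rust
use std::collections::HashMap;

// Hypothetical simplifications of the real types.
type PeerId = u64;
type BlockNumber = u64;
type Hash = [u8; 32];

/// Sketch of a state strategy seeded from `peer_best_blocks` on a
/// strategy switch, instead of waiting for fresh block announcements.
struct StateStrategy {
    peers: HashMap<PeerId, (Hash, BlockNumber)>,
}

impl StateStrategy {
    fn new(peer_best_blocks: &HashMap<PeerId, (Hash, BlockNumber)>) -> Self {
        Self { peers: peer_best_blocks.clone() }
    }

    /// Without seeding this would find nothing until an announcement
    /// arrived; with seeding, a state request target is known immediately.
    fn peer_to_request_state_from(&self, target: BlockNumber) -> Option<PeerId> {
        self.peers
            .iter()
            .find(|(_, (_, best))| *best >= target)
            .map(|(id, _)| *id)
    }
}
```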
```diff
-#[must_use]
-fn add_peer_inner(
+/// Process new peers assigning proper states and initiating requests.
+fn handle_new_peers(
```
It's unclear to me why this function is needed and why the new peers can't be handled in `ChainSync::block_requests()`.
I'll look if this can be simplified.
```rust
self.actions.push(ChainSyncAction::CancelRequest { peer_id });
self.peer_pool.lock().free_peer(&peer_id);
```
I wonder if there's a race condition here. The peer is immediately freed, but the cancellation is postponed until `SyncingEngine` processes the event, so if some other strategy selects the peer while the request hasn't been canceled yet, it could result in two in-flight requests. Maybe the strategy/`SyncingStrategy` should be notified by `SyncingEngine` once the request is actually canceled, so that freeing the peer is safe.
It's even worse than that. When another strategy initiates the request, the old one is automatically dropped. And when we finally cancel the request, we can cancel a legitimate request of another strategy.
I'm not super happy with any kind of ACKing and bookkeeping, but it looks like we need to keep track of pending cancellations in every strategy and free peers in PeerPool when something like on_request_cancelled() is heard from SyncingEngine.
Is there a reason why pending cancellations couldn't be stored in `SyncingStrategy`, so that once the request has been canceled, `SyncingEngine` calls `SyncingStrategy`, which releases the peer?
This means intercepting request cancellation actions in `SyncingStrategy` on their way to `SyncingEngine`. Not the best design ever, but it's probably better than dealing with it in every strategy.
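A sketch of the bookkeeping under discussion, with hypothetical names (`on_action`, `on_request_cancelled`, `freed`): the strategy records the cancellation as pending and only frees the peer once the engine confirms that the request was actually canceled.

```rust
use std::collections::HashSet;

// Hypothetical simplification of the real peer identifier.
type PeerId = u64;

/// Stand-in for the real strategy actions; only cancellation matters here.
enum Action {
    CancelRequest { peer_id: PeerId },
}

#[derive(Default)]
struct SyncingStrategy {
    /// Cancellations sent to `SyncingEngine` but not yet confirmed.
    pending_cancellations: HashSet<PeerId>,
    /// Stand-in for `PeerPool::free_peer`: peers safe to reserve again.
    freed: HashSet<PeerId>,
}

impl SyncingStrategy {
    /// Intercept a cancellation action on its way to `SyncingEngine`:
    /// the peer is NOT freed yet, only remembered as pending.
    fn on_action(&mut self, action: &Action) {
        match action {
            Action::CancelRequest { peer_id } => {
                self.pending_cancellations.insert(*peer_id);
            },
        }
    }

    /// Called once the request is actually canceled; only now is it
    /// safe to free the peer for other strategies.
    fn on_request_cancelled(&mut self, peer_id: PeerId) {
        if self.pending_cancellations.remove(&peer_id) {
            self.freed.insert(peer_id);
        }
    }
}
```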
If we store the peer's best and common blocks in `PeerPool`, we'd get rid of `peer_best_blocks`, allow `SyncingEngine` to replace its `HashMap<PeerId, Peer<B>>` with `PeerPool`, and also allow it to free the peer once the request is canceled.
But now that I'm thinking this whole approach again, is there any fundamental reason why GapSync and ChainSync couldn't send a request to the same peer at the same time? I believe this limitation only applies to the sending end, namely the strategies, and made sense before when everything was one state machine. The request handlers shouldn't care how many requests from the same peer are in the queue. If GapSync and ChainSync are separate, is there a valid reason why they couldn't send simultaneous requests to a peer? We must ascertain that two copies of the same request are not sent because that'd get us banned, implying an existence of some kind of PeerPool for shared data, but is there a reason why they couldn't send two different requests at the same time? So the limitation of one request per peer still applies but now it's per strategy.
> is there any fundamental reason why GapSync and ChainSync couldn't send a request to the same peer at the same time?
Valid point, there is nothing in the block request handler that forbids "simultaneous" requests. But if we get rid of the global `PeerPool`, we'll need to introduce a way to cancel specific requests, as otherwise strategies could cancel each other's requests.
Each strategy could be identified by a unique key, and when a strategy returns `StrategyAction::SendRequest { PeerId, Request }`, `SyncingStrategy` would convert it to `SyncingAction::SendRequest { PeerId, Key, Request }`, and `PendingResponses` would keep track of responses keyed by `(PeerId, Key)`.
Yes, that's what I had in mind — attaching strategy IDs to requests.
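That could look roughly like this; `StrategyKey` and the shape of `PendingResponses` are illustrative, with the in-flight response payload simplified to a string standing in for the real response future:

```rust
use std::collections::HashMap;

// Hypothetical simplification of the real peer identifier.
type PeerId = u64;

/// Hypothetical per-strategy key attached to every outgoing request.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum StrategyKey {
    ChainSync,
    GapSync,
}

/// Responses tracked per `(PeerId, StrategyKey)`, so two strategies can
/// have simultaneous in-flight requests to the same peer without
/// canceling each other's.
#[derive(Default)]
struct PendingResponses {
    in_flight: HashMap<(PeerId, StrategyKey), String>,
}

impl PendingResponses {
    fn insert(&mut self, peer_id: PeerId, key: StrategyKey, request: String) {
        self.in_flight.insert((peer_id, key), request);
    }

    /// A strategy can only cancel its own request to a given peer.
    fn cancel(&mut self, peer_id: PeerId, key: StrategyKey) -> Option<String> {
        self.in_flight.remove(&(peer_id, key))
    }
}
```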
Co-authored-by: Aaro Altonen <[email protected]>
```rust
common_number: Zero::zero(),
best_hash,
best_number,
state: PeerSyncState::New,
```
A new peer and an Available peer are handled differently in the code, and the idea was to postpone peer initialization that can lead to requests (`handle_new_peers()`) until `actions()`, so that we can call the strategies in a specific order and implement priorities for reserving peers (e.g., Sync 2.0, then `ChainSync`, then `GapSync`).
```rust
    (Some(PeerSyncState::Available), None)
} else {
    if self.peer_pool.try_reserve_peer(&peer_id) {
```
Nit: Can we move this `if` to an `else if` one line above? All the indentation in this file makes it a bit hard to follow.
```rust
        as the peer is reserved by another syncing strategy.",
    );
    (None, None)
```
Nit: I think you could directly return `None` here; then you could get rid of the `if let Some()` a few lines below.
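A toy example of the suggested simplification, with purely illustrative names: an early `return None` replaces threading a `(None, None)` tuple through a later `if let Some(...)`.

```rust
// Illustrative stand-in for the reservation logic under review.
fn reserve(peer_reserved_elsewhere: bool) -> Option<u32> {
    if peer_reserved_elsewhere {
        // The peer is reserved by another syncing strategy: bail out early
        // instead of carrying `None` forward through the rest of the function.
        return None;
    }
    // Otherwise hand back some request id (value is arbitrary here).
    Some(42)
}
```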
```rust
    Ok(None)
},
}
self.allowed_requests.add(&peer_id);
```
Hmm, not directly related to this PR, but how does this `allowed_requests` work? It looks like it contains peers that we could potentially send a block request to. However, I don't fully understand why we reset it regularly to `All`, like here.
I don't completely understand the logic behind it either; it looks like the only useful thing it's doing is blocking block requests during state download in fast sync.
```diff
 std::iter::from_fn(move || {
-    if let Some((peer, request)) = matcher.next(peers) {
+    if let Some((peer_id, request)) = matcher.next(peers, peer_pool) {
+        // TODO: reserve the peer in `PeerPool`.
```
Leftover todo?
Edit: Ah, it's already done in the `next` call, it looks like.
```rust
for mut available_peer in self.peer_pool.lock().available_peers() {
    let peer_id = available_peer.peer_id();
    if let Some(peer) = self.peers.get_mut(&peer_id) {
        if peer.state.is_available() && peer.common_number >= sync.target_number() {
```
Here we check `peer.state.is_available()` even though the peer is in the `peer_pool` as available. I understand that the pool is shared between the strategies, but I am not sure whether it is legal for a peer to be available in the peer pool but not in `chain_sync`.
Yes, it should not be needed.
@skunert Thanks for reviewing the PR, but I'm about to publish another one that should completely supersede it. So, please don't spend more time reviewing this one for now.

Closing in favor of #3224.
This PR should supersede #2814 and accomplish the same with fewer changes. It's needed to run sync strategies in parallel, like running `ChainSync` and `GapSync` as independent strategies, and running `ChainSync` and Sync 2.0 alongside each other. The difference from #2814 is that we allow simultaneous requests to remote peers initiated by different strategies, as this is not tracked on the remote node in any way. Therefore, `PeerPool` is not needed. CC @skunert --------- Co-authored-by: Sebastian Kunert <[email protected]>
Introduce the ability to share peers between syncing strategies and reserve them for requests. This is needed to run `GapSync` as a separate strategy and, ultimately, run Sync 2.0 alongside `ChainSync`.