Prepare syncing for parallel sync strategies #3224
skunert
left a comment
The logic LGTM! Just left some questions and minor remarks.
While reading, I was asking myself whether it would be feasible to use a message-based system for the individual strategies too. Instead of the syncing engine calling individual on_XY handlers of the strategy, it could pass a message down to the individual strategies, and each strategy would decide whether it wants to handle it.
This would remove the need for the strategy to know which substrategy is interested in the individual results. But yeah, this is more of an educational question for me (and maybe there are cases where this does not even work), nothing to act upon.
Firstly, I'd like to comment on the feasibility of a fully async interface with strategies. When we implemented a bidirectional async interface between ProtocolController and Notifications, it turned out that it breaks the constraints of the originally half-synchronous interface (where we call on_XY handlers on ProtocolController and poll it as a stream for actions), requiring some tricks for handling duplicate messages and discarding some "invalid" messages. We decided to just live with some smaller inconsistencies, because a proper implementation would require a complex ACKing system with state machines in both ProtocolController and Notifications, each with lots of states. In syncing, our goal is to focus on the syncing state machines, not on message-passing state machines; this is why we got rid of all the polling in the strategies.
On the other hand, what could work is introducing a synchronous subscriber-like system, where we call a generic on_event(event: Event) handler that dispatches the events to specific strategies down the tree. Logically, this would be the same as calling specific on_XY handlers, but it could replace the manual matches on active strategies with a subscriber-looking system, where a strategy instead registers itself for specific events. I'm not sure, though, whether it's possible to have a "proper" Rust implementation of this without downcasting abstract events when they reach specific strategies. Otherwise, it looks like event matching would just move into the strategies, with the burden of them knowing about all the event types.
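To make the subscriber idea concrete, here is a minimal sketch of a synchronous on_event dispatcher. Everything in it (`Event`, `EventKind`, `Subscriber`, `Dispatcher`, `ChainLike`, `StateLike`) is a hypothetical name made up for illustration and not the actual syncing code. It assumes a closed `enum` of events, which avoids downcasting but, as noted above, leaves each strategy matching on the event type itself.

```rust
use std::collections::HashMap;

/// Hypothetical closed set of sync events (not the real Substrate types).
#[derive(Debug, Clone, PartialEq)]
enum Event {
    BlockResponse { blocks: usize },
    StateResponse { bytes: usize },
    PeerDisconnected,
}

/// Payload-free tag used as the subscription key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum EventKind {
    BlockResponse,
    StateResponse,
    PeerDisconnected,
}

impl Event {
    fn kind(&self) -> EventKind {
        match self {
            Event::BlockResponse { .. } => EventKind::BlockResponse,
            Event::StateResponse { .. } => EventKind::StateResponse,
            Event::PeerDisconnected => EventKind::PeerDisconnected,
        }
    }
}

/// A strategy declares its interests once and is then called synchronously.
trait Subscriber {
    fn name(&self) -> &'static str;
    fn interests(&self) -> Vec<EventKind>;
    fn on_event(&mut self, event: &Event);
}

struct Dispatcher {
    subscribers: Vec<Box<dyn Subscriber>>,
    /// Event kind -> indices into `subscribers`, in registration order.
    routes: HashMap<EventKind, Vec<usize>>,
}

impl Dispatcher {
    fn new() -> Self {
        Dispatcher { subscribers: Vec::new(), routes: HashMap::new() }
    }

    fn register(&mut self, subscriber: Box<dyn Subscriber>) {
        let index = self.subscribers.len();
        for kind in subscriber.interests() {
            self.routes.entry(kind).or_default().push(index);
        }
        self.subscribers.push(subscriber);
    }

    /// Dispatches synchronously and returns the names of the subscribers called.
    fn on_event(&mut self, event: &Event) -> Vec<&'static str> {
        let mut called = Vec::new();
        if let Some(indices) = self.routes.get(&event.kind()) {
            for &i in indices {
                self.subscribers[i].on_event(event);
                called.push(self.subscribers[i].name());
            }
        }
        called
    }
}

struct ChainLike { blocks_seen: usize }

impl Subscriber for ChainLike {
    fn name(&self) -> &'static str { "chain" }
    fn interests(&self) -> Vec<EventKind> {
        vec![EventKind::BlockResponse, EventKind::PeerDisconnected]
    }
    fn on_event(&mut self, event: &Event) {
        // The event matching has moved into the strategy.
        match event {
            Event::BlockResponse { blocks } => self.blocks_seen += *blocks,
            Event::PeerDisconnected => self.blocks_seen = 0,
            _ => {}
        }
    }
}

struct StateLike { bytes_seen: usize }

impl Subscriber for StateLike {
    fn name(&self) -> &'static str { "state" }
    fn interests(&self) -> Vec<EventKind> {
        vec![EventKind::StateResponse, EventKind::PeerDisconnected]
    }
    fn on_event(&mut self, event: &Event) {
        match event {
            Event::StateResponse { bytes } => self.bytes_seen += *bytes,
            Event::PeerDisconnected => self.bytes_seen = 0,
            _ => {}
        }
    }
}
```

Registering both strategies and feeding a block response through `Dispatcher::on_event` would only reach the chain-like one, while a disconnect reaches both. The routing table replaces the manual matches in the parent, but the `match` inside each `on_event` is exactly the burden mentioned above.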
@altonen do you have something to add?
What Sebastian is proposing is similar to the trait approach I've been harping on about. If we store the active strategies in HashMap<StrategyKey, Box<dyn Strategy>>, SyncingStrategy could iterate over all active strategies when handling an event, and each strategy could decide whether it wants to handle that particular event. Of course, if a key is provided, e.g., when a response is received, then SyncingStrategy would only call the specified strategy. This would clean up much of the code in SyncingStrategy and would allow plugging in custom syncing implementations.
Like you described in the second paragraph, I don't see how Sebastian's proposal necessarily implies any async code though. I think the code would still work the way it does now, but instead of SyncingStrategy explicitly checking whether a strategy could be interested in an event, it would make no assumptions and simply pass the event to the strategy, which would ignore it if it's not interested.
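A rough sketch of how that map-based dispatch could look, purely as an illustration: the `Event` variants, the `bool` return value, and the `ChainStrategy`/`GapStrategy` bodies are assumptions invented for this example, and only the shape of the strategy-keyed map of `Box<dyn Strategy>` comes from the comment above.

```rust
use std::collections::HashMap;

/// Hypothetical keys for the active strategies.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum StrategyKey {
    Chain,
    Gap,
}

/// Hypothetical events; the real ones would carry peer ids, requests, etc.
#[derive(Debug)]
enum Event {
    BlockAnnounce(u64),
    BlockResponse(Vec<u64>),
}

trait Strategy {
    /// Returns `true` if the event was handled, `false` if it was ignored.
    fn on_event(&mut self, event: &Event) -> bool;
}

#[derive(Default)]
struct ChainStrategy { best_announced: u64, imported: usize }

impl Strategy for ChainStrategy {
    fn on_event(&mut self, event: &Event) -> bool {
        match event {
            Event::BlockAnnounce(n) => { self.best_announced = self.best_announced.max(*n); true }
            Event::BlockResponse(blocks) => { self.imported += blocks.len(); true }
        }
    }
}

#[derive(Default)]
struct GapStrategy { imported: usize }

impl Strategy for GapStrategy {
    fn on_event(&mut self, event: &Event) -> bool {
        match event {
            Event::BlockResponse(blocks) => { self.imported += blocks.len(); true }
            // Not interested in announcements: silently ignore.
            Event::BlockAnnounce(_) => false,
        }
    }
}

struct SyncingStrategy {
    strategies: HashMap<StrategyKey, Box<dyn Strategy>>,
}

impl SyncingStrategy {
    fn new() -> Self {
        SyncingStrategy { strategies: HashMap::new() }
    }

    fn add(&mut self, key: StrategyKey, strategy: Box<dyn Strategy>) {
        self.strategies.insert(key, strategy);
    }

    /// With `Some(key)` only that strategy is called (e.g. for a response);
    /// with `None` the event is offered to every active strategy.
    /// Returns the keys of the strategies that handled the event, sorted.
    fn on_event(&mut self, key: Option<StrategyKey>, event: &Event) -> Vec<StrategyKey> {
        let mut handled = Vec::new();
        match key {
            Some(k) => {
                if let Some(strategy) = self.strategies.get_mut(&k) {
                    if strategy.on_event(event) {
                        handled.push(k);
                    }
                }
            }
            None => {
                for (k, strategy) in self.strategies.iter_mut() {
                    if strategy.on_event(event) {
                        handled.push(*k);
                    }
                }
            }
        }
        // HashMap iteration order is unspecified, so sort for a stable result.
        handled.sort();
        handled
    }
}
```

Note that nothing here is async: the parent makes no assumption about who is interested, and an uninterested strategy simply returns `false`.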
Yes, the second paragraph captures pretty well what I meant. When we at some point add more strategies with different response handlers, we would just add another message instead of adding a new on_XY to both Strategy and the concrete implementation.
If I got it right, what @altonen is proposing implies that all strategies handle all event types. I.e., they should provide on_XY handlers for all XY, implementing some generic Strategy trait, even though they may not be interested in all XY.
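One possible way to soften this, assuming default method bodies on the trait are acceptable, is to give every on_XY handler a no-op default so that a strategy only overrides the handlers it cares about. The handler names and the `StateOnly` type below are hypothetical and only illustrate the pattern.

```rust
/// Hypothetical generic trait: every handler defaults to "ignore".
trait Strategy {
    fn on_block_response(&mut self, _blocks: usize) {}
    fn on_state_response(&mut self, _bytes: usize) {}
    fn on_warp_proof_response(&mut self, _proofs: usize) {}
}

/// A strategy that is only interested in state responses.
#[derive(Default)]
struct StateOnly {
    bytes: usize,
}

impl Strategy for StateOnly {
    // Only this handler is overridden; the other two fall back to the no-ops.
    fn on_state_response(&mut self, bytes: usize) {
        self.bytes += bytes;
    }
}

/// Calls every handler through a trait object, the way a parent strategy
/// that makes no assumptions about interest might.
fn feed_all(strategy: &mut dyn Strategy) {
    strategy.on_block_response(3);
    strategy.on_state_response(10);
    strategy.on_warp_proof_response(1);
}
```

With this shape, adding a new on_XY to the trait does not force changes in existing strategies, at the cost of silently ignoring events a strategy forgot to handle.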
Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
This PR should supersede #2814 and accomplish the same with fewer changes. It's needed to run sync strategies in parallel, like running `ChainSync` and `GapSync` as independent strategies, and running `ChainSync` and Sync 2.0 alongside each other.
The difference with #2814 is that we allow simultaneous requests to remote peers initiated by different strategies, as this is not tracked on the remote node in any way. Therefore, `PeerPool` is not needed.
Builds upon #2467.
CC @skunert