Do not run unneeded subsystems on collator and its alongside node#3061
Do not run unneeded subsystems on collator and its alongside node#3061s0me0ne-unkn0wn merged 10 commits intomasterfrom
Conversation
|
The CI pipeline was cancelled due to failure one of the required jobs. |
ordian
left a comment
There was a problem hiding this comment.
I quite like this! At the cost of some minor code duplication, we get rid of some hacky conditional logic and gain more flexibility in instantiating different type of overseer for collators.
Noticed a couple of things that can be cleaned up further, but other than that LGTM.
|
Nice! Do you see anything speaking against reusing this also for the minimal collator node? In the end we have the nearly identical overseer here again: |
Yeah, that's a good point, and that's what I tried first. It seemed to be quite logical but came out as usual. The current "full scale" overseer is implemented through the |
|
looks like cumulus pov recovery test is failing, @skunert do you know why wouldn't it work with minimal overseer? |
|
@ordian looks like I forgot to include something required 🤔 |
|
Fixed it |
skunert
left a comment
There was a problem hiding this comment.
Yeah, that's a good point, and that's what I tried first. It seemed to be quite logical but came out as usual. The current "full scale" overseer is implemented through the OverseerBuilder which is generic over RuntimeClient, and over DefaultSubsystemClient as well. They are two different levels of API implementation and work fine like that.
Hmm I have not taken a deep dive on that code, but on first sight it looks like DefaultSubsystemClient only implements the single trait RuntimeApiSubsystemClient. So why does the overseer building codepath even need to know about it. Can we not modify that code to just take some type that implements RuntimeApiSubsystemClient (and iirc AuxStore and ChainApiBackend). This is satisfied by both BlockChainRpcClient and DefaultSubsystemClient.
But anyway, I don't insist that this needs to be investigated right now. Code looks good 👍
|
@skunert makes sense! I'll try to follow up on that. |
resolve #3116 a follow-up on #3061 (review): - [x] reuse collator overseer builder for polkadot-node and collator - [x] run zombienet test (0001-parachains-smoke-test.toml) - [x] make wasm build errors more user-friendly for an easier problem detection when using different toolchains in Rust --------- Co-authored-by: ordian <[email protected]> Co-authored-by: s0me0ne-unkn0wn <[email protected]>
resolve paritytech#3116 a follow-up on paritytech#3061 (review): - [x] reuse collator overseer builder for polkadot-node and collator - [x] run zombienet test (0001-parachains-smoke-test.toml) - [x] make wasm build errors more user-friendly for an easier problem detection when using different toolchains in Rust --------- Co-authored-by: ordian <[email protected]> Co-authored-by: s0me0ne-unkn0wn <[email protected]>
Currently, collators and their alongside nodes spin up a full-scale overseer running a bunch of subsystems that are not needed if the node is not a validator. That was considered to be harmless; however, we've got problems with unused subsystems getting stalled for a reason not currently known, resulting in the overseer exiting and bringing down the whole node.
This PR aims to only run needed subsystems on such nodes, replacing the rest with
DummySubsystem.It also enables collator-optimized availability recovery subsystem implementation.
Partially solves #1730.