Prevent dispute-coordinator from doing any work before the initial node sync is complete. #6682
…node sync is complete. On the first `ActiveLeavesUpdate`, the subsystem queries the runtime to obtain `RollingSessionWindow`. This often fails with `NotSupported` errors from the runtime API, because the first leaf update generated by the overseer is either the genesis block (when the local database is empty) or the last block seen before the node was stopped. The mitigation is to pass a `SyncOracle` instance when constructing the dispute coordinator and to do no work until the full sync is complete.
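The mitigation described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the `SyncOracle` trait here is a local stand-in with a single `is_major_syncing` method, and `FlagOracle`, `Signal`, and `run` are hypothetical names introduced only for the example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Local stand-in for the node's sync oracle (assumption: only this method matters here).
pub trait SyncOracle {
    fn is_major_syncing(&self) -> bool;
}

/// Hypothetical oracle backed by a shared flag, so the sync state can be flipped from outside.
pub struct FlagOracle(pub Arc<AtomicBool>);

impl SyncOracle for FlagOracle {
    fn is_major_syncing(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Simplified signals (hypothetical; the real overseer signals carry far more data).
pub enum Signal {
    /// A new active leaf, identified here only by its block number.
    ActiveLeaves(u32),
    /// Shut the subsystem down.
    Conclude,
}

/// Processes signals in order and returns the leaves that were actually acted on.
/// Leaf updates that arrive while the oracle reports a major sync are skipped.
pub fn run(oracle: &dyn SyncOracle, signals: Vec<Signal>) -> Vec<u32> {
    let mut handled = Vec::new();
    for signal in signals {
        match signal {
            Signal::Conclude => break,
            Signal::ActiveLeaves(number) => {
                if oracle.is_major_syncing() {
                    // Still syncing: do no work for this leaf.
                    continue;
                }
                handled.push(number);
            }
        }
    }
    handled
}
```

Note that the check consults the oracle when the signal is handled, not when it was produced.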
The CI pipeline was cancelled due to the failure of one of the required jobs.
    match ctx.recv().await? {
        FromOrchestra::Signal(OverseerSignal::Conclude) => return Ok(None),
        FromOrchestra::Signal(OverseerSignal::ActiveLeaves(update)) => {
            if sync_oracle.is_major_syncing() {
I feel like this could be racy:
- We receive the signal while still syncing, but we catch up quickly afterwards.
- `is_major_syncing` returns false, although we were syncing when receiving that update.
TL;DR: This PR is old and I have forgotten to close it in time.
Yes, it is. I was relying on the fact that we don't get ActiveLeaves during major sync (unless there is a reorg) so this should be the initial leaf.
But then I quickly realized this is a bad assumption and thought about putting a flag in ActiveLeavesUpdate indicating if we are in initial major sync.
Then @ordian filed #6694 and we decided to fix the problem more generally by blocking signals in overseer during initial major sync.
The result is #6689 which is still work in progress.
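To make the race and the flag idea from this thread concrete, here is a small sketch. It is purely illustrative: `LeafUpdate`, its `emitted_during_major_sync` field, and both helper functions are hypothetical and do not correspond to the actual `ActiveLeavesUpdate` type or to what #6689 implements.

```rust
/// Hypothetical leaf update that records the sync state observed when it was created.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LeafUpdate {
    pub number: u32,
    /// True if the node was in a major sync when this update was emitted.
    pub emitted_during_major_sync: bool,
}

/// Racy variant: decides based on the sync state at the time the update is handled.
/// If the node caught up between emission and handling, the update is processed
/// even though it was produced during the sync.
pub fn should_process_racy(_update: &LeafUpdate, syncing_now: bool) -> bool {
    !syncing_now
}

/// Flag-based variant: decides based on the state recorded in the update itself,
/// so the outcome does not depend on when the update is handled.
pub fn should_process_flagged(update: &LeafUpdate) -> bool {
    !update.emitted_during_major_sync
}
```

With an update emitted during the sync but handled after it finished, the first function lets it through while the second does not.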
Closing this in favor of #6689.
Related to paritytech/polkadot-sdk#793