Rework `dispute-coordinator` to use `RuntimeInfo` for obtaining session information instead of `RollingSessionWindow` (#6968)
…it should be an async function
Adjust `dispute-coordinator` initialization to use `RuntimeInfo`
…an be made Remove some fixmes
Rework new session handling code
eskimor
left a comment
Looks good overall. We can maximize robustness (left comments) by using different relay_parents in pre-filling and actual use where possible. This way if at least one of them works, we are good, which means maximum robustness against errors. (Pruning, migration problems, ..)
BradleyOlson64
left a comment
Looks good to me. Getting rid of the rolling session window demystifies session logic a lot! Much appreciated.
This is not true. The runtime-api cache also caches session info by session index only. So you have a valid point that making the LRU size 6 is not strictly necessary. I think it is a good idea regardless, as we have better control and, more importantly, a guarantee*) that the last 6 sessions are indeed cached and not already pruned. *) In the absence of errors.
I hadn't noticed that.
I agree, I've modified it already.
If you look at the runtime-api cache for session info, it actually ignores the relay parent: polkadot/node/core/runtime-api/src/lib.rs Line 137 in ecad912
That's true. I think @eskimor mentioned that somewhere but I had forgotten to edit the description.
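To illustrate the point made in this thread, here is a minimal sketch of a cache keyed by session index only, so that the relay parent matters only on a cache miss. It is not the actual `runtime-api` or `RuntimeInfo` code: `SessionCache`, `get_or_fetch`, the `fetch` closure and the simplified `SessionInfo` are hypothetical names, and a plain `HashMap` stands in for the LRU.

```rust
use std::collections::HashMap;

type SessionIndex = u32;
// Hypothetical stand-in for a relay chain block hash.
type Hash = [u8; 32];

// Hypothetical, heavily simplified session info.
#[derive(Clone, Debug, PartialEq)]
struct SessionInfo {
    n_validators: usize,
}

#[derive(Default)]
struct SessionCache {
    // Keyed by session index only; the relay parent is not part of the key.
    entries: HashMap<SessionIndex, SessionInfo>,
    // Counts how often the (simulated) runtime was actually queried.
    runtime_calls: usize,
}

impl SessionCache {
    // `fetch` stands in for the runtime call and runs only on a cache miss.
    fn get_or_fetch<F>(
        &mut self,
        relay_parent: Hash,
        session_index: SessionIndex,
        fetch: F,
    ) -> Result<SessionInfo, String>
    where
        F: FnOnce(Hash, SessionIndex) -> Result<SessionInfo, String>,
    {
        if let Some(info) = self.entries.get(&session_index) {
            // Cache hit: `relay_parent` is irrelevant, no runtime call is made.
            return Ok(info.clone());
        }
        self.runtime_calls += 1;
        let info = fetch(relay_parent, session_index)?;
        self.entries.insert(session_index, info.clone());
        Ok(info)
    }
}
```

With a cache like this, a lookup for an already cached session succeeds even if the relay parent passed in could no longer be queried (for example because its state was pruned), which is why pre-filling with one relay parent and reading with another adds robustness.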
Co-authored-by: ordian <[email protected]>
match session_idx {
    Ok(session_idx)
        if self.last_consecutive_cached_session.is_none() ||
            session_idx >
                self.last_consecutive_cached_session.expect(
                    "The first clause explicitly handles `None` case. qed.",
                ) =>
- match session_idx {
-     Ok(session_idx)
-         if self.last_consecutive_cached_session.is_none() ||
-             session_idx >
-                 self.last_consecutive_cached_session.expect(
-                     "The first clause explicitly handles `None` case. qed.",
-                 ) =>
+ let should_cache_session = |session_idx: &SessionIndex| {
+     self.last_consecutive_cached_session.is_none() ||
+         session_idx >
+             &self
+                 .last_consecutive_cached_session
+                 .expect("The first clause explicitly handles `None` case. qed.")
+ };
+ match session_idx {
+     Ok(session_idx) if should_cache_session(&session_idx) => {
Is it more readable to extract the check as a closure?
match session_idx {
    Ok(session_idx)
        if self.last_consecutive_cached_session.is_none() ||
            session_idx >
                self.last_consecutive_cached_session.expect(
                    "The first clause explicitly handles `None` case. qed.",
                ) =>
        gap_in_cache = true;
    }

    if !gap_in_cache {
What does `last_consecutive_cached_session` stand for exactly? For example, what if we have this situation: [fail, ok, ok, ok, ok]?
We want the last `DISPUTE_WINDOW` sessions cached. If we have to cache sessions 1..5 and for some reason 3 fails, on the next ActiveLeaves update we want to retry fetching it and start caching from 3. So in this case `last_consecutive_cached_session` will be set to 2.
Regarding your question: [fail, ok, ok, ok, ok]. If the failed session has index X, then `last_consecutive_cached_session` will be set to X-1.
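The behaviour described in this answer can be sketched as follows. This is only an illustration under stated assumptions, not the code from the PR: `SessionTracker`, `cache_sessions` and the `fetch` closure are hypothetical names, and `fetch` stands in for the runtime call that may fail. One simplification: when the very first session fails and nothing was cached before, this sketch leaves the field at `None` rather than at X-1.

```rust
use std::ops::RangeInclusive;

type SessionIndex = u32;

// Hypothetical stand-in for the relevant part of the coordinator state.
#[derive(Default)]
struct SessionTracker {
    last_consecutive_cached_session: Option<SessionIndex>,
}

impl SessionTracker {
    // Try to cache every session in `wanted`. `fetch` is an assumed
    // stand-in for the fallible runtime call.
    fn cache_sessions<F>(&mut self, wanted: RangeInclusive<SessionIndex>, mut fetch: F)
    where
        F: FnMut(SessionIndex) -> Result<(), String>,
    {
        let mut gap_in_cache = false;
        for idx in wanted {
            // Sessions up to the last consecutive one are already cached.
            if self.last_consecutive_cached_session.map_or(false, |last| idx <= last) {
                continue;
            }
            if fetch(idx).is_err() {
                gap_in_cache = true;
            }
            // Advance the marker only while no fetch has failed, so it always
            // points at the end of a gap-free run of cached sessions.
            if !gap_in_cache {
                self.last_consecutive_cached_session = Some(idx);
            }
        }
    }
}
```

If session 3 fails while caching 1..=5, the marker stops at 2 even though 4 and 5 were fetched successfully, so the next update starts again from 3.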
I'll do one final burn-in next week and merge.
@tdimitrov can we merge? I will likely have to touch some dispute-coordinator code soon, would be good to have this merged first.
I wanted to do one more burn-in but the new changes are small - I'd say it's safe to merge.
bot merge
* master: malus: dont panic on missing validation data (#6952) Offences Migration v1: Removes `ReportsByKindIndex` (#7114) Fix stalling dispute coordinator. (#7125) Fix rolling session window (#7126) [ci] Update buildah command and version (#7128) Bump assigned_slots params (#6991) XCM: Remote account converter (#6662) Rework `dispute-coordinator` to use `RuntimeInfo` for obtaining session information instead of `RollingSessionWindow` (#6968) Revert default proof size back to 64 KB (#7115)
* master: (39 commits) malus: dont panic on missing validation data (#6952) Offences Migration v1: Removes `ReportsByKindIndex` (#7114) Fix stalling dispute coordinator. (#7125) Fix rolling session window (#7126) [ci] Update buildah command and version (#7128) Bump assigned_slots params (#6991) XCM: Remote account converter (#6662) Rework `dispute-coordinator` to use `RuntimeInfo` for obtaining session information instead of `RollingSessionWindow` (#6968) Revert default proof size back to 64 KB (#7115) update rocksdb to 0.20.1 (#7113) Reduce base proof size weight component to zero (#7081) PVF: Move PVF workers into separate crate (#7101) Companion for #13923 (#7111) update safe call filter (#7080) PVF: Don't dispute on missing artifact (#7011) XCM: Properly set the pricing for the DMP router (#6843) pvf: Update docs for PVF artifacts (#6551) Bump syn from 2.0.14 to 2.0.15 (#7093) Companion for substrate#13771 (#6983) Added Dwellir Nigeria bootnodes. (#7097) ...
Part of #6812
The PR contains two notable changes:
`RollingSessionInfo` usage from `dispute-coordinator`

To use `RuntimeInfo` instead of `RollingSessionWindow`, two problems have to be solved:

1. A `RuntimeInfo` call needs a `sender` and a `parent_hash` for querying the runtime. The first one seems to be always available. The situation with the second one is more complicated.
2. `RollingSessionWindow` provides two methods, `earliest_session` and `latest_session`. These are the first and the last session cached. `RuntimeInfo` has no such concept, so it needs to be emulated somehow (or `dispute-coordinator` needs to be reworked).

Problem 1 is mainly caused by the way `dispute-coordinator` is initialized. It gets the first active leaf as an `Option` and two `Vec`s with scraped on-chain votes and participations. On further investigation I realised that these parameters are used only on startup and are empty afterwards: the `Vec`s are `drain`-ed and the `Option` is `take`n. So if the subsystem crashes and gets restarted, it starts up with empty data.

For this reason I introduce `struct InitialData`, which contains all three of them.

It is passed to the `dispute-coordinator` via `Option`. So if there is some initialization data (which requires runtime calls), we have the leaf used to fetch it and we can use it. If there is no initialization data, we are good.

Problem 2 is more straightforward. `dispute-coordinator` will keep track of the sessions by itself. I made one simplification here which I'm not sure is correct (see notable change 2).

A few words about caching, which I misunderstood initially. `RuntimeInfo` has its own cache for `SessionInfo` (and doesn't rely solely on the `runtime-api` subsystem cache for runtime calls, as I wrongly assumed). This has a nice side effect. For example, in this call:

polkadot/node/subsystem-util/src/runtime/mod.rs
Line 153 in 27ddd27

If the session with index `session_index` is in the cache, the `relay_parent` parameter doesn't matter as no runtime call is made. This is an improvement over the `runtime-api` cache, where the results for different relay parents are treated as separate keys.

TODOs: