-
Notifications
You must be signed in to change notification settings - Fork 33
skip_pool: add skip_certified check, multi cert support & fix bugs #78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| .iter() | ||
| .any(|range| range.contains(&slot)) | ||
| { | ||
| // If we are already have a certificate no reason to rescan (potentially costly) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the assumption here is if a slot is already part of a skip certificate then it's fine to not update, as once part of a skip certificate it cannot be removed. Also the up_to_date is to avoid scanning the range every time a new vote is added, however if you feel it's clearer we can mimic the behavior in query
generally I think we need to have protections around ingesting all-to-all / gossip when catching up as it could overwrite votes from replay that are necessary to skip certify "old" slots leading to catch up stalling.
i don't know the best way to deal with this yet, will explore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally I think we need to have protections around ingesting all-to-all / gossip when catching up as it could overwrite votes from replay that are necessary to skip certify "old" slots leading to catch up stalling.
Each block in replay should have the skip/notarization certificates necessary to prove block replay, so checking the vote states should be sufficient to prove the block is valid, shouldn't require checking the skip pool here I think.
As long as we're getting valid blocks, catch up shouldn't stall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's true, just trying to figure out the best way to query.
If we're catching up, we should only look at the skip certificates from replayed votes, however if we're at tip then we should ingest all-to-all / gossip as well. We can't really tell which scenario we're in so I think we could:
- Maintain a separate skip pool with only votes from replay, check the replay skip pool first and then this general skip pool
- Tag votes with their source here and allow storage of 1 vote per source per validator
Open to any better suggestions haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense
- For new blocks, check vote state. A separate skip pool could work, or we could run a simple algorithm that scans across each vote state and checks if the most recent skip vote covers the entire range (M, N) where M is the parent and N is this replayed block. If so, increment the stake counter.
- We stream the latest votes during replay into this main skip pool
| let certs = self | ||
| .tree | ||
| .iter() | ||
| .scan( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be less nesting here to use a filter_map:
let mut accumulated = 0f64;
let mut current_contributors = BTreeSet::new();
let mut cert: Option<(Slot, BTreeSet<Pubkey>)> = None;
let certs = self.tree.iter().filter_map(move |(slot, (starts, ends))| {
// Once we hit threshold return the cert
if accumulated < threshold_stake {
let (start_slot, contributors)) = cert.take().expect("must have some contributors");
return Some((start_slot..=*slot, contributors));
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, got too eager trying to skyline 81b4b02
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lmao, write all your code in one line
| .iter() | ||
| .any(|range| range.contains(&slot)) | ||
| { | ||
| // If we are already have a certificate no reason to rescan (potentially costly) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally I think we need to have protections around ingesting all-to-all / gossip when catching up as it could overwrite votes from replay that are necessary to skip certify "old" slots leading to catch up stalling.
Each block in replay should have the skip/notarization certificates necessary to prove block replay, so checking the vote states should be sufficient to prove the block is valid, shouldn't require checking the skip pool here I think.
As long as we're getting valid blocks, catch up shouldn't stall.
| if accumulated <= threshold_stake { | ||
| if let Some((start_slot, contributors)) = ¤t_cert { | ||
| // Skip certificate has ended, reset and publish | ||
| certs.push(((*start_slot, *slot), contributors.clone())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the threshold never falls below threshold_stake before the loop ends, we should still push the current cert if it exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this can actually happen right, since every vote has to end accumulated is guaranteed to be <= threshold_stake before the loop ends.
Added a debug assert for this case 4c08dbe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yeah, we separate out the ends, makes sense
Problem
The replay voting loop needs the ability to query if a certain slot is
skip_certifiedCertain situations can cause multiple skip range certificates to be available in the skip pool (see
test_multi_certfor an example). In these scenarios we need the ability to query all skip certs to see if the voting loop slot isskip_certified.Current skip pool impl also has incorrect behavior around consecutive slots and contributors (see
test_consecutive_slotsandtest_contributor_removed)Summary of Changes
Add functions for
skip_certified, and a general multi cert scan fn. Does not remove thequeryimplementation, will potentially remove when integrating the voting loop / maybe start leader.