remove consensus logic from replay #45
Conversation
wen-coding left a comment:
Need to jump on a plane soon; I'll review the rest of it later.
let leader_fanout = {
    match &vote_op {
        // Alpenglow relies on leaders to propagate votes otherwise we have to rely
        // on gossip, especially true for skip votes
I thought we would send all-to-all as well? Otherwise we can't guarantee 3 delta... We can discuss this in SLC
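The trade-off under discussion can be sketched roughly as follows. This is an illustrative sketch only; `VoteOp` and `Propagation` are made-up names, not agave types, and the real code quoted above computes a leader fanout rather than an enum:

```rust
// Illustrative sketch: choose a propagation path per vote type. Leader
// fanout is cheap, but all-to-all broadcast is what gives the bounded
// delivery the reviewer is worried about for skip votes, where no leader
// has an incentive to forward the vote.
enum VoteOp {
    Notarize,
    Finalize,
    Skip,
}

#[derive(PartialEq, Debug)]
enum Propagation {
    LeaderFanout,
    AllToAll,
}

fn propagation_for(vote_op: &VoteOp) -> Propagation {
    match vote_op {
        // Skip votes fall back to all-to-all rather than relying on
        // gossip alone to reach the cluster in time.
        VoteOp::Skip => Propagation::AllToAll,
        VoteOp::Notarize | VoteOp::Finalize => Propagation::LeaderFanout,
    }
}
```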
gossip/src/cluster_info.rs (outdated)
let mut num_crds_votes = 0;
let vote_index = {
    let gossip_crds =
        self.time_gossip_read_lock("gossip_read_push_vote", &self.stats.push_vote_read);
nit: push_alpenglow_vote
And maybe give alpenglow vote its own stats?
self.time_gossip_read_lock("gossip_read_push_vote", &self.stats.push_vote_read);
(0..MAX_LOCKOUT_HISTORY as u8)
    .filter_map(|ix| {
        let vote = CrdsValueLabel::Vote(ix, self_pubkey);
And give it its own label as well, so it doesn't share the same queue with the old votes.
let vote_index = {
    let gossip_crds =
        self.time_gossip_read_lock("gossip_read_push_vote", &self.stats.push_vote_read);
    (0..MAX_LOCKOUT_HISTORY as u8)
Is 32 enough for Alpenglow? You are sharing the same vote buffer for Notar/Skip/Final, and in the async execution world Final might be further away from Notar.
let vote: &CrdsData = gossip_crds.get(&vote)?;
num_crds_votes += 1;
match &vote {
    CrdsData::Vote(_, vote) => Some((vote.wallclock, ix)),
Wait, there was a should_evict_vote check in the original code; do we not need it any more?
Also we need to think carefully whether we should evict the oldest vote. Could all the finalize votes be kicked out because we have too many skip/notar on the newer votes?
I'm inclined to say "please give each type of vote its own queue" now...
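The per-type queue being suggested could be sketched like this. All names here (`VoteType`, `VOTES_PER_TYPE`, `vote_index`) are illustrative assumptions, not agave APIs; the point is only that each vote type owns a disjoint index range, so a burst of Skip votes can never evict Finalize votes:

```rust
// Hypothetical sketch: give each Alpenglow vote type its own index budget
// in the CRDS vote table instead of one shared buffer of 32.
const VOTES_PER_TYPE: u8 = 32; // separate budget per type

#[derive(Clone, Copy)]
enum VoteType {
    Notarize,
    Skip,
    Finalize,
}

/// Map a (vote type, sequence number) pair to a CRDS vote index. Each type
/// owns a disjoint range, so eviction within one type never touches another.
fn vote_index(vote_type: VoteType, seq: u8) -> u8 {
    let base = match vote_type {
        VoteType::Notarize => 0,
        VoteType::Skip => VOTES_PER_TYPE,
        VoteType::Finalize => 2 * VOTES_PER_TYPE,
    };
    base + (seq % VOTES_PER_TYPE)
}
```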
core/src/banking_stage.rs (outdated)
) -> bool {
    let tpu_bank = bank_forks.write().unwrap().insert(tpu_bank);
    let parent_slot = tpu_bank.parent_slot();
    info!(
Do we need the info log here? Can we just log at maybe_start_leader?
core/src/replay_stage.rs (outdated)
.unwrap()
.root_bank()
.feature_set
.activated_slot(&solana_feature_set::alpenglow::id());
When validators enter the next Epoch at different speeds, could it happen that some validators don't have enough votes for root to enter the new Epoch, so they are stuck in the old Epoch forever? (Because there are no new TowerBFT votes to confirm the fork they are on, maybe?)
Should we enable Alpenglow at Epoch N+2 to make sure everyone's on the same page?
Also, we are checking this outside the loop? Could we enter a new Epoch inside the loop?
Reply:
Migration will not be blocked on votes from the previous epoch; you just need to respect lockout and switching rules for all proposed alpenglow slots until at least one alpenglow slot gets a notarization.
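The migration rule described above can be sketched as a small predicate. This is a sketch under the assumption that the rule is exactly as stated (TowerBFT lockout/switching rules stay in force until the first Alpenglow slot is notarized); the function name and signature are illustrative, not agave code:

```rust
// Hypothetical sketch of the migration gate: TowerBFT rules remain binding
// for proposed Alpenglow slots until the first Alpenglow slot is notarized.
fn must_respect_tower_rules(
    first_alpenglow_slot: Option<u64>,
    highest_notarized_slot: Option<u64>,
) -> bool {
    match (first_alpenglow_slot, highest_notarized_slot) {
        // Feature not active yet: TowerBFT everywhere.
        (None, _) => true,
        // Active, but nothing notarized yet: keep honoring lockout and
        // switching rules during the transition.
        (Some(_), None) => true,
        // Released only once some Alpenglow slot has a notarization.
        (Some(first), Some(notarized)) => notarized < first,
    }
}
```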
let mut process_duplicate_slots_time = Measure::start("process_duplicate_slots");
if !tpu_has_bank {
    Self::process_duplicate_slots(
if !is_alpenglow_migration_complete {
Is it possible to move all non_alpenglow operations into a separate function for readability?
Reply:
refactored
core/src/replay_stage.rs (outdated)
retransmit_not_propagated_time.as_us(),
);
} else {
    // Alpenglow specific logic
Similarly, move all Alpenglow-specific logic into a separate function for better readability?
Reply:
yup, refactored
core/src/replay_stage.rs (outdated)
// Note that like today, if this occurs during your leader slot
// it will cause you to dump your leader slot. This should be ok because
// the fact that there is a greater/valid slot than your own must mean there
// was a skip certificate for your slot, so it's ok to abandon your leader slot
Hmm, your comment implies:
if poh_recorder.read().unwrap().start_slot() < highest_frozen_bank.slot() {
but your actual comparison uses !=
Should we reset poh_recorder if start_slot is larger than highest_frozen_bank.slot()?
Reply:
It's impossible for start slot > highest_frozen_bank; we only ever start leader banks from parents that are
- notarized
- frozen
which necessarily means the highest frozen bank > start slot. Added an assert and a comment.
core/src/replay_stage.rs (outdated)
);
retransmit_not_propagated_time.stop();
// Try to finalize the highest notarized block
let highest_notarized_slot = cert_pool.highest_notarized_slot();
Should we finalize the highest non-finalized block instead?
Could it be that notarizations keep coming:
Slot 100 notarized -> didn't get to finalize Slot 100 before Slot 101 is also notarized -> didn't get to finalize Slot 101 before Slot 102 is also notarized ...
I know we will eventually get all of them finalized, but that might mean Slot 100's finalization takes more than 3 delta.
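The suggestion amounts to walking every notarized-but-not-finalized slot, oldest first, instead of only attempting the single highest notarized slot. A minimal sketch, assuming a stand-in `CertPool` type (the real certificate pool API may look quite different):

```rust
use std::collections::BTreeSet;

// Stand-in for the certificate pool, tracking which slots have
// notarization and finalization certificates.
struct CertPool {
    notarized: BTreeSet<u64>,
    finalized: BTreeSet<u64>,
}

impl CertPool {
    /// Slots to try to finalize this iteration, oldest first, so an older
    /// slot's finalization is not starved by a stream of newer notarizations.
    fn pending_finalization(&self) -> Vec<u64> {
        self.notarized
            .iter()
            .copied()
            .filter(|slot| !self.finalized.contains(slot))
            .collect()
    }
}
```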
core/src/replay_stage.rs (outdated)
let maybe_vote_bank =
    bank_forks.read().unwrap().get(highest_notarized_slot);
if let Some(vote_bank) = maybe_vote_bank {
    if vote_bank.is_frozen() {
If highest_notarized_slot is not frozen, I think we should start repair somewhere?
Reply:
Repair will happen in the background like it does today, based on votes in gossip; we just need to have the repair structure read the new votes (#56).
if authorized_voter_keypairs.is_empty() {
    return GenerateVoteTxResult::NonVoting;
}
if let Some(slot) = wait_to_vote_slot {
Does this still work in Alpenglow? In PoH, if you wait long enough, some leader will eventually start building for a future slot X you are waiting on, but I believe now we need to send a skip?
Reply:
I think this is only used to manipulate tests in local cluster, should be fine
Force-pushed fcf67b2 to d9bd8cd.
        Self::initiate_alpenglow_migration(poh_recorder, is_alpenglow_migration_complete);
    }
}
Can't comment on line 2254:
Suddenly confused:
if parent_slot < *first_alpenglow_slot.as_ref().unwrap_or(&u64::MAX) {
    assert!(parent_bank.is_frozen());
}
Why don't we need to make sure parent_bank is frozen if we are past first alpenglow slot?
Reply:
Ah yeah, this is a bug from moving the start-leader logic out of the certificate pool, where we used to check that the alpenglow slot is frozen before starting the leader.
Because in alpenglow mode we set parent_slot to cert_pool.highest_not_skip_certificate_slot(), we can have a certificate for a non-frozen slot if our replay is slow. This assert exists in today's code because the parent bank is always frozen.
We need to return in alpenglow mode if the parent bank isn't frozen. I need to refactor this alpenglow-specific logic anyway, so I will do that.
Reply:
Sounds perfect. When you refactor, please try to split the TowerBFT and Alpenglow logic as much as possible. It's really hard to read the code when the two are mixed together. I know we have to mix them somehow for the migration, but hopefully we can at least split things into smaller functions.
Reply:
split up the logic in maybe_start_leader()
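The frozen-parent fix discussed in this thread could look roughly like the following. `Bank` here is a minimal stand-in and `parent_ready` is a hypothetical helper, not the actual refactor; the point is only the shape of the guard (bail out and retry in Alpenglow mode, keep the assert on the TowerBFT path, where a frozen parent is an invariant):

```rust
// Minimal stand-in for the bank being used as a leader-slot parent.
struct Bank {
    frozen: bool,
}

impl Bank {
    fn is_frozen(&self) -> bool {
        self.frozen
    }
}

/// Returns false (don't start the leader slot yet) when the Alpenglow
/// parent bank is still being replayed: the certificate pool can be ahead
/// of replay, so a non-frozen parent is possible and we wait instead of
/// panicking. On the TowerBFT path the parent is frozen by construction,
/// so the assert stays.
fn parent_ready(parent_bank: &Bank, is_alpenglow: bool) -> bool {
    if is_alpenglow {
        parent_bank.is_frozen()
    } else {
        assert!(parent_bank.is_frozen());
        true
    }
}
```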
Problem
Replay lacks alpenglow awareness
Summary of Changes
With this, we can now spin up a single alpenglow leader in a one-node cluster and watch it make blocks.
This builds on #44 and #43; only the last commit is relevant.
Fixes #