
Conversation

@carllin (Contributor) commented Feb 18, 2025

Problem

Replay lacks Alpenglow awareness.

Summary of Changes

  1. Calls the PoH migration path from replay, but doesn't yet handle lockouts/epoch boundaries safely
  2. Handles starting a leader in replay
  3. Handles pushing votes from replay

With this, we can now spin up a single Alpenglow leader in a one-node cluster and watch it make blocks.

This builds on #44 and #43; only the last commit is relevant.

@wen-coding (Contributor) left a comment:

Need to jump on the plane soon; I'll review the rest of it later.

let leader_fanout = {
match &vote_op {
// Alpenglow relies on leaders to propagate votes otherwise we have to rely
// on gossip, especially true for skip votes

Contributor:
I thought we would send all-to-all as well? Otherwise we can't guarantee 3 delta... We can discuss this in SLC

let mut num_crds_votes = 0;
let vote_index = {
let gossip_crds =
self.time_gossip_read_lock("gossip_read_push_vote", &self.stats.push_vote_read);

Contributor:
nit: push_alpenglow_vote

And maybe give alpenglow vote its own stats?

self.time_gossip_read_lock("gossip_read_push_vote", &self.stats.push_vote_read);
(0..MAX_LOCKOUT_HISTORY as u8)
.filter_map(|ix| {
let vote = CrdsValueLabel::Vote(ix, self_pubkey);

Contributor:
And our own label as well, don't share the same queue with the old votes.

let vote_index = {
let gossip_crds =
self.time_gossip_read_lock("gossip_read_push_vote", &self.stats.push_vote_read);
(0..MAX_LOCKOUT_HISTORY as u8)

Contributor:
Is 32 enough for Alpenglow? You are sharing the same vote buffer for Notar/Skip/Final, and in the async execution world Final might be further away from Notar.

let vote: &CrdsData = gossip_crds.get(&vote)?;
num_crds_votes += 1;
match &vote {
CrdsData::Vote(_, vote) => Some((vote.wallclock, ix)),

Contributor:
Wait, there was a should_evict_vote check in the original code, we don't need it any more?

Also we need to think carefully whether we should evict the oldest vote. Could all the finalize votes be kicked out because we have too many skip/notar on the newer votes?

I'm inclined to say "please give each type of vote its own queue" now...

) -> bool {
let tpu_bank = bank_forks.write().unwrap().insert(tpu_bank);
let parent_slot = tpu_bank.parent_slot();
info!(

Contributor:
Do we need the info log here? Can we just log at maybe_start_leader?

.unwrap()
.root_bank()
.feature_set
.activated_slot(&solana_feature_set::alpenglow::id());

Contributor:
When validators enter the next Epoch at different speeds, could it happen that some validators don't have enough votes for root to enter the new Epoch, so they are forever stuck in the old Epoch? (Because there are no new TowerBFT votes to confirm the fork they are on, maybe?)

Should we enable Alpenglow at Epoch N+2 to make sure everyone's on the same page?

Also, we are checking this outside the loop? Could we enter a new Epoch inside the loop?

@carllin (Author):
Migration will not be blocked on votes from the previous epoch; you just need to respect lockout and switching rules for all proposed Alpenglow slots until at least one Alpenglow slot gets a notarization.

let mut process_duplicate_slots_time = Measure::start("process_duplicate_slots");
if !tpu_has_bank {
Self::process_duplicate_slots(
if !is_alpenglow_migration_complete {

Contributor:
Is it possible to move all non_alpenglow operations into a separate function for readability?

@carllin (Author):
refactored

retransmit_not_propagated_time.as_us(),
);
} else {
// Alpenglow specific logic

Contributor:
similarly, move all Alpenglow specific logic into separate function for better readability?

@carllin (Author):
yup, refactored

// Note that like today, if this occurs during your leader slot
// it will cause you to dump your leader slot. This should be ok because
// the fact that there is a greater/valid slot than your own must mean there
// was a skip certificate for your slot, so it's ok to abandon your leader slot

Contributor:
Hmm, your comment implies:
if poh_recorder.read().unwrap().start_slot() < highest_frozen_bank.slot() {
but your actual comparison uses !=
Should we reset poh_recorder if start_slot is larger than highest_frozen_bank.slot()?

@carllin (Author) commented Feb 28, 2025:
It's impossible for start slot > highest_frozen_bank; we only ever start leader banks from parents that are:

  1. notarized
  2. frozen

which necessarily means the highest frozen bank > start slot. Added an assert and a comment.

);
retransmit_not_propagated_time.stop();
// Try to finalize the highest notarized block
let highest_notarized_slot = cert_pool.highest_notarized_slot();

Contributor:
Should we finalize the highest non-finalized block instead?

Could it be notarizations keep coming:
Slot 100 notarized -> Didn't get to finalize Slot 100 before Slot 101 is also notarized -> Didn't get to finalize Slot 101 before Slot 102 is also notarized ...

I know we will eventually get all of them finalized, but that might mean Slot 100 finalization takes more than 3 delta.

let maybe_vote_bank =
bank_forks.read().unwrap().get(highest_notarized_slot);
if let Some(vote_bank) = maybe_vote_bank {
if vote_bank.is_frozen() {

Contributor:
If highest_notarized_slot is not frozen, I think we should start repair somewhere?

@carllin (Author):
Repair will happen in the background like it does today, based on votes in gossip; we just need to have the repair structure read the new votes (#56).

if authorized_voter_keypairs.is_empty() {
return GenerateVoteTxResult::NonVoting;
}
if let Some(slot) = wait_to_vote_slot {

Contributor:
Does this still work in Alpenglow? In PoH if you wait long enough some leader will eventually start building for a future slot X you are waiting on, but I believe now we need to send skip?

@carllin (Author):
I think this is only used to manipulate tests in local cluster; should be fine.

@carllin carllin marked this pull request as ready for review February 28, 2025 01:09
@carllin changed the title from "Add simple Alpenglow migration to Replay" to "Add simple Alpenglow start leader logic to Replay" Feb 28, 2025
@carllin changed the title from "Add simple Alpenglow start leader logic to Replay" to "Add Alpenglow start leader logic to Replay" Feb 28, 2025
@carllin changed the title from "Add Alpenglow start leader logic to Replay" to "Add Alpenglow start leader logic to ReplayStage" Feb 28, 2025
@carllin force-pushed the master branch 2 times, most recently from fcf67b2 to d9bd8cd, March 1, 2025 04:45
Self::initiate_alpenglow_migration(poh_recorder, is_alpenglow_migration_complete);
}
}

Contributor:
Can't comment on line 2254:
Suddenly confused:

if parent_slot < *first_alpenglow_slot.as_ref().unwrap_or(&u64::MAX) {
    assert!(parent_bank.is_frozen());
}

Why don't we need to make sure parent_bank is frozen if we are past first alpenglow slot?

@carllin (Author):
Ah yeah, this is a bug from moving the start leader logic out of the certificate pool, where we used to check that the alpenglow slot is frozen before starting the leader.

Because in alpenglow mode we set the parent_slot to cert_pool.highest_not_skip_certificate_slot(), we can have a certificate for a non-frozen slot if our replay is slow. This assert exists in today's code because the parent bank is always frozen.

We need to return in alpenglow mode if the parent bank isn't frozen. I need to refactor this alpenglow-specific logic anyway, so will do that.

Contributor:
Sounds perfect. When you refactor, please try to split TowerBFT and Alpenglow logic as much as possible. It's really hard to read the code when the two are mixed together. I know we have to do it somehow for the migration, but hopefully we can at least split things into smaller functions.

@carllin (Author):
split up the logic in maybe_start_leader()

@carllin carllin merged commit fdf2f25 into anza-xyz:master Mar 3, 2025
7 checks passed
@AshwinSekar AshwinSekar moved this to Pending migration in Alpenglow Jun 13, 2025
@AshwinSekar changed the title from "Add Alpenglow start leader logic to ReplayStage" to "remove consensus logic from replay" Jun 13, 2025
bw-solana pushed commits to bw-solana/alpenglow that referenced this pull request Aug 1, 2025
bw-solana pushed a commit to bw-solana/alpenglow that referenced this pull request Aug 2, 2025
@AshwinSekar AshwinSekar removed this from Alpenglow Aug 7, 2025