Skip to content

Conversation

@wen-coding
Copy link
Contributor

  • Move skip cert into per-slot pool, use Arc so we don't do too many copies
  • Fix stake calculation by caching epoch_stakes_map and epoch_schedule
  • Moved the original tests in skip_pool to certificate_pool and changed API accordingly
  • This should now accept multiple skips per pubkey

@wen-coding wen-coding linked an issue Mar 31, 2025 that may be closed by this pull request
@wen-coding wen-coding changed the title Fix epoch stakes in cert pool Move skip cert to per-slot pool and fix epoch stakes in cert pool Mar 31, 2025
…wise all slots older than construct bank slot are ignored.
// than old_start. Someone may have sent (2, 5) and (3, 7), these two may
// arrive out of order. (2, 5) will not be able to land in cert of slot 3 to 5,
// but it can still land in cert of slot 2.
new_start >= old_start && new_end > old_end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to accept any ordering of votes.
e.g. catching up from snapshot but you receive an all-to-all skip vote from tip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are able to accept votes in any order, but it's good to have definitive order of which one wins if votes can arrive in arbitrary order.
If someone voted (3, 5), then later (4, 7), previously we reject (4, 7), now we say for slot 3 we keep (3, 5) vote in the skip certificate, but for slot 4 and 5 we have two choices:
A. keep (3, 5) which arrives first
B. replace (3, 5) with (4, 7)
The problem of A is that we decide which one to keep by arriving order, so someone could keep sending us (3, 5) and (4, 7) and the transaction in slot 4 and 5 will keep changing. Not end of the world, it's still correct, but we have lots of unnecessary replacements.
So this code uses B, we decide an order, (4, 7) is always better than (3, 5) because 4 is greater than 3 and 7 is greater than 5.
But you are right in that it's a bit arbitrary why (4, 6) is worse than (3, 7). All we need is a deterministic order to reduce needless replacements, the code is still correct in that no one can un-vote skip.
So how about this order:
Because normal validators almost always extend the end, let's compare end first then compare start and leave the one with hopefully wider range so we can hopefully have fewer transactions in the cert:

  1. new_end > old_end: (4, 8) > (3, 5), (2, 8) > (3, 5), (3, 8) > (3, 5)
  2. new_end == old_end, check if max(new_start, local_root) < max(old_start, local_root): if local_root is 6, reject (2, 8) when (3, 8) is available. But (7, 9) > (8, 9) when local_root is 6.
  3. new_end < old_end: reject

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, thought about it more, changed to:
(new_end > old_end) || (new_end == old_end && new_start > old_start)

We can argue both ways and replace it with:
(new_end > old_end) || (new_end == old_end && new_start < old_start)

The algorithm just needs a deterministic order to be efficient.

Comment on lines 120 to 130
let current_highest_slot = self
.highest_slot_map
.entry(certificate_type.clone())
.or_default();
if slot > *current_highest_slot {
self.highest_slot_map.insert(certificate_type, slot);
true
} else {
false
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be able to simplify this to:

fn set_highest_slot(&mut self, certificate_type: CertificateType, slot: Slot) -> bool {
    let current = self.highest_slot_map.entry(certificate_type)
        .and_modify(|s| {
            if slot > *s {
                *s = slot;
            }
        })
        .or_insert(slot);

    *current == slot
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I want return value to be true only if the entry didn't exist or the original value is smaller to the new value, changed.

certificate
.get_certificate_iter()
.for_each(|(pubkey, entry)| {
tx_set.insert((*pubkey, entry.skip_range()), entry.transaction().clone());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it possible we would need multiple disjoint skip votes for a specific validator? For instance a validator could vote (1, 4), and then (6, 9), and we might need both votes to make the skip certificate for (1, 9)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's possible, which is why the key to tx_set is (pubkey, skip_range). So we would have
(key1, (1, 4)), tx1
(key1, (6, 9)), tx2
So both tx1 and tx2 would be added to the certificate.

Comment on lines +59 to +98
pub(crate) fn can_accept_new_skip_range(
old_skip_range: Option<(Slot, Slot)>,
new_skip_range: Option<(Slot, Slot)>,
) -> bool {
info!(
"Can accept new skip range: old_skip_range: {:?}, new_skip_range: {:?}",
old_skip_range, new_skip_range
);
if let Some((old_start, old_end)) = old_skip_range {
if let Some((new_start, new_end)) = new_skip_range {
// Because we now use per-slot skip certificate pool, users are never allowed
// to un-vote. If user sent (2, 5) then (3, 6), the skip cert pool for slot 2
// will still have (2, 5) saved. The skip cert for slot 3 to 5 will replace
// (2, 5) with (3, 6), but the user's skip is still recorded.
// But skip votes can arrive out of order (all-to-all vs Gossip), so we need
// a deterministic order to decide whether to replace the old skip range.
// Note that this doesn't affect correctness, it's just optimization to reduce
// unnecessary skip vote replacement in cert pool.
// Because normal validators almost always extend the end, let's compare end
// first then compare start and leave the one with the wider range so we can
// hopefully have fewer transactions in the cert:
// 1.new_end > old_end: (4, 8) > (3, 5), (2, 8) > (3, 5), (3, 8) > (3, 5)
// 2.new_end == old_end, check if new_start > old_start, because hopefully
// larger new_start means this transaction is more recent than the old one.
// Reject if new_start <= old_start. (3, 8) > (2, 8).
// 3.new_end < old_end: reject new one
new_end > old_end || (new_end == old_end && new_start > old_start)
} else {
// We should always replace skip with a skip, so should never happen that new
// skip range is None when old one is Some.
panic!("New skip range should never be None");
}
} else {
// If both are None, this is notarization or finalization, do not accept
// We should always replace skip with a skip, so should never happen that old is
// None and new is Some
assert!(new_skip_range.is_none());
false
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overlapping skip votes should be slashable and we should reject them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we use per-slot skip pool, And we allow multiple overlapping skip votes per pubkey, the skip range in skip votes are basically abbreviation. What are we trying to achieve by checking for overlapping skip votes? What damage can it do? Do we care?
If we allow overlapping skip votes, then users can safely discard any slot older than local root, even though those slots might have been skipped by the current user. Otherwise users have to remember what he sent for last skip vote, otherwise his new votes might all get rejected. This is not impossible, but it's a bit annoying.

@wen-coding wen-coding requested a review from carllin April 15, 2025 16:04
@wen-coding wen-coding linked an issue Apr 16, 2025 that may be closed by this pull request
@wen-coding wen-coding requested a review from AshwinSekar April 16, 2025 21:32
Copy link
Contributor

@AshwinSekar AshwinSekar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm #157 will handle moving skip to a single slot instead of a range

@wen-coding wen-coding merged commit 6a29080 into anza-xyz:master Apr 16, 2025
7 checks passed
@wen-coding wen-coding deleted the fix_epoch_stakes_in_cert_pool branch April 16, 2025 21:41
bw-solana pushed a commit to bw-solana/alpenglow that referenced this pull request Aug 1, 2025
…za-xyz#123)

Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: ksn6 <[email protected]>
Co-authored-by: carllin <[email protected]>
Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: ksn6 <[email protected]>
bw-solana pushed a commit to bw-solana/alpenglow that referenced this pull request Aug 1, 2025
…za-xyz#123)

Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: ksn6 <[email protected]>
Co-authored-by: carllin <[email protected]>
Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: ksn6 <[email protected]>
bw-solana pushed a commit to bw-solana/alpenglow that referenced this pull request Aug 1, 2025
…za-xyz#123)

Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: ksn6 <[email protected]>
Co-authored-by: carllin <[email protected]>
Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: ksn6 <[email protected]>
bw-solana pushed a commit to bw-solana/alpenglow that referenced this pull request Aug 1, 2025
…za-xyz#123)

Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: ksn6 <[email protected]>
Co-authored-by: carllin <[email protected]>
Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: ksn6 <[email protected]>
bw-solana pushed a commit to bw-solana/alpenglow that referenced this pull request Aug 1, 2025
…za-xyz#123)

Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: ksn6 <[email protected]>
Co-authored-by: carllin <[email protected]>
Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: ksn6 <[email protected]>
bw-solana pushed a commit to bw-solana/alpenglow that referenced this pull request Aug 1, 2025
…za-xyz#123)

Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: ksn6 <[email protected]>
Co-authored-by: carllin <[email protected]>
Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: ksn6 <[email protected]>
bw-solana pushed a commit to bw-solana/alpenglow that referenced this pull request Aug 2, 2025
…za-xyz#123)

Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: ksn6 <[email protected]>
Co-authored-by: carllin <[email protected]>
Co-authored-by: Ashwin Sekar <[email protected]>
Co-authored-by: ksn6 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Handle skip ranges across epochs Clean old stakers from skip pool every epoch

4 participants