WIP: generating rewards certs and adding them to the block footer #582
base: master
Conversation
```rust
// consensus pool does not need votes for slots other than root slot however the rewards container may still need them.
if vote_message.vote.slot() <= root_bank.slot() {
    // the only module that takes a write lock on consensus_rewards is this one and it does not take the write lock while it is verifying votes so this should never block.
```
feedback from Brennan: maybe we can do this inline if we can demonstrate that we have enough compute resources to handle the additional verification. This will also require ensuring that the dedup can handle the DDoS vectors we are worried about.
votor/src/consensus_rewards.rs (outdated)
```rust
}

/// Adds received [`VoteMessage`]s from other nodes.
pub fn add_vote_message(&mut self, root_slot: Slot, vote: VoteMessage) {
```
Potential optimisation: if we look up how many validators we have for this epoch, we can detect here when we have received votes from all of them and pre-generate the certificates in this function instead of in build_rewards_certs, which is more on the critical path than this function is. Not sure how often we will actually hit this case, though.
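A hedged sketch of that optimisation. All names here (`epoch_validator_count`, `pregenerated`, `add_vote`) are illustrative stand-ins, not the PR's actual API:

```rust
use std::collections::HashMap;

type Slot = u64;

/// Illustrative sketch: counts distinct voters per slot and eagerly builds
/// the certificate once every validator in the epoch has voted.
struct ConsensusRewards {
    epoch_validator_count: usize,
    votes: HashMap<Slot, usize>,
    // Slots whose certificates were pre-generated ahead of build_rewards_certs.
    pregenerated: Vec<Slot>,
}

impl ConsensusRewards {
    fn new(epoch_validator_count: usize) -> Self {
        Self {
            epoch_validator_count,
            votes: HashMap::new(),
            pregenerated: Vec::new(),
        }
    }

    fn add_vote(&mut self, slot: Slot) {
        let seen = self.votes.entry(slot).or_insert(0);
        *seen += 1;
        // Once every validator has voted for this slot, build the certificate
        // here, off the critical path of build_rewards_certs.
        if *seen == self.epoch_validator_count {
            self.pregenerated.push(slot);
        }
    }
}
```

The trade-off is exactly the one raised above: the fast path only fires when all votes arrive before cert building, which may be rare in practice.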
steviez left a comment:
Briefly glanced. I still need to work through the BLS sigverifier to be able to give more in-depth reviews on the approach. Pretty strapped for time at the moment, though.
votor/src/consensus_rewards/entry.rs (outdated)
```rust
/// map from validator rank to the skip vote.
skip: BTreeMap<u16, VoteMessage>,
/// notar votes are indexed by block id as different validators may vote for different blocks.
notar: BTreeMap<Hash, BTreeMap<u16, VoteMessage>>,
```
```rust
pub struct ConsensusRewards {
    /// Per [`Slot`], stores skip and notar votes.
    votes: BTreeMap<Slot, Entry>,
    /// Stores the latest pubkey for the current node.
    cluster_info: Arc<ClusterInfo>,
    /// Stores the leader schedules.
    leader_schedule_cache: Arc<LeaderScheduleCache>,
}
```

I have some concerns about allocations; this is several layers of BTreeMap nesting deep. It looks like we'd also be doing this with a lock held.
Agreed. What if I switch to HashMap and use with_capacity()? In almost all of the instances, we have a good idea of what the expected size of the containers will be.
Replaced most of the BTreeMaps with HashMaps now. One remains where we actually rely on the ordering of keys, so it is not as easy to remove.
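A minimal sketch of the proposed switch, assuming the expected validator count is known when the entry is created. The type stand-ins (`VoteMessage = ()`, `Hash = [u8; 32]`) and the constructor name are illustrative, not the PR's code:

```rust
use std::collections::HashMap;

type ValidatorRank = u16;
type VoteMessage = (); // stand-in for the real vote type
type Hash = [u8; 32]; // stand-in for the block id hash

struct Entry {
    /// map from validator rank to the skip vote.
    skip: HashMap<ValidatorRank, VoteMessage>,
    /// notar votes, indexed by block id.
    notar: HashMap<Hash, HashMap<ValidatorRank, VoteMessage>>,
}

impl Entry {
    fn with_expected_validators(n: usize) -> Self {
        Self {
            // Pre-size to the expected validator count so inserts performed
            // while the lock is held do not reallocate.
            skip: HashMap::with_capacity(n),
            // Assumption: only a handful of competing block ids per slot.
            notar: HashMap::with_capacity(4),
        }
    }
}
```

`HashMap::with_capacity(n)` guarantees room for at least `n` elements before the first reallocation, which is the property relied on here.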
core/src/block_creation_loop.rs (outdated)
```rust
let handle = std::thread::spawn(move || {
    // XXX: how to look up the slot.
    let slot = u64::MAX;
    consensus_rewards.read().build_rewards_certs(slot)
});
```
I know this isn't final, but spawning unnamed threads like this is not how we typically like to do stuff in parallel
Absolutely, this was just to show what I propose to do. Is there an example I can follow for how to do something like this?
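For reference, the usual standard-library pattern is `std::thread::Builder`, which lets the thread be named and surfaces spawn failure as a `Result` instead of panicking. The function name, thread name, and placeholder body below are illustrative, not the PR's code:

```rust
use std::thread;

// Sketch: spawn a named worker thread via std::thread::Builder.
fn spawn_cert_builder() -> thread::JoinHandle<u64> {
    thread::Builder::new()
        .name("rewardsCert".to_string()) // shows up in debuggers and panics
        .spawn(move || {
            // Placeholder for the real work, e.g.
            // consensus_rewards.read().build_rewards_certs(slot)
            42u64
        })
        .expect("failed to spawn rewards-cert thread")
}
```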
core/src/block_creation_loop.rs (outdated)
```rust
        replay_stage::{Finalizer, ReplayStage},
    },
    crossbeam_channel::Receiver,
    parking_lot::RwLock as PLRwLock,
```
Why this over `std::sync::RwLock`?
A couple of reasons:
- My understanding is that parking_lot is fairer to writers and ensures that writers do not starve. In particular, if a writer is waiting, additional readers also have to wait.
- For small critical sections, benchmarks seem to suggest that parking_lot does better.
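One concrete API difference worth noting: `std::sync::RwLock` returns a poisoning `Result` on every access, while `parking_lot::RwLock` returns the guard directly. A std-only sketch for portability (the `bump_and_read` helper is illustrative; the parking_lot equivalent is shown only in the comment):

```rust
use std::sync::RwLock;

// With std, each lock acquisition must handle the poison Result.
fn bump_and_read(lock: &RwLock<u64>) -> u64 {
    *lock.write().unwrap() += 1; // parking_lot equivalent: *lock.write() += 1;
    *lock.read().unwrap() // parking_lot equivalent: *lock.read()
}
```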
akhi3030 left a comment:
Thanks for the feedback, Steve. I'll push a change to use HashMap and with_capacity to minimise allocations.
This allows us to use `with_capacity`, which hopefully prevents allocations.
core/src/block_creation_loop.rs (outdated)
```rust
    block_timer: Instant,
    block_timeout: Duration,
) -> Result<(), PohRecorderError> {
    // Taking a read lock on consensus_rewards can contend with the write lock in bls_sigverifier.
```
Feedback from @AshwinSekar: #607 (comment)
@AshwinSekar : I've reworked the PR to use channels and a dedicated thread now. There is also a "stateless" version of wants_vote that we could call from the BLS sigverifier to do some filtering and avoid cloning all votes. I'll add support for that as a separate commit.
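A minimal sketch of the channel-plus-dedicated-thread shape described above, using `std::sync::mpsc` to keep it self-contained (the PR itself imports `crossbeam_channel`). The service function, thread name, and counter body are illustrative, not the PR's implementation:

```rust
use std::{sync::mpsc, thread};

type VoteMessage = u64; // stand-in for the real vote type

// Sketch: a dedicated thread owns the rewards state and drains a channel,
// so no lock is shared with the BLS sigverifier.
fn start_rewards_service() -> (mpsc::Sender<VoteMessage>, thread::JoinHandle<usize>) {
    let (tx, rx) = mpsc::channel::<VoteMessage>();
    let handle = thread::Builder::new()
        .name("consRewards".to_string())
        .spawn(move || {
            let mut accepted = 0usize;
            // The loop ends when every sender has been dropped.
            for _vote in rx {
                accepted += 1; // placeholder for add_vote_message(...)
            }
            accepted
        })
        .expect("failed to spawn rewards thread");
    (tx, handle)
}
```

The sender side is what a stateless `wants_vote`-style filter would feed, dropping votes early instead of cloning all of them across the channel.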
See https://docs.google.com/document/d/1jyycl6EwP8eVqdrYj3tmFHfG5U86SPr91WXG3DvNyWQ/edit?tab=t.0 for a design doc
This is a WIP of how we can generate rewards certificates and add them to the block footer.
The approach taken here completely bypasses the consensus pool, which means that, for the votes relevant to rewards, we copy them an additional time and aggregate them into certificates an additional time.