runtime: Collect stake delegations only once during epoch activation #8065
vadorovsky merged 1 commit into anza-xyz:master
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff            @@
##           master    #8065    +/-   ##
========================================
  Coverage    81.9%    81.9%
========================================
  Files         860      860
  Lines      326456   326603   +147
========================================
+ Hits       267624   267784   +160
+ Misses      58832    58819    -13
There is an issue with this PR regarding epoch_reward_cache: the PR moved the cache check after the computation. Before the PR, the cache was checked before computing rewards in calculate_rewards_and_distribute_vote_rewards. After the PR, the cache is only populated in save_rewards, which happens after the expensive computation.
And your worry is that it will take more than one slot? Or is there something else you have in mind? To be precise, the computation you're talking about currently takes around 50ms, and the entire epoch boundary after this change takes 330ms. So I think we are fine. The overall goal of my optimizations here is to keep the epoch boundary below one slot.
Yes. We used to have many forks at the epoch boundary, and the cache was introduced to avoid computing the rewards again on forks. If we are certain that there are going to be no forks, we can remove the cache. In this PR, we store to the cache but never read from it, which seems a waste.
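To make the reviewer's point concrete, here is a minimal sketch of a cache that is consulted before the expensive computation, so that a second fork crossing the same epoch boundary reuses the first result. All names here (`EpochRewardsCache`, `get_or_compute`, the `[u8; 32]` parent-hash key, the `computations` counter) are assumptions made for illustration and are not taken from the Agave code base.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the result of the expensive rewards calculation.
#[derive(Debug, PartialEq)]
pub struct RewardsCalculation {
    pub total_rewards: u64,
}

/// Hypothetical cache keyed by the parent block hash, so that sibling forks
/// built on the same parent share one result.
#[derive(Default)]
pub struct EpochRewardsCache {
    entries: HashMap<[u8; 32], Arc<RewardsCalculation>>,
    /// Counts how often the expensive path ran (for illustration only).
    pub computations: usize,
}

impl EpochRewardsCache {
    /// Check the cache first and run `compute` only on a miss.
    pub fn get_or_compute<F>(
        &mut self,
        parent_hash: [u8; 32],
        compute: F,
    ) -> Arc<RewardsCalculation>
    where
        F: FnOnce() -> RewardsCalculation,
    {
        if let Some(hit) = self.entries.get(&parent_hash) {
            // Cache hit: the expensive closure is never called.
            return Arc::clone(hit);
        }
        self.computations += 1;
        let value = Arc::new(compute());
        self.entries.insert(parent_hash, Arc::clone(&value));
        value
    }
}
```

If the lookup is dropped and only the insert remains, every fork pays for the computation again, which is the waste the comment above describes.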
}

// Calculate rewards from previous epoch and distribute vote rewards
pub(in crate::bank) fn calculate_rewards_and_distribute_vote_rewards(
Hmm well you also split out Bank::store_vote_accounts_partitioned (also inside Bank::save_rewards) from Bank::calculate_rewards_and_distribute_vote_rewards so the core part of vote reward distribution is actually not in there. But I see your point about the other distribution code being in there still. Do you think we could move all of that code into Bank::save_rewards (maybe rename this to distribute_vote_rewards) so that all the code for vote reward distribution is in the same place?
Specifically:
- Bank::update_vote_rewards
- Capitalization update
And then Bank::create_epoch_rewards_sysvar can be called after Bank::save_rewards.
I don't care a lot about keeping the datapoints ("epoch_rewards" and "epoch-rewards-status-update") consistent but others may disagree.
runtime/src/bank.rs
Outdated
self.begin_partitioned_rewards(
    parent_slot,
    parent_height,
    parent_epoch,
Sorry, the commit with my suggestions had a mistake: these params are out of order. parent_epoch should come before parent_slot.
Processing a new epoch (`Bank::process_new_epoch`) involves collecting
stake delegations twice:
1) In `Bank::compute_new_epoch_caches_and_rewards`, to create a stake
history entry and refresh vote accounts.
2) In `Bank::get_epoch_reward_calculate_param_info`, which is then used
in `Bank::calculate_stake_vote_rewards` to calculate rewards for
stakers and voters.
The overall time of crossing the epoch boundary is ~519ms:
```
update_epoch_us=519953i
```
The two heaviest operations are `collect()` calls on stake
delegations, each of them taking ~200-220ms.
Reduce that to just one collect by passing the vector collected in 1),
along with the freshly computed stake history and vote accounts, to
`Bank::begin_partitioned_rewards`.
This way, we can avoid calling `Bank::get_epoch_reward_calculate_param_info`.
The new time of crossing the epoch boundary is ~337ms:
```
update_epoch_us=337371i
```
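The idea of collecting once and sharing the result can be sketched as follows. This is a simplified illustration under stated assumptions: a std `HashMap` stands in for the real HAMT, `u32` keys stand in for `Pubkey`, and every type and function name here (`StakeAccount`, `collect_delegations`, `total_stake`, `stake_per_vote_account`, `process_new_epoch`) is hypothetical rather than the actual Agave API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stake account: the vote account it delegates to
/// and the delegated amount.
pub struct StakeAccount {
    pub vote_id: u32,
    pub stake: u64,
}

/// Stand-in for the expensive `iter().collect()` over the delegation map.
pub fn collect_delegations(map: &HashMap<u32, StakeAccount>) -> Vec<(&u32, &StakeAccount)> {
    map.iter().collect()
}

/// First consumer: total stake, as would feed a stake history entry.
pub fn total_stake(delegations: &[(&u32, &StakeAccount)]) -> u64 {
    delegations.iter().map(|(_, account)| account.stake).sum()
}

/// Second consumer: stake per vote account, as would feed reward calculation.
pub fn stake_per_vote_account(delegations: &[(&u32, &StakeAccount)]) -> HashMap<u32, u64> {
    let mut out = HashMap::new();
    for (_, account) in delegations {
        *out.entry(account.vote_id).or_insert(0) += account.stake;
    }
    out
}

/// Collect once, then hand the same slice to both consumers.
pub fn process_new_epoch(map: &HashMap<u32, StakeAccount>) -> (u64, HashMap<u32, u64>) {
    let delegations = collect_delegations(map);
    (total_stake(&delegations), stake_per_vote_account(&delegations))
}
```

Because both consumers borrow the same vector, the cost of walking the map is paid a single time per epoch boundary.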
Making that change possible required several refactors:
* Take `&PointValue` in `Bank::create_epoch_rewards_sysvar`. That makes
it easier to operate on references of `PartitionedRewardsCalculation`.
Copying integers from `PointValue` is cheap and has no visible
performance impact.
* Split `Stakes::activate_epoch`, which was performing calculations and
mutating the cache at the same time. The calculations got split to
`Stakes::calculate_activated_stake` that takes `&self`.
* Add `Stakes::stake_delegations_vec` method. Stake delegations are
stored as a hash array mapped trie (HAMT)[0], which means that inserts,
deletions and lookups are average-case O(1) and worst-case O(log n).
However, the performance of iterations is poor due to depth-first
traversal and jumps. Currently it's also impossible to iterate over it
with rayon. That issue is known and handled by converting the HAMT to
a vector with `stakes.stake_delegations.iter().collect()`. Move that
trick to a dedicated method that describes the performance
consequences.
* Add `FilteredStakeDelegation` wrapper type, that wraps a vector of
stake delegations and acts as a lazy iterator that filters out ones
with insufficient stake.
* Split the code dealing with rewards calculation and vote rewards
distribution into separate methods:
* `Bank::calculate_rewards` that takes `&self` and does not acquire
any locks.
* `Bank::begin_partitioned_rewards` that takes `&mut self`, sets
calculation status and creates a sysvar.
* `Bank::distribute_vote_rewards` that stores partitioned rewards and
increases capitalization.
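The split described in the last bullet, a read-only calculation followed by separate mutating steps, can be sketched like this. The method names mirror the ones listed above, but the struct fields, the method bodies and the 1% reward rate are invented for illustration and do not reflect the real `Bank`.

```rust
/// Hypothetical, heavily simplified bank state.
pub struct Bank {
    pub capitalization: u64,
    pub stakes: Vec<u64>,
    pub rewards_active: bool,
}

/// Result of the read-only phase: plain data that holds no borrow of `Bank`.
pub struct RewardsCalculation {
    pub vote_rewards: u64,
}

impl Bank {
    /// Read-only phase: takes `&self`, so the compiler guarantees it cannot
    /// mutate bank state. The 1% rate is an arbitrary example value.
    pub fn calculate_rewards(&self) -> RewardsCalculation {
        let total: u64 = self.stakes.iter().sum();
        RewardsCalculation { vote_rewards: total / 100 }
    }

    /// First mutating step: mark the partitioned reward period as active.
    pub fn begin_partitioned_rewards(&mut self, _calculation: &RewardsCalculation) {
        self.rewards_active = true;
    }

    /// Second mutating step: pay vote rewards and increase capitalization.
    pub fn distribute_vote_rewards(&mut self, calculation: &RewardsCalculation) {
        self.capitalization += calculation.vote_rewards;
    }
}
```

Keeping the calculation behind `&self` means it can be called, tested or reused without touching bank state, while the two `&mut self` steps stay small and easy to audit.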
[0] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
Fixes: anza-xyz#8282
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.
…8065)
(cherry picked from commit 3a2abd6)
…ation (backport of #8065) (#9321)
(cherry picked from commit 3a2abd6)
Co-authored-by: Michal R <[email protected]>
Problem
Processing a new epoch (`Bank::process_new_epoch`) involves collecting stake delegations twice:
1) In `Stakes::activate_epoch`, to create a stake history entry and refresh vote accounts.
2) In `Bank::filter_stake_delegations`, which is then used in `Bank::calculate_stake_vote_rewards` to calculate rewards for stakers and voters.
The overall time of crossing the epoch boundary is ~519ms.
The two heaviest operations are `collect()` calls on stake delegations, each of them taking ~200-220ms.
Summary of Changes
Reduce that to just one collect into a `Vec<(&Pubkey, &StakeAccount)>`, done at the beginning of `Bank::process_new_epoch`, and pass the stake delegations to the other methods.
The new time of crossing the epoch boundary is ~337ms.
There is only one heavy `collect()` done on stake delegations, which still takes most of the main thread's time. But that's the best we can do while still using `im::HashMap`.
Making that change possible required several refactors:
* Take `&PointValue` in `Bank::create_epoch_rewards_sysvar`. That makes it easier to operate on references of `PartitionedRewardsCalculation`. Copying integers from `PointValue` is cheap and has no visible performance impact.
* Split `Stakes::activate_epoch`, which was performing calculations and mutating the cache at the same time. The calculations got split to `Stakes::calculate_activated_stake` that takes `&self`.
* Add `Stakes::stake_delegations_vec` method. Stake delegations are stored as a hash array mapped trie (HAMT)[0], which means that inserts, deletions and lookups are average-case O(1) and worst-case O(log n). However, the performance of iterations is poor due to depth-first traversal and jumps. Currently it's also impossible to iterate over it with rayon. That issue is known and handled by converting the HAMT to a vector with `stakes.stake_delegations.iter().collect()`. Move that trick to a dedicated method that describes the performance consequences.
* Add `FilteredStakeDelegation` wrapper type, that wraps a vector of stake delegations and acts as a lazy iterator that filters out ones with insufficient stake.
* Split the code dealing with rewards calculation and vote rewards distribution into separate methods:
  * `Bank::calculate_rewards` that takes `&self` and does not acquire any locks.
  * `Bank::begin_partitioned_rewards` that takes `&mut self`, sets calculation status and creates a sysvar.
  * `Bank::distribute_vote_rewards` that stores partitioned rewards and increases capitalization.
[0] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
Fixes: #8282
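As a rough illustration of the filtered wrapper mentioned in the list of refactors, the sketch below wraps an already-collected slice of delegations and filters lazily on iteration. It is written under assumptions: `FilteredDelegations`, the `(u32, u64)` pair standing in for `(Pubkey, stake)`, and the `min_stake` threshold are all hypothetical and do not reproduce the real `FilteredStakeDelegation` type.

```rust
/// Hypothetical wrapper over an already-collected slice of (id, stake) pairs.
pub struct FilteredDelegations<'a> {
    delegations: &'a [(u32, u64)],
    min_stake: u64,
}

impl<'a> FilteredDelegations<'a> {
    pub fn new(delegations: &'a [(u32, u64)], min_stake: u64) -> Self {
        Self { delegations, min_stake }
    }

    /// Lazily yields only entries with at least `min_stake`. Nothing is
    /// copied or filtered until the returned iterator is consumed.
    pub fn iter(&self) -> impl Iterator<Item = &'a (u32, u64)> {
        let min_stake = self.min_stake;
        self.delegations
            .iter()
            .filter(move |(_, stake)| *stake >= min_stake)
    }
}
```

Since the wrapper only borrows the vector, the single collected snapshot can be reused by unfiltered consumers as well, and no second allocation is needed for the filtered view.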