runtime: Collect stake delegations only once during epoch activation by vadorovsky · Pull Request #8065 · anza-xyz/agave

vadorovsky · 2025-09-16T11:59:14Z

Problem

Processing new epoch (Bank::process_new_epoch) involves collecting stake delegations twice:

1) In Stakes::activate_epoch, to create a stake history entry and refresh vote accounts.
2) In Bank::filter_stake_delegations, which is then used in Bank::calculate_stake_vote_rewards to calculate rewards for stakers and voters.

The overall time of crossing the epoch boundary is ~519ms:

update_epoch_us=519953i

The two heaviest operations are the collect() calls on stake delegations, each taking ~200-220ms.

Summary of Changes

Reduce that to just one collect into a Vec<(&Pubkey, &StakeAccount)>, done at the beginning of Bank::process_new_epoch, and pass the stake delegations to the other methods.

The new time of crossing the epoch boundary is ~337ms:

update_epoch_us=337371i

There is now only one heavy collect() on stake delegations, which still takes most of the main thread's time. But that's the best we can do while still using im::HashMap.
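As a rough illustration of the collect-once idea (not the actual runtime code), the sketch below flattens a map into a vector of references a single time and hands the same slice to two consumers. `std::collections::HashMap` stands in for `im::HashMap`, and `StakeAccount`, `total_stake` and `stake_per_voter` are hypothetical simplifications.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real key and account types.
type Pubkey = [u8; 32];

#[derive(Debug, Clone, PartialEq)]
struct StakeAccount {
    voter: Pubkey,
    stake: u64,
}

/// Flatten the map into a vector of references exactly once.
/// This is the expensive step, so callers should reuse the result.
fn stake_delegations_vec(map: &HashMap<Pubkey, StakeAccount>) -> Vec<(&Pubkey, &StakeAccount)> {
    map.iter().collect()
}

/// First consumer: total stake (standing in for the stake history entry).
fn total_stake(delegations: &[(&Pubkey, &StakeAccount)]) -> u64 {
    delegations.iter().map(|(_, account)| account.stake).sum()
}

/// Second consumer: stake per voter (standing in for the rewards input).
fn stake_per_voter(delegations: &[(&Pubkey, &StakeAccount)]) -> HashMap<Pubkey, u64> {
    let mut per_voter = HashMap::new();
    for (_, account) in delegations {
        *per_voter.entry(account.voter).or_insert(0) += account.stake;
    }
    per_voter
}
```

Both consumers borrow the same vector, so the map is traversed only once no matter how many passes follow.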

Making that change possible required several refactors:

- Take &PointValue in Bank::create_epoch_rewards_sysvar. That makes it easier to operate on references to PartitionedRewardsCalculation. Copying integers from PointValue is cheap and has no visible performance impact.
- Split Stakes::activate_epoch, which was performing calculations and mutating the cache at the same time. The calculations were moved to Stakes::calculate_activated_stake, which takes &self.
- Add the Stakes::stake_delegations_vec method. Stake delegations are stored as a hash array mapped trie (HAMT)[0], which means that inserts, deletions and lookups are average-case O(1) and worst-case O(log n). However, iteration performance is poor due to depth-first traversal and jumps. Currently it's also impossible to iterate over it with rayon. That issue is known and handled by converting the HAMT to a vector with stakes.stake_delegations.iter().collect(). Move that trick to a dedicated method that documents the performance consequences.
- Add the FilteredStakeDelegation wrapper type, which wraps a vector of stake delegations and acts as a lazy iterator that filters out the ones with insufficient stake.
- Split the code dealing with rewards calculation and vote rewards distribution into separate methods:
  - Bank::calculate_rewards, which takes &self and does not acquire any locks.
  - Bank::begin_partitioned_rewards, which takes &mut self, sets the calculation status and creates a sysvar.
  - Bank::distribute_vote_rewards, which stores partitioned rewards and increases capitalization.
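
A minimal sketch of the lazy-filter wrapper idea from the list above, under simplifying assumptions: `FilteredDelegations`, its `min_stake` field and the `StakeAccount` stand-in are hypothetical and only mirror the description, not the actual types in the runtime.

```rust
// Hypothetical stand-ins; the real types live in the runtime crate.
type Pubkey = [u8; 32];

struct StakeAccount {
    stake: u64,
}

/// Wraps an already collected vector and filters lazily during iteration,
/// so no second vector is allocated for the filtered view.
struct FilteredDelegations<'a> {
    delegations: Vec<(&'a Pubkey, &'a StakeAccount)>,
    // `None` means "no minimum", i.e. nothing is filtered out.
    min_stake: Option<u64>,
}

impl<'a> FilteredDelegations<'a> {
    fn new(delegations: Vec<(&'a Pubkey, &'a StakeAccount)>, min_stake: Option<u64>) -> Self {
        Self { delegations, min_stake }
    }

    /// Yields only the delegations that meet the minimum stake.
    /// The predicate runs as the caller iterates, not up front.
    fn iter(&self) -> impl Iterator<Item = (&'a Pubkey, &'a StakeAccount)> + '_ {
        let min_stake = self.min_stake;
        self.delegations
            .iter()
            .copied()
            .filter(move |(_, account)| min_stake.map_or(true, |min| account.stake >= min))
    }
}
```

With a minimum of 50, for example, `FilteredDelegations::new(collected, Some(50)).iter()` skips accounts whose stake is below 50 while the wrapped vector itself stays untouched.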

[0] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

Fixes: #8282

codecov-commenter · 2025-10-03T10:41:35Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.9%. Comparing base (0f761dc) to head (b84c5cf).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff            @@
##           master    #8065    +/-   ##
========================================
  Coverage    81.9%    81.9%            
========================================
  Files         860      860            
  Lines      326456   326603   +147     
========================================
+ Hits       267624   267784   +160     
+ Misses      58832    58819    -13


HaoranYi · 2025-10-06T19:02:58Z

There is an issue with this PR for epoch_reward_cache.

The PR moved the cache check after the computation. Before the PR, the cache was checked before computing rewards in calculate_rewards_and_distribute_vote_rewards. After the PR, the cache is only populated in save_rewards, which happens after the expensive computation.

vadorovsky · 2025-10-06T21:55:49Z

> After the PR, the cache is only populated in save_rewards, which happens after the expensive computation.

And your worry is that it will take more than one slot? Or is there something else you have in mind?

To be precise - the computation you're talking about, currently takes around 50ms. And the entire epoch boundary after this change - 330ms. So I think we are fine. The overall goal of my optimizations here is to keep epoch boundary below one slot.

HaoranYi · 2025-10-07T14:22:03Z

> > After the PR, the cache is only populated in save_rewards, which happens after the expensive computation.
>
> And your worry is that it will take more than one slot? Or is there something else you have in mind?
>
> To be precise - the computation you're talking about, currently takes around 50ms. And the entire epoch boundary after this change - 330ms. So I think we are fine. The overall goal of my optimizations here is to keep epoch boundary below one slot.

Yes. We used to have many forks at the epoch boundary, and the cache was introduced to avoid computing the rewards again on each fork. If we are certain that there are going to be no forks, we can remove the cache. In this PR, we store to the cache but never read from it, which seems a waste.
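
To illustrate the ordering concern, here is a hedged sketch of a cache that is consulted before the expensive computation, so that a second fork from the same parent reuses the first result. The `EpochRewardsCache` type, the `u64` parent key and `get_or_compute` are assumptions for illustration, not the runtime's actual cache API.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the parent block hash and the calculation result.
type ParentHash = u64;

#[derive(Debug, PartialEq)]
struct RewardsCalculation {
    total_rewards: u64,
}

/// Cache shared between banks (forks) that descend from the same parent.
#[derive(Default)]
struct EpochRewardsCache {
    entries: Mutex<HashMap<ParentHash, Arc<RewardsCalculation>>>,
}

impl EpochRewardsCache {
    /// Return the cached calculation for `parent`, running `compute` only on a miss.
    /// The lock is held during `compute`, which keeps the sketch simple.
    fn get_or_compute(
        &self,
        parent: ParentHash,
        compute: impl FnOnce() -> RewardsCalculation,
    ) -> Arc<RewardsCalculation> {
        let mut entries = self.entries.lock().unwrap();
        // Check first: a hit skips the expensive computation entirely.
        if let Some(hit) = entries.get(&parent) {
            return Arc::clone(hit);
        }
        let computed = Arc::new(compute());
        entries.insert(parent, Arc::clone(&computed));
        computed
    }
}
```

If the cache is only written after the computation and never read, the second call would always recompute, which is the waste described above.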

runtime/src/bank.rs

jstarry · 2025-11-05T17:30:43Z

runtime/src/bank/partitioned_epoch_rewards/calculation.rs

+    }
+
+    // Calculate rewards from previous epoch and distribute vote rewards
+    pub(in crate::bank) fn calculate_rewards_and_distribute_vote_rewards(


Hmm well you also split out Bank::store_vote_accounts_partitioned (also inside Bank::save_rewards) from Bank::calculate_rewards_and_distribute_vote_rewards so the core part of vote reward distribution is actually not in there. But I see your point about the other distribution code being in there still. Do you think we could move all of that code into Bank::save_rewards (maybe rename this to distribute_vote_rewards) so that all the code for vote reward distribution is in the same place?

Specifically:

Bank::update_vote_rewards

Capitalization update

And then Bank::create_epoch_rewards_sysvar can be called after Bank::save_rewards.

I don't care a lot about keeping the datapoints ("epoch_rewards" and "epoch-rewards-status-update") consistent but others may disagree.

jstarry

This looks correct to me. I added a comment with some more suggested refactorings but this is fine as is already. Nice work!

jstarry · 2025-11-07T11:07:39Z

runtime/src/bank.rs

+        self.begin_partitioned_rewards(
+            parent_slot,
+            parent_height,
+            parent_epoch,


Sorry, the commit with my suggestions had a mistake: these params are out of order. parent_epoch should come before parent_slot.

Processing new epoch (`Bank::process_new_epoch`) involves collecting stake delegations twice:

1) In `Bank::compute_new_epoch_caches_and_rewards`, to create a stake history entry and refresh vote accounts.
2) In `Bank::get_epoch_reward_calculate_param_info`, which is then used in `Bank::calculate_stake_vote_rewards` to calculate rewards for stakers and voters.

The overall time of crossing the epoch boundary is ~519ms:

```
update_epoch_us=519953i
```

Where the two heaviest operations are `collect()` calls on stake delegations, each of them taking ~200-220ms.

Reduce that to just one collect by passing the vector from 1), with freshly computed stake history and vote accounts, to `Bank::begin_partitioned_rewards`. This way, we can avoid calling `Bank::get_epoch_reward_calculate_param_info`.

The new time of crossing the epoch boundary is ~337ms:

```
update_epoch_us=337371i
```

Making that change possible required several refactors:

* Take `&PointValue` in `Bank::create_epoch_rewards_sysvar`. That makes it easier to operate on references to `PartitionedRewardsCalculation`. Copying integers from `PointValue` is cheap and has no visible performance impact.
* Split `Stakes::activate_epoch`, which was performing calculations and mutating the cache at the same time. The calculations were moved to `Stakes::calculate_activated_stake`, which takes `&self`.
* Add the `Stakes::stake_delegations_vec` method. Stake delegations are stored as a hash array mapped trie (HAMT)[0], which means that inserts, deletions and lookups are average-case O(1) and worst-case O(log n). However, iteration performance is poor due to depth-first traversal and jumps. Currently it's also impossible to iterate over it with rayon. That issue is known and handled by converting the HAMT to a vector with `stakes.stake_delegations.iter().collect()`. Move that trick to a dedicated method that documents the performance consequences.
* Add the `FilteredStakeDelegation` wrapper type, which wraps a vector of stake delegations and acts as a lazy iterator that filters out the ones with insufficient stake.
* Split the code dealing with rewards calculation and vote rewards distribution into separate methods:
  * `Bank::calculate_rewards`, which takes `&self` and does not acquire any locks.
  * `Bank::begin_partitioned_rewards`, which takes `&mut self`, sets the calculation status and creates a sysvar.
  * `Bank::distribute_vote_rewards`, which stores partitioned rewards and increases capitalization.

[0] https://en.wikipedia.org/wiki/Hash_array_mapped_trie

Fixes: anza-xyz#8282


mergify · 2025-11-27T15:49:39Z

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.


…ation (backport of #8065) (#9321)

runtime: Collect stake delegations only once during epoch activation (#8065)

(cherry picked from commit 3a2abd6)

Co-authored-by: Michal R <[email protected]>

vadorovsky mentioned this pull request Sep 16, 2025

runtime: Avoid redundant collections of stake delegations into a vector #7770

Closed

vadorovsky force-pushed the epoch-one-iteration branch 5 times, most recently from 2b1439a to 5525c4f Compare September 23, 2025 11:54

vadorovsky force-pushed the epoch-one-iteration branch from 5525c4f to 5917723 Compare September 29, 2025 12:51

vadorovsky changed the title ~~runtime: Iterate over stake delegations only once during epoch activation~~ runtime: Collect stake delegations only once during epoch activation Sep 29, 2025

vadorovsky force-pushed the epoch-one-iteration branch 12 times, most recently from b52cfbf to 3b50554 Compare October 3, 2025 09:59

vadorovsky marked this pull request as ready for review October 3, 2025 10:50

vadorovsky requested review from HaoranYi, alessandrod, jstarry and t-nelson October 3, 2025 10:52

vadorovsky force-pushed the epoch-one-iteration branch from 3b50554 to d5159c1 Compare October 4, 2025 08:15

vadorovsky marked this pull request as ready for review November 3, 2025 17:30

vadorovsky force-pushed the epoch-one-iteration branch 3 times, most recently from ef1f93b to 8ec02be Compare November 5, 2025 08:15

vadorovsky requested a review from jstarry November 5, 2025 09:47

jstarry reviewed Nov 5, 2025

View reviewed changes

vadorovsky force-pushed the epoch-one-iteration branch 2 times, most recently from 1d141dd to 4f7bb25 Compare November 6, 2025 13:45

jstarry previously approved these changes Nov 6, 2025

View reviewed changes

vadorovsky dismissed jstarry’s stale review via baa1a55 November 7, 2025 09:18

vadorovsky force-pushed the epoch-one-iteration branch 3 times, most recently from 260508d to 0f0253f Compare November 7, 2025 09:29

vadorovsky requested a review from jstarry November 7, 2025 11:00

jstarry reviewed Nov 7, 2025

View reviewed changes

vadorovsky force-pushed the epoch-one-iteration branch from 0f0253f to b84c5cf Compare November 7, 2025 11:53

jstarry approved these changes Nov 7, 2025

View reviewed changes

vadorovsky added this pull request to the merge queue Nov 7, 2025

Merged via the queue into anza-xyz:master with commit 3a2abd6 Nov 7, 2025
44 checks passed

vadorovsky deleted the epoch-one-iteration branch November 7, 2025 13:29

mergify bot mentioned this pull request Nov 19, 2025

v3.1: runtime: Test epoch rewards cache for multiple forks (backport of #8802) #9148

Merged

vadorovsky added the v3.1 Backport to v3.1 branch label Nov 27, 2025

mergify bot mentioned this pull request Nov 27, 2025

v3.1: runtime: Collect stake delegations only once during epoch activation (backport of #8065) #9321

Merged

vadorovsky mentioned this pull request Feb 23, 2026

runtime: bench stakes cache #10760

Open

vadorovsky merged 1 commit into anza-xyz:master from vadorovsky:epoch-one-iteration


6 participants
