Skip to content

staking-async: add missing new_session_genesis#8310

Merged
sigurpol merged 8 commits intomasterfrom
multi_phase_default_snapshot_as_fallback
Apr 25, 2025
Merged

staking-async: add missing new_session_genesis#8310
sigurpol merged 8 commits intomasterfrom
multi_phase_default_snapshot_as_fallback

Conversation

@sigurpol
Copy link
Copy Markdown
Contributor

@sigurpol sigurpol commented Apr 23, 2025

Fix issue #8302 (introduced by #8127), where the staking-async module could fail during genesis.

The issue was related to the staking-async module in the Polkadot SDK, specifically with the implementation of the historical::SessionManager trait in the ah-client pallet with missing implementations of the new_session_genesis method in two different places:

  • In the pallet_session::SessionManager<T::AccountId> implementation
  • In the historical::SessionManager<T::AccountId, sp_staking::Exposure<T::AccountId, BalanceOf>> implementation

Note: the SessionManager trait requires the implementation of new_session_genesis for proper functioning, especially during chain initialization.

The pallet-staking-async/ah-client has different operating modes:

  • Passive: Delegates operations to a fallback implementation
  • Buffered: Buffers operations for later processing
  • Active: Performs operations directly

The fix ensures that in Passive mode, the new_session_genesis method correctly delegates to the fallback implementation, while in other modes it returns None.

@sigurpol sigurpol added I2-bug The node fails to follow expected behavior. T1-FRAME This PR/Issue is related to core FRAME, the framework. T10-tests This PR/Issue is related to tests. labels Apr 23, 2025
@sigurpol
Copy link
Copy Markdown
Contributor Author

/cmd prdoc --bump patch

@sigurpol sigurpol force-pushed the multi_phase_default_snapshot_as_fallback branch from b228382 to d762ea4 Compare April 23, 2025 14:06
- Updated the `asap` function to prepare the snapshot for fallback elections
at block zero, ensuring successful execution even at genesis.
- Modified the fallback logic to include a check for the genesis block
alongside the existing runtime benchmarks feature.

This change improves the robustness of the election provider by ensuring
that fallback elections can be executed from the very first block.

In particular, it fixes #8302, a regression introduced by PR #8127.

While the solution has the  benefit to be limited and not invasive, a
better fix would be probably not to rely on the `asap()` method at all
for genesis handling but ensure that the session manager correctly calls
`new_session()` or `new_session_genesis()` respectively.

Note also that the regression described in #8302 does NOT affect any
running chain, but mostly testing when we spin-off a new node (e.g.
look at the related issue in the staking-miner
[here](paritytech/polkadot-staking-miner#1031)).
@sigurpol sigurpol force-pushed the multi_phase_default_snapshot_as_fallback branch from d762ea4 to 63a0b03 Compare April 23, 2025 14:26
@sigurpol sigurpol added the T2-pallets This PR/Issue is related to a particular pallet. label Apr 23, 2025
@sigurpol
Copy link
Copy Markdown
Contributor Author

@kianenigma , @Ank4n , @niklasad1 PTAL 🙏
And also please check that labels and prdoc make sense since I am new to this whole process 🙇

@sigurpol sigurpol requested review from a team and kianenigma April 23, 2025 14:33
@kianenigma
Copy link
Copy Markdown
Contributor

Did you figure out why is_genesis is not set, while it should? This is more of a "root cause" rather than "fix symptoms" :))

@sigurpol
Copy link
Copy Markdown
Contributor Author

Did you figure out why is_genesis is not set, while it should? This is more of a "root cause" rather than "fix symptoms" :))

I know and agree, best I have found is here and I haven't investigated further. This is why I put this part in the commit message

While the solution has the benefit to be limited in scope and not invasive, a better fix would be probably not to rely on the asap() method at all for genesis block but ensure that the session manager correctly calls new_session() or new_session_genesis() respectively.

@sigurpol sigurpol force-pushed the multi_phase_default_snapshot_as_fallback branch from a7ceb4a to 0d6e254 Compare April 24, 2025 10:03
@kianenigma
Copy link
Copy Markdown
Contributor

Sorry I took the liberty to investigate and push in 3525e4f

So the root cause is this. Note that in Westend runtime, type SessionManager = SomeUnrelatedWrapper<AhStakingClient>, which basically means ah-client is the manager, and since it is in OperatingMode::Passive by default, it will pass everything to its type Fallback, which is our good 'ol pallet-staking. Yet, you see the danger of trait auto implementations here: trait SessionManager has an auto-impl for the genesis one, and we forgot about it (cc @Ank4n to also review this).

I am okay with the rest of the changes, but i am sure westend will work fine without them, if you just keep the last commit.

<Self as pallet_session::SessionManager<_>>::start_session(start_index)
}

fn new_session_genesis(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] we also need to cleanup comment at line 530 since we now implement new_session_genesis :)

 Fix issue #8302 (introduced by #8127), where the staking-async
 module could fail during genesis.

 The issue was related to the staking-async module in the Polkadot SDK,
 specifically with the implementation of the `historical::SessionManager`
 trait in the `ah-client` pallet with missing implementations of
 the new_session_genesis method in two different places:
- In the pallet_session::SessionManager<T::AccountId> implementation
- In the historical::SessionManager<T::AccountId, sp_staking::Exposure<T::AccountId, BalanceOf<T>>>
implementation

Note: the SessionManager trait requires the implementation of new_session_genesis for
proper functioning, especially during chain initialization.

The pallet-staking-async/ah-client has different operating modes:

- Passive: Delegates operations to a fallback implementation
- Buffered: Buffers operations for later processing
- Active: Performs operations directly

The fix ensures that in Passive mode, the new_session_genesis method correctly
delegates to the fallback implementation, while in other modes it returns None.
@sigurpol sigurpol changed the title epm: enhance fallback election handling at genesis block staking-async: add missing new_session_genesis Apr 24, 2025
@paritytech-workflow-stopper
Copy link
Copy Markdown

All GitHub workflows were cancelled due to failure one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/14643551981
Failed job name: fmt

The hack in do_elect() in handling genesis block is not needed anymore
now that we have fixed the root issue in the staking-async-ah-client.
@sigurpol
Copy link
Copy Markdown
Contributor Author

@kianenigma , I've reverted the rest of the changes (tested against the usual miner CI's test -> it works fine w/o).
I've also made a small change at your change since it was not compiling.

@sigurpol sigurpol added this pull request to the merge queue Apr 25, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 25, 2025
@sigurpol sigurpol added this pull request to the merge queue Apr 25, 2025
Merged via the queue into master with commit 7e65abb Apr 25, 2025
235 of 244 checks passed
@sigurpol sigurpol deleted the multi_phase_default_snapshot_as_fallback branch April 25, 2025 09:24
wassimans pushed a commit to wassimans/polkadot-sdk that referenced this pull request Apr 27, 2025
Fix issue paritytech#8302 (introduced by paritytech#8127), where the staking-async module
could fail during genesis.

The issue was related to the staking-async module in the Polkadot SDK,
specifically with the implementation of the `historical::SessionManager`
trait in the `ah-client` pallet with missing implementations of the
new_session_genesis method in two different places:
- In the pallet_session::SessionManager<T::AccountId> implementation
- In the historical::SessionManager<T::AccountId,
sp_staking::Exposure<T::AccountId, BalanceOf<T>>> implementation

Note: the SessionManager trait requires the implementation of
new_session_genesis for proper functioning, especially during chain
initialization.

The pallet-staking-async/ah-client has different operating modes:
- Passive: Delegates operations to a fallback implementation
- Buffered: Buffers operations for later processing
- Active: Performs operations directly

The fix ensures that in Passive mode, the new_session_genesis method
correctly delegates to the fallback implementation, while in other modes
it returns None.

---------

Co-authored-by: kianenigma <[email protected]>
ordian added a commit that referenced this pull request Apr 28, 2025
* master: (120 commits)
  [CI] Improve GH build status checking (#8331)
  [CI/CD] Use original PR name in prdoc check for the backport PR's to the stable branches (#8329)
  Add new host APIs set_storage_or_clear and get_storage_or_zero (#7857)
  push to dockerhub (#8322)
  Snowbridge - V1 - Adds 2 hop transfer to Rococo (#7956)
  [AHM] Prepare `election-provider-multi-block` for full lazy data deletion (#8304)
  Check umbrella version (#8250)
  [AHM] Fully bound staking async (#8303)
  migrate parachain-templates tests to `gha` (#8226)
  staking-async: add missing new_session_genesis (#8310)
  New NFT traits: granular and abstract interface (#5620)
  Extract create_pool_with_native_on macro to common crate (#8289)
  XCMP: use batching when enqueuing inbound messages (#8021)
  Snowbridge - Tests refactor (#8014)
  Allow configuration of worst case buy execution weight (#7944)
  Fix faulty pre-upgrade migration check in pallet-session (#8294)
  [pallet-revive] add get_storage_var_key for variable-sized keys (#8274)
  add poke_deposit extrinsic to pallet-recovery (#7882)
  `txpool`: use tracing for structured logging (#8001)
  [revive] eth-rpc refactoring (#8148)
  ...
castillax pushed a commit that referenced this pull request May 12, 2025
Fix issue #8302 (introduced by #8127), where the staking-async module
could fail during genesis.

The issue was related to the staking-async module in the Polkadot SDK,
specifically with the implementation of the `historical::SessionManager`
trait in the `ah-client` pallet with missing implementations of the
new_session_genesis method in two different places:
- In the pallet_session::SessionManager<T::AccountId> implementation
- In the historical::SessionManager<T::AccountId,
sp_staking::Exposure<T::AccountId, BalanceOf<T>>> implementation

Note: the SessionManager trait requires the implementation of
new_session_genesis for proper functioning, especially during chain
initialization.

The pallet-staking-async/ah-client has different operating modes:
- Passive: Delegates operations to a fallback implementation
- Buffered: Buffers operations for later processing
- Active: Performs operations directly

The fix ensures that in Passive mode, the new_session_genesis method
correctly delegates to the fallback implementation, while in other modes
it returns None.

---------

Co-authored-by: kianenigma <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

I2-bug The node fails to follow expected behavior. T1-FRAME This PR/Issue is related to core FRAME, the framework. T2-pallets This PR/Issue is related to a particular pallet. T10-tests This PR/Issue is related to tests.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants