[WIP] Paged Slashing #6788

Closed
Ank4n wants to merge 27 commits into master from ankan/mb-slash

Conversation


@Ank4n Ank4n commented Dec 7, 2024

Closes #3610.

TODO

  • Introduce a new config `SlashCancellationDuration`. Offences can then only be reported for eras in `offence_era..(current_era - SlashCancellationDuration)`. `SlashCancellationDuration` is reserved for cancelling slashes via governance if needed.
  • At the time of slash, we only note `(offence_era, validator, perbill, slashing_era)` and avoid computing slash amounts for each nominator.
  • Migration of Unapplied Slashes. Use tasks for each page of ErasStakersPaged for the validators that need to be slashed.
  • Slash computation happens at the time of slashing using the existing ErasStakersPaged storage.
  • Configuration and integrity checks to allow at least 2 full eras for multi-block slashing.
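The "note now, compute later" idea from the TODO list can be sketched in a few lines. This is an illustrative model only: the names (`OffenceNote`, `note_offence`) and the dedup-by-highest-fraction rule are assumptions, not the actual pallet API.

```rust
/// Fraction of stake to slash, in parts per billion (models `Perbill`).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Perbill(u32);

/// What gets stored at report time: no per-nominator amounts yet.
#[derive(Clone, Debug, PartialEq)]
struct OffenceNote {
    offence_era: u32,
    validator: String,
    slash_fraction: Perbill,
    slash_era: u32,
}

/// Reporting only records the note; the expensive per-nominator
/// computation is deferred to the paged slashing step.
fn note_offence(queue: &mut Vec<OffenceNote>, note: OffenceNote) {
    // Keep the highest fraction if the validator was already reported
    // for this era (mirrors how repeated offences are usually handled).
    if let Some(existing) = queue.iter_mut().find(|n| {
        n.validator == note.validator && n.offence_era == note.offence_era
    }) {
        if note.slash_fraction.0 > existing.slash_fraction.0 {
            *existing = note;
        }
    } else {
        queue.push(note);
    }
}

fn main() {
    let mut queue = Vec::new();
    note_offence(&mut queue, OffenceNote {
        offence_era: 100, validator: "val-1".into(),
        slash_fraction: Perbill(50_000_000), slash_era: 127,
    });
    // Same validator, same era, higher fraction: replaces the earlier note.
    note_offence(&mut queue, OffenceNote {
        offence_era: 100, validator: "val-1".into(),
        slash_fraction: Perbill(100_000_000), slash_era: 127,
    });
    assert_eq!(queue.len(), 1);
    assert_eq!(queue[0].slash_fraction, Perbill(100_000_000));
}
```

The point is that the record is constant-size per validator, so reporting stays `O(1)` regardless of how many nominators back the offender.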

Open questions

  • 1 era should be enough to slash all pages. If some pages are still not slashed, what should we do with them? Ignore them? We probably just need to guarantee a minimum number of blocks for slashing pages.
  • Since staking has a lot of these system tasks, we need to ensure blocks are never overweight. Avoid election and slashing happening at the same time?
  • KeyOwnerProofSystem (Historical) migration to only validator ids would be complex. If we don't really need it, I would actually like to get rid of it.

Ank4n and others added 27 commits December 7, 2024 22:23
Set back the token for the cmd_bot in the backport flow so that it works
again, until the new setup is figured out with the sec team
- change bench to default to old CLI
- fix profile to production

---------

Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: command-bot <>
…6690)

After finality started lagging on Kusama around 2024-11-25 15:55:40,
validators started occasionally seeing this log when importing votes
covering more than one assignment.
```
Possible bug: Vote import failed
```

That happens because the assumption that assignments from the same
validator would have the same required routing doesn't hold after
aggression is enabled: you might receive the first assignment, then
modify its routing in `enable_aggression`, then receive the second
assignment and the vote covering both assignments. The routing for the
first and second assignments wouldn't match, and we would fail to
import the vote.

From the logs I've seen, I don't think this is the reason the network
didn't fully recover until the failsafe kicked in, because the votes had
already been imported in approval-voting before this error.

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
The dependency on `pallet_balances` doesn't seem to be necessary. At
least everything compiles for me without it. Removed this dependency and
a few others that seem to be leftovers.

---------

Co-authored-by: GitHub Action <action@github.com>
…#4834)

# Description

Sending XCM messages to other chains requires paying a "transport fee".
This can be paid either:
- from `origin` local account if `jit_withdraw = true`,
- taken from Holding register otherwise.

This currently works for the following hops/scenarios:
1. On destination no transport fee needed (only sending costs, not
receiving),
2. Local/originating chain: just set JIT=true and fee will be paid from
signed account,
3. Intermediary hops - only if intermediary is acting as reserve between
two untrusted chains (aka only for `DepositReserveAsset` instruction) -
this was fixed in #3142

But now we're seeing more complex asset transfers that are mixing
reserve transfers with teleports depending on the involved chains.

# Example

E.g. transferring DOT between Relay and parachain, but through AH (using
AH instead of the Relay chain as parachain's DOT reserve).

In the `Parachain --1--> AssetHub --2--> Relay` scenario, DOT has to be
reserve-withdrawn in leg `1`, then teleported in leg `2`.
On the intermediary hop (AssetHub), `InitiateTeleport` fails to send
onward message because of missing transport fees. We also can't rely on
`jit_withdraw` because the original origin is lost on the way, and even
if it weren't we can't rely on the user having funded accounts on each
hop along the way.

# Solution/Changes

- Charge the transport fee in the executor from the transferred assets
(if available),
- Only charge from transferred assets if JIT_WITHDRAW was not set,
- Don't charge from transferred assets when using XCMv5 `PayFees`, where
we do not have this problem.

# Testing

Added regression tests in emulated transfers.

Fixes #4832
Fixes #6637

---------

Signed-off-by: Adrian Catangiu <adrian@parity.io>
Co-authored-by: Francisco Aguirre <franciscoaguirreperez@gmail.com>
We removed the `require_weight_at_most` field and later changed it to
`fallback_max_weight`.
This was to have a fallback when sending a message to v4 chains, which
happens in the small time window when chains are upgrading.
We originally put no fallback for a message in snowbridge's inbound
queue but we should have one.
This PR adds it.

---------

Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: Francisco Aguirre <franciscoaguirreperez@gmail.com>
Closes: #5551

## Description

With [permissionless lanes
PR#4949](#4949), the
congestion mechanism based on sending
`Transact(report_bridge_status(is_congested))` from
`pallet-xcm-bridge-hub` to `pallet-xcm-bridge-hub-router` was replaced
with a congestion mechanism that relied on monitoring XCMP queues.
However, this approach could cause issues, such as suspending the entire
XCMP queue instead of isolating the affected bridge. This PR reverts
back to using `report_bridge_status` as before.

## TODO
- [x] benchmarks
- [x] prdoc

## Follow-up

#6231

---------

Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: command-bot <>
Co-authored-by: Adrian Catangiu <adrian@parity.io>
Closes: #5858

---------

Co-authored-by: Bastian Köcher <git@kchr.de>
Fixes: paritytech/ci_cd#1079
Improvements:
- switch to github native token creation action
- refresh branch in CI from HEAD, to prevent failure
- add APP token when pushing, to allow CI to be retriggered by the bot
# Description

**Understood assignment:**
Initial assignment description is in #6194.
To simplify the display of commands and ensure they are tested for the
chain spec builder's `polkadot-sdk` reference docs, find every
occurrence of `#[docify::export]` where `process::Command` is used, and
replace the use of `process::Command` with `run_cmd!` from the `cmd_lib`
crate.

---------

Co-authored-by: Iulian Barbu <14218860+iulianbarbu@users.noreply.github.com>
The way we build the messages we need to send to approval-distribution
can result in a situation where, if we have multiple assignments covered
by a coalesced approval, the messages are sent in this order:

ASSIGNMENT1, APPROVAL, ASSIGNMENT2, because we iterate over each
candidate and add both the assignment and the approval for that
candidate to the queue of messages. When the approval reaches the
approval-distribution subsystem, it won't be imported and gossiped
because one of the assignments for it is not known.

So in a network where a lot of nodes are restarting at the same time, we
could end up in a situation where one set of nodes correctly received
the assignments and approvals before the restart, approve their
blocks, and don't trigger their assignments. The other set of nodes
should receive the assignments and approvals after the restart, but
because the approvals never get broadcast anymore due to this bug,
the only way they could approve is if other nodes start broadcasting
their assignments.

I think this bug contributed to the network not recovering
on `2024-11-25 15:55:40` after the restarts.

Tested this scenario with a `zombienet` where nodes are finalising
blocks because of aggression and all nodes are restarted at once;
confirmed that the network lags and doesn't recover before the fix, and
does recover after it.

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/12864151912
Failed job name: fmt

@Ank4n
Contributor Author

Ank4n commented Feb 3, 2025

moved #7424

github-merge-queue bot pushed a commit that referenced this pull request Feb 17, 2025
…ication (#7424)

closes #3610.

helps #6344, but we need
to migrate the `Offences::Reports` storage before we can remove the
exposure dependency in RC pallets.

replaces #6788.

## Context  
Slashing in staking is currently unbounded, which is a major blocker
for moving staking to a parachain (AH).

### Current Slashing Process (Unbounded)  

1. **Offence Reported**  
- Offences include multiple validators, each with potentially large
exposure pages.
- Slashes are **computed immediately** and scheduled for application
after **28 eras**.

2. **Slash Applied**  
- All unapplied slashes are executed in **one block** at the start of
the **28th era**. This is an **unbounded operation**.


### Proposed Slashing Process (Bounded)  

1. **Offence Queueing**  
   - Offences are **queued** after basic sanity checks.  

2. **Paged Offence Processing (Computing Slash)**  
   - Slashes are **computed one validator exposure page at a time**.  
   - **Unapplied slashes** are stored in a **double map**:  
     - **Key 1 (k1):** `EraIndex`  
- **Key 2 (k2):** `(Validator, SlashFraction, PageIndex)` — a unique
identifier for each slash page

3. **Paged Slash Application**  
- Slashes are **applied one page at a time** across multiple blocks.
- Slash application starts at the **27th era** (one era earlier than
before) to ensure all slashes are applied **before stakers can unbond**
(which starts from era 28 onwards).

---

## Worst-Case Block Calculation for Slash Application  

### Polkadot:  
- **1 era = 24 hours**, **1 block = 6s** → **14,400 blocks/era**  
- On parachains (**12s blocks**) → **7,200 blocks/era**  

### Kusama:  
- **1 era = 6 hours**, **1 block = 6s** → **3,600 blocks/era**  
- On parachains (**12s blocks**) → **1,800 blocks/era**  

### Worst-Case Assumptions:  
- **Total stakers:** 40,000 nominators, 1000 validators. (Polkadot
currently has ~23k nominators and 500 validators)
- **Max slashed:** 50%, so 20k nominators and 250 validators.
- **Page size:** validators with multiple pages: (512 + 1)/2 ≈ 256
nominators per page on average; validators with a single page: 1

### Calculation:  
There might be a more accurate way to calculate this worst-case number,
and this estimate could be significantly higher than necessary, but it
shouldn’t exceed this value.

Blocks needed: 250 + 20k/256 = ~330 blocks.
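The back-of-envelope number above can be checked with a few lines. All inputs are the assumptions stated in the text; this is not a weight model, and the helper name is illustrative.

```rust
/// One block per single-page validator, plus one block per full exposure
/// page of the multi-page validators (rounding pages up).
fn blocks_for_slashing(
    single_page_validators: u32, // validators whose exposure fits one page
    paged_nominators: u32,       // nominators across multi-page validators
    avg_page_size: u32,          // average nominators per exposure page
) -> u32 {
    single_page_validators + (paged_nominators + avg_page_size - 1) / avg_page_size
}

fn main() {
    // 250 slashed validators with a single page, 20k nominators in pages of ~256.
    let blocks = blocks_for_slashing(250, 20_000, 256);
    assert_eq!(blocks, 329); // matches the "~330 blocks" estimate above
    // Comfortably fits in one era even in the tightest case in the text:
    // Kusama parachain blocks at 12s give 1,800 blocks per era.
    assert!(blocks < 1_800);
    println!("worst-case blocks: {blocks}");
}
```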

##  *Potential Improvement:*  
- Consider adding an **Offchain Worker (OCW)** task to further optimize
slash application in future updates.
- Dynamically batch unapplied slashes based on number of nominators in
the page, or process until reserved weight limit is exhausted.

----
## Summary of Changes  

### Storage  
- **New:**  
  - `OffenceQueue` *(StorageDoubleMap)*  
    - **K1:** Era  
    - **K2:** Offending validator account  
    - **V:** `OffenceRecord`  
  - `OffenceQueueEras` *(StorageValue)*  
    - **V:** `BoundedVec<EraIndex, BoundingDuration>`  
  - `ProcessingOffence` *(StorageValue)*  
    - **V:** `(Era, offending validator account, OffenceRecord)`  

- **Changed:**  
  - `UnappliedSlashes`:  
    - **Old:** `StorageMap<K -> Era, V -> Vec<UnappliedSlash>>`  
- **New:** `StorageDoubleMap<K1 -> Era, K2 -> (validator_acc, perbill,
page_index), V -> UnappliedSlash>`
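The `UnappliedSlashes` change above can be modelled with std collections (not FRAME storage macros; types and the `apply_one` helper are illustrative). The key point is that keying by `(era, (validator, slash_fraction, page_index))` lets each page be applied and removed independently, one per block, instead of draining one big `Vec` at once.

```rust
use std::collections::BTreeMap;

type EraIndex = u32;
type AccountId = String;
type PerbillParts = u32; // Perbill as raw parts-per-billion
type Page = u32;
type SlashKey = (AccountId, PerbillParts, Page);

#[derive(Debug, PartialEq)]
struct UnappliedSlash {
    validator: AccountId,
    // per-nominator amounts for this page only
    others: Vec<(AccountId, u128)>,
}

// Old: BTreeMap<EraIndex, Vec<UnappliedSlash>> — applied all at once.
// New: keyed per page, so application is naturally paged.
type UnappliedSlashes = BTreeMap<(EraIndex, SlashKey), UnappliedSlash>;

/// Pop a single page for `era`; a real runtime would do this step-by-step
/// across blocks (e.g. in `on_initialize`).
fn apply_one(slashes: &mut UnappliedSlashes, era: EraIndex) -> Option<UnappliedSlash> {
    let key = slashes.keys().find(|(e, _)| *e == era).cloned()?;
    slashes.remove(&key)
}

fn main() {
    let mut slashes: UnappliedSlashes = BTreeMap::new();
    for page in 0..3 {
        slashes.insert(
            (127, ("val-1".into(), 100_000_000, page)),
            UnappliedSlash { validator: "val-1".into(), others: vec![] },
        );
    }
    assert!(apply_one(&mut slashes, 127).is_some());
    assert_eq!(slashes.len(), 2); // remaining pages wait for later blocks
}
```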

### Events  
- **New:**  
  - `SlashComputed { offence_era, slash_era, offender, page }`  
  - `SlashCancelled { slash_era, slash_key, payout }`  

### Error  
- **Changed:**  
  - `InvalidSlashIndex` → Renamed to `InvalidSlashRecord`  
- **Removed:**  
  - `NotSortedAndUnique`  
- **Added:**  
  - `EraNotStarted`  

### Call  
- **Changed:**  
  - `cancel_deferred_slash(era, slash_indices: Vec<u32>)`  
    → Now takes `Vec<(validator_acc, slash_fraction, page_index)>`  
- **New:**  
- `apply_slash(slash_era, slash_key: (validator_acc, slash_fraction,
page_index))`
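A sketch of the new `cancel_deferred_slash` shape: instead of opaque `Vec<u32>` indices, governance passes the explicit slash keys. This is a std-library model under assumed types, not the pallet's actual signature; the error string mirrors the renamed `InvalidSlashRecord` error.

```rust
use std::collections::BTreeMap;

// (validator_acc, slash_fraction as Perbill parts, page_index)
type SlashKey = (String, u32, u32);

/// Remove the named slash pages for `era`; each key must exist,
/// otherwise the whole call fails (analogue of `InvalidSlashRecord`).
fn cancel_deferred_slash(
    unapplied: &mut BTreeMap<(u32, SlashKey), u128>,
    era: u32,
    slash_keys: Vec<SlashKey>,
) -> Result<(), &'static str> {
    for key in slash_keys {
        unapplied.remove(&(era, key)).ok_or("InvalidSlashRecord")?;
    }
    Ok(())
}

fn main() {
    let mut unapplied = BTreeMap::new();
    unapplied.insert((127, ("val-1".to_string(), 100_000_000, 0)), 42u128);

    // Cancelling an existing page succeeds and removes it.
    assert!(cancel_deferred_slash(&mut unapplied, 127,
        vec![("val-1".to_string(), 100_000_000, 0)]).is_ok());
    assert!(unapplied.is_empty());

    // Cancelling an unknown key is rejected.
    assert!(cancel_deferred_slash(&mut unapplied, 127,
        vec![("val-2".to_string(), 100_000_000, 0)]).is_err());
}
```

Keying by the full tuple avoids the old fragility where indices into a `Vec` could shift between the cancel proposal and its execution.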

### Runtime Config  
- `FullIdentification` is now set to a unit type (`()`) / null identity,
replacing the previous exposure type for all runtimes using
`pallet_session::historical`.

## TODO
- [x] Fixed broken `CancelDeferredSlashes`.
- [x] Ensure `on_offence` is called only with the validator account for
identification everywhere.
- [ ] Ensure we never need to read full exposure.
- [x] Tests for multi block processing and application of slash.
- [x] Migrate UnappliedSlashes 
- [x] Bench (crude, needs proper bench as followup)
  - [x] on_offence()
  - [x] process_offence()
  - [x] apply_slash()
 
 
## Followups (tracker
[link](#7596))
- [ ] OCW task to process offence + apply slashes.
- [ ] Minimum time for governance to cancel deferred slash.
- [ ] Allow root or staking admin to add a custom slash.
- [ ] Test HistoricalSession proof works fine with eras before removing
exposure as full identity.
- [ ] Properly bench offence processing and slashing.
- [ ] Handle Offences::Reports migration when removing validator
exposure as identity.

---------

Co-authored-by: Gonçalo Pestana <g6pestana@gmail.com>
Co-authored-by: command-bot <>
Co-authored-by: Kian Paimani <5588131+kianenigma@users.noreply.github.com>
Co-authored-by: Guillaume Thiolliere <gui.thiolliere@gmail.com>
Co-authored-by: kianenigma <kian@parity.io>
Co-authored-by: Giuseppe Re <giuseppe.re@parity.io>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
clangenb pushed a commit to clangenb/polkadot-sdk that referenced this pull request Feb 19, 2025
…ication (paritytech#7424)
@Ank4n Ank4n deleted the ankan/mb-slash branch April 18, 2025 07:48
Successfully merging this pull request may close these issues.

[NPoS] Pagify slashing