[WIP] Paged Slashing #6788

Closed
Ank4n wants to merge 27 commits into master from ankan/mb-slash

Conversation


@Ank4n Ank4n commented Dec 7, 2024

Closes #3610.

TODO

  • Introduce a new config `SlashCancellationDuration`. Offences can then only be reported for eras in `offence_era..(current_era - SlashCancellationDuration)`. `SlashCancellationDuration` is reserved for cancelling slashes via governance if needed.
  • At the time of slash, we only note `(offence_era, validator, perbill, slashing_era)` and avoid computing slash amounts for each nominator.
  • Migration of Unapplied Slashes. Use tasks for each page of ErasStakersPaged for the validators that need to be slashed.
  • Slash computation happens at the time of slashing using the existing ErasStakersPaged storage.
  • Configuration and integrity checks to allow at least 2 full eras for multi-block slashing.
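The "note now, compute later" idea from the TODO list can be sketched in a few lines. This is an illustrative model only: the names (`OffenceNote`, `note_offence`) and the dedup-by-highest-fraction rule are assumptions, not the actual pallet API.

```rust
/// Fraction of stake to slash, in parts per billion (models `Perbill`).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Perbill(u32);

/// What gets stored at report time: no per-nominator amounts yet.
#[derive(Clone, Debug, PartialEq)]
struct OffenceNote {
    offence_era: u32,
    validator: String,
    slash_fraction: Perbill,
    slash_era: u32,
}

/// Reporting only records the note; the expensive per-nominator
/// computation is deferred to the paged slashing step.
fn note_offence(queue: &mut Vec<OffenceNote>, note: OffenceNote) {
    // Keep the highest fraction if the validator was already reported
    // for this era (mirrors how repeated offences are usually handled).
    if let Some(existing) = queue.iter_mut().find(|n| {
        n.validator == note.validator && n.offence_era == note.offence_era
    }) {
        if note.slash_fraction.0 > existing.slash_fraction.0 {
            *existing = note;
        }
    } else {
        queue.push(note);
    }
}

fn main() {
    let mut queue = Vec::new();
    note_offence(&mut queue, OffenceNote {
        offence_era: 100, validator: "val-1".into(),
        slash_fraction: Perbill(50_000_000), slash_era: 127,
    });
    // Same validator, same era, higher fraction: replaces the earlier note.
    note_offence(&mut queue, OffenceNote {
        offence_era: 100, validator: "val-1".into(),
        slash_fraction: Perbill(100_000_000), slash_era: 127,
    });
    assert_eq!(queue.len(), 1);
    assert_eq!(queue[0].slash_fraction, Perbill(100_000_000));
}
```

The point is that the record is constant-size per validator, so reporting stays `O(1)` regardless of how many nominators back the offender.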

Open questions

  • 1 era should be enough to slash all pages. If some pages are still not slashed, what should we do with them? Ignore them? We probably just need to guarantee a minimum number of blocks for slashing pages.
  • Since staking has a lot of these system tasks, we need to ensure blocks are never overweight. Avoid election and slashing happening at the same time?
  • KeyOwnerProofSystem (Historical) migration to only validator ids would be complex. If we don't really need it, I would actually like to get rid of it.

Ank4n and others added 27 commits December 7, 2024 22:23
Set back the token for the cmd_bot in the backport flow so that it works
again, until the new setup is figured out with the sec team
- change bench to default to old CLI
- fix profile to production

---------

Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: command-bot <>
…6690)

After finality started lagging on Kusama around 2024-11-25 15:55:40,
validators started occasionally seeing this log when importing votes
covering more than one assignment.
```
Possible bug: Vote import failed
```

That happens because the assumption that assignments from the same
validator would have the same required routing doesn't hold after
aggression is enabled: you might receive the first assignment, then
modify its routing in `enable_aggression`, then receive the second
assignment and the vote covering both assignments. The routing for the
first and second assignments wouldn't match, and we would fail to
import the vote.

From the logs I've seen, I don't think this is the reason the network
didn't fully recover until the failsafe kicked in, because the votes had
already been imported in approval-voting before this error.

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
The dependency on `pallet_balances` doesn't seem to be necessary. At
least everything compiles for me without it. Removed this dependency and
a few others that seem to be leftovers.

---------

Co-authored-by: GitHub Action <action@github.com>
…#4834)

# Description

Sending XCM messages to other chains requires paying a "transport fee".
This can be paid either:
- from `origin` local account if `jit_withdraw = true`,
- taken from Holding register otherwise.

This currently works for the following hops/scenarios:
1. On destination no transport fee needed (only sending costs, not
receiving),
2. Local/originating chain: just set JIT=true and fee will be paid from
signed account,
3. Intermediary hops - only if intermediary is acting as reserve between
two untrusted chains (aka only for `DepositReserveAsset` instruction) -
this was fixed in #3142

But now we're seeing more complex asset transfers that are mixing
reserve transfers with teleports depending on the involved chains.

# Example

E.g. transferring DOT between Relay and parachain, but through AH (using
AH instead of the Relay chain as parachain's DOT reserve).

In the `Parachain --1--> AssetHub --2--> Relay` scenario, DOT has to be
reserve-withdrawn in leg `1`, then teleported in leg `2`.
On the intermediary hop (AssetHub), `InitiateTeleport` fails to send
onward message because of missing transport fees. We also can't rely on
`jit_withdraw` because the original origin is lost on the way, and even
if it weren't we can't rely on the user having funded accounts on each
hop along the way.

# Solution/Changes

- Charge the transport fee in the executor from the transferred assets
(if available),
- Only charge from transferred assets if JIT_WITHDRAW was not set,
- Don't charge from transferred assets when using XCMv5 `PayFees`, where
we do not have this problem.

# Testing

Added regression tests in emulated transfers.

Fixes #4832
Fixes #6637

---------

Signed-off-by: Adrian Catangiu <adrian@parity.io>
Co-authored-by: Francisco Aguirre <franciscoaguirreperez@gmail.com>
We removed the `require_weight_at_most` field and later changed it to
`fallback_max_weight`.
This was to have a fallback when sending a message to v4 chains, which
happens in the small time window when chains are upgrading.
We originally put no fallback for a message in snowbridge's inbound
queue but we should have one.
This PR adds it.

---------

Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: Francisco Aguirre <franciscoaguirreperez@gmail.com>
Closes: #5551

## Description

With [permissionless lanes
PR#4949](#4949), the
congestion mechanism based on sending
`Transact(report_bridge_status(is_congested))` from
`pallet-xcm-bridge-hub` to `pallet-xcm-bridge-hub-router` was replaced
with a congestion mechanism that relied on monitoring XCMP queues.
However, this approach could cause issues, such as suspending the entire
XCMP queue instead of isolating the affected bridge. This PR reverts
back to using `report_bridge_status` as before.

## TODO
- [x] benchmarks
- [x] prdoc

## Follow-up

#6231

---------

Co-authored-by: GitHub Action <action@github.com>
Co-authored-by: command-bot <>
Co-authored-by: Adrian Catangiu <adrian@parity.io>
Closes: #5858

---------

Co-authored-by: Bastian Köcher <git@kchr.de>
Fixes: paritytech/ci_cd#1079
Improvements:
- switch to github native token creation action
- refresh branch in CI from HEAD, to prevent failure
- add APP token when pushing, to allow CI to be retriggered by the bot
# Description

**Understood assignment:**
Initial assignment description is in #6194.
To simplify the display of commands and ensure they are tested for the
chain spec builder's `polkadot-sdk` reference docs, find every
occurrence of `#[docify::export]` where `process::Command` is used, and
replace the use of `process::Command` with `run_cmd!` from the `cmd_lib`
crate.

---------

Co-authored-by: Iulian Barbu <14218860+iulianbarbu@users.noreply.github.com>
The way we build the messages we need to send to approval-distribution
can result in a situation where, if we have multiple assignments covered
by a coalesced approval, the messages are sent in this order:

ASSIGNMENT1, APPROVAL, ASSIGNMENT2, because we iterate over each
candidate and add both the assignment and the approval for that
candidate to the queue of messages. When the approval reaches the
approval-distribution subsystem, it won't be imported and gossiped
because one of the assignments for it is not known.

So in a network where a lot of nodes are restarting at the same time, we
could end up in a situation where one set of nodes correctly received
the assignments and approvals before the restart, approve their
blocks, and don't trigger their assignments. The other set of nodes
should receive the assignments and approvals after the restart, but
because the approvals never get broadcast anymore due to this bug,
the only way they could approve is if other nodes start broadcasting
their assignments.

I think this bug contributed to the network not recovering
on `2024-11-25 15:55:40` after the restarts.

Tested this scenario with a `zombienet` where nodes are finalising
blocks because of aggression and all nodes are restarted at once;
confirmed that the network lags and doesn't recover before the fix, and
does recover after it.

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/12864151912
Failed job name: fmt

@Ank4n
Contributor Author

Ank4n commented Feb 3, 2025

moved #7424

github-merge-queue bot pushed a commit that referenced this pull request Feb 17, 2025
…ication (#7424)

closes #3610.

helps #6344, but we need
to migrate the `Offences::Reports` storage before we can remove the
exposure dependency in RC pallets.

replaces #6788.

## Context  
Slashing in staking is currently unbounded, which is a major blocker
for moving staking to a parachain (AH).

### Current Slashing Process (Unbounded)  

1. **Offence Reported**  
- Offences include multiple validators, each with potentially large
exposure pages.
- Slashes are **computed immediately** and scheduled for application
after **28 eras**.

2. **Slash Applied**  
- All unapplied slashes are executed in **one block** at the start of
the **28th era**. This is an **unbounded operation**.


### Proposed Slashing Process (Bounded)  

1. **Offence Queueing**  
   - Offences are **queued** after basic sanity checks.  

2. **Paged Offence Processing (Computing Slash)**  
   - Slashes are **computed one validator exposure page at a time**.  
   - **Unapplied slashes** are stored in a **double map**:  
     - **Key 1 (k1):** `EraIndex`  
- **Key 2 (k2):** `(Validator, SlashFraction, PageIndex)` — a unique
identifier for each slash page

3. **Paged Slash Application**  
- Slashes are **applied one page at a time** across multiple blocks.
- Slash application starts at the **27th era** (one era earlier than
before) to ensure all slashes are applied **before stakers can unbond**
(which starts from era 28 onwards).

---

## Worst-Case Block Calculation for Slash Application  

### Polkadot:  
- **1 era = 24 hours**, **1 block = 6s** → **14,400 blocks/era**  
- On parachains (**12s blocks**) → **7,200 blocks/era**  

### Kusama:  
- **1 era = 6 hours**, **1 block = 6s** → **3,600 blocks/era**  
- On parachains (**12s blocks**) → **1,800 blocks/era**  

### Worst-Case Assumptions:  
- **Total stakers:** 40,000 nominators, 1000 validators. (Polkadot
currently has ~23k nominators and 500 validators)
- **Max slashed:** 50%, so 20k nominators and 250 validators.
- **Page size:** validators with multiple pages: (512 + 1)/2 ≈ 256
nominators per page on average; validators with a single page: 1

### Calculation:  
There might be a more accurate way to calculate this worst-case number,
and this estimate could be significantly higher than necessary, but it
shouldn’t exceed this value.

Blocks needed: 250 + 20k/256 = ~330 blocks.
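The back-of-envelope number above can be checked with a few lines. All inputs are the assumptions stated in the text; this is not a weight model, and the helper name is illustrative.

```rust
/// One block per single-page validator, plus one block per full exposure
/// page of the multi-page validators (rounding pages up).
fn blocks_for_slashing(
    single_page_validators: u32, // validators whose exposure fits one page
    paged_nominators: u32,       // nominators across multi-page validators
    avg_page_size: u32,          // average nominators per exposure page
) -> u32 {
    single_page_validators + (paged_nominators + avg_page_size - 1) / avg_page_size
}

fn main() {
    // 250 slashed validators with a single page, 20k nominators in pages of ~256.
    let blocks = blocks_for_slashing(250, 20_000, 256);
    assert_eq!(blocks, 329); // matches the "~330 blocks" estimate above
    // Comfortably fits in one era even in the tightest case in the text:
    // Kusama parachain blocks at 12s give 1,800 blocks per era.
    assert!(blocks < 1_800);
    println!("worst-case blocks: {blocks}");
}
```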

##  *Potential Improvement:*  
- Consider adding an **Offchain Worker (OCW)** task to further optimize
slash application in future updates.
- Dynamically batch unapplied slashes based on number of nominators in
the page, or process until reserved weight limit is exhausted.

----
## Summary of Changes  

### Storage  
- **New:**  
  - `OffenceQueue` *(StorageDoubleMap)*  
    - **K1:** Era  
    - **K2:** Offending validator account  
    - **V:** `OffenceRecord`  
  - `OffenceQueueEras` *(StorageValue)*  
    - **V:** `BoundedVec<EraIndex, BoundingDuration>`  
  - `ProcessingOffence` *(StorageValue)*  
    - **V:** `(Era, offending validator account, OffenceRecord)`  

- **Changed:**  
  - `UnappliedSlashes`:  
    - **Old:** `StorageMap<K -> Era, V -> Vec<UnappliedSlash>>`  
- **New:** `StorageDoubleMap<K1 -> Era, K2 -> (validator_acc, perbill,
page_index), V -> UnappliedSlash>`
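The `UnappliedSlashes` change above can be modelled with std collections (not FRAME storage macros; types and the `apply_one` helper are illustrative). The key point is that keying by `(era, (validator, slash_fraction, page_index))` lets each page be applied and removed independently, one per block, instead of draining one big `Vec` at once.

```rust
use std::collections::BTreeMap;

type EraIndex = u32;
type AccountId = String;
type PerbillParts = u32; // Perbill as raw parts-per-billion
type Page = u32;
type SlashKey = (AccountId, PerbillParts, Page);

#[derive(Debug, PartialEq)]
struct UnappliedSlash {
    validator: AccountId,
    // per-nominator amounts for this page only
    others: Vec<(AccountId, u128)>,
}

// Old: BTreeMap<EraIndex, Vec<UnappliedSlash>> — applied all at once.
// New: keyed per page, so application is naturally paged.
type UnappliedSlashes = BTreeMap<(EraIndex, SlashKey), UnappliedSlash>;

/// Pop a single page for `era`; a real runtime would do this step-by-step
/// across blocks (e.g. in `on_initialize`).
fn apply_one(slashes: &mut UnappliedSlashes, era: EraIndex) -> Option<UnappliedSlash> {
    let key = slashes.keys().find(|(e, _)| *e == era).cloned()?;
    slashes.remove(&key)
}

fn main() {
    let mut slashes: UnappliedSlashes = BTreeMap::new();
    for page in 0..3 {
        slashes.insert(
            (127, ("val-1".into(), 100_000_000, page)),
            UnappliedSlash { validator: "val-1".into(), others: vec![] },
        );
    }
    assert!(apply_one(&mut slashes, 127).is_some());
    assert_eq!(slashes.len(), 2); // remaining pages wait for later blocks
}
```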

### Events  
- **New:**  
  - `SlashComputed { offence_era, slash_era, offender, page }`  
  - `SlashCancelled { slash_era, slash_key, payout }`  

### Error  
- **Changed:**  
  - `InvalidSlashIndex` → Renamed to `InvalidSlashRecord`  
- **Removed:**  
  - `NotSortedAndUnique`  
- **Added:**  
  - `EraNotStarted`  

### Call  
- **Changed:**  
  - `cancel_deferred_slash(era, slash_indices: Vec<u32>)`  
    → Now takes `Vec<(validator_acc, slash_fraction, page_index)>`  
- **New:**  
- `apply_slash(slash_era, slash_key: (validator_acc, slash_fraction,
page_index))`
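A sketch of the new `cancel_deferred_slash` shape: instead of opaque `Vec<u32>` indices, governance passes the explicit slash keys. This is a std-library model under assumed types, not the pallet's actual signature; the error string mirrors the renamed `InvalidSlashRecord` error.

```rust
use std::collections::BTreeMap;

// (validator_acc, slash_fraction as Perbill parts, page_index)
type SlashKey = (String, u32, u32);

/// Remove the named slash pages for `era`; each key must exist,
/// otherwise the whole call fails (analogue of `InvalidSlashRecord`).
fn cancel_deferred_slash(
    unapplied: &mut BTreeMap<(u32, SlashKey), u128>,
    era: u32,
    slash_keys: Vec<SlashKey>,
) -> Result<(), &'static str> {
    for key in slash_keys {
        unapplied.remove(&(era, key)).ok_or("InvalidSlashRecord")?;
    }
    Ok(())
}

fn main() {
    let mut unapplied = BTreeMap::new();
    unapplied.insert((127, ("val-1".to_string(), 100_000_000, 0)), 42u128);

    // Cancelling an existing page succeeds and removes it.
    assert!(cancel_deferred_slash(&mut unapplied, 127,
        vec![("val-1".to_string(), 100_000_000, 0)]).is_ok());
    assert!(unapplied.is_empty());

    // Cancelling an unknown key is rejected.
    assert!(cancel_deferred_slash(&mut unapplied, 127,
        vec![("val-2".to_string(), 100_000_000, 0)]).is_err());
}
```

Keying by the full tuple avoids the old fragility where indices into a `Vec` could shift between the cancel proposal and its execution.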

### Runtime Config  
- `FullIdentification` is now set to a unit type (`()`) / null identity,
replacing the previous exposure type for all runtimes using
`pallet_session::historical`.

## TODO
- [x] Fixed broken `CancelDeferredSlashes`.
- [x] Ensure `on_offence` is called only with the validator account for
identification everywhere.
- [ ] Ensure we never need to read full exposure.
- [x] Tests for multi block processing and application of slash.
- [x] Migrate UnappliedSlashes 
- [x] Bench (crude, needs proper bench as followup)
  - [x] on_offence()
  - [x] process_offence()
  - [x] apply_slash()
 
 
## Followups (tracker
[link](#7596))
- [ ] OCW task to process offence + apply slashes.
- [ ] Minimum time for governance to cancel deferred slash.
- [ ] Allow root or staking admin to add a custom slash.
- [ ] Test HistoricalSession proof works fine with eras before removing
exposure as full identity.
- [ ] Properly bench offence processing and slashing.
- [ ] Handle Offences::Reports migration when removing validator
exposure as identity.

---------

Co-authored-by: Gonçalo Pestana <g6pestana@gmail.com>
Co-authored-by: command-bot <>
Co-authored-by: Kian Paimani <5588131+kianenigma@users.noreply.github.com>
Co-authored-by: Guillaume Thiolliere <gui.thiolliere@gmail.com>
Co-authored-by: kianenigma <kian@parity.io>
Co-authored-by: Giuseppe Re <giuseppe.re@parity.io>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
clangenb pushed a commit to clangenb/polkadot-sdk that referenced this pull request Feb 19, 2025
…ication (paritytech#7424)
@Ank4n Ank4n deleted the ankan/mb-slash branch April 18, 2025 07:48
Successfully merging this pull request may close these issues.

[NPoS] Pagify slashing