
Add availability-recovery from systematic chunks#1644

Merged
alindima merged 176 commits into master from
alindima/add-systematic-chunks-av-recovery
May 28, 2024
Conversation

@alindima
Contributor

@alindima alindima commented Sep 20, 2023

Don't look at the commit history; it's confusing, as this branch is based on another branch that was already merged.

Fixes #598
Also implements RFC #47

Description

  • Availability-recovery now first attempts to request the systematic chunks for large PoVs (the first ~n/3 chunks, which can recover the full data without the costly Reed-Solomon decoding process). If that fails for any reason, it falls back to recovering from all chunks. Additionally, backers are used as a backup for requesting the systematic chunks when the assigned validator is not offering the chunk (each backer is only used for one systematic chunk, to avoid overloading them).
    • Quite obviously, recovering from systematic chunks is much faster than recovering from regular chunks (4000% faster, as measured on my Apple M2 Pro).
  • Introduces a ValidatorIndex -> ChunkIndex mapping which is different for every core, to avoid querying only the first n/3 validators over and over again within the same session. The mapping is the one described in RFC 47.
  • The mapping is feature-gated by the NodeFeatures runtime API so that it can only be enabled via a governance call once a sufficient majority of validators have upgraded their clients. If the feature is not enabled, the mapping is the identity mapping and backwards compatibility is preserved.
  • Adds a new chunk request protocol version (v2), which adds the ChunkIndex to the response. The returned index is checked against the expected chunk index for av-distribution and systematic recovery, but not for regular recovery. This is backwards compatible: a v2 request is attempted first, and if that fails during protocol negotiation, v1 is used.
  • Systematic recovery is only attempted during approval-voting, where we have easy access to the core_index. For disputes and collator pov_recovery, regular chunk requests are used, just as before.
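The two core ideas above, the systematic-chunk threshold and the per-core chunk rotation, can be sketched in a few lines. This is an illustrative sketch, not the real availability-recovery code: the rotation shown is a plain offset, whereas RFC 47 specifies a shuffled mapping; `recovery_threshold` follows the usual BFT bound of f + 1 chunks where f = (n - 1) / 3.

```rust
// Hedged sketch: demonstrates the systematic-chunk threshold and a
// *simplified* per-core rotation. RFC 47 specifies a shuffled mapping;
// this plain rotation only illustrates the idea of spreading the
// systematic chunks across different validators for each core.

/// Number of chunks needed to recover the data: f + 1, where up to
/// f = (n - 1) / 3 validators may be faulty.
fn recovery_threshold(n_validators: usize) -> usize {
    (n_validators - 1) / 3 + 1
}

/// Simplified ValidatorIndex -> ChunkIndex mapping: rotate by core index
/// so that different cores ask different validators for the systematic
/// (first `recovery_threshold`) chunks. NOT the exact RFC 47 shuffle.
fn chunk_index(validator_index: usize, core_index: usize, n_validators: usize) -> usize {
    (validator_index + core_index) % n_validators
}
```

For example, with n = 10 validators the threshold is 4, and under this rotation core 0 assigns the systematic chunks to validators 0..3 while core 1 assigns them to validators 9 and 0..2, so the systematic load moves across the validator set per core.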

Performance results

Some results from subsystem-bench:

with regular chunk recovery: CPU usage per block 39.82s
with recovery from backers: CPU usage per block 16.03s
with systematic recovery: CPU usage per block 19.07s

End-to-end results here: #598 (comment)

TODO:

  • [x] RFC #47 (polkadot-fellows/RFCs#47)
  • [x] Merge #2177 and rebase on top of those changes
  • [x] Merge #2771 and rebase
  • [x] Add tests
  • [x] Preliminary performance measure on Versi: see #598 (comment)
  • [x] Rewrite the implementer's guide documentation
  • [x] #3065
  • [x] paritytech/zombienet#1705 and fix zombienet tests
  • [x] Security audit
  • [x] Final Versi test and performance measure
@paritytech-cicd-pr

The CI pipeline was cancelled due to the failure of one of the required jobs.
Job name: cargo-clippy
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6283397

@alindima alindima requested a review from a team as a code owner May 27, 2024 06:46
@alindima alindima force-pushed the alindima/add-systematic-chunks-av-recovery branch from c07b56d to 25dc880 on May 27, 2024
alindima and others added 4 commits May 27, 2024 10:00
this is needed in order to trigger chunk requests, because
PoVs smaller than 4 MiB will prefer fetching from backers
@alindima alindima added this pull request to the merge queue May 28, 2024
Merged via the queue into master with commit 523e625 May 28, 2024
@alindima alindima deleted the alindima/add-systematic-chunks-av-recovery branch May 28, 2024 08:42
@burdges
Contributor

burdges commented May 29, 2024

We should still explore switching to https://github.com/AndersTrier/reed-solomon-simd since it seems much faster than our own reed-solomon crate.

each backer is only used for one systematic chunk, to not overload them

We could explore relaxing this restriction somewhat, once we know how things respond to this change in real networks. Also, this limit of one depends somewhat upon the number of validators.

4000% faster as measured on my apple M2 Pro

Just a nit pick, we still need all approval checkers to rerun the encoding to check the encoding merkle root, so while the decoding becomes almost nothing, the overall availability system remains expensive.

@sandreim
Contributor

I don't really think we overload the backers, but we do try backers as the first option, and if that fails we try systematic chunks. Going back to the backers doesn't really make much sense in this setup.

@sandreim
Contributor

We should still explore switching to https://github.com/AndersTrier/reed-solomon-simd since it seems much faster than our own reed-solomon crate.

noted: #605 (comment)

@alindima
Contributor Author

alindima commented May 29, 2024

I don't really think we overload the backers, but we do try backers as the first option, and if that fails we try systematic chunks. Going back to the backers doesn't really make much sense in this setup.

The current strategy is to try backers first if the PoV size is higher than 1 MiB. Otherwise, try systematic recovery (if the feature is enabled). During systematic recovery, if a few validators did not give us the right chunk, we also try to retrieve those chunks from the backers (each backer is only tried once). This way we may still complete systematic recovery even if one or a few validators are not responding.
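The ordering described above amounts to building a prioritised list of recovery strategies. The sketch below is hypothetical, not the actual availability-recovery subsystem API: the enum, the function, and the `prefer_full_from_backers` flag (which the real code derives from the PoV size cut-off and, for systematic recovery, from the node feature and core-index availability) are all illustrative names.

```rust
// Illustrative sketch of the strategy ordering described above.
// Type and function names are hypothetical, not the real
// availability-recovery API.

#[derive(Debug, PartialEq)]
enum RecoveryStrategy {
    /// Fetch the full PoV from one of the backers.
    FullFromBackers,
    /// Fetch the first ~n/3 (systematic) chunks; missing chunks may be
    /// retried from backers, one chunk per backer.
    SystematicChunks,
    /// Final fallback: fetch any `threshold` chunks and decode.
    RegularChunks,
}

/// Build the ordered list of strategies to try for one candidate.
/// `prefer_full_from_backers` stands in for the PoV-size decision made
/// by the real code; `systematic_enabled` stands in for the NodeFeatures
/// gate plus core-index availability.
fn strategies(
    prefer_full_from_backers: bool,
    systematic_enabled: bool,
) -> Vec<RecoveryStrategy> {
    let mut out = Vec::new();
    if prefer_full_from_backers {
        out.push(RecoveryStrategy::FullFromBackers);
    }
    if systematic_enabled {
        out.push(RecoveryStrategy::SystematicChunks);
    }
    // Regular chunk recovery is always kept as the last resort.
    out.push(RecoveryStrategy::RegularChunks);
    out
}
```

The point of the ordering is that regular chunk recovery is never skipped, only deprioritised, so the fallback behaviour is unchanged from before this PR.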

Just a nit pick, we still need all approval checkers to rerun the encoding to check the encoding merkle root, so while the decoding becomes almost nothing, the overall availability system remains expensive.

Of course. I discovered, though, that decoding is much slower than encoding. This PR, coupled with some other optimisations I made in our reed-solomon code, still enables us to drop the CPU consumption for erasure coding from ~20% to 5% (as measured on Versi, on a network with 50 validators and 10 glutton parachains).

ordian added a commit that referenced this pull request May 30, 2024
* master: (93 commits)
  Fix broken windows build (#4636)
  Beefy client generic on aduthority Id (#1816)
  pallet-staking: Put tests behind `cfg(debug_assertions)` (#4620)
  Broker new price adapter (#4521)
  Change `XcmDryRunApi::dry_run_extrinsic` to take a call instead (#4621)
  Update README.md (#4623)
  Publish `chain-spec-builder` (#4518)
  Add omni bencher & chain-spec-builder bins to release (#4557)
  Moves runtime macro out of experimental flag (#4249)
  Filter workspace dependencies in the templates (#4599)
  parachain-inherent: Make `para_id` more prominent (#4555)
  Add metric to measure the time it takes to gather enough assignments (#4587)
  Improve On_demand_assigner events (#4339)
  Conditional `required` checks (#4544)
  [CI] Deny adding git deps (#4572)
  [subsytem-bench] Remove redundant banchmark_name param (#4540)
  Add availability-recovery from systematic chunks (#1644)
  Remove workspace lints from templates (#4598)
  `sc-chain-spec`: deprecated code removed (#4410)
  [subsystem-benchmarks] Add statement-distribution benchmarks (#3863)
  ...
hitchhooker pushed a commit to ibp-network/polkadot-sdk that referenced this pull request Jun 5, 2024
TarekkMA pushed a commit to moonbeam-foundation/polkadot-sdk that referenced this pull request Aug 2, 2024
DrudgeRajen added a commit to thxnet/thxnet-sdk that referenced this pull request Apr 18, 2026
After cherry-picking paritytech#4724 (core sharing + scheduling_lookahead) on top
of v1.12.0, five upstream references do not resolve because the enabling
PRs landed later:

  1. `polkadot_statement_table::{…}` → use local `statement-table` alias
     (crate package name is `polkadot-statement-table`, imported via
     `statement-table = { package = … }` in Cargo.toml).
  2. `polkadot_primitives::{…}` in `scheduler.rs` and
     `runtime_api_impl/vstaging.rs` → use local `primitives` alias.
  3. `AvailabilityStoreMessage::StoreAvailableData::{core_index, node_features}`
     → the two new fields were added by the systematic-chunks PR paritytech#1644,
     which is NOT cherry-picked (it's 7540 LoC / 84 files and brings an
     unrelated availability-recovery rewrite). Keep the outer function
     signature as-is for API parity with upstream and drop the two
     fields at the message-send boundary via `let _ = …;`. The
     erasure-root check that protects consensus continues to run.
  4. `free_cores_and_fill_claimqueue` → in our tree the method is named
     `free_cores_and_fill_claim_queue` (the rename was a small cleanup
     that happened concurrently in upstream). Update the two call sites
     (paras_inherent and runtime_api v10).

Build clean: `cargo check -p polkadot -p polkadot-service
-p polkadot-runtime-parachains -p thxnet-testnet-runtime` all pass with
only pre-existing warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
DrudgeRajen added a commit to thxnet/thxnet-sdk that referenced this pull request Apr 19, 2026
#30)

* statement-distribution: Fix false warning (paritytech#4727)

... when backing group is of size 1.

Signed-off-by: Alexandru Gheorghe <[email protected]>

* Remove the prospective-parachains subsystem from collators (paritytech#4471)

Implements paritytech#4429

Collators only need to maintain the implicit view for the paraid they
are collating on.
In this case, bypass prospective-parachains entirely. It's still useful
to use the GetMinimumRelayParents message from prospective-parachains
for validators, because the data is already present there.

This enables us to entirely remove the subsystem from collators, which
consumed resources needlessly

Aims to resolve paritytech#4167 

TODO:
- [x] fix unit tests

* Fix core sharing and make use of scheduling_lookahead (paritytech#4724)

Implements most of
paritytech#1797

Core sharing (two or more parachains scheduled on the same core with the
same `PartsOf57600` value) was not working correctly. The
expected behaviour is to have Backed and Included event in each block
for the paras sharing the core and the paras should take turns. E.g. for
two cores we expect: Backed(a); Included(a)+Backed(b);
Included(b)+Backed(a); etc. Instead of this each block contains just one
event and there are a lot of gaps (blocks w/o events) during the
session.

Core sharing should also work when collators are building collations
ahead of time

TODOs:

- [x] Add a zombienet test verifying that the behaviour mentioned above
works.
- [x] prdoc

---------

Co-authored-by: alindima <[email protected]>

* fix(1.12.0-backports): adapt paritytech#4724 to v1.12.0 subsystem APIs


* fix(leafchain,v1.12.0): UNINCLUDED_SEGMENT_CAPACITY 2 → 1 to sidestep fragment-chain fork deadlock

# Problem

On v1.12.0 rootchain with async backing enabled (depth=1, ancestry=2)
and a small backing topology (3 validators, 1 group, 1 core), the
parachain stalls permanently after a few blocks.

Validator log symptom (repeats every ~18s):
- `Refusing to second candidate at leaf. Is not a potential member.`
- `Rejected v2 advertisement ... error=BlockedByBacking`
- Provisioner: `candidates_count=0` in every inherent

# Root cause

The v1.12.0 relay-side prospective-parachains subsystem is the
pre-paritytech#4937 ("prospective-parachains rework: take II") fragment-chain. In
`polkadot/node/core/prospective-parachains/src/fragment_chain/mod.rs:797`
`is_fork_or_cycle` returns true if ANY other candidate already has the
same `parent_head_hash`, rejecting it as a fork.

When inclusion is slow, the collator's cumulus aura-ext consensus hook
(FixedVelocityConsensusHook<V=1, C=2>) allows authoring a 2nd block
every ~18s with the same parent while the first one is still waiting to
be included. Each re-author produces a new block N with different
extrinsics → same parent_head_hash → fragment-chain rejects all but the
first → backing pipeline never completes a full cycle → permanent stall.

Upstream fix is paritytech#4937, first in stable2409. Not
cherry-pickable into v1.12.0 without dragging Constraints types from
later releases (~1355 LoC rewrite).

# Workaround

Set UNINCLUDED_SEGMENT_CAPACITY = 1.

`FixedVelocityConsensusHook::can_build_upon` checks
`size_after_included >= C` and returns false if true. With C=1, once
the collator has produced one unincluded block, all further authoring
attempts at the same parent are blocked until that block is included
on the relay. No forks are ever created at the cumulus side →
fragment-chain sees a single candidate per cycle → accepts it normally.

Tradeoff: para block production becomes synchronous-backing pace (one
block per ~18s = 3 relay slots) instead of async's potential 1:1 with
relay. Acceptable for small networks that need v1.12.0 as a stable
endpoint rather than a transient upgrade hop.
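The capacity gate described in this workaround reduces to a one-line predicate. This is a hedged sketch of the check as the commit message describes it (size after included compared against C), not the actual cumulus `FixedVelocityConsensusHook` implementation:

```rust
// Illustrative sketch of the unincluded-segment capacity gate described
// above (not the actual cumulus code). With capacity = 1, a second
// block on the same parent is refused until the first one is included
// on the relay chain, so no forks reach the fragment-chain.
fn can_build_upon(size_after_included: u32, capacity: u32) -> bool {
    // The hook refuses authoring when the unincluded-segment size
    // (after dropping blocks already included) has reached capacity.
    size_after_included < capacity
}
```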

# Validation

Rehearsed on forked-testnet 2026-04-18/19:
- v0.9.40 → binary upgrade → leafchain setCode spec 20 → 21 (capacity=1)
- rootchain setCode v1.12.0 (spec 112000003)
- Para advanced from stuck-at-13 → 4556+ overnight, finalization keeping
  pace with head, validator provisioner consistently reports
  `candidates_count=1` on backing cycles.

Prior capacity=2 rehearsal on the same topology reproducibly stalled
para at 13–30 within minutes of rootchain setCode and never recovered.

Bump leafchain spec_version 20 → 21 so setCode applies after the v1.12.0
rootchain upgrade. The stable2512 hop restores capacity to 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(v1.12.0): free stuck AvailabilityCores + ClaimQueue atomically with setCode

Extends `EnableAsyncBackingAndCoretime` migration to force-free any occupied
`AvailabilityCores` and kill `ClaimQueue` right after rewriting `ActiveConfig`.
Mirrors what `Scheduler::push_occupied_cores_to_assignment_provider` does at
session rotation, so the next `ParaInherent` pass can schedule fresh candidates
without waiting ~1 hour for the prod-testnet session boundary.

Background: when the v1.12.0 rootchain runtime enables async backing while a
candidate is still occupying a core (common under capacity=2 cumulus +
pre-paritytech#4937 relay), the occupying entry never reaches availability and the core
stays stuck until the next session. Empirically measured on forked-testnet:

  - plain migration              → para stuck 56 min (until session rotation)
  - this patch + validator restart → para advancing 16 s after setCode InBlock

The restart step is required because prospective-parachains / SessionInfo
caches live in validator-process memory and need flushing to pick up the new
scheduler state. Runbook: `kubectl rollout restart deploy/validator-*` post-setCode.

- Bumps spec_version 112_000_003 → 112_000_004
- Idempotent: re-running on a chain where cores are already free is a no-op

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Alin Dima <[email protected]>
Co-authored-by: Tsvetomir Dimitrov <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>

Labels

  • A0-needs_burnin: Pull request needs to be tested on a live validator node before merge. DevOps is notified via matrix
  • I9-optimisation: An enhancement to provide better overall performance in terms of time-to-completion for a task.
  • T0-node: This PR/Issue is related to the topic “node”.
  • T4-runtime_API: This PR/Issue is related to runtime APIs.
  • T8-polkadot: This PR/Issue is related to/affects the Polkadot network.

Projects

Development

Successfully merging this pull request may close these issues.

availability-recovery: systemic chunks recovery hotpath

9 participants