Approve multiple candidates with a single signature #1191

Merged

alexggh merged 124 commits into master from alexaggh/feature/approve_multiple_candidates_polkadot_sdk

Dec 13, 2023

Contributor

alexggh commented Aug 27, 2023

This PR migrates paritytech/polkadot#7554; preliminary measurements and tests are discussed there.

Initial implementation of the plan discussed in #701, built on top of #1178.

Overall idea

When approval-voting checks a candidate and is ready to advertise the approval, it defers the approval in a per-relay-chain-block queue until either we have MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed MAX_APPROVALS_COALESCE_TICKS in the queue. In both cases we sign whatever candidates are available at that point.
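The deferral rule above can be sketched as a small queue. This is a minimal illustration, not the code from this PR: `CoalesceQueue`, `Pending`, `defer` and `take_batch_if_ready` are hypothetical names, candidates are represented by a plain `u32` index, and the two limits are passed in as fields rather than read from configuration.

```rust
use std::collections::VecDeque;

/// A checked candidate waiting to be signed (hypothetical type).
struct Pending {
    candidate_index: u32,
    queued_at_tick: u64,
}

/// Hypothetical per-relay-chain-block queue of deferred approvals.
struct CoalesceQueue {
    /// Plays the role of MAX_APPROVAL_COALESCE_COUNT.
    max_count: usize,
    /// Plays the role of MAX_APPROVALS_COALESCE_TICKS.
    max_ticks: u64,
    pending: VecDeque<Pending>,
}

impl CoalesceQueue {
    fn new(max_count: usize, max_ticks: u64) -> Self {
        Self { max_count, max_ticks, pending: VecDeque::new() }
    }

    /// Defer a freshly checked candidate instead of signing it right away.
    fn defer(&mut self, candidate_index: u32, now: u64) {
        self.pending.push_back(Pending { candidate_index, queued_at_tick: now });
    }

    /// Returns the candidates to cover with one signature, or `None` if we
    /// should keep waiting.
    fn take_batch_if_ready(&mut self, now: u64) -> Option<Vec<u32>> {
        // An empty queue never produces a batch.
        let oldest = self.pending.front()?;
        let full = self.pending.len() >= self.max_count;
        let waited_too_long = now.saturating_sub(oldest.queued_at_tick) >= self.max_ticks;
        if full || waited_too_long {
            // Sign whatever is available, capped at the maximum batch size.
            let n = self.pending.len().min(self.max_count);
            Some(self.pending.drain(..n).map(|p| p.candidate_index).collect())
        } else {
            None
        }
    }
}
```

With `max_count = 1` and `max_ticks = 0` every approval leaves the queue as soon as it is deferred, which corresponds to the non-coalescing behaviour.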

This should allow us to reduce the number of approval messages we have to create/send/verify. The parameters are configurable, so we should find values that balance:

Security of the network: Delaying the broadcast of an approval shouldn't put finality at risk. To make sure that never happens, we won't delay sending a vote once we are past 2/3 of the no-show time.
Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 & MAX_APPROVALS_COALESCE_TICKS = 0 is what we have now, and we know from the measurements we did on versi that it bottlenecks approval-distribution/approval-voting when we significantly increase the number of validators and parachains.
Block storage: In case of disputes we have to import these votes on chain, which increases the necessary storage by MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that disputes are not the normal way the network functions and we will limit MAX_APPROVAL_COALESCE_COUNT to single-digit numbers, this should be good enough. Alternatively, we could try to create a better way to store this on-chain through indirection, if that's needed.
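Two of the constraints above reduce to simple arithmetic, sketched here under stated assumptions: the no-show clock is taken to start when the approval becomes ready (the description does not say where it starts), a candidate hash is taken to be 32 bytes, and both function names are hypothetical.

```rust
/// Assumed size of a candidate hash in bytes.
const CANDIDATE_HASH_BYTES: u64 = 32;

/// Whether an approval may still be held back for coalescing.
/// `elapsed_ticks` is measured from an assumed start of the no-show window;
/// once more than 2/3 of `no_show_ticks` has passed, the vote must go out.
fn may_still_delay(elapsed_ticks: u64, no_show_ticks: u64) -> bool {
    // Integer form of `elapsed <= (2/3) * no_show`, avoiding rounding.
    elapsed_ticks * 3 <= no_show_ticks * 2
}

/// Upper bound on the candidate-hash bytes carried by one coalesced vote
/// when it has to be imported on chain for a dispute.
fn max_hash_bytes_per_vote(max_approval_coalesce_count: u64) -> u64 {
    max_approval_coalesce_count * CANDIDATE_HASH_BYTES
}
```

For example, with a no-show window of 24 ticks the cutoff falls at tick 16, and a coalesce count of 6 bounds the hashes attached to one vote at 192 bytes under the 32-byte assumption.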

Other fixes:

Fixed the fact that we were sending random assignments to non-validators. That was wrong because those peers won't do anything with them and won't gossip them either, since they do not have a grid topology set, so the random assignments were wasted.
Added metrics to be able to debug potential no-shows and mis-processing of approvals/assignments.
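The first fix above amounts to filtering the peer set before picking random recipients. A minimal sketch, assuming peers are identified by a plain integer and that the set of validator peers is known; `PeerId` and `random_routing_candidates` are hypothetical names, and the random selection itself is left out.

```rust
use std::collections::HashSet;

/// Hypothetical peer identifier.
type PeerId = u64;

/// Keeps only the connected peers that are validators, so that random
/// routing never spends a slot on a peer that would drop the message.
fn random_routing_candidates(connected: &[PeerId], validators: &HashSet<PeerId>) -> Vec<PeerId> {
    connected
        .iter()
        .copied()
        .filter(|peer| validators.contains(peer))
        .collect()
}
```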

TODO:

Get feedback that this is moving in the right direction. @ordian @sandreim @eskimor @burdges, let me know what you think.
More and more testing.
Test in versi.
Make MAX_APPROVAL_COALESCE_COUNT & MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration.
Make sure backwards compatibility works correctly.
Make sure this direction is compatible with other streams of work: Slash approval voters on approving invalid blocks - dynamically #635 & Time Disputes #742
Final versi burn-in before merging

sandreim and others added 5 commits

August 25, 2023 19:15


          merge from archived repo

7230df4

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>


          cargo lock

d04c182

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>


          Merge remote-tracking branch 'origin' into sandreim/the_v2_assignments

f4f0e70

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>


          Approve multiple candidates with a single signature

341c7af

The pr migrates:
- paritytech/polkadot#7554

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>


          Fix build warnings

619fff2

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>

alexggh force-pushed the alexaggh/feature/approve_multiple_candidates_polkadot_sdk branch from 342308e to 619fff2

August 27, 2023 11:20

alexggh mentioned this pull request

[DNM] Migrate PR/7554 from polkadot repo #1172

Closed

alexggh added 5 commits

August 27, 2023 14:51


          Merge remote-tracking branch 'origin/master' into feature/approve_mul…

5f1558d

…tiple_candidates_polkadot_sdk


          ci: fix worker binaries could not be found

ed1d9d0

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>


          Add missing bits

7d7b82c

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>


          Build with network-protocol-staging

7bc13d3

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>


          Validate disconnect theory

53f8556

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>

alexggh force-pushed the alexaggh/feature/approve_multiple_candidates_polkadot_sdk branch from 1cb26cd to 7bc13d3

August 28, 2023 09:22

sandreim and others added 3 commits

August 28, 2023 15:54


          Merge branch 'master' of github.com:paritytech/polkadot-sdk into sand…

442b1e4

…reim/the_v2_assignments


          Log errors when banning peers

5e004e1

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>


          fix zombienet test

9850b2f

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

alexggh mentioned this pull request

Versi high number of PeerDisconnect when scaling up number of validators and parachains #1263

Closed

sandreim and others added 10 commits

August 29, 2023 19:06


          Merge branch 'master' of github.com:paritytech/polkadot-sdk into sand…

f71eb31

…reim/the_v2_assignments

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>


          cargo lock

46cfaf1

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>


          Merge branch 'master' of github.com:paritytech/polkadot-sdk into sand…

…reim/the_v2_assignments

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>


          superfluous

47beabd

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>


          Merge branch 'master' into sandreim/the_v2_assignments

ee88408


          Separate approval

3d3e37c

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>


          Revert "Log errors when banning peers"

da61d98

This reverts commit 5e004e1.


          Merge remote-tracking branch 'origin/sandreim/the_v2_assignments' int…

9c0375c

…o feature/approve_multiple_candidates_polkadot_sdk_v2


          Cleanup post migrating hacks when migrating from polkadot repo

f3fee24

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>


          Fixup clippy

6338d33

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>

sandreim mentioned this pull request

[DNM] Test / Debug #1635

Closed

alexggh added 2 commits

September 25, 2023 16:37


          Merge remote-tracking branch 'origin/master' into feature/approve_mul…

d4fb01a

…tiple_candidates_polkadot_sdk_v3


          Merge remote-tracking branch 'origin/master' into sandreim/the_v2_ass…

5832ad7

…ignments

alexggh added 10 commits

December 8, 2023 09:44


          Remove network-protocol-staging

871e9cf

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>


          Merge remote-tracking branch 'origin/master' into feature/approve_mul…

ec0a988

…tiple_candidates_polkadot_sdk_v4


          Rename zombienet

a364b64

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>


          Add migration v11 HostConfiguration, missed during rebasing

de7b5c0

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>


          Fix network_protocol_versioning_subsystem_msg

5cf522a

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>


          Fix GetApprovalSignatures

83a7325

... discovered during benchmarking

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>


          Merge remote-tracking branch 'origin/master' into feature/approve_mul…

6af2910

…tiple_candidates_polkadot_sdk_v4


          Fix formatting issues

4d33c9d

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>


          Fixup cargo fmt

30950d2

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>


          Fix some logging messed during rebase

593b82a

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>

Contributor Author

alexggh commented Dec 11, 2023

Performed the final sanity checks on versi before merging, tested the following scenarios:

Ran a really small network with 22 validators and 5 parachains for 24h.
Ran a network with 100 validators and 30 parachains for 4-5h.

Checked:

Logs: no unexpected errors or warnings were noticed during the testing.
Metrics: finality was around 2.5, and no no-shows appeared during the testing.

alexggh and others added 3 commits

December 11, 2023 17:39


          Add prdoc

f8f03e5

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>


          Fixup 0002-upgrade-node failures

8099c16

V2 was not put into the list of fallbacks for the validation protocol,
so the test wrongly fell back to v1.

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
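The failure described in the commit message above can be illustrated with a small negotiation sketch. This is an assumption about the general shape of version fallback, not the networking code itself: `negotiate` is a hypothetical helper and the version numbers are illustrative.

```rust
/// Returns the first version in our preference order (main protocol first,
/// then fallbacks) that the peer also supports.
fn negotiate(our_preference: &[u32], peer_supported: &[u32]) -> Option<u32> {
    our_preference
        .iter()
        .copied()
        .find(|version| peer_supported.contains(version))
}
```

With a preference list that omits 2, such as `[3, 1]`, a peer that supports `[2, 1]` ends up on version 1; once 2 is listed as a fallback, the same peer negotiates version 2.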


          Merge branch 'master' into alexaggh/feature/approve_multiple_candidat…

44f0210

…es_polkadot_sdk

alexggh merged commit a84dd0d into master

alexggh deleted the alexaggh/feature/approve_multiple_candidates_polkadot_sdk branch

December 13, 2023 06:43

alexggh mentioned this pull request

Support for new network validation protocol(v3) qdrvm/kagome#1923

Closed

Polkadot-Forum commented Jan 12, 2024

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/raising-awareness-new-network-validation-protocol-version-v3-coming/5639/1

github-merge-queue bot pushed a commit that referenced this pull request


          Introduce approval-voting/distribution benchmark (#2621)

f9f8868

## Summary
Built on top of the tooling and ideas introduced in
#2528, this PR introduces
a synthetic benchmark for measuring and assessing the performance
characteristics of the approval-voting and approval-distribution
subsystems.

Currently this allows us to simulate the behaviours of these systems
based on the following dimensions:
```
TestConfiguration:
# Test 1
- objective: !ApprovalsTest
    last_considered_tranche: 89
    min_coalesce: 1
    max_coalesce: 6
    enable_assignments_v2: true
    send_till_tranche: 60
    stop_when_approved: false
    coalesce_tranche_diff: 12
    workdir_prefix: "/tmp"
    num_no_shows_per_candidate: 0
    approval_distribution_expected_tof: 6.0
    approval_distribution_cpu_ms: 3.0
    approval_voting_cpu_ms: 4.30
  n_validators: 500
  n_cores: 100
  n_included_candidates: 100
  min_pov_size: 1120
  max_pov_size: 5120
  peer_bandwidth: 524288000000
  bandwidth: 524288000000
  latency:
    min_latency:
      secs: 0
      nanos: 1000000
    max_latency:
      secs: 0
      nanos: 100000000
  error: 0
  num_blocks: 10
```
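As a reading aid for the units in the configuration above, the sketch below converts the latency bounds and the bandwidth values. It assumes that the `secs`/`nanos` pairs map onto Rust's `std::time::Duration` and that the bandwidth numbers are expressed in bytes; neither is stated in the configuration itself, and both helper names are hypothetical.

```rust
use std::time::Duration;

/// Builds a `Duration` from a `secs`/`nanos` pair as written in the YAML.
fn latency(secs: u64, nanos: u32) -> Duration {
    Duration::new(secs, nanos)
}

/// Converts a byte count to whole MiB (assumes the input is in bytes).
fn bytes_to_mib(bytes: u64) -> u64 {
    bytes / (1024 * 1024)
}
```

Under these assumptions `min_latency` is 1 ms, `max_latency` is 100 ms, and 524288000000 corresponds to 500000 MiB.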

## The approach
1. We build a real overseer with the real implementations for
approval-voting and approval-distribution subsystems.
2. For a given network size, we pre-compute for each validator all
potential assignments and approvals it would send. Because this is a
computation-heavy operation, the result is cached in a file on disk and
re-used if the generation parameters don't change.
3. The messages are sent according to the configured parameters,
and those are split into 3 main benchmarking scenarios.
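The caching in step 2 can be sketched as a file whose name is derived from the generation parameters. This is an illustration only: `GenerationParams`, `cache_path` and `load_or_generate` are hypothetical, the chosen parameter fields are a guess at what would influence generation, and the benchmark's actual file layout may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Hypothetical subset of parameters that determine the generated messages.
#[derive(Hash)]
struct GenerationParams {
    n_validators: u32,
    n_cores: u32,
    num_blocks: u32,
    last_considered_tranche: u32,
}

/// Derives a cache file path from the parameters, so that a change in any
/// of them points at a different file.
/// Note: `DefaultHasher` output is not guaranteed stable across Rust
/// releases, so a real cache might prefer an explicit, versioned hash.
fn cache_path(workdir_prefix: &Path, params: &GenerationParams) -> PathBuf {
    let mut hasher = DefaultHasher::new();
    params.hash(&mut hasher);
    workdir_prefix.join(format!("approvals-{:016x}.cache", hasher.finish()))
}

/// Re-uses the cached bytes if the file exists, otherwise generates and stores them.
fn load_or_generate(path: &Path, generate: impl FnOnce() -> Vec<u8>) -> std::io::Result<Vec<u8>> {
    if path.exists() {
        std::fs::read(path)
    } else {
        let bytes = generate();
        std::fs::write(path, &bytes)?;
        Ok(bytes)
    }
}
```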

## Benchmarking scenarios

### Best case scenario *approvals_throughput_best_case.yaml*
It sends to approval-distribution only the minimum required tranches
to gather the needed_approvals, so that a candidate is approved.

### Behaviour in the presence of no-shows *approvals_no_shows.yaml*
It sends the tranche needed to approve a candidate when we have a
maximum of *num_no_shows_per_candidate* tranches with no-shows for each
candidate.

### Maximum throughput *approvals_throughput.yaml*
It sends all the tranches for each block and measures the CPU used and
the network bandwidth needed by the approval-voting and
approval-distribution subsystems.

## How to run it
```
cargo run -p polkadot-subsystem-bench --release -- test-sequence --path polkadot/node/subsystem-bench/examples/approvals_throughput.yaml
```

## Evaluating performance
### Use the real subsystems metrics
If you follow the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-grafana
for installing locally prometheus and grafana, all real metrics for the
`approval-distribution`, `approval-voting` and overseer are available.
E.g:
<img width="2149" alt="Screenshot 2023-12-05 at 11 07 46"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/cb8ae2dd-178b-4922-bfa4-dc37e572ed38">

<img width="2551" alt="Screenshot 2023-12-05 at 11 09 42"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/8b4542ba-88b9-46f9-9b70-cc345366081b">

<img width="2154" alt="Screenshot 2023-12-05 at 11 10 15"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/b8874d8d-632e-443a-9840-14ad8e90c54f">

<img width="2535" alt="Screenshot 2023-12-05 at 11 10 52"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/779a439f-fd18-4985-bb80-85d5afad78e2">

### Profile with pyroscope
1. Set up pyroscope following the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-pyroscope,
then run any of the benchmark scenarios with `--profile` as an
argument.
2. Open the pyroscope dashboard in grafana, e.g:
<img width="2544" alt="Screenshot 2024-01-09 at 17 09 58"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/58f50c99-a910-4d20-951a-8b16639303d9">



### Useful logs
1. Network bandwidth requirements:
```
Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
```

2. CPU usage by the approval-distribution/approval-voting subsystems.
```
approval-distribution CPU usage 84.061s
approval-distribution CPU usage per block 8.406s
approval-voting CPU usage 96.532s
approval-voting CPU usage per block 9.653s
```

3. Time passed until a given block is approved
```
 Chain selection approved  after 3500 ms hash=0x0101010101010101010101010101010101010101010101010101010101010101
Chain selection approved  after 4500 ms hash=0x0202020202020202020202020202020202020202020202020202020202020202
```

### Using the benchmark to quantify improvements from #1178 + #1191

Using a versi-node we compare the scenario where all new optimisations
are disabled with a scenario where tranche0 assignments are sent in a
single message and a conservative simulation where the coalescing of
approvals gives us just a 50% reduction in the number of messages we send.

Overall, what we see is a speedup of around 30-40% in the time it takes
to process the necessary messages and a 30-40% reduction in the
necessary bandwidth.

#### Best case scenario comparison (minimum required tranches sent).
Unoptimised
```
    Number of blocks: 10
    Payload bytes received from peers: 53289 KiB total, 5328 KiB/block
    Payload bytes sent to peers: 52489 KiB total, 5248 KiB/block
    approval-distribution CPU usage 6.732s
    approval-distribution CPU usage per block 0.673s
    approval-voting CPU usage 9.523s
    approval-voting CPU usage per block 0.952s
```

vs Optimisation enabled
```
   Number of blocks: 10
   Payload bytes received from peers: 32141 KiB total, 3214 KiB/block
   Payload bytes sent to peers: 37314 KiB total, 3731 KiB/block
   approval-distribution CPU usage 4.658s
   approval-distribution CPU usage per block 0.466s
   approval-voting CPU usage 6.236s
   approval-voting CPU usage per block 0.624s
```

#### Worst case: all tranches sent (very unlikely, happens when sharding breaks).

Unoptimised
```
   Number of blocks: 10
   Payload bytes received from peers: 746393 KiB total, 74639 KiB/block
   Payload bytes sent to peers: 729151 KiB total, 72915 KiB/block
   approval-distribution CPU usage 118.681s
   approval-distribution CPU usage per block 11.868s
   approval-voting CPU usage 124.118s
   approval-voting CPU usage per block 12.412s
```

vs optimised
```
    Number of blocks: 10
    Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
    Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
    approval-distribution CPU usage 84.061s
    approval-distribution CPU usage per block 8.406s
    approval-voting CPU usage 96.532s
    approval-voting CPU usage per block 9.653s
```
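To relate the raw numbers above to a percentage, the reduction for each metric can be computed directly from the two log excerpts. The helper below is a small sketch; `reduction_pct`, `report` and the two tables are hypothetical names, and the values are copied from the logs in this section.

```rust
/// Percentage by which `after` is smaller than `before`.
fn reduction_pct(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// (label, unoptimised, optimised) rows copied from the best-case logs.
const BEST_CASE: [(&str, f64, f64); 4] = [
    ("bytes received (KiB)", 53289.0, 32141.0),
    ("bytes sent (KiB)", 52489.0, 37314.0),
    ("approval-distribution CPU (s)", 6.732, 4.658),
    ("approval-voting CPU (s)", 9.523, 6.236),
];

/// (label, unoptimised, optimised) rows copied from the worst-case logs.
const WORST_CASE: [(&str, f64, f64); 4] = [
    ("bytes received (KiB)", 746393.0, 503993.0),
    ("bytes sent (KiB)", 729151.0, 629971.0),
    ("approval-distribution CPU (s)", 118.681, 84.061),
    ("approval-voting CPU (s)", 124.118, 96.532),
];

/// Formats one line per metric with the reduction rounded to one decimal.
fn report(rows: &[(&str, f64, f64)]) -> Vec<String> {
    rows.iter()
        .map(|(label, before, after)| {
            format!("{}: {:.1}% lower", label, reduction_pct(*before, *after))
        })
        .collect()
}
```

For the best case this gives roughly 39.7% (received), 28.9% (sent), 30.8% (approval-distribution CPU) and 34.5% (approval-voting CPU). For the worst case the figures are roughly 32.5%, 13.6%, 29.2% and 22.2%, so the bytes-sent reduction in that scenario is noticeably smaller than the other metrics.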


## TODOs
[x] Polish implementation.
[x] Use what we have so far to evaluate
#1191 before merging.
[x] List of features and additional dimensions we want to use for
benchmarking.
[x] Run benchmark on hardware similar with versi and kusama nodes.
[ ] Add benchmark to be run in CI for catching regression in
performance.
[ ] Rebase on latest changes for network emulation.

---------

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: Andrei Sandu <andrei-mihail@parity.io>
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>

github-actions bot mentioned this pull request

Update substrate/polkadot/cumulus from v1.3.0 to v1.6.0 moondance-labs/tanssi#419

Closed

alexggh added a commit to alexggh/runtimes that referenced this pull request


          Bump ParachainHost to api version 10 on kusama

d5a3f2d

... to add approval_voting_params API which will allow us to enable
approvals coalescing implementation from:
 - paritytech/polkadot-sdk#1191

Note! Bumping the version will not enable the new logic; that will be
enabled at a later date when we decide to call set_approval_voting_params
with max_approval_coalesce_count greater than 1.

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
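The note in the commit message above implies a simple switch: coalescing stays off until the configured count exceeds 1. A minimal sketch of that rule follows; the `ApprovalVotingParams` struct here is a local stand-in defined for illustration (only the field name is taken from the commit message), and the default of 1 is an assumption consistent with "bumping the version will not enable the new logic".

```rust
/// Local stand-in for the parameters mentioned in the commit message.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ApprovalVotingParams {
    max_approval_coalesce_count: u32,
}

impl Default for ApprovalVotingParams {
    /// Assumed default: one candidate per signature, i.e. no coalescing.
    fn default() -> Self {
        Self { max_approval_coalesce_count: 1 }
    }
}

impl ApprovalVotingParams {
    /// Coalescing only takes effect once more than one candidate may be
    /// covered by a single signature.
    fn coalescing_enabled(&self) -> bool {
        self.max_approval_coalesce_count > 1
    }
}
```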

alexggh mentioned this pull request

Bump ParachainHost to api version 10 on kusama polkadot-fellows/runtimes#204

Merged

alexggh added a commit to alexggh/runtimes that referenced this pull request


          Bump ParachainHost to api version 10 on kusama

7f8b6db

... to add approval_voting_params API which will allow us to enable
approvals coalescing implementation from:
 - paritytech/polkadot-sdk#1191

Note! Bumping the version will not enable the new logic; that will be
enabled at a later date when we decide to call set_approval_voting_params
with max_approval_coalesce_count greater than 1.

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>

fellowship-merge-bot bot pushed a commit to polkadot-fellows/runtimes that referenced this pull request


          Bump ParachainHost to api version 10 on kusama (#204)

... to add approval_voting_params API which will allow us to enable
approvals coalescing implementation from:
 - paritytech/polkadot-sdk#1191

Note! Bumping the version will not enable the new logic; that will be
enabled at a later date when we decide to call set_approval_voting_params
with max_approval_coalesce_count greater than 1.

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>

github-actions bot mentioned this pull request

Update polkadot-sdk from v1.3.0 to v1.7.2 moonbeam-foundation/moonbeam#2703

Closed

bkchr pushed a commit that referenced this pull request


          Message transactions mortality (#1191)

1ef41a5

* transactions mortality in message and complex relays

* logging + enable in test deployments

* spellcheck

* fmt

Polkadot-Forum commented May 21, 2024

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/what-are-subsystem-benchmarks/8212/1

Polkadot-Forum commented May 21, 2024

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/update-validator-set-size-increase-on-kusama/8218/1

This was referenced Jun 5, 2024

Update polkadot-sdk from v1.7.0 to v1.11.0 moondance-labs/tanssi#573

Closed

Update polkadot-sdk from v1.10.0 to v1.11.0 moondance-labs/tanssi#577

Closed

Reviewers

eskimor approved these changes

EgorPopelyaev approved these changes

sandreim approved these changes

rphmeier: awaiting requested review

ordian approved these changes

altaua approved these changes

Labels

R0-no-crate-publish-required T8-polkadot