Skip to content

Approve multiple candidates with a single signature#1191

Merged
alexggh merged 124 commits intomasterfrom
alexaggh/feature/approve_multiple_candidates_polkadot_sdk
Dec 13, 2023
Merged

Approve multiple candidates with a single signature#1191
alexggh merged 124 commits intomasterfrom
alexaggh/feature/approve_multiple_candidates_polkadot_sdk

Conversation

@alexggh
Copy link
Copy Markdown
Contributor

@alexggh alexggh commented Aug 27, 2023

The pr migrates: paritytech/polkadot#7554, preliminary measurements and tests are discussed there.

Initial implementation for the plan discussed here: #701 on top of: #1178

Overall idea

When approval-voting checks a candidate and is ready to advertise the approval, defer it in a per-relay chain block until we either have MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed MAX_APPROVALS_COALESCE_TICKS in the queue, in both cases we sign what candidates we have available.

This should allow us to reduce the number of approvals messages we have to create/send/verify. The parameters are configurable, so we should find some values that balance:

  • Security of the network: Delaying broadcasting of an approval shouldn't but the finality at risk and to make sure that never happens we won't delay sending a vote if we are past 2/3 from the no-show time.
  • Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 & MAX_APPROVALS_COALESCE_TICKS =0, is what we have now and we know from the measurements we did on versi, it bottlenecks approval-distribution/approval-voting when increase significantly the number of validators and parachains
  • Block storage: In case of disputes we have to import this votes on chain and that increase the necessary storage with MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that disputes are not the normal way of the network functioning and we will limit MAX_APPROVAL_COALESCE_COUNT in the single digits numbers, this should be good enough. Alternatively, we could try to create a better way to store this on-chain through indirection, if that's needed.

Other fixes:

  • Fixed the fact that we were sending random assignments to non-validators, that was wrong because those won't do anything with it and they won't gossip it either because they do not have a grid topology set, so we would waste the random assignments.
  • Added metrics to be able to debug potential no-shows and mis-processing of approvals/assignments.

TODO:

sandreim and others added 5 commits August 25, 2023 19:15
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
The pr migrates:
- paritytech/polkadot#7554

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
@alexggh alexggh force-pushed the alexaggh/feature/approve_multiple_candidates_polkadot_sdk branch from 342308e to 619fff2 Compare August 27, 2023 11:20
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
@alexggh alexggh force-pushed the alexaggh/feature/approve_multiple_candidates_polkadot_sdk branch from 1cb26cd to 7bc13d3 Compare August 28, 2023 09:22
sandreim and others added 3 commits August 28, 2023 15:54
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
sandreim and others added 10 commits August 29, 2023 19:06
…reim/the_v2_assignments

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
…reim/the_v2_assignments

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
…o feature/approve_multiple_candidates_polkadot_sdk_v2
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
@sandreim sandreim mentioned this pull request Sep 19, 2023
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
... discovered during benchmarking

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
@alexggh
Copy link
Copy Markdown
Contributor Author

alexggh commented Dec 11, 2023

Performed the final sanity checks on versi before merging, tested the following scenarios:

  • Ran for 24h a really small network with 22 validators and 5 parachains.
  • Ran for 4-5h a network with 100 validators and 30 parachains.

Checked:

  • Logs, no unexpected errors and warnings noticed during the testing.
  • Metrics, finality was around 2.5, no no-shows appeared during the testing.

alexggh and others added 3 commits December 11, 2023 17:39
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
V2 was not put into the list of fallbacks for the validation protocol,
so the test wrongly fall-backed on v1.

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
@alexggh alexggh merged commit a84dd0d into master Dec 13, 2023
@alexggh alexggh deleted the alexaggh/feature/approve_multiple_candidates_polkadot_sdk branch December 13, 2023 06:43
@Polkadot-Forum
Copy link
Copy Markdown

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/raising-awareness-new-network-validation-protocol-version-v3-coming/5639/1

github-merge-queue bot pushed a commit that referenced this pull request Feb 5, 2024
## Summary
Built on top of the tooling and ideas introduced in
#2528, this PR introduces
a synthetic benchmark for measuring and assessing the performance
characteristics of the approval-voting and approval-distribution
subsystems.

Currently this allows, us to simulate the behaviours of these systems
based on the following dimensions:
```
TestConfiguration:
# Test 1
- objective: !ApprovalsTest
    last_considered_tranche: 89
    min_coalesce: 1
    max_coalesce: 6
    enable_assignments_v2: true
    send_till_tranche: 60
    stop_when_approved: false
    coalesce_tranche_diff: 12
    workdir_prefix: "/tmp"
    num_no_shows_per_candidate: 0
    approval_distribution_expected_tof: 6.0
    approval_distribution_cpu_ms: 3.0
    approval_voting_cpu_ms: 4.30
  n_validators: 500
  n_cores: 100
  n_included_candidates: 100
  min_pov_size: 1120
  max_pov_size: 5120
  peer_bandwidth: 524288000000
  bandwidth: 524288000000
  latency:
    min_latency:
      secs: 0
      nanos: 1000000
    max_latency:
      secs: 0
      nanos: 100000000
  error: 0
  num_blocks: 10
```

## The approach
1. We build a real overseer with the real implementations for
approval-voting and approval-distribution subsystems.
2. For a given network size, for each validator we pre-computed all
potential assignments and approvals it would send, because this a
computation heavy operation this will be cached on a file on disk and be
re-used if the generation parameters don't change.
3. The messages will be sent accordingly to the configured parameters
and those are split into 3 main benchmarking scenarios.

## Benchmarking scenarios

### Best case scenario *approvals_throughput_best_case.yaml*
It send to the approval-distribution only the minimum required tranche
to gathered the needed_approvals, so that a candidate is approved.

### Behaviour in the presence of no-shows *approvals_no_shows.yaml*
It sends the tranche needed to approve a candidate when we have a
maximum of *num_no_shows_per_candidate* tranches with no-shows for each
candidate.

### Maximum throughput *approvals_throughput.yaml*
It sends all the tranches for each block and measures the used CPU and
necessary network bandwidth. by the approval-voting and
approval-distribution subsystem.

## How to run it
```
cargo run -p polkadot-subsystem-bench --release -- test-sequence --path polkadot/node/subsystem-bench/examples/approvals_throughput.yaml
```

## Evaluating performance
### Use the real subsystems metrics
If you follow the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-grafana
for installing locally prometheus and grafana, all real metrics for the
`approval-distribution`, `approval-voting` and overseer are available.
E.g:
<img width="2149" alt="Screenshot 2023-12-05 at 11 07 46"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/cb8ae2dd-178b-4922-bfa4-dc37e572ed38">

<img width="2551" alt="Screenshot 2023-12-05 at 11 09 42"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/8b4542ba-88b9-46f9-9b70-cc345366081b">

<img width="2154" alt="Screenshot 2023-12-05 at 11 10 15"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/b8874d8d-632e-443a-9840-14ad8e90c54f">

<img width="2535" alt="Screenshot 2023-12-05 at 11 10 52"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/779a439f-fd18-4985-bb80-85d5afad78e2">

### Profile with pyroscope
1. Setup pyroscope following the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-pyroscope,
then run any of the benchmark scenario with `--profile` as the
arguments.
2. Open the pyroscope dashboard in grafana, e.g:
<img width="2544" alt="Screenshot 2024-01-09 at 17 09 58"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/58f50c99-a910-4d20-951a-8b16639303d9">



### Useful  logs
1. Network bandwidth requirements:
```
Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
```

2. Cpu usage by the approval-distribution/approval-voting subsystems.
```
approval-distribution CPU usage 84.061s
approval-distribution CPU usage per block 8.406s
approval-voting CPU usage 96.532s
approval-voting CPU usage per block 9.653s
```

3. Time passed until a given block is approved
```
 Chain selection approved  after 3500 ms hash=0x0101010101010101010101010101010101010101010101010101010101010101
Chain selection approved  after 4500 ms hash=0x0202020202020202020202020202020202020202020202020202020202020202
```

### Using benchmark to quantify improvements from
#1178 +
#1191

Using a versi-node we compare the scenarios where all new optimisations
are disabled with a scenarios where tranche0 assignments are sent in a
single message and a conservative simulation where the coalescing of
approvals gives us just 50% reduction in the number of messages we send.

Overall, what we see is a speedup of around 30-40% in the time it takes
to process the necessary messages and a 30-40% reduction in the
necessary bandwidth.

#### Best case scenario comparison(minimum required tranches sent).
Unoptimised
```
    Number of blocks: 10
    Payload bytes received from peers: 53289 KiB total, 5328 KiB/block
    Payload bytes sent to peers: 52489 KiB total, 5248 KiB/block
    approval-distribution CPU usage 6.732s
    approval-distribution CPU usage per block 0.673s
    approval-voting CPU usage 9.523s
    approval-voting CPU usage per block 0.952s
```

vs Optimisation enabled
```
   Number of blocks: 10
   Payload bytes received from peers: 32141 KiB total, 3214 KiB/block
   Payload bytes sent to peers: 37314 KiB total, 3731 KiB/block
   approval-distribution CPU usage 4.658s
   approval-distribution CPU usage per block 0.466s
   approval-voting CPU usage 6.236s
   approval-voting CPU usage per block 0.624s
```

#### Worst case all tranches sent, very unlikely happens when sharding
breaks.

Unoptimised
```
   Number of blocks: 10
   Payload bytes received from peers: 746393 KiB total, 74639 KiB/block
   Payload bytes sent to peers: 729151 KiB total, 72915 KiB/block
   approval-distribution CPU usage 118.681s
   approval-distribution CPU usage per block 11.868s
   approval-voting CPU usage 124.118s
   approval-voting CPU usage per block 12.412s
```

vs optimised
```
    Number of blocks: 10
    Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
    Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
    approval-distribution CPU usage 84.061s
    approval-distribution CPU usage per block 8.406s
    approval-voting CPU usage 96.532s
    approval-voting CPU usage per block 9.653s
```


## TODOs
[x] Polish implementation.
[x] Use what we have so far to evaluate
#1191 before merging.
[x] List of features and additional dimensions we want to use for
benchmarking.
[x] Run benchmark on hardware similar with versi and kusama nodes.
[ ] Add benchmark to be run in CI for catching regression in
performance.
[ ] Rebase on latest changes for network emulation.

---------

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: Andrei Sandu <andrei-mihail@parity.io>
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
alexggh added a commit to alexggh/runtimes that referenced this pull request Feb 28, 2024
... to add approval_voting_params API which will allow us to enable
approvals coalescing implementation from:
 - paritytech/polkadot-sdk#1191

Note! Bumping the version will not enable the new logic, that will be
enable at a later date we we decide to call set_approval_voting_params
with max_approval_coalesce_count greater than 1.

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
alexggh added a commit to alexggh/runtimes that referenced this pull request Feb 28, 2024
... to add approval_voting_params API which will allow us to enable
approvals coalescing implementation from:
 - paritytech/polkadot-sdk#1191

Note! Bumping the version will not enable the new logic, that will be
enable at a later date we we decide to call set_approval_voting_params
with max_approval_coalesce_count greater than 1.

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
fellowship-merge-bot bot pushed a commit to polkadot-fellows/runtimes that referenced this pull request Feb 29, 2024
... to add approval_voting_params API which will allow us to enable
approvals coalescing implementation from:
 - paritytech/polkadot-sdk#1191

Note! Bumping the version will not enable the new logic, that will be
enable at a later date we we decide to call set_approval_voting_params
with max_approval_coalesce_count greater than 1.

<!-- Remember that you can run `/merge` to enable auto-merge in the PR
-->

<!-- Remember to modify the changelog. If you don't need to modify it,
you can check the following box.
Instead, if you have already modified it, simply delete the following
line. -->

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
bkchr pushed a commit that referenced this pull request Apr 10, 2024
* transactions mortality in message and complex relays

* logging + enable in test deployments

* spellcheck

* fmt
@Polkadot-Forum
Copy link
Copy Markdown

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/what-are-subsystem-benchmarks/8212/1

@Polkadot-Forum
Copy link
Copy Markdown

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/update-validator-set-size-increase-on-kusama/8218/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

R0-no-crate-publish-required The change does not require any crates to be re-published. T8-polkadot This PR/Issue is related to/affects the Polkadot network.

Projects

Status: Audited
Status: Completed

Development

Successfully merging this pull request may close these issues.

9 participants