
Introduce approval-voting/distribution benchmark#2621

Merged
alexggh merged 247 commits into master from alexaggh/subsystem-bench-approvals
Feb 5, 2024
Conversation

@alexggh (Contributor) commented Dec 5, 2023

Summary

Built on top of the tooling and ideas introduced in #2528, this PR introduces a synthetic benchmark for measuring and assessing the performance characteristics of the approval-voting and approval-distribution subsystems.

Currently this allows us to simulate the behaviour of these subsystems along the following dimensions:

TestConfiguration:
# Test 1
- objective: !ApprovalsTest
    last_considered_tranche: 89
    min_coalesce: 1
    max_coalesce: 6
    enable_assignments_v2: true
    send_till_tranche: 60
    stop_when_approved: false
    coalesce_tranche_diff: 12
    workdir_prefix: "/tmp"
    num_no_shows_per_candidate: 0
    approval_distribution_expected_tof: 6.0
    approval_distribution_cpu_ms: 3.0
    approval_voting_cpu_ms: 4.30
  n_validators: 500
  n_cores: 100
  n_included_candidates: 100
  min_pov_size: 1120
  max_pov_size: 5120
  peer_bandwidth: 524288000000
  bandwidth: 524288000000
  latency:
    min_latency:
      secs: 0
      nanos: 1000000
    max_latency:
      secs: 0
      nanos: 100000000
  error: 0
  num_blocks: 10
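The raw units in a configuration like the one above can be worth normalising before a run: bandwidth is in bytes/s and latency is split into secs/nanos. A small sketch of that conversion; the helper names are illustrative, not part of polkadot-subsystem-bench:

```python
# Normalise a few raw fields from an approvals benchmark config.
# Field names mirror the YAML above; the helpers are hypothetical.

def latency_ms(secs: int, nanos: int) -> float:
    """Convert a {secs, nanos} latency entry to milliseconds."""
    return secs * 1000 + nanos / 1_000_000

def bandwidth_mib_s(bytes_per_s: int) -> float:
    """Convert bytes/s to MiB/s."""
    return bytes_per_s / (1024 * 1024)

# Values taken from the config above:
min_lat = latency_ms(0, 1_000_000)          # 1.0 ms
max_lat = latency_ms(0, 100_000_000)        # 100.0 ms
peer_bw = bandwidth_mib_s(524_288_000_000)  # 500000.0 MiB/s
```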

The approach

  1. We build a real overseer with the real implementations of the approval-voting and approval-distribution subsystems.
  2. For a given network size, we pre-compute, for each validator, all the potential assignments and approvals it would send. Because this is a computation-heavy operation, the result is cached in a file on disk and re-used as long as the generation parameters don't change.
  3. The messages are then sent according to the configured parameters, split across 3 main benchmarking scenarios.

Benchmarking scenarios

Best case scenario approvals_throughput_best_case.yaml

It sends to approval-distribution only the minimum number of tranches required to gather needed_approvals, so that a candidate is approved.

Behaviour in the presence of no-shows approvals_no_shows.yaml

It sends the tranches needed to approve a candidate when we have a maximum of num_no_shows_per_candidate tranches with no-shows for each candidate.

Maximum throughput approvals_throughput.yaml

It sends all the tranches for each block and measures the CPU usage and network bandwidth required by the approval-voting and approval-distribution subsystems.
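The three scenarios differ mainly in how many tranches are driven into approval-distribution. A simplified sketch of that selection, paraphrased from the descriptions above (the real criteria live in the benchmark's objective handling; `assignments_per_tranche` and the no-show rule here are assumptions):

```python
def tranches_to_send(scenario, needed_approvals, assignments_per_tranche,
                     num_no_shows_per_candidate=0, max_tranches=89):
    """How many tranches a scenario drives into approval-distribution."""
    # Minimum tranches whose assignments cover needed_approvals (ceil div).
    minimum = -(-needed_approvals // assignments_per_tranche)
    if scenario == "best_case":
        return minimum
    if scenario == "no_shows":
        # Assumed: each no-show tranche forces waiting for one extra tranche.
        return minimum + num_no_shows_per_candidate
    if scenario == "throughput":
        return max_tranches  # worst case: send everything
    raise ValueError(f"unknown scenario: {scenario}")
```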

How to run it

cargo run -p polkadot-subsystem-bench --release -- test-sequence --path polkadot/node/subsystem-bench/examples/approvals_throughput.yaml

Evaluating performance

Use the real subsystems metrics

If you follow the steps in https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-grafana for installing Prometheus and Grafana locally, all the real metrics for approval-distribution, approval-voting and the overseer are available. E.g.:

(Screenshots: Grafana dashboards showing the approval-distribution and approval-voting metrics.)

Profile with pyroscope

  1. Setup pyroscope following the steps in https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-pyroscope, then run any of the benchmark scenarios with --profile as an argument.
  2. Open the pyroscope dashboard in grafana, e.g.:

(Screenshot: pyroscope flame graph in Grafana.)

Useful logs

  1. Network bandwidth requirements:
     Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
     Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
  2. CPU usage by the approval-distribution/approval-voting subsystems:
     approval-distribution CPU usage 84.061s
     approval-distribution CPU usage per block 8.406s
     approval-voting CPU usage 96.532s
     approval-voting CPU usage per block 9.653s
  3. Time passed until a given block is approved:
     Chain selection approved after 3500 ms hash=0x0101010101010101010101010101010101010101010101010101010101010101
     Chain selection approved after 4500 ms hash=0x0202020202020202020202020202020202020202020202020202020202020202
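Log lines like the ones above can be scraped into numbers for comparing runs. A small parser over the exact formats shown (assumed stable only for this PR's output):

```python
import re

# Patterns match the benchmark's log lines quoted above.
CPU_RE = re.compile(r"(\S+) CPU usage per block ([\d.]+)s")
BYTES_RE = re.compile(
    r"Payload bytes (received from|sent to) peers: (\d+) KiB total, (\d+) KiB/block"
)

def parse_bench_log(text: str) -> dict:
    """Extract per-block CPU (s) and bandwidth (KiB/block) from benchmark output."""
    out = {"cpu_per_block": {}, "kib_per_block": {}}
    for subsystem, secs in CPU_RE.findall(text):
        out["cpu_per_block"][subsystem] = float(secs)
    for direction, _total, per_block in BYTES_RE.findall(text):
        key = "received" if direction.startswith("received") else "sent"
        out["kib_per_block"][key] = int(per_block)
    return out
```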

Using benchmark to quantify improvements from #1178 + #1191

Using a versi-node, we compare the scenario where all new optimisations are disabled with a scenario where tranche0 assignments are sent in a single message, plus a conservative simulation where the coalescing of approvals gives us just a 50% reduction in the number of messages we send.

Overall, what we see is a speedup of around 30-40% in the time it takes to process the necessary messages and a 30-40% reduction in the necessary bandwidth.

Best case scenario comparison (minimum required tranches sent).

Unoptimised

    Number of blocks: 10
    Payload bytes received from peers: 53289 KiB total, 5328 KiB/block
    Payload bytes sent to peers: 52489 KiB total, 5248 KiB/block
    approval-distribution CPU usage 6.732s
    approval-distribution CPU usage per block 0.673s
    approval-voting CPU usage 9.523s
    approval-voting CPU usage per block 0.952s

vs Optimisation enabled

   Number of blocks: 10
   Payload bytes received from peers: 32141 KiB total, 3214 KiB/block
   Payload bytes sent to peers: 37314 KiB total, 3731 KiB/block
   approval-distribution CPU usage 4.658s
   approval-distribution CPU usage per block 0.466s
   approval-voting CPU usage 6.236s
   approval-voting CPU usage per block 0.624s

Worst case: all tranches sent. This is very unlikely and happens only when sharding breaks.

Unoptimised

   Number of blocks: 10
   Payload bytes received from peers: 746393 KiB total, 74639 KiB/block
   Payload bytes sent to peers: 729151 KiB total, 72915 KiB/block
   approval-distribution CPU usage 118.681s
   approval-distribution CPU usage per block 11.868s
   approval-voting CPU usage 124.118s
   approval-voting CPU usage per block 12.412s

vs optimised

    Number of blocks: 10
    Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
    Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
    approval-distribution CPU usage 84.061s
    approval-distribution CPU usage per block 8.406s
    approval-voting CPU usage 96.532s
    approval-voting CPU usage per block 9.653s
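The improvements can be re-derived from the figures above with a quick arithmetic check (all inputs copied verbatim from the two comparisons; note the reductions vary per metric, roughly 14-33% in the worst case and 29-40% in the best case):

```python
def reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`, rounded to 0.1."""
    return round((1 - after / before) * 100, 1)

# Worst case (all tranches sent), unoptimised vs optimised:
worst_dist_cpu = reduction(118.681, 84.061)  # approval-distribution CPU
worst_vote_cpu = reduction(124.118, 96.532)  # approval-voting CPU
worst_recv_kib = reduction(746393, 503993)   # payload bytes received
worst_sent_kib = reduction(729151, 629971)   # payload bytes sent

# Best case (minimum tranches sent), unoptimised vs optimised:
best_dist_cpu = reduction(6.732, 4.658)
best_vote_cpu = reduction(9.523, 6.236)
best_recv_kib = reduction(53289, 32141)
best_sent_kib = reduction(52489, 37314)
```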

TODOs

[x] Polish implementation.
[x] Use what we have so far to evaluate #1191 before merging.
[x] List of features and additional dimensions we want to use for benchmarking.
[x] Run benchmark on hardware similar to versi and kusama nodes.
[ ] Add benchmark to be run in CI for catching regression in performance.
[ ] Rebase on latest changes for network emulation.

sandreim and others added 30 commits August 25, 2023 19:15
... the param was incorrectly appended to v9 instead of creating a new version as v10.

@alexggh alexggh removed request for athei and koute January 22, 2024 07:39
@alindima (Contributor) commented:

Question: are we aiming to first merge #2970 and then rebase this PR, or to first merge this PR into #2970 ?

@alexggh (Contributor, Author) commented Jan 23, 2024

> Question: are we aiming to first merge #2970 and then rebase this PR, or to first merge this PR into #2970 ?

First merge #2970 and then rebase this PR.

Base automatically changed from sandreim/availability-write-bench to master January 25, 2024 17:52
@alexggh alexggh added the R0-no-crate-publish-required The change does not require any crates to be re-published. label Jan 29, 2024
@sandreim (Contributor) left a comment:

LGTM!

@alexggh (Contributor, Author) commented Feb 2, 2024

Addressed all review feedback; once the CI passes, I will merge this PR.

@alexggh alexggh added this pull request to the merge queue Feb 5, 2024
Merged via the queue into master with commit f9f8868 Feb 5, 2024
@alexggh alexggh deleted the alexaggh/subsystem-bench-approvals branch February 5, 2024 07:27
@Polkadot-Forum commented:

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/what-are-subsystem-benchmarks/8212/1


Labels

- R0-no-crate-publish-required: The change does not require any crates to be re-published.
- T10-tests: This PR/Issue is related to tests.
- T12-benchmarks: This PR/Issue is related to benchmarking and weights.

Projects

Status: Completed


5 participants