short-term fix for para inherent weight overestimation#5082

Merged
ordian merged 32 commits into master from
ao-fix-parainclusion-weight-overestimation
Aug 29, 2024
@ordian
Contributor

@ordian ordian commented Jul 19, 2024

closes #849

Context

For the background on this and the long-term fix, see #849 (comment).

Changes

  • The weight files are renamed from runtime_(parachains|common).* to polkadot_runtime_(parachains|common).*. The reason is the renaming introduced in "Unify dependency aliases" (#4633): the weight command and the generated files now include the polkadot_ prefix.
  • The WeightInfo for paras_inherent now includes enter_empty, which measures the cost of processing an empty parachains inherent. This cost is subtracted dynamically when calculating the other weights (so the benchmarked values of the other weights remain the same).
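
To illustrate the enter_empty subtraction, here is a minimal, self-contained sketch. It is not the code from this PR: the `Weight` struct is a simplified stand-in for the runtime's weight type, `ExampleWeights` uses made-up numbers, and charging the fixed enter_empty cost exactly once in the total is an assumption made for the example.

```rust
/// Simplified two-dimensional weight. A hypothetical stand-in for
/// illustration, not the real `Weight` type used by the runtime.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Weight {
    pub ref_time: u64,
    pub proof_size: u64,
}

impl Weight {
    pub const fn new(ref_time: u64, proof_size: u64) -> Self {
        Self { ref_time, proof_size }
    }

    pub fn saturating_add(self, rhs: Self) -> Self {
        Self::new(
            self.ref_time.saturating_add(rhs.ref_time),
            self.proof_size.saturating_add(rhs.proof_size),
        )
    }

    pub fn saturating_sub(self, rhs: Self) -> Self {
        Self::new(
            self.ref_time.saturating_sub(rhs.ref_time),
            self.proof_size.saturating_sub(rhs.proof_size),
        )
    }

    pub fn saturating_mul(self, n: u64) -> Self {
        Self::new(self.ref_time.saturating_mul(n), self.proof_size.saturating_mul(n))
    }
}

/// The two benchmark results relevant here (trait shape is an assumption).
pub trait WeightInfo {
    /// Cost of processing an empty parachains inherent.
    fn enter_empty() -> Weight;
    /// Cost of processing an inherent with one bitfield, fixed overhead included.
    fn enter_bitfields() -> Weight;
}

/// Made-up numbers, only to show the arithmetic.
pub struct ExampleWeights;

impl WeightInfo for ExampleWeights {
    fn enter_empty() -> Weight {
        Weight::new(100, 10)
    }
    fn enter_bitfields() -> Weight {
        Weight::new(130, 12)
    }
}

/// Overestimate: the fixed overhead is multiplied by the number of bitfields.
pub fn naive_bitfields_weight<W: WeightInfo>(n: u64) -> Weight {
    W::enter_bitfields().saturating_mul(n)
}

/// Adjusted estimate: subtract the empty-inherent cost from the per-bitfield
/// benchmark, multiply the remainder, and count the fixed overhead once.
pub fn adjusted_bitfields_weight<W: WeightInfo>(n: u64) -> Weight {
    let per_bitfield = W::enter_bitfields().saturating_sub(W::enter_empty());
    W::enter_empty().saturating_add(per_bitfield.saturating_mul(n))
}
```

With these made-up numbers and 500 bitfields, the naive estimate has a ref_time of 65,000 while the adjusted one has 15,100, which is the kind of gap the fixed overhead creates when it is multiplied by the validator count.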

Benefits

See #849 (comment), but TL;DR is that we are not blocked on weights for scaling the number of validators and cores further.

Resolved questions:

TODOs:

  • Rerun benchmarks for Rococo and Westend
  • PRDoc

@ordian ordian added the T8-polkadot and T12-benchmarks labels Jul 19, 2024
@ordian
Contributor Author

ordian commented Jul 22, 2024

bot bench polkadot-pallet --pallet=polkadot_runtime_parachains::paras_inherent --runtime=westend
bot bench polkadot-pallet --pallet=polkadot_runtime_parachains::paras_inherent --runtime=rococo
bot clean

@command-bot command-bot bot deleted a comment from github-actions bot Jul 22, 2024
command-bot and others added 6 commits July 22, 2024 14:58
…=rococo --target_dir=polkadot --pallet=polkadot_runtime_parachains::paras_inherent
…=westend --target_dir=polkadot --pallet=polkadot_runtime_parachains::paras_inherent
* master:
  Bump slotmap from 1.0.6 to 1.0.7 (#5096)
  feat: introduce pallet-parameters to Westend to parameterize inflation (#4938)
  Bump openssl from 0.10.64 to 0.10.66 (#5107)
  Bump lycheeverse/lychee-action from 1.9.1 to 1.10.0 (#5094)
  docs: remove the duplicate word (#5095)
  Prepare PVFs if node is a validator in the next session (#4791)
  Update parity publish (#5105)
@paritytech-cicd-pr

The CI pipeline was cancelled due to the failure of one of the required jobs.
Job name: cargo-clippy
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6778934

@kianenigma kianenigma added the T2-pallets label Jul 24, 2024
@kianenigma
Contributor

What is the long term fix?

ordian added 2 commits July 25, 2024 10:23
* master: (27 commits)
  Bridges improved tests and nits (#5128)
  Fix misleading comment about RewardHandler in epm config (#3095)
  Introduce a workflow updating the wishlist leaderboards (#5085)
  membership: Restructure pallet into separate files (#4536)
  Fix after ring-proof api change (#5126)
  Bump paritytech/review-bot from 2.4.0 to 2.5.0 (#5057)
  Bump docker/login-action from 3.0.0 to 3.3.0 (#5109)
  Bump docker/build-push-action from 5.1.0 to 6.5.0 (#5108)
  Bump peter-evans/create-pull-request from 5.0.0 to 6.1.0 (#5093)
  Tx Payment: drop ED requirements for tx payments with exchangeable asset  (#4488)
  Remove `pallet-getter` usage from pallet-transaction-payment (#4970)
  pallet macro: do not generate try-runtime related code when frame-support doesn't have try-runtime. (#5099)
  fix(chain-spec): ChainSpecBuilder with object as default genesis (#4345)
  Migrate BEEFY BLS crypto to  bls12-381 curve (#4931)
  Bump clap from 4.5.9 to 4.5.10 in the known_good_semver group (#5120)
  Use jobserver in wasm-builder to limit concurrency of spawned cargo processes (#4946)
  include events for voting (#4613)
  [subsystem-bench] Add mocks for own assignments triggering (#5042)
  Remove not-audited warning (#5114)
  hotfix: blockchain/backend: Skip genesis leaf to unblock syncing (#5103)
  ...
Comment on lines +35 to +37
benchmark.bitfields.clear();
benchmark.backed_candidates.clear();
benchmark.disputes.clear();
Contributor


Are these really needed if we just called build() on a default BenchBuilder instance?

ordian added 3 commits August 6, 2024 15:33
* master: (51 commits)
  Remove unused feature gated code from the minimal template (#5237)
  make polkadot-parachain startup errors pretty (#5214)
  Coretime auto-renew (#4424)
  network/strategy: Backoff and ban overloaded peers to avoid submitting the same request multiple times (#5029)
  Fix frame crate usage doc (#5222)
  beefy: Tolerate pruned state on runtime API call (#5197)
  rpc: Enable ChainSpec for polkadot-parachain (#5205)
  Add an adapter for configuring AssetExchanger (#5130)
  Replace env_logger with sp_tracing (#5065)
  Adjust sync templates flow to use new release branch (#5182)
  litep2p/discovery: Publish authority records with external addresses only (#5176)
  Run UI tests in CI for some other crates (#5167)
  Remove `pallet::getter` usage from the pallet-balances (#4967)
  pallet-timestamp: `UnixTime::now` implementation logs error only if called at genesis (#5055)
  [CI] Cache try-runtime check (#5179)
  [Backport] version bumps and the prdocs reordering from stable2407 (#5178)
  [subsystem-benchmark] Update availability-distribution-regression-bench baseline after recent subsystem changes (#5180)
  Remove pallet::getter usage from proxy (#4963)
  Remove pallet::getter macro usage from pallet-election-provider-multi-phase (#4487)
  Review-bot@2.6.0 (#5177)
  ...
@ordian
Contributor Author

ordian commented Aug 7, 2024

bot bench polkadot-pallet --pallet=polkadot_runtime_parachains::paras_inherent --runtime=westend
bot bench polkadot-pallet --pallet=polkadot_runtime_parachains::paras_inherent --runtime=rococo
bot clean

@command-bot command-bot bot deleted a comment from github-actions bot Aug 7, 2024
Contributor

@tdimitrov tdimitrov left a comment


I think for a short-term fix this is good. @alexggh does have a point about the 1000 validators, but it can be a follow-up or part of the proper fix.

@ordian
Contributor Author

ordian commented Aug 22, 2024

bot bench polkadot-pallet --pallet=polkadot_runtime_parachains::paras_inherent --runtime=westend
bot bench polkadot-pallet --pallet=polkadot_runtime_parachains::paras_inherent --runtime=rococo
bot clean

@command-bot

command-bot bot commented Aug 22, 2024

@ordian Command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=rococo --target_dir=polkadot --pallet=polkadot_runtime_parachains::paras_inherent has finished. Result:

HttpError: Not Found
HttpError: Not Found
    at /app/node_modules/@octokit/request/dist-node/index.js:86:21
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async sendRequestWithRetries (/app/node_modules/octokit-auth-probot/node_modules/@octokit/auth-app/dist-node/index.js:466:12)
    at async Job.doExecute (/app/node_modules/bottleneck/light.js:405:18)

command-bot added 2 commits August 22, 2024 22:32
@command-bot

command-bot bot commented Aug 23, 2024

@ordian Command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=westend --target_dir=polkadot --pallet=polkadot_runtime_parachains::paras_inherent has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7109801 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7109801/artifacts/download.

Contributor

@kianenigma kianenigma left a comment


Empirically, how much is the weight consumed by paras inherent in Polkadot and Kusama RC now, vs. how much will it be when this is deployed? (and which one is closer to reality -- although this is hard to measure 🙈)

@ordian ordian added this pull request to the merge queue Aug 29, 2024
@ordian
Contributor Author

ordian commented Aug 29, 2024

> Empirically, how much is the weight consumed by paras inherent in Polkadot and Kusama RC now, vs. how much will it be when this is deployed? (and which one is closer to reality -- although this is hard to measure 🙈)

I've provided some back-of-the-envelope calculations in this comment: #849 (comment), but TL;DR is that the dominant cost comes from enter_bitfields (which we multiply by the number of para validators, which is 500 on Kusama at the moment and will grow to 1k). This number was already reduced by 55% in polkadot-fellows/runtimes#433, and the reasoning is explained in the background section of #5270. This PR will further reduce its cost by more than half (the benches need to be rerun on Kusama, but you can see the cost for Westend for comparison). Please note that the benchmarked cost of enter_bitfields by itself will remain the same, but we dynamically subtract the cost of enter_empty from it. These and other annoyances will be addressed with the long-term refactoring mentioned in the first comment.
However, #5270 will bring some (a lot) of the costs back by properly accounting for e.g. processed messages in enacted candidates.

Merged via the queue into master with commit cc7ebe0 Aug 29, 2024
@ordian ordian deleted the ao-fix-parainclusion-weight-overestimation branch August 29, 2024 09:16
ordian added a commit that referenced this pull request Aug 29, 2024
* master: (39 commits)
  short-term fix for para inherent weight overestimation (#5082)
  CI: Add backporting bot (#4795)
  Fix benchmark failures when using `insecure_zero_ed` flag (#5354)
  Command bot GHA v2 - /cmd <cmd> (#5457)
  Remove pallet::getter usage from treasury (#4962)
  Bump blake2b_simd from 1.0.1 to 1.0.2 (#5404)
  Bump rustversion from 1.0.14 to 1.0.17 (#5405)
  Bridge zombienet tests: remove old command (#5434)
  polkadot-parachain: Add omni-node variant with u64 block number (#5269)
  Refactor verbose test (#5506)
  Use umbrella crate for minimal template (#5155)
  IBP Coretime Polkadot bootnodes (#5499)
  rpc server: listen to `ipv6 socket` if available and `--experimental-rpc-endpoint` CLI option (#4792)
  Update approval-voting-regression-bench (#5504)
  change try-runtime rpc domains (#5443)
  polkadot-parachain-bin: Remove contracts parachain (#5471)
  Add feature to allow Aura collator to use full PoV size (#5393)
  Adding stkd bootnodes (#5470)
  Make `PendingConfigs` storage item public (#5467)
  frame-omni-bencher maintenance (#5466)
  ...
github-merge-queue bot pushed a commit that referenced this pull request Aug 29, 2024
On top of #5082.

## Background

Previously, before #3479, we would
[include](https://github.com/paritytech/polkadot-sdk/blame/75074952a859f90213ea25257b71ec2189dbcfc1/polkadot/runtime/parachains/src/builder.rs#L508C12-L508C44)
the cost of enacting the candidate in the cost of processing a single
bitfield.
[Now](https://github.com/paritytech/polkadot-sdk/blame/dd48544a573dd02da2082cec1dda7ce735e2e719/polkadot/runtime/parachains/src/builder.rs#L529)
it is different, although the benchmarks seem to be out of date.
Including the cost of enacting a candidate in the cost of processing a
single bitfield was incorrect, since we multiply that by the number of
bitfields we have. Instead, we should separately calculate the cost of
processing a single bitfield without enactment, and multiply the cost of
enactment by the actual number of processed candidates (which is limited
by the number of cores, not validators).
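
As a rough illustration of the difference, the sketch below compares the two ways of accounting for enactment. The function names and the plain `u64` cost units are hypothetical and only stand in for the real weight calculation.

```rust
/// Old accounting: the enactment cost is folded into the per-bitfield cost,
/// so it gets multiplied by the number of bitfields (one per validator).
pub fn old_estimate(bitfield_cost: u64, enact_cost: u64, n_bitfields: u64) -> u64 {
    bitfield_cost
        .saturating_add(enact_cost)
        .saturating_mul(n_bitfields)
}

/// Separated accounting: bitfields are charged without enactment, and
/// enactment is charged per enacted candidate (bounded by the number of cores).
pub fn separated_estimate(
    bitfield_cost: u64,
    enact_cost: u64,
    n_bitfields: u64,
    n_enacted: u64,
) -> u64 {
    bitfield_cost
        .saturating_mul(n_bitfields)
        .saturating_add(enact_cost.saturating_mul(n_enacted))
}
```

For example, with made-up costs of 2 per bitfield and 50 per enactment, 500 bitfields and 100 enacted candidates, the old estimate is 26,000 units and the separated one is 6,000.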

## Bench

Previously, the weight of `enact_candidate` was calculated manually
(without a benchmark) and then neglected:
https://github.com/paritytech/polkadot-sdk/blob/dd48544a573dd02da2082cec1dda7ce735e2e719/polkadot/runtime/parachains/src/inclusion/mod.rs#L584

In this PR, we have a benchmark for it and it's based on the number of
ump and sent hrmp messages as well as whether the candidate has a
runtime upgrade (new_validation_code).
The differences from the previous attempt
paritytech/polkadot#6929 are that
* we don't include the cost of enactment into the cost of processing a
backed candidate.
The reason for it is that enactment happens not in the same block as
backing (typically the next one), since we process bitfields before
backing votes.
* we don't take into account the size of the runtime upgrade: the
benchmark weight doesn't seem to depend much on it, but rather on whether
there was one or not.

Similarly to the previous attempt, we don't account for dmp messages
(fixed cost). Also we don't account properly for received hrmp messages
(hrmp_watermark) because the cost of it depends on the runtime state and
can't be statically deduced in the benchmark (unless we pass the
information about channels as benchmark u32 arguments).

The total weight cost of processing a parainherent now includes the cost
of enactment of each candidate, but we don't do filtering based on that
(because we enact after processing bitfields and making other changes to
the storage).

## Numbers

```
Reads = 7 + (0 * u) + (3 * h) + (8 * c)
Writes = 10 + (1 * u) + (3 * h) + (7 * c)
```
In addition, there is a fixed cost of a few ms (!) per candidate.

This might result in a full block slightly overflowing its weight with 200
enacted candidates, which in turn could prevent non-mandatory
transactions from being included in a block.

Given our modest limits on max ump and hrmp messages:
```
  maxUpwardMessageNumPerCandidate: 16
  hrmpMaxMessageNumPerCandidate: 10
```
and the fact that runtime upgrades can't happen very frequently
(`validation_upgrade_cooldown`), we might only go over the limits in
case of many disputes.
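
To get a feel for the numbers, the formulas above can be evaluated at the configured limits. The sketch below assumes that `u` is the number of ump messages, `h` the number of sent hrmp messages, and `c` is 1 when the candidate carries a runtime upgrade and 0 otherwise; the `DbOps` struct and the function name are hypothetical.

```rust
/// Database operations attributed to enacting one candidate.
#[derive(Debug, PartialEq, Eq)]
pub struct DbOps {
    pub reads: u64,
    pub writes: u64,
}

/// Evaluates the read/write formulas from the "Numbers" section.
/// `u`: ump messages, `h`: sent hrmp messages, `has_code_upgrade`: the `c` term.
pub fn enact_candidate_db_ops(u: u32, h: u32, has_code_upgrade: bool) -> DbOps {
    let (u, h) = (u64::from(u), u64::from(h));
    let c = u64::from(has_code_upgrade);
    DbOps {
        // Reads = 7 + (0 * u) + (3 * h) + (8 * c); the `u` term is zero.
        reads: 7 + 3 * h + 8 * c,
        // Writes = 10 + (1 * u) + (3 * h) + (7 * c)
        writes: 10 + u + 3 * h + 7 * c,
    }
}
```

Under these assumptions, a candidate at the limits (16 ump messages, 10 hrmp messages, with a runtime upgrade) accounts for 45 reads and 63 writes, while a candidate with no messages and no upgrade accounts for 7 reads and 10 writes.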

TODOs:
- [x] Fix the overweight test
- [x] Generate the weights for Westend and Rococo
- [x] PRDoc

---------

Co-authored-by: command-bot <>
Co-authored-by: Alin Dima <alin@parity.io>

Labels

T2-pallets: This PR/Issue is related to a particular pallet.
T8-polkadot: This PR/Issue is related to/affects the Polkadot network.
T12-benchmarks: This PR/Issue is related to benchmarking and weights.

Development

Successfully merging this pull request may close these issues.

paraInclusion is massively overestimating its weight

8 participants