
Update Wasm benchmarks #2957

Merged
Robbepop merged 28 commits into paritytech:master from Robbepop:rf-update-wasm-benchmarks
Jan 19, 2024

Conversation

@Robbepop
Contributor

In #2941 we found out that the new Wasmi (register) is very effective at optimizing away certain benchmark bytecode constructs, in a way that created an unfair advantage over Wasmi (stack) and rendered our former benchmarks ineffective at properly measuring the performance impact.

This PR adjusts both affected benchmarks to fix the stated problems.
Affected are

  • `instr_i64const` -> `instr_i64add`: Renamed since it now measures the performance impact of the Wasm `i64.add` instruction with locals as inputs and outputs. This makes it impossible for Wasmi (register) to aggressively optimize away the entire function body (as it previously did) while still providing a way for Wasmi (register) to shine with its register-based execution model.
  • `call_with_code_per_byte`: Now uses `local.get` instead of `i32.const` for the `if` condition, which prevents Wasmi (register) from aggressively optimizing away whole parts of the `if` and thereby gaining an unfair advantage.
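To illustrate, here is a minimal sketch of the shape of the adjusted `instr_i64add` body. The function signature, local names, and repetition count are illustrative assumptions, not the exact generated benchmark bytecode:

```wat
(module
  (func $instr_i64add (param $acc i64) (param $x i64) (result i64)
    ;; The benchmark generator repeats this pattern many times.
    ;; Inputs and outputs are locals, so the additions cannot be
    ;; constant-folded away at translation time:
    (local.set $acc (i64.add (local.get $acc) (local.get $x)))
    (local.set $acc (i64.add (local.get $acc) (local.get $x)))
    (local.get $acc)))
```

Because the operands live in locals, Wasmi (register) can still map them directly to registers (its strength), but it can no longer eliminate the computation entirely.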

cc @athei

The former `instr_i64const` benchmark had the problem that it was too simple for the new Wasmi (register) to optimize, resulting in an artificial 20x speedup since Wasmi (register) mostly optimized the whole function body away.
The new benchmark body still makes use of the new Wasmi (register) optimizations, but in a way that no computation is lost, so the benchmark is more closely aligned with real-world base weights for both the Wasmi (stack) and Wasmi (register) executors.
It now uses `local.get(0)` instead of `i32.const` for the `if` condition, which makes it impossible for Wasmi (register) to aggressively optimize whole parts of the `if` away at compilation time, which had previously created an unfair advantage over Wasmi (stack) in the benchmarks.
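The difference can be sketched as follows (illustrative snippets, not the generated benchmark code):

```wat
(module
  ;; Before: constant condition, so the translator can fold the
  ;; `if` (and the dead arm) away entirely at compilation time:
  (func $before
    (if (i32.const 1)
      (then (nop))))
  ;; After: the condition comes from a local, so its value is
  ;; unknown at translation time and the `if` must be kept:
  (func $after (param $cond i32)
    (if (local.get $cond)
      (then (nop)))))
```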
@Robbepop Robbepop requested a review from athei as a code owner January 17, 2024 10:02
@Robbepop Robbepop requested a review from a team January 17, 2024 10:02
@Robbepop
Contributor Author

bot bench substrate-pallet --pallet=pallet_contracts

@command-bot

command-bot bot commented Jan 17, 2024

@Robbepop "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_contracts (https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4940621) was cancelled in #2957 (comment)

@pgherveou pgherveou added the R0-no-crate-publish-required The change does not require any crates to be re-published. label Jan 17, 2024
@Robbepop
Contributor Author

bot cancel

@command-bot

command-bot bot commented Jan 17, 2024

@Robbepop Command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_contracts has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4940621 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4940621/artifacts/download.

@Robbepop
Contributor Author

bot bench substrate-pallet --pallet=pallet_contracts

@command-bot

command-bot bot commented Jan 17, 2024

@Robbepop https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4941568 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_contracts. Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 2-5601ff89-fc96-4e77-ae30-0aefea757569 to cancel this command or bot cancel to cancel all commands in this pull request.

@command-bot

command-bot bot commented Jan 17, 2024

@Robbepop Command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_contracts has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4941568 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4941568/artifacts/download.

@Robbepop
Contributor Author

bot bench substrate-pallet --pallet=pallet_contracts

@command-bot

command-bot bot commented Jan 17, 2024

@Robbepop https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4941733 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_contracts. Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 3-ae5d6a3c-59f0-43bb-b0e3-da828d8189cb to cancel this command or bot cancel to cancel all commands in this pull request.

This is weird to me since we are changing an automatically generated file, but code review recommended this change to make the CI happy, so I did.
@command-bot

command-bot bot commented Jan 17, 2024

@Robbepop Command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_contracts has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4941733 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4941733/artifacts/download.

@Robbepop
Contributor Author

bot bench substrate-pallet --pallet=pallet_contracts

@command-bot

command-bot bot commented Jan 17, 2024

@Robbepop https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4942135 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_contracts. Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 4-a4beb28b-b880-41c2-85f3-8a120f623fcf to cancel this command or bot cancel to cancel all commands in this pull request.

@command-bot

command-bot bot commented Jan 17, 2024

@Robbepop Command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_contracts has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4942135 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4942135/artifacts/download.

@Robbepop
Contributor Author

I have an explanation for the performance difference between the old `instr_i64const` and the new `instr_i64add`.
Recap: Since `instr_i64add` executes roughly twice as many instructions as `instr_i64const`, we assumed it would also run twice as slow. However, benchmarks showed that it in fact runs roughly 20% faster. Why is that?

An explanation is the following: Wasmi (stack) has an optimized `i64.const` instruction for `i64` (and `i32`) values in the range `i32::MIN..i32::MAX`. For all other `i64` values it needs to load the constant from a global constant pool, which is slow because the access is neither local nor cache friendly (more or less random access).
However, in the old `instr_i64const` benchmark all of the `i64.const` values were generated uniformly in `i64::MIN..i64::MAX`, making it extremely unlikely (roughly 1/2^32) that a constant would be small enough for Wasmi (stack) to use the optimized instruction. The former `instr_i64const` benchmark therefore always tested the absolute, and somewhat unrealistic, worst case for `i64.const` instructions, whereas the new `instr_i64add` takes a more realistic approach: Wasmi (stack) can execute all of its instructions in a cache-friendly manner, making the results more realistic.
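Based on this description, the two paths can be sketched like so (the internal encoding behavior described in the comments is an assumption drawn from the discussion above, not quoted from the Wasmi sources):

```wat
;; Fits into the i32 range: Wasmi (stack) can use its optimized,
;; inline-encoded constant instruction (the fast path):
(i64.const 42)
;; Outside the i32 range (the overwhelmingly likely case for a value
;; drawn uniformly from i64::MIN..i64::MAX): loaded from the global
;; constant pool, a cache-unfriendly indirection (the slow path):
(i64.const 0x123456789ABCDEF0)
```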

Member

@athei athei left a comment


Given your findings we need to keep using big numbers to exercise this worst case. While it is "unrealistic" in practice that big numbers are used, it is very easy for an attacker to fill a contract with these instructions. If we want to profit from this optimization it has to be accounted for in the Wasmi gas model, not by optimistically benchmarking for that optimization. Benchmarks have to be for the worst case.

While the benchmark itself only improves by 20%, you have to keep in mind that we are dividing that by 4 to account for the increased number of instructions. So you actually have to compare what is being put into the schedule and not only the raw benchmark results.

Before, we divided by 2. So with this PR we are actually using 40% less gas per instruction.

@Robbepop
Contributor Author

Robbepop commented Jan 18, 2024

In defense of the new benchmark using proper `i64.add` instructions as base weight, I have to disagree.

The issue is that the previous benchmark using `i64.const` was accidentally hitting a worst case of Wasmi (stack); under Wasmi (register) this might behave totally differently since both Wasmi versions handle these big constants differently.
Therefore the old benchmark was only implicitly testing the worst case, and only by accident, since I am sure that the person who implemented it was not even aware of this Wasmi optimization for small constant values.

If we want to do a proper job of accounting for the worst-case instruction, we should at least be explicit about our intent and use `i64.load` or `i64.store`, or ideally both, with random access patterns over a huge linear memory area that does not fit into the CPU cache. However, at this point we are benchmarking CPU details. As discussed in chat, this should behave extremely similarly to the previous `i64.const` with big values, but is explicit about the worst-case random access rather than implicit.

Though, there technically are even worse offenders with respect to performance, it always varies from engine to engine. For example, Wasmi (stack) is very efficient at handling function calls, whereas Wasmi (register) is much more efficient at handling computation and conditional branches.

Given that we are about to change the underlying Wasm execution engine soon, and that a thorough Polkadot SDK built-in gas model is in sight, I don't see a huge gain in trying to do a splendid job here right now.

If division by 4 is a problem then division by 2 (as planned by me originally) would be a good enough compromise.

@athei
Member

athei commented Jan 18, 2024

> The issue is that the previous benchmark using i64.const was accidentally using a worst-case for the Wasmi (stack), however, under Wasmi (register) this might totally behave differently since both Wasmi versions handle these big constants differently.

But in this PR we are using the stack machine. We don't merge code to master saying "will be fixed with the next PR". If we want to write a benchmark for the register machine, we need to do that in the PR that activates the register machine. So maybe you should merge the two PRs after all. You can still compare weights between different commits within the same PR.

> Therefore the old benchmark was only implicitly testing the worst case and only by accident since I am sure that the person who implemented this was not even aware of this Wasmi optimization for small constant values.

Why would it matter if it was accidental? When I implemented it I was just paranoid and used random numbers so that no optimization could take place. And this paid off, as it turned out that the optimization was dodged, avoiding a potential security problem.

> If we want to do a proper job with accounting for the worst case instruction we probably should at least be explicit about our intent and use i64.load or i64.store or ideally both with random access patterns and a huge linear memory area that does not fit into the CPU cache. However, at this point we are benchmarking CPU details. As discussed in chat this should behave extremely similar to the previous i64.const with big values but is explicit about the worst-case random access and not implicit.

Then you should do exactly that. You can just use a maximum-sized memory (as defined in the `Config` trait). I know it is annoying that we overestimate the costs, but without a more precise gas model we simply can't reduce the weight safely.
But contract memory is pretty small, so chances are it does not tank performance too much.

> At the very moment where we are about to change the underlying Wasm execution engine soon and furthermore where a thorough Polkadot SDK built-in gas model is in sight, I don't see a huge gain in trying to do a splendid job here right now.

I don't agree with this "we can't be perfect so we might as well be less correct than right now" argument. I am not asking for perfection. I am just asking not to knowingly benchmark something that exercises a best-case scenario: executing 4 very fast instructions and then dividing by 4. Please keep in mind that having too-high numbers is totally fine. It is safe. This code is in production. If we are reducing it by fiddling with the benchmark, we need to be very careful. So please just be safe here. You can optimize with the register machine. So if it is less work for you, you can just do all of this in one PR so you don't have to care about the stack machine anymore.

@Robbepop
Contributor Author

Robbepop commented Jan 19, 2024

After some discussion in chat we agreed to use the proposed `i64.{load,store}` based benchmark, since it is quite similar to what was previously done by accident and models worst-case behavior for a base instruction with random memory access patterns, as described earlier. The new benchmark test case is suited for both Wasmi (stack) and Wasmi (register). Wasmi (register) can still profit from some minor optimizations inherent to its design but suffers from the same worst-case random memory accesses.
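A rough sketch of the shape of such a benchmark body follows. The memory size, locals, and addressing are illustrative assumptions; the actual benchmark derives its random access pattern differently:

```wat
(module
  (memory 1)
  (func $instr_i64_load_store (param $addr i32)
    ;; Repeated by the benchmark generator with (pseudo-)random
    ;; addresses, so that neither engine can fold the accesses away
    ;; and the CPU cache is defeated:
    (i64.store
      (local.get $addr)
      (i64.load (local.get $addr)))))
```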

@athei I just implemented this agreed design test case.

@Robbepop
Contributor Author

bot bench substrate-pallet --pallet=pallet_contracts

@command-bot

This comment was marked as outdated.

@command-bot

This comment was marked as outdated.

@Robbepop
Contributor Author

bot bench substrate-pallet --pallet=pallet_contracts

@command-bot

command-bot bot commented Jan 19, 2024

@Robbepop https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4975857 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_contracts. Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 5-17907f2b-f59e-42ef-85b2-bd5fd0902264 to cancel this command or bot cancel to cancel all commands in this pull request.

Copy link
Member

@athei athei left a comment


Looks good. Let's wait for the benchmark results.

@command-bot

command-bot bot commented Jan 19, 2024

@Robbepop Command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_contracts has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4975857 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4975857/artifacts/download.

@Robbepop
Contributor Author

Robbepop commented Jan 19, 2024

Applied your review suggestions.

Rendered benchmark results:
[screenshot: rendered benchmark results table]

Note that `instr_i64const` is actually named `instr_i64_load_store` now.

After division by 6 (because of 6 instructions) the benchmark is roughly twice as fast (equal to half the gas costs). This is explained by the fact that before, every second instruction caused worst-case performance behavior due to the random access of big constant values, whereas now only 2 of the 6 instructions have this worst-case behavior. At least we are now explicit about it and engine agnostic, so this should probably be fine.

@Robbepop Robbepop enabled auto-merge January 19, 2024 19:13
@Robbepop Robbepop added this pull request to the merge queue Jan 19, 2024
Merged via the queue into paritytech:master with commit e02c520 Jan 19, 2024
@Robbepop Robbepop deleted the rf-update-wasm-benchmarks branch January 19, 2024 20:23
bgallois pushed a commit to duniter/duniter-polkadot-sdk that referenced this pull request Mar 25, 2024
Co-authored-by: command-bot <>
Co-authored-by: Alexander Theißen <[email protected]>
Co-authored-by: Ignacio Palacios <[email protected]>