
Contributor

@Oppen Oppen commented Mar 27, 2025

Motivation

Some of our sync APIs can cause starvation when running on Tokio because they take a long time to reach the next await point.
In particular, writing to the DB tends to take a long time, which blocks other tasks and, due to how Tokio's scheduler works, sometimes the whole runtime.
Thus, we need a way to tell the runtime we're going to be busy for a while, and yield control to it while we wait for the work to finish.

Description

Take the mutable (write) APIs for the DB and mark them async, then bubble that up to their users. Finally, make the functions non-blocking by using spawn_blocking to run the work on Tokio's blocking thread pool, freeing the runtime to handle more work.
The DB write APIs had to change to pass-by-value to satisfy the borrow checker in the blocking task context. I think I could use proper lifetime bounds with a helper crate, if that's preferred. The values were already being discarded after being passed to the DB, so passing by value should not be a problem either way.
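A minimal sketch of why the write APIs moved to pass-by-value. `tokio::task::spawn_blocking` requires its closure to be `FnOnce() -> R + Send + 'static` (with `R: Send + 'static`), so the closure cannot borrow from its caller. `std::thread::spawn` has the same bounds, so it stands in here to keep the example dependency-free; `Block`, `write_block`, and `add_block` are hypothetical names, not ethrex's actual types.

```rust
use std::thread;

// Hypothetical block type standing in for ethrex's `Block`.
struct Block {
    number: u64,
    payload: Vec<u8>,
}

// Hypothetical synchronous DB write (e.g. one libmdbx or redb transaction).
fn write_block(block: Block) -> (u64, usize) {
    (block.number, block.payload.len())
}

// `block` is taken by value, so the `move` closure owns it and satisfies the
// `'static` bound. A `&Block` parameter would be rejected by the borrow
// checker, because the worker thread could outlive the borrowed data.
fn add_block(block: Block) -> (u64, usize) {
    thread::spawn(move || write_block(block))
        .join()
        .expect("worker thread panicked")
}
```

The caller gives up ownership of the block, which costs nothing here because, as noted above, the value was discarded after the write anyway.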

Special considerations:

  • For some work performed before benchmarks and EF tests, which are inherently synchronous, I opted to create an ad-hoc runtime instance and call block_on, since that localizes the async work and reduces the changes needed. If desired, that can be changed all the way up to a tokio::main. The same applies to some test setup functions.
  • For the DBs I had to separate the Tokio import. They need to compile for L2, which means with the provers' custom compilers, and those don't support the networking functions in the stdlib, which Tokio with full features (as declared in the workspace dependency) pulls in.
  • The InMemoryDB was left untouched other than updating the interfaces, given hashmap access should be quick enough.
  • I need to comment on this hack (https://github.com/lambdaclass/ethrex/pull/2336/files#diff-264636d3ee6ee67bd6e136b8c98f74152de6a8e2a07f597cfb5f622d4e0d815aR143-R146): and_then can't be used on futures, and everything became a mess without that little helper.
  • I'm unsure whether we also want to cover the read APIs. For consistency I would think so, but for now I left them out.
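The ad-hoc runtime approach from the first bullet can be sketched as follows. In the PR this is a Tokio runtime instance and its `block_on`; to keep the example dependency-free, a minimal single-future executor built on `std::task::Wake` stands in for it, and `add_blocks` and `setup_chain` are hypothetical names rather than ethrex functions.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Wakes the blocked thread by unparking it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Minimal executor: polls `fut` on the current thread and parks between
// polls. Stand-in for a Tokio runtime's `block_on`.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

// Hypothetical async write API, like the store methods after this change.
async fn add_blocks(blocks: Vec<u64>) -> Result<usize, String> {
    if blocks.is_empty() {
        return Err("no blocks".to_string());
    }
    Ok(blocks.len())
}

// Synchronous benchmark/test setup: the async call is confined to this spot.
fn setup_chain(blocks: Vec<u64>) -> Result<usize, String> {
    block_on(add_blocks(blocks))
}
```

Keeping `block_on` at the synchronous entry point leaves the rest of the call chain async. Tokio's `Runtime::block_on` panics when called from inside an asynchronous execution context, so this pattern only fits code that is synchronous at the top, such as benchmark and EF-test setup.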

closes #2402

@Oppen Oppen force-pushed the feat/async_apis branch from 9423796 to deb58b3 March 27, 2025 19:28

github-actions bot commented Mar 27, 2025

Lines of code report

Total lines added: 387
Total lines removed: 0
Total lines changed: 387

Detailed view
+------------------------------------------------------+-------+------+
| File                                                 | Lines | Diff |
+------------------------------------------------------+-------+------+
| ethrex/bench/criterion_benchmark.rs                  | 34    | +1   |
+------------------------------------------------------+-------+------+
| ethrex/cmd/ef_tests/blockchain/test_runner.rs        | 139   | +1   |
+------------------------------------------------------+-------+------+
| ethrex/cmd/ef_tests/state/runner/levm_runner.rs      | 367   | +1   |
+------------------------------------------------------+-------+------+
| ethrex/cmd/ef_tests/state/runner/revm_runner.rs      | 526   | +1   |
+------------------------------------------------------+-------+------+
| ethrex/cmd/ethrex/initializers.rs                    | 338   | +1   |
+------------------------------------------------------+-------+------+
| ethrex/cmd/ethrex_l2/src/commands/stack.rs           | 451   | +6   |
+------------------------------------------------------+-------+------+
| ethrex/crates/blockchain/blockchain.rs               | 507   | +12  |
+------------------------------------------------------+-------+------+
| ethrex/crates/blockchain/fork_choice.rs              | 141   | +2   |
+------------------------------------------------------+-------+------+
| ethrex/crates/blockchain/mempool.rs                  | 574   | +3   |
+------------------------------------------------------+-------+------+
| ethrex/crates/blockchain/payload.rs                  | 551   | +7   |
+------------------------------------------------------+-------+------+
| ethrex/crates/blockchain/smoke_test.rs               | 229   | +24  |
+------------------------------------------------------+-------+------+
| ethrex/crates/l2/sequencer/block_producer.rs         | 106   | +3   |
+------------------------------------------------------+-------+------+
| ethrex/crates/l2/utils/test_data_io.rs               | 90    | +1   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/rlpx/eth/backend.rs     | 119   | +1   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync.rs                 | 577   | +1   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/state_sync.rs      | 238   | +4   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/storage_fetcher.rs | 244   | +6   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/storage_healing.rs | 101   | +2   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/trie_rebuild.rs    | 239   | +2   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/rpc/engine/fork_choice.rs   | 375   | +2   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/rpc/engine/payload.rs       | 683   | +4   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/rpc/eth/filter.rs           | 602   | +1   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/rpc/eth/mod.rs              | 164   | +1   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/rpc/l2/transaction.rs       | 211   | +1   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/rpc/rpc.rs                  | 712   | +17  |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/rpc/utils.rs                | 334   | +1   |
+------------------------------------------------------+-------+------+
| ethrex/crates/storage/api.rs                         | 222   | +25  |
+------------------------------------------------------+-------+------+
| ethrex/crates/storage/store.rs                       | 1201  | +64  |
+------------------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/in_memory.rs          | 545   | +24  |
+------------------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/libmdbx.rs            | 1233  | +70  |
+------------------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/redb.rs               | 1055  | +98  |
+------------------------------------------------------+-------+------+

@Oppen Oppen added the performance Block execution throughput and performance in general label Mar 28, 2025
@Oppen Oppen force-pushed the feat/async_apis branch from 04af854 to 80014b9 March 28, 2025 15:52

github-actions bot commented Mar 28, 2025

Benchmark Block Execution Results Comparison Against Main

Command  Mean [s]          Min [s]  Max [s]  Relative
base     186.096 ± 1.458   183.811  188.341  1.00
head     187.574 ± 1.413   185.200  189.885  1.01 ± 0.01

@Oppen Oppen marked this pull request as ready for review March 31, 2025 16:28
@Oppen Oppen requested a review from a team as a code owner March 31, 2025 16:28
@Oppen Oppen force-pushed the feat/async_apis branch 2 times, most recently from 12e2648 to 29500f1 April 1, 2025 15:31
@Oppen Oppen changed the title from feat/async apis to feat(l1,l2): make write path APIs async Apr 1, 2025
Comment on lines 221 to 228
Box::pin(async {
Self::RemoveDB {
datadir: opts.datadir.clone(),
}
.run(opts)
.await
})
.await?;
Contributor Author

I believe this was done to propagate the prompt before deletion, so I'll restore it. Initially I arrived at a similar solution, but abstracted it away (just like in the other commit) because it looked a bit more complex.
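Separately from the prompt behavior, boxing is also what lets an async method await a call back into itself: since Rust 1.77, recursive `async fn` calls compile as long as the recursive future sits behind indirection such as `Box::pin`, and without it rustc rejects the infinitely sized future (E0733). Whether that is why the snippet above boxes the `RemoveDB` call is an assumption; the `Command` enum with `Remove` and `Reset` variants below is hypothetical, and the small `block_on` only exists to drive the example.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Hypothetical CLI commands; `Reset` reuses `Remove` by calling `run` again.
enum Command {
    Remove,
    Reset,
}

impl Command {
    async fn run(self) -> Vec<&'static str> {
        match self {
            Command::Remove => vec!["removed"],
            Command::Reset => {
                // Boxing the recursive future keeps `run`'s future finitely sized.
                let mut log = Box::pin(Command::Remove.run()).await;
                log.push("reinitialized");
                log
            }
        }
    }
}

// Minimal executor used only to drive the example.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}
```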

.await
.map_err(SyncError::JoinHandle)
}?;
Self::add_blocks(blockchain, blocks, sync_head_found).await
Contributor Author

We might still need this spawn_blocking in case it hogs the CPU. But if that's the case, I believe the proper place is inside the call.
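A minimal sketch of the placement suggested here: the function that does the heavy work offloads it itself, so callers only `.await` it and never need their own `spawn_blocking`. To stay dependency-free, `offload` is a hypothetical stand-in for `tokio::task::spawn_blocking` that runs the closure on a fresh OS thread (Tokio instead reuses a dedicated blocking thread pool) and completes a future when it finishes; `write_blocks_to_db` and `add_blocks` are also hypothetical.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::Duration;

// State shared between the worker thread and the task awaiting it.
struct Shared<R> {
    result: Option<R>,
    waker: Option<Waker>,
}

// Future returned by `offload`; resolves to the closure's return value.
struct Offload<R> {
    shared: Arc<Mutex<Shared<R>>>,
}

// Hypothetical stand-in for `tokio::task::spawn_blocking`, with the same
// `Send + 'static` bounds. Unlike Tokio's JoinHandle, which yields
// `Err(JoinError)` if the closure panics, this sketch would never complete.
fn offload<F, R>(f: F) -> Offload<R>
where
    F: FnOnce() -> R + Send + 'static,
    R: Send + 'static,
{
    let shared = Arc::new(Mutex::new(Shared { result: None, waker: None }));
    let worker_side = Arc::clone(&shared);
    thread::spawn(move || {
        let value = f();
        let mut slot = worker_side.lock().unwrap();
        slot.result = Some(value);
        // Wake the awaiting task if it has already been polled.
        if let Some(waker) = slot.waker.take() {
            waker.wake();
        }
    });
    Offload { shared }
}

impl<R> Future for Offload<R> {
    type Output = R;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<R> {
        let mut slot = self.shared.lock().unwrap();
        if let Some(value) = slot.result.take() {
            return Poll::Ready(value);
        }
        // Not done yet: leave a waker for the worker thread to call.
        slot.waker = Some(cx.waker().clone());
        Poll::Pending
    }
}

// Hypothetical blocking DB write; the sleep simulates a slow commit.
fn write_blocks_to_db(blocks: Vec<u64>) -> usize {
    thread::sleep(Duration::from_millis(10));
    blocks.len()
}

// The callee decides to offload, so callers simply `.await` it.
async fn add_blocks(blocks: Vec<u64>) -> usize {
    offload(move || write_blocks_to_db(blocks)).await
}

// Minimal executor used only to drive the example.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}
```

Keeping the offload inside `add_blocks` puts the blocking detail in one place; call sites look the same whether or not the work is moved off the runtime.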

)?;
let sync_head = fork_choice_state.head_block_hash;
tokio::spawn(async move {
// If we can't get hold of the syncer, then it means that there is an active sync in process
Contributor Author

I need to restore the comment. It was lost during a period where this task would break the build.

.route("/", post(handle_authrpc_request))
.route(
"/",
post(|ctx, auth, body| async { handle_authrpc_request(ctx, auth, body).await }),
Contributor Author

Maybe this lambda needs a name? It looks ugly as is.

Collaborator

@Arkenan Arkenan left a comment

LGTM with small comments


/// Executes a block within a new vm instance and state
fn execute_block(&self, block: &Block) -> Result<BlockExecutionResult, ChainError> {
/// TODO: make the asyncness real
Collaborator

Is it not?

}

pub fn store_block(
// TODO(PLT): review async
Collaborator

What does this mean? Can we start an issue if there's a concrete thing to review?

let since = Instant::now();
// Easiest way to operate on the result of `execute_block` without
// having to add too much control flow or return early
// Async doesn't play well with `.and_then`
Collaborator

Why doesn't await.and_then work well?

Contributor Author

The closure passed to and_then has to return an Option, but since that closure can't be async, I can't await inside it, so what it returns is a Future instead.
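The mismatch described here can be sketched as follows. `Result::and_then` expects a closure returning `Result<U, E>`, but a closure that calls an async function returns a future, and the closure itself cannot `.await`. A small async helper that matches on the result restores the chaining style; `and_then_async`, `execute_block`, and `store_result` are hypothetical and not necessarily the helper used in the PR. The `block_on` at the end only drives the example.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Hypothetical async counterpart of `Result::and_then`: awaits `f` only on `Ok`.
async fn and_then_async<T, U, E, F, Fut>(result: Result<T, E>, f: F) -> Result<U, E>
where
    F: FnOnce(T) -> Fut,
    Fut: Future<Output = Result<U, E>>,
{
    match result {
        Ok(value) => f(value).await,
        Err(err) => Err(err),
    }
}

// Hypothetical synchronous execution step.
fn execute_block(number: u64) -> Result<u64, String> {
    if number == 0 {
        Err("cannot re-execute genesis".to_string())
    } else {
        Ok(number * 10)
    }
}

// Hypothetical async write of the execution result.
async fn store_result(gas_used: u64) -> Result<u64, String> {
    Ok(gas_used + 1)
}

async fn store_block(number: u64) -> Result<u64, String> {
    // `execute_block(number).and_then(|gas| store_result(gas))` does not
    // compile: the closure returns a future, not a `Result`.
    and_then_async(execute_block(number), store_result).await
}

// Minimal executor used only to drive the example.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}
```

A plain `let gas = execute_block(number)?; store_result(gas).await` works as well; a helper like this only matters when, as the code comment above says, you want to avoid an early return.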


/// Add a transaction to the mempool checking that the transaction is valid
pub fn add_transaction_to_pool(&self, transaction: Transaction) -> Result<H256, MempoolError> {
pub async fn add_transaction_to_pool(
Collaborator

Why is this one async? The mempool is in memory. Or at least it's not async itself.

Contributor Author

I think I added that one by mistake during one of the first iterations. The version for blobs also has the same. I'll remove them in a later iteration.

@Oppen Oppen force-pushed the feat/async_apis branch from 5082a95 to 69ef4fc April 3, 2025 20:47

github-actions bot commented Apr 3, 2025

Benchmark for b8e385c

Test                          Base            PR              %
Trie/cita-trie insert 10k     36.1±1.02ms     37.1±0.55ms     +2.77%
Trie/cita-trie insert 1k      3.7±0.02ms      3.9±0.12ms      +5.41%
Trie/ethrex-trie insert 10k   190.1±1.41ms    195.2±4.27ms    +2.68%
Trie/ethrex-trie insert 1k    17.1±0.78ms     17.5±0.55ms     +2.34%


github-actions bot commented Apr 3, 2025

Benchmark for 6b7d6b9

Test                          Base            PR              %
Trie/cita-trie insert 10k     35.1±0.40ms     35.6±0.44ms     +1.42%
Trie/cita-trie insert 1k      3.7±0.04ms      3.7±0.05ms      0.00%
Trie/ethrex-trie insert 10k   187.0±2.20ms    188.7±6.11ms    +0.91%
Trie/ethrex-trie insert 1k    17.2±0.26ms     17.1±0.11ms     -0.58%


github-actions bot commented Apr 3, 2025

Benchmark Results Comparison

PR Results

Benchmark Results: Factorial

Command                  Mean [ms]       Min [ms]  Max [ms]  Relative
revm_Factorial           247.1 ± 1.6     244.9     251.0     1.00
levm_Factorial           803.9 ± 12.6    794.0     837.2     3.25 ± 0.06

Benchmark Results: Factorial - Recursive

Command                  Mean [s]        Min [s]   Max [s]   Relative
revm_FactorialRecursive  1.521 ± 0.096   1.383     1.645     1.00
levm_FactorialRecursive  13.898 ± 0.210  13.665    14.193    9.14 ± 0.59

Benchmark Results: Fibonacci

Command                  Mean [ms]       Min [ms]  Max [ms]  Relative
revm_Fibonacci           209.0 ± 2.7     207.4     215.9     1.00
levm_Fibonacci           784.3 ± 7.6     773.0     796.9     3.75 ± 0.06

Benchmark Results: ManyHashes

Command                  Mean [ms]       Min [ms]  Max [ms]  Relative
revm_ManyHashes          8.7 ± 0.1       8.6       8.8       1.00
levm_ManyHashes          16.3 ± 0.2      16.1      16.6      1.86 ± 0.02

Benchmark Results: BubbleSort

Command                  Mean [s]        Min [s]   Max [s]   Relative
revm_BubbleSort          3.275 ± 0.011   3.262     3.292     1.00
levm_BubbleSort          5.603 ± 0.051   5.551     5.693     1.71 ± 0.02

Benchmark Results: ERC20 - Transfer

Command                  Mean [ms]       Min [ms]  Max [ms]  Relative
revm_ERC20Transfer       253.4 ± 2.7     250.9     258.6     1.00
levm_ERC20Transfer       485.2 ± 12.7    476.5     519.8     1.91 ± 0.05

Benchmark Results: ERC20 - Mint

Command                  Mean [ms]       Min [ms]  Max [ms]  Relative
revm_ERC20Mint           146.5 ± 5.5     143.6     161.0     1.00
levm_ERC20Mint           313.4 ± 4.5     309.1     323.0     2.14 ± 0.09

Benchmark Results: ERC20 - Approval

Command                  Mean [s]        Min [s]   Max [s]   Relative
revm_ERC20Approval       1.049 ± 0.007   1.042     1.065     1.00
levm_ERC20Approval       1.830 ± 0.014   1.808     1.854     1.74 ± 0.02

Main Results

Benchmark Results: Factorial

Command                  Mean [ms]       Min [ms]  Max [ms]  Relative
revm_Factorial           234.6 ± 1.1     233.1     237.2     1.00
levm_Factorial           803.5 ± 6.3     795.7     813.9     3.43 ± 0.03

Benchmark Results: Factorial - Recursive

Command                  Mean [s]        Min [s]   Max [s]   Relative
revm_FactorialRecursive  1.477 ± 0.108   1.332     1.603     1.00
levm_FactorialRecursive  14.028 ± 0.180  13.668    14.231    9.50 ± 0.70

Benchmark Results: Fibonacci

Command                  Mean [ms]       Min [ms]  Max [ms]  Relative
revm_Fibonacci           202.3 ± 1.1     200.3     204.5     1.00
levm_Fibonacci           787.1 ± 5.7     777.8     795.9     3.89 ± 0.03

Benchmark Results: ManyHashes

Command                  Mean [ms]       Min [ms]  Max [ms]  Relative
revm_ManyHashes          8.7 ± 0.1       8.6       8.8       1.00
levm_ManyHashes          16.4 ± 0.2      16.2      17.0      1.89 ± 0.03

Benchmark Results: BubbleSort

Command                  Mean [s]        Min [s]   Max [s]   Relative
revm_BubbleSort          3.196 ± 0.006   3.188     3.209     1.00
levm_BubbleSort          5.626 ± 0.030   5.587     5.678     1.76 ± 0.01

Benchmark Results: ERC20 - Transfer

Command                  Mean [ms]       Min [ms]  Max [ms]  Relative
revm_ERC20Transfer       245.0 ± 1.7     243.4     248.9     1.00
levm_ERC20Transfer       482.1 ± 3.9     475.6     488.1     1.97 ± 0.02

Benchmark Results: ERC20 - Mint

Command                  Mean [ms]       Min [ms]  Max [ms]  Relative
revm_ERC20Mint           140.4 ± 1.0     138.9     142.1     1.00
levm_ERC20Mint           314.6 ± 3.0     311.4     318.8     2.24 ± 0.03

Benchmark Results: ERC20 - Approval

Command                  Mean [s]        Min [s]   Max [s]   Relative
revm_ERC20Approval       1.043 ± 0.007   1.034     1.058     1.00
levm_ERC20Approval       1.844 ± 0.050   1.801     1.979     1.77 ± 0.05

Collaborator

@mpaulucci mpaulucci left a comment

🚀 🚀 🚀

@Oppen Oppen added this pull request to the merge queue Apr 4, 2025
Merged via the queue into main with commit cebab85 Apr 4, 2025
52 checks passed
@Oppen Oppen deleted the feat/async_apis branch April 4, 2025 14:28
github-merge-queue bot pushed a commit that referenced this pull request Apr 15, 2025
**Motivation**

As with #2336, the goal is to avoid blocking the current task.

**Description**

Makes store getters not related to tries (and thus to the EVM) async, and
propagates the changes to users of store. They are made async by using
`spawn_blocking`.

Many instances of functional code (`and_then`, `map`) had to be replaced
due to poor async support.

Closes #2424
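For getters the same `'static` bound applies, and a common way to satisfy it without copying the data up front is to clone a reference-counted handle to the database into the closure. A minimal sketch follows; `Store`, `db`, and `get_header` are hypothetical, and `std::thread::spawn` plus `join` stands in for Tokio. With Tokio the body would instead await `tokio::task::spawn_blocking(move || ...)`, whose `JoinHandle` resolves to `Result<R, JoinError>`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical stand-in for the storage engine handle (libmdbx/redb in ethrex).
type Db = Arc<RwLock<HashMap<u64, String>>>;

struct Store {
    db: Db,
}

impl Store {
    // The worker closure cannot borrow `&self`, so it gets its own Arc clone.
    // Only the pointer is cloned; the map itself is shared.
    fn get_header(&self, number: u64) -> Option<String> {
        let db = Arc::clone(&self.db);
        thread::spawn(move || {
            let guard = db.read().expect("lock poisoned");
            guard.get(&number).cloned()
        })
        .join()
        .expect("worker thread panicked")
    }
}
```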
pedrobergamini pushed a commit to pedrobergamini/ethrex that referenced this pull request Aug 24, 2025
pedrobergamini pushed a commit to pedrobergamini/ethrex that referenced this pull request Aug 24, 2025
Development

Successfully merging this pull request may close these issues.

Make db-operations for all disk operations async and spawn-blocking

4 participants