
Conversation

@xqft (Contributor) commented Oct 1, 2025

Motivation

Our trie does a preorder traversal for recursive hashing, which allocates many buffers (from the encoder and the keccak hasher) all the way down before it actually starts hashing a leaf. A better approach is a postorder traversal: allocation starts only when we reach the lowest node that needs to be hashed, after which that memory is deallocated and the next (parent or sibling) node allocates again. This approach was inspired by risc0's trie, although we already had preorder-traversal hashing in the commit() function.

This PR also optimizes BranchNode encoding by skipping our Encoder type and writing directly into a preallocated buffer, instead of using two buffers and copying data from one to the other (the current Encoder can't know the length of the encoded data beforehand, but we can calculate it when we know what we are encoding). The same could be done for the other node types, but they are a minority of nodes and the performance gains would be negligible.
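The length-first encoding idea can be illustrated with a minimal sketch. The helper names and the flat item list here are hypothetical (not ethrex's actual API), and only short RLP payloads (< 56 bytes, with no single-byte special case) are handled for brevity:

```rust
// Sketch: compute the encoded length up front, then encode into one
// preallocated buffer, instead of encoding into a temporary and copying.
// Simplified RLP: every item and the list itself are assumed < 56 bytes,
// and the single-byte (< 0x80) short form is ignored for brevity.
fn encoded_len(item: &[u8]) -> usize {
    // 1 prefix byte + payload.
    1 + item.len()
}

fn encode_items(items: &[&[u8]]) -> Vec<u8> {
    // Knowing the total size means the buffer never reallocates.
    let payload_len: usize = items.iter().map(|i| encoded_len(i)).sum();
    let mut buf = Vec::with_capacity(1 + payload_len);
    buf.push(0xc0 + payload_len as u8); // list prefix (payload < 56 bytes)
    for item in items {
        buf.push(0x80 + item.len() as u8); // string prefix
        buf.extend_from_slice(item);
    }
    buf
}
```

The two-buffer approach this replaces has to encode first and learn the length afterwards; precomputing the length trades a cheap arithmetic pass for the copy.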

A third, smaller optimization is to prevent cloning the cached/computed hashes from every node.

Description

  • adds memoize_hashes to both Node and NodeRef to implement postorder traversal
  • adds utility functions to calculate encoded lengths of a NodeHash and a string of bytes
  • changes BranchNode::encode_raw() to encode into a single buffer
  • adds compute_hash_ref to return a reference to the cached hash
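The postorder idea in the bullets above can be sketched roughly as follows. This is a toy model: the node layout and a u64 std hasher stand in for ethrex's real types and keccak, and `memoize_hashes` / `compute_hash_ref` only mirror the names from the description:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

enum Node {
    Leaf(Vec<u8>),
    Branch(Vec<NodeRef>),
}

struct NodeRef {
    node: Node,
    cached_hash: Option<u64>, // the real code caches a keccak digest
}

impl NodeRef {
    // Postorder traversal: each child is fully hashed (and its scratch
    // state dropped) before the parent allocates its own hasher, instead
    // of holding one live buffer per level of the trie.
    fn memoize_hashes(&mut self) -> u64 {
        if let Some(h) = self.cached_hash {
            return h;
        }
        let mut hasher = DefaultHasher::new();
        match &mut self.node {
            Node::Leaf(data) => data.hash(&mut hasher),
            Node::Branch(children) => {
                for child in children.iter_mut() {
                    child.memoize_hashes().hash(&mut hasher);
                }
            }
        }
        let h = hasher.finish();
        self.cached_hash = Some(h);
        h
    }

    // Hands out a reference to the memoized hash instead of cloning it.
    fn compute_hash_ref(&self) -> Option<&u64> {
        self.cached_hash.as_ref()
    }
}
```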

Testing
This branch was used in a client snap-synced to Mainnet, running successfully from Nov 4 to Nov 5.

Flamegraphs (block 23385900, Mainnet)

This reduces the cycles spent in hash_no_commit (trie hashing) by 40%, which is 5% of total cycles.

before: [flamegraph image]

after: [flamegraph image]

Proving times (RTX 4090)

10m 07s -> 09m 52s (block 23426995 Mainnet)
16m 42s -> 16m 22s (block 23426996 Mainnet)

@github-actions bot

Benchmark for d9f57da

| Test | Base | PR | % |
| --- | --- | --- | --- |
| Trie/cita-trie insert 10k | 34.6±0.35ms | 34.4±0.24ms | -0.58% |
| Trie/cita-trie insert 1k | 3.5±0.01ms | 3.5±0.01ms | 0.00% |
| Trie/ethrex-trie insert 10k | 48.3±0.85ms | 45.8±1.04ms | -5.18% |
| Trie/ethrex-trie insert 1k | 6.2±0.06ms | 6.3±0.11ms | +1.61% |

@github-actions bot

Benchmark for 74c4f47

| Test | Base | PR | % |
| --- | --- | --- | --- |
| Trie/cita-trie insert 10k | 40.0±2.23ms | 38.8±2.10ms | -3.00% |
| Trie/cita-trie insert 1k | 3.5±0.09ms | 3.6±0.26ms | +2.86% |
| Trie/ethrex-trie insert 10k | 32.6±1.41ms | 31.5±0.74ms | -3.37% |
| Trie/ethrex-trie insert 1k | 5.3±0.04ms | 5.1±0.02ms | -3.77% |

@github-actions bot

Benchmark for 77c8cd7

| Test | Base | PR | % |
| --- | --- | --- | --- |
| Trie/cita-trie insert 10k | 36.1±1.05ms | 35.1±0.48ms | -2.77% |
| Trie/cita-trie insert 1k | 3.5±0.07ms | 3.6±0.12ms | +2.86% |
| Trie/ethrex-trie insert 10k | 31.3±0.91ms | 30.7±0.74ms | -1.92% |
| Trie/ethrex-trie insert 1k | 5.2±0.08ms | 5.1±0.06ms | -1.92% |

@github-actions bot

Benchmark for ec5b844

| Test | Base | PR | % |
| --- | --- | --- | --- |
| Trie/cita-trie insert 10k | 36.4±1.64ms | 36.7±1.64ms | +0.82% |
| Trie/cita-trie insert 1k | 3.6±0.05ms | 3.6±0.17ms | 0.00% |
| Trie/ethrex-trie insert 10k | 31.7±0.64ms | 31.1±1.71ms | -1.89% |
| Trie/ethrex-trie insert 1k | 5.3±0.03ms | 5.2±0.06ms | -1.89% |

@github-actions bot

Benchmark for c6d25c5

| Test | Base | PR | % |
| --- | --- | --- | --- |
| Trie/cita-trie insert 10k | 34.7±0.51ms | 34.7±0.35ms | 0.00% |
| Trie/cita-trie insert 1k | 3.6±0.07ms | 3.5±0.02ms | -2.78% |
| Trie/ethrex-trie insert 10k | 31.0±0.92ms | 30.3±0.16ms | -2.26% |
| Trie/ethrex-trie insert 1k | 5.3±0.02ms | 5.0±0.03ms | -5.66% |

@github-actions bot commented Nov 5, 2025

Benchmark for 77fe7f9

| Test | Base | PR | % |
| --- | --- | --- | --- |
| Trie/cita-trie insert 10k | 34.5±0.20ms | 34.6±0.37ms | +0.29% |
| Trie/cita-trie insert 1k | 3.5±0.01ms | 3.5±0.03ms | 0.00% |
| Trie/ethrex-trie insert 10k | 30.7±0.68ms | 29.6±0.34ms | -3.58% |
| Trie/ethrex-trie insert 1k | 2.8±0.01ms | 2.7±0.01ms | -3.57% |

@github-actions bot

Benchmark for c85ba9d

| Test | Base | PR | % |
| --- | --- | --- | --- |
| Trie/cita-trie insert 10k | 29.7±2.48ms | 30.4±2.85ms | +2.36% |
| Trie/cita-trie insert 1k | 2.9±0.06ms | 2.9±0.13ms | 0.00% |
| Trie/ethrex-trie insert 10k | 29.0±2.47ms | 26.9±1.67ms | -7.24% |
| Trie/ethrex-trie insert 1k | 2.3±0.05ms | 2.2±0.02ms | -4.35% |

Review thread on NodeHash's RLPEncode impl:

```rust
// Encoded as Vec<u8>
impl RLPEncode for NodeHash {
    fn encode(&self, buf: &mut dyn bytes::BufMut) {
        RLPEncode::encode(&Into::<Vec<u8>>::into(self), buf)
    }
}
```

Reviewer (Contributor):
Why do we build a vec to encode here?

@xqft (Contributor, author):
Hmm, not sure; maybe to take advantage of NodeHash::as_ref()? I might try changing it now that you mention it.

@xqft (Contributor, author):
I think I'll do it in a different PR.
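For illustration, a sketch of the alternative the reviewer hints at: writing the hash bytes straight into the output buffer rather than building an intermediate Vec. This is hypothetical, not the PR's code; NodeHash is modeled as a bare 32-byte wrapper, and a plain Vec<u8> stands in for bytes::BufMut:

```rust
// Hypothetical sketch: NodeHash modeled as a plain 32-byte wrapper.
struct NodeHash([u8; 32]);

impl NodeHash {
    fn as_ref(&self) -> &[u8] {
        &self.0
    }

    // Writes the RLP string prefix and the bytes directly into `buf`,
    // skipping the Into::<Vec<u8>> conversion and its allocation.
    fn encode(&self, buf: &mut Vec<u8>) {
        let bytes = self.as_ref();
        buf.push(0x80 + bytes.len() as u8); // short-string prefix, len < 56
        buf.extend_from_slice(bytes);
    }
}
```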

Review thread on:

```rust
// Duplicated to prealloc the buffer and avoid calculating the payload length twice
fn encode_to_vec(&self) -> Vec<u8> {
```

Reviewer (Contributor):
Maybe we should make encode_to_vec in the generic implementation call the length method instead, wdyt?

@xqft (Contributor, author) commented Nov 10, 2025:
The thing with that is that the generic length encodes into a buffer (a zero-capacity Vec) and then returns the length of that buffer. Ideally RLPEncode should have a way to hint what the encoded size would be so the buffer can be preallocated (this is done manually in BranchNode::encode_to_vec), defaulting to a non-preallocated buffer.

If we call the current length we end up encoding twice.
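One possible shape of the size-hint idea from this exchange, sketched with hypothetical names (not ethrex's actual trait): the default `length` encodes into a throwaway buffer, matching the behavior described above, while types that know their size cheaply override it so `encode_to_vec` preallocates once and encodes once:

```rust
// Hypothetical trait; a plain Vec<u8> stands in for bytes::BufMut.
trait RlpEncode {
    fn encode(&self, buf: &mut Vec<u8>);

    // Default: encode into a throwaway buffer and measure it. This is the
    // "encodes twice" path if encode_to_vec calls it for such types.
    fn length(&self) -> usize {
        let mut buf = Vec::new();
        self.encode(&mut buf);
        buf.len()
    }

    // Types that override `length` get a single exact-size allocation
    // and a single encoding pass here.
    fn encode_to_vec(&self) -> Vec<u8> {
        let mut buf = Vec::with_capacity(self.length());
        self.encode(&mut buf);
        buf
    }
}

struct Hash32([u8; 32]);

impl RlpEncode for Hash32 {
    fn encode(&self, buf: &mut Vec<u8>) {
        buf.push(0x80 + 32); // RLP short-string prefix for 32 bytes
        buf.extend_from_slice(&self.0);
    }

    fn length(&self) -> usize {
        33 // known without encoding: 1 prefix byte + 32 payload bytes
    }
}
```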

@github-actions bot

Benchmark for e1fc597

| Test | Base | PR | % |
| --- | --- | --- | --- |
| Trie/cita-trie insert 10k | 29.9±1.92ms | 37.0±2.16ms | +23.75% |
| Trie/cita-trie insert 1k | 3.0±0.06ms | 2.9±0.03ms | -3.33% |
| Trie/ethrex-trie insert 10k | 30.5±0.70ms | 29.9±1.14ms | -1.97% |
| Trie/ethrex-trie insert 1k | 2.3±0.04ms | 2.3±0.07ms | 0.00% |

@xqft enabled auto-merge November 10, 2025 20:07
@github-actions bot

Benchmark for a4026d4

| Test | Base | PR | % |
| --- | --- | --- | --- |
| Trie/cita-trie insert 10k | 27.7±2.43ms | 36.5±2.86ms | +31.77% |
| Trie/cita-trie insert 1k | 2.9±0.02ms | 2.9±0.13ms | 0.00% |
| Trie/ethrex-trie insert 10k | 30.7±0.90ms | 28.3±1.91ms | -7.82% |
| Trie/ethrex-trie insert 1k | 2.2±0.01ms | 2.2±0.02ms | 0.00% |

@xqft added this pull request to the merge queue Nov 10, 2025
Merged via the queue into main with commit 6367f53 Nov 10, 2025
51 of 53 checks passed
@xqft deleted the l2/opt_rlp_buffer branch November 10, 2025 21:03
xqft added a commit that referenced this pull request Nov 11, 2025

Co-authored-by: Copilot <[email protected]>
Co-authored-by: Edgar <[email protected]>
Co-authored-by: Ivan Litteri <[email protected]>
Co-authored-by: Mario Rugiero <[email protected]>

Labels: L2 Rollup client
Projects: Status: Done
Development: Successfully merging this pull request may close these issues.
5 participants