
Conversation

@MegaRedHand MegaRedHand commented Nov 14, 2025

Motivation

Our current implementation allocates an intermediate buffer every time it hashes a trie node. This is wasteful, since a single buffer could be allocated once and reused.

We also allocate several temporary buffers in Node::decode_unfinished; these can be replaced by a single stack allocation.

Description

This PR adds a compute_hash_no_alloc function that receives a buffer and reuses it instead of allocating. It also replaces the temporary buffers used in Node::decode_unfinished with a stack-allocated array and references into the original buffer.
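The buffer-reuse pattern described above can be sketched as follows. Everything here is illustrative: the function names mirror the PR's `compute_hash_no_alloc`, but the encoding and the hash are trivial stand-ins (the real code RLP-encodes the node and hashes it with Keccak-256).

```rust
// Illustrative sketch of the buffer-reuse pattern; names beyond
// `compute_hash_no_alloc` are hypothetical stand-ins.

// Stand-in for RLP-encoding a node into `buf`, reusing its allocation.
fn encode_node(payload: &[u8], buf: &mut Vec<u8>) {
    buf.clear(); // keeps the capacity, drops the old contents
    buf.extend_from_slice(payload);
}

// Stand-in "hash": a byte sum. Real code would run Keccak-256 over `buf`.
fn hash_bytes(bytes: &[u8]) -> u64 {
    bytes.iter().map(|&b| b as u64).sum()
}

// Allocating version: one fresh Vec per node hashed.
fn compute_hash(payload: &[u8]) -> u64 {
    let mut buf = Vec::new();
    encode_node(payload, &mut buf);
    hash_bytes(&buf)
}

// Buffer-reusing version: the caller owns the buffer, so hashing N nodes
// costs at most one allocation (plus growth) instead of N.
fn compute_hash_no_alloc(payload: &[u8], buf: &mut Vec<u8>) -> u64 {
    encode_node(payload, buf);
    hash_bytes(buf)
}

fn main() {
    let nodes = [&b"leaf"[..], &b"extension"[..], &b"branch"[..]];
    let mut buf = Vec::with_capacity(512);
    for node in nodes {
        // Both variants produce the same hash; only the allocation
        // behavior differs.
        assert_eq!(compute_hash(node), compute_hash_no_alloc(node, &mut buf));
    }
    println!("hashes match");
}
```

The key design point is that the buffer lives in the caller, so recursive traversals (like memoize_hashes in this PR) can thread one buffer through every node they visit.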

Flamegraph before:

![Screenshot 2025-11-14 at 20 48 14](https://github.com/user-attachments/assets/4c31ba88-5eba-4ddf-9192-78fb07358265)

Flamegraph after:

![Screenshot 2025-11-14 at 20 48 51](https://github.com/user-attachments/assets/ec6fbc93-d5dc-4560-831a-c4f9715dc78e)

@github-actions github-actions bot added L1 Ethereum client performance Block execution throughput and performance in general labels Nov 14, 2025

github-actions bot commented Nov 14, 2025

Lines of code report

Total lines added: 56
Total lines removed: 0
Total lines changed: 56

Detailed view
| File | Lines | Diff |
|---|---|---|
| ethrex/crates/common/rlp/structs.rs | 164 | +1 |
| ethrex/crates/common/trie/node.rs | 352 | +16 |
| ethrex/crates/common/trie/node/branch.rs | 571 | +7 |
| ethrex/crates/common/trie/node/extension.rs | 515 | +7 |
| ethrex/crates/common/trie/node/leaf.rs | 302 | +7 |
| ethrex/crates/common/trie/rlp.rs | 136 | +10 |
| ethrex/crates/common/trie/trie.rs | 978 | +1 |
| ethrex/crates/common/trie/trie_sorted.rs | 447 | +7 |

@github-actions

Benchmark for fdcc01e

| Test | Base | PR | % |
|---|---|---|---|
| Trie/cita-trie insert 10k | 27.9±0.65ms | 28.3±1.58ms | +1.43% |
| Trie/cita-trie insert 1k | 2.8±0.01ms | 2.9±0.09ms | +3.57% |
| Trie/ethrex-trie insert 10k | 24.7±0.71ms | 24.5±0.63ms | -0.81% |
| Trie/ethrex-trie insert 1k | 2.2±0.01ms | 2.2±0.01ms | 0.00% |

@github-actions

Benchmark for 632c51a

| Test | Base | PR | % |
|---|---|---|---|
| Trie/cita-trie insert 10k | 27.7±0.71ms | 27.4±0.31ms | -1.08% |
| Trie/cita-trie insert 1k | 2.9±0.01ms | 2.9±0.20ms | 0.00% |
| Trie/ethrex-trie insert 10k | 24.2±0.46ms | 24.1±0.55ms | -0.41% |
| Trie/ethrex-trie insert 1k | 2.2±0.01ms | 2.2±0.01ms | 0.00% |

@github-actions

Benchmark for aa0588b

| Test | Base | PR | % |
|---|---|---|---|
| Trie/cita-trie insert 10k | 27.9±0.35ms | 28.1±1.20ms | +0.72% |
| Trie/cita-trie insert 1k | 2.8±0.01ms | 2.9±0.18ms | +3.57% |
| Trie/ethrex-trie insert 10k | 24.9±0.52ms | 24.9±0.97ms | 0.00% |
| Trie/ethrex-trie insert 1k | 2.2±0.04ms | 2.2±0.01ms | 0.00% |

@github-actions

github-actions bot commented Nov 14, 2025

Benchmark Block Execution Results Comparison Against Main

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base | 59.849 ± 0.309 | 59.472 | 60.303 | 1.00 ± 0.01 |
| head | 59.747 ± 0.308 | 59.265 | 60.306 | 1.00 |

@github-actions

Benchmark for 116c57f

| Test | Base | PR | % |
|---|---|---|---|
| Trie/cita-trie insert 10k | 28.7±1.32ms | 28.5±0.82ms | -0.70% |
| Trie/cita-trie insert 1k | 2.8±0.01ms | 2.9±0.12ms | +3.57% |
| Trie/ethrex-trie insert 10k | 25.4±0.97ms | 25.1±0.84ms | -1.18% |
| Trie/ethrex-trie insert 1k | 2.2±0.01ms | 2.2±0.03ms | 0.00% |

@MegaRedHand MegaRedHand changed the title perf(l1): avoid intermediate allocations when computing trie node hashes perf(l1): avoid intermediate allocations when decoding and hashing nodes Nov 14, 2025
@MegaRedHand MegaRedHand changed the title perf(l1): avoid intermediate allocations when decoding and hashing nodes perf(l1): avoid intermediate allocations when decoding and hashing trie nodes Nov 14, 2025
@MegaRedHand MegaRedHand changed the title perf(l1): avoid intermediate allocations when decoding and hashing trie nodes perf(l1): avoid temporary allocations when decoding and hashing trie nodes Nov 14, 2025
@github-actions

Benchmark for 58823f7

| Test | Base | PR | % |
|---|---|---|---|
| Trie/cita-trie insert 10k | 28.1±0.56ms | 28.4±0.70ms | +1.07% |
| Trie/cita-trie insert 1k | 2.9±0.01ms | 2.8±0.09ms | -3.45% |
| Trie/ethrex-trie insert 10k | 24.9±0.83ms | 24.8±0.36ms | -0.40% |
| Trie/ethrex-trie insert 1k | 2.2±0.01ms | 2.2±0.05ms | 0.00% |

@MegaRedHand MegaRedHand marked this pull request as ready for review November 14, 2025 23:40
@MegaRedHand MegaRedHand requested a review from a team as a code owner November 14, 2025 23:40
Copilot AI review requested due to automatic review settings November 14, 2025 23:40
@ethrex-project-sync ethrex-project-sync bot moved this to In Review in ethrex_l1 Nov 14, 2025
Copilot AI left a comment


Pull Request Overview

This PR optimizes trie node hashing and decoding by eliminating temporary allocations. The changes introduce buffer reuse patterns for hash computation and replace heap-allocated vectors with stack-allocated arrays in the RLP decoder.

  • Added compute_hash_no_alloc functions that accept a reusable buffer parameter
  • Modified memoize_hashes to accept and reuse a buffer throughout recursive traversals
  • Replaced dynamic vector allocation in RLP decoder with a stack-allocated array of references
  • Added get_encoded_item_ref in RLP decoder to avoid unnecessary Vec allocations
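The stack-allocation change in the bullets above can be sketched with a toy decoder. This is not real RLP: the format (a 1-byte length prefix per item) and all names are hypothetical. The point is borrowing slices into a fixed-size array instead of allocating a Vec per item.

```rust
// Hypothetical sketch of replacing per-item Vec allocations with a
// stack-allocated array of borrowed slices. The toy wire format here
// ("1-byte length, then payload") stands in for real RLP.

const MAX_ITEMS: usize = 17; // a branch node has 16 children + 1 value

// Decodes up to MAX_ITEMS length-prefixed items, borrowing each item
// from `input` instead of copying it into its own Vec.
fn decode_items(input: &[u8]) -> Option<([&[u8]; MAX_ITEMS], usize)> {
    let mut items: [&[u8]; MAX_ITEMS] = [&[]; MAX_ITEMS];
    let mut count = 0;
    let mut rest = input;
    while !rest.is_empty() {
        if count == MAX_ITEMS {
            return None; // too many items for a trie node
        }
        let len = rest[0] as usize;
        let payload = rest.get(1..1 + len)?; // None on truncated input
        items[count] = payload; // a reference into `input`, no allocation
        count += 1;
        rest = &rest[1 + len..];
    }
    Some((items, count))
}

fn main() {
    // Two items: "abc" and "x".
    let encoded = [3, b'a', b'b', b'c', 1, b'x'];
    let (items, count) = decode_items(&encoded).unwrap();
    assert_eq!(count, 2);
    assert_eq!(items[0], &b"abc"[..]);
    assert_eq!(items[1], &b"x"[..]);
    println!("ok");
}
```

This mirrors why a `get_encoded_item_ref`-style accessor pays off: once items are slices into the original buffer, the decoder needs no heap allocation at all.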

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| crates/common/trie/node.rs | Adds compute_hash_no_alloc method and modifies memoize_hashes to accept a buffer parameter for reuse |
| crates/common/trie/node/branch.rs | Implements compute_hash_no_alloc for BranchNode with buffer reuse pattern |
| crates/common/trie/node/extension.rs | Implements compute_hash_no_alloc for ExtensionNode with buffer reuse pattern |
| crates/common/trie/node/leaf.rs | Implements compute_hash_no_alloc for LeafNode with buffer reuse pattern |
| crates/common/trie/rlp.rs | Replaces heap-allocated Vec with stack-allocated array for RLP items, uses reference-based decoding |
| crates/common/rlp/structs.rs | Adds get_encoded_item_ref to return references instead of allocating new vectors |
| crates/common/trie/trie_sorted.rs | Updates hash computation calls to use new buffer-accepting methods with 512-byte capacity |
| crates/common/trie/trie.rs | Updates root hash computation to use buffer reuse pattern |
| CHANGELOG.md | Documents the performance optimization |


}
}

/// Computes the node's hash

Copilot AI Nov 14, 2025


The doc comment is identical to the one for compute_hash above. Consider clarifying that this function uses a provided buffer to avoid allocations, e.g., "Computes the node's hash using the provided buffer to avoid allocations."

Suggested change
/// Computes the node's hash
/// Computes the node's hash using the provided buffer to avoid allocations

self.compute_hash_no_alloc(&mut vec![])
}

/// Computes the node's hash, using the provided buffer

Copilot AI Nov 14, 2025


The doc comment is identical to the one for compute_hash above. Consider clarifying that this function uses a provided buffer to avoid allocations, e.g., "Computes the node's hash using the provided buffer to avoid allocations."

Suggested change
/// Computes the node's hash, using the provided buffer
/// Computes the node's hash using the provided buffer to avoid allocations.

self.compute_hash_no_alloc(&mut vec![])
}

/// Computes the node's hash, using the provided buffer

Copilot AI Nov 14, 2025


The doc comment is identical to the one for compute_hash above. Consider clarifying that this function uses a provided buffer to avoid allocations, e.g., "Computes the node's hash using the provided buffer to avoid allocations."

Suggested change
/// Computes the node's hash, using the provided buffer
/// Computes the node's hash using the provided buffer to avoid allocations.

self.compute_hash_no_alloc(&mut vec![])
}

/// Computes the node's hash, using the provided buffer

Copilot AI Nov 14, 2025


The doc comment is identical to the one for compute_hash above. Consider clarifying that this function uses a provided buffer to avoid allocations, e.g., "Computes the node's hash using the provided buffer to avoid allocations."

Suggested change
/// Computes the node's hash, using the provided buffer
/// Computes the node's hash using the provided buffer to avoid allocations.
/// This method reuses the given buffer to minimize heap allocations when encoding the node.

) -> Result<(), TrieGenerationError> {
debug!("{:x?}", center_side.path);
debug!("{:x?}", parent_element.path);
let mut nodehash_buffer = Vec::with_capacity(512);

Copilot AI Nov 14, 2025


[nitpick] The initial capacity of 512 bytes may be insufficient for some branch nodes. A branch node can have 16 children (each encoded as up to 33 bytes) plus a value field, potentially exceeding 512 bytes with RLP overhead. While Vec will automatically grow, this may cause reallocations. Consider using a larger initial capacity (e.g., 600-700 bytes) or document this trade-off.

Suggested change
let mut nodehash_buffer = Vec::with_capacity(512);
// Increased initial capacity to 700 bytes to avoid reallocations for large branch nodes.
let mut nodehash_buffer = Vec::with_capacity(700);

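The arithmetic behind this capacity nitpick can be checked directly; the per-field sizes below follow the reviewer's estimate (32-byte child hashes with 1-byte RLP string headers, a value field assumed to be at most a 32-byte hash, and a short list header) and are illustrative, not measured.

```rust
// Rough upper bound for an RLP-encoded branch node, using the reviewer's
// assumptions. These constants are illustrative figures, not values taken
// from the ethrex codebase.

const CHILDREN: usize = 16;
const CHILD_MAX: usize = 33; // 0xa0 string header + 32-byte child hash
const VALUE_MAX: usize = 33; // assumption: value is at most a 32-byte hash
const LIST_HEADER_MAX: usize = 3; // long-list header for payloads < 64 KiB

fn main() {
    let branch_max = CHILDREN * CHILD_MAX + VALUE_MAX + LIST_HEADER_MAX;
    // 16 * 33 + 33 + 3 = 564 bytes, which exceeds a 512-byte initial
    // capacity, so at most one reallocation can occur per buffer lifetime.
    println!("worst-case branch node: {} bytes", branch_max);
    assert!(branch_max > 512);
}
```

Since the buffer is reused across the whole traversal, that single growth to the worst-case size is amortized over every node hashed, which is why the trade-off is minor either way.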
let mut left_side = StackElement::default();
let mut center_side: CenterSide = CenterSide::from_value(initial_value);
let mut right_side_opt: Option<(H256, Vec<u8>)> = data_iter.next();
let mut nodehash_buffer = Vec::with_capacity(512);

Copilot AI Nov 14, 2025


[nitpick] The initial capacity of 512 bytes may be insufficient for some branch nodes. A branch node can have 16 children (each encoded as up to 33 bytes) plus a value field, potentially exceeding 512 bytes with RLP overhead. While Vec will automatically grow, this may cause reallocations. Consider using a larger initial capacity (e.g., 600-700 bytes) or document this trade-off.

Suggested change
let mut nodehash_buffer = Vec::with_capacity(512);
// Increased initial capacity to 700 bytes to avoid reallocations for large branch nodes (16 children * 33 bytes + value + RLP overhead)
let mut nodehash_buffer = Vec::with_capacity(700);

pub fn hash_no_commit(&self) -> H256 {
if self.root.is_valid() {
self.root.compute_hash().finalize()
// 512 is the maximum size of an encoded node

Copilot AI Nov 14, 2025


The comment states "512 is the maximum size of an encoded node", but this may not be accurate for all cases. A branch node with 16 children can exceed this size. While Vec will automatically grow, the comment should be updated to reflect that 512 is an estimated typical size rather than a strict maximum.

Suggested change
// 512 is the maximum size of an encoded node
// 512 is an estimated typical size for an encoded node; some nodes (e.g., branch nodes with many children) may exceed this size

@jrchatruc jrchatruc added this pull request to the merge queue Nov 17, 2025
Merged via the queue into main with commit 06dc722 Nov 17, 2025
56 checks passed
@jrchatruc jrchatruc deleted the trie-hash-avoid-intermediate-allocations branch November 17, 2025 15:43
@github-project-automation github-project-automation bot moved this from In Review to Done in ethrex_l1 Nov 17, 2025
@github-project-automation github-project-automation bot moved this from Todo to Done in ethrex_performance Nov 17, 2025
lakshya-sky pushed a commit to lakshya-sky/ethrex that referenced this pull request Nov 17, 2025
…nodes (lambdaclass#5353)
