perf(l1): avoid temporary allocations when decoding and hashing trie nodes #5353
Pull Request Overview
This PR optimizes trie node hashing and decoding by eliminating temporary allocations. The changes introduce buffer reuse patterns for hash computation and replace heap-allocated vectors with stack-allocated arrays in the RLP decoder.
- Added `compute_hash_no_alloc` functions that accept a reusable buffer parameter
- Modified `memoize_hashes` to accept and reuse a buffer throughout recursive traversals
- Replaced dynamic vector allocation in the RLP decoder with a stack-allocated array of references
- Added `get_encoded_item_ref` in the RLP decoder to avoid unnecessary `Vec` allocations
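The buffer-reuse pattern behind `memoize_hashes` can be sketched as follows. This is a minimal, self-contained illustration with assumed names and a toy checksum in place of Keccak-256, not the actual ethrex code: the caller allocates one buffer and each recursive call clears and refills it, so a full traversal performs no per-node heap allocation.

```rust
// Toy trie: real nodes carry paths, child hashes, etc.; this sketch keeps
// only enough structure to show the buffer threading.
enum Node {
    Leaf(Vec<u8>),
    Branch(Vec<Node>),
}

// Hypothetical stand-in for memoize_hashes: the same `buf` flows through the
// whole recursion instead of each call allocating its own Vec.
fn memoize_hashes(node: &Node, buf: &mut Vec<u8>) -> u64 {
    match node {
        Node::Leaf(data) => {
            buf.clear(); // keeps the allocation, drops old contents
            buf.extend_from_slice(data);
            // toy checksum standing in for Keccak-256
            buf.iter()
                .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64))
        }
        Node::Branch(children) => {
            let mut acc = 0u64;
            for child in children {
                acc = acc.wrapping_add(memoize_hashes(child, buf)); // same buffer
            }
            acc
        }
    }
}

fn main() {
    let trie = Node::Branch(vec![Node::Leaf(b"a".to_vec()), Node::Leaf(b"b".to_vec())]);
    let mut buf = Vec::with_capacity(512); // single up-front allocation
    println!("{}", memoize_hashes(&trie, &mut buf));
}
```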
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| crates/common/trie/node.rs | Adds compute_hash_no_alloc method and modifies memoize_hashes to accept a buffer parameter for reuse |
| crates/common/trie/node/branch.rs | Implements compute_hash_no_alloc for BranchNode with buffer reuse pattern |
| crates/common/trie/node/extension.rs | Implements compute_hash_no_alloc for ExtensionNode with buffer reuse pattern |
| crates/common/trie/node/leaf.rs | Implements compute_hash_no_alloc for LeafNode with buffer reuse pattern |
| crates/common/trie/rlp.rs | Replaces heap-allocated Vec with stack-allocated array for RLP items, uses reference-based decoding |
| crates/common/rlp/structs.rs | Adds get_encoded_item_ref to return references instead of allocating new vectors |
| crates/common/trie/trie_sorted.rs | Updates hash computation calls to use new buffer-accepting methods with 512-byte capacity |
| crates/common/trie/trie.rs | Updates root hash computation to use buffer reuse pattern |
| CHANGELOG.md | Documents the performance optimization |
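The `rlp.rs` change described above (borrowed slices in a stack array instead of per-item `Vec` allocations) can be sketched with a toy length-prefixed framing; this is an illustration under assumed names, not ethrex's actual RLP decoder:

```rust
// Instead of collecting decoded items into Vec<Vec<u8>> (one heap allocation
// per item), hold up to 17 borrowed slices — a branch node has at most
// 16 children plus a value — in a stack-allocated array.
fn split_into_items(input: &[u8]) -> ([&[u8]; 17], usize) {
    let mut items: [&[u8]; 17] = [&[]; 17];
    let mut count = 0;
    // Toy framing (not real RLP): the first byte of each item is its length.
    let mut rest = input;
    while let Some((&len, tail)) = rest.split_first() {
        let len = len as usize;
        items[count] = &tail[..len]; // borrow from the input, no allocation
        count += 1;
        rest = &tail[len..];
    }
    (items, count)
}

fn main() {
    // Two items: [0x61, 0x62] and [0x63]
    let encoded = [2u8, 0x61, 0x62, 1, 0x63];
    let (items, count) = split_into_items(&encoded);
    assert_eq!(count, 2);
    assert_eq!(items[0], b"ab");
    assert_eq!(items[1], b"c");
    println!("{count} items decoded by reference");
}
```

Returning slices tied to the input's lifetime is what lets `get_encoded_item_ref`-style APIs avoid copying at all.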
```rust
    }
}

/// Computes the node's hash
```

**Copilot AI** commented on Nov 14, 2025:

The doc comment is identical to the one for `compute_hash` above. Consider clarifying that this function uses a provided buffer to avoid allocations, e.g., "Computes the node's hash using the provided buffer to avoid allocations."

```diff
-/// Computes the node's hash
+/// Computes the node's hash using the provided buffer to avoid allocations.
```
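The delegation visible in the diff context here (`compute_hash` calling `compute_hash_no_alloc(&mut vec![])`) can be sketched as follows; the struct, fields, and toy checksum are illustrative assumptions, not the ethrex API:

```rust
struct Node {
    payload: Vec<u8>,
}

impl Node {
    /// Computes the node's hash
    fn compute_hash(&self) -> u64 {
        // Old entry point delegates with a fresh Vec, as in the diff context,
        // so existing callers keep working unchanged.
        self.compute_hash_no_alloc(&mut Vec::new())
    }

    /// Computes the node's hash using the provided buffer to avoid allocations
    fn compute_hash_no_alloc(&self, buf: &mut Vec<u8>) -> u64 {
        buf.clear(); // reuse the caller's allocation
        buf.extend_from_slice(&self.payload);
        // toy checksum standing in for Keccak-256
        buf.iter()
            .fold(0u64, |a, &b| a.wrapping_mul(131).wrapping_add(u64::from(b)))
    }
}

fn main() {
    let node = Node { payload: b"abc".to_vec() };
    let mut buf = Vec::with_capacity(512); // reused across many nodes in practice
    assert_eq!(node.compute_hash(), node.compute_hash_no_alloc(&mut buf));
    println!("both paths agree");
}
```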
```rust
        self.compute_hash_no_alloc(&mut vec![])
    }

    /// Computes the node's hash, using the provided buffer
```

**Copilot AI** commented on Nov 14, 2025:

The doc comment is identical to the one for `compute_hash` above. Consider clarifying that this function uses a provided buffer to avoid allocations, e.g., "Computes the node's hash using the provided buffer to avoid allocations."

```diff
-/// Computes the node's hash, using the provided buffer
+/// Computes the node's hash using the provided buffer to avoid allocations.
```
```rust
        self.compute_hash_no_alloc(&mut vec![])
    }

    /// Computes the node's hash, using the provided buffer
```

**Copilot AI** commented on Nov 14, 2025:

The doc comment is identical to the one for `compute_hash` above. Consider clarifying that this function uses a provided buffer to avoid allocations, e.g., "Computes the node's hash using the provided buffer to avoid allocations."

```diff
-/// Computes the node's hash, using the provided buffer
+/// Computes the node's hash using the provided buffer to avoid allocations.
```
```rust
        self.compute_hash_no_alloc(&mut vec![])
    }

    /// Computes the node's hash, using the provided buffer
```

**Copilot AI** commented on Nov 14, 2025:

The doc comment is identical to the one for `compute_hash` above. Consider clarifying that this function uses a provided buffer to avoid allocations, e.g., "Computes the node's hash using the provided buffer to avoid allocations."

```diff
-/// Computes the node's hash, using the provided buffer
+/// Computes the node's hash using the provided buffer to avoid allocations.
+/// This method reuses the given buffer to minimize heap allocations when encoding the node.
```
```rust
) -> Result<(), TrieGenerationError> {
    debug!("{:x?}", center_side.path);
    debug!("{:x?}", parent_element.path);
    let mut nodehash_buffer = Vec::with_capacity(512);
```

**Copilot AI** commented on Nov 14, 2025:

[nitpick] The initial capacity of 512 bytes may be insufficient for some branch nodes. A branch node can have 16 children (each encoded as up to 33 bytes) plus a value field, potentially exceeding 512 bytes with RLP overhead. While `Vec` will automatically grow, this may cause reallocations. Consider using a larger initial capacity (e.g., 600-700 bytes) or document this trade-off.

```diff
-let mut nodehash_buffer = Vec::with_capacity(512);
+// Increased initial capacity to 700 bytes to avoid reallocations for large branch nodes.
+let mut nodehash_buffer = Vec::with_capacity(700);
```
```rust
let mut left_side = StackElement::default();
let mut center_side: CenterSide = CenterSide::from_value(initial_value);
let mut right_side_opt: Option<(H256, Vec<u8>)> = data_iter.next();
let mut nodehash_buffer = Vec::with_capacity(512);
```

**Copilot AI** commented on Nov 14, 2025:

[nitpick] The initial capacity of 512 bytes may be insufficient for some branch nodes. A branch node can have 16 children (each encoded as up to 33 bytes) plus a value field, potentially exceeding 512 bytes with RLP overhead. While `Vec` will automatically grow, this may cause reallocations. Consider using a larger initial capacity (e.g., 600-700 bytes) or document this trade-off.

```diff
-let mut nodehash_buffer = Vec::with_capacity(512);
+// Increased initial capacity to 700 bytes to avoid reallocations for large branch nodes (16 children * 33 bytes + value + RLP overhead)
+let mut nodehash_buffer = Vec::with_capacity(700);
```
```rust
pub fn hash_no_commit(&self) -> H256 {
    if self.root.is_valid() {
        self.root.compute_hash().finalize()
        // 512 is the maximum size of an encoded node
```

**Copilot AI** commented on Nov 14, 2025:

The comment states "512 is the maximum size of an encoded node", but this may not be accurate for all cases. A branch node with 16 children can exceed this size. While `Vec` will automatically grow, the comment should be updated to reflect that 512 is an estimated typical size rather than a strict maximum.

```diff
-// 512 is the maximum size of an encoded node
+// 512 is an estimated typical size for an encoded node; some nodes (e.g., branch nodes with many children) may exceed this size
```
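The size arithmetic behind these capacity comments can be checked directly. The 33-byte per-child figure assumes a 32-byte child hash plus one RLP string-prefix byte, as stated in the review comments:

```rust
fn main() {
    // Assumed worst case from the review comments: 16 children, each an
    // RLP-encoded 32-byte hash (1 prefix byte + 32 payload bytes).
    let per_child = 1 + 32;
    let children_total = 16 * per_child;
    // 528 bytes of children alone already exceeds a 512-byte buffer,
    // before counting the value field and the RLP list header.
    assert!(children_total > 512);
    println!("{children_total}");
}
```

Since `Vec` doubles its capacity on growth, a 512-byte buffer that overflows once would end at 1024 bytes, so the cost is a single reallocation per buffer lifetime rather than per node.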
**Motivation**

Our current implementation uses intermediate allocations for hashing trie nodes. This is inefficient, since the buffer could be allocated once. Also, we're currently allocating multiple buffers in `Node::decode_unfinished`. This can be replaced by a simple stack allocation.

**Description**

This PR adds a `compute_hash_no_alloc` function which receives a buffer and avoids the allocation. It also replaces the temporary buffers used in `Node::decode_unfinished` with a stack-allocated array and references to the original buffer.

Flamegraph before:

<img width="1512" height="763" alt="Flamegraph before" src="https://github.com/user-attachments/assets/4c31ba88-5eba-4ddf-9192-78fb07358265" />

Flamegraph after:

<img width="1512" height="735" alt="Flamegraph after" src="https://github.com/user-attachments/assets/ec6fbc93-d5dc-4560-831a-c4f9715dc78e" />