
Conversation

@fmoletta
Contributor

@fmoletta fmoletta commented Mar 28, 2025

Motivation
During state sync, we store the account hashes of the storages we failed to fetch, along with their root paths, in the store so the storage healer can later read them and heal them. For this we used the SnapState table, where the whole pending storage paths map was stored as a single value. This worked fine at a smaller scale, but once the map gets too big, reading and writing it becomes very expensive and can disrupt other processes.
This PR moves the pending storage paths to their own table and changes how we interact with them:

  • The storage healer no longer fetches the whole map; instead it reads a bounded number of storages from it whenever its queue is not full.
  • The storage healer no longer uses a channel; it reads incoming requests directly from the store.
  • Fetchers that need to communicate with the storage healer now do so by adding paths to the store.

Description

  • Remove storage heal paths from snap state
  • Add a new DB table for storage heal paths
  • Remove the channel from the storage healer and instead manage incoming and outgoing storage heal paths through the store; this also solves the issues of the rebuilder not being able to submit storage heal requests and of the storage healer being kept alive indefinitely on forced shutdown (a rough sketch of the resulting store-based flow follows below)

Closes #issue_number
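
To make the new flow concrete, here is a minimal, self-contained sketch of the store-backed heal-path queue described above. It is only an illustration under assumptions, not the ethrex implementation: Store, AccountHash, and Path are stand-ins for the real store engine, H256, and Nibbles types, and take_storage_heal_paths follows the consuming-read naming discussed in the review thread further down.

use std::collections::BTreeMap;
use std::sync::Mutex;

// Stand-ins for the real H256 account hash and Nibbles trie-path types.
type AccountHash = [u8; 32];
type Path = Vec<u8>;

#[derive(Default)]
struct Store {
    // Stand-in for the dedicated storage heal paths table added in this PR.
    storage_heal_paths: Mutex<BTreeMap<AccountHash, Vec<Path>>>,
}

impl Store {
    // Fetchers (and the trie rebuilder) enqueue heal requests by writing
    // paths into the table instead of sending them over a channel.
    fn set_storage_heal_paths(&self, paths: Vec<(AccountHash, Vec<Path>)>) {
        self.storage_heal_paths.lock().unwrap().extend(paths);
    }

    // The storage healer tops up its queue by taking at most `limit`
    // entries, removing them from the table as it reads them.
    fn take_storage_heal_paths(&self, limit: usize) -> Vec<(AccountHash, Vec<Path>)> {
        let mut table = self.storage_heal_paths.lock().unwrap();
        let keys: Vec<AccountHash> = table.keys().take(limit).copied().collect();
        keys.into_iter()
            .filter_map(|hash| table.remove(&hash).map(|paths| (hash, paths)))
            .collect()
    }
}

The real tables live in the libmdbx/redb/in-memory engines rather than a Mutex-wrapped map, but the access pattern is the same idea: bounded, consuming reads on the healer side and plain inserts on the fetcher side.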

@github-actions

github-actions bot commented Mar 28, 2025

Lines of code report

Total lines added: 88
Total lines removed: 30
Total lines changed: 118

Detailed view
+------------------------------------------------------+-------+------+
| File                                                 | Lines | Diff |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync.rs                 | 566   | -10  |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/state_healing.rs   | 123   | +4   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/state_sync.rs      | 238   | -4   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/storage_fetcher.rs | 247   | +3   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/storage_healing.rs | 87    | -14  |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/trie_rebuild.rs    | 242   | +3   |
+------------------------------------------------------+-------+------+
| ethrex/crates/storage/api.rs                         | 231   | +3   |
+------------------------------------------------------+-------+------+
| ethrex/crates/storage/rlp.rs                         | 102   | +2   |
+------------------------------------------------------+-------+------+
| ethrex/crates/storage/store.rs                       | 1219  | +3   |
+------------------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/in_memory.rs          | 572   | +13  |
+------------------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/libmdbx.rs            | 1279  | +30  |
+------------------------------------------------------+-------+------+
| ethrex/crates/storage/store_db/redb.rs               | 1104  | +27  |
+------------------------------------------------------+-------+------+
| ethrex/crates/storage/utils.rs                       | 50    | -2   |
+------------------------------------------------------+-------+------+

@fmoletta fmoletta marked this pull request as ready for review March 31, 2025 22:22
@fmoletta fmoletta requested a review from a team as a code owner March 31, 2025 22:22
Comment on lines +803 to +811
// Delete read values
let txn = self.db.begin_write()?;
{
    let mut table = txn.open_table(STORAGE_HEAL_PATHS_TABLE)?;
    for (hash, _) in res.iter() {
        table.remove(<H256 as Into<AccountHashRLP>>::into(*hash))?;
    }
}
txn.commit()?;
Contributor

Having the get_* method delete keys sounds confusing. Would it be too bad to split this?

Contributor

Maybe a rename? (get_and_remove_ or retrieve_, or something that gives the idea of consuming the elements?)

Contributor Author

I agree, I think take would be suitable here

Contributor Author

I don't much like the idea of splitting it, as this has only one use case, in which we want to delete as soon as we read

Contributor Author

Updated ba937d5
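
For reference, a standalone redb sketch of the consuming read agreed on here. It is hypothetical: it uses plain byte-slice keys and values instead of the AccountHashRLP/TriePathsRLP wrappers, a made-up table name, and the take_ naming from this thread.

use redb::{Database, ReadableTable, TableDefinition};

const HEAL_PATHS: TableDefinition<&[u8], &[u8]> =
    TableDefinition::new("storage_heal_paths");

// Read up to `limit` entries and delete them within the same write
// transaction, so a heal request is never handed out twice.
fn take_heal_paths(
    db: &Database,
    limit: usize,
) -> Result<Vec<(Vec<u8>, Vec<u8>)>, redb::Error> {
    let txn = db.begin_write()?;
    let taken = {
        let mut table = txn.open_table(HEAL_PATHS)?;
        // Collect first: entries cannot be removed while the iterator
        // still borrows the table.
        let entries: Vec<(Vec<u8>, Vec<u8>)> = table
            .iter()?
            .take(limit)
            .map(|e| e.map(|(k, v)| (k.value().to_vec(), v.value().to_vec())))
            .collect::<Result<_, _>>()?;
        for (key, _) in &entries {
            table.remove(key.as_slice())?;
        }
        entries
    };
    txn.commit()?;
    Ok(taken)
}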

Comment on lines 774 to 784
fn set_storage_heal_paths(&self, paths: Vec<(H256, Vec<Nibbles>)>) -> Result<(), StoreError> {
    let key_values = paths
        .into_iter()
        .map(|(hash, paths)| {
            (
                <H256 as Into<AccountHashRLP>>::into(hash),
                <Vec<Nibbles> as Into<TriePathsRLP>>::into(paths),
            )
        })
        .collect();
    self.write_batch(STORAGE_HEAL_PATHS_TABLE, key_values)
Contributor

After merging #2336 this needs the following change:

Suggested change
fn set_storage_heal_paths(&self, paths: Vec<(H256, Vec<Nibbles>)>) -> Result<(), StoreError> {
    let key_values = paths
        .into_iter()
        .map(|(hash, paths)| {
            (
                <H256 as Into<AccountHashRLP>>::into(hash),
                <Vec<Nibbles> as Into<TriePathsRLP>>::into(paths),
            )
        })
        .collect();
    self.write_batch(STORAGE_HEAL_PATHS_TABLE, key_values)
async fn set_storage_heal_paths(&self, paths: Vec<(H256, Vec<Nibbles>)>) -> Result<(), StoreError> {
    let key_values = paths
        .into_iter()
        .map(|(hash, paths)| {
            (
                <H256 as Into<AccountHashRLP>>::into(hash),
                <Vec<Nibbles> as Into<TriePathsRLP>>::into(paths),
            )
        })
        .collect();
    self.write_batch(STORAGE_HEAL_PATHS_TABLE, key_values).await

Similar changes will be needed at the API level and for libmdbx.

Contributor Author

Thanks!

Contributor Author

Updated with merge!
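
As a rough illustration of the API-level change mentioned above, a hypothetical trait-level signature with stand-in types; the actual ethrex trait, error type, and async strategy (native async fns in traits, async_trait, or boxed futures) may differ.

// Stand-in types for illustration only.
type H256 = [u8; 32];
type Nibbles = Vec<u8>;
#[derive(Debug)]
struct StoreError;

#[allow(async_fn_in_trait)]
trait StoreEngine {
    // The batched write becomes async so each engine (libmdbx, redb,
    // in-memory) can await its own write path internally.
    async fn set_storage_heal_paths(
        &self,
        paths: Vec<(H256, Vec<Nibbles>)>,
    ) -> Result<(), StoreError>;
}

// Call sites only gain an `.await`:
async fn queue_heal_requests<S: StoreEngine>(
    store: &S,
    paths: Vec<(H256, Vec<Nibbles>)>,
) -> Result<(), StoreError> {
    store.set_storage_heal_paths(paths).await
}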

Contributor

@Oppen Oppen left a comment

Left a few comments.

Contributor

@ElFantasma ElFantasma left a comment

It LGTM

@fmoletta fmoletta added this pull request to the merge queue Apr 9, 2025
Merged via the queue into main with commit a3dc64e Apr 9, 2025
19 checks passed
@fmoletta fmoletta deleted the move-storage-heal-paths-to-own-table branch April 9, 2025 17:57
github-merge-queue bot pushed a commit that referenced this pull request Apr 10, 2025
**Motivation**
During snap sync, we download account ranges and then for each
downloaded account we request its storage and bytecodes. For these
requests we use fetcher processes that receive incoming messages from a
channel (storage roots, bytecode hashes, etc), place them on a queue,
and then group them in batches and spawn parallel processes to fetch
them. All fetchers share a common behaviour of reading requests,
batching, and fetching, with differences concerning only the content of
the queue. We have had many bugs due to how these fetchers worked, as we
may update one of them and forget about the rest.
This PR aims to reduce the sources of bugs and keep a unified behaviour
across fetchers by adding generic functions that represent the fetcher
behaviour.
In this PR we add the generic function `run_queue`, which receives a
generic queue (a Vec<T>) and an async function that operates over a
batch from said queue.

**Description**
* Add generic function `run_queue` to abstract queue logic from fetcher
processes
* Use `run_queue` in `bytecode_fetcher`, `large_storage_fetcher`, and
`storage_fetcher`

*Considerations*
* As this PR was developed alongside #2359, this won't be applied to the
`storage_healer`, which will stop reading messages
* While the batch size could be a const generic instead of a regular
argument, doing so would force us to make the other generic arguments in
`run_queue` explicit, which looks pretty bad

Closes #issue_number

---------

Co-authored-by: Mario Rugiero <[email protected]>
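
A minimal sketch of the `run_queue` idea from the commit message above. The signature, the tokio runtime, and the sequential awaiting are assumptions; the real fetchers batch work and spawn parallel tasks rather than awaiting each batch inline.

use std::future::Future;

// Drains `queue` in batches of at most `batch_size`, handing each batch to
// `fetch`; whatever `fetch` returns is treated as not-yet-handled and is
// pushed back onto the queue for a retry.
async fn run_queue<T, F, Fut>(mut queue: Vec<T>, batch_size: usize, mut fetch: F)
where
    F: FnMut(Vec<T>) -> Fut,
    Fut: Future<Output = Vec<T>>,
{
    while !queue.is_empty() {
        let split = queue.len().saturating_sub(batch_size);
        let batch = queue.split_off(split);
        let leftover = fetch(batch).await;
        queue.extend(leftover);
    }
}

#[tokio::main]
async fn main() {
    // Toy usage: "fetch" just prints the batch and reports nothing left over.
    run_queue((0u32..10).collect(), 4, |batch: Vec<u32>| async move {
        println!("fetching batch {batch:?}");
        Vec::<u32>::new()
    })
    .await;
}
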
pedrobergamini pushed a commit to pedrobergamini/ethrex that referenced this pull request Aug 24, 2025
pedrobergamini pushed a commit to pedrobergamini/ethrex that referenced this pull request Aug 24, 2025