
Add trie cache warmup #7556

Merged
alexggh merged 81 commits into master from AndreiEres/trie-warmup
May 28, 2025
Conversation

@AndreiEres
Contributor

@AndreiEres AndreiEres commented Feb 12, 2025

Description

Fixes #7533.

Part of launching smart contracts on AssetHub. The current task involves warming up the state and performing sanity checks to ensure that memory usage remains within reasonable limits; otherwise, it should abort gracefully.

Here is a breakdown of the task by parts:

  • Configuration. A CLI flag triggers the warm-up. To increase the cache size, we use an existing flag.
  • Warm-up. We populate the trie cache with values from the database, using the best block hash to resolve the storage root. For normal use we spawn a background task for it; for testing purposes we warm up in a blocking manner.
  • Sanity checks. If we find that the configured cache size is too large, we warn.

Variants of the added flag:

| Flag | Meaning |
|---------------------------------|---------------------------------------------|
| [no flag] | no warm-up |
| `--warm-up-trie-cache` | non-blocking warm-up |
| `--warm-up-trie-cache non-blocking` | non-blocking warm-up (same as the bare flag) |
| `--warm-up-trie-cache blocking` | blocking warm-up |
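The tri-state flag above could be modeled roughly as follows. This is only an illustrative sketch: the names `WarmupStrategy` and `parse_warmup` are hypothetical, and the real CLI type in the PR is built on the SDK's argument-parsing infrastructure.

```rust
/// Hypothetical model of the warm-up flag's three variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WarmupStrategy {
    NonBlocking,
    Blocking,
}

/// `None` models the bare `--warm-up-trie-cache`; an explicit value
/// selects the mode. Absence of the flag entirely (no warm-up) would be
/// represented one level up, before this function is called.
fn parse_warmup(arg: Option<&str>) -> Result<WarmupStrategy, String> {
    match arg {
        None | Some("non-blocking") => Ok(WarmupStrategy::NonBlocking),
        Some("blocking") => Ok(WarmupStrategy::Blocking),
        Some(other) => Err(format!("unknown warm-up mode: {other}")),
    }
}

fn main() {
    assert_eq!(parse_warmup(None), Ok(WarmupStrategy::NonBlocking));
    assert_eq!(parse_warmup(Some("blocking")), Ok(WarmupStrategy::Blocking));
    assert!(parse_warmup(Some("sometimes")).is_err());
    println!("ok");
}
```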

Integration

Tested on a snapshot from polkadot-asset-hub: 5426064 keys, ~900 MiB state. Populating the trie cache takes 1.5–2 minutes on a MacBook Pro M2.

Additional Notes

Initially, we thought about putting the entire state into memory using a CLI flag or a runtime parameter, but decided against it. It is up to collators to decide how to set up their machines; they can use the current cache size setting to increase throughput.

Todos

Testing moved to #8644

@AndreiEres AndreiEres added the T0-node This PR/Issue is related to the topic “node”. label Feb 19, 2025
@alexggh alexggh self-requested a review February 28, 2025 09:32
@AndreiEres
Contributor Author

/cmd prdoc --bump patch --audience node_dev

github-merge-queue bot pushed a commit that referenced this pull request Mar 3, 2025
# Description

Working with #7556 I
encountered an internal compiler error on polkadot-omni-node-lib: see
below or [in CI](https://github.com/paritytech/polkadot-sdk/actions/runs/13521547633/job/37781894640).

```
error: internal compiler error: compiler/rustc_traits/src/codegen.rs:45:13: Encountered error `SignatureMismatch(SignatureMismatchData { found_trait_ref: <{closure@sp_state_machine::trie_backend_essence::TrieBackendEssence<sc_service::Arc<dyn sp_state_machine::trie_backend_essence::Storage<<<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hashing>>, <<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hashing, sp_trie::cache::LocalTrieCache<<<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hashing>, sp_trie::recorder::Recorder<<<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hashing>>::storage::{closure#1}} as std::ops::FnOnce<(std::option::Option<&mut dyn trie_db::TrieRecorder<sp_core::H256>>, std::option::Option<&mut dyn trie_db::TrieCache<sp_trie::node_codec::NodeCodec<<<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hashing>>>)>>, expected_trait_ref: <{closure@sp_state_machine::trie_backend_essence::TrieBackendEssence<sc_service::Arc<dyn sp_state_machine::trie_backend_essence::Storage<<<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hashing>>, <<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hashing, sp_trie::cache::LocalTrieCache<<<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hashing>, sp_trie::recorder::Recorder<<<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hashing>>::storage::{closure#1}} as std::ops::FnOnce<(std::option::Option<&mut dyn 
trie_db::TrieRecorder<<<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hash>>, std::option::Option<&mut dyn trie_db::TrieCache<sp_trie::node_codec::NodeCodec<<<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hashing>>>)>>, terr: Sorts(ExpectedFound { expected: Alias(Projection, AliasTy { args: [Alias(Projection, AliasTy { args: [Alias(Projection, AliasTy { args: [NodeSpec/#0], def_id: DefId(0:410 ~ polkadot_omni_node_lib[7cce]::common::spec::BaseNodeSpec::Block), .. })], def_id: DefId(0:507 ~ polkadot_omni_node_lib[7cce]::common::NodeBlock::BoundedHeader), .. })], def_id: DefId(229:1706 ~ sp_runtime[5da1]::traits::Header::Hash), .. }), found: sp_core::H256 }) })` selecting `<{closure@sp_state_machine::trie_backend_essence::TrieBackendEssence<sc_service::Arc<dyn sp_state_machine::trie_backend_essence::Storage<<<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hashing>>, <<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hashing, sp_trie::cache::LocalTrieCache<<<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hashing>, sp_trie::recorder::Recorder<<<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hashing>>::storage::{closure#1}} as std::ops::FnOnce<(std::option::Option<&mut dyn trie_db::TrieRecorder<<<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hash>>, std::option::Option<&mut dyn trie_db::TrieCache<sp_trie::node_codec::NodeCodec<<<<NodeSpec as common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader as sp_runtime::traits::Header>::Hashing>>>)>>` during codegen
```

Trying to parse the error I found that TrieRecorder was not supposed to
work with H256:
- Expected: `&mut dyn TrieRecorder<<<<NodeSpec as
common::spec::BaseNodeSpec>::Block as common::NodeBlock>::BoundedHeader
as Header>::Hash>>`
- Found: `&mut dyn TrieRecorder<sp_core::H256>>`

The error happened because I added to
`new_full_parts_with_genesis_builder` interaction with Trie Cache which
eventually uses `TrieRecorder<H256>`. Here is the path:
- In polkadot-omni-node-lib trait BaseNodeSpec defined with Block as
`NodeBlock: BlockT<Hash = DbHash>`, where DbHash is H256.
- BaseNodeSpec calls [new_full_parts_record_import::<Self::Block, …
>](https://github.com/paritytech/polkadot-sdk/blob/75726c65d40d3631c441b946a26b2f0b30d40c26/cumulus/polkadot-omni-node/lib/src/common/spec.rs#L184-L189)
and eventually it goes to
[new_full_parts_with_genesis_builder](https://github.com/paritytech/polkadot-sdk/blob/08b302464a6ccaacf391516ff7b058fe07b5cca3/substrate/client/service/src/builder.rs#L195).
- In `new_full_parts_with_genesis_builder` we accessed storage,
instantiating a TrieRecorder with H256, which had never happened before.

I believe the compiler found a mismatch while checking the TrieRecorder
types: NodeBlock inherits from the trait `Block<Hash = DbHash>`, but it
uses BoundedHeader, which inherits from the trait Header with the
default Hash type.

---------

Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Member

@bkchr bkchr left a comment


Looks good, mainly left nitpicks.

use sc_client_api::backend::Backend;
use sp_state_machine::Backend as StateBackend;

let state = || backend.state_at(storage_root, TrieCacheContext::Untrusted);
Member

Suggested change
let state = || backend.state_at(storage_root, TrieCacheContext::Untrusted);
let state = || backend.state_at(storage_root, TrieCacheContext::Trusted);

Or why is this untrusted? Also why are we not keeping the state alive the entire time? So, directly write back to the global cache?

Contributor

Trusted is meant to be used when we are in a bounded context; otherwise it might balloon beyond the configured shared_trie_cache size.

So, using untrusted here and destroying the state after every key access is the right call, because that will end up populating the shared trie cache.
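The write-back behavior discussed here can be illustrated with a toy model (this is not the real sp-trie API; `SharedCache` and `LocalCache` are invented for the sketch): a per-access local cache stages entries and merges them into the shared cache when it is dropped, which is why creating and destroying a fresh state per key still populates the shared trie cache.

```rust
use std::collections::HashMap;

// Toy stand-in for the node-wide shared trie cache.
struct SharedCache {
    entries: HashMap<String, Vec<u8>>,
}

// Toy stand-in for a short-lived local cache tied to one state instance.
struct LocalCache<'a> {
    shared: &'a mut SharedCache,
    staged: HashMap<String, Vec<u8>>,
}

impl<'a> LocalCache<'a> {
    fn read(&mut self, key: &str) -> Vec<u8> {
        // Pretend this value came from the database.
        let value = vec![0u8; 4];
        self.staged.insert(key.to_string(), value.clone());
        value
    }
}

impl<'a> Drop for LocalCache<'a> {
    fn drop(&mut self) {
        // On destruction, staged entries are written back to the shared cache.
        for (k, v) in self.staged.drain() {
            self.shared.entries.insert(k, v);
        }
    }
}

fn main() {
    let mut shared = SharedCache { entries: HashMap::new() };
    for key in ["alice", "bob"] {
        // Fresh local cache per key, dropped right after the access.
        let mut local = LocalCache { shared: &mut shared, staged: HashMap::new() };
        local.read(key);
    }
    assert_eq!(shared.entries.len(), 2);
    println!("ok");
}
```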

Member

But untrusted means that we don't cache large values like `:code` (which is already bigger than 2 MiB in most cases).

Contributor

Really good point. Then I would go with Untrusted for the iterator, because it stays alive for the entire duration of the warm-up, and Trusted for each key access, since we use a different instance for each iteration.

Member

Yeah fine

@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/15250050789
Failed job name: build-test-collators

@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/15250050775
Failed job name: test-linux-stable

Member

@bkchr bkchr left a comment


Besides this, it looks good to me.

@alexggh alexggh added this pull request to the merge queue May 28, 2025
Merged via the queue into master with commit 10cde1f May 28, 2025
182 checks passed
@alexggh alexggh deleted the AndreiEres/trie-warmup branch May 28, 2025 08:01
github-merge-queue bot pushed a commit that referenced this pull request May 29, 2025
This
#7556 (comment)
made me think that we can have some misuses of the trusted cache, so
it needs some reasonable bounds on its size.

Since there is no point in letting it grow past the shared cache limits
(the items will be discarded anyway when we propagate them back to the
shared cache), I decided to bound it to those limits.

---------

Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
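The bounding described in the commit above can be sketched as follows. This is a toy model (the type and size accounting here are invented); the real implementation lives in sp-trie's cache module and uses its own accounting.

```rust
// Toy sketch: a local cache that refuses to stage more bytes than the
// shared cache could ever hold, since anything beyond that budget would
// be discarded on write-back anyway.
struct BoundedLocalCache {
    limit_bytes: usize, // mirror of the shared cache's configured limit
    used_bytes: usize,
    entries: Vec<(String, Vec<u8>)>,
}

impl BoundedLocalCache {
    fn new(shared_limit_bytes: usize) -> Self {
        Self { limit_bytes: shared_limit_bytes, used_bytes: 0, entries: Vec::new() }
    }

    /// Stage a value; returns false (and drops the value) once the
    /// shared-cache-sized budget is exhausted.
    fn insert(&mut self, key: &str, value: Vec<u8>) -> bool {
        let size = key.len() + value.len();
        if self.used_bytes + size > self.limit_bytes {
            return false;
        }
        self.used_bytes += size;
        self.entries.push((key.to_string(), value));
        true
    }
}

fn main() {
    let mut cache = BoundedLocalCache::new(16);
    assert!(cache.insert("a", vec![0u8; 7]));  // 8 bytes used
    assert!(cache.insert("b", vec![0u8; 7]));  // 16 bytes used
    assert!(!cache.insert("c", vec![0u8; 7])); // over budget, rejected
    println!("ok");
}
```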
pgherveou pushed a commit that referenced this pull request Jun 11, 2025
pgherveou pushed a commit that referenced this pull request Jun 11, 2025
alvicsam pushed a commit that referenced this pull request Oct 17, 2025
alvicsam pushed a commit that referenced this pull request Oct 17, 2025
snowmead added a commit to Moonsong-Labs/storage-hub that referenced this pull request Mar 17, 2026
- Replace `<T as pallet::Config>::RuntimeEvent` with
  `<T as frame_system::Config>::RuntimeEvent` in all benchmark files,
  since RuntimeEvent was removed from pallet Config traits
  (paritytech/polkadot-sdk#7229)
- Add missing 5th argument (SharedTrieCache) to storage benchmark
  `cmd.run()` calls in node/src/command.rs
  (paritytech/polkadot-sdk#7556)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

Warmup SharedTrieCache at startup
