Conversation

@igor56D commented Oct 28, 2025

No description provided.

@igor56D changed the title from "Replace fixed minimum balance per byte constant with dynamic minimum deposit" to "SIMD-0389: Replace fixed minimum balance per byte constant with dynamic minimum deposit" on Oct 28, 2025
@brooksprumo self-requested a review on October 28, 2025 16:30
@igor56D force-pushed the allocation-controller branch from 4ac4a08 to 27f4aaf on November 3, 2025 18:44
@igor56D changed the title from "SIMD-0389: Replace fixed minimum balance per byte constant with dynamic minimum deposit" to "SIMD-0389: Reduce Account Creation Cost and Introduce Supervisory Controller" on Nov 3, 2025
@igor56D force-pushed the allocation-controller branch from 27f4aaf to 17ea6f4 on November 3, 2025 19:00
Comment on lines 33 to 34
than a single fixed constant, enabling better behavior targeting. With an
average on-chain state growth currently around ~200 MB per epoch, targeting 1
Contributor

> With an average on-chain state growth currently around ~200 MB per epoch,

Where is the 200 MB number from?

I plotted all my account state updates from Discord, and see higher numbers. Not a ton higher, at least recently. Around the beginning of the year, growth was much higher.

Here's the change in on-chain account data size each week, averaged per day. This is just the data field of each account summed up, so it does not include any other fields (e.g. owner, lamports, etc.).

[Image: weekly change in on-chain account data size, averaged per day]

Here's a more comprehensive look: the change in account storage file size each week, again averaged per day. This is the actual size Agave uses to store accounts. It includes some storage overhead, but it doesn't include any per-account costs elsewhere in the system (e.g. in the accounts index).

[Image: weekly change in account storage file size, averaged per day]

Author

I came up with the 200 MB number by eyeballing recent metrics before you posted this data. I'll update it to be more accurate and to state specifically what the number measures.

Comment on lines 72 to 74
- The integral accumulator `I` MUST be tracked in fork-aware bank state.
- Each slot measures realized state growth `G_slot` (bytes of new account data
allocated minus freed/deallocated data).
Contributor

What is "account data" here, specifically? Is it just the data field of Account? Or does it include any account metadata?

Note that this tracking must be deterministic cluster-wide, which doesn't exist today. We have approximations for on-chain account data size, and number of accounts, but work would need to be done here first.

Author

@igor56D Nov 21, 2025

Good point, I haven't updated this since putting together the PoC implementation. AFAIK we do deterministically track new state growth in the cost tracker (to enforce limits) and in the bank's `accounts_data_size_delta_on_chain` field, but neither turned out to be sufficient for this use case, so I added a new bank field to track what the controller is actually interested in: net state growth.

I'll update the proposal to be more specific:

  1. The state contribution of each account is `acc.data.size() + solana_rent::ACCOUNT_STORAGE_OVERHEAD`, i.e. the same way we compute it for min_balance checks.
  2. An account with a post-exec balance of 0 and a non-zero pre-exec balance is considered deleted, so we subtract its contribution from the net state growth.
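
A minimal Rust sketch of the two rules above, not the PoC's actual code. `AccountSnapshot`, `state_contribution`, and `net_state_growth` are hypothetical names. The constant is redefined locally as 128 bytes (assumed to match `solana_rent::ACCOUNT_STORAGE_OVERHEAD`), and treating a zero-lamport pre-exec account as contributing nothing (that is, as newly created) is an assumption.

```rust
/// Per-account storage overhead in bytes. Redefined locally for the sketch;
/// 128 is assumed to match `solana_rent::ACCOUNT_STORAGE_OVERHEAD`.
const ACCOUNT_STORAGE_OVERHEAD: u64 = 128;

/// Hypothetical view of one account before or after transaction execution.
#[derive(Clone, Copy, Debug)]
struct AccountSnapshot {
    lamports: u64,
    data_len: u64,
}

/// Bytes of state attributed to an account: data length plus fixed overhead.
/// Assumption: a zero-lamport account does not exist and contributes nothing.
fn state_contribution(acc: &AccountSnapshot) -> i64 {
    if acc.lamports == 0 {
        0
    } else {
        (acc.data_len + ACCOUNT_STORAGE_OVERHEAD) as i64
    }
}

/// Net state growth caused by one account across execution.
/// Positive for creation or growth, negative for deletion or shrinkage.
fn net_state_growth(pre: &AccountSnapshot, post: &AccountSnapshot) -> i64 {
    state_contribution(post) - state_contribution(pre)
}
```

Under these assumptions, creating a 100-byte account adds 228 bytes, and draining that account to a zero balance subtracts the same 228 bytes.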

Contributor

Gotcha, makes sense (w.r.t. data size + rent storage overhead).

I was the one that added the Bank::accounts_data_size fields. The hard part is on startup to get the deterministic number of accounts and accounts_data_size. The delta per bank during replay is (should be?) easier, and more or less done.

Contributor

This part is probably my main concern about shipping fast. Is a controller nice? Sure. Is it necessary? I don't know. I'm currently leaning more towards no than yes, given a 90% decrease. Maybe it becomes more useful with larger decreases.

I am confident that we could do new rent = old rent / 10 and have that ready in v4.0. The signals I'm getting say that shipping fast is a priority. How fast is fast enough? I don't know. Can we get an impl of this SIMD into v4.0? I'm not sure! Luckily you've already been working on it, so you may be in a better position to comment.
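
For scale, a worked Rust sketch of what "new rent = old rent / 10" would mean for the minimum balance. The constants are assumptions based on the commonly cited default rent parameters (3,480 lamports per byte-year, a 2-year exemption threshold, 128 bytes of per-account overhead); `minimum_balance` and `reduced_minimum_balance` are hypothetical helpers, not Agave APIs.

```rust
/// Assumed default rent parameters; not read from any live cluster.
const LAMPORTS_PER_BYTE_YEAR: u64 = 3_480;
const EXEMPTION_THRESHOLD_YEARS: u64 = 2;
const ACCOUNT_STORAGE_OVERHEAD: u64 = 128;

/// Rent-exempt minimum balance in lamports for an account with `data_len` bytes of data.
fn minimum_balance(data_len: u64) -> u64 {
    (ACCOUNT_STORAGE_OVERHEAD + data_len) * LAMPORTS_PER_BYTE_YEAR * EXEMPTION_THRESHOLD_YEARS
}

/// Minimum balance after dividing the current value by `divisor` (10 in the comment above).
fn reduced_minimum_balance(data_len: u64, divisor: u64) -> u64 {
    minimum_balance(data_len) / divisor
}
```

With these assumed parameters, a zero-data account goes from 890,880 to 89,088 lamports, and a 165-byte account goes from 2,039,280 to 203,928 lamports.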

Author

> I am confident that we could do new rent = old rent / 10 and have that ready in v4.0

The rollout for that proposal will most likely be incremental, not a direct 10x reduction, and also include a waiting period between reduction stages to observe effects. From my PoV, most of the core part of the implementation for this SIMD is already done and I don't see why it can't be shipped quickly if everyone is aligned.

> The hard part is on startup to get the deterministic number of accounts and accounts_data_size

I was assuming we can persist the fields we want by adding them to BankFieldsToSerialize or ExtraFieldsToSerialize. I'm not very familiar with what happens during startup to initialize the current bank, but it doesn't sound too difficult to add an extra field. Am I missing something?

Contributor

> The rollout for that proposal will most likely be incremental, not a direct 10x reduction

Why?

> most of the core part of the implementation for this SIMD is already done and I don't see why it can't be shipped quickly if everyone is aligned.

Getting the accounts data size to be deterministic needs to be done first. IMO it's an unknown, so I don't feel confident this can ship quickly.

Luckily the accounts data size part can be worked on immediately/doesn't need to wait for SIMD approval.

> I was assuming we can persist the fields we want by adding them to BankFieldsToSerialize or ExtraFieldsToSerialize. I'm not very familiar with what happens during startup to initialize the current bank, but it doesn't sound too difficult to add an extra field. Am I missing something?

The field is already in the snapshot, that's not the hard part. The hard part is keeping it deterministic across the cluster. The last time I checked (which was like 1-2 years ago), it still wasn't.

Author

Makes sense. I'll look into this.

Author

Pushed an update (e2ef30d). Determinism is achieved by backing up the necessary bank fields into dedicated sysvars, which are subject to the snapshot integrity check through the accounts lthash.

Author

Two deltas are tracked by the bank:

  1. `accounts_data_size_delta`: tracks only the change in the data portion of the overall account state. It is calculated as the sum of the `resize_delta` (already tracked deterministically by the runtime) across all transactions in a slot. Zero-balance accounts have their data size subtracted separately post-execution, because the `resize_delta` doesn't account for them.
  2. `accounts_num_delta`: tracks the change in the number of accounts. Determinism is fairly easy to achieve here with pre- and post-exec balance checks.
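
A rough Rust sketch of how the two deltas described above could be accumulated over a slot. This is an illustration under stated assumptions, not the code in e2ef30d: `SlotDeltas`, `AccountChange`, `TxOutcome`, and `accumulate_slot_deltas` are hypothetical names, and subtracting the post-exec data length of a drained account is an assumption about what "subtracted separately" means.

```rust
/// Hypothetical per-slot accumulator for the two bank deltas.
#[derive(Debug, Default, PartialEq)]
struct SlotDeltas {
    accounts_data_size_delta: i64,
    accounts_num_delta: i64,
}

/// Hypothetical pre/post-execution view of one account touched by a transaction.
struct AccountChange {
    pre_lamports: u64,
    post_lamports: u64,
    post_data_len: u64,
}

/// Hypothetical summary of one executed transaction.
struct TxOutcome {
    /// Sum of data resizes performed during execution (assumed already tracked by the runtime).
    resize_delta: i64,
    accounts: Vec<AccountChange>,
}

fn accumulate_slot_deltas(txs: &[TxOutcome]) -> SlotDeltas {
    let mut deltas = SlotDeltas::default();
    for tx in txs {
        deltas.accounts_data_size_delta += tx.resize_delta;
        for acc in &tx.accounts {
            match (acc.pre_lamports == 0, acc.post_lamports == 0) {
                // Zero balance before, funded after: a new account.
                (true, false) => deltas.accounts_num_delta += 1,
                // Funded before, zero balance after: a deleted account. Its remaining
                // data is not reflected in resize_delta, so subtract it here.
                (false, true) => {
                    deltas.accounts_num_delta -= 1;
                    deltas.accounts_data_size_delta -= acc.post_data_len as i64;
                }
                _ => {}
            }
        }
    }
    deltas
}
```

For example, a slot with one transaction that creates a 100-byte account and another that drains an 80-byte account without resizing it yields a data size delta of +20 and an account count delta of 0.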

@brooksprumo self-requested a review on December 5, 2025 20:07