SIMD-0389: Reduce Account Creation Cost and Introduce Supervisory Controller #389
than a single fixed constant, enabling better behavior targeting. With an
average on-chain state growth currently around ~200 MB per epoch, targeting 1
With an average on-chain state growth currently around ~200 MB per epoch,
Where is the 200 MB number from?
I plotted all my account state updates from Discord and see higher numbers. Not a ton higher, at least recently. Around the beginning of the year, growth was much higher.
Here's the change in on-chain account data size each week, averaged per day. This is just the data field of each account summed up, so it does not include any other fields (e.g. owner, lamports, etc.).

Here's a more comprehensive look. It's the change of account storage file size each week, again averaged per day. This is the actual size Agave uses to store accounts. This has some storage overhead though. And it doesn't include any per account costs elsewhere in the system (e.g. in the accounts index).

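To compare weekly charts like these against the proposal's per-epoch figure, a small conversion helps. This is an illustrative sketch only: the weekly total-size samples are hypothetical inputs, and the per-epoch conversion assumes the nominal 432,000-slot epoch at the 400 ms target slot time (exactly two days). If real slot times run longer than the target, the true per-epoch figure is correspondingly higher.

```rust
/// Nominal epoch length, assumed: 432,000 slots at a 400 ms target slot time.
const SLOTS_PER_EPOCH: u64 = 432_000;
const TARGET_SLOT_MS: u64 = 400;
const MS_PER_DAY: u64 = 86_400_000;

/// Given total account data size sampled once per week (bytes),
/// return the average growth per day for each week (may be negative).
fn avg_daily_growth(weekly_sizes: &[u64]) -> Vec<f64> {
    weekly_sizes
        .windows(2)
        .map(|w| (w[1] as f64 - w[0] as f64) / 7.0)
        .collect()
}

/// Convert a per-day growth rate into a per-epoch rate,
/// assuming every slot hits the 400 ms target.
fn per_epoch(per_day: f64) -> f64 {
    let epoch_days = (SLOTS_PER_EPOCH * TARGET_SLOT_MS) as f64 / MS_PER_DAY as f64;
    per_day * epoch_days
}
```

Under that assumption, ~100 MB/day of net growth corresponds to roughly 200 MB per epoch.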
I came up with the 200MB number by just eyeballing more recent metrics before you posted this data. I'll update it to be more accurate and specific about what the number is actually measuring.
- The integral accumulator `I` MUST be tracked in fork-aware bank state.
- Each slot measures realized state growth `G_slot` (bytes of new account data
  allocated minus freed/deallocated data).
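One way to read "fork-aware" here: the accumulator is a value owned by each bank, and a child bank derives its copy from its parent's, so sibling forks never see each other's updates. The sketch below assumes the error term is `G_slot` minus a per-slot target and adds an anti-windup clamp; the type name, the target, and the clamp limit are hypothetical placeholders, not values from the proposal.

```rust
/// Hypothetical per-bank controller state. Each bank owns its own copy,
/// so state on one fork is never visible to a sibling fork.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ControllerState {
    /// Integral accumulator `I`: running sum of (G_slot - target), in bytes.
    integral: i128,
}

impl ControllerState {
    /// Produce the child bank's state from the parent's state and the
    /// child slot's realized growth `g_slot` (bytes, may be negative).
    /// `target_per_slot` and `limit` are illustrative assumptions.
    fn child(&self, g_slot: i64, target_per_slot: i64, limit: i128) -> Self {
        let error = g_slot as i128 - target_per_slot as i128;
        // Clamp so the integral term cannot wind up without bound.
        let integral = (self.integral + error).clamp(-limit, limit);
        ControllerState { integral }
    }
}
```

Because `child` takes `&self` and returns a new value, two forks built on the same parent each start from the parent's `I`, which is the property fork-aware tracking needs.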
What is "account data" here, specifically? Is it just the data field of Account? Or does it include any account metadata?
Note that this tracking must be deterministic cluster-wide, which doesn't exist today. We have approximations for on-chain account data size, and number of accounts, but work would need to be done here first.
Good point, I haven't updated this since putting together the PoC implementation. AFAIK we do deterministically track new state growth in the cost tracker (to enforce limits) and in the bank's accounts_data_size_delta_on_chain field, but neither turned out to be sufficient for this use case, so I added a new bank field to track what the controller is actually interested in (net state growth).
I'll update the proposal to be more specific:
- the state contribution of each account is acc.data.size() + solana_rent::ACCOUNT_STORAGE_OVERHEAD, i.e. the same way we do it for min_balance checks.
- an account with a post-exec balance of 0 and a non-zero pre-exec balance is considered deleted, so we subtract its contribution from the net state growth.
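A minimal sketch of the per-account accounting in the two bullets above. `AccountChange` is a hypothetical stand-in for whatever the runtime exposes; treating a zero-lamport account as contributing nothing is an assumption that covers both newly created accounts (pre-exec balance 0) and deleted ones (post-exec balance 0).

```rust
/// Same value as `solana_rent::ACCOUNT_STORAGE_OVERHEAD`.
const ACCOUNT_STORAGE_OVERHEAD: u64 = 128;

/// Hypothetical pre/post-execution view of one account.
struct AccountChange {
    pre_lamports: u64,
    pre_data_len: u64,
    post_lamports: u64,
    post_data_len: u64,
}

/// State contribution of an account: zero if it holds no lamports
/// (i.e. it does not exist), else data length plus storage overhead.
fn contribution(lamports: u64, data_len: u64) -> i64 {
    if lamports == 0 {
        0
    } else {
        (data_len + ACCOUNT_STORAGE_OVERHEAD) as i64
    }
}

/// Net state growth: sum of post-exec minus pre-exec contributions.
/// An account whose balance drops to zero loses its whole contribution.
fn net_state_growth(changes: &[AccountChange]) -> i64 {
    changes
        .iter()
        .map(|c| {
            contribution(c.post_lamports, c.post_data_len)
                - contribution(c.pre_lamports, c.pre_data_len)
        })
        .sum()
}
```

For example, creating a 100-byte account adds 228 bytes, deleting a 50-byte account removes 178, and growing an existing account from 100 to 150 bytes adds 50, for a net of 100.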
Gotcha, makes sense (w.r.t. data size + rent storage overhead).
I was the one that added the Bank::accounts_data_size fields. The hard part is on startup to get the deterministic number of accounts and accounts_data_size. The delta per bank during replay is (should be?) easier, and more or less done.
And this part is probably the main thing that gives me concern about shipping fast. Is a controller nice? Sure. Is it necessary? I don't know; I'm currently leaning more towards no than yes, given a 90% decrease. Maybe it becomes more useful with larger decreases.
I am confident that we could do new rent = old rent / 10 and have that ready in v4.0. The signals I'm getting say that shipping fast is a priority. How fast is fast enough? I don't know. Can we get an impl of this SIMD into v4.0? I'm not sure! Luckily you've already been working on it, so you may be in a better position to comment.
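For a sense of scale of "new rent = old rent / 10": assuming today's default rent parameters (3,480 lamports per byte-year and a two-year exemption threshold, i.e. 6,960 lamports per byte), the rent-exempt minimum uses the same data-length-plus-overhead formula as above. The integer helper below is a simplification, not `solana_rent::Rent::minimum_balance` itself, and dividing `lamports_per_byte_year` by 10 is just one way to express the 10x cut.

```rust
/// Same value as `solana_rent::ACCOUNT_STORAGE_OVERHEAD`.
const ACCOUNT_STORAGE_OVERHEAD: u64 = 128;
/// Current default rent parameters (assumed here, not read from the Rent sysvar).
const DEFAULT_LAMPORTS_PER_BYTE_YEAR: u64 = 3_480;
const EXEMPTION_THRESHOLD_YEARS: u64 = 2;

/// Rent-exempt minimum balance in lamports for an account with `data_len`
/// bytes of data, using integer math for a whole-year exemption threshold.
fn min_balance(data_len: u64, lamports_per_byte_year: u64) -> u64 {
    (data_len + ACCOUNT_STORAGE_OVERHEAD) * lamports_per_byte_year * EXEMPTION_THRESHOLD_YEARS
}
```

For a 165-byte account (the size of an SPL Token account) that is 2,039,280 lamports today versus 203,928 after a 10x cut; an empty account drops from 890,880 to 89,088.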
I am confident that we could do new rent = old rent / 10 and have that ready in v4.0
The rollout for that proposal will most likely be incremental, not a direct 10x reduction, and also include a waiting period between reduction stages to observe effects. From my PoV, most of the core part of the implementation for this SIMD is already done and I don't see why it can't be shipped quickly if everyone is aligned.
The hard part is on startup to get the deterministic number of accounts and accounts_data_size
I was assuming we can persist the fields we want by adding them to BankFieldsToSerialize or ExtraFieldsToSerialize. I'm not very familiar with what happens during startup to initialize the current bank, but it doesn't sound too difficult to add an extra field. Am I missing something?
The rollout for that proposal will most likely be incremental, not a direct 10x reduction
Why?
most of the core part of the implementation for this SIMD is already done and I don't see why it can't be shipped quickly if everyone is aligned.
Getting the accounts data size to be deterministic needs to be done first. IMO it's an unknown, so I don't feel confident this can ship quickly.
Luckily the accounts data size part can be worked on immediately/doesn't need to wait for SIMD approval.
I was assuming we can persist the fields we want by adding them to BankFieldsToSerialize or ExtraFieldsToSerialize. I'm not very familiar with what happens during startup to initialize the current bank, but it doesn't sound too difficult to add an extra field. Am I missing something?
The field is already in the snapshot, that's not the hard part. The hard part is keeping it deterministic across the cluster. The last time I checked (which was like 1-2 years ago), it still wasn't.
Makes sense. I'll look into this
Pushed an update (e2ef30d). Determinism is achieved by backing up the necessary bank fields into dedicated sysvars, which are subject to the snapshot integrity check through the accounts lthash.
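Roughly, the idea is that the authoritative copy of these counters lives in account data, so any divergence surfaces as an accounts lthash mismatch instead of going unnoticed. The sketch below shows only the serialization side: a hypothetical fixed-layout payload (`StateGrowthSysvar`, two little-endian `u64` fields) written to account bytes and restored from them on startup. The actual sysvar names, fields, and encoding in e2ef30d may differ.

```rust
/// Byte length of the hypothetical sysvar payload.
const SYSVAR_LEN: usize = 16;

/// Hypothetical sysvar payload backing up bank counters; layout is illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
struct StateGrowthSysvar {
    accounts_data_size: u64,
    accounts_num: u64,
}

impl StateGrowthSysvar {
    /// Serialize to the bytes stored in the sysvar account's data.
    fn to_bytes(&self) -> [u8; SYSVAR_LEN] {
        let mut buf = [0u8; SYSVAR_LEN];
        buf[0..8].copy_from_slice(&self.accounts_data_size.to_le_bytes());
        buf[8..16].copy_from_slice(&self.accounts_num.to_le_bytes());
        buf
    }

    /// Restore bank fields from sysvar account data, e.g. after loading a snapshot.
    fn from_bytes(data: &[u8]) -> Option<Self> {
        if data.len() != SYSVAR_LEN {
            return None;
        }
        Some(Self {
            accounts_data_size: read_u64_le(&data[0..8]),
            accounts_num: read_u64_le(&data[8..16]),
        })
    }
}

/// Decode exactly 8 little-endian bytes into a u64.
fn read_u64_le(bytes: &[u8]) -> u64 {
    let mut arr = [0u8; 8];
    arr.copy_from_slice(bytes);
    u64::from_le_bytes(arr)
}
```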
Two deltas are tracked by the bank:
- accounts_data_size_delta: only tracks the change in the actual data portion of the overall account state. It is calculated as the sum of the resize_delta (already tracked deterministically by the runtime) across all transactions in a slot. Zero-balance accounts have their data size subtracted separately post-execution because the resize_delta doesn't account for them.
- accounts_num_delta: tracks the change in the number of accounts. Determinism is pretty easy to achieve here with pre- and post-exec balance checks.
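A minimal sketch of the two per-slot deltas described above. `TouchedAccount` and `SlotDeltas` are hypothetical; the sketch assumes `pre_lamports` is the balance at the start of the slot, `post_lamports`/`post_data_len` are the values at the end of the slot, and `resize_deltas` holds each transaction's `resize_delta` as reported by the runtime.

```rust
/// Hypothetical slot-level view of one account touched in the slot.
struct TouchedAccount {
    pre_lamports: u64,
    post_lamports: u64,
    post_data_len: u64,
}

#[derive(Debug, PartialEq)]
struct SlotDeltas {
    accounts_data_size_delta: i64,
    accounts_num_delta: i64,
}

fn slot_deltas(resize_deltas: &[i64], touched: &[TouchedAccount]) -> SlotDeltas {
    // Data growth/shrinkage reported by the runtime for every transaction.
    let mut accounts_data_size_delta: i64 = resize_deltas.iter().sum();
    let mut accounts_num_delta: i64 = 0;
    for acc in touched {
        if acc.post_lamports == 0 {
            // resize_delta does not cover zero-balance accounts, so drop
            // their remaining data separately (assumes post_data_len is the
            // data the account still held at the end of the slot).
            accounts_data_size_delta -= acc.post_data_len as i64;
        }
        match (acc.pre_lamports == 0, acc.post_lamports == 0) {
            (true, false) => accounts_num_delta += 1, // created
            (false, true) => accounts_num_delta -= 1, // deleted
            _ => {}
        }
    }
    SlotDeltas { accounts_data_size_delta, accounts_num_delta }
}
```

An account created, resized, and drained within the same slot nets out to zero on both counters, since its positive resize_delta is cancelled by the zero-balance subtraction and it never counts as created.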