# SIMD-0341: v0 Account Compression (#341)

---
simd: '0341'
title: v0 Account Compression
authors:
  - Igor Durovic (anza)
category: Standard
type: Core
status: Idea
created: 2025-08-21
feature: (fill in with feature key and github tracking issues once accepted)
---

## Summary

This proposal introduces a protocol-level account compression system,
including a new system program that handles compression and decompression
requests. The goal is to significantly reduce the active account state and
snapshot size by removing qualifying accounts in a way that allows them to be
recovered later.

## Motivation

Solana's current account model requires all account data to be stored in full
on-chain and replicated on all validators indefinitely, leading to significant
storage costs and blockchain state bloat. Rent/storage cost is already a
significant complaint among app developers, and without reducing the state
size or its growth, rent cannot be safely lowered and optimizations like a
fully in-memory account state are infeasible in the medium to long term. To
solve this in an enduring way, we need:

1. an economic mechanism to limit state growth
2. a predicate to determine which accounts can be compressed
3. a compression scheme that removes accounts from the global state while
   allowing for safe and simple recovery

This proposal focuses on (3), leaving (1) and (2) for other SIMDs.

## New Terminology

- **Compression**: replacing arbitrary account data with a fixed-size
  commitment. Not to be confused with traditional compression; the data
  cannot be directly recovered from the compressed state.

> **Review (Contributor):** "Compression" is overloaded, especially in this
> space (see: compressed NFTs), and as mentioned here, account compression
> isn't analogous to traditional compression. Using a completely new term
> might be clearer, maybe "compacting"? Just a suggestion, feel free to
> ignore.
>
> **Contributor:** I'd also love to not call it compression. I've been in
> favor of freeze/frozen/thaw/thawed personally.
>
> **Author:** I also don't like the name, but figured it had enough history
> that renaming would introduce more confusion.
>
> **Contributor:** Some alternative naming suggestions: "stubbed" accounts,
> "digest" accounts, "proxy" accounts.
>
> **Contributor:** Anything but "compression"! Frozen/thawed sounds better.

- **Compression Condition**: a predicate determining whether or not an
  existing account can be compressed. A specific compression condition isn't
  provided in this SIMD -- it is assumed to always be false, meaning no
  account is eligible for compression.
- `data_hash = lthash.out(Account)`, where Account includes the pubkey,
  lamports, data, owner, executable, and rent_epoch fields. See
  [SIMD-0215](https://github.com/solana-foundation/solana-improvement-documents/blob/main/proposals/0215-accounts-lattice-hash.md)
  for the definition of the lattice hash functions.
- **Decompression**: the process of restoring a compressed account to its
  original state by providing the original account data and verifying that it
  matches the stored `data_hash`.
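
To make the `data_hash` definition concrete, the following is a minimal
sketch of one way it could be computed. It assumes the per-account input
encoding used by SIMD-0215 (lamports, data, executable flag, owner, pubkey;
`rent_epoch` is omitted there, though it is listed above) and that
`lthash.out` denotes the plain 32-byte BLAKE3 output. The field ordering and
the 2048-byte-to-32-byte reduction are assumptions, not normative parts of
this proposal.

```rust
use blake3::Hasher;

/// Illustrative only: one possible computation of a compressed account's
/// 32-byte `data_hash`. Field ordering and the 32-byte truncation are
/// assumptions pending the final specification (see SIMD-0215).
fn compute_data_hash(
    pubkey: &[u8; 32],
    lamports: u64,
    data: &[u8],
    owner: &[u8; 32],
    executable: bool,
) -> [u8; 32] {
    let mut hasher = Hasher::new();
    hasher.update(&lamports.to_le_bytes());
    hasher.update(data);
    hasher.update(&[executable as u8]);
    hasher.update(owner);
    hasher.update(pubkey);
    // BLAKE3's default 32-byte output equals the first 32 bytes of its XOF
    // output, so this is consistent with truncating the 2048-byte lattice
    // hash element.
    *hasher.finalize().as_bytes()
}
```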

## Detailed Design

### Syscalls for Compression Operations

The following new syscalls will be introduced to support account compression
operations. The compression system program will support two instructions that
simply wrap these syscalls. For the time being, these syscalls can only be
used from the compression system program, but that constraint may be relaxed
in the future.
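
As a rough illustration of how the compression system program might expose
these two syscalls, a possible instruction layout is sketched below; the enum
name, variants, and field encodings are hypothetical and not part of this
proposal.

```rust
/// Hypothetical instruction set for the compression system program; names
/// and encodings are illustrative only.
pub enum CompressionProgramInstruction {
    /// Wraps `sol_compress_account` for the account passed to the instruction.
    Compress,
    /// Wraps `sol_decompress_account`, carrying the original account fields
    /// needed to recompute and verify the stored `data_hash`.
    Decompress {
        lamports: u64,
        data: Vec<u8>,
        owner: [u8; 32],
        executable: bool,
    },
}
```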

> **Review:** What is the reason for doing compression/decompression in
> syscalls, instead of adding two new system program instructions? Is there
> an advantage to these operations being syscalls rather than native program
> instructions?
>
> **Review:** Similarly, why a new program? Compression/decompression feels
> like it fits within the system program's purview.

#### `sol_compress_account(account_pubkey: &[u8; 32])`

**Purpose**: Compresses an existing account by replacing its data with a
cryptographic commitment.

**Parameters**:

- `account_pubkey`: pubkey of the account to be compressed

**Behavior**:

> **Review (Contributor):** What authorization is required to
> compress/decompress an account?
>
> **Author:** That depends on the "compression condition", which is
> intentionally set to always be false in this SIMD.

- MUST verify that the caller is the hardcoded compression system program

> **Review (Contributor):** Could you specify whether this check happens at
> program deploy-time or run-time?

- MUST verify the provided account satisfies the compression condition
- MUST mark the account as compressed if verification succeeds
- if the transaction containing the compression request succeeds, all
  subsequent attempts to access the account MUST fail unless the account has
  been decompressed
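
The following is a minimal sketch of these checks in pseudo-Rust; the
`InvokeContext`-style runtime interface, helper functions, and error variants
are hypothetical placeholders, not a definitive implementation.

```rust
/// Illustrative only: the checks `sol_compress_account` performs, written
/// against a hypothetical runtime interface.
fn compress_account(ctx: &mut InvokeContext, account_pubkey: &[u8; 32]) -> Result<(), SyscallError> {
    // Only the hardcoded compression system program may invoke this syscall.
    if ctx.caller_program_id() != COMPRESSION_SYSTEM_PROGRAM_ID {
        return Err(SyscallError::UnauthorizedCaller);
    }
    // The compression condition is always false in this SIMD; a later SIMD
    // will define the real predicate.
    let account = ctx.load_account(account_pubkey)?;
    if !compression_condition(&account) {
        return Err(SyscallError::AccountNotEligible);
    }
    // Mark the account as compressed; if the transaction succeeds, subsequent
    // accesses fail until the account is decompressed.
    ctx.mark_account_compressed(account_pubkey)
}
```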

While marking the account as compressed must be done synchronously, the
actual compression (i.e. the full replacement of the account with its
compressed form in the account database) can be done asynchronously for
performance:

- compute the 32-byte `data_hash`
- replace the account in the account database with a compressed account entry
- (optional) emit a compression event for off-chain data archival

> **Review:** I'm not sure what this paragraph is meant to address, or if I'm
> missing something. This is an implementation detail for the backing
> database that is irrelevant to this SIMD. A validator could never compress
> the account in the database (this may be useful for an RPC provider) as
> long as they calculate the lt_hash correctly.
>
> **Author:** I agree, this is bordering on an implementation detail. I'll
> remove it to avoid confusion.

#### `sol_decompress_account(..)`

**Purpose**: Recovers a compressed account by restoring its original data.

> **Review (Contributor):** I wonder if this syscall interface should be
> changed to improve user UX. … I feel like we could combine some of these
> steps and skip the syscall copy.
>
> **Author:** Semantically, the compression program itself can delete the
> dummy account and transfer rent/balance/metadata to the new account. In
> that case, the copy vs. move becomes an implementation detail. Is that sort
> of what you have in mind?
>
> **Author:** Rather, we can make it so the decompression syscall does the
> delete. If the program does it, then move vs. copy isn't an implementation
> detail.
>
> **Contributor:** @igor56D I feel like it would be very tricky to implement
> an account move in a syscall handler, since all the surrounding code
> (transaction executor, VM) assumes that accounts are pinned in memory.
> EDIT: it would probably be easier if the move happens in the
> post-transaction cleanup after the program has finished executing.
>
> **Author:** Good call, I'm being pretty hand-wavy because I'm not as
> familiar with these low-level details. Conceptually it seems feasible; I'll
> need to do some digging into the execution env to formulate something more
> concrete.

**Parameters**:

- `account_pubkey`: 32-byte public key of the account to decompress
- `lamports`: The lamport balance of the original account

> **Review:** What will happen to the lamports when the account is
> compressed, from a global tracking perspective? Maybe they get transferred
> to some kind of system account?

- `data`: Pointer to the original account data bytes

> **Review:** What if the original account data is larger than the max
> transaction size? There probably needs to be a mechanism for buffering the
> account data somewhere before decompressing it.
>
> **Contributor:** Yeah, we should do something akin to program deployment.
>
> **Review:** Agreed with the buffering, but thinking out loud: would it
> actually be beneficial to limit the size of compressed accounts to the max
> transaction size? If an account is bigger than the tx size limit, maybe it
> should live uncompressed on-chain to prevent tpu/tx bloat?
>
> **Author:** This shouldn't materially impact tx bloat. If someone wants a
> large account with specific data on-chain, they need to upload the data in
> parts across multiple transactions. The same applies to a large account
> getting decompressed. When an account is decompressed, it persists on-chain
> like any other active account unless it gets compressed again, so
> subsequent accesses don't need to re-upload.

- `data_len`: Length of the account data in bytes
- `owner`: 32-byte public key of the account owner
- `executable`: Whether the account is executable
- `rent_epoch`: The rent epoch of the original account

> **Review (Contributor):** Syscalls unfortunately can only take 5 arguments
> (iirc).
>
> **Contributor:** I'm pretty sure that `rent_epoch` is no longer needed, so
> it's fine to remove from this syscall.
>
> **Contributor:** Yep, rent epoch was removed with the Accounts Lt Hash in
> SIMD-215. Let's remove it here as well.

**Behavior**:

- MUST verify the caller is the hardcoded compression system program
- MUST verify the account is currently in compressed state
- MUST compute `lthash.out(Account)` from the provided parameters and verify
  it matches the stored compressed account's `data_hash`
- if verification succeeds, the original account MUST be restored to the
  active set
  - this is treated exactly like a new account allocation, so rent
    requirements, load limits, etc. all apply
- if verification succeeds, the compressed account entry MUST be replaced
  with the full account data
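
A sketch of the verification path implied by this list appears below. It
reuses the illustrative `compute_data_hash` from the New Terminology section;
the runtime interface and error variants are hypothetical placeholders.

```rust
/// Illustrative only: verification flow for `sol_decompress_account`.
fn decompress_account(
    ctx: &mut InvokeContext,
    account_pubkey: &[u8; 32],
    lamports: u64,
    data: &[u8],
    owner: &[u8; 32],
    executable: bool,
) -> Result<(), SyscallError> {
    if ctx.caller_program_id() != COMPRESSION_SYSTEM_PROGRAM_ID {
        return Err(SyscallError::UnauthorizedCaller);
    }
    // The target must currently exist as a compressed entry.
    let compressed = ctx.load_compressed_account(account_pubkey)?;
    // Recompute the hash from the caller-supplied fields and compare it to
    // the stored commitment.
    let recomputed = compute_data_hash(account_pubkey, lamports, data, owner, executable);
    if recomputed != compressed.data_hash {
        return Err(SyscallError::DataHashMismatch);
    }
    // Restoration is treated like a new allocation: rent requirements,
    // load limits, etc. all apply.
    ctx.restore_account(account_pubkey, lamports, data, owner, executable)
}
```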

### Database and Snapshot Extensions

#### Compressed Accounts Storage

Compressed accounts are stored directly in the account database like regular
accounts, but with a special compressed account structure:

> **Review (Contributor):** Seems like implementation details? We may or may
> not do exactly this. I agree we need the account's pubkey and `data_hash`.

```rust
pub struct CompressedAccount {
    pub pubkey: Pubkey,
    pub data_hash: [u8; 32],
}
```

> **Review (Contributor):** Please note that the "lattice hash" is 2048 bytes
> long, so the spec should mention how to go from the 2048-byte hash to a
> 32-byte hash. The lattice hash is simply BLAKE3 with extended output (XOF).
> We can simply take the first 32 bytes here, since BLAKE3 has the convenient
> property that, for a constant input, the low bytes of the hash output don't
> change as the XOF output length parameter is increased.
>
> **Contributor:** (What we should NOT do is something like ….)
>
> **Author:** I was using … — I'm confused by this, but I assumed it was what
> you meant by 32-byte mode. I'll change the doc to explicitly use ….
>
> **Contributor:** Haha, yeah, we don't have definitions of these blake3
> pseudocode functions.
>
> **Contributor:** Oops 🙃, I do. But that's only used for logging purposes
> and can be changed. We should do the right thing for compression stuff.

#### Bank Hash Integration

The bank hash calculation is updated to handle compressed accounts:

- **Existing behavior**: All accounts continue to contribute to the bank hash
  via the accounts lattice hash
- **New behavior**: Compressed accounts contribute to the lattice hash using
  their compressed representation instead of full account data
- **Hash calculation**: No changes to the overall bank hash structure, but
  the lattice hash computation includes compressed accounts:

```
lthash(account: CompressedAccount) :=
  lthash.init()
  lthash.append( account.pubkey )
  lthash.append( account.data_hash )
  return lthash.fini()
```
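
For clarity, the pseudocode above could be rendered roughly as follows in
Rust, assuming (as in SIMD-0215) that a lattice hash element is the 2048-byte
BLAKE3 XOF output over the appended inputs; the helper name is illustrative.

```rust
use blake3::Hasher;

/// Illustrative only: the lattice hash element contributed by a compressed
/// account, following the pseudocode above.
fn lthash_compressed_account(pubkey: &[u8; 32], data_hash: &[u8; 32]) -> [u8; 2048] {
    let mut hasher = Hasher::new();
    hasher.update(pubkey);
    hasher.update(data_hash);
    // Expand to the 2048-byte lattice hash element via BLAKE3's XOF.
    let mut out = [0u8; 2048];
    hasher.finalize_xof().fill(&mut out);
    out
}
```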

### Account Creation Validation

When creating new accounts, the runtime MUST verify that the target pubkey
does not already exist as a compressed account, just as it does for
uncompressed accounts.

#### Execution Error

> **Review:** What happens if a user attempts to access a compressed account
> in the VM? Is this an execution error? I think it's simpler if it's
> required that all accounts an instruction accesses (including through
> nested CPI calls) have already been decompressed before the VM invocation.
>
> **Author:** During VM execution, attempts to access a compressed account
> are equivalent to attempts to access a non-existing account, and return the
> same error.

If an attempt is made to create an account at a pubkey that already exists as
a compressed account, the transaction MUST fail with the `AccountAlreadyInUse`
system error.

This maintains consistency with existing Solana behavior, where any attempt to
create an account at an occupied address fails with the same error, regardless
of whether the existing account is active or compressed.
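
A minimal sketch of this creation-time check, assuming a hypothetical
accounts-database lookup that can return either an active or a compressed
entry (`SystemError::AccountAlreadyInUse` is the existing system program
error; everything else is illustrative):

```rust
/// Hypothetical shape of what the accounts database returns for a pubkey.
enum DbEntry {
    /// A regular, active account.
    Active,
    /// A 64-byte compressed entry (pubkey + data_hash).
    Compressed,
}

/// Illustrative only: address-in-use check during account creation.
fn check_address_free(existing: Option<DbEntry>) -> Result<(), SystemError> {
    match existing {
        None => Ok(()),
        // Active or compressed: either way the address is occupied.
        Some(_) => Err(SystemError::AccountAlreadyInUse),
    }
}
```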

It may be worthwhile to introduce a new system error specific to collisions
with compressed accounts, if that would be more clearly actionable for users
and developers.

### Off-chain Data Storage

Since compressed accounts only store the data hash on-chain, the original
account data must be stored off-chain for recovery purposes. This system
provides multiple mechanisms for data availability:

#### RPC Provider Storage

RPC providers can maintain archives of compressed account data to support
client applications. When an account is compressed, the original data is made
available through RPC endpoints for future recovery operations.

#### Account Subscription for Compression Events

The existing `accountSubscribe` RPC endpoint will be extended to notify
subscribers when accounts are compressed. This provides real-time access to
compression events through the established subscription mechanism.

When an account is compressed, subscribers will receive:

```typescript
interface AccountNotification {
  // ... existing fields
  result: {
    context: {
      slot: number;
    };
    value: CompressedAccountInfo | ActiveAccountInfo;
    originalAccount?: { // Only included during compression events
      pubkey: string;
      lamports: number;
      data: Uint8Array;
      owner: string;
      executable: boolean;
      rentEpoch: number;
    };
  };
}
```

**Critical Implementation Detail**: Validators MUST NOT delete the full
account data until all `accountSubscribe` subscribers have been notified of
the compression event. This ensures that off-chain services have the
opportunity to archive the original account data before it is permanently
removed.

This approach enables:

- **Archive services**: Third-party services can maintain comprehensive
  compressed data archives using existing subscription infrastructure
- **Application-specific storage**: DApps can store their own compressed
  account data through established patterns
- **Redundancy**: Multiple parties can maintain copies for data availability

> **Review:** Could we end up in situations where: … What would you
> expect/want to happen in these scenarios? Can redundancy be truly
> guaranteed, and can it be fairly distributed to all validators (and quickly
> enough for a slot, globally)? Is the expectation that a Solana project
> would ensure that it does a good job at storing its data off-chain? Let's
> say there's some trading happening which requires the decompression of an
> account. If that project turned their archive service off, and validators
> can't access that data, could that risk assets becoming un-tradeable? This
> feels like we could risk things becoming centralised. Let me know if I've
> misunderstood something.
>
> **Author:** There is a theoretical possibility that compressed account data
> is completely lost, though it's extremely low. Even if every indexer, RPC
> provider, wallet provider, etc. somehow loses the data (or refuses to
> release it), you can always fall back to replaying the ledger history from
> some snapshot before the account was compressed. The protocol provides the
> guarantee that you can always recover a compressed account if you can
> provide the original data. Users/apps must decide for themselves how strong
> the data availability guarantee must be for their use case, and plan
> accordingly. The best guarantee comes from running your own full node
> that's synced with the network. Others may prefer to subscribe to a stream
> of compression events from an RPC provider and store the compressed data
> themselves. Less sophisticated users are likely to rely on wallet backups
> or indexers to keep the data available. In any case, the complexity is
> moved out of the protocol and into off-chain infra. In reality, most
> full-node indexers will have this data stored, though this isn't backed by
> clearcut incentives for now. The worst-case scenario is needing to replay
> the ledger. There's also the option of storing the data in some distributed
> storage protocol like Filecoin or IPFS. The availability concern is purely
> theoretical until the compressed state size grows dramatically, in which
> case we'll need to consider a more complete data availability system.
>
> **Commenter:** Guarantees of future availability are essential; this
> doesn't go far enough. Replaying the ledger is not as easy as it sounds
> (according to people who have done it and told me it was tough). Relying on
> indexers is a weak economic assumption: there's not much incentive to store
> terabytes of cold data in a separate data store for the outside chance that
> someone will pay $0.000025 to read it. Keep in mind the accounts are
> primarily cold because people aren't using/reading them. However, some of
> the discussion further above suggests a solution that comes closer to a
> guarantee of future availability, with a data store that will always exist
> and does not require replay. The process for thawing a frozen account,
> which is similar to deploying a program using a buffer account, might also
> be used to freeze the account: write the account state to the ledger in one
> or more transactions before freezing it. A parting question: since
> validators are not connected to the ledger history, does that mean the
> ledger history is technically an "off-chain database"? (I'll see myself
> out.) ;-)
>
> **Commenter:** So first of all, a Solana "user" is assumed to have a full
> node, or to be paying someone else to run a full node directly or
> indirectly. There is no way for the protocol to make any guarantees about
> user data without this assumption. The best any protocol can do is ensure
> that if the user's one full node stays consistent, then the user is not at
> risk of any loss. If a user has access to a full node, they can always
> replicate just the part of the compressed state that user cares about. We
> need tools to automate compressed state recovery from the ledger. If you
> want the weak assumption that there exists at least one full node in the
> network that will keep the user's state, that user just needs to pay enough
> to keep it from being compressed. But this is also a weak economic
> incentive.
>
> **Commenter:** The elegance of storing the frozen account state in the
> ledger is that the DB storage & retrieval tools already exist:
> getSignaturesForAddress() + getTransaction(), or an Old Faithful gRPC
> stream, will return the account state without a specialized SQL index or
> API. The new tooling will be related to the freeze/thaw processes. No
> existing tooling can offer a guarantee of completeness or future
> availability; for example, streaming accounts through Geyser to a database
> won't work, because there's no guarantee that messages will be received or
> that the database will be complete, and losing a single message could have
> catastrophic consequences for the account holder. I think all of the points
> about the tricky implementation are valid; nonetheless, the guarantee is
> more important than the implementation challenge. Saving data in the ledger
> checks the boxes for storage & retrieval guarantees, universal
> availability, and economic alignment for the infrastructure providers.
>
> **Author:** I can't figure out a way to make this work well enough. Even if
> we introduce the complexity mentioned above, compression requests will
> still be prohibitively expensive. It's very important that compression is
> cheap -- the network needs to be able to promptly and sufficiently respond
> to growing state/rent prices by evicting delinquent state, which may
> require a lot of compression requests. Requiring all the compressed state
> to flow through the blocks will be a challenge for throughput. You
> mentioned that replaying the ledger is difficult now; would addressing that
> problem make more sense? Ledger history can tell you the slot the account
> was compressed at, so you'll know how far to replay. Not sure how much we
> can rely on historical snapshot retention, but reliable replay + snapshot
> fetching should make account reconstruction relatively doable. If that
> doesn't seem sufficient, I'd prefer @aeyakovenko's previous idea of
> maintaining a separate compressed-state snapshot that's replicated on all
> validators over making compression really difficult.
>
> **Commenter:** This also works, and was my original suggestion at some
> earlier meetups: hot & cold account sets with separate snapshots & hashes.
> NVMe storage is affordable, and drive sizes keep getting bigger. JWash
> mentioned a concern about validator start times when the validator needs to
> validate two account DBs; that could be a more solvable problem and provide
> the guarantees. P.S. Validator startup times will be slower, but all good
> validators run hot spares, so there's minimal downtime during failover.
> Slow startup times are manageable.

### RPC Handling for Compressed Accounts

Existing RPC endpoints must be updated to handle compressed accounts properly.
When a client requests account information for a compressed account, the
response should clearly indicate the compression status and provide the
available data.

#### Updated `getAccountInfo` Response

The existing `getAccountInfo` endpoint will return modified responses for
compressed accounts:

```typescript
interface CompressedAccountInfo {
  compressed: true;
  pubkey: string;
  dataHash: string; // 32-byte hex-encoded lattice hash
}

interface ActiveAccountInfo {
  compressed: false;
  lamports: number;
  data: [string, string]; // existing format [data, encoding]
  owner: string;
  executable: boolean;
  rentEpoch: number;
}

type AccountInfo = CompressedAccountInfo | ActiveAccountInfo;
```

#### (optional) New `getCompressedAccountData` Endpoint

A new RPC endpoint specifically for retrieving compressed account data. This
is optional, as it requires retaining data that isn't relevant to core
validator operations.

**Endpoint**: `getCompressedAccountData`
**Method**: POST
**Parameters**:

- `pubkey`: string - Account public key
- `commitment?`: Commitment level
- `dataHash?`: string - Optional data hash for verification

**Response**:

```typescript
interface CompressedAccountDataResponse {
  pubkey: string;
  dataHash: string;
  originalAccount: {
    pubkey: string;
    lamports: number;
    data: Uint8Array;
    owner: string;
    executable: boolean;
    rentEpoch: number;
  } | null; // null if data not available
}
```

### Performance Considerations

- **Recovery operations**: will require a disk read, so CU cost should be set
  accordingly
- **Off-chain storage**: Applications and RPC providers need sufficient
  storage capacity for compressed account archives
- **RPC performance**: Compressed account queries may require additional
  archive lookups, potentially increasing response times

## Alternatives Considered

### Fixed-size vector commitments

All compressed data is moved off-chain and replaced with a fixed-size vector
commitment. Membership/non-membership proofs are used for account creation,
compression, and decompression.

**Pros**: Minimal on-chain storage

**Cons**: Complex proof generation, proof availability concerns, overall
implementation complexity

The next iteration of account compression will likely look similar to this,
but the complexity isn't currently necessary.

### Reduce and fix hot-set size with a chili peppers-like approach

**Pros**: keeps all data on-chain; no need for users to manually recover old
accounts

**Cons**: doesn't reduce the total state size, so snapshots remain large and
rent remains high

Chili peppers has other applications but may not be necessary if account
compression can reduce the global state size sufficiently to store the
entire account state in memory.

### Conclusion

The proposed hash-based approach provides the optimal balance of storage
savings, performance predictability, and implementation complexity.

## Impact

### DApp and Wallet Developers

> **Review (Contributor):** If an app or wallet developer wanted to view the
> data of an account, would this mean that they would have to simulate a
> transaction with a decompress call to …?
>
> **Author:** If the account is compressed, the data isn't available
> on-chain. If that account needs to be used, it must be decompressed, which
> requires providing the data. The off-chain source of that data can vary, as
> mentioned in a previous comment.

- include checks for account compression status in the program interaction
  workflow
- add instructions to transactions for account recovery when appropriate
- additional regular programs can be deployed to wrap CPI calls to the
  compression system program to improve UX. For example, a decompression
  request program could allow users to submit accounts they would like
  decompressed, along with a tip to incentivize others to fulfill the request.
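
As a purely hypothetical illustration of the last point, such a wrapper
program's interface might look like the following; none of these names or
fields are part of this proposal.

```rust
/// Hypothetical instruction set for a decompression-request wrapper program.
pub enum DecompressionRequestInstruction {
    /// A user escrows `tip_lamports` against a compressed account they want
    /// restored; anyone holding the original data can later fulfill it.
    Request { target: [u8; 32], tip_lamports: u64 },
    /// A fulfiller supplies the original account fields; the program CPIs
    /// into the compression system program to decompress the account and
    /// then pays out the escrowed tip.
    Fulfill {
        target: [u8; 32],
        lamports: u64,
        data: Vec<u8>,
        owner: [u8; 32],
        executable: bool,
    },
}
```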

### Validators

- **Memory/Storage savings**: If enough accounts are compressed, a
  significant reduction in disk/memory usage is expected
- **Network impact**: Reduced snapshot sizes improve sync times

> **Review (Contributor):** What is the expected storage savings for a
> compressed account of X bytes?
>
> **Author:** Compressed accounts will take up 64 bytes of space (pubkey and
> data hash), so the space savings will be roughly X - 64 bytes.

### Core Contributors

- **Implementation scope**: Changes required in runtime, banking, and
  snapshot systems
- **Testing requirements**: Comprehensive testing of compression/recovery
  cycles
- **Monitoring needs**: New metrics for compression performance, compressed
  state size, rate, etc.

## Security Considerations

### Data Integrity

- **Hash verification**: All recovery operations verify data integrity via
  lattice hash comparison
- **Atomic operations**: Compression/recovery operations are atomic to
  ensure consistent state across the cluster

### Attack Vectors

- **Hash collision attacks**: Collisions on the `data_hash` would allow
  introducing arbitrary data into the account state. The lattice hash
  function provides sufficient collision resistance.

## Backwards Compatibility

This feature introduces breaking changes:

### Bank Hash Changes

- **Impact**: Bank hash calculation includes compressed accounts, affecting
  consensus
- **Mitigation**: Feature gate activation ensures all validators adopt
  simultaneously

### Snapshot Format

- **Impact**: New snapshot format including compressed accounts
- **Mitigation**: Version-aware snapshot loading with backward compatibility
  for old snapshots

> **Review (Contributor):** I don't think we need (nor want) to change the
> snapshot format. We do need to modify the account storage file format,
> though.

### Account Creation Behavior

- **Impact**: Account creation fails if the pubkey already exists as a
  compressed account
- **Mitigation**: If the account corresponding to the pubkey was previously
  compressed, it must be recovered rather than recreated. RPCs and good
  errors can provide the relevant info.

> **Review:** Worthwhile to include lowering rent cost in this list too? It's
> a main motivator in the above paragraph. (Will be a separate SIMD, of
> course.)