Rough prototype for architectural changes needed to introduce execution proofs #7755
base: unstable
Conversation
523cfca to a283796
Resolve merge conflicts and update codebase with latest changes from unstable:
- Maintain execution proof network imports in publish_blocks.rs
- Update peer manager to use new unified pruning logic
- Add ExecutionProof subnet handling in peer subnet info
a283796 to 4ba3974
This PR adds the scaffolding needed to have zk stateless clients. Most of these changes have been integrated here: sigp/lighthouse#7755
- The cryptography part has been stubbed out/implemented in an insecure way; it may be a non-trivial effort to have zkEVM proofs properly working even for tests, because they generally need GPUs to create.
- This is the variant of execution proofs that does not require a fork upgrade.
**Proposal: Add zkEVM working group**

This proposal outlines the rationale for creating a **zkEVM working group** within the Protocol Guild. This working group will maintain a focus on foundational protocol-level work that benefits the broader ecosystem, rather than specific product or client development.

This PR acknowledges that Ethereum's ["snarkification" roadmap](https://blog.ethereum.org/2025/07/31/lean-ethereum) is still in its early stages and will likely evolve over several years. However, the launch of a zkEVM Attester Client is anticipated in the short-to-medium term:

* [Shipping an L1 zkEVM #1: Realtime Proving](https://blog.ethereum.org/2025/07/10/realtime-proving)
* [Protocol Update 001 – Scale L1](https://blog.ethereum.org/2025/08/05/protocol-update-001)
* [Making Sense of a ZK Staking Node](https://paragraph.com/@ethstaker/making-sense-of-a-zk-staking-node)

Therefore, it is appropriate to include this initiative in Protocol Guild's "Wayfinding" category, which is defined as "the exploratory process to surface, describe and validate potential protocol changes". Similar long-term explorations would be post-quantum crypto or the verkle/stateless efforts. This PR also adds Kev, who has already been engaging in this work, to this new category.

**Initial Eligible Scope**

* Includes [prototyping work](sigp/lighthouse#7755) for a zkEVM attester client. The client will follow a [draft specification](http://github.com/ethereum/consensus-specs/pull/4591) that integrates both consensus and execution verification into a single binary.
* May include guest program benchmarking, guest program security, zkEVM coordination, zkEVM proof verification, zkEVM specifications, and guest program compilation.

**Exclusions**

["Lean Consensus" client teams](https://leanroadmap.org) are not covered by this proposal.
Wouldn't this imply the enshrining of the verifiers?
Yes, that's a good point -- so it can't be done with the initial (optional) execution proofs feature.
// TODO: Add timing validation based on slot
// TODO: Add duplicate proof detection
if O::observe() {
    // TODO: Add duplicate proof detection
Related to this, one assumption also being made is that the block will arrive before the proofs. This assumption is reasonably sound given the time it takes to create proofs; however, I can see it being broken in the following cases:
- The slot time is decreased (implying the execution payload gas limit is lowered) which would mean that proofs are created faster
- A node is not well-connected and does not receive the block until after the proof due to network lag. This may happen more frequently for empty blocks.
One solution here would be to store proofs for blocks up to currentSlot + N in the future, but we would need to also store the slot number in the execution proof message, because an unknown beacon root cannot tell you how far in the future it is.
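For illustration, a minimal sketch of that idea -- buffering early proofs keyed by (slot, block root) so entries can be bounded and pruned even before the block is known -- with all names and limits hypothetical:

```rust
// Not the PR's actual data structure; an illustrative buffer for proofs that
// arrive before their block. Keyed by slot so "too far in the future" can be
// enforced, which is only possible if the slot is carried in the proof message.
use std::collections::HashMap;

type BlockRoot = [u8; 32];

pub struct PendingProofCache<P> {
    /// Proofs indexed by (slot, beacon_block_root).
    proofs: HashMap<(u64, BlockRoot), Vec<P>>,
    /// Only keep proofs at most this many slots ahead of the current slot.
    max_future_slots: u64,
}

impl<P> PendingProofCache<P> {
    pub fn new(max_future_slots: u64) -> Self {
        Self { proofs: HashMap::new(), max_future_slots }
    }

    /// Buffer a proof that arrived before its block.
    pub fn insert(&mut self, slot: u64, root: BlockRoot, proof: P, current_slot: u64) {
        // Drop proofs too far in the future.
        if slot > current_slot + self.max_future_slots {
            return;
        }
        self.proofs.entry((slot, root)).or_default().push(proof);
    }

    /// Take any buffered proofs once the corresponding block arrives.
    pub fn take(&mut self, slot: u64, root: BlockRoot) -> Vec<P> {
        self.proofs.remove(&(slot, root)).unwrap_or_default()
    }
}
```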
optimistic_finalized_sync: true,
stateless_validation: false,
generate_execution_proofs: false,
max_execution_payload_proofs: 10_000,
What's the expected time between receiving a block and then receiving the proof? We had a similar debate for the cache that holds blocks without blobs, where initially we made it such that it could hold a very large number of blocks by storing them on disk. However, for simplicity we later changed it to hold only items in memory, since blobs are expected to come relatively fast. I'm not sure of the value of holding a block whose data arrives 1 day later. Similar reasoning may apply to this new cache for proofs.
Current proof creation deadlines are around 8-10 seconds
// Spawn the proof generation task in the background
// WARNING: No resource limits or task counting is performed here.
// TODO: Implement a task queue with concurrency limits and resource monitoring.
We could send a new task to the beacon processor, which would limit resource consumption. Do you expect this process to have heavy parallelization? Reconstruction is already breaking our current model a little, as it consumes too many threads @jimmygchen
I think in practice, the beacon chain will always make an API call here -- the proof generation process will require GPUs, so it is highly unlikely that the validator is on the same setup as a bunch of GPUs.
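As a sketch of the bounded alternative (not what this PR currently does), background proof-generation tasks could be throttled with a semaphore; the tokio dependency and the function names below are assumptions:

```rust
// Illustrative only: cap the number of concurrent proof-generation tasks with a
// semaphore instead of spawning unbounded background work. The stubs stand in
// for the real prover call (or external API) and for gossip publishing.
use std::sync::Arc;
use tokio::sync::Semaphore;

async fn generate_proof(payload: Vec<u8>) -> Vec<u8> {
    // Placeholder: real proof generation would call a GPU prover or remote service.
    payload
}

async fn publish_proof(_proof: Vec<u8>) {
    // Placeholder: publish over the execution proof gossip subnets.
}

async fn spawn_proof_generation(limiter: Arc<Semaphore>, payload: Vec<u8>) {
    // Waiting for a permit naturally throttles how many proofs run at once.
    let permit = limiter
        .clone()
        .acquire_owned()
        .await
        .expect("semaphore is never closed");
    tokio::spawn(async move {
        let proof = generate_proof(payload).await;
        publish_proof(proof).await;
        drop(permit); // free the slot once the proof has been published
    });
}
```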
// In a real implementation, this would be the time needed for zkVM local proof generation
// or communication with external proof generation services
use rand::{Rng, rng};
let delay_ms = rng().random_range(1000..=3000);
Should this be a flag, to allow testing the system with slower proof generation?
If you think so -- note that we will only be testing dummy proof generation here, because actual proof generation is too heavy (it needs GPUs).
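If we did add such a flag, a minimal sketch might look like the following; the config field and the flag name are hypothetical, not part of this PR:

```rust
// Sketch under the assumption that the simulated proving delay becomes
// configurable, so tests can exercise slower proof generation.
use rand::Rng;
use std::time::Duration;

pub struct ProofGenConfig {
    /// Simulated proving delay range in milliseconds; would be set from a
    /// hypothetical flag such as `--simulated-proof-delay-ms` (name made up).
    pub simulated_delay_ms: std::ops::RangeInclusive<u64>,
}

async fn simulate_proving_delay(config: &ProofGenConfig) {
    let delay_ms = rand::rng().random_range(config.simulated_delay_ms.clone());
    tokio::time::sleep(Duration::from_millis(delay_ms)).await;
}
```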
// TODO: For now, the node will generate proofs for all available subnets.
// TODO: In the future, they should be able to configure which proofs
// TODO: they can generate. Mainly for altruistic nodes that want to
// TODO: seed the network.
This should be a CLI argument where you choose the set of subnets
Should the user know what the subnets represent? i.e. subnet 0 might be the reth-risc0 subnet, or we just let them specify subnet 0 and the mapping is noted somewhere else.
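For example (the flag name and mapping below are purely hypothetical), the CLI value could be a comma-separated list of subnet ids, with the subnet-to-(zkVM, guest EL) mapping documented elsewhere so the user never has to think about it:

```rust
// Hypothetical sketch of parsing a `--execution-proof-subnets 0,2,5` style
// value into subnet ids. Neither the flag nor the mapping exists in this PR.
fn parse_proof_subnets(arg: &str) -> Result<Vec<u64>, String> {
    arg.split(',')
        .map(|s| {
            s.trim()
                .parse::<u64>()
                .map_err(|e| format!("invalid subnet id '{s}': {e}"))
        })
        .collect()
}

// The id-to-proof-system mapping could live in a fixed table, e.g.
// subnet 0 => (risc0, reth guest), subnet 1 => (sp1, reth guest), ...
```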
FullPayloadRef::Fulu(payload) => ExecutionPayload::Fulu(payload.execution_payload.clone()),
FullPayloadRef::Gloas(payload) => {
    ExecutionPayload::Gloas(payload.execution_payload.clone())
}
@michaelsproul is there a macro over the fields we can use here?
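Not an existing Lighthouse macro, but a sketch of the kind of declarative macro that could collapse these per-fork arms; the variant list and all names are assumptions:

```rust
// Hypothetical: map each fork variant of FullPayloadRef to the matching
// ExecutionPayload variant without writing every arm by hand.
macro_rules! payload_ref_to_execution_payload {
    ($payload_ref:expr, { $($fork:ident),+ $(,)? }) => {
        match $payload_ref {
            $(
                FullPayloadRef::$fork(payload) => {
                    ExecutionPayload::$fork(payload.execution_payload.clone())
                }
            )+
        }
    };
}

// Usage (hypothetical fork list):
// let payload = payload_ref_to_execution_payload!(payload_ref, {
//     Bellatrix, Capella, Deneb, Electra, Fulu, Gloas,
// });
```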
/// Validate an execution proof for gossip according to the rules defined in the consensus specs.
Should we verify the ZK proof in gossip validation? How fast do you expect it to be?
In the region of 100ms, but would need to benchmark this to give you concrete numbers for each zkEVM
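As a sketch only, assuming the ~100ms figure holds, the verifier could be called inline from gossip validation; `verify_zk_proof` below is a hypothetical stub and real numbers would need per-zkEVM benchmarks:

```rust
// Illustrative placement of proof verification inside gossip validation.
use std::time::Instant;

enum GossipOutcome {
    Accept,
    Reject,
}

fn validate_execution_proof_for_gossip(proof_bytes: &[u8]) -> GossipOutcome {
    let start = Instant::now();
    let valid = verify_zk_proof(proof_bytes);
    // If this routinely exceeds the gossip-time budget, verification should
    // move off the hot path (e.g. accept, then verify asynchronously).
    let _verification_time = start.elapsed();
    if valid {
        GossipOutcome::Accept
    } else {
        GossipOutcome::Reject
    }
}

fn verify_zk_proof(_proof_bytes: &[u8]) -> bool {
    // Placeholder: the PR stubs the cryptography out.
    true
}
```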
// Subscribe to all execution proof subnets when stateless validation is enabled
if opts.stateless_validation {
If all nodes subscribe to all execution proof subnets by default, why have separate subnets at all?
This was just for the initial prototype -- nodes can choose how many subnets they subscribe to based on their bandwidth and thoughts around zkVM/client diversity
Can you move this file to a hackmd and link the hackmd at the top of the PR description?
///
/// In reality, I do not think we will have 8, closer to 3, though this is still being
/// explored. This number could be larger if we consider combining different zkVMs with different guests.
pub const MAX_EXECUTION_PROOF_SUBNETS: u64 = 8;
If producers only subscribe and emit to a specific set of subnets they support, and consumers only read the set of incoming proofs they support, why do we need a max count of overall subnets?
Some notes from my latest review:
To rephrase this question: it is analogous to asking "are execution proofs fungible" -- i.e. should nodes care if they receive a reth-sp1 proof vs a geth-risc0 proof? If they do, then having separate subnets is useful and nodes can explicitly choose to receive one over the other; if they don't, then having one subnet and receiving all of the different proofs on that subnet makes more sense.
Note from Teku: when Teku sends a proof before the block, LH nodes will disconnect them, since we currently assume that the block comes before the proof.
TODO: Decrease the number of subnets to 1
Add issue to allow optional EL |
This seems to be working as expected, ie proofs that arrive before the block are put into PendingComponents |
This PR is not meant to be merged. It's mainly to discover all of the architectural changes needed
Notes: https://hackmd.io/@kevaundray/BJeZCo5Tgx
Issue Addressed
The addition of zkEVMs to Ethereum L1 allows the CL to no longer need an EL to verify execution payloads. Moreover, the execution payload content is replaced with a cryptographic proof whose size does not scale linearly with the size of the execution payload. We will still need the execution payload header.
The current idea for rolling out the changes for zkEVMs is to iteratively add the changes needed such that we can safely test them and uncover edge cases, blockers, etc. Safely here means that it should not affect the existing CL logic.
Proposed Changes
Proof generating nodes
Someone must create the execution proofs for the execution payloads. Long term, when proofs are mandatory, we believe this will be the builders, since they are incentivised to not have their blocks be re-orged for being invalid.
While we are in the interim stage where proofs are not mandatory, there is no incentive to create proofs. The idea here is that these proofs will be subsidised until then.
Proof generating nodes are the nodes that will generate these proofs in the protocol.
new_payload
Nodes can opt to be proof generating nodes, which means that whenever they receive a beacon block and need to verify the execution payload via engine_new_payload, they will also generate proofs and submit these proofs over the new proof subnets.
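A rough sketch of this flow, with hypothetical function names standing in for the PR's actual code:

```rust
// Illustrative only: after handing the payload to the EL via engine_newPayload,
// a proof-generating node also kicks off proof generation and gossips the result.
async fn on_new_payload(payload: Vec<u8>, generate_execution_proofs: bool) {
    // Normal path: the EL verifies the payload.
    verify_with_engine_new_payload(&payload).await;

    if generate_execution_proofs {
        // Opt-in path: also produce a proof and publish it on the proof subnets.
        let proof = generate_execution_proof(payload).await;
        publish_on_proof_subnets(proof).await;
    }
}

// Stubs standing in for the engine API call, the prover, and gossip publishing.
async fn verify_with_engine_new_payload(_payload: &[u8]) {}
async fn generate_execution_proof(payload: Vec<u8>) -> Vec<u8> { payload }
async fn publish_on_proof_subnets(_proof: Vec<u8>) {}
```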
get_payload
These nodes can also opt to generate proofs for blocks they have proposed. I believe we could just have proof generation only in new_payload and omit it in get_payload -- the idea behind not doing it this way was that if you're the proposer, you can start generating proofs as soon as you receive the payload from the EL.
Stateless attestor
new_payload
Stateless nodes do not have an EL attached (in practice) and will wait for proofs to be received on whichever proof subnets they have subscribed to, in order to validate the execution payload.
The way it has been implemented in this PR is that if the node is a stateless validator, it will mark all payloads as optimistic. Moreover, since fork choice is never modified, the node currently will always be in optimistic sync mode.
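A minimal sketch of that behaviour, with hypothetical names, assuming the decision reduces to a payload status enum:

```rust
// Illustrative: a stateless validator skips the engine call and treats every
// payload as optimistic, relying on proofs arriving later to confirm validity.
enum PayloadStatus {
    Valid,
    Invalid,
    Optimistic,
}

fn verify_payload(stateless_validation: bool, engine_says_valid: impl Fn() -> bool) -> PayloadStatus {
    if stateless_validation {
        // No EL attached: defer judgement, which is also why the node stays in
        // optimistic sync mode in this prototype.
        PayloadStatus::Optimistic
    } else if engine_says_valid() {
        PayloadStatus::Valid
    } else {
        PayloadStatus::Invalid
    }
}
```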
get_payload
Stateless nodes cannot locally build blocks because they have no EL state; they must use mev-boost. This PR does not add the mev-boost functionality, so stateless nodes cannot build blocks.
Proof generation
Currently proof generation has been stubbed out since it is not needed for running local testnets. The main feature we need here is deterministic output with respect to the execution payload.
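One way to get that deterministic output for the stub, assuming a plain SHA-256 over the SSZ-encoded payload bytes (not necessarily what the PR does, and obviously not secure):

```rust
// Illustrative deterministic stand-in for proof generation: the same payload
// bytes always yield the same "proof", which is what local testnets need.
use sha2::{Digest, Sha256};

fn dummy_execution_proof(payload_ssz_bytes: &[u8]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(payload_ssz_bytes);
    hasher.finalize().into()
}
```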
Proof chain
This code deliberately does not modify fork choice to incorporate proof verification. We instead have a structure that keeps track of which blocks have been proven, alongside the beacon chain. This means that one could run a stateless validator alongside mainnet/testnet.
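A minimal sketch of such a side structure, with hypothetical names:

```rust
// Illustrative only: record which block roots have a verified execution proof
// without touching fork choice, matching the "proof chain alongside the beacon
// chain" idea above.
use std::collections::HashSet;

type BlockRoot = [u8; 32];

#[derive(Default)]
pub struct ProofChain {
    proven: HashSet<BlockRoot>,
}

impl ProofChain {
    /// Record that a valid execution proof was seen for this block root.
    pub fn mark_proven(&mut self, root: BlockRoot) {
        self.proven.insert(root);
    }

    /// Fork choice is untouched; callers consult this separately.
    pub fn is_proven(&self, root: &BlockRoot) -> bool {
        self.proven.contains(root)
    }
}
```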
Additional Info
./scripts/local_testnet/start_local_testnet.sh