[PERSISTENCE] SavePoints and Rollbacks design document (Issue #493) #533
…s-rollbacks-design Signed-off-by: Alessandro De Blasis <[email protected]>
Olshansk
left a comment
I wanted to finish the first round of review with some thoughts/comments/ideas. You've identified a couple pieces that we can potentially start implementing right away, so let's continue the conversation.
Keep throwing questions & ideas so we can iterate on it.
Two things:
- I got a little tired toward the end so I apologize if the quality of the comments decreased.
- In a couple places, I didn't go back to edit my previous comments, so some things may feel redundant.
> ### Further improvements
>
> - Savepoints could be disseminated/retrieved using other networks like Tor/IPFS/etc., this would free-up bandwidth on the main network and could be used in conjunction with `FastSync` to speed up the process of bootstrapping a new node without overwhelming the `P2P` network with a lot of traffic that is not `Protocol`-related. This could be very important when the network reaches critical mass in terms of scale. Hopefully soon.
There was a problem hiding this comment.
I see, you really do look at savepoints as snapshots.
Do you see a node-specific savepoint (i.e. I updated one tree but not another and crashed) and a network-wide one (i.e. a data dir snapshot) as having the same solution?
There was a problem hiding this comment.
I think it can very much build on top of what I have in mind.
Similarly to what databases do, we'd need some data integrity check at startup that verifies that all the datastores are correct.
It can very much rely on the statehash I guess...
How?
KISS, back of the envelope implementation:
- retrieve the latest valid statehash (from file, from SuperValidators, from P2P, TBD)
- at startup check if it matches the current, computed state hash
- if not, rollback to latest savepoint and sync from there
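
A minimal Go sketch of the three steps above, just to make the flow concrete. Everything in it is an assumption for illustration: `Node`, `Savepoint`, `stateHash` and `CheckAndRecover` are hypothetical names, the whole state is modelled as a single byte slice, and the trusted hash is simply passed in (where it really comes from is still TBD).

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

// Savepoint is a hypothetical snapshot of the node's state at a given height.
type Savepoint struct {
	Height int64
	State  []byte
}

// Node is a hypothetical stand-in for a node; State represents all data stores.
type Node struct {
	Height     int64
	State      []byte
	Savepoints []Savepoint // ordered oldest to newest
}

// stateHash is a placeholder for the real state hash computation.
func stateHash(state []byte) []byte {
	sum := sha256.Sum256(state)
	return sum[:]
}

// CheckAndRecover compares the computed state hash with a trusted one.
// It returns true if the node had to roll back to its latest savepoint.
func (n *Node) CheckAndRecover(trusted []byte) (bool, error) {
	if bytes.Equal(stateHash(n.State), trusted) {
		return false, nil // state is intact, nothing to do
	}
	if len(n.Savepoints) == 0 {
		return false, errors.New("state hash mismatch and no savepoint to roll back to")
	}
	latest := n.Savepoints[len(n.Savepoints)-1]
	n.Height = latest.Height
	n.State = append([]byte(nil), latest.State...) // copy so the savepoint is not aliased
	return true, nil                               // caller resumes syncing from n.Height
}

func main() {
	n := &Node{
		Height:     12,
		State:      []byte("corrupted"),
		Savepoints: []Savepoint{{Height: 10, State: []byte("state@10")}},
	}
	// In this sketch the trusted hash is computed locally from the expected state.
	rolledBack, err := n.CheckAndRecover(stateHash([]byte("state@12")))
	fmt.Println(rolledBack, err, n.Height)
}
```

After a rollback the node would still need to sync from the savepoint height to the tip, which this sketch leaves to the caller.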
There was a problem hiding this comment.
> retrieve the latest valid statehash (from file, from SuperValidators, from P2P, TBD)

This is what "weak subjectivity" is. From Vitalik's 2014 article:

> Weak subjectivity is exactly the correct solution. It solves the long-range problems with proof of stake by relying on human-driven social information, but leaves to a consensus algorithm the role of increasing the speed of consensus from many weeks to twelve seconds and of allowing the use of highly complex rulesets and a large state.

> if not, rollback to latest savepoint and sync from there

> at startup check if it matches the current, computed state hash

I think there are implementation nuances here, but I think you've got it.
I think it's worth adding (in the moonshot section) that we can potentially use Mina to verify the snapshot hash using only the latest state and the genesis.json file
There was a problem hiding this comment.
> I think there are implementation nuances here, but I think you've got it.

Music to my ears.
I am going to read that article tonight. Thanks for sharing. 🙏

> I think it's worth adding (in the moonshot section) that we can potentially use Mina to verify the snapshot hash using only the latest state and the genesis.json file

Added
> For example: a fresh node could be looking for the latest `Savepoint` signed by PNI available, download it from Tor, apply its state and resume the normal sync process from the other nodes from there.
There was a problem hiding this comment.
Tor is great (and the right starting point) but long-term, try to also start thinking in terms of incentives: https://olshansky.medium.com/cryptocurrencies-its-all-about-incentive-77ac47a6adc4
- Is there another protocol that incentivizes sharing snapshots?
- Could/should we have another protocol actor for this?
- Is it part of Pocket or external?
For example, Gohkan and I were talking about this yesterday: https://olshansky.medium.com/cryptocurrencies-its-all-about-incentive-77ac47a6adc4
We got the idea of having SuperValidators (i.e. higher stake, higher penalty, limited in number) that could help with bootstrapping, and also make the formal 2/3 agreement more flexible. Could be a HotPOKT innovation ;)
There was a problem hiding this comment.
Yeah, makes sense. Perhaps disseminating snapshots via Tor or another medium is something that I'd start doing as PNI (for example). Then I'd look into offloading the responsibility to the protocol itself by creating the necessary incentives, but first of all I'd like to see it working like a charm.
Since snapshots/rollbacks could also be seen as a "disaster recovery" tool, I'd advise using something that's decoupled from Pocket, or at least that can be interacted with in other ways in the case of a complete network halt.
I don't know why but I am thinking about Solana now :)
There was a problem hiding this comment.
Maybe add a "FiremanBootstrapper" or "Node911" as a moonshot idea :)
There was a problem hiding this comment.
LOL, I'll leave that to you ;)
There was a problem hiding this comment.
Maybe even an IPFS hash that's stored directly in the block header
> ## Random thoughts
>
> - I wonder if serialized, compressed and signed `Merkle Patricia Trie`s could be leveraged as a media for storing `Savepoint`s in a space-efficient and "blockchain-native" way 🤔.
There was a problem hiding this comment.
Note that we're not using Ethereum's Merkle Patricia Trie, but a modified version of Libra's Jellyfish Merkle Tree implemented by Celestia.
Since we're using trees, the root has a hash, and that hash is easily verifiable. If a node is synching from scratch, it's easy to verify the state transitions are correct. If a node is not synching from scratch, it needs to trust someone else. Mina protocol (which I don't really know) has a special property that if you have the genesis parameters (i.e. the json config) and ANY height, you can immediately (in O(1)) verify that it is correct. Might be worth looking into.
https://olshansky.substack.com/p/5p1r-ethereums-modified-merkle-patricia
https://olshansky.substack.com/p/5p1r-jellyfish-merkle-tree
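
To illustrate why a root hash makes state easy to verify, here is a small Go sketch. Note the assumptions: it uses a plain binary Merkle tree over SHA-256, not the Jellyfish Merkle Tree mentioned above, and `merkleRoot`, `verifyState` and the sample entries are hypothetical names made up for this example.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// hashLeaf hashes a leaf with a 0x00 prefix to separate it from inner nodes.
func hashLeaf(data []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, data...))
	return h[:]
}

// hashInner hashes two child hashes with a 0x01 prefix.
func hashInner(l, r []byte) []byte {
	buf := append([]byte{0x01}, l...)
	buf = append(buf, r...)
	h := sha256.Sum256(buf)
	return h[:]
}

// merkleRoot computes the root of a plain binary Merkle tree over leaves.
func merkleRoot(leaves [][]byte) []byte {
	if len(leaves) == 0 {
		h := sha256.Sum256(nil)
		return h[:]
	}
	level := make([][]byte, len(leaves))
	for i, l := range leaves {
		level[i] = hashLeaf(l)
	}
	for len(level) > 1 {
		var next [][]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i]) // an unpaired node is promoted as-is
			} else {
				next = append(next, hashInner(level[i], level[i+1]))
			}
		}
		level = next
	}
	return level[0]
}

// verifyState reports whether the entries hash to the trusted root.
func verifyState(entries [][]byte, trustedRoot []byte) bool {
	return bytes.Equal(merkleRoot(entries), trustedRoot)
}

func main() {
	state := [][]byte{[]byte("acct:a=1"), []byte("acct:b=2"), []byte("acct:c=3")}
	root := merkleRoot(state)
	fmt.Println(verifyState(state, root)) // unchanged state matches the root

	state[1] = []byte("acct:b=999") // tamper with a single entry
	fmt.Println(verifyState(state, root))
}
```

A node that replays every block can recompute the root itself; a node restoring from a savepoint can only compare against a root it got from someone it trusts, which is the trade-off described above.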
There was a problem hiding this comment.
Yeah, I was reading about Ethereum and I had a Freudian slip with Patricia, luckily Celestia is not jealous.... bad joke 😝
Noted.
@Olshansk I know you don't have a working laptop with you and I am not expecting an answer till you are back from Denver. Just an update: I have addressed the comments and I am now going to rescope the related tickets to tackle the MVP scenario first. In the meantime I am working on #508 (keeping in mind the changes that will come with savepoints/rollbacks) Perhaps I could create also tickets for the more advanced implementation but I would wait for this to be merged first.
> - [**State changes invalidation and rollback triggering**] We need some sort of shared and thread-safe reference that is available for us across the whole call-stack that we can use in the event of a failure to flag that we need to abort whatever we are doing and rollback. This could be achieved via the use of the [context](https://pkg.go.dev/context) package.
> - [**Ensure atomicity across data stores**] We need to make sure that we are using transactions correctly and that we are not accidentally committing state ahead of time in any of the data-stores.
There was a problem hiding this comment.
Indexed may be the wrong term here but also related IMO.
Fwiw, companies like https://www.covalenthq.com/ are basically (I'm WAYYYY oversimplifying) huge relational DBs.
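
For the two design-doc bullets quoted in this thread (rollback triggering via `context`, and atomicity across data stores), a rough Go sketch could look like the following. It assumes Go 1.20+ for `context.WithCancelCause` and `context.Cause`; `Store`, `Step`, `Apply`, `ErrRollback` and `demo` are hypothetical names, and `Commit` is assumed to never fail, which a real multi-store commit cannot take for granted.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrRollback is a hypothetical sentinel used as the cancellation cause.
var ErrRollback = errors.New("rollback requested")

// Store is a hypothetical data store that stages writes until Commit.
type Store struct {
	committed map[string]string
	staged    map[string]string
}

func NewStore() *Store {
	return &Store{committed: map[string]string{}, staged: map[string]string{}}
}

// Set stages a write without making it visible.
func (s *Store) Set(k, v string) { s.staged[k] = v }

// Commit publishes staged writes; assumed infallible in this sketch.
func (s *Store) Commit() {
	for k, v := range s.staged {
		s.committed[k] = v
	}
	s.staged = map[string]string{}
}

// Rollback discards staged writes.
func (s *Store) Rollback() { s.staged = map[string]string{} }

// Has reports whether k is visible in committed state.
func (s *Store) Has(k string) bool {
	_, ok := s.committed[k]
	return ok
}

// Step is one unit of work; it may call abort from anywhere in its call stack.
type Step func(ctx context.Context, abort context.CancelCauseFunc)

// Apply runs steps and commits all stores only if nobody flagged a rollback.
func Apply(parent context.Context, stores []*Store, steps ...Step) error {
	ctx, abort := context.WithCancelCause(parent)
	defer abort(nil)
	for _, step := range steps {
		step(ctx, abort)
		if cause := context.Cause(ctx); cause != nil {
			for _, s := range stores {
				s.Rollback()
			}
			return cause
		}
	}
	for _, s := range stores {
		s.Commit()
	}
	return nil
}

// demo stages a write in one store, then fails while updating the other.
func demo() (leaked bool, err error) {
	trees, sql := NewStore(), NewStore()
	err = Apply(context.Background(), []*Store{trees, sql},
		func(_ context.Context, _ context.CancelCauseFunc) { trees.Set("root", "abc") },
		func(_ context.Context, abort context.CancelCauseFunc) {
			abort(fmt.Errorf("sql write failed: %w", ErrRollback))
		},
	)
	return trees.Has("root") || sql.Has("root"), err
}

func main() {
	leaked, err := demo()
	fmt.Println(leaked, errors.Is(err, ErrRollback))
}
```

The point of the sketch is only that nothing becomes visible in any store until every step has finished without flagging a rollback; the cancellation cause carries the reason back to the caller.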
Olshansk
left a comment
> @Olshansk I know you don't have a working laptop with you and I am not expecting an answer till you are back from Denver.

What a life, eh? 🤹

> I have addressed the comments and I am now going to rescope the related tickets to tackle the MVP scenario first. In the meantime I am working on #508 (keeping in mind the changes that will come with savepoints/rollbacks)
> Perhaps I could create also tickets for the more advanced implementation but I would wait for this to be merged first.

Discussed offline, but makes sense to me. Capturing some of the points
Between the
Check out my responses to some of the comments. Hopefully both fun and educational.
Pretty sure you're deep in the implementation stage so I'm sure the ideas are flowing.
Co-authored-by: Daniel Olshansky <[email protected]>
…kt-network/pocket into issue/493-savepoints-rollbacks-design
…kt-network/pocket into issue/493-savepoints-rollbacks-design Signed-off-by: Alessandro De Blasis <[email protected]>
| GitGuardian id | Secret | Commit | Filename |
|---|---|---|---|
| 5841025 | Generic High Entropy Secret | 8030dca | build/config/genesis.json |
@deblasis Going to hold off on reviewing this again until after our discussions next week.
Olshansk
left a comment
We're going to be figuring out implementation details along the way, but I think this is good to merge. Going to leave some of the comments unresolved because they're legendary :)
Just add a GITHUB_WIKI comment at the bottom before merging in, please!


Description
This PR adds the `SAVEPOINTS_ROLLBACKS.md` document that represents a design proposal and an implementation guideline.
Issue
Fixes #493
Type of change
Please mark the relevant option(s):
List of changes
Testing
N/A
Required Checklist
If Applicable Checklist
`shared/docs/*` if I updated `shared/*` README(s)