Skip to content

[PERSISTENCE] SavePoints and Rollbacks design document (Issue #493)#533

Merged
deblasis merged 18 commits intomainfrom
issue/493-savepoints-rollbacks-design
Apr 6, 2023
Merged

[PERSISTENCE] SavePoints and Rollbacks design document (Issue #493)#533
deblasis merged 18 commits intomainfrom
issue/493-savepoints-rollbacks-design

Conversation

@deblasis
Copy link
Contributor

Description

This PR adds the SAVEPOINTS_ROLLBACKS.md document that represents a design proposal and an implementation guideline.

Issue

Fixes #493

Type of change

Please mark the relevant option(s):

  • New feature, functionality or library
  • Bug fix
  • Code health or cleanup
  • Major breaking change
  • Documentation
  • Other

List of changes

  • Added design document

Testing

N/A

Required Checklist

  • I have performed a self-review of my own code
  • I have updated the corresponding CHANGELOG

If Applicable Checklist

  • I have updated the corresponding README(s); local and/or global
  • I have added tests that prove my fix is effective or that my feature works
  • I have added, or updated, mermaid.js diagrams in the corresponding README(s)
  • I have added, or updated, documentation and mermaid.js diagrams in shared/docs/* if I updated shared/*README(s)

@deblasis deblasis added documentation Improvements or additions to documentation core Core infrastructure - protocol related utility Utility specific changes persistence Persistence specific changes labels Feb 22, 2023
@deblasis deblasis self-assigned this Feb 22, 2023
@deblasis deblasis requested a review from Olshansk February 22, 2023 12:06
Copy link
Collaborator

@Olshansk Olshansk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to finish the first round of review with some thoughts/comments/ideas. You've identified a couple pieces that we can potentially start implementing right away, so let's continue the conversation.

Keep throwing questions & ideas so we can iterate on it.

Two things:

  1. I got a little tired toward the end so I apologize if the quality of the comments decreased.
  2. In a couple places, I didn't go back to edit my previous comments, so some things may feel redundant.


### Further improvements

- Savepoints could be disseminated/retrieved using other networks like Tor/IPFS/etc., this would free-up bandwidth on the main network and could be used in conjunction with `FastSync` to speed up the process of bootstrapping a new node without overwhelming the `P2P` network with a lot of traffic that is not `Protocol`-related. This could be very important when the network reaches critical mass in terms of scale. Hopefully soon.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, you really do look at savepoints as snapshots.

Do you see the solution for a node-specific savepoint (i.e. I updated one tree but not another and crashed) and a network wide (i.e. a data dir snapshot) as having the same solution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can very much build on top of what I have in mind.

Similarly to what databases do, we'd need some data integrity check at startup that verifies that all the datastores are correct.

It can very much rely on the statehash I guess...

How?

KISS, back of the envelope implementation:

  • retrieve the latest valid statehash (from file, from SuperValidators, from P2P, TBD)
  • at startup check if it matches the current, computed state hash
  • if not, rollback to latest savepoint and sync from there

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retrieve the latest valid statehash (from file, from SuperValidators, from P2P, TBD)

This is what "weak subjectivity" is. From Vitalik's 2014 article

Weak subjectivity is exactly the correct solution. It solves the long-range problems with proof of stake by relying on human-driven social information, but leaves to a consensus algorithm the role of increasing the speed of consensus from many weeks to twelve seconds and of allowing the use of highly complex rulesets and a large state. T


if not, rollback to latest savepoint and sync from there
at startup check if it matches the current, computed state hash

I think there are implementation nuances here, but I think you've got it.


I think it's worth adding (in the moonshot section) that we can potentially use Mina to verify the snapshot hash using only the latest state and the genesis.json file

https://minaprotocol.com/lightweight-blockchain

Screenshot 2023-03-07 at 5 22 17 PM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are implementation nuances here, but I think you've got it.

Music to my ears.

I am going to read that article tonight. Thanks for sharing. 🙏

I think it's worth adding (in the moonshot section) that we can potentially use Mina to verify the snapshot hash using only the latest state and the genesis.json file

Added


- Savepoints could be disseminated/retrieved using other networks like Tor/IPFS/etc., this would free-up bandwidth on the main network and could be used in conjunction with `FastSync` to speed up the process of bootstrapping a new node without overwhelming the `P2P` network with a lot of traffic that is not `Protocol`-related. This could be very important when the network reaches critical mass in terms of scale. Hopefully soon.

For example: a fresh node could be looking for the latest `Savepoint` signed by PNI available, download it from Tor, apply its state and resume the normal sync process from the other nodes from there.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tor is great (and the right starting point) but long-term, try to also start thinking in terms of incentive: https://olshansky.medium.com/cryptocurrencies-its-all-about-incentive-77ac47a6adc4

  • Is there another protocol that incentives sharing snapshots?
  • Could/should we have another protocol actor for this?
  • Is it part of Pocket or external?

For example, Gohkan and I were talking about this yesterday: https://olshansky.medium.com/cryptocurrencies-its-all-about-incentive-77ac47a6adc4

We got the idea of having SuperValidators (i.e. higher stake, higher penalty, limited in number) that could help with bootstrapping, and also make the formal 2/3 agreement more flexible. Could be a HotPOKT innovation ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense. Perhaps disseminating snapshots via Tor or other medium is something that I'd start doing as PNI (for example), then I'd look into offloading the responsibility to the protocol itself by creating the necessary incentives but first of all I'd like to see it working like a charm.

Since snapshots/rollback could be seen also as a "disaster recovery" tool as well, I'd advice to use something that's decoupled from Pocket, or at least that can be interacted with in other ways in the case of complete network halt.

I don't know why but I am thinking about Solana now :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a "FiremanBoostrapper" or "Node911" as a moonshot idea :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL, I'll leave that to you ;)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe even an IPFS hash that's stored directly in the block header


## Random thoughts

- I wonder if serialized, compressed and signed `Merkle Patricia Trie`s could be leveraged as a media for storing `Savepoint`s in a space-efficient and "blockchain-native" way 🤔.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we're not using Etheruem's Merkle Patricia Trie, but a modified version of Libra's Jellyfish Merkle Tree implemented by Celestia.

The fact that we're using Trees, the root has a hash, and that hash is easily verifiable. If a node is synching from scratch, it's easy to verify the state transitions are correct. If a node is not synching from scratch, it needs to trust someone else. Mina protocol (which I don't really know) has a special property that if you have the genesis parameters (i.e. the json config) and ANY height, you can immediately (in O(1)) verify that it is correct. Might be worth looking into.

https://olshansky.substack.com/p/5p1r-ethereums-modified-merkle-patricia
https://olshansky.substack.com/p/5p1r-jellyfish-merkle-tree

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was reading about Ethereum and I had a Freudian slip with Patricia, luckily Celestia is not jealous.... bad joke 😝

Noted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7dnl7m

@deblasis
Copy link
Contributor Author

deblasis commented Mar 1, 2023

@Olshansk I know you don't have a working laptop with you and I am not expecting an answer till you are back from Denver. Just an update:

I have addressed the comments and I am now going to rescope the related tickets to tackle the MVP scenario first. In the meantime I am working on #508 (keeping in mind the changes that will come with savepoints/rollbacks)

Perhaps I could create also tickets for the more advanced implementation but I would wait for this to be merged first.


- [**State changes invalidation and rollback triggering**] We need some sort of shared and thread-safe reference that is available for us across the whole call-stack that we can use in the event of a failure to flag that we need to abort whatever we are doing and rollback. This could be achieved via the use of the [context](https://pkg.go.dev/context) package.

- [**Ensure atomicity across data stores**] We need to make sure that we are using transactions correctly and that we are not accidentally committing state ahead of time in any of the data-stores.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indexed may be the wrong term here but also related IMO.

Fwiw, companies like https://www.covalenthq.com/ are basically (I'm WAYYYY oversimplifying) huge relational DBs.

Copy link
Collaborator

@Olshansk Olshansk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Olshansk I know you don't have a working laptop with you and I am not expecting an answer till you are back from Denver.

What a life, eh? 🤹


I have addressed the comments and I am now going to rescope the related tickets to tackle the MVP scenario first. In the meantime I am working on #508 (keeping in mind the changes that will come with savepoints/rollbacks)

Perhaps I could create also tickets for the more advanced implementation but I would wait for this to be merged first.

Discussed offline, but makes sense to me. Capturing some of the points

Between the

  • (this) #493
  • (this extra) Adding Context everywhere
  • (side quest) #508
  • (untriaged) #361
  • (legacy) #406

Checkout my responses to some of the comments. Hopefully both fun and educational.

Pretty sure you're deep in the implementation stage so I'm sure the ideas are flowing.


### Further improvements

- Savepoints could be disseminated/retrieved using other networks like Tor/IPFS/etc., this would free-up bandwidth on the main network and could be used in conjunction with `FastSync` to speed up the process of bootstrapping a new node without overwhelming the `P2P` network with a lot of traffic that is not `Protocol`-related. This could be very important when the network reaches critical mass in terms of scale. Hopefully soon.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retrieve the latest valid statehash (from file, from SuperValidators, from P2P, TBD)

This is what "weak subjectivity" is. From Vitalik's 2014 article

Weak subjectivity is exactly the correct solution. It solves the long-range problems with proof of stake by relying on human-driven social information, but leaves to a consensus algorithm the role of increasing the speed of consensus from many weeks to twelve seconds and of allowing the use of highly complex rulesets and a large state. T


if not, rollback to latest savepoint and sync from there
at startup check if it matches the current, computed state hash

I think there are implementation nuances here, but I think you've got it.


I think it's worth adding (in the moonshot section) that we can potentially use Mina to verify the snapshot hash using only the latest state and the genesis.json file

https://minaprotocol.com/lightweight-blockchain

Screenshot 2023-03-07 at 5 22 17 PM


- Savepoints could be disseminated/retrieved using other networks like Tor/IPFS/etc., this would free-up bandwidth on the main network and could be used in conjunction with `FastSync` to speed up the process of bootstrapping a new node without overwhelming the `P2P` network with a lot of traffic that is not `Protocol`-related. This could be very important when the network reaches critical mass in terms of scale. Hopefully soon.

For example: a fresh node could be looking for the latest `Savepoint` signed by PNI available, download it from Tor, apply its state and resume the normal sync process from the other nodes from there.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a "FiremanBoostrapper" or "Node911" as a moonshot idea :)

@deblasis deblasis requested a review from Olshansk March 9, 2023 13:51
@gitguardian
Copy link

gitguardian bot commented Mar 9, 2023

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id Secret Commit Filename
5841025 Generic High Entropy Secret 8030dca build/config/genesis.json View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Our GitHub checks need improvements? Share your feedbacks!

@Olshansk
Copy link
Collaborator

@deblasis Going to hold off on reviewing this again until after our discussions next week.

Copy link
Collaborator

@Olshansk Olshansk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going to be figuring out implementation details along the way, but I think this is good to merge. Going to leave some of the comments unresolved because they're legendary :)

Just add a GITHUB_WIKI comment at the bottom before merging in, please!

@cr-gpt
Copy link

cr-gpt bot commented Apr 6, 2023

Seems you are using me but didn't get OPENAI_API_KEY seted in Variables for this repo. you could follow readme for more information

@reviewpad
Copy link

reviewpad bot commented Apr 6, 2023

AI-Generated Summary: This pull request includes the addition of a design document called SAVEPOINTS_ROLLBACKS.md that outlines the proposed implementation of savepoints and rollbacks in the Persistence module, while also considering the Utility module. The changes focus on improving data consistency, recovery from failures at any level, and ensuring atomicity across data stores.

The MVP for implementing savepoints and rollbacks consists of the following tasks:

  1. State changes invalidation and rollback triggering
  2. Ensuring atomicity across data stores
  3. Distributed commits across data stores
  4. Implementation of savepoints
  5. Implementation of rollbacks
  6. Extensive testing
  7. Tooling for creating savepoints and rollbacks via CLI

Long-term ideas for savepoints and rollbacks include snapshot hash verification using the genesis.json file and the Mina protocol, as well as improved savepoint artifacts and rollback mechanisms. Also, further improvements for data dissemination/retrieval and socially-coordinated rollbacks are outlined in the document.

@deblasis deblasis merged commit d680e41 into main Apr 6, 2023
bryanchriswhite added a commit that referenced this pull request Apr 10, 2023
* pokt/main:
  [Utility][RPC][CLI] Querying governance parameters (Issue #619) (#622)
  [Persistence][Utility] Separate all CreateAndApply functions into more functional components - Issue #508 (#652)
  [Persistence][Utility] Pools Address hack removal + state accessor fix for params and flags (#654)
  [PERSISTENCE] SavePoints and Rollbacks design document (Issue #493) (#533)
  Update reviewpad.yml
  Added ChatGPT-CodeReview workflow (#649)
  Update reviewpad.yml
  Added default reviewpad.yml file (#648)
  [DevNet] tweaks for remote environments (#601)
  [Documentation] Swap validator and non-validator triggers when finished synching (#646)
  [Consensus] Configuration entry point state sync (#528)
bryanchriswhite added a commit to bryanchriswhite/pocket that referenced this pull request Apr 10, 2023
…p-modules

* pokt/main:
  [Utility][RPC][CLI] Querying governance parameters (Issue pokt-network#619) (pokt-network#622)
  [Persistence][Utility] Separate all CreateAndApply functions into more functional components - Issue pokt-network#508 (pokt-network#652)
  [Persistence][Utility] Pools Address hack removal + state accessor fix for params and flags (pokt-network#654)
  [PERSISTENCE] SavePoints and Rollbacks design document (Issue pokt-network#493) (pokt-network#533)
  Update reviewpad.yml
  Added ChatGPT-CodeReview workflow (pokt-network#649)
  Update reviewpad.yml
  Added default reviewpad.yml file (pokt-network#648)
  [DevNet] tweaks for remote environments (pokt-network#601)
  [Documentation] Swap validator and non-validator triggers when finished synching (pokt-network#646)
  [Consensus] Configuration entry point state sync (pokt-network#528)
bryanchriswhite added a commit that referenced this pull request Apr 12, 2023
…p-modules

* pokt/main:
  update pocket repo read.me (#667)
  Update reviewpad.yml
  [KEYBASE] Add improve comment on keybase config (#665)
  [E2E] Chore: Doc updates (#663)
  [E2E] Adds staking, unstaking, and sending tests (#653)
  [Utility][RPC][CLI] Querying governance parameters (Issue #619) (#622)
  [Persistence][Utility] Separate all CreateAndApply functions into more functional components - Issue #508 (#652)
  [Persistence][Utility] Pools Address hack removal + state accessor fix for params and flags (#654)
  [PERSISTENCE] SavePoints and Rollbacks design document (Issue #493) (#533)
  Update reviewpad.yml
  Added ChatGPT-CodeReview workflow (#649)
  Update reviewpad.yml
  Added default reviewpad.yml file (#648)
  [DevNet] tweaks for remote environments (#601)
  [Documentation] Swap validator and non-validator triggers when finished synching (#646)
  [Consensus] Configuration entry point state sync (#528)
bryanchriswhite added a commit that referenced this pull request Apr 12, 2023
…p-modules

* pokt/main:
  update pocket repo read.me (#667)
  Update reviewpad.yml
  [KEYBASE] Add improve comment on keybase config (#665)
  [E2E] Chore: Doc updates (#663)
  [E2E] Adds staking, unstaking, and sending tests (#653)
  [Utility][RPC][CLI] Querying governance parameters (Issue #619) (#622)
  [Persistence][Utility] Separate all CreateAndApply functions into more functional components - Issue #508 (#652)
  [Persistence][Utility] Pools Address hack removal + state accessor fix for params and flags (#654)
  [PERSISTENCE] SavePoints and Rollbacks design document (Issue #493) (#533)
  Update reviewpad.yml
  Added ChatGPT-CodeReview workflow (#649)
  Update reviewpad.yml
  Added default reviewpad.yml file (#648)
  [DevNet] tweaks for remote environments (#601)
  [Documentation] Swap validator and non-validator triggers when finished synching (#646)
  [Consensus] Configuration entry point state sync (#528)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core infrastructure - protocol related documentation Improvements or additions to documentation persistence Persistence specific changes utility Utility specific changes waiting-for-review

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Core] Create design document for save points and rollbacks

2 participants