[Persistence][Core][Savepoints/Rollbacks] Implement KISS Rollbacks - Deserialize WorldState to Persistence #435

@jessicadaugherty

Objective

Using the Save Points created from TODOs and placeholders throughout the persistence module codebase in #327, implement Rollbacks to the Save Points.

Revisit the TODOs below (they might no longer exist at this point; they are left as is for reference only).

In particular, deserializing the WorldState to Persistence means that the version of the "World State" tracked by UtilityUnitOfWork can be applied to Persistence.

Note: This ticket depends on #564 and previous deliverables.

Origin Document

#562. The following is also borrowed from the previous scoping, which should still apply:

There are various situations that require an ephemeral state to be rolled back:

  • Failure of block application
  • Pacemaker skips to the next step
  • Node crashed mid-round
  • Etc...

For this reason, there needs to be a smooth & clean mechanism to create save points and have rollbacks spanning across all the modules (consensus, utility, persistence, etc...).
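
One possible shape for such a cross-module mechanism is sketched below. The `SavePointer` interface, the `SaveAll` and `RollbackAll` helpers, and the `counterModule` type are hypothetical names introduced for illustration; only the `NewSavePoint` and `RollbackToSavePoint` method names and their `[]byte` identifier mirror the persistence context quoted in this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// SavePointer is a hypothetical interface each module would implement.
type SavePointer interface {
	NewSavePoint(id []byte) error
	RollbackToSavePoint(id []byte) error
}

// counterModule is a toy module whose whole state is a single integer.
type counterModule struct {
	name  string
	value int
	saved map[string]int
}

func (m *counterModule) NewSavePoint(id []byte) error {
	if m.saved == nil {
		m.saved = make(map[string]int)
	}
	m.saved[string(id)] = m.value
	return nil
}

func (m *counterModule) RollbackToSavePoint(id []byte) error {
	v, ok := m.saved[string(id)]
	if !ok {
		return fmt.Errorf("%s: unknown save point %q", m.name, id)
	}
	m.value = v
	return nil
}

// SaveAll creates the same save point in every module, stopping at the first error.
func SaveAll(id []byte, modules ...SavePointer) error {
	for _, m := range modules {
		if err := m.NewSavePoint(id); err != nil {
			return err
		}
	}
	return nil
}

// RollbackAll rolls modules back in reverse order and keeps going on errors,
// so one failing module does not prevent the others from recovering.
func RollbackAll(id []byte, modules ...SavePointer) error {
	var errs []error
	for i := len(modules) - 1; i >= 0; i-- {
		if err := modules[i].RollbackToSavePoint(id); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

func main() {
	consensus := &counterModule{name: "consensus", value: 1}
	utility := &counterModule{name: "utility", value: 2}
	persistence := &counterModule{name: "persistence", value: 3}

	id := []byte("height-10")
	if err := SaveAll(id, consensus, utility, persistence); err != nil {
		fmt.Println("save failed:", err)
		return
	}
	// Ephemeral changes made while processing a round.
	consensus.value, utility.value, persistence.value = 10, 20, 30

	if err := RollbackAll(id, consensus, utility, persistence); err != nil {
		fmt.Println("rollback failed:", err)
		return
	}
	fmt.Println(consensus.value, utility.value, persistence.value)
}
```

`errors.Join` requires Go 1.20 or newer. Rolling back in reverse order is an assumption about module dependencies, not something this issue prescribes.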

The existing context management and empty interfaces are a starting point but must be implemented.

For example, in persistence/context.go, the following functions exist:

func (p PostgresContext) NewSavePoint(bytes []byte) error {
	log.Println("TODO: NewSavePoint not implemented")
	return nil
}

func (p PostgresContext) RollbackToSavePoint(bytes []byte) error {
	log.Println("TODO: RollbackToSavePoint not fully implemented")
	return p.GetTx().Rollback(context.TODO())
}

In consensus/hotstuff_replica.go, the following needs to be verified:

	txResults, err := m.applyBlock(block)
	if err != nil {
		m.nodeLogError(typesCons.ErrApplyBlock.Error(), err)
		m.paceMaker.InterruptRound()
		return
	}

Goals

  • Identify all points in the persistence module where save points have been implemented
  • Implement the missing logic to rollback to a safe and recoverable state

Deliverable

  • A PR with an implementation that addresses the goals above
  • Deterministic unit tests to verify the logic above
  • Fuzzed tests with some chaos to stress test this business logic
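
To make the last two deliverables more concrete, here is a sketch of a seeded "chaos" check: random writes, save points, and simulated failures are driven by a fixed seed, so any failure is reproducible. The `undoStore` type and `runChaos` function are hypothetical; a real test would target the persistence module and could use Go's native fuzzing (`testing.F`) instead of `math/rand`.

```go
package main

import (
	"fmt"
	"math/rand"
)

// undoStore is a hypothetical store that rolls back by replaying an undo log.
type undoStore struct {
	data map[int]int
	undo []func()
}

// set writes a value and records how to revert that write.
func (s *undoStore) set(k, v int) {
	old, had := s.data[k]
	s.undo = append(s.undo, func() {
		if had {
			s.data[k] = old
		} else {
			delete(s.data, k)
		}
	})
	s.data[k] = v
}

// savePoint returns a mark that identifies the current position in the undo log.
func (s *undoStore) savePoint() int { return len(s.undo) }

// rollbackTo reverts every write recorded after the mark, newest first.
func (s *undoStore) rollbackTo(mark int) {
	for i := len(s.undo) - 1; i >= mark; i-- {
		s.undo[i]()
	}
	s.undo = s.undo[:mark]
}

func copyMap(m map[int]int) map[int]int {
	out := make(map[int]int, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

func sameMap(a, b map[int]int) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// runChaos mixes random writes, save points, and rollbacks, and checks after
// every rollback that the store equals a full copy taken at the save point.
func runChaos(seed int64, steps int) error {
	rng := rand.New(rand.NewSource(seed))
	s := &undoStore{data: map[int]int{}}
	mark, want := s.savePoint(), copyMap(s.data)
	for i := 0; i < steps; i++ {
		switch rng.Intn(10) {
		case 0: // take a new save point
			mark, want = s.savePoint(), copyMap(s.data)
		case 1: // simulate a failure and roll back
			s.rollbackTo(mark)
			if !sameMap(s.data, want) {
				return fmt.Errorf("seed %d, step %d: state differs after rollback", seed, i)
			}
		default: // ordinary write
			s.set(rng.Intn(8), rng.Intn(100))
		}
	}
	return nil
}

func main() {
	for seed := int64(0); seed < 20; seed++ {
		if err := runChaos(seed, 1000); err != nil {
			fmt.Println("FAIL:", err)
			return
		}
	}
	fmt.Println("all seeds passed")
}
```

Comparing an undo-log rollback against a full-copy reference is a differential check: the two strategies are unlikely to share the same bug, and the seed in the error message makes a failing run replayable.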

Non-goals / Non-deliverables

  • An end-to-end testing/automation framework

General issue deliverables

  • Update the appropriate CHANGELOG
  • Update any relevant READMEs (local and/or global)
  • Update any relevant global documentation & references

Testing Methodology

  • All tests: make test_all
  • LocalNet: verify a LocalNet is still functioning correctly by following the instructions at docs/development/README.md

Creator: @jessicadaugherty - rescope: @deblasis
Co-creator: @Olshansk

Metadata

Labels

  • code health: Nice to have code improvement
  • core: Core infrastructure - protocol related
  • persistence: Persistence specific changes

Projects

Status

Backlog
