go/oasis-node/cmd/storage: Add create and import checkpoint cmd by martintomazic · Pull Request #6454 · oasisprotocol/oasis-core

martintomazic · 2026-02-07T12:30:22Z

Write our own version of BootstrapState
- The node creating the checkpoint can dump untrusted metadata that will be used to initialize the cometbft stores after the checkpoint is imported.
- We could also make it trustless if we want to get rid of snapshots entirely and store only checkpoints, storage diffs, and blocks. :)
Fix TODOs depending on the review feedback.

Optional:
~~Add another command that uses the logic from those two commands, e.g. storage reset --height to enable snapshot creation with exact start height:~~

1. Find the corresponding rounds for all configured runtimes.
2. Create a consensus checkpoint and runtime checkpoints for all rounds previously obtained.
3. Clear consensus and runtime data.
4. Import the checkpoints and bootstrap the cometbft stores from the imported consensus checkpoint.

I would prefer to do this manually or with a bash command for now.

netlify · 2026-02-11T09:00:30Z

✅ Deploy Preview for oasisprotocol-oasis-core canceled.

Name	Link
🔨 Latest commit	`06b5cc4`
🔍 Latest deploy log	https://app.netlify.com/projects/oasisprotocol-oasis-core/deploys/699dc474bc897e00087ecdcb

martintomazic · 2026-02-11T09:10:39Z

go/oasis-node/cmd/storage/checkpoint.go

+// bootstrapTrustedState synchronizes the cometbft databases after the state sync
+// has been performed offline.
+//
+// It is expected that the block store and state store are empty at the time the
+// function is called.
+//
+// Adapted from https://github.com/oasisprotocol/cometbft/blob/08e22df73d354512fc27bd0c5731b3dcf1f8fef7/node/node.go#L198.
+func bootstrapTrustedState(ctx context.Context, dataDir string, meta bootstrapMeta) error {


It might be reasonable to put this function next to BootstrapState in oasis-cometbft.

Pros: It fits there logically, and there are fewer things to import.
Cons: Here you can change things freely; if we move it there, every change requires bumping cometbft, which is not that practical.

martintomazic · 2026-02-11T09:13:11Z

go/oasis-node/cmd/storage/checkpoint.go

+	currentMeta := blockStore.LoadBlockMeta(h + 1)
+	if currentMeta == nil {
+		return cmtState.State{}, fmt.Errorf("block meta not found at height %d", h+1)
+	}


This follows upstream, where current is h+1. In practice (confirmed), after a state sync, whether managed by cometbft or done with the new import command, the block store starts from h+1, so the requested start height h is shadowed.

~~I am inclined to deviate here and use h-1, h and h+1 to work around this.~~
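
To make the "shadowed start height" concrete, here is a minimal toy model, not the PR's implementation. `blockMetaStore` and `bootstrapBase` are hypothetical names, and the in-memory map only stands in for the cometbft block store. It assumes, as described above, that bootstrapping at height h needs the block meta at h+1 and that the imported node's block store then starts at h+1.

```go
package main

import "fmt"

// blockMetaStore is a hypothetical in-memory stand-in for the source node's
// block store, mapping height to some opaque block meta.
type blockMetaStore map[int64]string

// bootstrapBase models the behaviour discussed above: bootstrapping at
// checkpoint height h requires the block meta at h+1, and the first height
// the imported node can serve is h+1 (the checkpoint height itself is "eaten").
func bootstrapBase(src blockMetaStore, h int64) (int64, error) {
	if _, ok := src[h+1]; !ok {
		return 0, fmt.Errorf("block meta not found at height %d", h+1)
	}
	return h + 1, nil
}

func main() {
	src := blockMetaStore{100: "meta-100", 101: "meta-101"}

	base, err := bootstrapBase(src, 100)
	fmt.Println(base, err)

	// The source node has nothing at 102, so height 101 cannot be used.
	_, err = bootstrapBase(src, 101)
	fmt.Println(err)
}
```

Under these assumptions, an operator who needs block history starting at height h would have to create the checkpoint at h-1.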

martintomazic · 2026-02-11T09:17:51Z

Works! :)

The only impractical parts are finding the runtime rounds that correspond to a given consensus height, and the fact that bootstrap "eats" one height, as described.

Finally, one should be very careful with the creation/import heights and rounds, so that all the light history relevant to the imported runtime checkpoints is available.

martintomazic · 2026-02-11T14:18:01Z

go/oasis-node/cmd/storage/checkpoint.go

+			if height != 0 { // TODO handle zero value vs not set correctly.
+				if err := createConsensusCp(); err != nil {
+					return fmt.Errorf("failed to create consensus checkpoint (height: %d): %w", height, err)


Maybe use the default undefined round (i.e. max uint64); an alternative is cmd.Flags().Changed("height").

Update: Another alternative is an explicit --consensus flag, or possibly consensus/runtime sub-commands. Omitting the height/round could also mean the latest height. An --all flag combined with --height would also be interesting if it found the corresponding runtime rounds for the given height.

Currently things are not fine, as you can do consensus and runtime checkpoints at the same time. Furthermore, this might be confusing for users, e.g., do they need to set height, round, or both?

Yes, I left it that way intentionally for now. I can easily allow only one at a time. The question is whether sub-commands would make things even clearer. Also, any preference for what omitting the height/round should do?

Observe also that if you want to be able to create checkpoints for multiple heights/versions (same NodeDB), the import command also grows in complexity.

Finally, one should be very careful about which combinations of consensus height and runtime rounds checkpoints are created for, as it is easy to end up with missing runtime light blocks and therefore a stuck runtime state restore.

We could also make the import command take consensus/runtime/height/round and import the directories produced by create one by one, making it symmetric to create.

Any preference?
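
One way to sketch the "zero value vs. not set" distinction and the "only one at a time" rule discussed above. This is an illustration only: it uses the standard library `flag` package, whose `FlagSet.Visit` walks only the flags that were explicitly set, as a stand-in for the `cmd.Flags().Changed("height")` idea. `target` and `parseTarget` are hypothetical names, and rejecting the case where neither flag is given is just one of the options raised in the thread (the other being "latest").

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// target is a hypothetical description of what the create command should checkpoint.
type target struct {
	Consensus bool
	Height    uint64
	Runtime   bool
	Round     uint64
}

// parseTarget distinguishes "flag set to 0" from "flag not given" and rejects
// setting both --height and --round.
func parseTarget(args []string) (target, error) {
	fs := flag.NewFlagSet("create", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output out of the example
	height := fs.Uint64("height", 0, "consensus height")
	round := fs.Uint64("round", 0, "runtime round")
	if err := fs.Parse(args); err != nil {
		return target{}, err
	}

	// Visit only reports flags that were explicitly set on the command line.
	set := map[string]bool{}
	fs.Visit(func(f *flag.Flag) { set[f.Name] = true })

	switch {
	case set["height"] && set["round"]:
		return target{}, errors.New("--height and --round are mutually exclusive")
	case set["height"]:
		return target{Consensus: true, Height: *height}, nil
	case set["round"]:
		return target{Runtime: true, Round: *round}, nil
	default:
		return target{}, errors.New("one of --height or --round is required")
	}
}

func main() {
	// An explicit zero is accepted, because the flag was set.
	t, err := parseTarget([]string{"--height=0"})
	fmt.Println(t, err)

	_, err = parseTarget([]string{"--height=5", "--round=7"})
	fmt.Println(err)
}
```

Sub-commands would make the same choice structurally instead of through validation, which is the trade-off being asked about.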

martintomazic · 2026-02-11T14:29:20Z

Creating checkpoints from the penultimate snapshot is dominated by the Sapphire checkpoint creation.

With 6 chunker threads, the current projection is 5-7 hours (will update). Import is a matter of minutes.

martintomazic · 2026-02-22T23:04:42Z

Added unit and e2e tests, fixed empty state corner case and improved code quality.

Two minor things left to discuss:

- Exact API of this command / default values/actions: see go/oasis-node/cmd/storage: Add create and import checkpoint cmd #6454 (comment)
- Consider refactoring the checkpoint abstraction as proposed in: Improve storage.mkvs.checkpoint abstractions #6467 (follow-up).

codecov · 2026-02-22T23:14:32Z

Codecov Report

❌ Patch coverage is 59.29204% with 138 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.56%. Comparing base (c9a4b8e) to head (06b5cc4).

Files with missing lines	Patch %	Lines
go/oasis-node/cmd/storage/checkpoint.go	58.43%	73 Missing and 65 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6454      +/-   ##
==========================================
- Coverage   64.73%   64.56%   -0.18%     
==========================================
  Files         699      700       +1     
  Lines       68246    68581     +335     
==========================================
+ Hits        44179    44279     +100     
- Misses      19060    19183     +123     
- Partials     5007     5119     +112

☔ View full report in Codecov by Sentry.

peternose

When I import a consensus checkpoint, I get a few lines of the following error. Afterwards, blocks execute normally.

{"caller":"grpc.go:194","err":"failed to get consensus status: failed to fetch current block: cometbft: block query failed: height 28800866 must be less than or equal to the current blockchain height 0","level":"error","method":"/oasis-core.NodeController/GetStatus","module":"grpc/internal","msg":"request failed","req_seq":15,"ts":"2026-02-24T13:00:28.934344662Z"}

peternose · 2026-02-24T11:00:18Z

go/oasis-node/cmd/storage/checkpoint.go

+	}
+	earliest := ndb.GetEarliestVersion()
+	if version < earliest || version > latest {
+		return fmt.Errorf("version not finalized (finalized range: %d-%d)", earliest, latest)


Suggested change

-		return fmt.Errorf("version not finalized (finalized range: %d-%d)", earliest, latest)
+		return fmt.Errorf("version not found")

Go style discourages opaque errors.

I believe a good error trace gives you enough context to debug an issue just by looking at it. E.g. including the range here makes the root command return the exact problem to the caller, saving them debugging time and the need to call other commands, such as inspect, to find a valid range.

peternose · 2026-02-24T13:21:40Z

go/oasis-node/cmd/storage/checkpoint_test.go

+		})
+		require.NoError(t, ndb.Finalize([]node.Root{stateRoot}))
+
+		outDir := filepath.Join(t.TempDir(), "checkpoints")


You should use a constant for the directory name.

Even in this test, with its short scope, where inlining reads nicer?

peternose · 2026-02-24T13:26:07Z

go/oasis-node/cmd/storage/checkpoint_test.go

+		require.NoError(t, os.WriteFile(filepath.Join(outDir, "existing"), []byte{}, 0o600))
+
+		err = createCheckpoints(ctx, ndb, ns, testVersion, outDir)
+		require.Error(t, err, "createCheckpoints should fail for non-empty output directory")


Suggested change

-		require.Error(t, err, "createCheckpoints should fail for non-empty output directory")
+		require.ErrorContains(t, err, "output directory is not empty")

You should check that the tested function fails for the right reason.

You probably mean this for the other tests as well? In general I wanted to avoid it, as error messages are not part of the public API, but yes, otherwise the test can pass when the function fails for the wrong reason. Will add ErrorContains as you suggest.

peternose · 2026-02-24T13:36:16Z

go/oasis-node/cmd/storage/checkpoint.go

+	return cmd
+}
+
+func createCheckpoints(ctx context.Context, ndb api.NodeDB, ns common.Namespace, version uint64, outputDir string) error {


Maybe creating a checkpointer struct would be better, as you could create multiple checkpoints with the same struct, e.g.

cp.Create(ctx, 1, "/checkpoints/1")
cp.Create(ctx, 2, "/checkpoints/2")
...

or even without outputDir if accepted in the constructor. The new struct might also be easier to test and could be decoupled from the commands.

update:

The new struct might also be easier to test and could be decoupled from the commands.

newCheckpointer + cp.create (proposed) = createCheckpoints (current) so this is only a matter of style.

As usual, I prefer an explicit function over abstractions until multiple methods share the same parameter set.

The question is whether we want to allow multiple checkpoint heights/rounds for the same NodeDB.

If you want to make this a generic helper, I think it would fit inside the checkpoint package.
See (#6467):

// Consider using functional optional arguments to shorten args list.
CreateCheckpoint(ctx context.Context, ndb db.NodeDB, store Store, root node.Root, chunkSize uint64, chunkerThreads uint16) (*Metadata, error)
// Maybe add CreateAllCheckpoints as well and avoid passing root there.
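
A minimal sketch of the functional options idea mentioned in the comment above, assuming hypothetical names throughout (`createConfig`, `CreateOption`, `WithChunkSize`, `WithChunkerThreads`, `newCreateConfig`). The default values are purely illustrative and not those of oasis-core.

```go
package main

import "fmt"

// createConfig holds the tunables that would otherwise be positional arguments.
type createConfig struct {
	chunkSize      uint64
	chunkerThreads uint16
}

// CreateOption is a hypothetical functional option for checkpoint creation.
type CreateOption func(*createConfig)

// WithChunkSize overrides the chunk size in bytes.
func WithChunkSize(n uint64) CreateOption {
	return func(c *createConfig) { c.chunkSize = n }
}

// WithChunkerThreads overrides the number of chunker threads.
func WithChunkerThreads(n uint16) CreateOption {
	return func(c *createConfig) { c.chunkerThreads = n }
}

// newCreateConfig applies the options on top of illustrative defaults.
func newCreateConfig(opts ...CreateOption) createConfig {
	cfg := createConfig{chunkSize: 8 << 20, chunkerThreads: 1}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	// Only the values that differ from the defaults need to be spelled out.
	cfg := newCreateConfig(WithChunkerThreads(6))
	fmt.Println(cfg.chunkSize, cfg.chunkerThreads)
}
```

A `CreateCheckpoint(ctx, ndb, store, root, opts ...CreateOption)` shaped like this would keep the required arguments positional and move the tunables out of the signature.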

peternose · 2026-02-24T13:40:02Z

go/oasis-test-runner/scenario/e2e/runtime/checkpoint_create_import.go

+)
+
+// CheckpointCreateImport is the checkpoint create/import e2e scenario.
+var CheckpointCreateImport scenario.Scenario = newCheckpointCreateImportImpl()


One could argue that this functionality should not be tested with E2E tests, as a bug in these functions doesn't affect nodes running the buggy version of oasis-node.

I see your point, but observe that this is still part of the oasis-node binary.

As pointed out in the commit message introducing this, I would prefer to:

Furthermore, given that e2e tests are expensive and meant to
test complex scenarios, my suggestions would be to also run
prune, compact, and inspect command on the source node prior
to creating a checkpoint. This way we would "smoke test"
remaining storage commands, and the scenario could be called
storaged_cmds instead

So: e2e tests for complex scenarios plus smoke testing of command wiring. Everything else should be tested via unit and integration tests.

With the new new*Cmd pattern you can also easily test cmd argument validation/combinations. The reasons we need e2e here are: 1. smoke testing the wiring of the existing storage commands, and 2. bootstrapping the cometbft DBs would be very complex to mock and test otherwise.

Update: We could also make e2e tests like this run only prior to a release.

peternose · 2026-02-24T13:44:02Z

go/oasis-node/cmd/storage/checkpoint.go

+			if height != 0 { // TODO handle zero value vs not set correctly.
+				if err := createConsensusCp(); err != nil {
+					return fmt.Errorf("failed to create consensus checkpoint (height: %d): %w", height, err)


Currently things are not fine, as you can do consensus and runtime checkpoints at the same time. Furthermore, this might be confusing for users, e.g., do they need to set height, round, or both?

Ideally, the test should be hardened by making sure the target node also syncs up to the tip of the runtime chain and not just consensus. Furthermore, given that e2e tests are expensive and meant to test complex scenarios, my suggestion would be to also run the prune, compact, and inspect commands on the source node prior to creating a checkpoint. This way we would "smoke test" the remaining storage commands, and the scenario could be called storaged_cmds instead.

martintomazic force-pushed the martin/feature/storage-inspect-cmd branch from ea89ecc to c5e2f2a Compare February 9, 2026 10:33

martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from 9761a3c to 0bba8dd Compare February 9, 2026 23:08

martintomazic force-pushed the martin/feature/storage-inspect-cmd branch 2 times, most recently from fe09fe6 to f833d73 Compare February 10, 2026 00:00

martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from 0bba8dd to 41b49b4 Compare February 10, 2026 00:05

martintomazic force-pushed the martin/feature/storage-inspect-cmd branch from f833d73 to b47eb6c Compare February 10, 2026 14:35

Base automatically changed from martin/feature/storage-inspect-cmd to master February 10, 2026 21:53

martintomazic linked an issue Feb 10, 2026 that may be closed by this pull request

go/oasis-node: Enable snapshot creation with exact start version #6423

Open

martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from 41b49b4 to b31dfff Compare February 11, 2026 09:00

martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from b31dfff to 744884b Compare February 11, 2026 09:04

martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch 3 times, most recently from ef92148 to a41d394 Compare February 11, 2026 14:12

martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from a41d394 to 206c70e Compare February 11, 2026 14:25

martintomazic marked this pull request as ready for review February 11, 2026 14:33

martintomazic requested review from kostko, peternose and ptrus as code owners February 11, 2026 14:33

martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch 2 times, most recently from 7f2aac9 to 817bc76 Compare February 22, 2026 22:31

go/oasis-node/cmd/storage: Add create and import checkpoint cmd

c7fd7bd

martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from 817bc76 to 2be35e9 Compare February 22, 2026 22:38

peternose reviewed Feb 24, 2026

martintomazic added 2 commits February 24, 2026 16:31

Fixup: Fix minor stuff

ec94b4e

martintomazic force-pushed the martin/feature/create-checkpoint-cmd branch from 2be35e9 to 06b5cc4 Compare February 24, 2026 15:32
