UnsafeRange panicking during shutdown #17223

@ahrtr

Description

Bug report criteria

What happened?

Test case TestMaintenanceSnapshotCancel failed with a panic.

Refer to https://github.com/etcd-io/etcd/actions/runs/7463174417/job/20307221683?pr=17220

Based on the log, the likely cause is that the backend had already been closed (the member was being stopped) before the snapshot operation committed the pending batch transaction:

b.batchTx.Commit()

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xa6c27c]

goroutine 19029 [running]:
go.etcd.io/bbolt.(*Cursor).seek(0xc00044a578, {0x1aa05d4, 0x4, 0x4})
	/home/runner/go/pkg/mod/go.etcd.io/[email protected]/cursor.go:159 +0x7c
go.etcd.io/bbolt.(*Bucket).Bucket(0xc0009da638, {0x1aa05d4, 0x4, 0x4})
	/home/runner/go/pkg/mod/go.etcd.io/[email protected]/bucket.go:105 +0x10c
go.etcd.io/bbolt.(*Tx).Bucket(...)
	/home/runner/go/pkg/mod/go.etcd.io/[email protected]/tx.go:102
go.etcd.io/etcd/server/v3/storage/backend.(*batchTx).UnsafeRange(0xc0017a7da0, {0x12e17b0, 0x1b0dc40}, {0x1aa1670, 0x10, 0x10}, {0x0, 0x0, 0x0}, 0xc4798?)
	/home/runner/actions-runner/_work/etcd/etcd/server/storage/backend/batch_tx.go:174 +0xa0
go.etcd.io/etcd/server/v3/storage/schema.UnsafeReadConsistentIndex({0xffff39290758, 0xc0017a7da0})
	/home/runner/actions-runner/_work/etcd/etcd/server/storage/schema/cindex.go:41 +0xa0
go.etcd.io/etcd/server/v3/storage/schema.unsafeUpdateConsistentIndex.func1()
	/home/runner/actions-runner/_work/etcd/etcd/server/storage/schema/cindex.go:80 +0x54
go.etcd.io/etcd/client/pkg/v3/verify.Verify(0xc00044aa28)
	/home/runner/actions-runner/_work/etcd/etcd/client/pkg/verify/verify.go:71 +0x44
go.etcd.io/etcd/server/v3/storage/schema.unsafeUpdateConsistentIndex({0x12e5920, 0xc0017a7da0}, 0x7, 0x2, 0x0)
	/home/runner/actions-runner/_work/etcd/etcd/server/storage/schema/cindex.go:79 +0x1d4
go.etcd.io/etcd/server/v3/storage/schema.UnsafeUpdateConsistentIndex(...)
	/home/runner/actions-runner/_work/etcd/etcd/server/storage/schema/cindex.go:67
go.etcd.io/etcd/server/v3/etcdserver/cindex.(*consistentIndex).UnsafeSave(0xc001727800, {0x12e5920, 0xc0017a7da0})
	/home/runner/actions-runner/_work/etcd/etcd/server/etcdserver/cindex/cindex.go:121 +0x68
go.etcd.io/etcd/server/v3/storage.(*BackendHooks).OnPreCommitUnsafe(0xc001a0d9e0, {0x12e5920?, 0xc0017a7da0})
	/home/runner/actions-runner/_work/etcd/etcd/server/storage/hooks.go:45 +0x64
go.etcd.io/etcd/server/v3/storage/backend.(*batchTxBuffered).unsafeCommit(0xc0017a7da0, 0x0)
	/home/runner/actions-runner/_work/etcd/etcd/server/storage/backend/batch_tx.go:342 +0xfc
go.etcd.io/etcd/server/v3/storage/backend.(*batchTxBuffered).commit(0xc0017a7da0, 0xdc?)
	/home/runner/actions-runner/_work/etcd/etcd/server/storage/backend/batch_tx.go:335 +0x70
go.etcd.io/etcd/server/v3/storage/backend.(*batchTxBuffered).Commit(0xc0017a7da0)
	/home/runner/actions-runner/_work/etcd/etcd/server/storage/backend/batch_tx.go:322 +0x3c
go.etcd.io/etcd/server/v3/storage/backend.(*backend).Snapshot(0xc000329140)
	/home/runner/actions-runner/_work/etcd/etcd/server/storage/backend/backend.go:331 +0x54
go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*maintenanceServer).Snapshot(0xc0029ace40, 0x12e1120?, {0x12e6550, 0xc0006302b0})
	/home/runner/actions-runner/_work/etcd/etcd/server/etcdserver/api/v3rpc/maintenance.go:111 +0x110
go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*authMaintenanceServer).Snapshot(0xc0019d7b60, 0x103b4c0?, {0x12e6550, 0xc0006302b0})
	/home/runner/actions-runner/_work/etcd/etcd/server/etcdserver/api/v3rpc/maintenance.go:296 +0xb8
go.etcd.io/etcd/api/v3/etcdserverpb._Maintenance_Snapshot_Handler({0x1004a40?, 0xc0019d7b60}, {0x12e3fc8, 0xc0028e4c00})
	/home/runner/actions-runner/_work/etcd/etcd/api/etcdserverpb/rpc.pb.go:7620 +0xe8
github.com/grpc-ecosystem/go-grpc-prometheus.init.(*ServerMetrics).StreamServerInterceptor.func4({0x1004a40, 0xc0019d7b60}, {0x12e3d80?, 0xc0005681e0}, 0xc0028e4bd0, 0x111e8a8)
	/home/runner/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/server_metrics.go:121 +0x128
go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.Server.ChainStreamServer.func7.1.1({0x1004a40, 0xc0019d7b60}, {0x12e3d80, 0xc0005681e0})
	/home/runner/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:49 +0x70
go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.newStreamInterceptor.func1({0x1004a40, 0xc0019d7b60}, {0x12e3d80, 0xc0005681e0}, 0xc0028e4bd0, 0xc001992240)
	/home/runner/actions-runner/_work/etcd/etcd/server/etcdserver/api/v3rpc/interceptor.go:258 +0x560
go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.Server.ChainStreamServer.func7.1.1({0x1004a40, 0xc0019d7b60}, {0x12e3d80, 0xc0005681e0})
	/home/runner/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:49 +0x70
go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.Server.ChainStreamServer.func7({0x1004a40, 0xc0019d7b60}, {0x12e3d80, 0xc0005681e0}, 0xc0028e4bd0, 0x111e8a8)
	/home/runner/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:58 +0x100
google.golang.org/grpc.(*Server).processStreamingRPC(0xc00162c400, {0x12e1120, 0xc00185e870}, {0x12e8340, 0xc000753380}, 0xc002846b40, 0xc0028ac2d0, 0x1b0c680, 0x0)
	/home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1673 +0x16b8
google.golang.org/grpc.(*Server).handleStream(0xc00162c400, {0x12e8340, 0xc000753380}, 0xc002846b40)
	/home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1787 +0x12f0
google.golang.org/grpc.(*Server).serveStreams.func2.1()
	/home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1016 +0xa0
created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 19027
	/home/runner/go/pkg/mod/google.golang.org/[email protected]/server.go:1027 +0x1d8
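
For illustration only, here is a minimal Go sketch of one way a closed-state check could turn this crash into a clean error. All names below (backend, stopped, errBackendClosed) are hypothetical stand-ins, not etcd's actual types or the eventual fix:

package main

import (
	"errors"
	"fmt"
	"sync"
)

var errBackendClosed = errors.New("backend: already closed")

// backend stands in for storage/backend.backend; commit stands in for
// batchTx.Commit, which is what dereferences the underlying bolt tx.
type backend struct {
	mu      sync.RWMutex
	stopped bool
	commit  func()
}

// Close marks the backend stopped before the bolt DB is torn down.
func (b *backend) Close() {
	b.mu.Lock()
	b.stopped = true
	b.mu.Unlock()
}

// Snapshot checks the stopped flag under the same lock before touching
// the batch tx, so a concurrent Close yields an error, not a SIGSEGV.
func (b *backend) Snapshot() error {
	b.mu.RLock()
	defer b.mu.RUnlock()
	if b.stopped {
		return errBackendClosed
	}
	b.commit()
	return nil
}

func main() {
	b := &backend{commit: func() { fmt.Println("committed") }}
	b.Close()
	fmt.Println(b.Snapshot()) // prints: backend: already closed
}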

What did you expect to happen?

The server should not panic when processing any client request; a request against a stopped member should fail with an error instead.

How can we reproduce it (as minimally and precisely as possible)?

Write an integration test that stops a member before calling the snapshot API, as sketched below.
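
A hedged sketch of such a test, using etcd's integration test framework (package path and helpers as on the main branch; the exact APIs may differ between versions):

package integration_test

import (
	"context"
	"testing"
	"time"

	"go.etcd.io/etcd/tests/v3/framework/integration"
)

func TestSnapshotAfterMemberStop(t *testing.T) {
	integration.BeforeTest(t)
	clus := integration.NewCluster(t, &integration.ClusterConfig{Size: 1})
	defer clus.Terminate(t)

	cli := clus.Client(0)

	// Stop the member first so its backend gets closed ...
	clus.Members[0].Stop(t)

	// ... then issue the snapshot request. It should fail cleanly
	// instead of crashing the server with a nil-pointer panic.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if _, err := cli.Snapshot(ctx); err == nil {
		t.Fatal("expected Snapshot against a stopped member to fail")
	}
}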

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
# paste output here

$ etcdctl version
# paste output here

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output

No response
