Panic: tocommit(...) out of range [lastIndex(0)] on fresh Learner join after member remove/add #21178

@amolmishra23

What happened?

We are observing a consistent panic when a node that previously served as a Voting Member is re-added to the cluster as a Learner Member after a failure and a data directory reset.

The Workflow:

  1. Initial State: Healthy cluster with 3 Voting Members (A, B, C) and 1 Learner Member (D).
  2. Failure: Member A is forced offline (binary renamed to etcd.bak).
  3. Recovery Phase 1: Our operator detects A is unhealthy and executes etcdctl member remove <ID_of_A>.
  4. Promotion: Per etcd behavior, the Learner (D) is automatically promoted to a Voting Member. The cluster now consists of B, C, and D as voters.
  5. Recovery Phase 2 (The Backfill): We restore the binary on the node where A resided. Our workflow attempts to add this node back as a Learner to maintain the desired cluster shape.
  6. The Crash: We wipe the data directory of the old member A, perform etcdctl member add --learner with a new name/ID, and start the process.

Result: The new process immediately panics upon receiving its first heartbeat from the leader:
panic: tocommit(5099750) is out of range [lastIndex(0)].
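
For readers unfamiliar with the message, the following is a minimal sketch of the kind of guard that produces it, modeled on the raftLog.commitTo frame in the stack trace further down. It is a simplified illustration, not the etcd source: the struct below is a hypothetical stand-in with only two fields, and it returns an error where the real code panics through the logger.

```go
package main

import "fmt"

// raftLog is a toy stand-in for a follower's log (hypothetical, for illustration).
// It keeps only the two values the range check reads.
type raftLog struct {
	committed uint64 // highest index known to be committed
	lastIndex uint64 // index of the last entry present in the local log
}

// commitTo advances the commit index. The commit index never decreases,
// and it may not move past the last entry the node actually holds.
// The real implementation panics here; this sketch returns an error instead.
func (l *raftLog) commitTo(tocommit uint64) error {
	if l.committed < tocommit {
		if l.lastIndex < tocommit {
			return fmt.Errorf("tocommit(%d) is out of range [lastIndex(%d)]", tocommit, l.lastIndex)
		}
		l.committed = tocommit
	}
	return nil
}

func main() {
	// A freshly wiped node starts with an empty log: committed=0, lastIndex=0.
	fresh := &raftLog{}

	// The first heartbeat carries the commit index reported in this issue.
	if err := fresh.commitTo(5099750); err != nil {
		fmt.Println("panic:", err)
	}
}
```

Under these assumptions, any heartbeat whose commit index exceeds the follower's last index trips the check, which matches a node with lastindex: 0 receiving commit 5099750.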

What did you expect to happen?

The node that previously held the identity of Member A should be able to join as a brand-new Learner with a clean slate (index 0) and synchronize from the current leader without a range panic.

How can we reproduce it (as minimally and precisely as possible)?

  1. Start a cluster with 3 Voters and 1 Learner.
  2. Kill one Voter.
  3. Remove the dead Voter from the member list.
  4. Verify the Learner has been promoted to a Voter.
  5. On the node of the original dead Voter:
     • Wipe the data directory.
     • Add it back to the cluster using etcdctl member add --learner.
     • Start the etcd process with --initial-cluster-state existing.
  6. Observe the panic on the new Learner.

Anything else we need to know?

  • Strictly Learner Issue: If we try to add this node back as a Voter instead of a Learner, the issue does not occur.
  • ID & Name: The re-added member has a fresh Member ID and a new unique name, but it reuses the Peer URL of the original Voter A.
  • mTLS: The cluster uses mTLS for all peer and client communications.
  • Member List Confirmation: etcdctl member list shows the cluster is healthy with 3 voters before we attempt to add the learner back.

Etcd version (please run commands below)

etcd Version: 3.5.25
Git SHA: e2eff77
Go Version: go1.24.10
Go OS/Arch: linux/amd64


etcdctl version: 3.5.25
API version: 3.5

Etcd configuration (command line flags or environment variables)

[user@node-name ~]$ cat /path/to/etcd/etcd.conf
advertise-client-urls: https://NODE_ID_1:2379
auto-compaction-mode: periodic
auto-compaction-retention: "1"
cipher-suites:
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
client-transport-security:
  auto-tls: false
  cert-file: /path/to/spire/etcd/svid.pem
  client-cert-auth: true
  client-cert-file: /path/to/spire/etcd/svid.pem
  client-key-file: /path/to/spire/etcd/svid_key.pem
  key-file: /path/to/spire/etcd/svid_key.pem
  trusted-ca-file: /path/to/spire/etcd/ca.crt
data-dir: /path/to/etcd/NODE_ID_1-INDEX
initial-advertise-peer-urls: https://NODE_ID_1:2380
initial-cluster: NODE_ID_2=https://NODE_ID_2:2380,NODE_ID_3=https://NODE_ID_3:2380,NODE_ID_4=https://NODE_ID_4:2380,NODE_ID_1=https://NODE_ID_1:2380
initial-cluster-state: existing
initial-cluster-token: "TOKEN_HASH"
listen-client-urls: https://PRIVATE_IP:2379
listen-peer-urls: https://PRIVATE_IP:2380
name: NODE_ID_1-INDEX
peer-transport-security:
  auto-tls: false
  cert-file: /path/to/spire/etcd/svid.pem
  client-cert-auth: true
  client-cert-file: /path/to/spire/etcd/svid.pem
  client-key-file: /path/to/spire/etcd/svid_key.pem
  key-file: /path/to/spire/etcd/svid_key.pem
  trusted-ca-file: /path/to/spire/etcd/ca.crt
strict-reconfig-check: true
tls-max-version: TLS1.3
tls-min-version: TLS1.2

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

[user@node-name ~]$  etcdctl member list
5996a09fdc632a12, started, 181140266914826-159, https://181140266914826:2380, https://181140266914826:2379, false
5cda505f2ab9e012, started, 181140269087284-164, https://181140269087284:2380, https://181140269087284:2379, true
d095159cb78e6c13, started, 181140264581536-3, https://181140264581536:2380, https://181140264581536:2379, false
d62fc83d0070fce5, started, 181140266590820-4, https://181140266590820:2380, https://181140266590820:2379, false

(Note: 5cda505f2ab9e012 is the new Learner that is panicking).

[user@node-name ~]$ etcdctl endpoint status
{"level":"warn","ts":"2026-01-21T00:36:01.419001-0800","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00063c3c0/181140264581536:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing: dial tcp 10.14.19.214:2379: connect: connection refused\""}
Failed to get the status of endpoint https://181140269087284:2379 (context deadline exceeded)
https://181140264581536:2379, d095159cb78e6c13, 3.5.25, 44 MB, false, false, 9, 5165011, 5165011,
https://181140266590820:2379, d62fc83d0070fce5, 3.5.25, 44 MB, false, false, 9, 5165011, 5165011,
https://181140266914826:2379, 5996a09fdc632a12, 3.5.25, 44 MB, true, false, 9, 5165011, 5165011,
Error: exit status 1

Relevant log output

TLDR

Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.062757-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"newRaft 5cda505f2ab9e012 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.090441-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"5cda505f2ab9e012 [term: 0] received a MsgHeartbeat message with higher term from 5996a09fdc632a12 [term: 9]"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"panic","ts":"2026-01-20T22:49:36.090505-0800","logger":"raft","caller":"etcdserver/zap_raft.go:101","msg":"tocommit(5099750) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?"}


Complete etcd logs: 

Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.062749-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"5cda505f2ab9e012 became follower at term 0"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.062757-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"newRaft 5cda505f2ab9e012 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"warn","ts":"2026-01-20T22:49:36.062990-0800","caller":"auth/store.go:1241","msg":"simple token is not cryptographically signed"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.063968-0800","caller":"mvcc/kvstore.go:425","msg":"kvstore restored","current-rev":1}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.064040-0800","caller":"etcdserver/server.go:628","msg":"restore consistentIndex","index":0}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.064287-0800","caller":"etcdserver/quota.go:94","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.064966-0800","caller":"etcdserver/server.go:875","msg":"starting etcd server","local-member-id":"5cda505f2ab9e012","local-server-version":"3.5.25","cluster-version":"to_be_decided"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.065021-0800","caller":"etcdserver/server.go:775","msg":"starting initial election tick advance","election-ticks":10}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.065091-0800","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/data/etcd/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.065161-0800","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/data/etcd/member/snap","suffix":"snap","max":5,"interval":"30s"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.065174-0800","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/data/etcd/member/wal","suffix":"wal","max":5,"interval":"30s"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.065649-0800","caller":"v3rpc/health.go:61","msg":"grpc service status changed","service":"","status":"SERVING"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.067347-0800","caller":"embed/etcd.go:762","msg":"starting with client TLS","tls-info":"cert = /etc/spire/etcd/svid.pem, key = /etc/spire/etcd/svid_key.pem, client-cert=/etc/spire/etcd/svid.pem, client-key=/etc/spire/etcd/svid_key.pem, trusted-ca = /etc/spire/etcd/ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":["TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256","TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256","TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256","TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256","TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384","TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256"]}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.067464-0800","caller":"embed/etcd.go:633","msg":"serving peer traffic","address":"10.0.0.1:25688"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.067495-0800","caller":"embed/etcd.go:603","msg":"cmux::serve","address":"10.0.0.1:25688"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.067580-0800","caller":"embed/etcd.go:292","msg":"now serving peer/client/metrics","local-member-id":"5cda505f2ab9e012","initial-advertise-peer-urls":["https://etcd-node-1:25688"],"listen-peer-urls":["https://10.0.0.1:25688"],"advertise-client-urls":["https://etcd-node-1:25687"],"listen-client-urls":["https://10.0.0.1:25687"],"listen-metrics-urls":[]}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.090257-0800","caller":"rafthttp/pipeline.go:72","msg":"started HTTP pipelining with remote peer","local-member-id":"5cda505f2ab9e012","remote-peer-id":"5996a09fdc632a12"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.090294-0800","caller":"rafthttp/transport.go:286","msg":"added new remote peer","local-member-id":"5cda505f2ab9e012","remote-peer-id":"5996a09fdc632a12","remote-peer-urls":["https://etcd-node-2:25688"]}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.090441-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"5cda505f2ab9e012 [term: 0] received a MsgHeartbeat message with higher term from 5996a09fdc632a12 [term: 9]"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"info","ts":"2026-01-20T22:49:36.090474-0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"5cda505f2ab9e012 became follower at term 9"}
Jan 20 22:49:36 node-3 etcd-member[859922]: {"level":"panic","ts":"2026-01-20T22:49:36.090505-0800","logger":"raft","caller":"etcdserver/zap_raft.go:101","msg":"tocommit(5099750) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.(*zapRaftLogger).Panicf\n\tgo.etcd.io/etcd/server/v3/etcdserver/zap_raft.go:101\ngo.etcd.io/etcd/raft/v3.(*raftLog).commitTo\n\tgo.etcd.io/etcd/raft/[email protected]/log.go:237\ngo.etcd.io/etcd/raft/v3.(*raft).handleHeartbeat\n\tgo.etcd.io/etcd/raft/[email protected]/raft.go:1508\ngo.etcd.io/etcd/raft/v3.stepFollower\n\tgo.etcd.io/etcd/raft/[email protected]/raft.go:1434\ngo.etcd.io/etcd/raft/v3.(*raft).Step\n\tgo.etcd.io/etcd/raft/[email protected]/raft.go:975\ngo.etcd.io/etcd/raft/v3.(*node).run\n\tgo.etcd.io/etcd/raft/[email protected]/node.go:356"}
Jan 20 22:49:36 node-3 etcd-member[859922]: panic: tocommit(5099750) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
Jan 20 22:49:36 node-3 etcd-member[859922]: goroutine 168 [running]:
Jan 20 22:49:36 node-3 etcd-member[859922]: go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc00054a000, {0x0, 0x0, 0x0})
Jan 20 22:49:36 node-3 etcd-member[859922]:         go.uber.org/[email protected]/zapcore/entry.go:234 +0x2dc
Jan 20 22:49:36 node-3 etcd-member[859922]: go.uber.org/zap.(*SugaredLogger).log(0xc0001a02a8, 0x4, {0x1263b0f?, 0x47627b?}, {0xc00053c0c0?, 0x1005680?, 0xc000139000?}, {0x0, 0x0, 0x0})
Jan 20 22:49:36 node-3 etcd-member[859922]:         go.uber.org/[email protected]/sugar.go:227 +0xec
Jan 20 22:49:36 node-3 etcd-member[859922]: go.uber.org/zap.(*SugaredLogger).Panicf(...)
Jan 20 22:49:36 node-3 etcd-member[859922]:         go.uber.org/[email protected]/sugar.go:159
Jan 20 22:49:36 node-3 etcd-member[859922]: go.etcd.io/etcd/server/v3/etcdserver.(*zapRaftLogger).Panicf(0x4dd0e6?, {0x1263b0f?, 0xc000139160?}, {0xc00053c0c0?, 0xc00000ef60?, 0x11af300?})
Jan 20 22:49:36 node-3 etcd-member[859922]:         go.etcd.io/etcd/server/v3/etcdserver/zap_raft.go:101 +0x45
Jan 20 22:49:36 node-3 etcd-member[859922]: go.etcd.io/etcd/raft/v3.(*raftLog).commitTo(0xc00037e8c0, 0x4dd0e6)
Jan 20 22:49:36 node-3 etcd-member[859922]:         go.etcd.io/etcd/raft/[email protected]/log.go:237 +0xf2
Jan 20 22:49:36 node-3 etcd-member[859922]: go.etcd.io/etcd/raft/v3.(*raft).handleHeartbeat(_, {0x8, 0x5cda505f2ab9e012, 0x5996a09fdc632a12, 0x9, 0x0, 0x0, {0x0, 0x0, 0x0}, ...})
Jan 20 22:49:36 node-3 etcd-member[859922]:         go.etcd.io/etcd/raft/[email protected]/raft.go:1508 +0x36
Jan 20 22:49:36 node-3 etcd-member[859922]: go.etcd.io/etcd/raft/v3.stepFollower(_, {0x8, 0x5cda505f2ab9e012, 0x5996a09fdc632a12, 0x9, 0x0, 0x0, {0x0, 0x0, 0x0}, ...})
Jan 20 22:49:36 node-3 etcd-member[859922]:         go.etcd.io/etcd/raft/[email protected]/raft.go:1434 +0x3b8
Jan 20 22:49:36 node-3 etcd-member[859922]: go.etcd.io/etcd/raft/v3.(*raft).Step(_, {0x8, 0x5cda505f2ab9e012, 0x5996a09fdc632a12, 0x9, 0x0, 0x0, {0x0, 0x0, 0x0}, ...})
Jan 20 22:49:36 node-3 etcd-member[859922]:         go.etcd.io/etcd/raft/[email protected]/raft.go:975 +0x1295
Jan 20 22:49:36 node-3 etcd-member[859922]: go.etcd.io/etcd/raft/v3.(*node).run(0xc000183920)
Jan 20 22:49:36 node-3 etcd-member[859922]:         go.etcd.io/etcd/raft/[email protected]/node.go:356 +0x925
Jan 20 22:49:36 node-3 etcd-member[859922]: created by go.etcd.io/etcd/raft/v3.RestartNode in goroutine 1
Jan 20 22:49:36 node-3 etcd-member[859922]:         go.etcd.io/etcd/raft/[email protected]/node.go:244 +0x239
Jan 20 22:49:36 node-3 systemd[1]: etcd-member.service: Main process exited, code=exited, status=2/INVALIDARGUMENT