Bor sync wedges on empty block after producer restart — insertSideChain same-stateRoot false-positive
Summary
After restarting a Bor validator whose local head is an empty (no-transaction) block produced just before the restart, Bor can refuse to sync past that block, dropping every peer with:
```
WARN Sidechain ghost-state attack detected number=N sideroot=X canonroot=X
WARN Synchronisation failed, dropping peer err="retrieved hash chain is invalid: sidechain ghost-state attack" mode=full
```
sideroot and canonroot are the same value: the check at core/blockchain.go:3562 is firing on identical state roots. The logic inherited from upstream go-ethereum treats same-root + different-hash as evidence of a shadow-state attack, which only makes sense when the block performed a state transition. For an empty block (gasUsed=0, transactions=[]) the state root is simply the parent's, so two distinct empty headers at the same height on the same parent legitimately share a state root while having different hashes and seals.
A node in this state cannot recover by reconnecting, restarting Bor, or by debug_setHead to a block immediately behind the offending height — the side-block record persists in the database and re-triggers the same check on the next sync attempt.
Environment
- Bor v2.8.0-beta
- Devnet spawned via kurtosis-pos, 9 L2 nodes:
  - v1–v5: bor + heimdall-v2 validators
  - v6–v8: bor RPC nodes
  - v9: erigon RPC node
- Heimdall span at the time of the incident (span id 11, blocks 1408–1535) had a single entry in selected_producers (val_id 3). With a small validator set and weighted span selection, one validator can end up as the sole producer for a span.
Reproduction
- Spawn a devnet large enough that some spans have a single selected_producers entry (the kurtosis-pos default tends to produce this with small validator counts).
- Wait for an active span where one validator (call it vP) is the sole producer.
- Stop vP's bor EL and a second validator's bor EL (vS) within a second of each other, with vS's last canonical head being an empty block recently produced by vP.
- After a few seconds, start both back up.
Expected: vS resyncs to chain head.
Observed: vS stays stuck at its last pre-stop block (block number N). docker logs for vS's bor container show the warn/drop pair above for every peer it talks to, plus a continuous stream of "Whitelisting milestone deferred err=chain out of sync" from the Heimdall WS subscription.
Concrete trace from our run
Captured artifacts: commands.out, chaos.out.txt, reports/test-20260513-131241-test-1778692361/. Key timeline:
| Time (EDT) | Event |
|---|---|
| 13:19:04 | Stop l2-el-2-bor-heimdall-v2-validator (v2). v2's local head: block 1441, produced by v3, stateRoot 0x8f58ec…2dcc11, transactions: [], gasUsed: 0. |
| 13:19:11 | Stop l2-el-3-bor-heimdall-v2-validator (v3). v3 was the sole producer for span 11. |
| 13:22:31 | Start v2. |
| 13:22:37 | Start v3. v3 resumes producing 1442, 1443, … |
| 13:34:41 onward | v2 logs Sidechain ghost-state attack detected number=1441 sideroot=8f58ec..2dcc11 canonroot=8f58ec..2dcc11 against every peer, drops them, and stays stuck at 1441 while v1/v3/v4/v5 advance past 2100. |
Block 1441 on v2 (its local canonical):
```
number        1441
hash          0xa8397a7be178763f882deb7f893b28c326cc4ddf038296a073cc5e6ea597826e
parentHash    0xb1ba2c9854ea1d575173714ebcb730e0e4c6f385702b6cd81a00250d38a69f69
stateRoot     0x8f58ec83616416cdae4e71306e5ef9f7482facf9328957b8c28d6468332dcc11
transactions  []
gasUsed       0
extraData     0x626f722d3300… ("bor-3" vanity, then seal)
```
The block 1441 held by the rest of the cluster shares parentHash, stateRoot, transactionsRoot and receiptsRoot with v2's block (the state did not change), but it has a different timestamp and extraData seal (likely because a succession-1+ backup producer signed it after v3 stopped), so its block hash differs.
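When triaging a suspected case, a small diagnostic along these lines can confirm that two headers at the same height differ only in their seal. It is a sketch over a hypothetical hdr struct with string-typed hashes; the parentHash and stateRoot values come from v2's block 1441 above, while the transaction/receipt roots, timestamps and seals are placeholders, since the report does not list the cluster block's exact values.

```go
package main

import (
	"bytes"
	"fmt"
)

// hdr is a hypothetical subset of header fields (hex strings for brevity),
// enough to separate a "same state, different seal" sibling from a real divergence.
type hdr struct {
	ParentHash  string
	Root        string
	TxHash      string
	ReceiptHash string
	Time        uint64
	Extra       []byte
}

// diffFields names the fields that differ between a and b.
func diffFields(a, b hdr) []string {
	var d []string
	if a.ParentHash != b.ParentHash {
		d = append(d, "parentHash")
	}
	if a.Root != b.Root {
		d = append(d, "stateRoot")
	}
	if a.TxHash != b.TxHash {
		d = append(d, "transactionsRoot")
	}
	if a.ReceiptHash != b.ReceiptHash {
		d = append(d, "receiptsRoot")
	}
	if a.Time != b.Time {
		d = append(d, "timestamp")
	}
	if !bytes.Equal(a.Extra, b.Extra) {
		d = append(d, "extraData")
	}
	return d
}

// sealOnlyDiff is true when the headers differ, but only in timestamp and/or
// extraData: same parent, same state, same transactions and receipts.
func sealOnlyDiff(a, b hdr) bool {
	d := diffFields(a, b)
	if len(d) == 0 {
		return false // identical headers, hence the same block
	}
	for _, f := range d {
		if f != "timestamp" && f != "extraData" {
			return false
		}
	}
	return true
}

func main() {
	local := hdr{
		ParentHash:  "0xb1ba2c9854ea1d575173714ebcb730e0e4c6f385702b6cd81a00250d38a69f69",
		Root:        "0x8f58ec83616416cdae4e71306e5ef9f7482facf9328957b8c28d6468332dcc11",
		TxHash:      "txRoot",      // placeholder
		ReceiptHash: "receiptRoot", // placeholder
		Time:        1000,          // placeholder
		Extra:       []byte("seal-A"),
	}
	peer := local
	peer.Time, peer.Extra = 1002, []byte("seal-B")
	fmt.Println(diffFields(local, peer), sealOnlyDiff(local, peer)) // [timestamp extraData] true
}
```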
Root cause
core/blockchain.go, insertSideChain (current develop):
```go
// blockchain.go:3534
func (bc *BlockChain) insertSideChain(block *types.Block, it *insertIterator, makeWitness bool) (*stateless.Witness, int, error) {
	...
	for ; block != nil && errors.Is(err, consensus.ErrPrunedAncestor); block, err = it.next() {
		headers = append(headers, block.Header())
		if number := block.NumberU64(); current.Number.Uint64() >= number {
			canonical := bc.GetBlockByNumber(number)
			if canonical != nil && canonical.Hash() == block.Hash() {
				// re-import of a canon block, fine
				continue
			}
			if canonical != nil && canonical.Root() == block.Root() { // <-- false positive
				log.Warn("Sidechain ghost-state attack detected", "number", block.NumberU64(),
					"sideroot", block.Root(), "canonroot", canonical.Root())
				return nil, it.index, errors.New("sidechain ghost-state attack")
			}
		}
		...
	}
```
This is upstream go-ethereum's pre-merge defense against attackers side-mining to a height where state was pruned and substituting their block to bypass full state verification. The premise is that a benign sidechain block at height N would produce a different state root from the canonical chain (because the txs at heights <= N differ across the two chains).
That premise doesn't hold for Bor:
- Bor produces empty blocks (no txs) during quiet periods. For an empty block the post-state equals the pre-state, so block.Root() == parent.Root().
- Bor's sprint/succession model lets a backup producer sign a block at the same height as the primary if the primary is missing. Two empty headers that share a parent and were produced by different signers (or even by the same signer with a different timestamp) share a state root while having different hashes and seals.
So same-root + different-hash is a normal, expected condition in Bor, not an attack signal. The check turns it into a hard sync failure and drops the offering peer for that sync attempt. Because every honest peer offers the network's canonical chain, which conflicts with the node's stale local head, the node has no peer it can sync from.
Why debug_setHead to height N-2 did not recover the node
We tried debug_setHead 0x59f (=1439) on v2. Logs confirm the rewind:
```
WARN Rewinding blockchain to block target=1439
INFO Truncating from head type=state ohead=1442 tail=427 nhead=1441
INFO Rewound to block with state number=1440 hash=b1ba2c..a69f69
INFO Truncating from head type=state ohead=1441 tail=427 nhead=1440
INFO Rewound to block with state number=1439 hash=280067..874280
INFO Loaded most recent local block number=1439 hash=280067..874280 td=2316 age=47m16s
```
But the warn-drop loop continued. Best guess: the side-block record for the old 1441 hash remains in the block database after SetHead (only the canonical pointer was rewound and state was truncated), and the downloader-side handling re-imports it as a side chain on the next sync attempt before reconciling with peers, so insertSideChain keeps hitting the same condition. We had to rewind well past the divergence to escape — a much larger setHead eventually let the node resync cleanly.
Suggested directions
Cheapest fix: scope the check to non-trivial blocks. If the candidate block carries no state delta vs its parent, this isn't a shadow-state attack pattern.
```go
if canonical != nil && canonical.Root() == block.Root() {
	// Genuine shadow-state attacks substitute a block whose state diverges
	// from the canonical state at the same height. Two empty blocks with
	// the same parent legitimately share a state root; that isn't an
	// attack, just a different seal/timestamp.
	benignEmptySibling := len(block.Transactions()) == 0 && block.GasUsed() == 0 &&
		block.ParentHash() == canonical.ParentHash() &&
		block.TxHash() == canonical.TxHash() &&
		block.ReceiptHash() == canonical.ReceiptHash()
	if !benignEmptySibling {
		log.Warn("Sidechain ghost-state attack detected", "number", block.NumberU64(),
			"sideroot", block.Root(), "canonroot", canonical.Root())
		return nil, it.index, errors.New("sidechain ghost-state attack")
	}
	// Otherwise fall through to normal side-chain insertion.
}
```
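A standalone version of the proposed condition, useful for unit-testing the exemption in isolation. sideHeader and rejects are hypothetical names; the fields mirror the block accessors the patch calls (Hash, ParentHash, Root, TxHash, ReceiptHash, Transactions, GasUsed).

```go
package main

import "fmt"

// sideHeader is a hypothetical flat view of the fields the patch reads.
type sideHeader struct {
	Hash, ParentHash, Root, TxHash, ReceiptHash string
	TxCount                                     int
	GasUsed                                     uint64
}

// rejects reports whether the patched check would still return the
// ghost-state error for side against the canonical block at the same height.
func rejects(canonical *sideHeader, side sideHeader) bool {
	if canonical == nil || canonical.Hash == side.Hash || canonical.Root != side.Root {
		return false // no canonical block, a re-import, or a different root
	}
	benignEmptySibling := side.TxCount == 0 && side.GasUsed == 0 &&
		side.ParentHash == canonical.ParentHash &&
		side.TxHash == canonical.TxHash &&
		side.ReceiptHash == canonical.ReceiptHash
	return !benignEmptySibling
}

func main() {
	canon := &sideHeader{Hash: "A", ParentHash: "P", Root: "R", TxHash: "T0", ReceiptHash: "Rc0"}
	emptySibling := sideHeader{Hash: "B", ParentHash: "P", Root: "R", TxHash: "T0", ReceiptHash: "Rc0"}
	withTxs := sideHeader{Hash: "C", ParentHash: "P", Root: "R", TxHash: "T1", ReceiptHash: "Rc1", TxCount: 3, GasUsed: 63000}
	otherParent := sideHeader{Hash: "D", ParentHash: "Q", Root: "R", TxHash: "T0", ReceiptHash: "Rc0"}
	fmt.Println(rejects(canon, emptySibling), rejects(canon, withTxs), rejects(canon, otherParent))
	// false true true
}
```

Requiring equal parentHash, transactionsRoot and receiptsRoot (rather than only an empty body) keeps the check armed for any side block that reports the canonical root while committing to different content.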
Stronger fix: the upstream check is a pre-merge defense. Bor's consensus model (validator-signed headers, Heimdall-anchored milestones, checkpoint-based finality on L1) doesn't rely on this in-line root match to defend against state-pruning shadow attacks. It may be reasonable to drop the check entirely in insertSideChain for Bor and lean on existing Bor-specific verification (sealer authorization, sprint rules, milestone whitelist) instead. Worth a security review before doing this — flagging as a path the reviewer should weigh in on, not a recommendation.
Either way, the recovery path needs work too: even after debug_setHead puts the head behind the offending block, the side-chain record can re-trigger the same condition on the next sync. Investigating whether SetHead should also evict diverged side-blocks at heights > target would help operators recover without having to rewind a "large number" of blocks past the divergence.
Workaround for operators
debug_setHead to a height well before the divergence (not just one or two blocks behind it). The RPC call may time out on large rewinds; that is expected, and the rewind continues server-side. After the rewind, expect a full resync from that height.
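For scripting the workaround, the request body can be built from the divergence height and a chosen margin. This is a sketch: setHeadPayload and its margin parameter are hypothetical (the report only says "well before" the divergence, not by how much); the method name and the hex-encoded block number match the debug_setHead 0x59f call in the trace.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string   `json:"jsonrpc"`
	Method  string   `json:"method"`
	Params  []string `json:"params"`
	ID      int      `json:"id"`
}

// setHeadPayload builds a debug_setHead request targeting divergence-margin.
func setHeadPayload(divergence, margin uint64) ([]byte, error) {
	if margin > divergence {
		return nil, fmt.Errorf("margin %d exceeds divergence height %d", margin, divergence)
	}
	target := divergence - margin
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		Method:  "debug_setHead",
		Params:  []string{fmt.Sprintf("0x%x", target)}, // block number as a hex quantity
		ID:      1,
	})
}

func main() {
	// margin=2 reproduces the 0x59f (1439) call from the trace, which was NOT enough.
	body, err := setHeadPayload(1441, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	// {"jsonrpc":"2.0","method":"debug_setHead","params":["0x59f"],"id":1}
}
```

The body can be POSTed to a Bor HTTP-RPC endpoint that exposes the debug namespace; pick a margin large enough to land well before the divergence.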
Artifacts
- commands.out: full operator session capture (script(1) format, ANSI included)
- chaos.out.txt: chaos-runner stdout/stderr from the partition scenario
- reports/test-20260513-131241-test-1778692361/: scenario.yaml, report.json, container logs, prom snapshots
- pos--683f2274028c459982665e777b2bcdc9/: full Kurtosis enclave dump