A possible fix for 2026-04-29/30 BAD BLOCK incidents

Diagnosis: go-pulse path-mode shares a layer tree between canonical and side-chain newPayload

Orphan diff layers pollute the per-account lookup index that canonical reads walk.

Timeline of the morning incident (UTC)

Time Event
07:19:45 Canonical block 26,414,494 (hash 0xfc68e06b…e38447, root 0x622977…6d8716) imported via engine_newPayloadV3. Goes through BlockChain.InsertBlockWithoutSetHead → insertChain(setHead=false) → processBlock → writeBlockWithState. statedb.Commit calls triedb.Update(622977…, parent=3d1816…, …), which adds it as a new diff layer in pathdb's shared layerTree. A subsequent engine_forkchoiceUpdatedV3 advances head to it.
07:19:55 Canonical block 26,414,495 (hash 0xef9a13…e625ca, root 0x565391…2a6224) imported the same way. Diff layer added on top of canon494. FCU advances head. lookup.accounts[0x0CFd…hash] now contains […, canon494_root, canon495_root] (block 26,414,495 included a tx from this address with nonce 0x4419d0).
07:19:56 Late-arriving orphan payload for height 26,414,494 (hash 0xcf16e4…b86508, parent 0x3d1816…e1a7e0, the same parent as canon494). Same code path: processBlock(setHead=false) → writeBlockWithState → statedb.Commit → triedb.Update(R_orphan, parent=3d1816…). tree.add links the orphan as a sibling of canon494 under the same parent layer. lookup.addLayer(orphan) appends R_orphan to lookup.accounts[hash] for every account the orphan touched. Geth correctly does NOT fire "Chain head was updated"; head stays at canon495.
07:19:56 Layer tree state: disk → … → canon493 → {canon494 → canon495, R_orphan}. tree.descendants[R_orphan] = ∅. The descendant-isolation gate in lookup.accountTip is supposed to keep canon495 reads from ever resolving to R_orphan. Per the incident, this gate is brittle in practice (matches the empirical conclusion in the pulse-reth diagnosis: shared mutable state across canonical/side-chain execution).
07:20:15 Block 26,414,496 (hash 0x30a860bc…121aec, parent canon495) starts processing. state.New(canon495_root) opens a multiStateReader. Prefetcher kicks off; for some account (e.g. 0x6c958a91…), OpenStorageTrie(canon495_root, owner, expectedStorageRoot=0x0f1d11…) walks pathdb to read node (owner, []). The walk returns a node with hash 0xb99efe…3842502b… instead of 0x0f1d11…. Geth fires ERROR Unexpected trie node location=diff and WARN Trie prefetcher failed opening storage trie … missing trie node. Cascade across ~274 affected storage tries. The "got" hash is the orphan-branch's storage root for that account, leaked through the polluted lookup index.
07:20:35 EVM applies tx 0 (0x116dbb4d…) of block 26,414,496: from=0x0CFd4b2BC70dd20E9e040E67Fc26C9cc4309192A nonce=4,463,057. state_transition.preCheck calls statedb.GetNonce(from); the read resolves through the same poisoned reader → returns 4,463,056 (canon494's view, missing canon495's tx with nonce 0x4419d0 that should have bumped the account to 4,463,057). EVM rejects: nonce too high: address 0x0CFd…192A, tx: 4463057 state: 4463056. processor.Process returns the error → bc.reportBlock fires ########## BAD BLOCK ######### for 26,414,496. NewPayload returns INVALID to prysm.
07:20:35 → ~08:50 Every retry of 26,414,496 hits the same pre-checked invalid mark (Skip duplicated bad block); every newer canonical block on top arrives with parent=26,414,496 and is rejected with unknown ancestor / bad ancestor. Validator misses ~1h15m of attestations. Manual systemctl restart geth-pulse-validator.service clears in-memory pathdb layer tree + invalid-blocks cache; re-sync from disk recovers, head catches up. Same failure mode recurs at the next equivocation event.
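
The index pollution described in the 07:19:56 rows can be illustrated with a toy model. This is a minimal sketch, not go-pulse or pathdb code: the tree, add, isAncestorOrSelf, and nonceAt names, the gate flag, and the nonce values 55 to 57 are all hypothetical. It also assumes the orphan touched the same account as the canonical blocks. It shows only why a per-account index shared across sibling layers returns a side-chain value when the ancestry check is skipped or fails.

```go
package main

import "fmt"

// layer is a toy diff layer: a state root, its parent root, and the
// account nonces it changed. It is NOT go-ethereum's pathdb type.
type layer struct {
	root, parent string
	nonces       map[string]uint64
}

// tree is a toy stand-in for a shared layer tree plus a per-account lookup
// index (account -> roots of the layers that modified it, in insertion order).
type tree struct {
	layers map[string]*layer
	lookup map[string][]string
}

func newTree() *tree {
	return &tree{layers: map[string]*layer{}, lookup: map[string][]string{}}
}

// add links a new layer under its parent and appends its root to the index
// of every account it touched, whether or not the layer is canonical.
func (t *tree) add(root, parent string, nonces map[string]uint64) {
	t.layers[root] = &layer{root: root, parent: parent, nonces: nonces}
	for acct := range nonces {
		t.lookup[acct] = append(t.lookup[acct], root)
	}
}

// isAncestorOrSelf reports whether candidate lies on the parent chain of head.
func (t *tree) isAncestorOrSelf(candidate, head string) bool {
	for r := head; r != ""; {
		if r == candidate {
			return true
		}
		l, ok := t.layers[r]
		if !ok {
			return false
		}
		r = l.parent
	}
	return false
}

// nonceAt resolves an account nonce as seen from head. With gate=true it
// skips index entries that are not on head's ancestry; with gate=false it
// trusts the newest index entry, which is the hypothetical failure mode.
func (t *tree) nonceAt(head, acct string, gate bool) (uint64, bool) {
	roots := t.lookup[acct]
	for i := len(roots) - 1; i >= 0; i-- {
		if gate && !t.isAncestorOrSelf(roots[i], head) {
			continue
		}
		return t.layers[roots[i]].nonces[acct], true
	}
	return 0, false
}

func main() {
	tr := newTree()
	tr.add("canon493", "", map[string]uint64{"acct": 55})
	tr.add("canon494", "canon493", map[string]uint64{"acct": 56})
	tr.add("canon495", "canon494", map[string]uint64{"acct": 57})
	tr.add("orphan", "canon493", map[string]uint64{"acct": 56}) // late sibling of canon494

	leaked, _ := tr.nonceAt("canon495", "acct", false)
	gated, _ := tr.nonceAt("canon495", "acct", true)
	fmt.Println(leaked, gated)
}
```

In this model the ungated read from canon495 resolves to the orphan's entry and reports the stale nonce, while the gated read reports canon495's own value. The real gate is more involved; the point is only that correctness of every canonical read depends on it.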

Why erigon on the same host stayed at tip

Stage-sync holds candidate state in memory until forkchoice commits it. The orphan's engine_newPayload was validated, found non-canonical (no FCU), and dropped — never written to the staged state. No shared layer to corrupt.
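
The isolation property described above can be sketched as follows. This is a toy model and not Erigon code: stagedState, NewPayload, ForkchoiceUpdated, and Nonce are hypothetical names, and discarding every other pending candidate on a forkchoice update is a simplification made for this sketch.

```go
package main

import "fmt"

// stagedState is a toy model: candidate writes stay in per-block buffers
// and only reach `committed` when a forkchoice update selects the block.
type stagedState struct {
	committed map[string]uint64            // account -> nonce seen by canonical reads
	pending   map[string]map[string]uint64 // block hash -> buffered writes
}

func newStagedState() *stagedState {
	return &stagedState{
		committed: map[string]uint64{},
		pending:   map[string]map[string]uint64{},
	}
}

// NewPayload buffers a validated candidate without touching committed state.
func (s *stagedState) NewPayload(hash string, writes map[string]uint64) {
	s.pending[hash] = writes
}

// ForkchoiceUpdated commits the chosen candidate and discards the rest.
// It returns false when the hash has no buffered candidate.
func (s *stagedState) ForkchoiceUpdated(hash string) bool {
	writes, ok := s.pending[hash]
	if !ok {
		return false
	}
	for acct, nonce := range writes {
		s.committed[acct] = nonce
	}
	s.pending = map[string]map[string]uint64{}
	return true
}

// Nonce reads only committed state, so buffered candidates are invisible.
func (s *stagedState) Nonce(acct string) uint64 {
	return s.committed[acct]
}

func main() {
	s := newStagedState()
	s.NewPayload("canon", map[string]uint64{"acct": 57})
	s.ForkchoiceUpdated("canon")
	s.NewPayload("orphan", map[string]uint64{"acct": 56}) // never selected by forkchoice
	fmt.Println(s.Nonce("acct"))
}
```

Because reads never consult the pending buffers, the orphan's writes cannot influence the canonical view in this model, whatever order the payloads arrive in.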

The bugfix:

core: skip statedb commit for side-chain newPayload to prevent pathdb state leak

Equivocating beacon proposers cause two newPayload calls at the same height; previously the orphan's diff layer was committed into the shared pathdb and corrupted canonical reads on the next block (BAD BLOCK / nonce too high). Validate-only path now writes header/body/receipts via writeBlockWithReceipts; FCU promotion re-executes via existing recoverAncestors.
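
The commit policy in that message can be modelled with a short sketch. Everything here is hypothetical: node, newPayload, and forkchoiceUpdated are toy names rather than go-pulse APIs, and the side-chain test (the payload's parent is not the current head) is an assumption, since the patch's actual criterion is not stated here. The rebuild loop only stands in for the re-execution that the message attributes to recoverAncestors.

```go
package main

import "fmt"

// block is a toy payload: identifiers only, no transactions.
type block struct{ hash, parent string }

// node is a toy execution client that models only the commit policy.
type node struct {
	head   string
	blocks map[string]block // stand-in for header/body/receipts, always stored
	layers map[string]bool  // blocks whose state was committed to the shared tree
}

func newNode(genesis string) *node {
	return &node{
		head:   genesis,
		blocks: map[string]block{genesis: {hash: genesis}},
		layers: map[string]bool{genesis: true},
	}
}

// newPayload validates a block. ASSUMPTION: a payload counts as side-chain
// when its parent is not the current head; the real patch may differ.
func (n *node) newPayload(b block) {
	n.blocks[b.hash] = b // the validate-only path still keeps the block data
	if b.parent == n.head {
		n.layers[b.hash] = true // head extension: commit state as before
	}
}

// forkchoiceUpdated promotes hash to head, rebuilding any state that was
// skipped at newPayload time, oldest missing ancestor first.
func (n *node) forkchoiceUpdated(hash string) error {
	var missing []string
	for h := hash; !n.layers[h]; {
		b, ok := n.blocks[h]
		if !ok {
			return fmt.Errorf("unknown block %s", h)
		}
		missing = append(missing, h)
		h = b.parent
	}
	for i := len(missing) - 1; i >= 0; i-- {
		n.layers[missing[i]] = true // stand-in for re-executing the block
	}
	n.head = hash
	return nil
}

func main() {
	n := newNode("b493")
	n.newPayload(block{"b494", "b493"})
	_ = n.forkchoiceUpdated("b494")
	n.newPayload(block{"b495", "b494"})
	_ = n.forkchoiceUpdated("b495")
	n.newPayload(block{"orphan494", "b493"}) // late sibling of b494

	fmt.Println(n.layers["orphan494"], n.head)
}
```

In this model the late sibling is stored but never gets a state layer, so there is nothing for canonical reads to pick up. Promoting it later through forkchoiceUpdated rebuilds its state on demand. Under the assumed criterion, two competing payloads that both arrive before any forkchoice update would both be committed, so the real patch likely needs a stricter test than this sketch uses.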
