Describe the bug
During cluster setup, the shard id gets established through cluster message extension data. For backwards compatibility reasons, this is delayed until it is established that the node can properly receive these extensions, leading to a propagation delay for the shard ID. When an engine crashes or restarts before the shard ID has stabilized, the config file can become corrupted, leading to failure to restart the engine.
To reproduce
Set up a cluster and then immediately restart a node. It will (flakily) fail to restart due to a corrupted nodes.conf file - either because the replicas do not agree on the shard ID, or there is a shard ID mismatch.
Expected behavior
Engine restarts successfully.
Additional information
Related to this PR - #573