-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Description
Describe the bug
I set up a 3 nodes Vault cluster (only servers) with the integrated raft storage backend. With retry_join stanza in configuration.
Since version 1.11.1, the cluster doesn't initialize properly. Only the first node is initialized, the other nodes hangs because they havent find an available leader early, and they do not retry infinitely (Only 2 times), waiting for the first node to be initialized.
To Reproduce
Steps to reproduce the behavior:
- Run a vault agents on 3 VMs
- Run
vault initon the first node - Run
vault unsealon the first node - Wait for the other nodes to be initialized (it fails here, only the first node is initialized)
Expected behavior
At the step 3, all three nodes of the cluster should be initialized, and nodes 2 and 3 could be unsealed next.
In orther words, nodes 2 and 3 should always retry to join the cluster for the initialization process. Not only 2 times.
Possible workaround
After the first node is initilized, restarting vault agents on nodes 2 and 3 will succesfully terminate the cluster initialization. This is because we are forcing the retry join with a vault agent restart. And at this time, the first node is initialized and unsealed, so it's a leader available to init other nodes.
Environment:
- Vault Server Version (retrieve with
vault status): 1.11.1 - Vault CLI Version (retrieve with
vault version): 1.11.1 - Server Operating System/Architecture: Ubuntu 22.04
Additional context
I've tested with vault v1.11.0 and I do not experiment this issue. It's seems that the last security patch has introduced this issue.