Commit 525551a
When the primary changes the config epoch and then goes down immediately,
the replica may not update the config epoch in time. Although we
broadcast the change in the cluster (see #1813), there may be a race in
the network or in the code. In this case, the replica will never finish
the failover, since other primaries will refuse to vote because the
replica's slot config epoch is old.
We need a way to let the replica finish the failover in this case.
When the primary refuses to vote because the replica's config epoch is
less than the dead primary's config epoch, it can send an UPDATE packet
to the replica to inform the replica about the dead primary. The UPDATE
message contains information about the dead primary's config epoch and
owned slots. The current failover attempt will still time out, but the
replica can then retry with the updated config epoch and succeed.
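
The refuse-then-update flow described above can be sketched in a small,
self-contained C model. This is only an illustration under stated
assumptions, not the code changed by this commit: node_t, voter_view_t,
vote_request_t, handle_vote_request and apply_update are hypothetical
names, and the slot count is reduced to a toy value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SLOTS 8 /* toy value for this sketch, not the real slot count */

/* Hypothetical, simplified node record. */
typedef struct {
    const char *name;
    uint64_t config_epoch; /* epoch under which the node owns its slots */
} node_t;

/* The voting primary's view of which node owns each slot. */
typedef struct {
    const node_t *slot_owner[NUM_SLOTS];
} voter_view_t;

/* What the replica puts in its vote request. */
typedef struct {
    uint64_t claimed_epoch;        /* replica's view of its primary's epoch */
    bool claimed_slots[NUM_SLOTS]; /* slots the replica wants to take over */
} vote_request_t;

/* Returns true if the vote is granted. If the vote is refused because a
 * claimed slot is owned under a newer config epoch, *update_about is set
 * to that owner so the caller can send an UPDATE about it to the replica. */
bool handle_vote_request(const voter_view_t *view,
                         const vote_request_t *req,
                         const node_t **update_about) {
    *update_about = NULL;
    for (int s = 0; s < NUM_SLOTS; s++) {
        if (!req->claimed_slots[s]) continue;
        const node_t *owner = view->slot_owner[s];
        if (owner == NULL || owner->config_epoch <= req->claimed_epoch) continue;
        /* The replica's epoch is stale for this slot: refuse, and report
         * which node the replica needs to learn about. */
        *update_about = owner;
        return false;
    }
    return true;
}

/* Replica side: adopt the newer epoch carried by the UPDATE so that the
 * next vote request is built from current information. */
void apply_update(vote_request_t *next_req, const node_t *about) {
    assert(next_req != NULL && about != NULL);
    if (about->config_epoch > next_req->claimed_epoch) {
        next_req->claimed_epoch = about->config_epoch;
    }
}
```

In this model, a replica that claims epoch 4 for slots owned by a dead
primary at epoch 5 is refused on the first request; after apply_update
the same request carries epoch 5 and is granted on the retry.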
Fixes #2169.
---------
Signed-off-by: Ran Shidlansik <[email protected]>
1 parent 5dc6632 commit 525551a
1 file changed, 15 additions and 3 deletions.
[Diff content not preserved. Hunk 1: original lines 3145-3153, 3 lines removed and 5 added. Hunk 2: original lines 4080-4085, 10 lines added.]