[NEW] Faster cluster failover

On very fast networks, we don't need the hard-coded 500 ms delay. Can we make these hard-coded numbers relative to the configured node timeout?

The goal is to reduce downtime during an automatic failover.

```C
        server.cluster->failover_auth_time = now +
                                             500 +           /* Fixed delay of 500 milliseconds, let FAIL msg propagate. */
                                             random() % 500; /* Random delay between 0 and 500 milliseconds. */
```

```C
        /* We add another delay that is proportional to the replica rank.
         * Specifically 1 second * rank. This way replicas that have a probably
         * less updated replication offset, are penalized. */
        server.cluster->failover_auth_time += server.cluster->failover_auth_rank * 1000;
```
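A rough sketch of what I have in mind, not a patch: derive each delay from the node timeout and cap it at today's value, so the default behavior stays the same. The helper names (`scaled_delay`, `failover_auth_time_sketch`) and the divisors 30 and 15 are assumptions for illustration only; none of this exists in the code base, and the random value is passed in as a parameter just to keep the example deterministic.

```C
/* Hypothetical helper (not existing code): scale a delay from the node
 * timeout, never exceeding the current hard-coded value `cap`. */
static long long scaled_delay(long long node_timeout, long long divisor, long long cap) {
    long long d = node_timeout / divisor;
    if (d > cap) d = cap;
    if (d < 1) d = 1; /* Keep the value usable as a modulus. */
    return d;
}

/* Hypothetical sketch of the election delay computation.
 * `rnd` stands in for the result of random(). */
static long long failover_auth_time_sketch(long long now, long long node_timeout,
                                           int rank, long long rnd) {
    /* Assumed ratios: timeout/30 for the fixed and random parts,
     * timeout/15 for the per-rank penalty. */
    long long fixed_delay = scaled_delay(node_timeout, 30, 500);
    long long random_range = scaled_delay(node_timeout, 30, 500);
    long long rank_delay = scaled_delay(node_timeout, 15, 1000);

    return now + fixed_delay      /* Let the FAIL message propagate. */
           + rnd % random_range   /* Desynchronize competing replicas. */
           + rank * rank_delay;   /* Penalize replicas with a worse rank. */
}
```

With the default `cluster-node-timeout` of 15000 ms, these ratios reproduce the current 500 / 500 / 1000 ms values. With a node timeout of 1500 ms they would shrink to 50 / 50 / 100 ms.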

@madolson @enjoy-binbin @hpatro Am I missing anything?

[NEW] Faster cluster failover #2023
