[NEW] Full Sync from Replica #2767

**The problem/use-case that the feature addresses**

Full synchronization causes a large burst in resource consumption on the primary node. Operators often prefer to spend those resources on replicas rather than primaries, to avoid impacting traffic served by the primary.

**Description of the feature**

Introduce a configuration, `repl-prefer-sync-from-replica`, enabled with `CONFIG SET repl-prefer-sync-from-replica yes`.

When the configuration is enabled, `CLUSTER REPLICATE` will first identify an eligible replica of the specified node (if one is available).

The new replica would connect to both the primary and the existing replica. First, the new replica would send `PSYNC` to the existing replica and use a command like `REPLCONF SNAPSHOT-ONLY` during the replication handshake. If the existing replica does not support that option, the new replica would fall back to sending `PSYNC` to the primary. If it does support it, the existing replica would send the RDB snapshot to the new replica. At the same time, the new replica can `PSYNC` from the primary starting at the snapshot's offset (à la dual channel replication).
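The source selection and fallback described above could be sketched roughly as follows. This is only an illustration of the proposed flow, not Valkey code: `Node`, `SyncPlan`, `replconf_snapshot_only`, and `plan_full_sync` are hypothetical names, and treating "eligible" as "link is up" is an assumption, since the proposal does not define eligibility.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    link_up: bool = True
    # Hypothetical capability flag: would this node accept
    # `REPLCONF SNAPSHOT-ONLY` during the handshake?
    supports_snapshot_only: bool = False


@dataclass(frozen=True)
class SyncPlan:
    snapshot_from: str  # node that sends the RDB snapshot
    stream_from: str    # node that serves PSYNC from the snapshot offset


def replconf_snapshot_only(node: Node) -> bool:
    """Stand-in for sending REPLCONF SNAPSHOT-ONLY and checking the reply."""
    return node.supports_snapshot_only


def plan_full_sync(primary: Node, replicas: List[Node], prefer_replica: bool) -> SyncPlan:
    """Decide where the new replica gets its snapshot and its stream."""
    if prefer_replica:
        # Assumption: the first replica with a healthy link is "eligible".
        candidate = next((r for r in replicas if r.link_up), None)
        if candidate is not None and replconf_snapshot_only(candidate):
            # Snapshot from the replica, incremental stream from the primary.
            return SyncPlan(candidate.name, primary.name)
    # Config disabled, no eligible replica, or option unsupported:
    # fall back to a regular full sync from the primary.
    return SyncPlan(primary.name, primary.name)
```

For example, with primary `A` and an existing replica `B` that accepts the option, the plan is a snapshot from `B` and a stream from `A`. If `B` rejects the option, both come from `A`.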

If we want, we could support a non-dual-channel flavor as well, but we would need to tell the primary to hold a pointer to the snapshot's offset in the repl-backlog, so that `PSYNC` does not fail after the snapshot is loaded.
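To make the "hold a pointer in the repl-backlog" idea concrete, here is a toy model of a backlog that refuses to trim past a pinned offset. It is a sketch under assumptions, not how Valkey's backlog is implemented: the `ReplBacklog` class and its `pin`, `unpin`, and `can_psync` methods are hypothetical.

```python
class ReplBacklog:
    """Toy replication backlog that never trims past a pinned offset."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.start_offset = 0      # offset of the oldest byte still held
        self.data = bytearray()
        self.pins = set()          # offsets that must stay reachable

    @property
    def end_offset(self) -> int:
        return self.start_offset + len(self.data)

    def can_psync(self, offset: int) -> bool:
        """A partial resync is possible only if the offset is still held."""
        return self.start_offset <= offset <= self.end_offset

    def pin(self, offset: int) -> None:
        if not self.can_psync(offset):
            raise ValueError("offset is no longer in the backlog")
        self.pins.add(offset)

    def unpin(self, offset: int) -> None:
        self.pins.discard(offset)
        self._trim()

    def append(self, payload: bytes) -> None:
        self.data += payload
        self._trim()

    def _trim(self) -> None:
        excess = len(self.data) - self.capacity
        if excess <= 0:
            return
        new_start = self.start_offset + excess
        if self.pins:
            # Do not drop bytes a pending replica still needs.
            new_start = min(new_start, min(self.pins))
        drop = new_start - self.start_offset
        if drop > 0:
            del self.data[:drop]
            self.start_offset = new_start
```

The trade-off this makes visible is that a pinned backlog can grow beyond its configured capacity until the pin is released, so a real design would likely need a timeout or size cap on the pin.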

**Alternatives you've considered**

`CLUSTER REPLICATE` could specify this itself rather than relying on a config: either `CLUSTER REPLICATE <primary> SYNCFROM <replica>`, or `CLUSTER REPLICATE <primary> SYNCFROMREPLICA`, or even `CLUSTER REPLICATE <replica>` (although I think the last one would be a breaking change). Or maybe we change it to a shard-based `CLUSTER JOINSHARD <shardid> USING <replica>`.

We could also support chained replication. But I feel that to do chained replication correctly, it shouldn't be a static "Node A is configured to replicate to B, which is configured to replicate to C" but rather a dynamic "Node A is the primary of shard 1, B and C are replicas of shard 1, and the customer supplied a maximum chaining depth of 2, so we can do A->B->C, but if B dies, we can still reconfigure as A->C". Or maybe, instead of a chaining depth, it could be a per-node replica maximum (1 gives A->B, B->C, C->D; 2 gives A->B, A->C, B->D; etc.).
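The per-node replica maximum can be illustrated with a small breadth-first assignment. This is a minimal sketch, assuming the first node in the list is the primary and that replicas are attached in list order; `build_topology` is a hypothetical helper, not an existing API.

```python
from collections import deque
from typing import List, Tuple


def build_topology(nodes: List[str], max_replicas_per_node: int) -> List[Tuple[str, str]]:
    """Return (source, target) replication edges for one shard.

    nodes[0] is treated as the primary. Each node feeds at most
    max_replicas_per_node direct replicas, filled breadth-first.
    """
    if max_replicas_per_node < 1:
        raise ValueError("max_replicas_per_node must be at least 1")
    if not nodes:
        return []
    edges = []
    fanout = {nodes[0]: 0}        # direct replicas currently attached
    sources = deque([nodes[0]])   # nodes that may still accept replicas
    for node in nodes[1:]:
        # Skip sources that are already at their limit.
        while fanout[sources[0]] >= max_replicas_per_node:
            sources.popleft()
        source = sources[0]
        edges.append((source, node))
        fanout[source] += 1
        fanout[node] = 0
        sources.append(node)
    return edges
```

With nodes `A, B, C, D`, a maximum of 1 yields A->B, B->C, C->D and a maximum of 2 yields A->B, A->C, B->D. Handling a failure dynamically then amounts to recomputing the edges without the dead node: for `A, B, C` with a maximum of 1, dropping `B` turns A->B->C into A->C.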

**Additional information**

Atomic slot migration would also benefit from similar functionality.
