Dual-channel-replication announces itself at replica-announce-ip if configured #2846

jdheyburn · 2025-11-14T18:08:38Z

When dual-channel-replication is enabled and replica-announce-ip is set, the RDB/AOF channel does not announce itself at that address. Instead, it shows up at the IP address behind the NAT, which in our case is the Kubernetes Pod IP.

This means that if Sentinel is polling the primary for connected replicas, it first sees the ephemeral Pod IP and then the announce-ip, leaving the Pod IP behind as a down replica.

This PR configures the RDB/AOF channel to also announce itself at the announce-ip to prevent the stale replica.
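The change can be illustrated with a minimal sketch of the new RDB-channel step. It assumes the step mirrors the existing main-channel rule of sending REPLCONF ip-address only when replica-announce-ip is set. `struct replica_config`, `mock_send_command` and `rdb_channel_announce_ip` are hypothetical stand-ins, not Valkey's code; the real handshake writes the command to the connection with sendCommand() rather than formatting it into a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the relevant server settings. */
struct replica_config {
    const char *replica_announce_ip; /* NULL when replica-announce-ip is unset */
};

/* Hypothetical stand-in for sendCommand(): instead of writing to a
 * connection, it joins argv with spaces into `out` so the result can be
 * inspected. Returns 0 on success, -1 if `out` is too small. */
static int mock_send_command(char *out, size_t outlen,
                             const char *const argv[], int argc) {
    size_t pos = 0;
    assert(out != NULL && outlen > 0);
    for (int i = 0; i < argc; i++) {
        size_t len = strlen(argv[i]);
        size_t need = len + (i > 0 ? 1 : 0); /* leading space after argv[0] */
        if (pos + need + 1 > outlen) return -1; /* +1 for the terminator */
        if (i > 0) out[pos++] = ' ';
        memcpy(out + pos, argv[i], len);
        pos += len;
    }
    out[pos] = '\0';
    return 0;
}

/* Sketch of the RDB-channel step: announce the configured address, and
 * skip the command entirely when replica-announce-ip is unset (the rule
 * the main channel already follows). Returns 1 if the command was "sent",
 * 0 if it was skipped, -1 on error. */
static int rdb_channel_announce_ip(const struct replica_config *cfg,
                                   char *out, size_t outlen) {
    if (cfg->replica_announce_ip == NULL) return 0;
    const char *const argv[] = {"REPLCONF", "ip-address",
                                cfg->replica_announce_ip};
    return mock_send_command(out, outlen, argv, 3) == 0 ? 1 : -1;
}
```

With `replica_announce_ip` set to "10.96.147.28", the sketch produces `REPLCONF ip-address 10.96.147.28`. With it unset, nothing is sent and the primary keeps using the address the connection arrives from.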

Testing

I considered writing unit tests for this, but I am not sure how to test an IP address other than localhost (127.0.0.1) in a way that would fail without the fix. I did test on Kubernetes against the 9.0 tag and verified the fix there.

Status quo

On the 9.0 image tag:

$ kubectl get pods -n valkey-baseline -o custom-columns=NAME:.metadata.name,POD-IP:.status.podIP
NAME                              POD-IP
valkey-primary-5bd78c8566-llb6k   10.244.0.25
valkey-replica-0                  10.244.0.17
valkey-replica-1                  10.244.0.13

$ kubectl get services -n valkey-baseline -o custom-columns=NAME:.metadata.name,CLUSTER-IP:.spec.clusterIP
NAME               CLUSTER-IP
valkey-primary     10.96.147.28
valkey-replica-0   10.96.66.233
valkey-replica-1   10.96.57.230

The logs below show that the Pod IP of valkey-primary-5bd78c8566-llb6k (10.244.0.25:6379) is being used for dual-channel replication. It should be the cluster IP 10.96.147.28, since that is the value set in replica-announce-ip.

1:M 14 Nov 2025 17:57:51.750 * Replica 10.96.147.28:6379 asks for synchronization
1:M 14 Nov 2025 17:57:51.751 * Replica 10.244.0.25:6379 asks for synchronization
1:M 14 Nov 2025 17:57:56.135 * Dual channel replication: Sending to replica 10.244.0.25:6379 RDB end offset 1763269 and client-id 35
1:M 14 Nov 2025 17:57:56.140 * Replica 10.96.147.28:6379 asks for synchronization

This fix

$ kubectl get pods -n valkey-test -o custom-columns=NAME:.metadata.name,POD-IP:.status.podIP
NAME                              POD-IP
valkey-primary-594c9597b5-qqvdk   10.244.0.26
valkey-replica-0                  10.244.0.10
valkey-replica-1                  10.244.0.18

$ kubectl get services -n valkey-test -o custom-columns=NAME:.metadata.name,CLUSTER-IP:.spec.clusterIP
NAME               CLUSTER-IP
valkey-primary     10.96.125.142
valkey-replica     None
valkey-replica-0   10.96.155.74
valkey-replica-1   10.96.64.111
valkey-sentinel    None

The logs show that the cluster IP is now used for dual-channel replication as well.

1:M 14 Nov 2025 17:57:49.923 * Replica 10.96.125.142:6379 asks for synchronization
1:M 14 Nov 2025 17:57:49.924 * Replica 10.96.125.142:6379 asks for synchronization
1:M 14 Nov 2025 17:57:54.913 * Dual channel replication: Sending to replica 10.96.125.142:6379 RDB end offset 1771247 and client-id 36
1:M 14 Nov 2025 17:57:54.916 * Replica 10.96.125.142:6379 asks for synchronization

Fixes #2338

Signed-off-by: Joseph Heyburn <[email protected]>

ranshid

Overall the fix LGTM

Can we please add a tcl test for it?

ranshid · 2025-11-19T08:49:26Z

src/replication.c

        return C_ERR;
    }

+    if (server.replica_announce_ip) {


Maybe we can ALWAYS include the ip-address in the first REPLCONF, and thus avoid having to explicitly handle errors from the second REPLCONF?

I wanted to be consistent with the current pattern for how replication handles replica_announce_ip. I am unsure how the REPLCONF command would handle a null ip-address.

valkey/src/replication.c

Lines 3725 to 3731 in e19ceb7

/* Set the replica ip, so that primary's INFO command can list the
 * replica IP address port correctly in case of port forwarding or NAT.
 * Skip REPLCONF ip-address if there is no replica-announce-ip option set. */
if (server.replica_announce_ip) {
    err = sendCommand(conn, "REPLCONF", "ip-address", server.replica_announce_ip, NULL);
    if (err) goto err;
}

Yes. I meant that we could pass the replica IP address even when the parameter is not configured. But it is not that critical, TBH.
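One way to read the "always send ip-address" suggestion is sketched below: when replica-announce-ip is unset, fall back to the local address of the socket the replica uses to reach the primary, as reported by POSIX getsockname(). This is an assumption about how the suggestion could be implemented, not current Valkey behavior, and `pick_announce_ip` is a hypothetical name. Behind NAT, the socket's local address may differ from the address the primary observes, so this is a trade-off rather than a drop-in change.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: choose the address to send in REPLCONF ip-address.
 * Prefer the configured replica-announce-ip; otherwise fall back to the
 * local IPv4 address of the connected socket `fd`. Writes a NUL-terminated
 * string into `out` and returns 0, or returns -1 on failure. */
static int pick_announce_ip(const char *announce_ip, int fd,
                            char *out, size_t outlen) {
    assert(out != NULL && outlen > 0);
    if (announce_ip != NULL) {
        size_t len = strlen(announce_ip);
        if (len + 1 > outlen) return -1;
        memcpy(out, announce_ip, len + 1);
        return 0;
    }
    struct sockaddr_in local;
    socklen_t slen = sizeof local;
    memset(&local, 0, sizeof local);
    if (getsockname(fd, (struct sockaddr *)&local, &slen) != 0) return -1;
    if (local.sin_family != AF_INET) return -1; /* IPv4 only in this sketch */
    if (inet_ntop(AF_INET, &local.sin_addr, out, (socklen_t)outlen) == NULL)
        return -1;
    return 0;
}
```

For example, on a socket connected to 127.0.0.1 with no announce address configured, the fallback yields "127.0.0.1", which is the same loopback problem that makes the Tcl test hard to write.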

jdheyburn · 2025-11-19T17:06:23Z

Can we please add a tcl test for it?

@ranshid I am not sure of a way to accurately test this with a Tcl test. The replica-announce-ip set during the test would have to be a local IP address such as 127.0.0.1, which is the address that would be used anyway. I had a Tcl test case before, but it still passed after I removed my fix.

Is there another way of testing this? This is why I emphasized the manual test in the description.

ranshid · 2025-11-20T12:58:04Z


I was thinking of setting the config to something like

replica-announce-ip 5.5.5.5

and then delaying the full sync (IIRC you can use CONFIG SET repl-diskless-sync-delay). During that time, the primary's INFO output should show that the 'rdb-channel' has the IP address 5.5.5.5.
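The check at the heart of that test could look like the C sketch below, which scans INFO replication output for `slaveN:` lines and verifies that every replica entry reports the expected `ip=` value. The real test would be written in Tcl; the helper names are hypothetical, and the sketch only assumes the usual `slaveN:ip=...,port=...,...` field layout rather than quoting actual output from a dual-channel sync.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of `ip=` from a single INFO line
 * such as "slave0:ip=5.5.5.5,port=6380,state=online" into `out`.
 * Returns false if the line has no ip= field or `out` is too small. */
static bool info_line_ip(const char *line, char *out, size_t outlen) {
    assert(out != NULL && outlen > 0);
    const char *p = strstr(line, ":ip=");
    if (p == NULL) return false;
    p += 4; /* skip ":ip=" */
    size_t len = strcspn(p, ",\r\n");
    if (len + 1 > outlen) return false;
    memcpy(out, p, len);
    out[len] = '\0';
    return true;
}

/* Hypothetical check over INFO replication output (CRLF or LF separated):
 * returns how many "slaveN:" lines report `expected_ip`, or -1 as soon as
 * one reports a different address. Lines without ip= (for example
 * "slave_repl_offset:") and lines longer than 511 bytes are skipped. */
static int count_replicas_with_ip(const char *info, const char *expected_ip) {
    int matches = 0;
    const char *line = info;
    while (*line != '\0') {
        size_t len = strcspn(line, "\n");
        char buf[512];
        char ip[64];
        if (len < sizeof buf && strncmp(line, "slave", 5) == 0) {
            memcpy(buf, line, len); /* bound the search to this line */
            buf[len] = '\0';
            if (info_line_ip(buf, ip, sizeof ip)) {
                if (strcmp(ip, expected_ip) != 0) return -1;
                matches++;
            }
        }
        line += len;
        if (*line == '\n') line++;
    }
    return matches;
}
```

Under the proposed setup, both the main-channel and rdb-channel entries should report 5.5.5.5 while the sync is delayed. Without the fix, the rdb-channel entry would be expected to show the local address instead, so the check returns -1.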

Fix stale sentinel replicas when dual-channel-replication is enabled

310c32f

Signed-off-by: Joseph Heyburn <[email protected]>

github-actions bot assigned jdheyburn Nov 14, 2025

jdheyburn changed the title ~~Fix stale sentinel replicas when dual-channel-replication is enabled~~ Dual-channel-replication announces itself at replica-announce-ip if configured Nov 14, 2025

This was referenced Nov 14, 2025:

Have dual-channel-repl use replica-announce-ip #2733 (Closed)
Enable dual-channel replication by default #2083 (Open)

ranshid self-requested a review November 18, 2025 16:17

ranshid reviewed Nov 19, 2025
