-
Notifications
You must be signed in to change notification settings - Fork 947
Dual-channel-replication announces itself at replica-announce-ip if configured #2846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: unstable
Are you sure you want to change the base?
Dual-channel-replication announces itself at replica-announce-ip if configured #2846
Conversation
Signed-off-by: Joseph Heyburn <[email protected]>
ranshid
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the fix LGTM
Can we please add a tcl test for it?
| return C_ERR; | ||
| } | ||
|
|
||
| if (server.replica_announce_ip) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can ALWAYS include the ip-address in the first replconf and thus reduce the need to explicitly handle the second replconf error handing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to be consistent with the current pattern for how replication handles replica_announce_ip. I am unsure how the REPLCONF command would handle a null ip-address.
Lines 3725 to 3731 in e19ceb7
/* Set the replica ip, so that primary's INFO command can list the * replica IP address port correctly in case of port forwarding or NAT. * Skip REPLCONF ip-address if there is no replica-announce-ip option set. */ if (server.replica_announce_ip) { err = sendCommand(conn, "REPLCONF", "ip-address", server.replica_announce_ip, NULL); if (err) goto err; }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I meant that we could pass the replica IP address even when the parameter is not configured. but it is not that critical TBH
@ranshid I am not sure of a way to accurately test this via a tcl. The replica-announce-ip that would need to be set during the test would have to be a local IP address such as Is there another means of testing? This is why I put the emphasis on the test I added in the description. |
I was thinking to set the config to something like and then delay the full sync (IIRC you can use the during that time the primary info shuold indicate the 'rdb-channel' has ip address 5.5.5.5 |
When dual-channel-replication is enabled, and replica-announce-ip is set, the RDB/AOF channel does not announce itself at this endpoint. This defaults to the IP address behind the NAT, or the Kubernetes Pod IP in our case.
This means that if Sentinel is polling the primary for connected replicas, it will first see the ephemeral pod IP, then revert to the announce-ip - leaving behind the pod IP as a down replica.
This PR configures the RDB/AOF channel to also announce itself at the announce-ip to prevent the stale replica.
Testing
I evaluated writing unit tests for this, but I am not sure of a way we can test an IP address different to localhost (127.0.0.1) that would fail without the fix. I did test on Kubernetes against 9.0 tag and verified the fix there too.
Status quo
On 9.0 image tag:
Logs below show that pod IP for valkey-primary-5bd78c8566-llb6k
10.244.0.25:6379is being used for dual-channel replication. This should be its cluster IP10.96.147.28as this is what is set in replica-announce-ip.This fix
Logs show that the Cluster IP is now being used for dual-channel replication.
Fixes #2338