Description
There is a fundamental difference w.r.t. Pub/Sub between standalone and cluster
mode:
- In cluster mode, pub/sub operations are role-agnostic: messages propagate seamlessly across primaries and replicas.
- In standalone mode this is not the case; publishes are replicated, though.

Furthermore, PUBLISH is not a write command. This means that one can publish
on a read-only replica (with one exception, noted below) and the message will only be seen on that replica.
(This is why Sentinel issues a "CLIENT KILL TYPE PUBSUB" in addition to a
"CLIENT KILL TYPE NORMAL" in order to force a reconnect when changing the role
of a node.)

The exception: EVAL_RO does not allow publishing. (Interestingly, in this context PUBLISH is treated as a write command.)

Currently, neither REDIRECT nor the FAILOVER command affects publishing commands or connections in subscribed
mode. The following scenarios are possible:
- A client that is connected to the primary and, e.g., only issues PUBLISH
  commands will be on a replica after a FAILOVER. From then on, PUBLISH only publishes
  locally to this replica, i.e. subscribers connecting to the new primary won't
  receive these messages anymore.
- A client that is connected to the primary and is in subscription mode won't
  notice a role switch either. However, since published messages are
  replicated, it will receive messages that were published on the primary (and
  also those published on the replica). Still, there is a user-visible change: a client on another node will not be reported as
  a client the message was sent to (in the reply of a PUBLISH command).

This means that in contrast to a cluster failover, there is a chance that a
standalone failover creates two disjoint pub/sub domains. And, currently,
a client in REDIRECT mode will not be notified about role changes
if the connection is used for pub/sub only. This is in contrast to the "smooth switchover"
idea of REDIRECT, IMHO.
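
The split can be illustrated with a small toy model. This is only a sketch of the behavior described above, not server code: the `Node` class, `failover` helper, and client names are hypothetical, and the model assumes just two rules taken from this description (a publish is delivered to local subscribers, and only a primary replicates it to its replicas).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for one standalone server (hypothetical, not real server code)."""
    name: str
    role: str = "replica"                          # "primary" or "replica"
    replicas: list = field(default_factory=list)   # nodes replicating from this one
    inboxes: dict = field(default_factory=dict)    # subscriber name -> received messages

    def subscribe(self, client):
        self.inboxes.setdefault(client, [])

    def deliver(self, message):
        # Hand the message to every subscriber connected to this node.
        for inbox in self.inboxes.values():
            inbox.append(message)
        return len(self.inboxes)

    def publish(self, message):
        # The reply counts local subscribers only.
        receivers = self.deliver(message)
        # Assumption from the description: only a primary replicates publishes.
        if self.role == "primary":
            for replica in self.replicas:
                replica.deliver(message)
        return receivers


def failover(old_primary, new_primary):
    """Swap roles; existing client connections stay where they are."""
    old_primary.role, new_primary.role = "replica", "primary"
    old_primary.replicas = []
    new_primary.replicas = [old_primary]


a = Node("A", role="primary")
b = Node("B")
a.replicas = [b]
a.subscribe("sub_on_a")
b.subscribe("sub_on_b")

before = a.publish("m1")   # publisher is connected to A, currently the primary
failover(a, b)
after = a.publish("m2")    # same connection, but A is now a replica

print(a.inboxes["sub_on_a"])  # ['m1', 'm2']
print(b.inboxes["sub_on_b"])  # ['m1']: m2 never reached the new primary
```

In this model the publisher sees the same reply (1) before and after the failover, so nothing on its connection hints that its messages no longer reach subscribers on the new primary.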
Solution options/proposals: (for simplicity these proposals don't make a distinction
between "regular", "pattern", and "sharded" variants. The proposals apply to all variants.)
1. Make pub/sub fully agnostic to role (like in cluster mode).
   While ideal, replicating cluster-mode pub/sub in standalone is impractical, as it would require significant
   architectural changes.
2. Don't change the current behavior and document this limitation of REDIRECT mode.
   This leaves users vulnerable to subtle message-loss edge cases.
3. Issue REDIRECTs and kill client connections if necessary:
   - Modify the REDIRECT logic to treat PUBLISH as a "write-like" command, redirecting
     it to the primary regardless of the client's mode (even in READONLY).
     This avoids inconsistencies where publishing on a replica could isolate messages.
     Rationale for always redirecting instead of redirecting in READWRITE only: READONLY means that the client
     is willing to read replicated data. It does not mean that the client is willing to accept domain splits.
     Note that there is a case in which we won't be able to redirect: publishing in a script without keys (e.g.
     eval 'return redis.call("publish", "foo", "bar")' 0). This is because we can't issue a MOVED/REDIRECT from within a partly executed script. We need to return a different error instead.
     (However, this looks like an uncommon use case for a script.)
   - In REDIRECT mode, SUBSCRIBE is redirected to the primary if the client is in READWRITE mode. (Rationale: READONLY means that the client is willing to read replicated data. We can assume that replicated publishes are fine as well.)
   - A connection in subscribe mode is killed (or unsubscribed?) on a role change if the client
     is in REDIRECT READWRITE mode.
     Note: This is not the first connection type that is killed on a role change. Connections waiting on a blocked command are already killed today.

I prefer option 3 (with complementary documentation). Although the change is substantial, it is the simplest change that preserves consistency I can think of.
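
To make the rules of option 3 concrete, here is a rough decision-table sketch. It is not an implementation proposal for the server: the function names, parameters, and the `Action` enum are hypothetical and only encode the three sub-items of option 3 as stated in this issue, covering the regular, pattern, and sharded variants alike.

```python
from enum import Enum


class Action(Enum):
    EXECUTE = "execute"     # run the command locally
    REDIRECT = "redirect"   # reply with a redirect to the primary
    ERROR = "error"         # a different error, since redirecting is impossible
    KILL = "kill"           # close the connection (or unsubscribe it)
    KEEP = "keep"           # leave the connection untouched


PUBLISH_COMMANDS = {"PUBLISH", "SPUBLISH"}
SUBSCRIBE_COMMANDS = {"SUBSCRIBE", "PSUBSCRIBE", "SSUBSCRIBE"}


def on_command(command, node_is_primary, redirect_mode, readonly,
               in_keyless_script=False):
    """Decide how a node treats a pub/sub command under option 3 (sketch only)."""
    name = command.upper()
    if name not in PUBLISH_COMMANDS | SUBSCRIBE_COMMANDS:
        raise ValueError("this sketch only covers pub/sub commands")
    # On the primary, or for clients not in REDIRECT mode, nothing changes.
    if node_is_primary or not redirect_mode:
        return Action.EXECUTE
    if name in PUBLISH_COMMANDS:
        # Write-like: always redirected, even for READONLY clients, except
        # from within a partly executed script, where an error is returned.
        return Action.ERROR if in_keyless_script else Action.REDIRECT
    # Subscribing on a replica is acceptable for READONLY clients only.
    return Action.EXECUTE if readonly else Action.REDIRECT


def on_role_change(subscribed, redirect_mode, readonly):
    """Decide what happens to an existing connection when the node's role changes."""
    if subscribed and redirect_mode and not readonly:
        return Action.KILL
    return Action.KEEP


print(on_command("PUBLISH", node_is_primary=False, redirect_mode=True, readonly=True))
# Action.REDIRECT
print(on_role_change(subscribed=True, redirect_mode=True, readonly=False))
# Action.KILL
```

Whether `KILL` should close the connection or merely unsubscribe it is left open here, as in the proposal itself.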