Standalone REDIRECT, Pub/Sub and failover #1780

@gmbnomis

There is a fundamental difference w.r.t. Pub/Sub between standalone and cluster
mode:

  • In cluster mode, pub/sub operations are role-agnostic: Messages propagate seamlessly across primaries/replicas.

  • In standalone mode this is not the case. Publishes are replicated from the
    primary to its replicas, but not the other way around. Furthermore, PUBLISH is
    not a write command. This means that one can publish on a read-only replica**
    and the message will only be seen on that replica.

    (This is why sentinel issues a "CLIENT KILL TYPE PUBSUB" in addition to a
    "CLIENT KILL TYPE NORMAL" in order to force a reconnect when changing the role
    of a node.)

    **: With one exception: EVAL_RO does not allow publishing. (Interestingly, in this context PUBLISH is treated as a write command.)

Currently, REDIRECT and the FAILOVER command affect neither the publishing commands nor connections in subscribed
mode. The following scenarios are possible:

  1. A client that is connected to the primary and e.g. only issuing PUBLISH
    commands will be on a replica after a FAILOVER. Now, PUBLISH will only publish
    locally to this replica, i.e. subscribers connecting to the new primary won't
    receive these messages anymore.

  2. A client that is connected to the primary and is in subscription mode won't
    notice a role switch either. However, since published messages are
    replicated, it will still receive messages that are published on the new
    primary (and also those published locally on the replica it is now connected to).

    Still, there is a user-visible change: the reply of a PUBLISH command only counts
    clients on the node that executed it, so a subscriber on another node is no longer
    reported as a client the message was sent to.

This means that in contrast to a cluster failover, there is a chance that a
standalone failover creates two disjoint pub/sub domains. And, currently,
a client in REDIRECT mode will not be notified about role changes
if the connection is used for pub/sub only. This is in contrast to the "smooth switchover"
idea of REDIRECT, IMHO.
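
To make the two scenarios concrete, here is a toy model in Python. Nothing in it is actual server code: `Node`, `failover` and `demo` are names made up for this illustration, and the only behavior it assumes is what is described above (publishes flow from primary to replicas only, and the PUBLISH reply counts local clients only).

```python
class Node:
    """Toy stand-in for a standalone node (hypothetical, not server code)."""

    def __init__(self, name):
        self.name = name
        self.replicas = []     # nodes replicating from this node
        self.subscribers = {}  # channel -> list of per-client inboxes

    def subscribe(self, channel):
        """Register a client on this node and return its inbox."""
        inbox = []
        self.subscribers.setdefault(channel, []).append(inbox)
        return inbox

    def publish(self, channel, message):
        """Deliver locally, replicate downstream, return the local count."""
        inboxes = self.subscribers.get(channel, [])
        for inbox in inboxes:
            inbox.append(message)
        # Assumption from the text: publishes travel primary -> replica only.
        for replica in self.replicas:
            replica.publish(channel, message)
        # The reply only counts clients connected to this node.
        return len(inboxes)


def failover(old_primary, new_primary):
    """Swap roles; client connections stay where they are."""
    old_primary.replicas.remove(new_primary)
    new_primary.replicas.append(old_primary)


def demo():
    a, b = Node("a"), Node("b")
    a.replicas.append(b)             # a is the primary, b its replica
    sub_old = a.subscribe("news")    # subscriber connected to a
    sub_new = b.subscribe("news")    # subscriber connected to b

    c1 = a.publish("news", "m1")     # before failover: reaches both nodes
    failover(a, b)                   # b is now the primary
    c2 = a.publish("news", "m2")     # scenario 1: stays local to a
    c3 = b.publish("news", "m3")     # scenario 2: still replicated to a
    return sub_old, sub_new, (c1, c2, c3)


if __name__ == "__main__":
    print(demo())
```

In this model the subscriber on the new primary never sees `m2`, while the subscriber left on the old primary sees everything. The reply to the last publish is 1 although two clients received `m3`.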

Solution options/proposals: (for simplicity these proposals don't make a distinction
between "regular", "pattern", and "sharded" variants. The proposals apply to all variants.)

  1. Make pub/sub fully agnostic to role (like in cluster mode).

    While ideal, replicating cluster-mode Pub/Sub in standalone is impractical as it would require significant
    architectural changes.

  2. Don't change the current behavior and document this limitation of REDIRECT mode

    This leaves users vulnerable to subtle message-loss edge cases.

  3. Issue REDIRECTs and kill client connections if necessary:

    • Modify REDIRECT logic to treat PUBLISH as a "write-like" command, redirecting
      it to the primary regardless of the client’s mode (even in READONLY).
      This avoids inconsistencies where publishing on a replica could isolate messages.

      Rationale for redirecting always instead of READWRITE only: READONLY means that the client
      is willing to read replicated data. It does not mean that the client is willing to accept domain splits.

      Note that there is a case in which we won't be able to redirect: Publishing in a script without keys (e.g. eval 'return redis.call("publish", "foo", "bar")' 0). This is because we can't issue a MOVED/REDIRECT from within a partly executed script. We need to return a different error instead.
      (However, this looks like an uncommon use case for a script.)

    • In REDIRECT mode, SUBSCRIBE is redirected to the primary only if the client is in READWRITE mode. (Rationale: READONLY means that the client is willing to read replicated data. We can assume that replicated publishes are fine as well.)

    • A connection in subscribe mode is killed (or unsubscribed?) on a role change if the client
      is in REDIRECT READWRITE mode.

      Note: This is not the first connection type that is killed on a role change. Connections waiting on a blocked command are already killed today.
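
The rules of option 3 can be written down as a small decision table. The following Python sketch is only meant to pin down the proposal, not to suggest an implementation: `Mode`, `Action`, `decide_pubsub` and `kill_on_role_change` are hypothetical names, and it assumes that clients which have not opted in to REDIRECT keep today's behavior.

```python
from enum import Enum


class Mode(Enum):
    READONLY = "readonly"
    READWRITE = "readwrite"


class Action(Enum):
    EXECUTE = "execute"    # run the command locally
    REDIRECT = "redirect"  # reply with a redirect to the primary
    ERROR = "error"        # some other error (redirect not possible)


def decide_pubsub(command, *, is_primary, redirect_enabled, mode,
                  in_running_script=False):
    """Proposed handling of PUBLISH/SUBSCRIBE under option 3 (sketch)."""
    if command not in ("PUBLISH", "SUBSCRIBE"):
        raise ValueError("only PUBLISH and SUBSCRIBE are modelled here")
    # Assumption: on the primary, or without REDIRECT, nothing changes.
    if is_primary or not redirect_enabled:
        return Action.EXECUTE
    if command == "PUBLISH":
        # A partly executed script cannot be redirected anymore.
        if in_running_script:
            return Action.ERROR
        # Redirect regardless of READONLY/READWRITE.
        return Action.REDIRECT
    # SUBSCRIBE: READONLY clients accept replicated publishes.
    if mode is Mode.READWRITE:
        return Action.REDIRECT
    return Action.EXECUTE


def kill_on_role_change(*, subscribed, redirect_enabled, mode):
    """True if a connection would be killed (or unsubscribed) on a role change."""
    return subscribed and redirect_enabled and mode is Mode.READWRITE
```

For example, `decide_pubsub("PUBLISH", is_primary=False, redirect_enabled=True, mode=Mode.READONLY)` yields `Action.REDIRECT`, whereas the same call with `"SUBSCRIBE"` yields `Action.EXECUTE`.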

I prefer option 3 (with complementary documentation). Although the change is substantial, it is the simplest change I can think of that preserves consistency.
