
PUB crash when SUB exceeded SNDHWM #2942

@sublee


Issue description

When all of the following conditions hold, an assertion failure in mtrie.cpp occurs:

  • A PUB socket is connected with many SUB sockets.
  • Each SUB socket subscribes to and unsubscribes from many prefixes.
  • zmq_getsockopt() is called with ZMQ_EVENTS on the SUB sockets.

Assertion failed: erased == 1 (src/mtrie.cpp:297)
[1]    30266 abort (core dumped)  ./a.out

Environment

  • libzmq version (commit hash if unreleased): 4.2.0 and 4.2.3
  • OS: Ubuntu 16.04 LTS

Minimal test code / Steps to reproduce the issue

To reproduce this crash, prepare one PUB socket and many SUB sockets.

The test calls this sequence (pseudo-code): pub.connect(sub) or sub.connect(pub); pub.getsockopt(ZMQ_EVENTS); sub.subscribe(prefix); sub.getsockopt(ZMQ_EVENTS); sub.unsubscribe(prefix); sub.getsockopt(ZMQ_EVENTS). The subscribe and unsubscribe steps are repeated for many prefixes.

Calling getsockopt(ZMQ_EVENTS) after a SUB socket's SUBSCRIBE/UNSUBSCRIBE, or after the PUB socket's zmq_connect(), crashes with an assertion failure in mtrie_t::rm_helper.

You can switch the PUB<->SUB connection topology with the pub_to_sub variable.

#include "zmq.h"
#include <stdio.h>

// Set 1 or 0 to switch the PUB<->SUB connection topology.
static int pub_to_sub = 1;

void gen_topic(int n, char* topic)
{
    // Simple multiplicative hash to generate an 8-character hex prefix from a number.
    // Unsigned arithmetic avoids signed integer overflow.
    unsigned int h = (unsigned int)n * 2654435761u;
    // Writes 8 hex digits plus a terminating NUL, so topic must hold 9 bytes.
    snprintf(topic, 9, "%08x", h);
}

void getsockopt_events_within_many_subscriptions(void* sub)
{
    char topic[9];  // 8 hex digits + NUL terminator
    char opt[256];
    size_t opt_len = sizeof(opt);

    for (int j = 0; j < 10000; ++j)
    {
        gen_topic(j, topic);
        // Subscribe to the 8-byte prefix (the NUL terminator is not sent).
        zmq_setsockopt(sub, ZMQ_SUBSCRIBE, topic, 8);
        // CRASH: Get ZMQ_EVENTS from a SUB socket.
        zmq_getsockopt(sub, ZMQ_EVENTS, opt, &opt_len);
    }
    for (int j = 0; j < 10000; ++j)
    {
        gen_topic(j, topic);
        // Unsubscribe from the same 8-byte prefix.
        zmq_setsockopt(sub, ZMQ_UNSUBSCRIBE, topic, 8);
        // CRASH: Get ZMQ_EVENTS from a SUB socket.
        zmq_getsockopt(sub, ZMQ_EVENTS, opt, &opt_len);
    }
}

int main()
{
    printf("%d.%d.%d\n", ZMQ_VERSION_MAJOR, ZMQ_VERSION_MINOR, ZMQ_VERSION_PATCH);

    void *context = zmq_ctx_new();
    void *pub = zmq_socket(context, ZMQ_PUB);
    void *sub;

    char addr[256]; size_t addr_len = 256;
    char opt[256];  size_t opt_len  = 256;

    if (pub_to_sub)
    {
        // PUB->SUB
        for (int i = 0; i < 100; ++i)
        {
            sub = zmq_socket(context, ZMQ_SUB);

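            zmq_bind(sub, "tcp://127.0.0.1:*");
            // Reset the buffer size: zmq_getsockopt() overwrites addr_len with the actual length.
            addr_len = sizeof(addr);
            zmq_getsockopt(sub, ZMQ_LAST_ENDPOINT, addr, &addr_len);
            zmq_connect(pub, addr);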

            getsockopt_events_within_many_subscriptions(sub);
        }
    }
    else
    {
        // SUB->PUB
        zmq_bind(pub, "tcp://127.0.0.1:*");
        zmq_getsockopt(pub, ZMQ_LAST_ENDPOINT, addr, &addr_len);
        for (int i = 0; i < 100; ++i)
        {
            sub = zmq_socket(context, ZMQ_SUB);

            zmq_connect(sub, addr);

            getsockopt_events_within_many_subscriptions(sub);

            // CRASH: Get ZMQ_EVENTS from the PUB socket.
            zmq_getsockopt(pub, ZMQ_EVENTS, opt, &opt_len);
        }
    }
}
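
As a side note on the gen_topic helper in the reproduction code: the "%08x" format produces eight hex digits plus a terminating NUL, so the destination buffer needs nine bytes, and the multiplication is only well defined in unsigned arithmetic. The following is a minimal standalone sketch of the same hash that needs no libzmq. The names gen_topic_safe and TOPIC_LEN are hypothetical and not part of the original report, and the sketch assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical constant (not in the original report): prefix length in characters. */
enum { TOPIC_LEN = 8 };

/* Hypothetical variant of gen_topic: same multiplicative hash, but computed in
   unsigned arithmetic and written with a bounded snprintf().
   The caller must supply a buffer of at least TOPIC_LEN + 1 bytes. */
void gen_topic_safe(unsigned int n, char topic[TOPIC_LEN + 1])
{
    /* Wraps modulo 2^32, assuming unsigned int is 32 bits wide. */
    unsigned int h = n * 2654435761u;

    /* snprintf() returns the number of characters written, excluding the NUL. */
    int written = snprintf(topic, TOPIC_LEN + 1, "%08x", h);
    assert(written == TOPIC_LEN);
    (void)written; /* silences the unused warning when NDEBUG is defined */
}
```

Calling gen_topic_safe(1, buf) with a 9-byte buf should yield the prefix 9e3779b1, since 2654435761 is 0x9E3779B1. Only the first 8 bytes would be passed to ZMQ_SUBSCRIBE, as in the reproduction code.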

What's the actual result? (include assertion message & call stack if applicable)

$ gcc zmq_events_crash.c -L ~/usr/local/lib -lzmq && ./a.out
4.2.3
Assertion failed: erased == 1 (src/mtrie.cpp:297)
[1]    30266 abort (core dumped)  ./a.out

What's the expected result?

$ gcc zmq_events_crash.c -L ~/usr/local/lib -lzmq && ./a.out
4.2.3
$ echo $?
0

When SUB sockets connect to the PUB socket, this crash doesn't happen.
