
Problem: ZMQ_DISH over UDP triggers errno_assert() after hitting watermark #2911

@eponsko


Issue description

Running a ZMQ_RADIO / ZMQ_DISH pair over UDP triggers errno_assert() in udp_engine.cpp:381.

Environment

  • libzmq version: latest from master (a45e4bb)
  • OS: Ubuntu 17.10

Minimal test code / Steps to reproduce the issue

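#define ZMQ_BUILD_DRAFT_API 1   /* RADIO/DISH and groups are draft API */
#include <stdio.h>
#include <string.h>
#include <unistd.h>             /* usleep() */
#include <zmq.h>

/* RADIO side: sends "hello" to group "foo" as fast as possible. */
int publisher_main (void)
{
    void *context = zmq_ctx_new ();
    void *publisher = zmq_socket (context, ZMQ_RADIO);
    zmq_connect (publisher, "udp://127.0.0.1:5556");
    printf ("Broadcasting on radio channel 'foo'!\n");
    while (1) {
        zmq_msg_t msg;
        zmq_msg_init_size (&msg, strlen ("hello"));
        memcpy (zmq_msg_data (&msg), "hello", strlen ("hello"));
        zmq_msg_set_group (&msg, "foo");
        /* on success zmq_msg_send() takes ownership; release only on failure */
        if (zmq_msg_send (&msg, publisher, 0) == -1)
            zmq_msg_close (&msg);
    }
}

/* DISH side: joins group "foo" and receives in a tight loop. */
int subscriber_main (void)
{
    void *context = zmq_ctx_new ();
    void *subscriber = zmq_socket (context, ZMQ_DISH);
    long total = 0, tsize = 0;
    zmq_bind (subscriber, "udp://*:5556");
    zmq_join (subscriber, "foo");

    zmq_msg_t msg;
    zmq_msg_init (&msg);
    usleep (1000);
    fprintf (stderr, "Pointing the dish to channel 'foo'\n");
    while (1) {
        total = 0;
        while (total < 1000000) {
            if (zmq_msg_recv (&msg, subscriber, 0) == -1)
                continue;
            total++;
            tsize += zmq_msg_size (&msg);
        }
    }
    /* not reached: the loop above never exits */
    zmq_msg_close (&msg);
    zmq_close (subscriber);
    zmq_ctx_destroy (context);
    return 0;
}

/* Run with "pub" as the argument in one process and without it in another. */
int main (int argc, char *argv[])
{
    if (argc > 1 && strcmp (argv[1], "pub") == 0)
        return publisher_main ();
    return subscriber_main ();
}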

What's the actual result? (include assertion message & call stack if applicable)

#0  0x00007ffff7239428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff723b02a in __GI_abort () at abort.c:89
#2  0x00007ffff7b2e9e5 in zmq::zmq_abort (errmsg_=0x7ffff73949a8 "Resource temporarily unavailable") at /home/ponsko/libzmq/src/err.cpp:87
#3  0x00007ffff7b9240d in zmq::udp_engine_t::in_event (this=0x7ffff00008c0) at /home/ponsko/libzmq/src/udp_engine.cpp:381
#4  0x00007ffff7b2dbb0 in zmq::epoll_t::loop (this=0x618010) at /home/ponsko/libzmq/src/epoll.cpp:188
#5  0x00007ffff7b2dcce in zmq::epoll_t::worker_routine (arg_=0x618010) at /home/ponsko/libzmq/src/epoll.cpp:203
#6  0x00007ffff7b785f4 in thread_routine (arg_=0x618090) at /home/ponsko/libzmq/src/thread.cpp:109
#7  0x00007ffff6fee6ba in start_thread (arg=0x7ffff5c82700) at pthread_create.c:333
#8  0x00007ffff730b41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

The root cause of the problem seems to be in zmq::dish_session_t::push_msg(): at line 303 it calls zmq::session_base_t::push_msg() and returns its return value. If that call fails, e.g. because the watermark has been hit, errno is set to EAGAIN and the failure propagates up to zmq::udp_engine_t::in_event(), where it triggers errno_assert().

Something like this:

#4 zmq::udp_engine_t::in_event (this=0x7ffff00008c0) at /home/ponsko/libzmq/src/udp_engine.cpp:380
#3  zmq::dish_session_t::push_msg (this=0x61a330, msg_=0x7ffff5c81150) at /home/ponsko/libzmq/src/dish.cpp:308
#2   zmq::session_base_t::push_msg (this=0x61a330, msg_=0x7ffff5c81150) at /home/ponsko/libzmq/src/session_base.cpp:164
#1    zmq::pipe_t::write (this=0x619100, msg_=0x7ffff5c81150) at /home/ponsko/libzmq/src/pipe.cpp:224
#0     zmq::pipe_t::check_write
            bool full == true
           return false
       -> write() 
          return false
     -> session_base_t::push_msg() 
        errno = EAGAIN, 
        return -1
    -> dish_session_t::push_msg 
       return -1
#4 zmq::udp_engine_t::in_event (this=0x7ffff00008c0) at /home/ponsko/libzmq/src/udp_engine.cpp:380  called push_msg()
    value rc is now -1
    @udp_engine.cpp:381 
    -> errno_assert (rc == 0); 
       -> Resource temporarily unavailable (/home/ponsko/libzmq/src/udp_engine.cpp:381)
       -> Aborted (core dumped)
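To make the failure path easier to follow, here is a toy model of the chain above. It is a hypothetical sketch, not libzmq code: toy_pipe_t, toy_session_push_msg(), toy_dish_push_msg() and toy_in_event_burst() are invented names, and the pipe is reduced to a counter bounded by a high-water mark. Instead of aborting, the model reports the datagram at which errno_assert (rc == 0) would fire.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for zmq::pipe_t: just a message count bounded by a
   high-water mark (the real pipe is a lock-free queue). */
typedef struct {
    size_t queued;
    size_t hwm;
} toy_pipe_t;

/* Loosely mirrors pipe_t::write()/check_write(): refuse once the HWM is hit. */
static bool toy_pipe_write (toy_pipe_t *p)
{
    if (p->queued >= p->hwm)
        return false;               /* "full == true" */
    p->queued++;
    return true;
}

/* Loosely mirrors session_base_t::push_msg(): failure -> -1 with EAGAIN. */
static int toy_session_push_msg (toy_pipe_t *p)
{
    if (!toy_pipe_write (p)) {
        errno = EAGAIN;
        return -1;
    }
    return 0;
}

/* Loosely mirrors dish_session_t::push_msg(): forwards the result as-is. */
static int toy_dish_push_msg (toy_pipe_t *p)
{
    return toy_session_push_msg (p);
}

/* Simulates n datagrams arriving while nothing drains the pipe. Returns the
   index of the first datagram for which the real engine would hit
   errno_assert (rc == 0), or -1 if every push succeeded. */
static long toy_in_event_burst (toy_pipe_t *p, long n)
{
    assert (p != NULL && n >= 0);
    for (long i = 0; i < n; i++) {
        int rc = toy_dish_push_msg (p);
        if (rc != 0)
            return i;               /* real engine aborts here, errno == EAGAIN */
    }
    return -1;
}
```

With a high-water mark of 3 and a reader that never catches up, the fourth datagram (index 3) fails with EAGAIN, which is the point where the real engine aborts.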

Expected result

If the pipe fails to write the message, dish_session_t expects to retry with the same message. Returning -1 kills the udp_engine; returning 0 would presumably drop the message. I'm not sure how this should be solved.
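The two options mentioned above can be sketched with the same kind of toy model (hypothetical names again; this is not a patch against udp_engine.cpp). Policy A treats EAGAIN as "drop this datagram", which is consistent with UDP's lack of delivery guarantees. Policy B keeps the datagram pending, stops polling the socket, and retries the same message once the reader has drained the pipe; as far as I can tell this is similar in spirit to how the TCP stream engine stops input when the session pipe is full and resumes it in restart_input().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pipe: a message count bounded by a high-water mark. */
typedef struct {
    size_t queued;
    size_t hwm;
} toy_pipe_t;

/* Hypothetical push_msg(): -1 with errno = EAGAIN when the pipe is full. */
static int toy_push_msg (toy_pipe_t *p)
{
    if (p->queued >= p->hwm) {
        errno = EAGAIN;
        return -1;
    }
    p->queued++;
    return 0;
}

/* Policy A: drop the datagram on EAGAIN; any other error is still fatal. */
typedef struct {
    size_t delivered;
    size_t dropped;
} toy_stats_t;

static bool toy_in_event_drop (toy_pipe_t *p, toy_stats_t *s)
{
    if (toy_push_msg (p) == 0) {
        s->delivered++;
        return true;
    }
    if (errno == EAGAIN) {
        s->dropped++;
        return true;
    }
    return false;                   /* a real engine would still assert here */
}

/* Policy B: keep the datagram pending and pause input until there is room. */
typedef struct {
    bool has_pending;               /* a datagram is waiting to be pushed */
    bool polling_in;                /* engine is polling the socket for input */
} toy_engine_t;

static void toy_in_event_retry (toy_engine_t *e, toy_pipe_t *p)
{
    assert (!e->has_pending);       /* input is paused while one is pending */
    if (toy_push_msg (p) == 0)
        return;
    /* errno == EAGAIN: remember the datagram and stop polling */
    e->has_pending = true;
    e->polling_in = false;
}

/* Hypothetical hook, called once the reader has drained the pipe. */
static void toy_restart_input (toy_engine_t *e, toy_pipe_t *p)
{
    if (e->has_pending && toy_push_msg (p) == 0)
        e->has_pending = false;
    if (!e->has_pending)
        e->polling_in = true;
}
```

Policy A keeps the engine simple at the cost of silent loss under load. Policy B avoids dropping inside the process, but the backlog moves into the kernel socket buffer, which will itself drop datagrams once it fills.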
