API_ZmqFailed exception thrown in ZmqEventSupplier::push_heartbeat_event leads to program termination

One of our Tango device servers (cppTango 9.3.3 running on Debian Stretch) generated a core file.

Here is the backtrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fb8eb92942a in __GI_abort () at abort.c:89
#2  0x00007fb8ec2400ad in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fb8ec23e066 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fb8ec23e0b1 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fb8ec23e2c9 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x000055ef49d87f19 in Tango::Except::throw_exception (reason=0x7fb8ee3a13d4 "API_ZmqFailed", 
    desc="Can't push ZMQ heartbeat event for event tango://mytangohost:10000/dserver/hostinfo/my_instance.heartbeat\nZmq error: Interrupted system call", origin=0x7fb8ee3a1a20 "ZmqEventSupplier::push_heartbeat_event", sever=Tango::ERR)
    at /segfs/tango/release/debian9/include/tango/except.h:595
#7  0x00007fb8ee3137cd in Tango::ZmqEventSupplier::push_heartbeat_event() () from /opt/os/lib/libtango.so.9
#8  0x00007fb8ee20dde8 in Tango::PollThread::eve_heartbeat() () from /opt/os/lib/libtango.so.9
#9  0x00007fb8ee208cd7 in Tango::PollThread::one_more_poll() () from /opt/os/lib/libtango.so.9
#10 0x00007fb8ee206acc in Tango::PollThread::run_undetached(void*) () from /opt/os/lib/libtango.so.9
#11 0x00007fb8ecba2731 in omni_thread_wrapper () from /opt/os/lib/libomnithread.so.4
#12 0x00007fb8ec538494 in start_thread (arg=0x7fb8e2ffd700) at pthread_create.c:333
#13 0x00007fb8eb9ddacf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

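It looks like an interrupted system call (EINTR) made a ZMQ operation fail, so ZmqEventSupplier::push_heartbeat_event threw a Tango::DevFailed exception with reason API_ZmqFailed. Nothing further up the call stack (PollThread::eve_heartbeat, PollThread::one_more_poll, PollThread::run_undetached) catches it. The exception therefore escapes the polling thread, and, as the backtrace shows, this ends in std::terminate() and abort(), crashing the device server: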

terminate called after throwing an instance of 'Tango::DevFailed'

I have no idea why the system call was interrupted in this specific case.

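Maybe we could make the polling thread more robust against this kind of issue, for example by retrying the ZMQ send when it fails with EINTR, or by catching and logging the exception in the polling thread so that a single failed heartbeat does not terminate the whole device server?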

Edited by Thomas Braun