Refactor signal handling logic in non-Windows platforms

For Breaking Changes section of release notes:

  • The DServerSignal class has been completely refactored to handle signals delivered to threads started before Tango. Clients of the class calling the register_*_handler etc. functions should be unaffected, however, the internals have been completely rewritten meaning subclasses of DServerSiganl may have to be rewritten.

This MR refactors the signal handling logic in non-Windows platforms as suggested in #1327 (closed). Even though the issue description and summary refer to the "self-pipe trick", I posited in a comment that it should be safe to perform an in-memory operation that is safe against concurrent signal delivery, involving a deque, a mutex and a condition variable. That is the strategy implemented here. If this proposed solution is not deemed safe, it should be fairly painless to switch to the self-pipe mechanism while maintaining the rest of the changes.

This MR contains 14 commits, each self-contained. As usual, I recommend to go through each one individually, and read the commit messages for a full, clear picture of what the changes are doing. The first 9 commits are fairly non-controversial, and mostly small simplifications and quality-of-life improvements for some of the code around the signal handling logic. The final commits implement the main changes:

  • Start the signal thread after the DServerSignal object is fully created. This wasn't the case, and the signal thread could potentially access a not-fully-constructed DServerSignal. Hasn't been a problem, but we want this stronger guarantee for our signal handlers.
  • Implement and test a SynchronisedQueue class in isolation. The class allows one thread to put values in a deque, while a second thread pops them out.
  • Refactor how signals are handled by Tango, switching from the current strategy (block signals in all Tango threads, then sigwait() in a loop in the signal thread) to one based on signal handlers that are safe to execute on any thread. The new signal handler puts the signal number on a SynchronisedQueue, which the signal handler thread continuously pops values from. With this strategy, further simplifications can be done on how arbitrary signals requested by users can be dynamically added/removed from the signal management logic, including the case of direct execution in the signal handler context.
  • Now that we use signal handlers throughout, we can actually unify the Windows and non-Windows logic (the Windows code branch implemented its own version of communicating values from the signal handler to the signal thread, only it was simpler and might have been subjected to data races).
  • Minor convenience for removing semantic overload on SIGINT previously used to stop the signal thread.

As expected, after these changes a SIGINT'd/SIGTERM'd tango device server always exits with a 0 status code, even on the presence of extra threads. Thus the test that reproduced this situation has been updated to assert on this.

There are also two points I want to note:

  • The ABI/API compatibility job fails, which isn't a surprise given the changes I introduced in the code. However, my understanding is that these classes are all internal to Tango, so users shouldn't be affected by this, correct?
  • Users can request their callbacks to be invoked directly in the context of the signal handler instead of in the signal thread for a particular signal. When this is requested, suddenly all user-provided callbacks for that signal are invoked in the context of the signal handler, and not only the one that requested it. I left this behaviour unchanged, but this is probably something that would be nice to implement correctly in the future (i.e., obey the "bool handle" parameter on a per-callback basis rather that a per-signo basis).
Edited by Thomas Ives

Merge request reports

Loading