Skip to content

Join threads when we shutdown

Thomas Ives requested to merge fix-thread-join into release/10.0.0

This MR explicitly joins the ThSig thread and the KillThread when we shutdown. This makes sure that the main thread will not exit while these threads are running, avoiding potential segfaults if these threads access static objects after they have been destroyed by the main thread.

There are many other threads, but I think it is less likely that they are doing things after we have shutdown. I have explicitly looked into the PollThread and this is already being join()'d correctly.

Working towards #1278 (closed), but we might want to explicitly check the other threads before we call that issue done.

I have gone through all the threads and although there are a couple of threads where I think it is probably possible for them to cause a crash during shutdown, I think it is (i) hard to work out how to fix them and (ii) sufficiently rare occurrences that I don't think it is worth the effort to work out how to fix it. Here are my justification for each omni_thread object (other than KillThread and ThSig which we fix in this MR).

  • server PollThread: Runs undetached and is joined during shutdown.
  • client LockThread: Runs detached and does not access static objects.
  • client CallbackThread: Runs undetached and does not access static objects. I don't believe it is ever joined, this might be a memory leak but will not cause a crash.
  • server ServRestartThread: Runs detached and accesses the loggers. This could cause a crash if we receive Kill command/SIGTERM signal while this is running. I am not sure how to fix it and I am not sure it is worth the effort.
  • server DevIntrThread: Runs detached and does not access static objects.
  • client NotifdEventConsumer: Runs undetached. Calls _orb->run() so presumably is stopped when we shutdown the ORB. When _orb->run() returns we do not access any static objects. Is not joined, but I don't think this matters much as there is only ever one.
  • client ZmqEventConsumer: Runs undetached. It is sent a ZMQ_END command during shutdown. The main thread waits for an ACK before continuing. After the ZmqEventConsumer thread sends the ACK it no longer access static objects.
  • client EventConsumerKeepAliveThread: Runs undetached. It is joined during shutdown.
  • client DelayedEventUnsubThread/DelayedEventSubThread: Both run detatched. They are short lived, but access loggers. Could potential cause a crash if we receive a Kill command/SIGTERM signal while they are active. Not clear to me how to fix this and again I am a not sure it is worth the effort.

TODO:

  • Add a catch2 test for the kill command

Closes #1278 (closed).

Edited by Thomas Ives

Merge request reports