Skip to content

Fix recent CI failures for tests with synchronisation issues

Thomas Ives requested to merge bugfix/1095-event-pipe-test-cb-thread-safety into main

There are a few tests which have been failing occasionally due to synchronisation issues after the move to our own Gitlab runners. These tests are using a Tango::CallBack to verify that events are working correctly, but are using std::this_thread::sleep_for to wait for the event to come in.

It looks like our own CI runners can occasionally run very slowly which means that this sleep is not sufficient. My solution is to move to proper synchronisation between threads so that we can wait until the event actually occurs (which we should do anyway, see #1095).

In this MR I have just focused on the tests with this issue that I see failing most frequently. with the aim to get these fixes merged sooner so that other MR's don't have to deal with this any more. After fixing a few of these I have decided to add a common base class CountingCallBack for all the callback classes as each test is doing similar things and handling the mutex properly to avoid deadlocks can be tricky. I have had a little skim through the other tests and I am reasonably confident that this base class can be used for all of them and some can just use the base class directly.

Initially, I had a default timeout of 60s when waiting for an event to come in, but I found that each of the tests still occasionally failed. After some digging it looks to me like we occasionally get a test that runs very slowly (rather than anything actually getting stuck) so I have decided that increasing the timeout to 120s is probably the best fix. Since then I haven't seen any failures, but it is possible that this still is not enough. As these changes are things we want to do anyway, I think we should just merge and visit again if we still have some issues.

Working towards #1095.

Edited by Thomas Ives

Merge request reports