Some tests are very flaky. We sometimes have to run pipelines multiple times to get them to pass.

Summary

Some tests seem to fail randomly in the pipeline.

Here are some of the error messages I got when running the Test Package Ubuntu-20.04 job on develop:

12/78 Test #12: check_test_full ............................***Exception: SegFault  4.14 sec
(check_test_full:13202): GLib-GObject-WARNING **: 14:23:18.592: value "((GstOpusEncAudioType) 0)" of type 'GstOpusEncAudioType' is invalid or out of range for property 'audio-type' of type 'GstOpusEncAudioType'
(check_test_full:13202): GLib-GObject-WARNING **: 14:23:18.592: value "((GstOpusEncBandwidth) 4)" of type 'GstOpusEncBandwidth' is invalid or out of range for property 'bandwidth' of type 'GstOpusEncBandwidth'
(check_test_full:13202): GLib-GObject-WARNING **: 14:23:18.592: value "((GstOpusEncFrameSize) 3)" of type 'GstOpusEncFrameSize' is invalid or out of range for property 'frame-size' of type 'GstOpusEncFrameSize'

Start 54: pyquid_webrtc_multireceiv
ERROR: Job failed: execution took longer than 1h0m0s seconds

12/78 Test #12: check_test_full ............................Child aborted***Exception:   4.30 sec
(check_test_full:13203): GLib-GObject-WARNING **: 14:35:11.612: value "((GstOpusEncAudioType) 0)" of type 'GstOpusEncAudioType' is invalid or out of range for property 'audio-type' of type 'GstOpusEncAudioType'
(check_test_full:13203): GLib-GObject-WARNING **: 14:35:11.613: value "((GstOpusEncBandwidth) 4)" of type 'GstOpusEncBandwidth' is invalid or out of range for property 'bandwidth' of type 'GstOpusEncBandwidth'
(check_test_full:13203): GLib-GObject-WARNING **: 14:35:11.613: value "((GstOpusEncFrameSize) 3)" of type 'GstOpusEncFrameSize' is invalid or out of range for property 'frame-size' of type 'GstOpusEncFrameSize'
no message buffer overruns
no message buffer overruns
no message buffer overruns
no message buffer overruns
no message buffer overruns
no message buffer overruns
no message buffer overruns
no message buffer overruns
no message buffer overruns
no message buffer overruns
no message buffer overruns
no message buffer overruns
JACK server starting in non-realtime mode
self-connect-mode is "Don't restrict self connect requests"
creating alsa driver ... hw:0|hw:0|1024|2|48000|0|0|nomon|swmeter|-|32bit
control open "hw:0" (No such file or directory)
ALSA lib pcm_hw.c:1829:(_snd_pcm_hw_open) Invalid value for card
ALSA lib pcm_hw.c:1829:(_snd_pcm_hw_open) Invalid value for card
ALSA: Cannot open PCM device alsa_pcm for playback. Falling back to capture-only mode
Cannot initialize driver
JackServer::Open failed with -1
JACK server starting in non-realtime mode
self-connect-mode is "Don't restrict self connect requests"
creating alsa driver ... hw:0|hw:0|1024|2|48000|0|0|nomon|swmeter|-|32bit
control open "hw:0" (No such file or directory)
ALSA lib pcm_hw.c:1829:(_snd_pcm_hw_open) Invalid value for card
ALSA lib pcm_hw.c:1829:(_snd_pcm_hw_open) Invalid value for card
ALSA: Cannot open PCM device alsa_pcm for playback. Falling back to capture-only mode
Cannot initialize driver
JackServer::Open failed with -1
no message buffer overruns
no message buffer overruns
no message buffer overruns
check_test_full: tpp.c:82: __pthread_tpp_change_priority: Assertion `new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)' failed.

I have also seen another test fail in the Test -- Ubuntu 20.04 job while working on a feature branch that did not touch the failing test. Restarting the job fixed the pipeline without any code change:

11/73 Test #11: check_threaded_wrapper .....................***Exception: SegFault  0.33 sec
: last do_nothing invocation done
coucou0 
: last do_nothing invocation done
coucou0 
: last do_nothing invocation done
coucou0

How to reproduce?

Start a test pipeline. It fails intermittently, with no code change between failing and passing runs.

Expected behavior

The pipeline should pass every time (or at least rerun known flaky tests a couple of times before giving up).

What is the frequency of occurrence of this behavior?

Maybe 10-20% of the time? Hard to say, but pretty often.
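To get a rough feel for what rerunning would buy us, here is a small back-of-the-envelope sketch. It assumes each attempt fails independently with the same probability, which may well not hold here (a missing sound card or a slow runner would fail every attempt), and it treats the 10-20% estimate above as a per-attempt failure rate. The function name is made up for illustration.

```python
def all_attempts_fail_probability(failure_rate: float, attempts: int) -> float:
    """Probability that every one of `attempts` independent runs fails.

    Assumes each run fails independently with probability `failure_rate`.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return failure_rate ** attempts


# Print the estimate for the 10% and 20% guesses with 1 to 3 attempts.
for rate in (0.10, 0.20):
    for attempts in (1, 2, 3):
        p = all_attempts_fail_probability(rate, attempts)
        print(f"failure rate {rate:.0%}, {attempts} attempt(s): "
              f"all fail with probability {p:.2%}")
```

Under that independence assumption, even the pessimistic 20% guess drops below 1% with three attempts. This only hides the flakiness; it does not fix the underlying crashes.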

Other comment

As a band-aid measure, we could add some logic to rerun known flaky tests in the pipelines. It would not fix the underlying crashes, but it would at least be less annoying.
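A minimal sketch of what that rerun logic could look like, as a small Python wrapper the CI script might call. This is not existing project code: the function name, the default of three attempts, and the idea of wrapping a `ctest -R <test name>` invocation are all assumptions for illustration.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    max_attempts: int = 3,
    runner: Callable[[Sequence[str]], int] = subprocess.call,
) -> int:
    """Run `cmd` until it exits with 0 or `max_attempts` is reached.

    Returns 0 on the first successful attempt, otherwise the exit code
    of the last failed attempt. `runner` is injectable so the logic can
    be tested without spawning real processes.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    exit_code = 1
    for attempt in range(1, max_attempts + 1):
        exit_code = runner(cmd)
        if exit_code == 0:
            return 0
        # Log every failed attempt so flaky tests stay visible in the job log.
        print(
            f"attempt {attempt}/{max_attempts} failed with exit code {exit_code}",
            file=sys.stderr,
        )
    return exit_code


# Hypothetical usage in a CI script, rerunning only one known flaky test:
# code = run_with_retries(["ctest", "-R", "check_test_full", "--output-on-failure"])
# sys.exit(code)
```

Restricting the retries to an explicit list of known flaky tests (rather than the whole suite) keeps new regressions from being masked, and logging each failed attempt keeps a record of how often the reruns are actually needed.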