
Fix flaky test `TestDockerCommandRunAttempts`

What does this MR do?

Fix flaky test TestDockerCommandRunAttempts

Why was this MR needed?

Background

When removeContainer is called, the container exits with code 137. The Runner sometimes picks up this exit code and returns it, as explained in detail in #25385 (comment 324486793). This causes the test to fail intermittently, which makes it flaky.

Fix

Add testAttempts, which runs the test up to x times, increasing the odds of getting the expected value. If the test still fails after all attempts, there is a high chance that the failure is legitimate.

The Runner could simply retry the job section when the exit code is 137, but surfacing exit code 137 to the user is useful: it usually means the OOM killer killed the container, as explained in https://success.docker.com/article/what-causes-a-container-to-exit-with-code-137, and users should be able to see that their job is being killed by the OOM killer.
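
For context, exit code 137 follows the common convention of reporting 128 plus the signal number when a process is killed by a signal, and 137 - 128 = 9 is SIGKILL, the signal the OOM killer sends. A small sketch of that arithmetic, with `signalFromExitCode` as a hypothetical helper that is not part of the Runner:

```go
package main

import "fmt"

// sigkill is the conventional signal number for SIGKILL on Linux.
const sigkill = 9

// signalFromExitCode decodes the "128 + signal number" convention.
// It reports false when the exit code does not fall in that range.
func signalFromExitCode(code int) (int, bool) {
	if code > 128 && code < 256 {
		return code - 128, true
	}
	return 0, false
}

func main() {
	sig, ok := signalFromExitCode(137)
	fmt.Println("signal:", sig, "decoded:", ok, "is SIGKILL:", sig == sigkill)
}
```

Note that SIGKILL can also come from other sources, such as a forced container stop, so 137 alone does not prove an out-of-memory kill.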

Reproduce

It is quite hard to reproduce because the failure is random and there is no real way to force the exit code. The only way to do so is to call ContainerStop with the following patch: https://gitlab.com/snippets/1967002

Does this MR meet the acceptance criteria?

  • Documentation created/updated
  • Added tests for this feature/bug
  • In case of conflicts with master - branch was rebased

What are the relevant issue numbers?

Closes #25385

Edited by 🤖 GitLab Bot 🤖
