Remove randomness of TestProcessRunner_BuildLimit failures (!5588) · Merge requests · GitLab.org / gitlab-runner

What does this MR do?

Remove randomness of TestProcessRunner_BuildLimit failures

For some reason, the TestProcessRunner_BuildLimit test (in commands/multi_test.go), which has existed for several years, recently started failing randomly, and it seems to fail often.

Because the test has concurrency built into it, it is vulnerable to such random failures. In general, the test works as follows:

We prepare the RunCommand configuration supporting two concurrent jobs in a single configured [[runners]] worker.
We feed the workers with job configurations in a loop, each started in a separate goroutine. This is the built-in concurrency.
Jobs are "executed" through mocks.
If a job that is fed to the runner exceeds the limit, a warning is emitted in the logs.
When all jobs are reported as finished, we scan the logs and count the number of these warnings.
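
The flow above can be sketched roughly as follows. This is a simplified illustration, not the code from commands/multi_test.go: the limiter type, its tryAcquire and release methods, and feedJobs are hypothetical names, and a plain counter stands in for the log warnings that the real test scans for.

```go
package main

import (
	"fmt"
	"sync"
)

// limiter is a hypothetical stand-in for the runner's build limit.
type limiter struct {
	mu       sync.Mutex
	limit    int
	running  int
	warnings int // stands in for the warnings written to the logs
}

// tryAcquire reserves a slot, or records a warning when the limit is met.
func (l *limiter) tryAcquire() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.running >= l.limit {
		l.warnings++
		return false
	}
	l.running++
	return true
}

// release frees a slot once a job has finished.
func (l *limiter) release() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.running--
}

// feedJobs feeds each job from its own goroutine and returns the number
// of warnings recorded once every goroutine has finished.
func feedJobs(limit, jobs int) int {
	l := &limiter{limit: limit}
	var wg sync.WaitGroup
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if l.tryAcquire() {
				// The mocked job does no work and finishes right away.
				l.release()
			}
		}()
	}
	wg.Wait()
	return l.warnings
}

func main() {
	// The result depends on goroutine scheduling and can differ per run.
	fmt.Println("warnings:", feedJobs(2, 3))
}
```

With a limit of 2 and three jobs, this sketch can report either zero or one warning, depending on how the goroutines are scheduled.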

In the test, the limit was set to 2 and three jobs were sent in a loop. Because of the goroutine concurrency, jobs are fed completely asynchronously from how they are handled. It may happen that two jobs are being "executed" while the third is fed. It may also happen that each of them is "finished" before the next one is fed.
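
To make the two outcomes concrete, here is a small deterministic replay of both orderings. It is only an illustration under simplifying assumptions: the schedule format and the warningsFor function are hypothetical, and a rejected job is simply dropped rather than handled the way the runner would handle it.

```go
package main

import "fmt"

// Event kinds in a hypothetical schedule.
const (
	feed   = "feed"
	finish = "finish"
)

// warningsFor replays a schedule against a limit and counts the feeds
// that arrive while the limit is already reached.
func warningsFor(limit int, schedule []string) int {
	running, warnings := 0, 0
	for _, ev := range schedule {
		switch ev {
		case feed:
			if running >= limit {
				warnings++ // limit met: this is where the warning is logged
			} else {
				running++
			}
		case finish:
			if running > 0 {
				running--
			}
		}
	}
	return warnings
}

func main() {
	// Two jobs are still running when the third one is fed.
	overlapping := []string{feed, feed, feed, finish, finish}
	// Every job finishes before the next one is fed.
	serialized := []string{feed, finish, feed, finish, feed, finish}

	fmt.Println(warningsFor(2, overlapping)) // prints 1
	fmt.Println(warningsFor(2, serialized))  // prints 0
}
```

The second schedule is the case that makes the test fail: the limit is never reached, so no warning is emitted.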

And that's what started happening recently in our CI pipelines: we don't see the log warnings at all, which fails the test, because we want to confirm that the warning was emitted and that the limit fulfilled its purpose.

This commit brings two improvements:

We feed more jobs, to increase the chance that at least two are in execution when a third is fed, which emits the log warning we want.
The check is changed from equal(1, calculated) to true(calculated > 0), which doesn't require an ideal scenario to happen. Relying on an ideal scenario is risky in a highly concurrent and asynchronous setup, even more so with the added margin of extra jobs.
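
The difference between the two checks can be illustrated with a standard-library sketch. The log lines, the limitWarning marker, and the helper names are all hypothetical; the real test scans the runner's log output, and its exact warning message is not quoted in this description.

```go
package main

import (
	"fmt"
	"strings"
)

// limitWarning is a hypothetical marker; the real log message may differ.
const limitWarning = "runner limit met"

// countWarnings counts the log lines that contain the limit warning.
func countWarnings(lines []string) int {
	n := 0
	for _, line := range lines {
		if strings.Contains(line, limitWarning) {
			n++
		}
	}
	return n
}

// strictCheck mirrors the old assertion: exactly one warning.
func strictCheck(n int) bool { return n == 1 }

// tolerantCheck mirrors the new assertion: at least one warning.
func tolerantCheck(n int) bool { return n > 0 }

func main() {
	// Made-up log output from a run where two feeds hit the limit.
	logs := []string{
		"job 1 started",
		"job 2 started",
		"job 3 rejected: runner limit met",
		"job 4 rejected: runner limit met",
		"job 1 finished",
	}
	n := countWarnings(logs)
	fmt.Println(n, strictCheck(n), tolerantCheck(n)) // prints: 2 false true
}
```

Once more jobs are fed, more than one warning becomes a normal outcome, so the exact-count check would fail while the tolerant check still confirms that the limit was applied.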

Why was this MR needed?

What's the best way to test this MR?

What are the relevant issue numbers?
