ci: Include test files filtering in Knapsack allocation

What does this MR do and why?

Reopening of !110359 (merged) after it was reverted.

The goal is to balance the distribution of test files across the parallelized rspec * predictive jobs.

Let's imagine we have this Knapsack report of testfile -> duration:

| Test file | Duration |
| --- | --- |
| testfile1 | 10 |
| testfile2 | 20 |
| testfile3 | 30 |
| testfile4 | 40 |
| testfile5 | 50 |

Based on the changes, detect-tests decides that we only need to run testfile1, testfile3, and testfile5.

Let's say we parallelize the rspec * predictive jobs on two nodes.

Before

Before this change, the following would happen:

  • First, Knapsack would generate a list of tests to be run on node1 and node2 without taking into account the list of test files that will actually be run. Based on the test file durations, the distribution would be:

| Node | Test files | Total expected duration |
| --- | --- | --- |
| node1 | testfile5, testfile3 | 50 + 30 = 80 |
| node2 | testfile4, testfile2, testfile1 | 40 + 20 + 10 = 70 |

Then, our ParallelRSpecRunner wrapper would filter the list of tests further, so that the end result would be as follows:

| Node | Test files | Total expected duration |
| --- | --- | --- |
| node1 | testfile5, testfile3 | 50 + 30 = 80 |
| node2 | testfile1 | 10 |

The distribution is poorly balanced: node1 is expected to run for 80, while node2 is left with only 10.
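The "allocate first, filter afterwards" behaviour can be reproduced with a few lines of Python. This is only an illustrative sketch: the node allocation is copied from the example above rather than computed, and `filter_allocation` and `totals` are hypothetical helper names, not part of Knapsack or ParallelRSpecRunner.

```python
# Durations from the example Knapsack report (unitless, as in the example).
durations = {
    "testfile1": 10,
    "testfile2": 20,
    "testfile3": 30,
    "testfile4": 40,
    "testfile5": 50,
}

# Allocation over ALL test files, copied from the example (not computed here).
allocation = {
    "node1": ["testfile5", "testfile3"],
    "node2": ["testfile4", "testfile2", "testfile1"],
}

# Test files that detect-tests selected for this pipeline.
selected = {"testfile1", "testfile3", "testfile5"}


def filter_allocation(allocation, selected):
    """Drop files that were not selected, keeping each node's original order."""
    return {
        node: [name for name in files if name in selected]
        for node, files in allocation.items()
    }


def totals(allocation, durations):
    """Sum the expected duration of the files assigned to each node."""
    return {
        node: sum(durations[name] for name in files)
        for node, files in allocation.items()
    }


filtered = filter_allocation(allocation, selected)
print(filtered)                     # {'node1': ['testfile5', 'testfile3'], 'node2': ['testfile1']}
print(totals(filtered, durations))  # {'node1': 80, 'node2': 10}
```

Because the filtering happens after the allocation, the balance that the allocation achieved (80 versus 70) is lost.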

After

With this change, the following would happen:

  • First, Knapsack would generate a list of tests to be run on node1 and node2, taking into account the list of test files that will actually be run. Based on the test file durations, the distribution would be:

| Node | Test files | Total expected duration |
| --- | --- | --- |
| node1 | testfile5 | 50 |
| node2 | testfile3, testfile1 | 30 + 10 = 40 |

Then, our ParallelRSpecRunner wrapper doesn't have to filter the list of tests further since Knapsack already did the filtering.

The distribution is now much better balanced: 50 on node1 versus 40 on node2.
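The "filter first, allocate afterwards" order can be sketched as follows. The `allocate` function is a simple greedy longest-first heuristic used here as a stand-in; it is an assumption for illustration and not necessarily the algorithm Knapsack uses internally.

```python
def allocate(durations, node_count):
    """Greedy stand-in allocator: assign the longest remaining file to the
    node with the smallest running total (ties go to the lowest node index)."""
    nodes = [[] for _ in range(node_count)]
    totals = [0] * node_count
    by_duration = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    for name, duration in by_duration:
        target = min(range(node_count), key=lambda i: totals[i])
        nodes[target].append(name)
        totals[target] += duration
    return nodes, totals


# Durations from the example Knapsack report (unitless, as in the example).
report = {
    "testfile1": 10,
    "testfile2": 20,
    "testfile3": 30,
    "testfile4": 40,
    "testfile5": 50,
}

# Test files that detect-tests selected for this pipeline.
selected = {"testfile1", "testfile3", "testfile5"}

# Filter the report BEFORE allocating, so only files that will run are balanced.
filtered_report = {name: d for name, d in report.items() if name in selected}

nodes, totals = allocate(filtered_report, node_count=2)
print(nodes)   # [['testfile5'], ['testfile3', 'testfile1']]
print(totals)  # [50, 40]
```

With the filtered report as input, the allocator only balances work that will actually run, which is why no post-filtering step is needed.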

Actual example

| Job | Before | After |
| --- | --- | --- |
| rspec system predictive 1/3 | 12m52s (+1m10s from average) | 14m9s (+1m39s from average) |
| rspec system predictive 2/3 | 7m15s (-4m27s from average) | 12m0s (-30s from average) |
| rspec system predictive 3/3 | 15m3s (+3m21s from average) | 11m28s (-1m2s from average) |
| Average | 11m42s | 12m30s |
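One way to summarize the numbers above is the gap between the slowest and the fastest job in each run. The short script below computes it from the durations listed; `to_seconds`, `fmt`, and `spread` are hypothetical helper names introduced only for this calculation.

```python
import re


def to_seconds(text):
    """Parse a duration such as '12m52s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    return int(match.group(1)) * 60 + int(match.group(2))


def fmt(seconds):
    """Format seconds back into the 'XmYs' notation used above."""
    return f"{seconds // 60}m{seconds % 60}s"


def spread(job_durations):
    """Gap between the slowest and the fastest job, in seconds."""
    secs = [to_seconds(d) for d in job_durations]
    return max(secs) - min(secs)


# Job durations copied from the example above.
before = ["12m52s", "7m15s", "15m3s"]
after = ["14m9s", "12m0s", "11m28s"]

before_spread = spread(before)
after_spread = spread(after)
print(fmt(before_spread))  # 7m48s
print(fmt(after_spread))   # 2m41s
```

The gap shrinks from 7m48s to 2m41s, and the slowest job drops from 15m3s to 14m9s, even though the average job duration in this particular run was slightly higher.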

Next steps

As a further optimization, we should set the number of parallel nodes dynamically based on the number of test files to run, similar to what we do for the rspec foss-impact child pipeline.
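As a rough illustration of what "dynamic" could mean here, the node count might be derived from the number of selected test files. Everything in this sketch is an assumption: `node_count`, `files_per_node`, and `max_nodes` are hypothetical names and values, and this is not how the rspec foss-impact child pipeline is known to compute it.

```python
import math


def node_count(num_test_files, files_per_node=20, max_nodes=3):
    """Hypothetical rule: one node per `files_per_node` files, capped at `max_nodes`.

    Returns 0 when there is nothing to run, so the job could be skipped.
    """
    if num_test_files <= 0:
        return 0
    return min(max_nodes, math.ceil(num_test_files / files_per_node))


print(node_count(3))    # 1
print(node_count(21))   # 2
print(node_count(500))  # 3
```

A duration-based rule (total expected duration divided by a target duration per node) would likely balance better than a file-count rule, since the Knapsack report already provides per-file durations.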

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Rémy Coutable
