Implement a prediction limit for Crystalball

Context

When a single file changes, Crystalball sometimes predicts that more than 10,000 spec files should run for it (see more data on this). This makes predictive pipelines slower without necessarily improving the quality of the resulting test run.

Goal

Implement a mechanism to limit the number of spec files Crystalball predicts for a single changed file, so that predictive pipelines remain fast while keeping a certain level of confidence in their predictions.

Technical hints

We should be able to limit the number of specs in two ways:

- Take the first X specs (this is the strategy Crystalball uses in its predictor logic).
- Take X specs at random.
  - This option is probably the most appealing to me, but it makes test suites more random. We already add some randomness in RSpec by default, so that might not be a big problem.
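
To make the two strategies concrete, here is a minimal Ruby sketch. The `limit_specs` helper and its `strategy:` and `random:` parameters are hypothetical names chosen for illustration; they are not part of Crystalball or test_file_finder. Passing a seeded `Random` is one possible way to keep the random strategy reproducible when a pipeline is retried.

```ruby
# Hypothetical helper, not an existing Crystalball or test_file_finder API.
# Returns at most `limit` entries from `specs` using the chosen strategy.
def limit_specs(specs, limit, strategy: :first, random: Random.new)
  # Nothing to trim when the prediction is already within the limit.
  return specs if specs.size <= limit

  case strategy
  when :first
    # Keep the first X predicted specs, in their original order.
    specs.first(limit)
  when :random
    # Pick X distinct specs at random; `random:` lets callers pass a seeded generator.
    specs.sample(limit, random: random)
  else
    raise ArgumentError, "unknown strategy: #{strategy.inspect}"
  end
end

specs = (1..20).map { |i| "spec/models/model_#{i}_spec.rb" }

limit_specs(specs, 3)                                                # first three specs
limit_specs(specs, 3, strategy: :random, random: Random.new(42))     # three specs, reproducible for a given seed
```

With the `:random` strategy, two runs on the same change can execute different specs unless the seed is fixed, which is the trade-off mentioned above.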

This feature would make sense in https://gitlab.com/gitlab-org/ruby/gems/test_file_finder, since we could then also apply it to prediction tools other than Crystalball (e.g. our tests.yml file). It should be configurable, and disabled by default.
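
As a rough illustration of "configurable, and disabled by default", the sketch below wraps the limit in a small object that leaves predictions untouched unless a limit is set. The `PredictionLimiter` class and its `limit:`, `strategy:` and `seed:` options are assumptions made for this example; nothing with these names exists in test_file_finder today.

```ruby
# Hypothetical sketch: these names are illustrative, not a test_file_finder API.
class PredictionLimiter
  STRATEGIES = %i[first random].freeze

  # limit: nil keeps the feature disabled, which is the default.
  def initialize(limit: nil, strategy: :first, seed: nil)
    unless limit.nil? || (limit.is_a?(Integer) && limit.positive?)
      raise ArgumentError, "limit must be a positive integer or nil"
    end
    raise ArgumentError, "unknown strategy: #{strategy.inspect}" unless STRATEGIES.include?(strategy)

    @limit = limit
    @strategy = strategy
    @random = seed ? Random.new(seed) : Random.new
  end

  def enabled?
    !@limit.nil?
  end

  # Works on any list of predicted spec files, whichever tool produced it
  # (a Crystalball mapping, a tests.yml mapping, ...).
  def call(spec_files)
    return spec_files unless enabled? && spec_files.size > @limit

    if @strategy == :random
      spec_files.sample(@limit, random: @random)
    else
      spec_files.first(@limit)
    end
  end
end

predicted = %w[spec/a_spec.rb spec/b_spec.rb spec/c_spec.rb]

PredictionLimiter.new.call(predicted)            # unchanged: disabled by default
PredictionLimiter.new(limit: 2).call(predicted)  # first two specs
```

Keeping the limiter independent of where the predictions come from is what would let the same option apply to several prediction sources.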

Edited Mar 14, 2024 by David Dieulivol