[E2E] Build capability to dynamically scale the number of parallel test jobs

Currently, all E2E job parallelisation is statically configured, which makes it impossible to automatically keep total runtime at a target value. Additionally, selective test execution requires the ability to set the number of jobs dynamically, because the number of specs executed varies widely depending on the changes in a merge request.

Build the capability to scale up and down the number of parallel jobs configured for each test suite job, based on the runtime data available in the knapsack report.
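
As a rough illustration of the scaling calculation, here is a minimal Ruby sketch. It assumes the knapsack report is a flat JSON map of spec file path to runtime in seconds. The helper name parallel_job_count, the target_runtime threshold, the fallback_runtime used for specs missing from the report, the max_jobs cap, and the sample spec paths are all hypothetical and not taken from the existing codebase.

```ruby
require "json"

# Hypothetical helper: estimate how many parallel jobs are needed so that
# each job runs for roughly target_runtime seconds.
#
# report           - Hash of spec file path => runtime in seconds (assumed report shape)
# spec_files       - spec files selected for this run
# target_runtime   - desired runtime per job in seconds (assumed threshold)
# fallback_runtime - assumed runtime for specs that have no entry in the report
# max_jobs         - assumed upper bound on the number of parallel jobs
def parallel_job_count(report, spec_files, target_runtime:, fallback_runtime: 60.0, max_jobs: 50)
  # No runnable specs means the job does not need to run at all.
  return 0 if spec_files.empty?

  total = spec_files.sum { |file| report.fetch(file, fallback_runtime) }

  # Round up so that no job exceeds the target, then keep the result within bounds.
  (total.to_f / target_runtime).ceil.clamp(1, max_jobs)
end

# Sample data invented for illustration only.
report = JSON.parse(<<~REPORT)
  {
    "qa/specs/features/a_spec.rb": 300.0,
    "qa/specs/features/b_spec.rb": 450.5,
    "qa/specs/features/c_spec.rb": 120.0
  }
REPORT

full_suite = report.keys
selective  = ["qa/specs/features/c_spec.rb"]

puts parallel_job_count(report, full_suite, target_runtime: 300) # 870.5s of specs => 3 jobs
puts parallel_job_count(report, selective, target_runtime: 300)  # 120s of specs => 1 job
```

With the same threshold, the full suite and a selective run end up with different job counts but a similar per-job runtime, which is the behaviour this issue is after.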

A similar mechanism already exists for lower-level specs: https://gitlab.com/gitlab-org/gitlab/-/blob/3167827c565cf8e36f1424e09c7dd3f4bd162ff9/scripts/generate_rspec_pipeline.rb. Whether some of that logic can be reused should be evaluated.

E2E pipelines are already generated by the generate-e2e-pipeline shell script. For more flexibility, it should be ported to a Ruby script or a rake task.

Potential design outline

Use Scenario::Test classes to attach pipeline-related metadata to each scenario type (for example, run_type to indicate which pipeline type the scenario runs in, and a job name to indicate which job runs the scenario).
Scenario::Test classes are already used to automatically detect whether a scenario has any runnable tests among the selected tests. This could be enhanced to return the actual spec files rather than just a count.
Based on the returned spec files and the runtime data stored in the knapsack/master_report.json file, calculate the total runtime for a specific scenario type and determine how many parallel jobs are needed to stay within a specific runtime threshold.
Replace the shell script scripts/generate-e2e-pipeline with a proper rake task. Based on the metadata stored in the scenario classes and the calculated parallel job count, the task injects the parallel: keyword into job definitions by loading the main pipeline yml file, parsing it as a hash, injecting the needed data, and then generating the final pipeline yml file from it.
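
The injection step from the last point could look roughly like the following Ruby sketch. It is only an illustration under stated assumptions: the inline template, the example-instance job, the job contents, and the inject_parallel helper are hypothetical, and dropping jobs with no runnable specs is one possible policy rather than a decided one. Only the cng-instance job name and the parallel: keyword come from this issue.

```ruby
require "yaml"

# Hypothetical static pipeline template. A real task would read the main
# pipeline yml file from disk instead of using an inline string.
TEMPLATE = <<~PIPELINE
  cng-instance:
    stage: test
    script:
      - echo "run cng-instance specs"
  example-instance:
    stage: test
    script:
      - echo "run example-instance specs"
PIPELINE

# Hypothetical helper: load the template as a hash, inject the calculated
# parallel job counts, and dump the final pipeline definition.
#
# job_counts - Hash of job name => number of parallel jobs (0 means nothing to run)
def inject_parallel(template_yaml, job_counts)
  pipeline = YAML.safe_load(template_yaml)

  job_counts.each do |job_name, count|
    job = pipeline[job_name]
    next unless job # ignore counts for jobs that are not in the template

    if count.zero?
      pipeline.delete(job_name) # assumed policy: drop jobs without runnable specs
    elsif count > 1
      job["parallel"] = count   # a single job needs no parallel: keyword
    end
  end

  YAML.dump(pipeline)
end

puts inject_parallel(TEMPLATE, { "cng-instance" => 3, "example-instance" => 0 })
```

Because the template stays a plain yml file and only the parallel: value changes, the generated output can be compared directly against the static definition when debugging.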

Such a design would achieve several things:

Consistent runtime, regardless of whether the full suite or only a subset of tests is running.
Consistent job names. It would remove the need to maintain three separate job definitions per suite, like cng-instance, cng-instance-selective-parallel, cng-instance-selective. Because the number of parallel jobs is injected dynamically, we always use the same job definition and simply change the number of parallel jobs and the specific set of specs to run.
The main pipeline definition is still mostly a static yml file, which significantly simplifies pipeline development and debugging. We have had solutions that generate all of the pipeline definitions via code, and they are very hard to debug and maintain because the actual pipeline yml definition is never available in its final form.

Edited Oct 29, 2024 by andrey