[E2E] Build capability to dynamically scale amount of parallel test jobs
## Problem Statement
Currently, all E2E job parallelisation is statically configured. This makes it impossible to automatically keep total runtime at a target value. Additionally, selective test execution requires the ability to set the number of jobs dynamically, because the number of specs to execute varies greatly depending on the changes in a merge request.

Build the capability to scale the number of parallel jobs configured for each test suite job up and down, based on the runtime data available in the knapsack report.
A similar mechanism already exists for lower level specs: https://gitlab.com/gitlab-org/gitlab/-/blob/3167827c565cf8e36f1424e09c7dd3f4bd162ff9/scripts/generate_rspec_pipeline.rb. The ability to reuse some of that logic should be evaluated.

E2E pipelines are already generated by the `generate-e2e-pipeline` shell script. For more flexibility, it should be ported to a Ruby script or a rake task.
## Potential design outline
- Use `Scenario::Test` classes to add additional pipeline related metadata to every specific scenario type (for example, a `run_type` indicating which pipeline type the scenario runs in, and a job name indicating which job runs it)
- `Scenario::Test` classes are already used to automatically detect whether a scenario has any runnable tests based on the selected tests. This could be enhanced to return the actual spec files rather than just a count
- Based on the returned spec files and the runtime data stored in the `knapsack/master_report.json` file, calculate the total runtime for a specific scenario type and determine how many parallel jobs are needed to stay within a specific runtime threshold
- Replace the shell script `scripts/generate-e2e-pipeline` with a proper rake task that, based on the metadata stored in scenario classes and the calculated parallel job count, injects the `parallel:` keyword into job definitions by first loading the main pipeline yml file, parsing it into a hash, injecting the needed data and then generating the final pipeline yml file from it (a rough sketch of this follows below)
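
A minimal Ruby sketch of how such a rake task could work, assuming a 15 minute target runtime per job and illustrative names (`QA_TESTS`, the job name argument, the method names); none of this reflects the final implementation:

```ruby
# frozen_string_literal: true

require "json"
require "yaml"

# Assumed target runtime per parallel job, in seconds (illustrative value)
TARGET_JOB_RUNTIME = 15 * 60

# Sum knapsack runtimes (seconds) for the spec files a scenario would run.
# Specs missing from the report simply contribute 0 in this sketch.
def total_runtime(spec_files, report_path: "knapsack/master_report.json")
  report = JSON.parse(File.read(report_path))
  spec_files.sum { |spec| report.fetch(spec, 0) }
end

# Round up so that speed is prioritised: 2.5 "ideal" jobs become 3 real jobs
def parallel_jobs(spec_files)
  [(total_runtime(spec_files) / TARGET_JOB_RUNTIME.to_f).ceil, 1].max
end

# Load the static pipeline yml, inject the computed `parallel:` keyword and the
# selected spec files into the job definition, then write the generated pipeline
def generate_pipeline(base_yml:, output_yml:, job_name:, spec_files:)
  pipeline = YAML.safe_load(File.read(base_yml), aliases: true)

  job = pipeline.fetch(job_name)
  jobs_needed = parallel_jobs(spec_files)
  job["parallel"] = jobs_needed if jobs_needed > 1
  job["variables"] = (job["variables"] || {}).merge("QA_TESTS" => spec_files.join(" "))

  File.write(output_yml, YAML.dump(pipeline))
end
```

Keeping the main pipeline definition as a static yml file and only mutating the parsed hash keeps the generated pipeline easy to diff against the source definition.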
Such a design would achieve several things:

- Consistent runtime regardless of whether the full suite or only a subset of tests is running
- Consistent job names. It removes the need to have 3 variants of the same job definition, such as `cng-instance`, `cng-instance-selective-parallel` and `cng-instance-selective`. Because the number of parallel jobs is injected dynamically, we always use the same job definition and simply change the number of parallel jobs and the specific set of specs to run
- The main pipeline definition is still mostly a static yml file, which significantly simplifies pipeline development and debugging. We have had solutions that generate all of the pipeline definitions via code, and they are very hard to debug and maintain because there is no actual pipeline yml definition in its final form
## Results
### Architectural Benefits
#### Simplified setup for `knapsack` parallelisation

- Removes the separate jobs that download the knapsack report before tests get executed (the report is now stored locally in the git repository, so it is always available and correctly corresponds to the state of the code)
- Removes the creation of a separate selective report, as it is not needed; the main report has all the runtime data necessary for knapsack (the knapsack report does not have to contain exactly the same tests that will be executed, it just needs to contain the necessary specs and may contain extra specs as well, as illustrated in the sketch after this list)
- Removes the need for Google Cloud Storage credentials when running tests, because the knapsack report is committed into the repository
- Tests running in patch releases use runtime data that corresponds to the state of the code, which improves test distribution
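
An illustrative sketch of why a single committed report is enough for selective runs (the spec paths below are made up for the example):

```ruby
require "json"

# knapsack's master_report.json maps spec file paths to runtimes in seconds.
# It can contain entries for the whole suite; a selective run only looks up
# the specs it actually executes and ignores everything else.
master_report = JSON.parse(File.read("knapsack/master_report.json"))

selected_specs = %w[
  qa/specs/features/browser_ui/1_manage/login/log_in_spec.rb
  qa/specs/features/browser_ui/3_create/repository/push_over_http_spec.rb
]

runtimes = selected_specs.to_h { |spec| [spec, master_report[spec]] }
total = runtimes.values.compact.sum

puts "total estimated runtime for selected specs: #{total.round(1)}s"
```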
#### Dynamic CI job scaling
- Greatly simplifies the pipeline yml setup. Only a single job definition is present in the CI yml file, instead of having to maintain 3 copies of the same job (one for the default full suite run, one for selective test execution with a test count small enough not to require parallelisation, and one for selective test execution with a test count requiring multiple parallel jobs)
- Greatly simplifies the rules setup in CI yml definition files. Because only 1 test job is required, pipeline yml files do not need a set of custom rules to correctly select one of the 3 job definitions from the previous implementation
- Greatly simplifies adding new test jobs in CI. Only a single basic job definition has to be added, without the need to manually define the number of parallel jobs
- Removes the need to manually track job runtimes and adjust the parallel job count; the job count is always adjusted automatically based on test runtime
- Future support for Enable selective E2E tests on GitLab merge requ... (gitlab-org/quality/quality-engineering&47 - closed). The number of jobs will be adjusted automatically regardless of what percentage of the test suite selective test execution ends up using. The previous implementation would either not set enough parallel jobs, which leads to increased runtime, or set too large a parallel job count, which leads to increased load on runner infrastructure
- Dynamic pipeline generation is now implemented in Ruby, which allows unit tests to be added to the implementation, unlike the previous bash script implementation which was not testable
### Data on consistent runtime
#### Pipeline examples
| Case | Pipeline ref | Runtime |
| --- | --- | --- |
| Case 1: pipeline executing 2 changed specs | https://gitlab.com/gitlab-org/gitlab/-/pipelines/1654381115 | `cng-instance` x 1, job runtime under 14 minutes |
| Case 2: pipeline with a lot of selected specs | https://gitlab.com/gitlab-org/gitlab/-/pipelines/1654348358 | `cng-instance` x 3, average job runtime ~11 minutes |
| Case 3: pipeline executing full test suite | https://gitlab.com/gitlab-org/gitlab/-/pipelines/1654307366 | `cng-instance` x 9, average job runtime ~14 minutes |
The average runtime difference between running the full suite and running a subset that requires 3 jobs comes from the rather big discrepancy in individual E2E test runtimes. The fewer tests are executed, the better the chance that the selected tests have similar runtimes and less flakiness, resulting in slightly lower average runtimes and better test distribution.

Also, dynamic scaling uses job count rounding logic that prioritises speed: if the calculation returns 2.5 jobs, it is rounded up to 3 parallel jobs, which also results in a slightly lower average runtime per job.
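
For illustration, a minimal version of that rounding (the 15 minute per-job target is an assumed value):

```ruby
# Round the ideal job count up, so individual jobs finish faster rather than slower
def job_count(total_runtime_minutes, target_job_runtime_minutes = 15)
  (total_runtime_minutes / target_job_runtime_minutes.to_f).ceil
end

job_count(37.5) # => 3 (2.5 "ideal" jobs rounded up to 3 parallel jobs)
job_count(10)   # => 1
```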