
Redistribute pipelines RSpec jobs parallelization

From draft to ready

  • Remove test commit: !133976 (5b7e0036)
  • Ensure that we also changed the comments on the artifacts collector

Context

Closes #422702 (closed).

What does this MR do?

Redistribute the RSpec parallel jobs to aim for an average job duration of at most 40min, while spending as little extra money on it as possible.

Are we spending a lot more CI/CD money for this?

TL;DR: Only a little bit.

In most cases in this MR, we are redistributing jobs from one "RSpec job class" (e.g. rspec unit pg14 is an RSpec job class with 28 parallel jobs) to another. With this approach, we don't spend extra money (see the section below if you're interested to know why).

The RSpec job classes we're redistributing are:

  • 4 jobs from rspec unit pg14 to rspec-ee unit pg14
  • 2 jobs from rspec system pg14 to rspec-ee system pg14
  • 2 jobs from rspec-ee integration pg14 to rspec integration pg14

Additionally, the migration jobs were taking longer than most other job classes, so I added some extra parallel jobs for the following classes that I could not take from other jobs:

  • 4 jobs to rspec migration pg14
  • 2 jobs to rspec migration pg14-as-if-foss

We have to pay for those out of pocket, BUT they are not executed as often as the ones above, so the cost is lower than if we added, say, 2 parallel jobs to rspec unit pg14.

To illustrate, here is the number of jobs run for each RSpec job class over the last three months:

  • rspec-ee unit pg14: 387'548 jobs
  • rspec integration pg14: 247'049 jobs
  • rspec-ee system pg14: 225'280 jobs
  • rspec migration pg14: 73'619 jobs
  • rspec migration pg14-as-if-foss: 21'593 jobs
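
To make the comparison concrete, here is a minimal Python sketch that turns the counts above into ratios against rspec migration pg14. The `JOBS_RUN` dictionary and the `frequency_ratio` helper are names made up for this illustration. The ratio is only a rough proxy for cost: a job count depends both on how often a class is triggered and on how many parallel jobs it has.

```python
# Job counts over the last three months, copied from the list above.
JOBS_RUN = {
    "rspec-ee unit pg14": 387_548,
    "rspec integration pg14": 247_049,
    "rspec-ee system pg14": 225_280,
    "rspec migration pg14": 73_619,
    "rspec migration pg14-as-if-foss": 21_593,
}


def frequency_ratio(job_class: str, baseline: str) -> float:
    """Return how many times more jobs `job_class` ran than `baseline`."""
    return JOBS_RUN[job_class] / JOBS_RUN[baseline]


if __name__ == "__main__":
    baseline = "rspec migration pg14"
    for name in JOBS_RUN:
        print(f"{name}: {frequency_ratio(name, baseline):.2f}x {baseline}")
```

Under this rough proxy, an extra parallel job on a migration class is executed several times less often than an extra parallel job on one of the unit, integration or system classes would be.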

As a general comment about CI costs: we have some high-leverage issues such as #412717 (closed) where we could drastically reduce our CI costs (mainly because we would run the FOSS tests less often than we currently do in gitlab-org/gitlab MR pipelines). Tweaking the file patterns for which jobs should be triggered in pipelines could give us big savings as well.

Why those RSpec job classes specifically?

A few factors:

  1. The average duration for those RSpec job classes
  2. The number of times they were run over the last three months
  3. The pairs of RSpec job classes (fast ones giving away jobs to slow ones) should be executed in the same pipelines most of the time (otherwise, we could make a certain job class slower, and it could become the critical path of those pipelines)

The data behind these factors is shown in the data section below.

Why are we not spending more money when redistributing a job from one "RSpec job class" to another?

Disclaimer: What's below is my current understanding of our CI costs, which might be wrong 😆. Please let me know if you think that's the case!

Let's take rspec unit pg14 tests as an example. The number of tests we have to run will be the same, whether we run them in one job or 30 jobs. What's making the cost go higher when adding more jobs is the setup/teardown around the RSpec run. When adding a parallel job, we're therefore paying for the extra setup/teardown for that new job.

If we are removing a parallel job, the exact opposite is true: we'll save the setup/teardown money, but the tests that this job executed will still have to be run on other jobs, making them longer, or in other words, more expensive.

If we are redistributing jobs, it cancels out: the setup/teardown we save on the RSpec job we remove pays for the setup/teardown of the new job.
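
The reasoning above can be sketched as a toy cost model in Python. It reflects the same understanding described in the disclaimer and is not a description of how CI is actually billed. The parallel job counts (28 and 18) come from this MR. Everything else is hypothetical and chosen only for round numbers: the 5-minute setup/teardown, the total test minutes, the assumption that tests split perfectly evenly across jobs, and the `JobClass`, `move_jobs` and `pipeline_cost` names.

```python
from dataclasses import dataclass, replace

SETUP_TEARDOWN_MIN = 5  # hypothetical fixed overhead per job, in minutes


@dataclass(frozen=True)
class JobClass:
    name: str
    test_minutes: int  # hypothetical total RSpec runtime spread across the jobs
    parallel: int      # number of parallel jobs

    def billed_minutes(self) -> int:
        # Tests cost the same however they are split; each job adds overhead.
        return self.test_minutes + self.parallel * SETUP_TEARDOWN_MIN

    def job_duration(self) -> float:
        # Assumes the tests are split perfectly evenly across the jobs.
        return SETUP_TEARDOWN_MIN + self.test_minutes / self.parallel


def move_jobs(src: JobClass, dst: JobClass, count: int):
    """Take `count` parallel jobs away from `src` and give them to `dst`."""
    return (
        replace(src, parallel=src.parallel - count),
        replace(dst, parallel=dst.parallel + count),
    )


def pipeline_cost(*classes: JobClass) -> int:
    """Total billed minutes for one pipeline running all given classes."""
    return sum(c.billed_minutes() for c in classes)


if __name__ == "__main__":
    unit = JobClass("rspec unit pg14", test_minutes=560, parallel=28)
    ee_unit = JobClass("rspec-ee unit pg14", test_minutes=504, parallel=18)

    new_unit, new_ee_unit = move_jobs(unit, ee_unit, 4)
    print("cost before:", pipeline_cost(unit, ee_unit))
    print("cost after redistributing:", pipeline_cost(new_unit, new_ee_unit))

    # Adding 4 jobs without removing any pays for 4 more setup/teardowns.
    grown_ee_unit = replace(ee_unit, parallel=ee_unit.parallel + 4)
    print("cost after adding:", pipeline_cost(unit, grown_ee_unit))
```

In this model, moving jobs changes the per-job durations of both classes but leaves the total billed minutes unchanged, while adding jobs raises the total by one setup/teardown per added job.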

The data

Based on a sample pipeline and more global job stats.

Show me more data!

rspec unit pg14 and rspec-ee unit pg14

rspec unit pg14 jobs (28 jobs) are way faster (8min faster) than rspec-ee unit pg14 jobs (18 jobs). They could be rebalanced.

Rules

Both those jobs are run in the same pipeline most of the time for gitlab-org/gitlab pipelines:

# rspec unit
.rails:rules:ee-and-foss-unit:
  rules:
    - <<: *if-fork-merge-request
      when: never
    - !reference [".rails:rules:ee-and-foss-default-rules", rules]
    - <<: *if-default-refs
      changes: *backend-patterns
    - <<: *if-default-refs         # This is different
      changes: *backstage-patterns # This is different

# rspec-ee unit
.rails:rules:ee-only-unit:
  rules:
    - <<: *if-not-ee               # This is different
      when: never                  # This is different
    - <<: *if-fork-merge-request
      when: never
    - !reference [".rails:rules:ee-and-foss-default-rules", rules]
    - <<: *if-default-refs
      changes: *backend-patterns

rspec system pg14 and rspec-ee system pg14

rspec system pg14 jobs (28 jobs) are faster (3.80min faster) than rspec-ee system pg14 jobs (10 jobs). They could be rebalanced.

Rules

Both those jobs are run in the same pipeline most of the time for gitlab-org/gitlab pipelines:

# rspec system
.rails:rules:ee-and-foss-system:
  rules:
    - <<: *if-fork-merge-request
      when: never
    - !reference [".rails:rules:system-default-rules", rules]
    - <<: *if-default-refs
      changes: *code-backstage-patterns

# rspec-ee system
.rails:rules:ee-only-system:
  rules:
    - <<: *if-not-ee # This is different
      when: never    # This is different
    - <<: *if-fork-merge-request
      when: never
    - !reference [".rails:rules:system-default-rules", rules]
    - <<: *if-default-refs
      changes: *code-backstage-patterns

rspec-ee integration pg14 and rspec integration pg14

rspec-ee integration pg14 jobs (6 jobs) are faster (3.38min faster) than rspec integration pg14 jobs (12 jobs). They could be rebalanced.

Rules

Both those jobs are run in the same pipeline most of the time for gitlab-org/gitlab pipelines:

# rspec-ee integration pg14
.rails:rules:ee-only-integration:
  rules:
    - <<: *if-not-ee              # This is different
      when: never                 # This is different
    - <<: *if-fork-merge-request
      when: never
    - !reference [".rails:rules:ee-and-foss-default-rules", rules]
    - <<: *if-default-refs
      changes: *backend-patterns

# rspec integration pg14
.rails:rules:ee-and-foss-integration:
  rules:
    - <<: *if-fork-merge-request
      when: never
    - !reference [".rails:rules:ee-and-foss-default-rules", rules]
    - <<: *if-default-refs
      changes: *backend-patterns

Screenshots or screen recordings

I made another MR to run pipelines without the changes:

migration pg14-as-if-foss

Screenshot_2023-10-13_at_16.19.02

migration pg14

Screenshot_2023-10-13_at_16.19.47

rspec-ee

Screenshot_2023-10-13_at_16.21.58

migration pg14-as-if-foss

Screenshot_2023-10-13_at_16.22.52

migration pg14

Screenshot_2023-10-13_at_16.23.41

rspec-ee

Screenshot_2023-10-13_at_16.25.03

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Dieulivol
