
Draft: Enforce CI minutes quota for running jobs

Fabio Pitino requested to merge fp-enforce-minutes-quota-for-running-jobs into master

What does this MR do?

Related to https://gitlab.com/gitlab-org/gitlab/-/issues/20856

This MR introduces monitoring and enforcement of CI minutes usage for running builds.

Because we accumulate CI minutes consumption into namespace_statistics.shared_runners_seconds only when builds complete, pipelines with very long-running builds can push CI minutes consumption far beyond the limit set on the root namespace.

To limit this, we need to monitor CI minutes consumption on running builds and enforce the limit by dropping builds once consumption exceeds the limit by more than a 20.minutes grace period.
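
As a rough illustration of the grace-period rule, the plain-Ruby sketch below adds the elapsed time of running builds to the consumption already recorded and reports whether the quota plus the grace period has been exceeded. The method name `quota_exceeded_with_grace?`, its parameters, and the constant name are hypothetical and not taken from this MR. The `20.minutes` value is written in plain seconds so the snippet runs without ActiveSupport.

```ruby
# Hypothetical sketch; names are illustrative and not part of this MR.
GRACE_PERIOD_SECONDS = 20 * 60 # stands in for 20.minutes

# accumulated_seconds: consumption already recorded for completed builds
# running_started_at:  start times (Time) of builds still running on shared runners
# limit_seconds:       CI minutes quota of the root namespace, in seconds
def quota_exceeded_with_grace?(accumulated_seconds, running_started_at, limit_seconds, now: Time.now)
  # Time each running build has consumed so far, not yet recorded anywhere.
  running_seconds = running_started_at.sum { |started_at| now - started_at }

  accumulated_seconds + running_seconds > limit_seconds + GRACE_PERIOD_SECONDS
end
```

Under these assumptions, a namespace only has builds dropped once recorded plus in-flight consumption passes the quota by more than the grace period, so builds that finish shortly after the quota is reached are not interrupted.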

Because this operation is expensive (it runs at the root namespace level), we have a few layers of checks to avoid unnecessary processing:

  1. allow the check to be scheduled every 5 minutes per project
  2. allow the check to run exclusively every 5 minutes at namespace level (out of the multiple schedules from 1. above, only 1 actually runs)
  3. skip processing if not on GitLab.com
  4. skip processing if project does not have shared runners enabled or is public
  5. skip processing if project is on any paid plan (no trial)
  6. only consider cancelable builds in recent cancelable pipelines
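
The layered checks listed above could be sketched roughly as follows. This is a minimal plain-Ruby illustration, not the MR's implementation: `ProjectInfo`, `skip_enforcement?`, and `Throttle` are hypothetical names, the in-memory throttle only stands in for whatever scheduling and locking mechanism the MR actually uses (the description does not show it), and check 4 is read here as "skip if shared runners are disabled, or if the project is public".

```ruby
# Hypothetical sketch; the struct, method, and class names are illustrative only.
ProjectInfo = Struct.new(:shared_runners_enabled, :public, :paid_plan, keyword_init: true)

# Returns true when the expensive namespace-level check can be skipped
# (checks 3 to 5 in the list above).
def skip_enforcement?(project, on_gitlab_com:)
  return true unless on_gitlab_com
  return true unless project.shared_runners_enabled
  return true if project.public
  return true if project.paid_plan

  false
end

# In-memory stand-in for checks 1 and 2: a given key (for example a project
# or namespace id) is allowed to run at most once per interval.
class Throttle
  def initialize(interval_seconds)
    @interval_seconds = interval_seconds
    @last_run_at = {}
  end

  def allow?(key, now: Time.now)
    last = @last_run_at[key]
    return false if last && now - last < @interval_seconds

    @last_run_at[key] = now
    true
  end
end
```

Ordering the cheap boolean checks before any database work keeps the cost of the common "nothing to do" path low, which is the point of the layering.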

TODO:

  • add remaining specs
  • add feature flag
  • do E2E manual QA

Query plans

Builds in namespace being run by shared runners

Executes a query per batch of builds.

::Ci::Build
  .running
  .from_shared_runners
  .for_project(root_namespace.all_projects)
  .updated_after(RUNNING_BUILDS_SINCE_TIME.ago)
  .each_batch { ... }

https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3192/commands/10427

Time: 24.021 ms
  - planning: 23.447 ms
  - execution: 0.574 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 4 (~32.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

All projects in namespace

root_namespace.all_projects.find_each { ... }

https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3246/commands/10618

Time: 20.115 ms
  - planning: 3.249 ms
  - execution: 16.866 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 4097 (~32.00 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Online specific runners that can be used by a project

Executed for each project in the namespace that has builds to drop.

Ci::Runner.specific_for_project(project).with_tags.online.to_a

Time: 34.966 ms
  - planning: 5.353 ms
  - execution: 29.613 ms
    - I/O read: 28.266 ms
    - I/O write: N/A

Shared buffers:
  - hits: 61 (~488.00 KiB) from the buffer pool
  - reads: 9 (~72.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 1 (~8.00 KiB)
  - writes: 0

https://postgres.ai/console/gitlab/gitlab-production-tunnel/sessions/3246/commands/10619

Recent cancelable pipelines for project

::Ci::Pipeline
  .for_project(project)
  .cancelable
  .updated_after(ALIVE_BUILDS_SINCE_TIME.ago)
  .each_batch(of: 100) { ... }

https://postgres.ai/console/gitlab/gitlab-production-tunnel/sessions/3246/commands/10621

Time: 0.961 ms
  - planning: 0.601 ms
  - execution: 0.360 ms
    - I/O read: N/A
    - I/O write: N/A

Shared buffers:
  - hits: 62 (~496.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Recent alive builds in pipelines

::Ci::Build.in_pipelines(pipelines)
  .running_or_pending_or_created
  .updated_after(ALIVE_BUILDS_SINCE_TIME.ago)
  .in_batches(of: 150) { ... }

For the plan I used in_batches(of: 3) to test the first batch.

https://postgres.ai/console/gitlab/gitlab-production-tunnel/sessions/3246/commands/10622

Time: 85.662 ms
  - planning: 19.410 ms
  - execution: 66.252 ms
    - I/O read: 65.670 ms
    - I/O write: N/A

Shared buffers:
  - hits: 22 (~176.00 KiB) from the buffer pool
  - reads: 10 (~80.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 1 (~8.00 KiB)
  - writes: 0

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Fabio Pitino
