WIP: Parallelize deletion of expired job artifacts

What does this MR do?

Related to: https://gitlab.com/gitlab-org/gitlab/-/issues/223034#note_366458035, #220422 (closed)

Parallelize job artifact removal by moving the removal process out of the cron worker.
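
The idea can be sketched in plain Ruby as follows. This is only an illustration under assumptions, not the code in this MR: `Artifact`, `flag_expired`, `remove_batch`, and `run_removal` are hypothetical names, an in-memory array stands in for `ci_job_artifacts`, and threads stand in for background jobs. The scheduling step only flags expired rows as `pending_delete` and hands out batches; the removal itself runs elsewhere, in parallel.

```ruby
# Hypothetical in-memory stand-in for rows of ci_job_artifacts.
Artifact = Struct.new(:id, :expire_at, :pending_delete, :deleted)

# Cron-side step: flag expired, not-yet-flagged artifacts and return their ids
# grouped into batches. Nothing is removed here.
def flag_expired(artifacts, now:, limit:, batch_size:)
  expired = artifacts
    .select { |a| a.expire_at < now && !a.pending_delete }
    .first(limit)
  expired.each { |a| a.pending_delete = true }
  expired.map(&:id).each_slice(batch_size).to_a
end

# Worker-side step: remove one batch of flagged artifacts.
def remove_batch(index, ids)
  ids.each { |id| index.fetch(id).deleted = true }
end

def run_removal(artifacts, now:, limit: 1000, batch_size: 100, workers: 4)
  index = artifacts.to_h { |a| [a.id, a] }
  queue = Queue.new
  flag_expired(artifacts, now: now, limit: limit, batch_size: batch_size)
    .each { |batch| queue << batch }
  queue.close # pop returns nil once the queue is drained

  # Each thread stands in for an independent background job.
  Array.new(workers) do
    Thread.new do
      while (batch = queue.pop)
        remove_batch(index, batch)
      end
    end
  end.each(&:join)
end
```

Because the batches are disjoint, the workers never touch the same artifact, and a flagged row is not picked up again by a later scheduling run.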

Alternative solution based on Redis: !37328 (closed)

Stats

  • Number of job artifacts waiting to be removed as of 2020-07-15: 28_453_672
  • Number of artifacts that will expire after 2020-07-15 (counted on 2020-07-15): 12_392_776

Database migration output

VERSION=20200629091718 bin/rake db:migrate:redo
== 20200629091718 AddPendingDeleteToCiJobArtifacts: reverting =================
-- remove_column(:ci_job_artifacts, :pending_delete)
   -> 0.0020s
== 20200629091718 AddPendingDeleteToCiJobArtifacts: reverted (0.0050s) ========

== 20200629091718 AddPendingDeleteToCiJobArtifacts: migrating =================
-- add_column(:ci_job_artifacts, :pending_delete, :boolean, {:default=>false, :null=>false})
   -> 0.0011s
== 20200629091718 AddPendingDeleteToCiJobArtifacts: migrated (0.0023s) ========

Database queries

With destroy_only_unlocked_expired_artifacts_enabled set to false

Update that sets "pending_delete" = TRUE on the first 250_000 records returned by the query: https://paste.depesz.com/s/6jS

Query: https://paste.depesz.com/s/dz2

Execution plan: https://explain.depesz.com/s/URiG

Summary:
Time: 2.954 s
  - planning: 0.279 ms
  - execution: 2.954 s
    - I/O read: 40.419 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 476304 (~3.60 GiB) from the buffer pool
  - reads: 31 (~248.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 5 (~40.00 KiB)
  - writes: 0

After adding the index:

Execution plan: https://explain.depesz.com/s/M5gZ

Summary:
Time: 136.721 ms
  - planning: 0.260 ms
  - execution: 136.461 ms
    - I/O read: 134.845 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 22 (~176.00 KiB) from the buffer pool
  - reads: 63 (~504.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

With destroy_only_unlocked_expired_artifacts_enabled set to true

Update that sets "pending_delete" = TRUE on the first 250_000 records returned by the query: https://paste.depesz.com/s/AEo

Query: https://paste.depesz.com/s/2W2

Execution plan: https://explain.depesz.com/s/2ewk

Summary:
Time: 2.971 s
  - planning: 1.921 ms
  - execution: 2.969 s
    - I/O read: 379.592 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 435692 (~3.30 GiB) from the buffer pool
  - reads: 3545 (~27.70 MiB) from the OS file cache, including disk I/O
  - dirtied: 27146 (~212.10 MiB)
  - writes: 0

With the new index:

Execution plan: https://explain.depesz.com/s/MQGh

Summary:
Time: 496.052 ms
  - planning: 1.513 ms
  - execution: 494.539 ms
    - I/O read: 485.978 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 837 (~6.50 MiB) from the buffer pool
  - reads: 250 (~2.00 MiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Second execution:

Summary:
Time: 3.744 ms
  - planning: 1.878 ms
  - execution: 1.866 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 1087 (~8.50 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
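
For a quick comparison, the total times reported in the summaries above can be turned into speedup factors. The snippet below is only arithmetic on those numbers; the `TIMINGS_MS` table and the `speedup` helper are illustrative names, not part of this MR.

```ruby
# Total query times in milliseconds, copied from the summaries above.
TIMINGS_MS = {
  "flag = false" => { before: 2954.0, after: 136.721 },
  "flag = true"  => { before: 2971.0, after: 496.052 }
}.freeze

# Ratio of the time without the index to the time with it.
def speedup(before:, after:)
  before / after
end

TIMINGS_MS.each do |label, t|
  puts format("%-14s %9.3f ms -> %8.3f ms (%.1fx faster)",
              label, t[:before], t[:after], speedup(**t))
end
```

The second execution with the flag set to true (3.744 ms, 0 reads, 0.000 ms I/O read) was served entirely from shared buffers, so the 496.052 ms figure is probably dominated by cold-cache I/O rather than by the plan itself.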

The new index:

CREATE INDEX CONCURRENTLY i2 ON public.ci_job_artifacts USING btree (expire_at, job_id) WHERE (pending_delete = false);
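
The index is partial: it only contains rows where `pending_delete = false`, ordered by `(expire_at, job_id)`. A toy model of that behaviour in Ruby is shown below. It is a sketch under assumptions, not how PostgreSQL implements btrees, and it does not reproduce the actual queries, which are only available through the paste links above. `Row`, `build_partial_index`, and `expired_keys` are hypothetical names.

```ruby
Row = Struct.new(:job_id, :expire_at, :pending_delete)

# Toy model of a partial index on (expire_at, job_id)
# WHERE pending_delete = false: sorted keys for qualifying rows only.
def build_partial_index(rows)
  rows.reject(&:pending_delete).map { |r| [r.expire_at, r.job_id] }.sort
end

# Range scan: keys with expire_at < now, in index order, capped at limit.
def expired_keys(index, now:, limit:)
  # bsearch_index finds the first key that has NOT expired;
  # every key before that position has.
  cut = index.bsearch_index { |(expire_at, _)| expire_at >= now } || index.size
  index.first([cut, limit].min)
end

rows = [
  Row.new(1, 10, false),
  Row.new(2, 20, false),
  Row.new(3, 30, true),  # already flagged: absent from the index
  Row.new(4, 99, false)  # not yet expired at now = 50
]
index = build_partial_index(rows)
p expired_keys(index, now: 50, limit: 10)
```

Rows that have already been flagged drop out of the index, so a scan for the next batch never has to step over them.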

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Marius Bobin
