Skip to content

Use new locked column on ci_job_artifacts in DestroyAllExpiredService

What does this MR do and why?

Adds a feature flag (:ci_destroy_unlocked_job_artifacts) which when enabled, changes DestroyAllExpiredService to using the new locked column on ci_job_artifacts to mark unlocked and expired artifacts for deletion.

This is intended to resolve #327281 (closed) which describes how the performance of using the locked column on ci_pipelines necessitating a join ci_pipelines -> ci_builds -> ci_job_artifacts is causing us to fall significantly behind on marking artifacts for deletion, which results in us storing an estimated >800TB of unnecessary objects.

This MR will only cover marking job artifacts for deletion if they are unlocked, expired, and were created after !70235 (merged) was merged. This is expected, because the intent is to stop the problem from growing worse, and the follow up issue to clean up old job artifacts is intended to be addressed in #322817 (closed).

Feature Flag

ci_destroy_unlocked_job_artifacts: #338165 (closed)

Database

Query (https://console.postgres.ai/shared/b0ef3b07-89b2-42b5-bcc9-27739a3ad4e6):

SELECT
    "ci_job_artifacts"."project_id",
    "ci_job_artifacts"."file_type",
    "ci_job_artifacts"."size",
    "ci_job_artifacts"."created_at",
    "ci_job_artifacts"."updated_at",
    "ci_job_artifacts"."expire_at",
    "ci_job_artifacts"."file",
    "ci_job_artifacts"."file_store",
    "ci_job_artifacts"."file_sha256",
    "ci_job_artifacts"."file_format",
    "ci_job_artifacts"."file_location",
    "ci_job_artifacts"."id",
    "ci_job_artifacts"."job_id",
    "ci_job_artifacts"."locked"
FROM
    "ci_job_artifacts"
WHERE
    "ci_job_artifacts"."expire_at" < '2021-10-15 00:13:30.139913'
    AND "ci_job_artifacts"."locked" = 0
LIMIT 100;

Statistics:

Time: 3.924 ms
  - planning: 3.886 ms
  - execution: 0.038 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 1 (~8.00 KiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Explain:

 Limit  (cost=0.12..3.06 rows=1 width=140) (actual time=0.004..0.005 rows=0 loops=1)
   Buffers: shared hit=1
   I/O Timings: read=0.000 write=0.000
   ->  Index Scan using ci_job_artifacts_expire_at_unlocked_idx on public.ci_job_artifacts  (cost=0.12..3.06 rows=1 width=140) (actual time=0.002..0.003 rows=0 loops=1)
         Index Cond: (ci_job_artifacts.expire_at < '2021-10-15 00:13:30.139913+00'::timestamp with time zone)
         Buffers: shared hit=1
         I/O Timings: read=0.000 write=0.000

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #327281 (closed)

Edited by Matt Kasa

Merge request reports