Skip to content

Stop the increase of unlocked expired artifacts waiting for removal

This is step 1 of a 2-step solution; see #322817 (closed) for step 2.

Problem to solve

We have around 800TB(80 millions records) of removable artifacts that can't be removed because of inefficient database queries.

Ci::DestroyExpiredJobArtifactsService is triggered every 7 minutes to remove expired and unlocked artifacts. It looks at 100000 artifacts that have the expire_at value less than the current timestamp and it selects the ones that are unlocked. This is not efficient because it can't reach artifacts with an older expire_at value. It's not possible to add another index for this query because we need to join three large tables to gather the data.

Proposal

We need to stop the increasing number of expired artifacts waiting for removal as a first iteration before we clean up old expired artifacts (#322817 (closed)) in a background migration.

Ci::UnlockArtifactsService returns the pipeline ids that it unlocks. We should use this service and the pipeline ids to update the expire_at timestamp for the artifacts to a more recent value. This will push the removable artifact at the top of the queue making the removal process more efficient.

diff --git a/app/services/ci/unlock_artifacts_service.rb b/app/services/ci/unlock_artifacts_service.rb
index 07faf90dd6d..9f8d9416c7d 100644
--- a/app/services/ci/unlock_artifacts_service.rb
+++ b/app/services/ci/unlock_artifacts_service.rb
@@ -17,7 +17,9 @@ def execute(ci_ref, before_pipeline = nil)
       SQL
 
       loop do
-        break if ActiveRecord::Base.connection.exec_query(query).empty?
+        result = ActiveRecord::Base.connection.exec_query(query)
+        break if result.empty?
+        refresh_expire_at_timestamp(result)
       end
     end
 
@@ -29,5 +31,12 @@ def collect_pipelines(ci_ref, before_pipeline)
 
       pipeline_scope.artifacts_locked
     end
+
+    # This will be more complex in the actual implementation because a batch of 100 pipelines for the GitLab project has 32808 artifacts, so it needs some form of batching.
+    # https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2706/commands/8325 
+    def refresh_expire_at_timestamp(result)
+      Ci::JobArtifact
+        .expired_before(1.hour.ago)
+        .where(job_id: Ci::Build.in_pipelines(result.rows))
+        .update_all(expire_at: 5.seconds.ago) # keep them expired.
+    end
   end
 end
Edited by Marius Bobin