Skip to content
GitLab
Next
    • GitLab: the DevOps platform
    • Explore GitLab
    • Install GitLab
    • How GitLab compares
    • Get started
    • GitLab docs
    • GitLab Learn
  • Pricing
  • Talk to an expert
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
    Projects Groups Snippets
  • Sign up now
  • Login
  • Sign in / Register
  • GitLab GitLab
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Locked Files
  • Issues 46,530
    • Issues 46,530
    • List
    • Boards
    • Service Desk
    • Milestones
    • Iterations
    • Requirements
  • Merge requests 1,511
    • Merge requests 1,511
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Artifacts
    • Schedules
    • Test Cases
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Packages and registries
    • Packages and registries
    • Package Registry
    • Container Registry
    • Infrastructure Registry
  • Monitor
    • Monitor
    • Metrics
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Code review
    • Insights
    • Issue
    • Repository
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • GitLab.orgGitLab.org
  • GitLabGitLab
  • Issues
  • #327281
Closed
Open
Issue created Apr 08, 2021 by Thao Yeager@thaoyeager💬Contributor

Stop the increase of unlocked expired artifacts waiting for removal

This is step 1 of a 2-step solution; see #322817 (closed) for step 2.

Problem to solve

We have around 800TB(80 millions records) of removable artifacts that can't be removed because of inefficient database queries.

Ci::DestroyExpiredJobArtifactsService is triggered every 7 minutes to remove expired and unlocked artifacts. It looks at 100000 artifacts that have the expire_at value less than the current timestamp and it selects the ones that are unlocked. This is not efficient because it can't reach artifacts with an older expire_at value. It's not possible to add another index for this query because we need to join three large tables to gather the data.

Proposal

We need to stop the increasing number of expired artifacts waiting for removal as a first iteration before we clean up old expired artifacts (#322817 (closed)) in a background migration.

Ci::UnlockArtifactsService returns the pipeline ids that it unlocks. We should use this service and the pipeline ids to update the expire_at timestamp for the artifacts to a more recent value. This will push the removable artifact at the top of the queue making the removal process more efficient.

diff --git a/app/services/ci/unlock_artifacts_service.rb b/app/services/ci/unlock_artifacts_service.rb
index 07faf90dd6d..9f8d9416c7d 100644
--- a/app/services/ci/unlock_artifacts_service.rb
+++ b/app/services/ci/unlock_artifacts_service.rb
@@ -17,7 +17,9 @@ def execute(ci_ref, before_pipeline = nil)
       SQL
 
       loop do
-        break if ActiveRecord::Base.connection.exec_query(query).empty?
+        result = ActiveRecord::Base.connection.exec_query(query)
+        break if result.empty?
+        refresh_expire_at_timestamp(result)
       end
     end
 
@@ -29,5 +31,12 @@ def collect_pipelines(ci_ref, before_pipeline)
 
       pipeline_scope.artifacts_locked
     end
+
+    # This will be more complex in the actual implementation because a batch of 100 pipelines for the GitLab project has 32808 artifacts, so it needs some form of batching.
+    # https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/2706/commands/8325 
+    def refresh_expire_at_timestamp(result)
+      Ci::JobArtifact
+        .expired_before(1.hour.ago)
+        .where(job_id: Ci::Build.in_pipelines(result.rows))
+        .update_all(expire_at: 5.seconds.ago) # keep them expired.
+    end
   end
 end
Edited Apr 09, 2021 by Marius Bobin
Assignee
Assign to
Time tracking