
Fix timeout errors when loading expired job artifacts

What does this MR do?

Related to #281688 (closed)

Ci::DestroyExpiredJobArtifactsService#destroy_job_artifacts_batch loads batches of expired artifacts, but the query fails with a statement timeout: https://log.gprd.gitlab.net/goto/a8d3af0ba5d30a6ac56bdc0593a02565. The json.duration_s column shows that the timeout does not happen right away, only after a number of executions. This means the service successfully removes the recently expired, unlocked artifacts, but then has to skip over a large number of locked ones to find more.

Query and execution plans before the change: https://explain.depesz.com/s/GOpw

New idea from EachBatch docs:

Select batches of expired artifacts and apply pipeline locked filtering to each batch. This is a simplified version:

Ci::JobArtifact.where('expire_at < ?', Time.current).each_batch(column: :expire_at) do |relation|
  destroy relation.unlocked.to_a
end
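
To illustrate the batching idea without a database, here is a minimal plain-Ruby sketch. The `Artifact` struct and the `each_expired_batch` helper are hypothetical stand-ins, not the real `Ci::JobArtifact` model or the `EachBatch` API. It only shows the shape of the approach: walk expired rows in `expire_at` order, then apply the unlocked filter inside each batch, so a long run of locked artifacts costs a bounded amount of work per batch.

```ruby
# Hypothetical stand-in for a Ci::JobArtifact row; not the real model.
Artifact = Struct.new(:id, :expire_at, :locked)

# Walk expired artifacts in batches ordered by expire_at and apply the
# "unlocked" filter inside each batch, rather than across the whole set.
def each_expired_batch(artifacts, now:, batch_size:)
  expired = artifacts.select { |a| a.expire_at < now }.sort_by(&:expire_at)
  expired.each_slice(batch_size) do |batch|
    # A batch may contain only locked artifacts; the block then gets [].
    yield batch.reject(&:locked)
  end
end

now = Time.at(1_000)
artifacts = [
  Artifact.new(1, Time.at(100), false),
  Artifact.new(2, Time.at(200), true),
  Artifact.new(3, Time.at(300), true),
  Artifact.new(4, Time.at(400), false),
  Artifact.new(5, Time.at(2_000), false) # not expired yet
]

destroyed_ids = []
each_expired_batch(artifacts, now: now, batch_size: 2) do |unlocked|
  # In the service this is where the batch would be destroyed.
  destroyed_ids.concat(unlocked.map(&:id))
end
```

In this sketch each batch is bounded by `batch_size` regardless of how many locked artifacts it contains, which is the property the change relies on to avoid the statement timeout.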

EachBatch queries:

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Marius Bobin
