Move pipeline artifacts removal into its own worker/service
`Ci::PipelineArtifacts` removal is dependent on `Ci::JobArtifacts` removal:
- It uses the same cron worker, `ExpireBuildArtifactsWorker`, which is named after `BuildArtifacts`, not pipeline artifacts.
- It uses the same service, `DestroyExpiredJobArtifactsService`, which should remove only job artifacts.
- Its implementation on top of the job artifacts removal has a few drawbacks:
  - If we can't keep up with deleting job artifacts, the pipeline artifacts will never be removed. See the discussion linked below.
  - It executes a select on `ci_job_artifacts` before removing anything from the pipeline artifacts table.
  - While pipeline artifacts are being removed, at least one job artifact may expire. The next loop cycle is then almost wasted: it removes less than a full batch of `ci_job_artifacts` rows instead of removing pipeline artifacts.
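
The starvation drawback can be illustrated with a small, framework-free simulation. This is only a simplified model: it assumes the shared loop runs a bounded number of iterations and always prefers job artifacts when any are expired. The method name, `batch_size`, and `loop_limit` are hypothetical and not taken from the actual service.

```ruby
# Simplified, hypothetical model of a shared cleanup loop; not the actual GitLab service.
# Each iteration removes one batch of job artifacts if any are expired,
# and only falls back to pipeline artifacts when no job artifacts are left.
def run_shared_cleanup(job_artifacts:, pipeline_artifacts:, batch_size: 100, loop_limit: 10)
  removed = { job: 0, pipeline: 0 }

  loop_limit.times do
    if job_artifacts > 0
      batch = [job_artifacts, batch_size].min
      job_artifacts -= batch
      removed[:job] += batch
    elsif pipeline_artifacts > 0
      batch = [pipeline_artifacts, batch_size].min
      pipeline_artifacts -= batch
      removed[:pipeline] += batch
    else
      break # nothing left to remove
    end
  end

  removed
end

# Job artifact backlog larger than one run can handle: pipeline artifacts are never reached.
backlog = run_shared_cleanup(job_artifacts: 5_000, pipeline_artifacts: 300)

# Job artifacts under control: the remaining iterations reach the pipeline artifacts.
caught_up = run_shared_cleanup(job_artifacts: 200, pipeline_artifacts: 300)
```

Under these assumptions, the first run spends all of its iterations on job artifacts, which is the situation described in the first drawback.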
The following discussion from !42242 (merged) should be addressed:
- @reprazent started a discussion: Is this something we'll need to parallelize as well at some point?
There's currently 90k of those:
    gitlabhq_production=> SELECT COUNT(*) FROM ci_pipeline_artifacts WHERE expire_at < NOW();
     count
    -------
     90482
    (1 row)
Though I was a bit surprised to find this in the same service as for `JobArtifact` records. Would it be a good idea to decouple this?
Proposal
- Create a new service to remove pipeline artifacts using `Ci::DeletedObject`
- Create a new cron worker to execute this service
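
A minimal sketch of how the two proposed pieces could fit together, written in plain Ruby so it runs without Rails. Everything here is an assumption for illustration: the class names `DestroyExpiredPipelineArtifactsService` and `ExpirePipelineArtifactsWorker`, the `BATCH_SIZE` and `LOOP_LIMIT` values, and the in-memory `store`. The `deleted_objects` array is only a stand-in for the hand-off the proposal assigns to `Ci::DeletedObject`; how that model is actually used is not shown.

```ruby
# Hypothetical, framework-free sketch; class names, constants, and the in-memory
# store are illustrative assumptions, not GitLab code.
PipelineArtifact = Struct.new(:id, :file, :expire_at)

class DestroyExpiredPipelineArtifactsService
  BATCH_SIZE = 2   # assumed; tiny so the example is easy to follow
  LOOP_LIMIT = 10  # assumed upper bound on batches per run

  def initialize(store, deleted_objects, now: Time.now)
    @store = store
    @deleted_objects = deleted_objects
    @now = now
  end

  # Removes expired pipeline artifacts in batches, without touching job artifacts.
  # Returns the number of records removed.
  def execute
    removed = 0

    LOOP_LIMIT.times do
      batch = @store.select { |artifact| artifact.expire_at < @now }.first(BATCH_SIZE)
      break if batch.empty?

      # Record the files for later cleanup, then drop the records themselves.
      @deleted_objects.concat(batch.map(&:file))
      batch.each { |artifact| @store.delete(artifact) }
      removed += batch.size
    end

    removed
  end
end

# Stand-in for a dedicated cron worker: its only job is to run the service.
class ExpirePipelineArtifactsWorker
  def initialize(store, deleted_objects)
    @store = store
    @deleted_objects = deleted_objects
  end

  def perform(now: Time.now)
    DestroyExpiredPipelineArtifactsService.new(@store, @deleted_objects, now: now).execute
  end
end

now = Time.at(1_000)
store = [
  PipelineArtifact.new(1, "a.json", now - 10),
  PipelineArtifact.new(2, "b.json", now - 5),
  PipelineArtifact.new(3, "c.json", now - 1),
  PipelineArtifact.new(4, "d.json", now + 60) # not expired yet
]
deleted_objects = []

removed = ExpirePipelineArtifactsWorker.new(store, deleted_objects).perform(now: now)
```

Because this loop only ever looks at pipeline artifacts, a backlog of job artifacts cannot delay it, and it never needs to query `ci_job_artifacts` first.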