Unlock CI pipeline artifacts when pipeline is unlocked
What does this MR do and why?
Currently the SQL query for finding expired CI pipeline artifacts (not to be confused with CI job artifacts) is too expensive and is timing out. In !97194 (merged),
we added a locked
column to help. When a pipeline is unlocked, it will also unlock the associated pipeline artifacts. With this denormalization, querying for expired and unlocked artifacts is fast.
Relates to #372500 (closed)
Query plan
At the moment, most pipelines appear to have at most 2 pipeline artifacts, far fewer than job artifacts, which makes sense because there are currently only two types. Given a batch size of 1000 pipelines, right now this will at update 2000 rows:
gitlabhq_production=# SELECT pipeline_id, COUNT(*) FROM ci_pipeline_artifacts GROUP BY pipeline_id ORDER BY count DESC;
pipeline_id | count
-------------+-------
508169682 | 2
509296954 | 2
457495070 | 2
509296973 | 2
458782901 | 2
503883139 | 2
470024162 | 2
506467362 | 2
392742168 | 2
487135429 | 2
498259249 | 2
492014928 | 2
511706639 | 2
457256151 | 2
469186289 | 2
467344390 | 2
471841881 | 2
498259468 | 2
460416336 | 2
458540251 | 2
<snip>
The query itself is pretty straightforward. For example:
UPDATE "ci_pipeline_artifacts" SET "locked" = 0 WHERE "ci_pipeline_artifacts"."pipeline_id" = 471464763
With #database-lab, we can see it's indexed properly:
Update on ci_pipeline_artifacts (cost=0.43..3.45 rows=1 width=177)
-> Index Scan using index_ci_pipeline_artifacts_on_pipeline_id_and_file_type on ci_pipeline_artifacts (cost=0.43..3.45 rows=1 width=177)
Index Cond: (pipeline_id = 471464763)
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.