Refactor BuildArtifactsSizeRefresh ordering
Ref: #362640 (closed)
What does this MR do?
This MR refactors how we fetch the next batch of Ci::JobArtifact
that needs to be fetched to recalculate the artifact size for affected broken projects.
It orders the data by (created_at, id) and break down our query with 3 new scopes.
Why are we doing this?
It prevents a typebug from happening pointed out by @fabiopitino
when an object gets saved and created_at
is filled in before the id
.
It leverages tie-breaker column to reuse our our existing index as pointed out by Adam
.
The
created_at
is a non-unique column, there can be several rows with the same value. In this case, we usually add one extraORDER BY
column to break the ties. For example:ORDER BY created_at, id
SQL before
SELECT ci_job_artifacts.id, ci_job_artifacts.size
FROM ci_job_artifacts
WHERE ci_job_artifacts.project_id = 278964
AND (created_at <= '2022-05-07 08:40:59.712340' AND id > 29)
ORDER BY ci_job_artifacts.created_at ASC
LIMIT 1000;
SQL after
SELECT ci_job_artifacts.id, ci_job_artifacts.size
FROM ci_job_artifacts
WHERE ci_job_artifacts.project_id = 278964
AND ci_job_artifacts.created_at <= '2022-05-08 13:09:28.796704'
AND ci_job_artifacts.id > 29
ORDER BY ci_job_artifacts.created_at ASC, ci_job_artifacts.id ASC
LIMIT 1000;
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.