Skip to content

Refactor BuildArtifactsSizeRefresh ordering

Max Orefice requested to merge morefice/order-artifact-size-refresh-by-id into master

Ref: #362640 (closed)

What does this MR do?

This MR refactors how we fetch the next batch of Ci::JobArtifact that needs to be fetched to recalculate the artifact size for affected broken projects.

It orders the data by (created_at, id) and break down our query with 3 new scopes.

Why are we doing this?

It prevents a typebug from happening pointed out by @fabiopitino when an object gets saved and created_at is filled in before the id.

It leverages tie-breaker column to reuse our our existing index as pointed out by Adam.

The created_at is a non-unique column, there can be several rows with the same value. In this case, we usually add one extra ORDER BY column to break the ties. For example: ORDER BY created_at, id

SQL before

SELECT ci_job_artifacts.id, ci_job_artifacts.size
FROM ci_job_artifacts
WHERE ci_job_artifacts.project_id = 278964
AND (created_at <= '2022-05-07 08:40:59.712340' AND id > 29)
ORDER BY ci_job_artifacts.created_at ASC
LIMIT 1000;

SQL after

SELECT ci_job_artifacts.id, ci_job_artifacts.size
FROM ci_job_artifacts
WHERE ci_job_artifacts.project_id = 278964
AND ci_job_artifacts.created_at <= '2022-05-08 13:09:28.796704'
AND ci_job_artifacts.id > 29
ORDER BY ci_job_artifacts.created_at ASC, ci_job_artifacts.id ASC
LIMIT 1000;

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Max Orefice

Merge request reports