Skip to content

Bad SQL in the migration of job artifact registries to a separate table

Noted by @_stark (Thanks!)

Source code         |                                                                                                                                                                +
                    | BEGIN                                                                                                                                                          +
                    |     IF (TG_OP = 'UPDATE') THEN                                                                                                                                 +
                    |         UPDATE job_artifact_registry SET (retry_at, bytes, retry_count, success, sha256) = (NEW.retry_at, NEW.bytes, NEW.retry_count, NEW.success, NEW.sha256);+
                    |     ELSEIF (TG_OP = 'INSERT') THEN                                                                                                                             +
                    |         INSERT INTO job_artifact_registry (created_at, retry_at, artifact_id, bytes, retry_count, success, sha256)                                             +
                    |         VALUES (NEW.created_at, NEW.retry_at, NEW.file_id, NEW.bytes, NEW.retry_count, NEW.success, NEW.sha256);                                               +
                    | END IF;                                                                                                                                                        +
                    | RETURN NEW;                                                                                                                                                    +
                    | END;                                                                                                                                                           +

Is there a missing WHERE clause on the UPDATE here?

I think this is updating every record in job_artifact_registry every time any artifact is updated in the file_registry. Which is timing out but not before it takes a long time blocking on locks and doing lots of I/O...

This SQL was introduced in a %10.7 MR: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/4719 , insufficiently reviewed by me.

We need to fix this before v10.7.0 because a managing the update will be hell.

I propose we:

  • Update the migration code to be correct inline (i.e., don't try to write a new migration that corrects the previous one)
  • Redo the migration from scratch on gprd, wiping the existing job_artifact_registry table since it's bound to be incorrect.
Edited by Nick Thomas