Bad SQL in the migration of job artifact registries to a separate table
Noted by @_stark (Thanks!)
Source code | +
| BEGIN +
| IF (TG_OP = 'UPDATE') THEN +
| UPDATE job_artifact_registry SET (retry_at, bytes, retry_count, success, sha256) = (NEW.retry_at, NEW.bytes, NEW.retry_count, NEW.success, NEW.sha256);+
| ELSEIF (TG_OP = 'INSERT') THEN +
| INSERT INTO job_artifact_registry (created_at, retry_at, artifact_id, bytes, retry_count, success, sha256) +
| VALUES (NEW.created_at, NEW.retry_at, NEW.file_id, NEW.bytes, NEW.retry_count, NEW.success, NEW.sha256); +
| END IF; +
| RETURN NEW; +
| END; +
Is there a missing WHERE clause on the UPDATE here?
I think this is updating every record in job_artifact_registry every time any artifact is updated in the file_registry. Which is timing out but not before it takes a long time blocking on locks and doing lots of I/O...
This SQL was introduced in a %10.7 MR: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/4719 , insufficiently reviewed by me.
We need to fix this before v10.7.0 because a managing the update will be hell.
I propose we:
- Update the migration code to be correct inline (i.e., don't try to write a new migration that corrects the previous one)
- Redo the migration from scratch on gprd, wiping the existing
job_artifact_registrytable since it's bound to be incorrect.
Edited by Nick Thomas