Check for existing LfsObjectProject before copying

What does this MR do and why?

Background:

  • We're trying to add a unique index to lfs_objects_projects over (lfs_object_id, project_id, repository_type).
  • We added an AR callback to use advisory locks to prevent creating duplicates when saving through a model.
  • A migration was already run to remove duplicates, but we found that we're still introducing duplicates.

The other places where we do inserts/updates of multiple rows check for existing rows before writing. This change is meant to mitigate the number of duplicates we introduce by doing a similar check before copying.

Query plans

The new query to check for existing LfsObjectsProject looks like:

SELECT "lfs_objects".*
FROM "lfs_objects"
WHERE "lfs_objects"."oid" IN ('c134cdde152985ebdd8515baadd7506fc5c18c0da2ce5b7160b85035c47aa9f0',
                              '7ec9d54f6f7cf5779f91701fbeb2aa732bf44f43646694434fdf417cb0ff052b',
                              'eb93c80f217271677ccb8a1960a4ad3ba7718ee99b4ed3f2d1292020fa0be6c0')
  AND (NOT EXISTS
         (SELECT 1
          FROM "lfs_objects_projects"
          WHERE "lfs_objects_projects"."project_id" = 278964
            AND (lfs_objects_projects.lfs_object_id = lfs_objects.id)
            AND "lfs_objects_projects"."repository_type" = 2))

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31052/commands/96389

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #480094 (closed)

Edited by Jerry Seto

Merge request reports

Loading