Detect the source of new duplicate `lfs_objects_projects` records
Context
Now that the Background migration to deduplicate LFS object ... (!154323 - merged) is finished in production and Add advisory lock to ensure uniqueness of LFS o... (!159264 - merged) is in place, no new duplicate `lfs_objects_projects` records should be saved to the database.
However, a few new duplicates have been detected, potentially caused by a bulk insert or raw insert somewhere in our codebase.
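For reference, a minimal sketch of how these rows can be surfaced from a Rails console, assuming GitLab's `LfsObjectsProject` model with the standard `lfs_object_id` and `project_id` columns:

```ruby
# Minimal sketch: surface (lfs_object_id, project_id) pairs that appear
# more than once in lfs_objects_projects.
duplicate_pairs = LfsObjectsProject
  .group(:lfs_object_id, :project_id)
  .having('COUNT(*) > 1')
  .count # => { [lfs_object_id, project_id] => occurrences }

# Rows written by a raw insert that skips Rails timestamps show up with
# NULL created_at, which matches the records observed in this issue.
timestampless = LfsObjectsProject.where(created_at: nil)
```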
Some places to investigate:
- `Projects::MoveLfsObjectsProjectsService#move_lfs_objects_projects`. It uses `update_all`, so it bypasses the model validations.
- `Projects::LfsPointers::LfsLinkService#link_existing_lfs_objects`. It uses `ApplicationRecord.legacy_bulk_insert` to link LFS objects projects to a fork -> this is probably the problem, since the records found do not have `created_at` or `updated_at` timestamps. (A sketch of both failure modes follows this list.)
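To make the failure modes concrete, here is a hedged sketch of why both call sites can escape the model-level uniqueness validation. This is not the actual service code; `source_project`, `target_project`, `fork_project`, and `lfs_object_ids` are illustrative names, and the `legacy_bulk_insert` arguments are an assumption about its call shape:

```ruby
# update_all issues a single UPDATE without instantiating models, so no
# validations or callbacks run. If the target project already links one
# of the moved lfs_object_ids, the UPDATE produces a duplicate pair.
LfsObjectsProject
  .where(project_id: source_project.id)
  .update_all(project_id: target_project.id)

# legacy_bulk_insert writes raw rows in one INSERT; created_at/updated_at
# stay NULL unless supplied explicitly, which matches the timestamp-less
# duplicates found. Pairs already present are inserted again regardless.
ApplicationRecord.legacy_bulk_insert(
  :lfs_objects_projects,
  lfs_object_ids.map { |id| { lfs_object_id: id, project_id: fork_project.id } }
)
```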
Problem to solve
- We should find a way to mitigate these duplicates (see the mitigation sketch after this list).
- We should check all the possible sources where a bulk insert or raw insert is defined for `lfs_objects_projects` and ensure these are covered.
- We should follow https://docs.gitlab.com/ee/development/database/batched_background_migrations.html#re-queue-batched-background-migrations to re-queue the batched background migration.
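One possible mitigation for the bulk-insert path, sketched under the assumption that a unique index covering `(project_id, lfs_object_id)` exists: route bulk writes through Rails' `insert_all` with `unique_by:`, which emits `INSERT ... ON CONFLICT DO NOTHING` and also lets us fill the missing timestamps. The variable names are illustrative:

```ruby
# Hedged sketch: deduplicating bulk insert for lfs_objects_projects.
# Assumes a unique index on (project_id, lfs_object_id); without one,
# unique_by cannot be used and duplicates must be filtered out manually.
now = Time.current

rows = lfs_object_ids.map do |lfs_object_id|
  {
    lfs_object_id: lfs_object_id,
    project_id: project.id,
    created_at: now, # fill the timestamps legacy_bulk_insert left NULL
    updated_at: now
  }
end

# ON CONFLICT DO NOTHING: existing pairs are skipped, not duplicated.
LfsObjectsProject.insert_all(rows, unique_by: %i[project_id lfs_object_id])
```

The `update_all` move path needs a different guard, for example filtering out `lfs_object_id`s already linked to the target project before issuing the update.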