Duplicate rows with the same tag name in Release
Problem
Because validation on the Release model is weak, we currently allow users to insert duplicate rows with the same tag name.
Since our Release API assumes the tag is unique per project, this can cause unexpected behavior. For example, an attempt to update an attribute of a release may appear to have no effect, because project.releases.find_by_tag(tag_name) could return an arbitrary entry from the duplicates.
Here are some stats on the number of duplicates per project (copied from https://gitlab.com/gitlab-org/gitlab-ee/issues/27856#note_215726022):
FYI, the releases table also suffers from duplicate tag values. For instance:

gitlabhq_production=> SELECT project_id, COUNT(*) as duplicate FROM releases GROUP BY project_id, tag HAVING COUNT(*) > 1 ORDER BY duplicate DESC LIMIT 10;
 project_id | duplicate
------------+-----------
   11711027 |        21
   13122813 |        17
    7092544 |         4
    6827936 |         4
   10611918 |         4
    5547923 |         3
    5041949 |         3
    6593557 |         3
    6595562 |         3
    5547923 |         3
(10 rows)
The Releases API will not work well for these tags, because it assumes that exactly one Release exists per distinct tag.
Proposal
- Clean up duplicates by database migration
- Set DB/AR validation on tag in project scope.
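
To make the two proposal steps concrete, here is a minimal plain-Ruby sketch of the logic, not the actual migration. It uses a hypothetical in-memory ReleaseRow struct in place of the releases table, and it assumes that the cleanup keeps the row with the lowest id in each (project_id, tag) group. That retention rule is an assumption for illustration only; this issue does not say which duplicate should survive.

```ruby
# Hypothetical in-memory stand-in for a row of the releases table.
ReleaseRow = Struct.new(:id, :project_id, :tag)

# Returns the ids a cleanup would delete: for every (project_id, tag) pair,
# keep the row with the lowest id and mark the rest as duplicates.
# Keeping the lowest id is an assumption, not a rule stated in this issue.
def duplicate_release_ids(rows)
  rows
    .group_by { |row| [row.project_id, row.tag] }
    .values
    .flat_map { |group| group.sort_by(&:id).drop(1).map(&:id) }
end

# Mirrors the check a project-scoped uniqueness validation would perform
# before accepting a new release.
def tag_taken?(rows, project_id, tag)
  rows.any? { |row| row.project_id == project_id && row.tag == tag }
end

# Made-up sample data: project 10 has three rows for the same tag.
rows = [
  ReleaseRow.new(1, 10, "v1.0"),
  ReleaseRow.new(2, 10, "v1.0"),
  ReleaseRow.new(3, 10, "v1.1"),
  ReleaseRow.new(4, 20, "v1.0"),
  ReleaseRow.new(5, 10, "v1.0")
]

to_delete = duplicate_release_ids(rows)
remaining = rows.reject { |row| to_delete.include?(row.id) }
```

In the application itself, the second step would presumably be a unique index on (project_id, tag) together with a model-level uniqueness validation on tag scoped to project_id. The index can only be created after the cleanup has removed the existing duplicates.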
Edited by Shinya Maeda