create PyPI package files uniqueness constraint in the database
🔥 Problem
As discovered in #357373 (closed), we have a Packages::PackageFile model validation for PyPI packages that ensures file_name uniqueness within the scope of a package.
The problem is that this constraint is not implemented on the database side. As documented, a model validation alone does not guarantee that we don't have duplicates.
This can produce bugs because the duplicated package file records can't be updated anymore: the model validation fails on every save.
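To illustrate both points, here is a minimal sketch, not the actual GitLab code. It uses Python's built-in sqlite3 module as a stand-in for the real database. The package_files table name, the file_name_taken and can_update helpers, and the sample file name are assumptions made for the example. It shows two interleaved uploads that both pass an application-level uniqueness check, and then shows that the same check rejects any later update of either row.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE package_files ("
    "id INTEGER PRIMARY KEY, package_id INTEGER NOT NULL, file_name TEXT NOT NULL)"
)

FILE_NAME = "demo-1.0.tar.gz"  # hypothetical file name


def file_name_taken(package_id, file_name):
    """Application-level check, comparable to a model uniqueness validation."""
    row = conn.execute(
        "SELECT 1 FROM package_files WHERE package_id = ? AND file_name = ?",
        (package_id, file_name),
    ).fetchone()
    return row is not None


def insert_file(package_id, file_name):
    conn.execute(
        "INSERT INTO package_files (package_id, file_name) VALUES (?, ?)",
        (package_id, file_name),
    )


def can_update(file_id, package_id, file_name):
    """Check run on update: fails if another row has the same key."""
    row = conn.execute(
        "SELECT 1 FROM package_files "
        "WHERE package_id = ? AND file_name = ? AND id != ?",
        (package_id, file_name, file_id),
    ).fetchone()
    return row is None


# Two concurrent uploads both run the check before either one inserts.
request_a_valid = not file_name_taken(1, FILE_NAME)
request_b_valid = not file_name_taken(1, FILE_NAME)

if request_a_valid:
    insert_file(1, FILE_NAME)
if request_b_valid:
    insert_file(1, FILE_NAME)

duplicate_count = conn.execute(
    "SELECT COUNT(*) FROM package_files WHERE package_id = ? AND file_name = ?",
    (1, FILE_NAME),
).fetchone()[0]

# Both rows now fail the check, so neither can be saved again.
first_row_updatable = can_update(1, 1, FILE_NAME)
second_row_updatable = can_update(2, 1, FILE_NAME)
```

Nothing at the database level stops the second insert, which is why only a database constraint closes this gap.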
This issue is delicate as we have many duplicates on gitlab.com.
Note that this issue has no impact on UX because the Package Registry always serves the most recent package file.
🚒 Solution
If we have this model constraint, then we should have the same constraint in the database.
So:
- (optional) Consider using #upsert to avoid creating duplicates.
- Remove older duplicates for PyPI package files of the same package.
  - Caution, we can't simply remove them in the database. They are linked to a file on object storage.
- Add a UNIQUE index using the file_name and the package_id columns.
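The steps above could look roughly like the following sketch. It is written against Python's built-in sqlite3 module as a stand-in for the real database and is not the actual migration. The table layout, the size column, the index name, the delete_stored_file placeholder, and the use of the highest id as "most recent" are all assumptions for illustration. The upsert is shown as a plain SQL INSERT ... ON CONFLICT statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE package_files ("
    "id INTEGER PRIMARY KEY, package_id INTEGER NOT NULL, "
    "file_name TEXT NOT NULL, size INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO package_files (id, package_id, file_name, size) VALUES (?, ?, ?, ?)",
    [
        (1, 1, "demo-1.0.tar.gz", 100),
        (2, 1, "demo-1.0.tar.gz", 110),  # newer duplicate of id 1
        (3, 1, "demo-1.0-py3-none-any.whl", 200),
        (4, 2, "demo-1.0.tar.gz", 300),  # same name, other package: allowed
    ],
)

removed_from_storage = []


def delete_stored_file(file_id):
    """Hypothetical placeholder for removing the file from object storage."""
    removed_from_storage.append(file_id)


def older_duplicate_ids(conn):
    """Ids of rows that are not the newest (highest id) for their key."""
    rows = conn.execute(
        "SELECT id FROM package_files WHERE id NOT IN ("
        "  SELECT MAX(id) FROM package_files GROUP BY package_id, file_name"
        ") ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


# Step 1: remove older duplicates, cleaning up the stored file first.
stale_ids = older_duplicate_ids(conn)
for file_id in stale_ids:
    delete_stored_file(file_id)
    conn.execute("DELETE FROM package_files WHERE id = ?", (file_id,))

# Step 2: enforce uniqueness in the database itself.
conn.execute(
    "CREATE UNIQUE INDEX idx_package_files_on_package_id_and_file_name "
    "ON package_files (package_id, file_name)"
)


# Step 3 (optional): upsert instead of a plain insert.
def upsert_file(conn, package_id, file_name, size):
    conn.execute(
        "INSERT INTO package_files (package_id, file_name, size) VALUES (?, ?, ?) "
        "ON CONFLICT (package_id, file_name) DO UPDATE SET size = excluded.size",
        (package_id, file_name, size),
    )


upsert_file(conn, 1, "demo-1.0.tar.gz", 120)

remaining = conn.execute(
    "SELECT id, package_id, file_name, size FROM package_files ORDER BY id"
).fetchall()

# A plain insert of an existing key is now rejected by the index.
try:
    conn.execute(
        "INSERT INTO package_files (package_id, file_name, size) VALUES (?, ?, ?)",
        (1, "demo-1.0.tar.gz", 130),
    )
    plain_insert_rejected = False
except sqlite3.IntegrityError:
    plain_insert_rejected = True
```

The order matters: the index can only be created once the duplicates are gone, and the stored file is cleaned up before the row that points to it is deleted.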
Edited by David Fernandez