Data integrity problem: namespaces with duplicate owner / owner_id rows
Per https://gitlab.com/gitlab-org/gitlab-ce/issues/42936 and https://gitlab.com/gitlab-org/gitlab-ce/issues/31967
Namespaces have a 1-1 relationship with users. Every user must have_one
personal namespace. Every namespace must either have a nil owner
or, for personal namespaces, belong_to :owner.
Our database schema allows multiple personal namespaces with the same owner_id, which leads the application to choose one or the other at random when, e.g., creating a project in the user's personal namespace.
In https://gitlab.com/gitlab-org/gitlab-ce/issues/31967, May 2017, we noted that 188 entries violated this rule. In https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/17034#note_59177229, Feb 2018, we noted 16,365 duplicates, suggesting that the problem is growing.
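For illustration, a duplicate check could look roughly like the following. This is a minimal plain-Ruby sketch over in-memory rows, not the query that produced the counts above. `NamespaceRow` and `duplicate_owner_ids` are hypothetical names, and it assumes that any namespace with a non-nil owner_id is a personal namespace.

```ruby
# Hypothetical stand-in for a row in the namespaces table.
NamespaceRow = Struct.new(:id, :owner_id, :path)

# Returns a hash of owner_id => rows for every owner_id that appears on
# more than one namespace. Rows with a nil owner_id (assumed to be
# non-personal namespaces) are ignored.
def duplicate_owner_ids(rows)
  rows
    .reject { |row| row.owner_id.nil? }
    .group_by(&:owner_id)
    .select { |_owner_id, group| group.size > 1 }
end

rows = [
  NamespaceRow.new(1, 10, "alice"),
  NamespaceRow.new(2, 10, "alice1"),
  NamespaceRow.new(3, nil, "some-group"),
  NamespaceRow.new(4, 11, "bob"),
]

duplicates = duplicate_owner_ids(rows)
duplicates.each do |owner_id, group|
  puts "owner_id=#{owner_id} has #{group.size} namespaces: #{group.map(&:path).join(', ')}"
end
```

Against the real table, the same grouping would presumably be done in SQL rather than in memory.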
We need to decide on a data migration to resolve the problem, and enforce a unique index on owner_id to prevent new invalid entries from being added.
This will break the application in the cases where it is currently creating duplicates, so we'll need to track down and fix that as well.
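As a starting point for that decision, one possible shape for the data migration is sketched below in plain Ruby. It is only an assumption, not an agreed strategy: it keeps the lowest-id namespace per owner and moves projects from the other namespaces into it. `NsRow`, `ProjectRow`, and `merge_duplicate_namespaces` are hypothetical names, and the sketch ignores real concerns such as project path collisions, repository moves, and batching.

```ruby
# Hypothetical in-memory stand-ins for rows in namespaces and projects.
NsRow = Struct.new(:id, :owner_id)
ProjectRow = Struct.new(:id, :namespace_id)

# For each owner with several personal namespaces, keep the one with the
# lowest id (an assumed tie-break), repoint that owner's projects at it,
# and return the namespaces that survive.
def merge_duplicate_namespaces(namespaces, projects)
  # Namespaces with a nil owner_id are left untouched.
  personal, others = namespaces.partition { |ns| ns.owner_id }

  survivors = personal.group_by(&:owner_id).map do |_owner_id, group|
    keeper = group.min_by(&:id)
    stale_ids = group.map(&:id) - [keeper.id]

    # Mutates the project rows in place.
    projects.each do |project|
      project.namespace_id = keeper.id if stale_ids.include?(project.namespace_id)
    end

    keeper
  end

  others + survivors
end

namespaces = [NsRow.new(1, 10), NsRow.new(2, 10), NsRow.new(3, nil)]
projects = [ProjectRow.new(100, 1), ProjectRow.new(101, 2), ProjectRow.new(102, 3)]

remaining = merge_duplicate_namespaces(namespaces, projects)
```

Once every non-nil owner_id appears only once, the unique index can be added without failing on existing rows.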
I don't have a good idea of what negative consequences this bug has at the moment. At the least, it means we can't reliably tell which projects are in the same namespace. Renaming a user with these duplicated namespaces may also result in oddness, with only one of the namespaces acquiring the new username, and some projects being left in the old namespace. I don't know that we've seen any of this in production yet.