Direct Transfer - Don't create placeholder users for deleted users
During the investigation of #506170 (closed), it was found that Direct Transfer creates placeholder users and import source users for users who have been deleted from the source instance.
This occurs because some database tables lack foreign key constraints for columns referencing users. As a result:
- When a user is deleted, these columns retain the now-invalid user ID.
- When records containing these invalid IDs are exported, the IDs are included in the export.
- During the import process, Direct Transfer interprets these IDs as valid and creates placeholder and import source users corresponding to the deleted users.
For example, consider the `Note#updated_by_id` field, which does not have a foreign key. If a user with ID `99` updates a note and is later deleted, the `updated_by_id` column still retains the value `99`. During export, this ID is included in the exported data. When Direct Transfer imports this data, it creates an import source user with the identifier `99` and a placeholder user.
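The current behavior can be illustrated with a minimal, hypothetical Ruby sketch; it is not GitLab's actual import code. It models the source users and an exported note as plain hashes, and `NaiveImporter`, `source_user_for`, and the record layout are illustrative assumptions. Because nothing checks whether user `99` still exists, the importer creates a source user and placeholder for it with no name or username.

```ruby
# Hypothetical source instance: user 99 has already been deleted, but the note
# row still holds its ID because updated_by_id has no foreign key.
SOURCE_USERS = { 1 => { name: "Alice", username: "alice" } }.freeze
EXPORTED_NOTE = { id: 10, author_id: 1, updated_by_id: 99 }.freeze

# Illustrative importer that trusts every user ID it encounters.
class NaiveImporter
  attr_reader :source_users

  def initialize
    @source_users = {}
  end

  # Creates an import source user (with a placeholder) the first time an ID is seen.
  def source_user_for(source_id)
    @source_users[source_id] ||= begin
      details = SOURCE_USERS.fetch(source_id, {}) # empty when the user was deleted
      {
        source_id: source_id,
        name: details[:name],
        username: details[:username],
        placeholder: "placeholder_#{source_id}"
      }
    end
  end

  def import_note(note)
    note.slice(:author_id, :updated_by_id).each_value { |id| source_user_for(id) }
  end
end

importer = NaiveImporter.new
importer.import_note(EXPORTED_NOTE)
p importer.source_users[99] # name and username are nil: user 99 no longer exists
```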
Because the user no longer exists in the source instance, Direct Transfer fails to populate the source user's name and username, leaving these fields empty. This lack of information prevents group owners from reassigning ownership effectively, as they cannot identify the original source user.
Proposed Solutions
1. Modify the Export Process
Update the export logic to exclude user IDs from records when the corresponding user no longer exists.
- Limitation: This change would only take effect when the source instance runs an updated version of GitLab; exports from older instances would still contain the IDs of deleted users.
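Option 1 could look roughly like the following minimal sketch, assuming the exporter can look up the set of user IDs that still exist on the source instance. `USER_REFERENCE_COLUMNS` and `scrub_deleted_users` are hypothetical names for illustration, not part of GitLab's export code.

```ruby
require "set"

# Hypothetical list of user-reference columns that lack foreign key constraints.
USER_REFERENCE_COLUMNS = %i[updated_by_id resolved_by_id last_edited_by_id].freeze

# Returns a copy of +record+ where user IDs in the columns above that no longer
# match an existing user are replaced by nil, so they are never exported.
def scrub_deleted_users(record, existing_user_ids)
  record.to_h do |column, value|
    dangling = USER_REFERENCE_COLUMNS.include?(column) && !value.nil? &&
               !existing_user_ids.include?(value)
    [column, dangling ? nil : value]
  end
end

existing = Set[1, 2]
note = { id: 10, author_id: 1, updated_by_id: 99, resolved_by_id: 2 }
p scrub_deleted_users(note, existing) # updated_by_id becomes nil; other values are kept
```

Because the scrubbing runs on the source side, it only helps when the source instance has this logic, which is the limitation stated for this option.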
2. Enhance Direct Transfer Logic (selected)
Adjust Direct Transfer to avoid creating import source users and placeholder users for deleted users. However, Direct Transfer currently has no direct knowledge of whether a user exists in the source instance, so it would have to infer it.
To infer whether a user exists, Direct Transfer could create import source users only when the user ID is referenced by a column with a foreign key constraint.
For example, if a note is being imported and its `updated_by_id` field, which has no foreign key constraint, references a user ID for which no import source user has been created yet, Direct Transfer could ignore the `updated_by_id` field, because it most likely corresponds to a user that was deleted from the source instance.
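The inference rule can be sketched as follows, assuming each user-reference column is tagged with whether it has a foreign key. `SourceUserResolver`, its `resolve` method, and the in-memory store are hypothetical; a real implementation would query the import source user records instead of a hash.

```ruby
# Hypothetical resolver applying the proposed inference rule.
class SourceUserResolver
  def initialize
    @source_users = {} # source user ID => placeholder identifier
  end

  def created?(source_id)
    @source_users.key?(source_id)
  end

  # A column with a foreign key always points at an existing user, so it may
  # create a source user. A column without one resolves only to a source user
  # that already exists; otherwise the ID is treated as a deleted user (nil).
  def resolve(source_id, foreign_key:)
    return nil if source_id.nil?
    return @source_users[source_id] if created?(source_id)
    return nil unless foreign_key

    @source_users[source_id] = "placeholder_#{source_id}"
  end
end

resolver = SourceUserResolver.new
author = resolver.resolve(1, foreign_key: true)   # creates a source user
editor = resolver.resolve(99, foreign_key: false) # nil: most likely a deleted user
```

When `resolve` returns `nil`, the importer would leave `updated_by_id` empty instead of creating a placeholder.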
If the user referenced by `updated_by_id` is a member, their import source user is created at the beginning of the migration. In this case, Direct Transfer does not ignore the `updated_by_id` value, because the import source user already exists. With this logic, Direct Transfer is unlikely to infer a user's existence incorrectly.
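The ordering argument can be illustrated with a small, self-contained simulation. It assumes that members are imported before notes, as described above, and that `author_id` has a foreign key (an assumption of this sketch); `migrate` and the hash-based records are illustrative only. Member `42` is seeded first, so an `updated_by_id` pointing at them is kept, while `99` is dropped.

```ruby
require "set"

# Simulates a migration in which members are imported first and notes afterwards.
def migrate(member_ids, notes)
  known_source_users = Set.new

  # Stage 1: every member gets an import source user up front.
  member_ids.each { |id| known_source_users << id }

  # Stage 2: notes. author_id is assumed to have a foreign key, so it may create
  # a source user; updated_by_id has none, so it is kept only if already known.
  notes.map do |note|
    known_source_users << note[:author_id]
    editor = note[:updated_by_id]
    note.merge(updated_by_id: known_source_users.include?(editor) ? editor : nil)
  end
end

notes = [
  { id: 1, author_id: 7, updated_by_id: 42 }, # 42 is a member
  { id: 2, author_id: 7, updated_by_id: 99 }  # 99 was deleted from the source
]
migrate([42], notes).each { |note| p note }
```

The case this ordering does not cover is a user who still exists but is not a member and is referenced only through columns without a foreign key; the field analysis weighs that risk for each affected column.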
Field Analysis
The following fields currently lack foreign key constraints and are impacted:
| Table | Field | Observation |
|---|---|---|
| Note | `updated_by_id`, `resolved_by_id` | These fields do not hold critical information. |
| Issue | `last_edited_by_id` | This field does not hold critical information. |
| Event | `author_id` | Events are deleted after 3 years, so it should be fine to ignore them. |
| Approval | `user_id` | If the user exists, their import source user is likely already created while importing earlier contributions. |
| Ci::Bridge | `user_id` | All CI tables lack foreign keys because they reference users in a separate database. However, by the time pipelines are imported, the users likely already exist. |
| Ci::Build | `user_id` | Same as above. |
| Ci::Pipeline | `user_id` | Same as above. |
| GenericCommitStatus | `user_id` | Same as above. |
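The field analysis could be encoded as a lookup that the importer consults to decide whether a column may create a source user. This is a hypothetical sketch: `SOFT_USER_REFERENCES` and `foreign_key_reference?` are illustrative names, models are kept as plain strings rather than loaded classes, and any column not listed is assumed to have a foreign key.

```ruby
# User-reference columns without a foreign key constraint, per the field analysis.
SOFT_USER_REFERENCES = {
  "Note" => %w[updated_by_id resolved_by_id],
  "Issue" => %w[last_edited_by_id],
  "Event" => %w[author_id],
  "Approval" => %w[user_id],
  "Ci::Bridge" => %w[user_id],
  "Ci::Build" => %w[user_id],
  "Ci::Pipeline" => %w[user_id],
  "GenericCommitStatus" => %w[user_id]
}.freeze

# Returns false for listed columns, meaning they should only map to import source
# users that already exist. Unlisted columns are assumed to have a foreign key.
def foreign_key_reference?(model, column)
  !SOFT_USER_REFERENCES.fetch(model, []).include?(column)
end

foreign_key_reference?("Note", "author_id")     # true
foreign_key_reference?("Note", "updated_by_id") # false
```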