GitLab Migration fails systematically with some data not being migrated.
As reported in https://gitlab.com/gitlab-org/manage/import/support/-/issues/14#note_1020687989, a customer found that during a GitLab migration the import process crashes almost every time after running for a while, leaving about half of the data unmigrated. They also mentioned that when this happens they cannot tell what has and has not been migrated.
Below are some details from the customer:
- general group info: 2 epics, 22 issues
- their group structure: Group_structure.txt
- bulk_import_failures content:
gitlabhq_production=> SELECT * from bulk_import_failures;
 id | bulk_import_entity_id | created_at                    | pipeline_class      | exception_class           | exception_message                                                                                             | correlation_id_value       | pipeline_step
----+-----------------------+-------------------------------+---------------------+---------------------------+---------------------------------------------------------------------------------------------------------------+----------------------------+---------------
 90 |                   133 | 2022-07-07 14:35:44.841398+00 | ExportRequestWorker | BulkImports::NetworkError | Unsuccessful response 404 from /api/v4/groups/5000/export_relations. Body: {"message"=>"404 Group Not Found"} | 01G7CHD4PFB254S64WER9056E6 |
(1 row)
Problem
The problem occurs when the group path looks like a number, e.g. 5000. In this case, GitLab treats the value as a group ID (see the code) rather than as a path, so the lookup fails with the 404 shown in the failure record above.
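A minimal sketch of the ambiguity, written in Python for illustration only (it is not GitLab's actual lookup code). The `Group` class, the `GROUPS` data, and `find_group` are hypothetical; the example assumes the affected group has the path "5000" and some unrelated numeric ID (42 here).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Group:
    id: int
    full_path: str


# Hypothetical data: the group's path is "5000", but its numeric ID is 42.
GROUPS = [Group(id=42, full_path="5000"), Group(id=7, full_path="tools")]


def find_group(id_or_path: str) -> Optional[Group]:
    """Resolve an identifier that may be an ID or a path.

    Numeric-looking values are treated as IDs, which is the behaviour
    the issue describes; everything else is matched against the path.
    """
    if id_or_path.isdecimal():
        wanted = int(id_or_path)
        return next((g for g in GROUPS if g.id == wanted), None)
    return next((g for g in GROUPS if g.full_path == id_or_path), None)


# The path "5000" is interpreted as ID 5000, which does not exist,
# so the caller would answer with "404 Group Not Found".
missing = find_group("5000")

# Looking the same group up by its real ID works.
found = find_group("42")
```

Under these assumptions the group is reachable by its ID but never by its numeric-looking path, which matches the 404 in the failure record.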
Proposal
Update BulkImports::ExportRequestWorker to request the export from the source using id instead of full_path. For this we need to do the following:
- Add source_id column to bulk_import_entities table
- Update group & project pipelines that fetch initial entity information to store source_id in the newly added column (see #367915 (comment 1085195344) for details)
- Update the export request worker to perform export network request with the stored source_id
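The last step can be sketched as follows, again in Python for illustration rather than as the worker's real Ruby code. The `Entity` class, its `source_full_path` field, and `export_relations_url` are hypothetical names; falling back to the URL-encoded path when no source_id has been stored yet is an assumption, not something the proposal specifies. Only the group endpoint from the failure record is covered.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass
class Entity:
    # Hypothetical stand-in for a bulk_import_entities row.
    source_full_path: str
    source_id: Optional[int] = None  # the proposed new column


def export_relations_url(entity: Entity) -> str:
    """Build the export request path for a group entity.

    Prefers the stored source_id, which is unambiguous. The fallback
    to the URL-encoded path is an assumption made for this sketch.
    """
    if entity.source_id is not None:
        identifier = str(entity.source_id)
    else:
        identifier = quote(entity.source_full_path, safe="")
    return f"/api/v4/groups/{identifier}/export_relations"


# Without source_id, the numeric-looking path produces the failing request.
by_path = export_relations_url(Entity(source_full_path="5000"))

# With source_id stored (42 is a made-up value), the request targets the ID.
by_id = export_relations_url(Entity(source_full_path="5000", source_id=42))
```

With the ID in the URL, the source instance no longer has to guess whether "5000" is a path or an ID.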