GitLab Migration fails systematically with some data not being migrated

As reported in https://gitlab.com/gitlab-org/manage/import/support/-/issues/14#note_1020687989, a customer found that during a GitLab migration the import process crashes almost systematically after a while, leaving about half of the data missing. They also mentioned that when this happens they cannot tell what has been migrated and what has not.

Below are some details from the customer:

  • general group info: 2 epics, 22 issues
  • their group structure: Group_structure.txt
  • bulk_import_failures content:

gitlabhq_production=> SELECT * from bulk_import_failures;
 id | bulk_import_entity_id | created_at | pipeline_class | exception_class | exception_message | correlation_id_value | pipeline_step
----+-----------------------+------------+----------------+-----------------+-------------------+----------------------+---------------
 90 | 133 | 2022-07-07 14:35:44.841398+00 | ExportRequestWorker | BulkImports::NetworkError | Unsuccessful response 404 from /api/v4/groups/5000/export_relations. Body: {"message"=>"404 Group Not Found"} | 01G7CHD4PFB254S64WER9056E6 |
(1 row)

Problem

The problem occurs when the group path looks like a number, e.g. 5000. In this case, GitLab tries to find the group by ID (see the code) rather than by path, so the export request in the failure above returns 404 Group Not Found.
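
To illustrate the ambiguity, here is a minimal, self-contained sketch. It is not the GitLab implementation: the Group struct, the GROUPS list, the IDs and the find_group helper are all hypothetical, and the rule "numeric means ID" is an assumption based on the behaviour described above.

```ruby
# Hypothetical in-memory stand-in for the groups table; not GitLab code.
Group = Struct.new(:id, :full_path)

GROUPS = [
  Group.new(42, "5000"),       # a group whose path looks like a number
  Group.new(43, "acme/tools")
].freeze

# Assumed lookup rule: a purely numeric identifier is treated as an ID,
# anything else is treated as a full path.
def find_group(identifier)
  if identifier.to_s.match?(/\A\d+\z/)
    GROUPS.find { |group| group.id == identifier.to_i }
  else
    GROUPS.find { |group| group.full_path == identifier }
  end
end

find_group("acme/tools") # found by path
find_group("5000")       # nil: read as ID 5000, which no group has
find_group(42)           # found: the "5000" group, looked up by its ID
```

In this sketch the group with path "5000" can only be reached through its ID, which is why requesting the export by ID avoids the 404.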

Proposal

Update BulkImports::ExportRequestWorker to request the export from the source using id instead of full_path. For this we need to do the following:

  1. Add source_id column to bulk_import_entities table
  2. Update group & project pipelines that fetch initial entity information to store source_id in the newly added column (see #367915 (comment 1085195344) for details)
  3. Update the export request worker to perform export network request with the stored source_id
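
As a rough sketch of step 3, the request path could be built from the stored source_id. Everything here is illustrative: the Entity struct, its resource field and the export_relations_path helper are hypothetical, and the fallback to the URL-encoded full path for entities that have no source_id yet is an assumption, not part of the proposal.

```ruby
require "uri"

# Hypothetical stand-in for a bulk_import_entities row; only the fields
# needed for this sketch are modelled.
Entity = Struct.new(:resource, :source_full_path, :source_id, keyword_init: true)

# Builds the export_relations path for an entity. Prefers the stored
# source_id; falling back to the URL-encoded full path is an assumption
# for entities created before the column existed.
def export_relations_path(entity)
  identifier = entity.source_id || URI.encode_www_form_component(entity.source_full_path)
  "/api/v4/#{entity.resource}/#{identifier}/export_relations"
end

numeric_path_group = Entity.new(resource: "groups", source_full_path: "5000", source_id: 42)
legacy_group = Entity.new(resource: "groups", source_full_path: "acme/tools", source_id: nil)

puts export_relations_path(numeric_path_group) # /api/v4/groups/42/export_relations
puts export_relations_path(legacy_group)       # /api/v4/groups/acme%2Ftools/export_relations
```

With a stored source_id the numeric-looking path never reaches the URL, so the source instance cannot mistake it for an ID.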

/cc @georgekoltsov @mrouggani

Edited by George Koltsov