GitLab Migration gets stuck in the importing state when migrating from CE to EE
Summary
GitLab Migration gets stuck in the importing state for a long time when migrating from Community Edition (CE) to Enterprise Edition (EE).
Steps to reproduce
The easiest way to test is to migrate a group from https://dev.gitlab.org/ (CE) to https://gitlab.com/ (EE).
What is the current bug behavior?
GitLab Migration takes too long to migrate the group/project and sometimes doesn't migrate everything. The history page shows failures such as the following:
[
  {
    "exception_message": "Pipeline timeout",
    "exception_class": "BulkImports::Pipeline::ExpiredError",
    "pipeline_class": "BulkImports::Groups::Pipelines::EpicsPipeline",
    "pipeline_step": "pipeline_worker_run",
    "correlation_id_value": "1bfcf441e20f2c19e86ef86442b16274",
    "created_at": "2022-09-27T23:17:34.726Z"
  },
  {
    "exception_message": "Pipeline timeout",
    "exception_class": "BulkImports::Pipeline::ExpiredError",
    "pipeline_class": "BulkImports::Common::Pipelines::BoardsPipeline",
    "pipeline_step": "pipeline_worker_run",
    "correlation_id_value": "1bfcf441e20f2c19e86ef86442b16274",
    "created_at": "2022-09-27T23:17:34.669Z"
  },
  {
    "exception_message": "Unsuccessful response 404 from [FILTERED] Bod...",
    "exception_class": "BulkImports::NetworkError",
    "pipeline_class": "BulkImports::Common::Pipelines::WikiPipeline",
    "pipeline_step": "extractor",
    "correlation_id_value": "1bfcf441e20f2c19e86ef86442b16274",
    "created_at": "2022-09-27T23:17:34.658Z"
  },
  {
    "exception_message": "Pipeline timeout",
    "exception_class": "BulkImports::Pipeline::ExpiredError",
    "pipeline_class": "BulkImports::Groups::Pipelines::IterationsCadencesPipeline",
    "pipeline_step": "pipeline_worker_run",
    "correlation_id_value": "1bfcf441e20f2c19e86ef86442b16274",
    "created_at": "2022-09-27T23:17:34.410Z"
  }
]
What is the expected correct behavior?
The migration completes in a timely manner with no errors.
Cause of the issue
The CE instance doesn't generate some export relation files because the corresponding relations exist only in EE. For example, iterations_cadences and epics are EE-only relations. When the EE instance runs the pipelines that migrate Iterations Cadences and Epics, they keep waiting for files that are never generated, so the migration stays stuck at this step for 90 minutes until the pipeline times out.
Possible fixes
1. Do not execute EE pipelines when migrating from a CE instance.
For this solution, GitLab Migration needs to know which GitLab edition it is migrating from. Currently, it only knows the version. So we need to amend the /api/v4/version endpoint to include the edition, save the edition when creating the migration, and amend the pipelines so they know whether to execute depending on the edition.
2. Change the logic to determine if an export relation file will be generated or not.
Currently, when GitLab Migration asks the source instance for the status of the exported files, it uses the endpoint /api/v4/groups/:id/export_relations/status, which returns an array with the status of each relation export file.
Example response from the endpoint /api/v4/groups/:id/export_relations/status:
[
  {
    "relation": "badges",
    "status": 1,
    "error": null,
    "updated_at": "2022-09-27T23:03:12.121Z"
  },
  {
    "relation": "boards",
    "status": 1,
    "error": null,
    "updated_at": "2022-09-27T23:03:12.170Z"
  },
  {
    "relation": "epics",
    "status": 1,
    "error": null,
    "updated_at": "2022-09-27T23:03:12.421Z"
  }
]
The response is empty when accessing the endpoint for a group/project that was never migrated. After the files are requested, the relations and their statuses are gradually added to the list.
Because the target instance can't know whether a file will ever be added to the list, the pipeline keeps waiting for it until the pipeline times out.
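To illustrate the ambiguity, here is a minimal Python sketch of how a client might look up one relation in the status array. The helper name find_relation_status and the sample payload are hypothetical, and the real implementation lives in GitLab's Ruby code base. The sample imagines a CE source, so it contains no epics entry.

```python
from typing import Optional

def find_relation_status(statuses: list[dict], relation: str) -> Optional[dict]:
    """Return the status entry for `relation`, or None if it is not listed."""
    for entry in statuses:
        if entry.get("relation") == relation:
            return entry
    return None

# Hypothetical response from a CE source: no "epics" entry ever appears.
ce_statuses = [
    {"relation": "badges", "status": 1, "error": None,
     "updated_at": "2022-09-27T23:03:12.121Z"},
    {"relation": "boards", "status": 1, "error": None,
     "updated_at": "2022-09-27T23:03:12.170Z"},
]

badges = find_relation_status(ce_statuses, "badges")
epics = find_relation_status(ce_statuses, "epics")
```

Here epics is None, but None cannot distinguish "not exported yet" from "will never be exported", which is why the pipeline has nothing better to do than wait.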
There are a few things that we can do to improve this logic:
- Create a dedicated endpoint to get the status of a single relation. If the source instance doesn't recognize the relation, it could return a 422 error, so the pipeline would know that the source instance doesn't support the relation and could skip the migration.
The problem with this solution is that the new endpoint would only be available in new GitLab versions, and for old versions, we would have to fall back to the old endpoint.
- When using the old endpoint, we could use a lower timeout for the relation status, for example, 5 minutes instead of 90 minutes. This way, the migration wouldn't get stuck for a long period.
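The shorter-timeout idea from the second bullet could look roughly like the Python sketch below. All names (wait_for_relation, RelationTimeout, fetch_statuses) are hypothetical, and the clock and sleep functions are injectable only so the behaviour can be checked without actually waiting. It does not cover the dedicated endpoint from the first bullet.

```python
import time
from typing import Callable

class RelationTimeout(Exception):
    """Raised when a relation never shows up in the status list in time."""

def wait_for_relation(
    fetch_statuses: Callable[[], list],
    relation: str,
    timeout_seconds: float = 300,      # 5 minutes instead of 90
    poll_interval: float = 30,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the status list until `relation` appears or the deadline passes."""
    deadline = clock() + timeout_seconds
    while True:
        for entry in fetch_statuses():
            if entry.get("relation") == relation:
                return entry
        if clock() >= deadline:
            raise RelationTimeout(f"{relation} not listed after {timeout_seconds}s")
        sleep(poll_interval)
```

A caller would catch RelationTimeout and mark the pipeline as skipped or failed, so a relation the source never exports costs minutes rather than an hour and a half.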
Chosen solution
We decided to implement solution 1.
Tasks breakdown:
- Reduce the timeout to wait for the file relation status
- Add GitLab instance edition to the API
- Create database migration to add source_edition to the bulk_imports table
- Capture and store GitLab source edition using the updated API
- Change BulkImport to skip EE pipelines if the source instance is CE
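The last task can be sketched as a filter over the pipeline list. This is an illustrative Python sketch, not the actual BulkImport code: PipelineConfig, the ee_only flag, and pipelines_for_source are hypothetical names, and the "CE"/"EE" strings are an assumption about how the edition would be stored. It also assumes that an unknown edition (for example, a source that doesn't report one) keeps the current behaviour of running every pipeline.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PipelineConfig:
    name: str
    ee_only: bool = False  # True for relations that exist only in EE

# Epics and iterations cadences are the EE-only relations named in this issue.
PIPELINES = [
    PipelineConfig("EpicsPipeline", ee_only=True),
    PipelineConfig("IterationsCadencesPipeline", ee_only=True),
    PipelineConfig("BoardsPipeline"),
    PipelineConfig("WikiPipeline"),
]

def pipelines_for_source(pipelines, source_edition: Optional[str]):
    """Drop EE-only pipelines only when the source is known to be CE."""
    if source_edition is not None and source_edition.upper() == "CE":
        return [p for p in pipelines if not p.ee_only]
    return list(pipelines)

ce_names = [p.name for p in pipelines_for_source(PIPELINES, "CE")]
ee_names = [p.name for p in pipelines_for_source(PIPELINES, "EE")]
```

With a CE source, only BoardsPipeline and WikiPipeline remain, so no pipeline waits for an export file that the source can never produce.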