Race condition when migrating groups and projects
Summary
When migrating a group or project, the pipeline workers may try to access the exported NDJSON files before the worker that generates those files has started the export process. When this happens, the migration does not import some files.
Technical details
When the bulk import starts, two workers are enqueued:
- BulkImports::ExportRequestWorker. This worker sends a request to the source instance to start generating the NDJSON files for the record that is being migrated. On receiving the request, the source enqueues other workers that eventually create a BulkImport::Export record for each file that needs to be exported. When the source finishes generating a file, the state of the corresponding BulkImport::Export record is updated to finished and the file is associated with the record.
- BulkImports::EntityWorker. This worker triggers the pipelines that extract the data from the source instance. Each pipeline consumes data from a different resource: the GraphQL API, the REST API, or NDJSON files. For NDJSON files, the worker makes an API call to the source before starting the pipeline to check the status of the exported file. In effect, this worker keeps polling for the result of the export started by BulkImports::ExportRequestWorker.
Problem / Race condition
If BulkImports::EntityWorker runs before the BulkImport::Export records have been created as a result of BulkImports::ExportRequestWorker, its call to GET /api/v4/groups/#{GROUP_ID}/export_relations/status to check the status of the exported files returns an empty response, because no BulkImport::Export record exists yet. In this situation BulkImports::EntityWorker marks the pipeline as failed and the file isn't imported.
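The ordering problem can be modelled with a small, self-contained sketch. This is not GitLab's code: FakeSource, decide_pipeline_state, the "labels" relation, and the string status values are hypothetical stand-ins. Only the rule "an empty status response fails the pipeline" is taken from the description above.

```ruby
# Hypothetical stand-in for the source instance. It only models the fact
# that no export records exist until the export has been started.
class FakeSource
  def initialize
    @exports = []
  end

  # Models the work triggered by BulkImports::ExportRequestWorker.
  def start_export(relations)
    @exports = relations.map { |r| { "relation" => r, "status" => "started" } }
  end

  # Models the export_relations/status endpoint: one entry per export record.
  def export_status
    @exports
  end
end

# Models the behavior reported in this issue: a missing export record is
# handled the same way as a failed export.
def decide_pipeline_state(status_response, relation)
  export = status_response.find { |e| e["relation"] == relation }
  return :failed if export.nil?

  export["status"] == "finished" ? :ready : :waiting
end

source = FakeSource.new

# The entity worker polls first: no records exist, so the pipeline fails.
early = decide_pipeline_state(source.export_status, "labels")

# The export request worker runs afterwards: the record now exists.
source.start_export(["labels"])
late = decide_pipeline_state(source.export_status, "labels")

puts "polled before export started: #{early}"
puts "polled after export started:  #{late}"
```

In this model the only difference between the two polls is timing, which is why the failure is intermittent.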
Possible fixes
To fix this issue, an empty response shouldn't mark the pipeline as failed. Instead, BulkImports::EntityWorker should be re-enqueued until the endpoint returns a non-empty response.
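A minimal sketch of that decision follows, assuming the status response is an array with one entry per export. The names Decision, next_step, MAX_ATTEMPTS, and RETRY_DELAY are hypothetical, and the attempt cap is an added assumption (not part of the proposal above) so that an export that never starts cannot keep the worker looping forever.

```ruby
# Hypothetical result of one status check: what to do next and, when
# re-enqueueing, how long to wait.
Decision = Struct.new(:action, :delay_seconds)

MAX_ATTEMPTS = 5   # assumed cap on status checks
RETRY_DELAY = 20   # assumed delay between checks, in seconds

def next_step(status_response, relation, attempt)
  export = status_response.find { |e| e["relation"] == relation }

  if export.nil?
    # The export record has not been created yet: wait instead of failing,
    # unless the assumed attempt cap has been reached.
    return Decision.new(:fail, nil) if attempt >= MAX_ATTEMPTS

    return Decision.new(:reenqueue, RETRY_DELAY)
  end

  case export["status"]
  when "finished" then Decision.new(:run_pipeline, nil)
  when "failed" then Decision.new(:fail, nil)
  else Decision.new(:reenqueue, RETRY_DELAY)
  end
end
```

For example, next_step([], "labels", 1) asks for a re-enqueue, while the same empty response on the fifth attempt fails the pipeline.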
Steps to reproduce
To reproduce the problem easily, force the export to take longer to start by adding a sleep before the service that starts the export process.
Make sure to export a group or project that doesn't have BulkImport::Export records yet. Use the endpoints below to confirm that no records exist:
- GET /api/v4/groups/#{GROUP_ID}/export_relations/status
- GET /api/v4/projects/#{PROJECT_ID}/export_relations/status
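The check can be scripted. The sketch below is one possible way to do it with Ruby's standard library. The helper names are hypothetical, authentication with a personal access token in the PRIVATE-TOKEN header is assumed, and the endpoint is assumed to return a JSON array that is empty when no export records exist.

```ruby
require "json"
require "net/http"
require "uri"

# Builds the status URL. kind is "groups" or "projects"; id is the numeric ID.
# Assumes GitLab is served from the root of base_url (no relative URL path).
def export_status_uri(base_url, kind, id)
  URI.join(base_url, "/api/v4/#{kind}/#{id}/export_relations/status")
end

# Returns true when the response body lists no export records.
def no_export_records?(body)
  JSON.parse(body).empty?
end

# Performs the request and returns the raw response body.
def fetch_export_status(base_url, kind, id, token)
  uri = export_status_uri(base_url, kind, id)
  request = Net::HTTP::Get.new(uri)
  request["PRIVATE-TOKEN"] = token

  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request).body
  end
end
```

With a placeholder host and ID, no_export_records?(fetch_export_status("https://gitlab.example.com", "groups", 42, token)) returns true when the group is in the state needed to reproduce the problem.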
Then carry on with the migration of the group or project.
At the end, the import is expected to finish with the status success, but some resources won't have been migrated.