POC: GitLab Migration - export and import relations in batches
This is the POC for the Export/import in batches during GitLab Migration epic (&9036 - closed).
Tasks
- Come up with possible implementation paths.
- Check the feasibility of the implementation plan written by @georgekoltsov here:
  - Create a new model / DB table `BulkImports::ExportBatch` to be associated with `BulkImports::Export` (one to many).
  - Each `ExportBatch` to have its own export status and a row range / offset / index indicating which rows are contained in that batch.
  - Add an optional flag to the `export_relations` API to export in batches: `/api/v4/projects/123/export_relations?batch_export=true`.
  - Whenever the flag is provided, export in batches; otherwise fall back to the previous non-batched approach.
  - For batch export, the relation export worker would, for each batch of rows (e.g. 1000 rows per batch), enqueue a new `RelationBatchExportWorker` to do the same work the current `RelationExportWorker` does, but only on that batch's set of records.
  - Enqueuing batch workers can cause race conditions when updating the overall export status, so we need to think about how to make it reliable (not get stuck, not update the status prematurely).
- List unknowns: things that require further checks.
- Note where we need to keep backwards compatibility in mind (mark it in the implementation paths).
- Propose what the `GET /groups/:id/export_relations/status` and `GET /projects/:id/export_relations/status` endpoints will look like (question form here).
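The one-to-many association between `BulkImports::Export` and `ExportBatch` from the plan can be sketched as below. This is a minimal illustration, not the GitLab model: plain dataclasses stand in for ActiveRecord, and the `Status` values, the `offset`/`limit` fields (one of the "row range / offset / index" options the plan lists) and the `plan_batches` helper are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical status values; the real model may use different ones.
    PENDING = "pending"
    STARTED = "started"
    FINISHED = "finished"
    FAILED = "failed"


@dataclass
class ExportBatch:
    batch_number: int
    offset: int  # index of the first row in this batch
    limit: int   # number of rows in this batch
    status: Status = Status.PENDING  # each batch tracks its own status


@dataclass
class Export:
    relation: str
    # One export has many batches.
    batches: list[ExportBatch] = field(default_factory=list)


def plan_batches(export: Export, total_rows: int, batch_size: int = 1000) -> list[ExportBatch]:
    """Split total_rows into consecutive row ranges of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    export.batches = [
        ExportBatch(batch_number=i + 1, offset=start, limit=min(batch_size, total_rows - start))
        for i, start in enumerate(range(0, total_rows, batch_size))
    ]
    return export.batches
```

With 2,500 rows and the example batch size of 1000, `plan_batches` yields three batches starting at rows 0, 1000 and 2000, the last one holding 500 rows. Offsets are the simplest to illustrate; on large tables an id range may scale better, since deep SQL offsets get progressively slower.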
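A sketch of the opt-in flag and its fallback, assuming the flag arrives as the `batch_export` query parameter shown in the plan's example URL. `wants_batch_export` and `start_export` are hypothetical helpers, and the returned strings only stand in for the two code paths. Treating anything other than an explicit `true` as "not batched" is what keeps the endpoint backwards compatible.

```python
from urllib.parse import parse_qs, urlsplit


def wants_batch_export(url: str) -> bool:
    """Return True only when the request explicitly sets batch_export=true."""
    params = parse_qs(urlsplit(url).query)
    values = params.get("batch_export", [])
    return bool(values) and values[-1].lower() == "true"


def start_export(url: str) -> str:
    # Hypothetical dispatch: the return values stand in for the two code paths.
    if wants_batch_export(url):
        return "batched"
    # Backwards-compatible default: callers that do not send the flag
    # (for example, older importing instances) keep the non-batched behaviour.
    return "non-batched"
```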
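The fan-out step in the plan (one job per batch of rows instead of one job for the whole relation) could look roughly like this. A `queue.Queue` stands in for Sidekiq, `BatchJob` is a hypothetical set of arguments for the proposed `RelationBatchExportWorker`, and `run_batch_job` only slices an in-memory list where the real worker would query and serialize records.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass(frozen=True)
class BatchJob:
    # Arguments a hypothetical RelationBatchExportWorker would receive.
    export_id: int
    batch_number: int
    offset: int
    limit: int


def enqueue_batch_jobs(job_queue: Queue, export_id: int, total_rows: int, batch_size: int = 1000) -> int:
    """Enqueue one job per batch of rows and return the number of jobs enqueued."""
    count = 0
    for start in range(0, total_rows, batch_size):
        count += 1
        job_queue.put(BatchJob(export_id, count, start, min(batch_size, total_rows - start)))
    return count


def run_batch_job(job: BatchJob, rows: list) -> list:
    # Same work as the non-batched export, restricted to this job's row range.
    return rows[job.offset : job.offset + job.limit]
```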
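One possible way to avoid the race on the overall export status, under stated assumptions: the number of batches is fixed before any batch job is enqueued, each batch records only its own status, and the overall status is derived from all batch statuses in a single atomic step. In this in-memory sketch a `threading.Lock` plays the role of that atomic step; in GitLab it would presumably be a database-level mechanism such as a row lock, which is still to be verified. `BatchedExportStatus` and the status strings are hypothetical.

```python
import threading
from collections import Counter


class BatchedExportStatus:
    """Tracks per-batch statuses and derives the overall export status.

    A threading.Lock stands in for whatever atomic database update
    the real implementation would use.
    """

    def __init__(self, total_batches: int):
        # The total is fixed up front, before any batch job runs, so a batch that
        # finishes early cannot mistake "no other batches yet" for "all done".
        self._lock = threading.Lock()
        self._statuses = {n: "pending" for n in range(1, total_batches + 1)}

    def mark(self, batch_number: int, status: str) -> str:
        """Record one batch's status and return the overall status, atomically."""
        with self._lock:
            self._statuses[batch_number] = status
            return self._overall_locked()

    def overall(self) -> str:
        with self._lock:
            return self._overall_locked()

    def _overall_locked(self) -> str:
        # Caller must hold self._lock.
        counts = Counter(self._statuses.values())
        if counts["failed"]:
            return "failed"
        if counts["finished"] == len(self._statuses):
            return "finished"
        if counts["pending"] == len(self._statuses):
            return "pending"
        return "started"
```

Because `mark` returns the overall status computed under the lock, exactly one batch observes the transition to `finished` (the last one to complete), which could be a natural place to finalize the export without a separate polling job.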
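For the status endpoints, one possible shape (hypothetical, for discussion) assumes the current response is a list of per-relation entries with `relation` and `status` fields and only adds keys, so existing clients that ignore unknown keys keep working. The `batched`, `batches_count` and `batches` keys and the `relation_status` / `status_response` helpers are invented names, not an existing GitLab response.

```python
import json
from typing import Optional


def relation_status(relation: str, status: str, batch_statuses: Optional[list] = None) -> dict:
    """Build one entry of a hypothetical GET .../export_relations/status response."""
    entry = {"relation": relation, "status": status}
    if batch_statuses is None:
        # Non-batched export: existing fields plus an additive flag only.
        entry["batched"] = False
        return entry
    entry["batched"] = True
    entry["batches_count"] = len(batch_statuses)
    entry["batches"] = [
        {"batch_number": n, "status": s} for n, s in enumerate(batch_statuses, start=1)
    ]
    return entry


def status_response(entries: list) -> str:
    # The endpoint would return one entry per exported relation.
    return json.dumps(entries)
```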
Edited by Magdalena Frankiewicz