POC: GitLab Migration - export and import relations in batches
This is the POC for the Export/import in batches during GitLab Migration epic (&9036 - closed).
Tasks
- Come up with possible implementation paths.
- Check the feasibility of the implementation plan written by @georgekoltsov here:
  - Create a new model / db table BulkImports::ExportBatch to be associated with BulkImports::Export (one to many).
  - Each ExportBatch to have its own export status & row range / offset / index to indicate which rows are contained in one batch.
  - Add an optional flag to the export_relations API to export in batches: /api/v4/projects/123/export_relations?batch_export=true
  - Whenever the flag is provided, export in batches; otherwise fall back to the previous non-batched approach.
  - As far as batch export goes, the relation export worker would have to, for each batch of rows (e.g. 1000 rows per batch), enqueue a new RelationBatchExportWorker to perform the same things the current RelationExportWorker does, but on a new set of records.
  - The enqueuing of batch workers can cause race conditions in updating the overall export status, so we need to think about how to make it reliable, so that it neither gets stuck nor updates the status prematurely.
- List unknowns: things that require further checks.
- Note where we need to keep backwards compatibility in mind (mark it in the implementation paths).
- Propose what the GET /groups/:id/export_relations/status and GET /projects/:id/export_relations/status endpoints will look like (question from here).
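To make the batching plan and the status race condition easier to discuss, here is a minimal, framework-free sketch. Everything in it is an assumption for illustration: the top-level Export and ExportBatch classes are plain Ruby stand-ins for BulkImports::Export and BulkImports::ExportBatch, and the status names and the build_batches helper are hypothetical. It only shows one possible way to split rows into batches of 1000 and to derive the overall status from the batch statuses rather than having each batch worker write it.

```ruby
# Hypothetical sketch; none of these names are GitLab's actual code.
BATCH_SIZE = 1000 # example batch size taken from the plan above

# Stand-in for one ExportBatch row: its own status plus the row range it covers.
ExportBatch = Struct.new(:batch_number, :first_id, :last_id, :status)

# Stand-in for an Export row that owns many batches (one to many).
class Export
  attr_reader :relation, :batches

  def initialize(relation)
    @relation = relation
    @batches = []
  end

  # Split the sorted record ids into consecutive batches of at most batch_size rows.
  def build_batches(record_ids, batch_size: BATCH_SIZE)
    @batches = record_ids.sort.each_slice(batch_size).with_index(1).map do |ids, number|
      ExportBatch.new(number, ids.first, ids.last, :pending)
    end
  end

  # Derive the overall status from the batches, so that a batch finishing
  # early cannot mark the whole export as finished.
  def status
    # No batches yet; a zero-row relation would need its own rule.
    return :pending if batches.empty?

    statuses = batches.map(&:status)
    return :failed if statuses.include?(:failed)
    return :finished if statuses.all? { |s| s == :finished }
    return :pending if statuses.all? { |s| s == :pending }

    :started
  end
end

export = Export.new("issues")
export.build_batches((1..2500).to_a)  # three batches: 1..1000, 1001..2000, 2001..2500
export.batches.first.status = :finished
export.status                         # :started, because two batches are still pending
```

Deriving the status on read avoids a shared counter that several workers update concurrently. A real implementation would still need to decide who persists the final status and how a batch that never reports back is detected.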
Edited by Magdalena Frankiewicz