Add BulkImports export services to export in batches
What does this MR do and why?
This MR:
- Is part of a series of MRs in scope of &9036 (closed) to enable GitLab Direct Transfer to export and import relations in batches. This reduces the time a single Sidekiq job takes, which in turn reduces the likelihood of job interruptions.
- Adds export services and supporting Sidekiq workers to export batchable relations (relations that are collections, e.g. labels, MRs, etc.). These services are not used anywhere yet; the plan is to update the export API and the import side of things in future MRs.
- Updates a number of collection models to `include EachBatch`, to utilize the suggested approach when it comes to fetching records in batches
- Any non-batchable relations are skipped and go through the regular export process (e.g. repository export or a `has_one` association, which is not a collection)
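
To illustrate the idea behind fetching records in batches, here is a minimal plain-Ruby sketch. It is not GitLab's `EachBatch` implementation: `InMemoryTable` and `each_batch_of` are hypothetical stand-ins that only mimic walking records in id order, a fixed number at a time.

```ruby
# Hypothetical stand-in for a table of records; NOT GitLab's EachBatch module.
class InMemoryTable
  def initialize(ids)
    @ids = ids.sort
  end

  # Yields arrays of at most `size` ids, walking the table in id order.
  # Each step starts strictly after the last id seen, similar in spirit to
  # keyset pagination (WHERE id > last_id ORDER BY id LIMIT size).
  def each_batch_of(size)
    return enum_for(:each_batch_of, size) unless block_given?

    last_id = nil
    loop do
      batch = @ids.select { |id| last_id.nil? || id > last_id }.first(size)
      break if batch.empty?

      yield batch
      last_id = batch.last
    end
  end
end

table = InMemoryTable.new((1..2500).to_a)
batch_sizes = table.each_batch_of(1000).map(&:size)
puts batch_sizes.inspect # => [1000, 1000, 500]
```

Because every step only needs the last id it saw, no single query (or job) has to hold the whole collection at once.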
Key components:
- Updated `BulkImports::ExportService` to pass in a `batched` flag, to indicate if an export should be done in batches or not
- Updated `BulkImports::RelationExportWorker` to execute the new `BatchedRelationExportService` service if the export is batched
- New `BulkImports::BatchedRelationExportService` - entry point service for batched exports. It:
  - Creates `BulkImports::ExportBatch` records
  - Caches ids of records to export in Redis
  - Enqueues batch export jobs to actually perform the export of a batch
  - Enqueues an 'export finisher' worker which keeps track of the batches and updates the Export status when all batches are finished/failed
- New `BulkImports::RelationBatchExportWorker` - worker which executes the Batch Export Service
- New `BulkImports::RelationBatchExportService` - service which does the batch export. It:
  - Fetches cached ids from Redis
  - Exports 1000 (or fewer) records on disk to an ndjson file/binary files
  - Archives & compresses the exported batch
  - Uploads the compressed file to object storage
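
The flow described by the key components above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code in this MR: `ID_CACHE` is a hypothetical in-memory hash standing in for Redis, `plan_batches` and `export_batch` are hypothetical helpers, the export runs inline rather than in Sidekiq jobs, and the archive and upload steps are reduced to a gzip of the ndjson file.

```ruby
require 'json'
require 'zlib'
require 'tmpdir'

BATCH_SIZE = 1000 # batch size mentioned in this MR description

# Hypothetical in-memory stand-in for the Redis cache of record ids.
ID_CACHE = {}

# "Entry point" step: split ids into batches and cache each batch's ids.
# Returns the batch numbers that would each get their own export job.
def plan_batches(relation, ids)
  ids.each_slice(BATCH_SIZE).each_with_index.map do |batch_ids, index|
    batch_number = index + 1
    ID_CACHE["#{relation}/#{batch_number}"] = batch_ids
    batch_number
  end
end

# "Batch export" step: read cached ids, write one JSON document per line
# (ndjson), then gzip the file. Uploading to object storage is omitted.
def export_batch(relation, batch_number, records_by_id, dir)
  ids = ID_CACHE.fetch("#{relation}/#{batch_number}")
  ndjson_path = File.join(dir, "#{relation}_#{batch_number}.ndjson")

  File.open(ndjson_path, 'w') do |file|
    ids.each { |id| file.puts(JSON.generate(records_by_id.fetch(id))) }
  end

  gz_path = "#{ndjson_path}.gz"
  Zlib::GzipWriter.open(gz_path) { |gz| gz.write(File.read(ndjson_path)) }
  gz_path
end

# Sample data: 2500 fake label records keyed by id.
records = (1..2500).to_h { |id| [id, { 'id' => id, 'title' => "label-#{id}" }] }

exported_line_counts = Dir.mktmpdir do |dir|
  plan_batches('labels', records.keys).map do |batch_number|
    gz_path = export_batch('labels', batch_number, records, dir)
    Zlib::GzipReader.open(gz_path) { |gz| gz.read.lines.count }
  end
end

puts exported_line_counts.inspect # => [1000, 1000, 500]
```

Caching only the ids up front keeps the entry point cheap; each batch job then loads and serializes just its own slice.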
Mentions #391222 (closed)
How to set up and validate locally
- Create a project or a group
- Seed it with data, e.g. add 5k labels via a rails console
- Open rails console
- Run
portable = Group.find(...)
user = User.first
BulkImports::ExportService.new(portable: portable, user: user, batched: 'true').execute
- Observe the `bulk_import_exports` table. For each collection relation it should have rows with attributes `batched: true`, `batches_count` of 1 or higher, and `total_objects_count` equal to the number of rows exported
- Observe `bulk_import_export_batches`. Each export row should have the corresponding number of batches (1 batch per 1000 rows).
- Run
portable = Group.find(...)
user = User.first
BulkImports::ExportService.new(portable: portable, user: user, batched: 'false').execute
- Observe the `bulk_import_exports` table. All rows should be `batched: false` and should have no batches associated (although as it currently stands I think they might still be present, and we need to clean them up. A todo for me).
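
As a quick sanity check for the batched case, the expected `batches_count` for a relation can be worked out from the "1 batch per 1000 rows" rule. The helper below is a hypothetical illustration, not part of this MR; how an empty relation is handled is not stated in this description, so the sketch only covers non-empty collections.

```ruby
BATCH_SIZE = 1000 # rows per batch, as stated in this MR description

# Hypothetical helper: number of batches needed for a non-empty collection,
# computed with integer ceiling division.
def expected_batches_count(total_objects_count, batch_size = BATCH_SIZE)
  (total_objects_count + batch_size - 1) / batch_size
end

puts expected_batches_count(5000) # 5k labels from the seeding step => 5
puts expected_batches_count(5001) # one extra row spills into a 6th batch => 6
puts expected_batches_count(1)    # => 1
```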
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Edited by George Koltsov