
Add BulkImports export services to export in batches

What does this MR do and why?

This change is not used anywhere yet (apart from refactorings of existing code, which shouldn't affect existing behaviour). In the next MR I plan to add a feature flag to control export in batches (as well as modify the export API to initiate batched export). No Import code uses export in batches yet, either. Based on the work done in POC !109491 (closed).

This MR:

  • Is part of a series of MRs in the scope of &9036 (closed) that enable GitLab Direct Transfer to support Export & Import of relations in batches. Batching reduces the amount of time a single Sidekiq job takes, which in turn reduces the likelihood of job interruptions.
  • Adds Export services and supporting Sidekiq workers to export batchable relations (relations that are collections, e.g. labels, MRs, etc). These services are not used anywhere yet; the plan is to update the export API and the Import side in future MRs.
  • Updates a number of collection models to include EachBatch, the suggested approach for fetching records in batches
  • Non-batchable relations are skipped and go through the regular export process (e.g. repository export, or a has_one association, which is not a collection)
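To illustrate the idea behind fetching records in batches by id, here is a minimal plain-Ruby sketch. It is not GitLab's EachBatch implementation: the Row struct, the each_batch_by_id helper and the in-memory array standing in for a table are all hypothetical, and the sketch assumes positive integer ids.

```ruby
# Hypothetical stand-in for a database row (not a GitLab model).
Row = Struct.new(:id, :title)

# Yields rows in id order, `of` rows at a time. Each iteration resumes
# after the last id seen instead of using an offset, which is the general
# idea behind key-based batching. Assumes ids are positive integers.
def each_batch_by_id(rows, of: 1000)
  last_id = 0
  loop do
    # Take the `of` rows with the smallest ids greater than last_id.
    batch = rows.select { |row| row.id > last_id }.min_by(of, &:id)
    break if batch.empty?

    yield batch
    last_id = batch.last.id
  end
end

rows = (1..2500).map { |i| Row.new(i, "label #{i}") }
each_batch_by_id(rows, of: 1000) do |batch|
  puts "batch of #{batch.size} rows, ids #{batch.first.id}..#{batch.last.id}"
end
```

Scanning the whole array on every iteration is only acceptable in a sketch; against a real table the same step would be an indexed range query on the primary key.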

Key components:

  • Updated BulkImports::ExportService to accept a batched flag, which indicates whether an export should be done in batches
  • Updated BulkImports::RelationExportWorker to execute the new BatchedRelationExportService if the export is batched
  • New BulkImports::BatchedRelationExportService - entry point service for batched exports. It:
    • Creates BulkImports::ExportBatch records
    • Caches ids of records to export in Redis
    • Enqueues batch export jobs to actually perform export of a batch
    • Enqueues an 'export finisher' worker, which tracks the batches and updates the Export status once all batches have finished or failed
  • New BulkImports::RelationBatchExportWorker - worker which executes Batch Export Service
  • New BulkImports::RelationBatchExportService - service which does batch export. It:
    • Fetches cached ids from Redis
    • Exports 1000 (or fewer) records to disk, as an ndjson file/binary files
    • Archives & compresses exported batch
    • Uploads compressed file to object storage
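The flow described above can be sketched in plain Ruby. This is an illustration under assumptions, not the code in this MR: plan_batches, export_batch and the ID_CACHE hash (standing in for Redis) are hypothetical names, plain gzip stands in for the archive and compress step, and the upload to object storage and the finisher worker are left out.

```ruby
require 'json'
require 'zlib'
require 'tmpdir'

BATCH_SIZE = 1000 # batch size stated in the description above

# Hypothetical in-memory stand-in for the Redis cache of record ids.
ID_CACHE = {}

# Entry point (sketch): split ids into batches of BATCH_SIZE, cache the ids
# of each batch and return the batch numbers that would be enqueued as jobs.
def plan_batches(relation, ids)
  ids.each_slice(BATCH_SIZE).with_index(1).map do |batch_ids, batch_number|
    ID_CACHE["#{relation}:#{batch_number}"] = batch_ids
    batch_number
  end
end

# Per-batch export (sketch): read the cached ids, write one JSON document
# per line (ndjson) and gzip the result. Returns the path of the file.
def export_batch(relation, batch_number, records_by_id, dir)
  ids = ID_CACHE.fetch("#{relation}:#{batch_number}")
  path = File.join(dir, "#{relation}_#{batch_number}.ndjson.gz")
  Zlib::GzipWriter.open(path) do |gz|
    ids.each { |id| gz.puts(JSON.generate(records_by_id.fetch(id))) }
  end
  path
end

# Usage with made-up data: 2500 records are planned as three batches.
records = (1..2500).to_h { |id| [id, { 'id' => id, 'title' => "label #{id}" }] }
Dir.mktmpdir do |dir|
  batch_numbers = plan_batches('labels', records.keys)
  paths = batch_numbers.map { |n| export_batch('labels', n, records, dir) }
  puts paths.map { |p| File.basename(p) }.inspect
end
```

Caching the ids up front means each batch job only needs its relation name and batch number to know exactly which records to export.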

Mentions #391222 (closed)

Screenshots or screen recordings

batchedexport.mov

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

How to set up and validate locally

  1. Create a project or a group
  2. Seed it with data, e.g. add 5k labels via a rails console
  3. Open rails console
  4. Run
portable = Group.find(...)
user = User.first
BulkImports::ExportService.new(portable: portable, user: user, batched: 'true').execute
  5. Observe the bulk_import_exports table. For each collection relation it should have rows with batched: true, batches_count of 1 or higher, and total_objects_count equal to the number of rows exported
  6. Observe the bulk_import_export_batches table. Each export row should have the corresponding number of batches (1 batch per 1000 rows).
  7. Run
portable = Group.find(...)
user = User.first
BulkImports::ExportService.new(portable: portable, user: user, batched: 'false').execute
  8. Observe the bulk_import_exports table. All rows should have batched: false and no batches associated (although as it currently stands I think batches might still be present, and we need to clean them up. A todo for me).
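As a quick aid for checking batches_count by hand, the "1 batch per 1000 rows" rule works out to a ceiling division. The helper below is a hypothetical sketch, not part of this MR, and how an empty relation is reported is not covered by the description above.

```ruby
# Hypothetical helper: number of batches expected for a given row count,
# assuming 1 batch per 1000 rows and a partially filled final batch.
def expected_batches_count(total_rows, batch_size: 1000)
  (total_rows + batch_size - 1) / batch_size # integer ceiling division
end

puts expected_batches_count(5_000) # the 5k labels seeded in step 2
puts expected_batches_count(5_001)
```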

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by George Koltsov
