BulkImports: Introduce the concurrency by running each pipeline on its own job

Context

As part of the work toward a better concurrency approach (&5544) for GitLab Group Migration, we need a way to track the status of each pipeline.

The fields required to track a pipeline job were introduced in #323382 (closed), so it is now possible to run each pipeline on its own job.

Proposed changes

  1. Introduce the concurrency by running each pipeline on its own job
  • Replace BulkImports::Groups::Importer with BulkImports::Groups::Stages, which keeps the Stages definition
  • Create the current Entity's trackers, one for each pipeline, something like:
  #  app/workers/bulk_import_worker.rb
  def perform(bulk_import_id)
    # ...

    created_entities.first(next_batch_size).each do |entity|
      BulkImports::Groups::Stages.create_trackers_for(entity)
      BulkImports::EntityWorker.perform_async(entity.id)
      entity.start!
    end
  end
  • Remove the transition created: :finished from app/models/bulk_imports/tracker.rb. This transition is no longer needed, because every tracker must now be started before it can be finished.
  • Create the BulkImports::PipelineWorker
  • Change the BulkImports::EntityWorker to call BulkImports::PipelineWorker instead of the BulkImports::Groups::Importer
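
The flow proposed in the list above can be sketched in plain Ruby, without Rails or Sidekiq. This is only an illustration under assumptions: the class names (Tracker, Stages, PipelineWorker, EntityWorker), the pipeline names, the stage numbers, and the hash used as an entity are hypothetical stand-ins, not the real BulkImports code, and the jobs run synchronously here where the real workers would be enqueued.

```ruby
# Plain-Ruby sketch. Every name here is a hypothetical stand-in,
# not the actual BulkImports implementation.

# Tracks the status of one pipeline for one entity.
class Tracker
  # No created -> finished edge: a tracker must be started before it finishes.
  TRANSITIONS = { created: [:started], started: [:finished] }.freeze

  attr_reader :pipeline, :stage, :status

  def initialize(pipeline, stage)
    @pipeline = pipeline
    @stage = stage
    @status = :created
  end

  def transition_to(new_status)
    allowed = TRANSITIONS.fetch(status, [])
    unless allowed.include?(new_status)
      raise ArgumentError, "cannot go from #{status} to #{new_status}"
    end
    @status = new_status
  end
end

# Holds the stage definition and creates one tracker per pipeline.
module Stages
  # Assumed pipeline names and stage numbers, for illustration only.
  PIPELINES = {
    "GroupPipeline" => 0,
    "LabelsPipeline" => 1,
    "MembersPipeline" => 1
  }.freeze

  def self.create_trackers_for(entity)
    entity[:trackers] = PIPELINES.map { |name, stage| Tracker.new(name, stage) }
  end
end

# Stand-in for the per-pipeline job: runs exactly one pipeline.
class PipelineWorker
  def perform(tracker)
    tracker.transition_to(:started)
    # ... the pipeline's actual work would happen here ...
    tracker.transition_to(:finished)
  end
end

# Stand-in for the entity job: hands each tracker to its own pipeline job,
# lowest stage first. A real worker would enqueue jobs instead of calling them.
class EntityWorker
  def perform(entity)
    by_stage = entity[:trackers].group_by(&:stage).sort_by { |stage, _| stage }
    by_stage.each do |_stage, trackers|
      trackers.each { |tracker| PipelineWorker.new.perform(tracker) }
    end
  end
end

entity = { id: 1 }
Stages.create_trackers_for(entity)
EntityWorker.new.perform(entity)
entity[:trackers].each { |t| puts "#{t.pipeline}: #{t.status}" }
```

Because the transition table has no created: :finished entry, calling transition_to(:finished) on a fresh tracker raises an error, which mirrors the reason given above for removing that transition.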

These changes were also done in the spike: #270098 (closed)

Edited by Kassio Borges