BulkImports not retrying batches if all failed

Summary

When using bulk_import's batching feature, if all batches fail at roughly the same time (for example, because of a connection issue with the source instance), the import does not retry the batches.

!132702 (comment 1590516605)
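
A minimal sketch of the suspected failure mode, assuming the finishing step only waits on batches that are still in progress (the statuses and method below are placeholders, not GitLab's actual implementation): when every batch has already failed, nothing is "in progress", so the stage completes without retrying anything.

# Hypothetical illustration only -- not GitLab's real code.
Batch = Struct.new(:id, :status) # assumed statuses: :started, :finished, :failed

def finish_batched_pipeline(batches)
  # Wait as long as at least one batch is still running.
  return :re_enqueue if batches.any? { |b| b.status == :started }

  # If all batches failed at roughly the same time, no batch is "in progress",
  # so this path is reached and the pipeline is marked done without any retry.
  :mark_pipeline_finished
end

all_failed = (1..3).map { |i| Batch.new(i, :failed) }
puts finish_batched_pipeline(all_failed) # => mark_pipeline_finished (no retry)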

Steps to reproduce

To mimic all of the batches failing, we can use pipeline schedules:

  1. Create 7 pipeline schedules (an arbitrary number)
  2. Change BATCH_SIZE to 3
  3. Make the runner fail with a retryable error to simulate losing the connection to the source, as in the diff below
diff --git a/app/services/bulk_imports/batched_relation_export_service.rb b/app/services/bulk_imports/batched_relation_export_service.rb
index 778510f2e358..a8761df0071a 100644
--- a/app/services/bulk_imports/batched_relation_export_service.rb
+++ b/app/services/bulk_imports/batched_relation_export_service.rb
@@ -4,7 +4,7 @@ module BulkImports
   class BatchedRelationExportService
     include Gitlab::Utils::StrongMemoize
 
-    BATCH_SIZE = 1000
+    BATCH_SIZE = 3
     BATCH_CACHE_KEY = 'bulk_imports/batched_relation_export/%{export_id}/%{batch_id}'
     CACHE_DURATION = 4.hours
 
diff --git a/lib/bulk_imports/pipeline/extracted_data.rb b/lib/bulk_imports/pipeline/extracted_data.rb
index 0b36c0682981..4e0ea3fd3b40 100644
--- a/lib/bulk_imports/pipeline/extracted_data.rb
+++ b/lib/bulk_imports/pipeline/extracted_data.rb
@@ -24,6 +24,10 @@ def next_page
       def each(&block)
         data.each(&block)
       end
+
+      def each_with_index(&block)
+        data.each_with_index(&block)
+      end
     end
   end
 end
diff --git a/lib/bulk_imports/pipeline/runner.rb b/lib/bulk_imports/pipeline/runner.rb
index 1e2d91520473..d108d9350e25 100644
--- a/lib/bulk_imports/pipeline/runner.rb
+++ b/lib/bulk_imports/pipeline/runner.rb
@@ -16,6 +16,10 @@ def run
 
         if extracted_data
           extracted_data.each_with_index do |entry, index|
+            if index == 1 && context.tracker.relation == "BulkImports::Projects::Pipelines::PipelineSchedulesPipeline"
+              raise BulkImports::RetryPipelineError.new("oh no", 1.second)
+            end
+
             transformers.each do |transformer|
               entry = run_pipeline_step(:transformer, transformer.class.name) do
                 transformer.transform(context, entry)

  4. Trigger a bulk_import direct transfer (for example via the bulk imports REST API; see the sketch below)
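
A hedged way to do this step via the bulk imports REST API on the destination instance (hostnames, tokens, and group/project paths below are placeholders):

require 'json'
require 'net/http'
require 'uri'

# Placeholder destination instance; the import runs there and pulls from the source.
uri = URI('https://destination.gitlab.example/api/v4/bulk_imports')

payload = {
  configuration: {
    url: 'https://source.gitlab.example',        # source instance
    access_token: 'SOURCE_PERSONAL_ACCESS_TOKEN' # token valid on the source
  },
  entities: [
    {
      source_type: 'project_entity',
      source_full_path: 'my-group/my-project',
      destination_slug: 'my-project',
      destination_namespace: 'target-group'
    }
  ]
}

request = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json',
                                   'PRIVATE-TOKEN' => 'DESTINATION_PERSONAL_ACCESS_TOKEN')
request.body = payload.to_json

response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
puts response.code
puts response.body # contains the bulk import id, handy for watching the trackers

Once the import starts, the PipelineSchedulesPipeline batches hit the simulated RetryPipelineError from the diff above.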

Example Project

What is the current bug behavior?

The import proceeds to the next stage, and whichever pipeline the batches belong to is marked as "success". No duplicate pipeline schedules are created.

What is the expected correct behavior?

Even if all the batches fail, the import should retry them instead of progressing to the next stage. Duplicate pipeline schedules are created as a side effect of the retry.

Note: The duplication will be fixed when the bulk_import_idempotent_workers feature flag is enabled, which removes the duplicates.

Relevant logs and/or screenshots


Possible fixes
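
One hypothetical direction, sketched under stated assumptions (the model, statuses, worker call, and retry cap below are placeholders, not GitLab internals): when the finishing step finds no batch still in progress, re-enqueue the failed batches up to a retry cap instead of immediately marking the pipeline as successful.

# Hypothetical sketch, not a validated patch.
Batch = Struct.new(:id, :status, :attempts)

MAX_BATCH_RETRIES = 3 # assumed cap to avoid retrying forever

def finish_or_retry(batches, enqueue:)
  return :wait if batches.any? { |b| b.status == :started }

  retryable = batches.select { |b| b.status == :failed && b.attempts < MAX_BATCH_RETRIES }

  if retryable.any?
    retryable.each do |batch|
      batch.attempts += 1
      batch.status = :started
      enqueue.call(batch.id) # e.g. re-schedule the batch worker with a delay
    end
    :retrying
  else
    batches.all? { |b| b.status == :finished } ? :finished : :failed
  end
end

batches = (1..3).map { |i| Batch.new(i, :failed, 0) }
puts finish_or_retry(batches, enqueue: ->(id) { puts "re-enqueue batch #{id}" })
# => retrying, instead of the stage being marked successful while every batch failed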
