Make BulkImports::PipelineWorker and BulkImports::PipelineBatchWorker idempotent

This issue assumes that Fix error "Cannot transition status via :start ... (#424970 - closed) has already been completed.

Summary

BulkImports::PipelineBatchWorker and BulkImports::PipelineWorker aren't idempotent. As a result, if the workers are retried after a Sidekiq restart, duplicate records can be imported.

Context

The BulkImports::PipelineWorker and BulkImports::PipelineBatchWorker are responsible for initiating the pipeline that migrates a particular project relation. If the worker is interrupted during the migration process and then retried, the pipeline will restart from the beginning, resulting in the duplication of some data. This occurs because the pipeline does not resume from where it left off previously.

So, to make the workers idempotent, the pipelines must know how to resume the migration from where it stopped.

Proposed solution

Direct Transfer has distinct types of pipelines. For each type, we need to implement a different strategy to resume the migration.

NdjsonPipeline

For all the pipelines that include the BulkImports::NdjsonPipeline module, we can store in Redis the position of the last NDJSON line that was saved to the database. If the pipeline reruns, it resumes right after that line.
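A minimal sketch of this idea, not the actual BulkImports implementation. `FakeCache` is a hypothetical in-memory stand-in for Redis, and `import_ndjson`, the cache key, and the `fail_at:` parameter are names invented for illustration. The `saved` array stands in for the database.

```ruby
require 'json'
require 'stringio'

# Hypothetical in-memory stand-in for Redis. A real implementation would
# use a Redis client and a key scoped to the pipeline being run.
class FakeCache
  def initialize
    @store = {}
  end

  def get(key)
    @store[key]
  end

  def set(key, value)
    @store[key] = value
  end
end

# Imports each NDJSON line, skipping lines already handled by an earlier run.
# `fail_at` simulates the worker being interrupted at a given line index.
def import_ndjson(io, cache, key, saved, fail_at: nil)
  last_done = (cache.get(key) || -1).to_i

  io.each_line.with_index do |line, index|
    next if index <= last_done # already imported in a previous run

    raise 'worker interrupted' if index == fail_at

    saved << JSON.parse(line)
    cache.set(key, index) # record progress only after the save succeeds
  end
end

cache = FakeCache.new
saved = []
ndjson = %({"title":"a"}\n{"title":"b"}\n{"title":"c"}\n)

begin
  # First run: interrupted before the third line is saved.
  import_ndjson(StringIO.new(ndjson), cache, 'pipeline/1/last_line', saved, fail_at: 2)
rescue RuntimeError
  # The job would be retried here.
end

# Retry: resumes after the last recorded line instead of starting over.
import_ndjson(StringIO.new(ndjson), cache, 'pipeline/1/last_line', saved)
```

In this sketch, progress is recorded after each save, so an interruption between the save and the Redis write could still re-import one line. Closing that gap would need an additional safeguard.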

UploadsPipeline

Similarly, we can store in Redis the list of uploads that were already saved. Then, if the pipeline reruns, we skip the uploads that are in the list.
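The skip-list approach could look roughly like the sketch below. `FakeRedisSet` is a hypothetical in-memory stand-in whose method names mirror the Redis SADD and SISMEMBER commands. `import_uploads` and the file names are assumptions made for illustration, and the `imported` array stands in for the saved uploads.

```ruby
require 'set'

# Hypothetical in-memory stand-in for a Redis set.
class FakeRedisSet
  def initialize
    @members = Set.new
  end

  def sadd(member)
    @members.add(member)
  end

  def sismember(member)
    @members.include?(member)
  end
end

# Saves each upload once, skipping any path already recorded as processed.
def import_uploads(paths, processed, imported)
  paths.each do |path|
    next if processed.sismember(path) # saved by a previous run

    imported << path
    processed.sadd(path) # remember the upload so a retry skips it
  end
end

processed = FakeRedisSet.new
imported = []

# First run handles only part of the uploads before being interrupted.
import_uploads(%w[a.png b.png], processed, imported)

# The retry sees the full list but only saves what is missing.
import_uploads(%w[a.png b.png c.png], processed, imported)
```

The same pattern would apply to any pipeline whose items have a stable identifier, which is what makes the list lookup meaningful across runs.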

LfsObjectsPipeline

Similarly, we can store in Redis the list of LfsObjects that were already saved. Then, if the pipeline reruns, we skip the LfsObjects that are in the list.

Remaining pipelines

Apply a similar strategy to the remaining pipelines.

Edited by Rodrigo Tomonari