Make BulkImports::PipelineWorker and BulkImports::PipelineBatchWorker idempotent
This issue assumes that Fix error "Cannot transition status via :start ... (#424970 - closed) is already done.
Summary
BulkImports::PipelineBatchWorker and BulkImports::PipelineWorker aren't idempotent. As a result, if the workers are retried after a Sidekiq restart, duplicate records can be imported.
Context
The BulkImports::PipelineWorker and BulkImports::PipelineBatchWorker are responsible for initiating the pipeline that migrates a particular project relation. If the worker is interrupted during the migration process and then retried, the pipeline will restart from the beginning, resulting in the duplication of some data. This occurs because the pipeline does not resume from where it left off previously.
So, to make the workers idempotent, the pipelines must know how to resume the migration from where it stopped.
Proposed solution
Direct Transfer has distinct types of pipelines. Each type needs its own strategy for resuming the migration.
NdjsonPipeline
For all the pipelines that include the BulkImports::NdjsonPipeline module, we can store in Redis the position of the last NDJSON line that was saved to the database. If the pipeline reruns, it resumes from that line.
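A minimal sketch of the resume idea, not the actual pipeline code. The `FakeStore` class is a hypothetical in-memory stand-in for Redis, and `ResumableNdjsonImport`, the key name, and the `fail_at` parameter (used only to simulate an interruption) are all assumptions made for illustration.

```ruby
require 'json'

# Hypothetical in-memory stand-in for Redis so the sketch runs on its own.
class FakeStore
  def initialize
    @data = {}
  end

  def get(key)
    @data[key]
  end

  def set(key, value)
    @data[key] = value.to_s
  end
end

# Hypothetical helper: imports NDJSON lines and records progress after each one.
class ResumableNdjsonImport
  def initialize(store:, key:, sink:)
    @store = store
    @key = key
    @sink = sink # stands in for the database
  end

  # fail_at simulates an interruption (for example a Sidekiq restart)
  # just before the line at that index is processed.
  def run(lines, fail_at: nil)
    processed = @store.get(@key).to_i # nil.to_i is 0 on the first run

    lines.each_with_index do |line, index|
      next if index < processed # already saved by a previous run
      raise 'interrupted' if index == fail_at

      @sink << JSON.parse(line)
      @store.set(@key, index + 1) # number of lines saved so far
    end
  end
end

store = FakeStore.new
imported = []
lines = ['{"id":1}', '{"id":2}', '{"id":3}']
import = ResumableNdjsonImport.new(store: store, key: 'progress', sink: imported)

begin
  import.run(lines, fail_at: 2) # first attempt stops before the third line
rescue RuntimeError
  # the retry below picks up from the stored position
end

import.run(lines)
imported.map { |record| record['id'] } # => [1, 2, 3]
```

Note that in this sketch the record is saved before the position is updated, so an interruption between those two steps could still replay one line. A real implementation would need to decide how to handle that window.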
UploadsPipeline
Similarly, we can store in Redis the list of uploads that were already saved. Then, if the pipeline reruns, we skip the uploads that are in the list.
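The skip-list approach could look roughly like the sketch below. It assumes a Redis set as the storage structure (the issue only says "list"), and `FakeSetStore` is a hypothetical in-memory stand-in that mimics the Redis `SADD` and `SISMEMBER` commands. `ResumableUploadsImport` and the key name are also illustrative, not existing classes.

```ruby
require 'set'

# Hypothetical in-memory stand-in for a Redis set (SADD / SISMEMBER).
class FakeSetStore
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def sadd(key, member)
    @sets[key].add(member.to_s)
  end

  def sismember(key, member)
    @sets[key].include?(member.to_s)
  end
end

# Hypothetical helper: processes each upload once across reruns.
class ResumableUploadsImport
  def initialize(store:, key:)
    @store = store
    @key = key
  end

  # Yields every path that has not been recorded yet, then records it.
  def run(paths)
    paths.each do |path|
      next if @store.sismember(@key, path) # saved by a previous run

      yield path
      @store.sadd(@key, path)
    end
  end
end

store = FakeSetStore.new
import = ResumableUploadsImport.new(store: store, key: 'uploads')
saved = []

import.run(%w[a.png b.png]) { |path| saved << path }       # first, partial run
import.run(%w[a.png b.png c.png]) { |path| saved << path } # rerun over all uploads
saved # => ["a.png", "b.png", "c.png"]
```

The same shape would fit any pipeline whose items have a stable identifier, since the set only needs something unique per item.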
LfsObjectsPipeline
Similarly, we can store in Redis the list of LfsObjects that were already saved. Then, if the pipeline reruns, we skip the LfsObjects that are in the list.
Remaining pipelines
Apply a similar strategy.