Direct Transfer - Stages executed out of order

GitLab Direct Transfer begins a pipeline stage before the previous stage has completed, which leads to unexpected errors. For example, in project migrations, stage 5 is executed alongside stage 4, so MergeRequestsPipeline runs at the same time as CiPipelinesPipeline. This leads to errors like PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_merge_requests_on_target_project_id_and_iid" DETAIL: Key (target_project_id, iid)=(XXXX, XX) already exists.

As highlighted by @georgekoltsov in #419166 (comment 1556582446), CiPipelinesPipeline creates merge requests if they don't exist. So if a CI pipeline is imported before its merge request, the merge request already exists by the time MergeRequestsPipeline tries to import it, and a duplicate key error is raised.
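To make the failure mode concrete, here is a minimal sketch of the ordering problem. Everything in it is a hypothetical stand-in: FakeMergeRequestsTable, run_stages, and the key (1, 7) are illustrative only and are not GitLab's model or importer code. The only behaviour taken from this issue is that the CI pipeline import creates a missing merge request, while the merge request import does a plain insert.

```ruby
require "set"

# Hypothetical stand-in for the merge_requests table and its unique index
# on (target_project_id, iid). This is not GitLab's actual model code.
class FakeMergeRequestsTable
  class UniqueViolation < StandardError; end

  def initialize
    @keys = Set.new
  end

  # Plain insert, as assumed for the merge request import:
  # fails if the key is already present.
  def insert!(target_project_id, iid)
    key = [target_project_id, iid]
    raise UniqueViolation, "duplicate key #{key.inspect}" if @keys.include?(key)

    @keys.add(key)
  end

  # Find-or-create, as described for the CI pipeline import:
  # creates the merge request only if it does not exist yet.
  def find_or_create(target_project_id, iid)
    @keys.add([target_project_id, iid])
  end
end

# Runs the two (hypothetical) import steps in the given order and reports
# whether the unique constraint was hit.
def run_stages(order)
  table = FakeMergeRequestsTable.new
  order.each do |pipeline|
    case pipeline
    when :merge_requests then table.insert!(1, 7)
    when :ci_pipelines then table.find_or_create(1, 7)
    end
  end
  :ok
rescue FakeMergeRequestsTable::UniqueViolation
  :unique_violation
end

# Intended order: merge requests are imported before CI pipelines.
IN_ORDER = run_stages(%i[merge_requests ci_pipelines])
# Out of order: a CI pipeline creates the merge request first.
OUT_OF_ORDER = run_stages(%i[ci_pipelines merge_requests])
```

In this toy model only the out-of-order run hits the constraint, which matches the error seen in the migration.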

After analyzing the logs, I concluded that the problem lies in BulkImports::EntityWorker. This worker uses the deduplicate :until_executing strategy, which only deduplicates jobs until one of them starts executing. Once a job is running, another one can be enqueued and run concurrently with it.
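The difference between the two strategies can be illustrated with a toy lock model. This is a simplified sketch, not GitLab's deduplication implementation: ToyDedupQueue and second_enqueue_while_first_runs? are hypothetical names, and the model assumes the only difference is when the deduplication lock is released (when the job starts for :until_executing, when it finishes for :until_executed).

```ruby
# Toy model of a deduplication lock. Not GitLab's implementation.
class ToyDedupQueue
  def initialize(strategy)
    @strategy = strategy # :until_executing or :until_executed
    @locked = false
  end

  # Returns true if the job is accepted, false if it is dropped as a duplicate.
  def enqueue
    return false if @locked

    @locked = true
    true
  end

  # Under :until_executing the lock is released as soon as the job starts.
  def job_started
    @locked = false if @strategy == :until_executing
  end

  # Under :until_executed the lock is held until the job has finished.
  def job_finished
    @locked = false if @strategy == :until_executed
  end
end

# Is a second job accepted while the first one is still running?
def second_enqueue_while_first_runs?(strategy)
  queue = ToyDedupQueue.new(strategy)
  queue.enqueue     # first job is scheduled
  queue.job_started # first job begins executing and has not finished
  queue.enqueue     # a finishing pipeline requests another run
end
```

In this model, second_enqueue_while_first_runs?(:until_executing) accepts the second job, so two worker runs can overlap, while :until_executed drops it.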

Log screenshot and analysis

The log below is from a migration that raised the error PG::UniqueViolation: ERROR

Kibana - Internal only

Screenshot_2023-09-19_at_18.51.43

Proposed solution

The proposed solution is to stop enqueuing BulkImports::EntityWorker when a pipeline finishes and instead have BulkImports::EntityWorker re-enqueue itself.
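A rough sketch of the self re-enqueuing idea, under stated assumptions: ToyEntityWorker, Pipeline, simulate, the stage numbers, and OtherStage4Pipeline are all hypothetical, and the driver loop stands in for the worker scheduling its own next run. The point is only that a single chain of worker runs decides when to advance, so the next stage cannot start while the current one still has unfinished pipelines.

```ruby
# Hypothetical pipeline tracker record: name, stage number, status.
Pipeline = Struct.new(:name, :stage, :status)

# Toy worker: each run looks at the lowest unfinished stage, starts any
# pipelines in it that have not started, and asks to be run again.
class ToyEntityWorker
  attr_reader :log

  def initialize(pipelines)
    @pipelines = pipelines
    @log = [] # order in which pipelines were started
  end

  def perform
    current = @pipelines.reject { |p| p.status == :finished }.map(&:stage).min
    return :done if current.nil?

    @pipelines.select { |p| p.stage == current && p.status == :created }.each do |p|
      p.status = :started
      @log << p.name
    end
    :re_enqueue
  end
end

# Driver standing in for "the worker re-enqueues itself": between runs,
# one running pipeline is marked as finished.
def simulate(pipelines)
  worker = ToyEntityWorker.new(pipelines)
  while worker.perform == :re_enqueue
    running = pipelines.find { |p| p.status == :started }
    running.status = :finished if running
  end
  worker.log
end

START_ORDER = simulate([
  Pipeline.new("MergeRequestsPipeline", 4, :created),
  Pipeline.new("OtherStage4Pipeline", 4, :created), # hypothetical name
  Pipeline.new("CiPipelinesPipeline", 5, :created)
])
```

Here CiPipelinesPipeline is only started after both earlier-stage pipelines have finished, regardless of which pipeline finishes first.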

Previous proposed solution

We used to use the deduplicate :until_executed strategy; however, we switched to :until_executing in Update BulkImports::EntityWorker deduplication ... (!84204 - merged) to fix a separate problem.

I believe using deduplicate :until_executed, if_deduplicated: :reschedule_once would address both the stage execution order and the problem fixed by !84204 (merged).
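As a rough illustration of why this combination might cover both concerns, here is a toy model. It assumes that :reschedule_once means a duplicate arriving while a job is scheduled or running is dropped, but causes exactly one extra run after the current job finishes. ToyRescheduleOnceQueue is a hypothetical name and this is not GitLab's implementation.

```ruby
# Toy model of until_executed combined with reschedule-once behaviour.
class ToyRescheduleOnceQueue
  attr_reader :runs

  def initialize
    @busy = false       # a job is scheduled or running
    @reschedule = false # a duplicate was dropped while busy
    @runs = 0           # number of jobs actually scheduled
  end

  # Returns :scheduled or :deduplicated.
  def enqueue
    if @busy
      @reschedule = true
      return :deduplicated
    end

    @busy = true
    @runs += 1
    :scheduled
  end

  # When the job finishes, schedule one extra run if duplicates were dropped.
  def job_finished
    @busy = false
    return unless @reschedule

    @reschedule = false
    enqueue
  end
end
```

For example, if two pipelines finish while a worker run is in progress, both enqueue attempts are deduplicated, and a single follow-up run happens after the first one finishes. Runs never overlap, and the notification that a pipeline finished is not lost.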

Edited by Rodrigo Tomonari