Split BulkImports::Projects::Pipelines::ReferencesPipeline into multiple workers

During the test migration of Rails from Staging-Ref to Production, which migrated ~30K merge requests, BulkImports::Projects::Pipelines::ReferencesPipeline took ~3.5 hours to complete.

We should avoid long-running workers, because the longer a worker runs, the more likely it is to be interrupted before it finishes.

The worker takes this long because a single run processes references from Merge Requests, Merge Request Notes, Issues, and Issue Notes.

Proposed solutions

Option 1

Split ReferencesPipeline into 4 workers, each processing references from one record type: Merge Requests, Merge Request Notes, Issues, or Issue Notes.

Cons: Depending on the project's size, each new worker could still take a long time to finish.

Option 2

Update ReferencesPipeline to spawn other workers to process the references. ReferencesPipeline would no longer check whether each record has references or update them itself. Instead, the pipeline would spawn a separate worker for each merge request, note, or issue to process its references.

Cons: We would spawn thousands of workers, one per record.

Option 3

Similar to option 2, but instead of spawning one worker for each record, we would spawn one worker for a batch of records.
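To make the difference between Option 2 and Option 3 concrete, here is a minimal plain-Ruby sketch that counts jobs for both approaches. It assumes 30,000 records as a rough stand-in for the Rails migration, and a batch size of 100; both numbers and the worker name `ReferencesBatchWorker` are hypothetical, not values from the importer.

```ruby
# Hypothetical sketch: group record IDs so that one job covers a whole batch.
# BATCH_SIZE and ReferencesBatchWorker are assumptions for illustration only.
BATCH_SIZE = 100

# Builds one job description per batch of record IDs (the last batch may be smaller).
def batch_jobs(record_ids, batch_size: BATCH_SIZE)
  record_ids.each_slice(batch_size).map do |ids|
    { worker: "ReferencesBatchWorker", record_ids: ids }
  end
end

record_ids = (1..30_000).to_a
per_record_jobs = record_ids.size            # Option 2: one job per record
per_batch_jobs = batch_jobs(record_ids).size # Option 3: one job per batch

puts "Option 2: #{per_record_jobs} jobs" # => Option 2: 30000 jobs
puts "Option 3: #{per_batch_jobs} jobs"  # => Option 3: 300 jobs
```

The batch size is the tuning knob: larger batches mean fewer jobs, but each job runs longer, which brings back the interruption risk that motivated this issue.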

Option 4 (chosen option)

Similar to Option 3, but the new workers are regular ApplicationWorkers that do not block the migration; they can finish asynchronously.
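The control flow of Option 4 can be sketched in plain Ruby as follows. This is a hypothetical illustration, not the importer's code: an in-memory array stands in for the job backend, `perform_async` only mimics the shape of Sidekiq's method of the same name, and `ReferencesBatchWorker`, `BATCH_SIZE`, and the pipeline's `run` method are assumptions. The point it shows is that the pipeline returns as soon as the batches are enqueued, while the batch jobs run later.

```ruby
# Hypothetical sketch of Option 4. JOB_QUEUE stands in for a real job backend.
JOB_QUEUE = []

# Hypothetical stand-in for a regular ApplicationWorker that updates references.
class ReferencesBatchWorker
  # Enqueues the job instead of running it now.
  def self.perform_async(*args)
    JOB_QUEUE << [self, args]
  end

  # Real code would load each record and rewrite its references; here we only tag IDs.
  def perform(record_type, ids)
    ids.map { |id| "#{record_type}##{id} processed" }
  end
end

# Hypothetical pipeline: it only enqueues batches and does not wait for them.
class ReferencesPipeline
  BATCH_SIZE = 100 # assumed value

  def run(records_by_type)
    records_by_type.each do |record_type, ids|
      ids.each_slice(BATCH_SIZE) do |batch|
        ReferencesBatchWorker.perform_async(record_type, batch)
      end
    end
    :finished # the migration can move on immediately
  end
end

records = { "MergeRequest" => (1..250).to_a, "Issue" => (1..40).to_a }
status = ReferencesPipeline.new.run(records)
puts status         # => finished
puts JOB_QUEUE.size # => 4 (3 merge request batches + 1 issue batch)

# Later, independently of the migration, a job runner drains the queue.
processed = JOB_QUEUE.flat_map { |klass, args| klass.new.perform(*args) }
puts processed.size # => 290
```

Because nothing in `run` waits for the batch jobs, a slow or retried batch delays only its own references, not the rest of the migration.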

Edited Nov 02, 2023 by Madelein van Niekerk