Split BulkImports::Projects::Pipelines::ReferencesPipeline into multiple workers
During the test migration of Rails from Staging-Ref to Production, which migrated ~30K pull requests, BulkImports::Projects::Pipelines::ReferencesPipeline took ~3.5 hours to complete.
We should avoid long-running workers, because they are more likely to be interrupted mid-execution.
The worker takes that long to execute because it processes references from Merge Requests, Merge Request Notes, Issues, and Issue Notes.
Proposed solutions
Option 1
Split ReferencesPipeline into 4 workers, one each for references from Merge Requests, Merge Request Notes, Issues, and Issue Notes.
Cons: Depending on the project's size, each new worker could still take a long time to finish.
Option 2
Update ReferencesPipeline to spawn other workers to process the references. In summary, ReferencesPipeline would no longer be responsible for analyzing whether a record has references and updating them. Instead, the pipeline would spawn another worker to process the references for each merge request, note, or issue.
Cons: We would spawn thousands of workers.
Option 3
Similar to option 2, but instead of spawning one worker for each record, we would spawn one worker for a batch of records.
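To illustrate the difference between options 2 and 3, here is a minimal sketch in plain Ruby. Everything in it is hypothetical and used only for illustration: the FakeQueue class, the ReferenceBatchWorker name, the enqueue_reference_batches helper, and the batch size of 100 are assumptions, not names or values from the GitLab codebase. It only shows how grouping record IDs with each_slice reduces the number of spawned jobs.

```ruby
# Hypothetical sketch: none of these names come from the GitLab codebase.
BATCH_SIZE = 100 # assumed batch size, chosen only for illustration

# In-memory stand-in for a background-job queue.
class FakeQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def push(worker_name, ids)
    @jobs << { worker: worker_name, ids: ids }
  end
end

# Enqueue one job per batch of record IDs instead of one job per record.
# Returns the number of jobs currently in the queue.
def enqueue_reference_batches(record_ids, queue, batch_size: BATCH_SIZE)
  record_ids.each_slice(batch_size) do |batch|
    queue.push('ReferenceBatchWorker', batch)
  end
  queue.jobs.size
end

queue = FakeQueue.new
job_count = enqueue_reference_batches((1..30_000).to_a, queue)
puts "Enqueued #{job_count} jobs for 30000 records"
# prints "Enqueued 300 jobs for 30000 records"
```

With one worker per record (option 2), the same 30,000 records would produce 30,000 jobs. With the assumed batch size of 100, the count drops to 300.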
Option 4 (chosen option)
Similar to option 3, but the new workers are normal ApplicationWorkers that don't block the migration and can finish asynchronously.
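The non-blocking behaviour of option 4 could look roughly like the sketch below. It is plain Ruby with an in-memory array standing in for the job queue, so it is not the Sidekiq or ApplicationWorker API. The ReferenceBatchWorker class, the run_references_pipeline helper, the example hosts, and the simple host substitution used as the "reference update" are all assumptions made for illustration. The point is only that the pipeline schedules the batches and reports itself finished without waiting for them.

```ruby
# Hypothetical in-memory stand-in for a job queue (not the Sidekiq API).
JOB_QUEUE = []

# Hypothetical worker: rewrites references in one batch of records.
class ReferenceBatchWorker
  # Mimics an asynchronous enqueue: the work is stored, not executed.
  def self.perform_async(records, old_host, new_host)
    JOB_QUEUE << -> { new.perform(records, old_host, new_host) }
  end

  def perform(records, old_host, new_host)
    records.each do |record|
      next unless record[:body].include?(old_host) # skip records without references

      record[:body] = record[:body].gsub(old_host, new_host)
    end
  end
end

# The pipeline only schedules the batches and returns right away,
# so the migration is not blocked on reference processing.
def run_references_pipeline(records, old_host:, new_host:, batch_size: 2)
  records.each_slice(batch_size) do |batch|
    ReferenceBatchWorker.perform_async(batch, old_host, new_host)
  end
  :finished
end

records = [
  { body: 'See https://old.example.com/group/project/-/issues/1' },
  { body: 'No references here' },
  { body: 'Related: https://old.example.com/group/project/-/merge_requests/2' }
]

status = run_references_pipeline(records, old_host: 'old.example.com', new_host: 'new.example.com')
puts "Pipeline status: #{status}, pending jobs: #{JOB_QUEUE.size}"

# Later, independently of the migration, the queued jobs are executed.
JOB_QUEUE.shift.call until JOB_QUEUE.empty?
puts records.map { |r| r[:body] }
```

The trade-off in this design is that the migration can be reported as complete while some references are still being updated in the background.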