GitLab Migration - import 1 object per job instead of 1 job per entire relation
## Problem
We need to improve both the performance and the memory consumption of the BulkImports import & export workers.
See the epic for detailed information.
## Proposed solution
As a first step, GitLab Migration should import one object per job instead of running one job per entire relation. This significantly reduces the duration of each job, while increasing the number of jobs required. That trade-off is acceptable, because the overall aim for background processing is smaller, more frequent jobs.
## Technical details
- Instead of processing the whole relation, PipelineWorker now only downloads and decompresses the exported relation from the source (e.g. labels) and, for each individual object in the NDJSON file, enqueues a new worker to process it
- The new worker performs the same procedure as PipelineWorker did previously: transform/sanitize the object, convert it to an ActiveRecord model, and save it
- After enqueuing many jobs to process individual objects, we still need a way to keep track of when all of the objects have been processed. Similar to the GitHub Importer, we could use `Gitlab::JobWaiter` for this, without having to track each object's import state in the database.
- We're still likely to process 'binary file' relations the same way as before, at least until we implement the ability to read individual files from a zip archive directly from object storage
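The per-object flow described above can be sketched roughly as follows. This is a simplified illustration, not the actual GitLab code: the class name `RelationObjectWorker` and the helper `enqueue_relation_objects` are hypothetical, and the worker here just records its arguments in memory where the real one would be a Sidekiq worker.

```ruby
require 'json'

# Hypothetical per-object worker (assumed name, not a real GitLab class).
# In production this would be a Sidekiq worker; here we record enqueued
# arguments in memory to illustrate the fan-out.
class RelationObjectWorker
  @enqueued = []

  class << self
    attr_reader :enqueued

    # Stand-in for Sidekiq's perform_async: one call per exported object.
    def perform_async(relation, object_attributes)
      enqueued << [relation, object_attributes]
    end
  end
end

# Instead of importing the whole relation in a single job, walk the
# decompressed NDJSON export and enqueue one job per object.
def enqueue_relation_objects(relation, ndjson)
  ndjson.each_line do |line|
    line = line.strip
    next if line.empty?

    RelationObjectWorker.perform_async(relation, JSON.parse(line))
  end
end

ndjson = <<~NDJSON
  {"title":"bug","color":"#ff0000"}
  {"title":"feature","color":"#00ff00"}
NDJSON

enqueue_relation_objects('labels', ndjson)
RelationObjectWorker.enqueued.size # => 2
```

Each enqueued job then independently transforms and persists its single object, so a failure or retry affects one object rather than the whole relation.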
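To show why `Gitlab::JobWaiter`-style tracking avoids per-object state in the database, here is a simplified in-memory analogue (the real `Gitlab::JobWaiter` coordinates through Redis and blocks with a timeout; `SimpleJobWaiter` below is an assumption-laden sketch of the idea only):

```ruby
# Simplified, in-memory analogue of Gitlab::JobWaiter. The real class
# uses a Redis key so workers on different machines can notify it; the
# counter-decrement idea is the same: no per-object rows in the database.
class SimpleJobWaiter
  attr_reader :key, :jobs_remaining

  def initialize(jobs_remaining, key)
    @jobs_remaining = jobs_remaining
    @key = key
  end

  # Each per-object worker calls notify with its job id once it finishes.
  def notify(_jid)
    @jobs_remaining -= 1
  end

  # The enqueuing side polls (or, in the real class, blocks) on this.
  def finished?
    @jobs_remaining <= 0
  end
end

# Enqueue 3 object jobs, then wait for 3 notifications.
waiter = SimpleJobWaiter.new(3, 'bulk_imports:labels:123')
3.times { |i| waiter.notify("jid-#{i}") }
waiter.finished? # => true
```

The pipeline only needs to store the waiter key and the expected count, which is why individual objects' import states never have to be written to the database.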