[Discussion] Support data_consistency per database in Sidekiq workers

Referencing #3771 (comment 2074733142)

Motivation

From #3771 (comment 2074687023)

I was curious about this too. I noticed that PipelineProcessWorker and Ci::InitialPipelineProcessWorker call Ci::PipelineProcessing::AtomicProcessingService, which loads project and user data from the main db as part of the Ci::Pipeline model. Ci::BuildFinishedWorker also loads project and user through the Ci::Build model.

Having the ability to define data_consistency per database would let devs fine-tune resource-hungry Sidekiq jobs and reduce their resource consumption by reading from the replica for specific databases.

Considerations

We would need to work out how delays and retries behave when different data consistencies are defined. For example, ci: :always, main: :delayed could mean that the worker gets retried if the main db replica has not caught up, even if ci is fine. The new behaviour for each permutation would need to be defined and made clear.
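To make the permutations concrete, here is a minimal sketch of one possible resolution rule. It assumes the per-database semantics mirror what I understand the current single-value behaviour to be: :always reads from the primary, :sticky falls back to the primary when the replica is behind, and :delayed retries the job once before falling back to the primary. The module name, method name and the retried: flag are hypothetical and do not exist in GitLab.

```ruby
# Hypothetical helper, not part of GitLab. It only illustrates one way the
# per-database permutations could be resolved into a single job decision.
module PerDatabaseConsistency
  # consistencies:     e.g. { ci: :always, main: :delayed }
  # replica_caught_up: e.g. { ci: true, main: false }
  # retried:           whether the job has already been delayed once (assumed)
  #
  # Returns [job_action, per_database_targets].
  def self.resolve(consistencies, replica_caught_up, retried: false)
    targets = consistencies.to_h do |db, consistency|
      caught_up = replica_caught_up.fetch(db, false)

      target =
        case consistency
        when :always then :primary
        when :sticky then caught_up ? :replica : :primary
        when :delayed
          if caught_up
            :replica
          elsif retried
            :primary
          else
            :retry
          end
        else
          raise ArgumentError, "unknown data consistency: #{consistency.inspect}"
        end

      [db, target]
    end

    # A single lagging :delayed database is enough to delay the whole job.
    action = targets.value?(:retry) ? :retry : :run
    [action, targets]
  end
end

action, targets = PerDatabaseConsistency.resolve(
  { ci: :always, main: :delayed },
  { ci: true, main: false }
)
puts action.inspect
puts targets.inspect
```

Under this rule the example from the paragraph above is retried because of main alone, which is exactly the kind of behaviour that would need to be agreed on and documented.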

Self-managed users on a single-db setup should not be affected.

Key areas of code affected

Gitlab::Database::LoadBalancing::Session now needs to store a hash of db_config_name => use_primary? rather than a single boolean.
data_consistency in WorkerAttributes needs to be updated to handle a hash 🤔

data_consistency :sticky

to

data_consistency({ ci: :always, main: :sticky })
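As a rough illustration of the WorkerAttributes change, the sketch below accepts either the existing single symbol or a per-database hash and normalises both into a hash. The module name, the data_consistency_for reader, and the fixed list of databases (main and ci) are assumptions made for the example; the real WorkerAttributes method also takes other options that are left out here. Note that the hash needs parentheses, because a bare { after a method name is parsed by Ruby as a block.

```ruby
# Hypothetical sketch, not GitLab's WorkerAttributes implementation.
module DataConsistencySketch
  VALID_DATA_CONSISTENCIES = %i[always sticky delayed].freeze
  DATABASES = %i[main ci].freeze # assumed set of database names
  DEFAULT_DATA_CONSISTENCY = :always

  # Accepts a single symbol (current style) or a hash keyed by database.
  def data_consistency(value)
    @data_consistency_per_database = normalize(value)
  end

  def data_consistency_for(database)
    (@data_consistency_per_database || {}).fetch(database, DEFAULT_DATA_CONSISTENCY)
  end

  private

  def normalize(value)
    # A single symbol applies to every database, so existing workers keep working.
    per_database = value.is_a?(Hash) ? value : DATABASES.to_h { |db| [db, value] }

    unknown = per_database.keys - DATABASES
    raise ArgumentError, "unknown databases: #{unknown.inspect}" unless unknown.empty?

    invalid = per_database.values - VALID_DATA_CONSISTENCIES
    raise ArgumentError, "invalid data consistency: #{invalid.inspect}" unless invalid.empty?

    # Databases that are not mentioned fall back to the default.
    DATABASES.to_h { |db| [db, per_database.fetch(db, DEFAULT_DATA_CONSISTENCY)] }
  end
end

class ExampleWorker
  extend DataConsistencySketch

  data_consistency({ ci: :always, main: :sticky })
end

class LegacyWorker
  extend DataConsistencySketch

  data_consistency :sticky
end
```

Normalising to a hash up front means the rest of the middleware would only ever deal with one shape, whichever style the worker used.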

Seeing how this affects sessions, we ought to roll it out in stages behind a feature flag.
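For the Session change listed under key areas, a minimal sketch of the db_config_name => use_primary? idea might look like the following. The class is a standalone stand-in, not the real Gitlab::Database::LoadBalancing::Session, and the optional argument on use_primary! is an assumption intended to keep today's "stick everything" behaviour available for single-db setups and existing callers.

```ruby
# Hypothetical stand-in for the session object; names and signatures are
# illustrative only.
class PerDatabaseSession
  def initialize
    @use_primary = {}               # db_config_name => true
    @use_primary_everywhere = false # mirrors the current single boolean
  end

  # Without an argument, every database sticks to the primary (current
  # behaviour). With an argument, only that database does.
  def use_primary!(db_config_name = nil)
    if db_config_name.nil?
      @use_primary_everywhere = true
    else
      @use_primary[db_config_name.to_sym] = true
    end
  end

  def use_primary?(db_config_name)
    @use_primary_everywhere || @use_primary.fetch(db_config_name.to_sym, false)
  end
end

session = PerDatabaseSession.new
session.use_primary!(:main)
puts session.use_primary?(:main)
puts session.use_primary?(:ci)
```

Keeping the global flag alongside the hash is one way to let a feature flag switch between the old and new behaviour while the rollout is in progress.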

Edited Sep 16, 2024 by Sylvester Chin