[Discussion] Support data_consistency per database in Sidekiq workers
Referencing #3771 (comment 2074733142)
Motivation
From #3771 (comment 2074687023)
I was curious about this too. I noticed that PipelineProcessWorker and Ci::InitialPipelineProcessWorker call Ci::PipelineProcessing::AtomicProcessingService, which loads project and user data from the main db as part of the Ci::Pipeline model. Ci::BuildFinishedWorker also loads project and user through the Ci::Build model.
Having the ability to define data_consistency per database would enable developers to fine-tune resource-hungry Sidekiq jobs and optimise their resource consumption by reading from the replica for specific databases.
Considerations
We would need to work out how delays and retries behave when different data consistencies are defined for different databases. For example, ci: :always, main: :delayed could mean that the worker gets retried if the main db replica has not caught up, even if ci is fine. The behaviour for each permutation would need to be defined and clearly documented.
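To make the permutation question concrete, here is a rough sketch of one possible resolution rule. It assumes the current single-value semantics carry over per database (:always uses the primary, :sticky falls back to the primary when the replica lags, :delayed retries once before falling back) and that a single lagging :delayed database delays the whole job. The method name resolve_plan and its arguments are hypothetical, not existing GitLab code.

```ruby
# Hypothetical sketch, not GitLab code.
# consistency:       { db_name => :always | :sticky | :delayed }
# replica_caught_up: { db_name => true/false } for the job's recorded write location
# retried:           whether the job has already been delayed once
def resolve_plan(consistency, replica_caught_up, retried: false)
  plan = consistency.to_h do |db, level|
    caught_up = replica_caught_up.fetch(db, false)

    action =
      case level
      when :always
        :primary
      when :sticky
        caught_up ? :replica : :primary
      when :delayed
        if caught_up
          :replica
        elsif retried
          :primary # already delayed once, so stop waiting
        else
          :retry
        end
      else
        raise ArgumentError, "unknown data consistency: #{level.inspect}"
      end

    [db, action]
  end

  # Assumed rule: one lagging :delayed database delays the whole job.
  return { retry: true, databases: {} } if plan.value?(:retry)

  { retry: false, databases: plan }
end

# ci is fine, but main lags: the job is retried as a whole.
resolve_plan({ ci: :always, main: :delayed }, { ci: true, main: false })
```

Other rules are possible (for example, only delaying when every database lags), which is exactly what would need to be decided.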
Self-managed users on a single-database setup should not be affected.
Key areas of code affected
- Gitlab::Database::LoadBalancing::Session now needs to store a hash of db_config_name => use_primary? rather than a single boolean.
- data_consistency in WorkerAttributes needs to be updated to handle a hash 🤔
data_consistency :sticky
to
data_consistency ci: :always, main: :sticky
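A minimal sketch of what the two changes above could look like, assuming two configured databases (main and ci) and a default of :always for any database not mentioned. SessionSketch, WorkerAttributesSketch and data_consistency_per_db are hypothetical stand-ins, not the real Gitlab::Database::LoadBalancing::Session or WorkerAttributes.

```ruby
# Hypothetical stand-in for the session: tracks use_primary? per database.
class SessionSketch
  def initialize
    # db_config_name => use_primary?, defaulting to false
    @use_primary = Hash.new(false)
  end

  def use_primary!(db_config_name)
    @use_primary[db_config_name.to_sym] = true
  end

  def use_primary?(db_config_name)
    @use_primary[db_config_name.to_sym]
  end
end

# Hypothetical stand-in for the worker attribute DSL.
module WorkerAttributesSketch
  VALID_LEVELS = %i[always sticky delayed].freeze
  DATABASES = %i[main ci].freeze # assumed set of configured databases

  # Accepts the existing single-symbol form or a per-database hash.
  def data_consistency(value)
    per_db = value.is_a?(Hash) ? value.transform_keys(&:to_sym) : DATABASES.to_h { |db| [db, value] }

    unknown = per_db.keys - DATABASES
    raise ArgumentError, "unknown databases: #{unknown.inspect}" unless unknown.empty?

    invalid = per_db.values - VALID_LEVELS
    raise ArgumentError, "invalid data consistency: #{invalid.inspect}" unless invalid.empty?

    # Assumption: databases left out of the hash default to :always.
    @data_consistency = DATABASES.to_h { |db| [db, per_db.fetch(db, :always)] }
  end

  def data_consistency_per_db
    @data_consistency || DATABASES.to_h { |db| [db, :always] }
  end
end

class ExistingWorker
  extend WorkerAttributesSketch
  data_consistency :sticky # existing form keeps working
end

class PerDatabaseWorker
  extend WorkerAttributesSketch
  data_consistency ci: :always, main: :sticky
end
```

Normalising the single-symbol form into a hash up front would let the middleware deal with one shape only, and workers that never declare a hash behave as they do today.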
Since this affects sessions, we ought to roll it out in stages behind a feature flag.
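One way the flag could gate this, sketched under assumptions: while the flag is off, a worker that declares a hash is treated as if it had declared a single value, picking the most conservative level (the one most likely to use the primary). The method name, the flag_enabled argument (standing in for a real feature flag check) and the collapsing rule are all hypothetical.

```ruby
# Most conservative first: :always never reads from a replica.
CONSERVATIVE_ORDER = %i[always sticky delayed].freeze

# Hypothetical helper: returns what the middleware should act on.
def effective_data_consistency(declared, flag_enabled:)
  # The existing single-value form is unaffected by the flag.
  return declared unless declared.is_a?(Hash)
  return declared if flag_enabled

  # Flag off: collapse to the most conservative declared level.
  declared.values.min_by do |level|
    CONSERVATIVE_ORDER.index(level) || raise(ArgumentError, "unknown data consistency: #{level.inspect}")
  end
end

effective_data_consistency({ ci: :always, main: :sticky }, flag_enabled: false)
```

With this rule, turning the flag off can only move reads back to the primary, so it should be a safe rollback path.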