Make incremental rollout possible for `UserRefreshOverUserRangeWorker` so as to read data from replica [RUN ALL RSPEC] [RUN AS-IF-FOSS]

Manoj M J requested to merge mj-replica-rollout-new-plan into master

What does this MR do?

This MR makes changes as described in #327092 (comment 574534409).

Details

This change is needed because UserRefreshOverUserRangeWorker is invoked only via cron jobs, and only on the 1st and 15th of each month.

To make a Sidekiq job read data from the replica, a location must be written to the job details at the time the job is enqueued. Source.

If this data isn't written at enqueue time, the job does not read from the replica, even if the feature flag that controls data consistency is ON when the job is dequeued for execution. It continues to read from the primary.
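
The enqueue-time dependency can be sketched in plain Ruby. This is a simplified model, not GitLab's load-balancing middleware: `FLAGS`, `enqueue`, `execute`, and the `'location'` key are hypothetical names chosen for illustration.

```ruby
# Simplified model of the behavior described above. All names here are
# hypothetical stand-ins, not GitLab's load-balancing middleware.

FLAGS = { data_consistency: false }

# Enqueue side: a location is recorded in the job details only when the
# data-consistency flag is ON at this moment.
def enqueue(worker_name)
  job = { 'class' => worker_name }
  job['location'] = 'lsn-at-enqueue' if FLAGS[:data_consistency]
  job
end

# Execution side: a job without a recorded location reads from the primary,
# regardless of the flag's state at execution time.
def execute(job)
  return :primary unless job.key?('location')

  FLAGS[:data_consistency] ? :replica : :primary
end

# Flag OFF at enqueue, ON at execution: the job still reads from the primary.
early_job = enqueue('UserRefreshOverUserRangeWorker')
FLAGS[:data_consistency] = true
early_result = execute(early_job)

# Flag ON at both points: the job can read from the replica.
late_result = execute(enqueue('UserRefreshOverUserRangeWorker'))
```

In this model `early_result` is `:primary` and `late_result` is `:replica`, which is why a flag flipped between two cron runs has no effect on jobs that were already enqueued.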

For jobs enqueued via a cron job, this makes it difficult to incrementally roll out the change to read from the replica, because these jobs are enqueued only at specific times, such as the 1st and 15th of every month.

So the normal rollout process does not work here. Hence we are making this change, which uses 2 different feature flags in UserRefreshOverUserRangeWorker:

  • delayed_consistency_for_user_refresh_over_range_worker controls the data consistency.
  • periodic_project_authorization_update_via_replica controls where the data is read from.

To facilitate an incremental rollout for this particular worker, we will:

  • Set delayed_consistency_for_user_refresh_over_range_worker to always ON, i.e., 100% rollout from the beginning.
  • This writes the location to the job details every time these jobs are enqueued, which solves the problem described above.
  • When the job is picked up for execution, perform the incremental rollout using the periodic_project_authorization_update_via_replica flag.
  • When this flag is ON, the job reads the data from the replica. This is the default, because data_consistency :delayed, feature_flag: :delayed_consistency_for_user_refresh_over_range_worker is always enabled once that flag is set to 100%.
  • When this flag is OFF, the job explicitly switches to reading from the primary database using ::Gitlab::Database::LoadBalancing::Session.current.use_primary!.
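
The execution-time branch in the list above can be sketched as follows. This is a minimal, self-contained illustration rather than the code in this MR: `Feature` and `LoadBalancingSession` are stand-ins written for the example, and `UserRefreshOverUserRangeWorkerSketch` with its `perform(session)` signature is hypothetical. Only the two flag names and the `use_primary!` call come from the description.

```ruby
# Stand-in for a feature-flag store (assumption: a simple in-memory hash).
module Feature
  @enabled = {}

  def self.enable(name)
    @enabled[name] = true
  end

  def self.disable(name)
    @enabled[name] = false
  end

  def self.enabled?(name)
    @enabled.fetch(name, false)
  end
end

# Stand-in for ::Gitlab::Database::LoadBalancing::Session.current.
class LoadBalancingSession
  attr_reader :source

  def initialize
    # Default for a :delayed worker whose job carries a location.
    @source = :replica
  end

  def use_primary!
    @source = :primary
  end
end

class UserRefreshOverUserRangeWorkerSketch
  def perform(session)
    # Flag OFF: explicitly fall back to the primary database.
    unless Feature.enabled?(:periodic_project_authorization_update_via_replica)
      session.use_primary!
    end

    # The project authorization refresh would run here.
    session.source
  end
end

# The data-consistency flag stays fully enabled throughout the rollout.
Feature.enable(:delayed_consistency_for_user_refresh_over_range_worker)

Feature.disable(:periodic_project_authorization_update_via_replica)
source_when_off = UserRefreshOverUserRangeWorkerSketch.new.perform(LoadBalancingSession.new)

Feature.enable(:periodic_project_authorization_update_via_replica)
source_when_on = UserRefreshOverUserRangeWorkerSketch.new.perform(LoadBalancingSession.new)
```

Because the fallback happens inside the job at execution time, the second flag takes effect for jobs that are already enqueued, which is what the cron schedule otherwise prevents.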

Thus, we get to incrementally roll out reading from the replica by raising the periodic_project_authorization_update_via_replica flag from 10% to 100% in stages, while keeping the delayed_consistency_for_user_refresh_over_range_worker flag fixed at 100%.
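
As a rough illustration of the staged rollout, the simulation below counts how many of 1,000 jobs would read from the replica at each stage. It assumes a deterministic percentage gate (`replica_enabled?`, hypothetical) so that the numbers are reproducible, and the intermediate 50% stage is an assumption as well. A real percentage rollout would not bucket jobs this way.

```ruby
# Hypothetical deterministic gate: job N is in the rollout when N % 100 falls
# below the configured percentage.
def replica_enabled?(job_number, percentage)
  job_number % 100 < percentage
end

# Every job has a location written (the data-consistency flag is at 100%),
# so the second flag alone decides the data source.
def data_source(job_number, percentage)
  replica_enabled?(job_number, percentage) ? :replica : :primary
end

# Assumed stages; the description only states "10% to 100% in stages".
STAGES = [10, 50, 100].freeze

replica_reads_by_stage = STAGES.to_h do |percentage|
  reads = (0...1000).count { |job_number| data_source(job_number, percentage) == :replica }
  [percentage, reads]
end
```

Under these assumptions, 100, 500, and 1,000 of the 1,000 jobs read from the replica at the 10%, 50%, and 100% stages respectively.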

Edited by Manoj M J
