
Reverse the feature flag convention for deferring Sidekiq jobs

Context

During an incident, we enabled the feature flag to defer jobs (Slack). We then decided to progressively roll out running the jobs normally:

/chatops run feature set defer_sidekiq_jobs_Ci::CancelRedundantPipelinesWorker --random 90 --ignore-feature-flag-consistency-check
/chatops run feature set defer_sidekiq_jobs_Ci::CancelRedundantPipelinesWorker --random 50 --ignore-feature-flag-consistency-check

However, we couldn't switch a fully enabled flag to a percentage-of-time value. The error log:

/app/vendor/bundle/ruby/2.6.0/gems/gitlab-4.19.0/lib/gitlab/request.rb:71:in `validate': Server responded with code 400, message: 400 Bad request - Cannot enable percentage of time for a fully-enabled flag. Request URI: https://gitlab.com/api/v4/features/defer_sidekiq_jobs_Ci::CancelRedundantPipelinesWorker (Gitlab::Error::BadRequest)

As a workaround, @reprazent fully disabled the flag and then immediately set it to 90%, which means deferring 90% of the jobs and running only 10%.

Proposal

Reverse the convention so that an enabled flag means we run the job and a disabled flag means we defer it. This also makes percentages intuitive: setting 10% would mean that we are actually running 10% of the jobs. We should rename the feature flag along the lines of run_sidekiq_jobs_SomeWorker.

https://gitlab.com/gitlab-org/gitlab/-/blob/4f603aa4014e05ff3127d07ccb58b295cab7bf37/lib/gitlab/sidekiq_middleware/defer_jobs.rb#L47-52

      def defer_job_by_ff?(worker_class)
        Feature.disabled?(
          :"run_sidekiq_jobs_#{worker_class.name}",
          type: :worker,
          default_enabled_if_undefined: true
        )
      end
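
To make the difference between the two conventions concrete, here is a rough, self-contained sketch. It does not use GitLab's real Feature class: FakeFeature, defer_by_defer_flag? and defer_by_run_flag? are hypothetical names made up for this illustration, and the percentage-of-time behaviour is only approximated with a seeded random number generator. It assumes the current flag defaults to disabled when undefined, and it shows that 10% on a defer_ flag runs roughly 90% of jobs, while 10% on a run_ flag runs roughly 10%.

```ruby
# Hypothetical stand-in for a feature flag store. This is NOT GitLab's Feature API.
class FakeFeature
  # percentages: flag name (Symbol) => percentage of time the flag is enabled (0..100)
  def initialize(percentages, rng = Random.new(42))
    @percentages = percentages
    @rng = rng
  end

  # Undefined flags fall back to `default`, loosely mimicking default_enabled_if_undefined.
  def enabled?(name, default:)
    return default unless @percentages.key?(name)

    @rng.rand(100) < @percentages[name]
  end

  def disabled?(name, default:)
    !enabled?(name, default: default)
  end
end

# Current convention (assumed): flag enabled => job is deferred.
def defer_by_defer_flag?(feature, worker_name)
  feature.enabled?(:"defer_sidekiq_jobs_#{worker_name}", default: false)
end

# Proposed convention: flag enabled => job runs, so we defer only when it is disabled.
def defer_by_run_flag?(feature, worker_name)
  feature.disabled?(:"run_sidekiq_jobs_#{worker_name}", default: true)
end

worker = "Ci::CancelRedundantPipelinesWorker"
trials = 10_000

old_flags = FakeFeature.new({ :"defer_sidekiq_jobs_#{worker}" => 10 })
new_flags = FakeFeature.new({ :"run_sidekiq_jobs_#{worker}" => 10 })

# Count how many simulated jobs were NOT deferred under each convention.
ran_old = trials.times.count { !defer_by_defer_flag?(old_flags, worker) }
ran_new = trials.times.count { !defer_by_run_flag?(new_flags, worker) }

puts format("defer flag at 10%%: %.1f%% of jobs ran", 100.0 * ran_old / trials)
puts format("run flag at 10%%:   %.1f%% of jobs ran", 100.0 * ran_new / trials)
```

With the reversed convention, a worker whose flag has never been set still runs by default, and ramping the percentage up from a low value ramps up the share of jobs that run, which matches how a progressive rollout is usually read.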