Evaluate timed incremental rollout on hosting servers

We're going to ship the delayed job feature (https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/21767) for AutoDevOps timed incremental rollout (https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/22023).

We're going to verify that it works on our hosting servers when AutoDevOps is in timed incremental rollout mode.

Evaluation plan

dev.gitlab.org

  • Enabled term: 8th, Oct. ~
  • Evaluation date: 8th, Oct.
  • Wait for dev.gitlab.org daily sync
  • Create a sample project with the new AutoDevOps deployment strategy to make sure it's fully functional.
  • Check health (metrics, logs and crash reports)

staging.gitlab.com

  • Enabled term: 10th?, Oct. ~ (It depends on RM's plan)
  • Evaluation date: 10th, Oct.
  • Wait until the RC with the new code has been deployed
  • Create a sample project with the new AutoDevOps deployment strategy to make sure it's fully functional.
  • Check health (metrics, logs and crash reports)

gitlab.com

  • Enabled term: 10th?, Oct. ~ (It depends on RM's plan)
  • Evaluation date: 10th, Oct. ~ 10th, Nov.
  • Wait until the RC with the new code has been deployed
  • Create a sample project with the new AutoDevOps deployment strategy to make sure it's fully functional.
  • Check health (metrics, logs and crash reports)

NOTE:

  • This feature is behind the feature flag ci_enable_scheduled_build, but it's enabled by default (See https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/21767#note_106332776)
  • This feature doesn't run unless users manually change their deployment strategy in AutoDevOps. Presumably, after we publish the 11.4 release post on 22nd, Oct., users will start trying it and usage of the new Sidekiq workers will gradually increase, so that is the period when we should keep an eye on the servers' health.

Check health (metrics, logs and crash reports)

  • Does the new worker Ci::BuildScheduleWorker run properly? It uses the pipeline_processing namespace (priority: 5, the highest).
  • Does the new worker Ci::BuildScheduleWorker put pressure on other workers in the same namespace?
  • Are there any stale delayed jobs? Ci::Build.stale_schedule.count should be zero.
  • Are there any crash reports related to this feature on Sentry?
  • Does StuckCIJobWorker use an Index Scan properly? (Ref: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/21767#note_106811708) (cc @abrandl)
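
The stale-job check in the list above can be sketched in plain Ruby. This is only an illustration of the idea, not GitLab's implementation: the DelayedJob struct, the stale_jobs helper, and the one-hour grace period are all assumptions made for this example. On a real instance the check remains Ci::Build.stale_schedule.count in the Rails console.

```ruby
# Hypothetical stand-in for a delayed build; this is NOT GitLab's Ci::Build model.
DelayedJob = Struct.new(:id, :status, :scheduled_at)

# Assumed grace period: a job still in the scheduled state this long
# after its due time is treated as stale.
STALE_AFTER_SECONDS = 60 * 60

# Returns the jobs that are still scheduled although their due time
# plus the grace period has already passed.
def stale_jobs(jobs, now: Time.now)
  jobs.select do |job|
    job.status == :scheduled &&
      !job.scheduled_at.nil? &&
      job.scheduled_at < now - STALE_AFTER_SECONDS
  end
end

now = Time.utc(2018, 10, 8, 12, 0, 0)
jobs = [
  DelayedJob.new(1, :scheduled, now + 300),      # due in 5 minutes: healthy
  DelayedJob.new(2, :scheduled, now - 2 * 3600), # due 2 hours ago, never ran: stale
  DelayedJob.new(3, :success,   now - 2 * 3600)  # already executed: not stale
]

stale = stale_jobs(jobs, now: now)
puts "stale delayed jobs: #{stale.map(&:id).inspect}"
```

A healthy deployment would report an empty list here, which corresponds to the "should be zero" expectation above.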

Feature flag

The feature flag's name is ci_enable_scheduled_build. The new AutoDevOps deployment strategy, timed incremental rollout, is based on the delayed job feature. By disabling ci_enable_scheduled_build, we can effectively revert timed incremental rollout to manual incremental rollout (this also stops the creation of new delayed jobs).

Feature.enabled?('ci_enable_scheduled_build') # Check if it's enabled
Feature.enable('ci_enable_scheduled_build') # Enable the feature
Feature.disable('ci_enable_scheduled_build') # Disable the feature
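
The fallback behavior described in this section can be illustrated with a small stand-alone sketch. The FlagStore class and the rollout_mode helper are hypothetical and exist only for this example; they do not reproduce GitLab's Feature class or the AutoDevOps template logic. The decision rule (timed rollout degrades to manual rollout when the flag is off) is an assumption based on the description above.

```ruby
# Hypothetical in-memory flag store, used only for illustration.
# It is NOT GitLab's Feature class.
class FlagStore
  def initialize(defaults = {})
    @flags = defaults.dup
  end

  def enabled?(name)
    @flags.fetch(name.to_s, false)
  end

  def enable(name)
    @flags[name.to_s] = true
  end

  def disable(name)
    @flags[name.to_s] = false
  end
end

# Assumed rule: timed incremental rollout needs delayed jobs, so without
# the flag it falls back to manual incremental rollout. Other strategies
# are returned unchanged.
def rollout_mode(flags, requested)
  return requested unless requested == :timed_incremental

  flags.enabled?('ci_enable_scheduled_build') ? :timed_incremental : :manual_incremental
end

flags = FlagStore.new('ci_enable_scheduled_build' => true) # enabled by default
puts rollout_mode(flags, :timed_incremental)

flags.disable('ci_enable_scheduled_build')
puts rollout_mode(flags, :timed_incremental)
```

With the flag enabled the sketch keeps the requested timed strategy; after disabling it, the same request resolves to the manual strategy.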

/cc @nolith @winh @jlenny @erushton

Edited by Shinya Maeda