[Feature flag] Rollout of `ci_runner_limits`

Feature

This feature uses the :ci_runner_limits feature flag!

Owners

Stakeholders

The Rollout Plan

  • Roll out on GitLab.com for a certain period (1 week)
  • Roll out the feature for everyone as soon as it's ready

Beta Groups/Projects:

  • gitlab-org/gitlab-com groups

Expectations

What are we expecting to happen?

If a runner requests to register against a group or project and the plan limit has been hit, the runner will receive an HTTP 400 error.
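
The expected behaviour can be sketched roughly as follows. This is only an illustration, not the GitLab implementation: `Scope`, `register_runner`, the `plan_limit` field, and the 201 success status are assumptions made for the example.

```python
from dataclasses import dataclass

HTTP_CREATED = 201      # assumed success status for a registration
HTTP_BAD_REQUEST = 400  # status named in the expectation above


@dataclass
class Scope:
    """A group or project that runners register against (hypothetical model)."""
    runner_count: int
    plan_limit: int  # hypothetical maximum number of runners for the plan


def register_runner(scope: Scope, flag_enabled: bool) -> int:
    """Return the HTTP status a registration request would receive."""
    # The limit is only enforced while ci_runner_limits is enabled.
    if flag_enabled and scope.runner_count >= scope.plan_limit:
        return HTTP_BAD_REQUEST
    scope.runner_count += 1
    return HTTP_CREATED
```

With the flag disabled, the same request succeeds regardless of the count, which is why turning the flag off is a complete rollback.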

What might happen if this goes wrong?

If this goes wrong, we should turn off the feature flag altogether.

What can we monitor to detect problems with this?

Dashboard: A slight increase in error rates here.

Kibana: gitlab-org logs:

| Description | Link |
| --- | --- |
| 400s | https://log.gprd.gitlab.net/goto/9d53360c397173238bfdd51436c828ce |
| 400s (runner-specific) | https://log.gprd.gitlab.net/goto/1b0f6222c65d62a0a2992703762cb1ad |
| Non-400s | https://log.gprd.gitlab.net/goto/6ae7bf53879cd5344003a1eb3fb970f2 |
| Non-400s (runner-specific) | https://log.gprd.gitlab.net/goto/49ee19bd7cccc736be68a758131fae46 |
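
One way to judge whether the increase in 400 responses is "slight" is to compare the share of 400s on runner requests before and after the flag is enabled. The sketch below assumes log records exported as dictionaries with `status` and `path` keys; those field names and the path prefix used as a filter are assumptions for illustration, not the schema behind the saved searches.

```python
def is_runner_request(record: dict) -> bool:
    # Hypothetical filter: treat any path under /api/v4/runners as runner traffic.
    return record["path"].startswith("/api/v4/runners")


def bad_request_rate(records) -> float:
    """Fraction of runner requests answered with HTTP 400 (0.0 if there are none)."""
    runner_records = [r for r in records if is_runner_request(r)]
    if not runner_records:
        return 0.0
    hits = sum(1 for r in runner_records if r["status"] == 400)
    return hits / len(runner_records)
```

Running `bad_request_rate` on a sample taken before the rollout and on one taken after each increment gives two numbers that can be posted on this issue alongside the screenshots.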

Rollout Timeline

Initial Rollout

Preparation Phase

  • Enable on staging (/chatops run feature set ci_runner_limits true --staging)

  • Test on staging

  • Ensure that documentation has been updated (More info)

  • Announce on the issue an estimated time this will be enabled on GitLab.com

  • Check if the feature flag change needs to be accompanied by a change management issue. Cross-link the issue here if it does.

Partial Rollout Phase

  • Enable on GitLab.com for individual groups/projects listed above and verify behaviour (/chatops run feature set --project=gitlab-org/gitlab ci_runner_limits true)

  • Verify behaviour (See Beta Groups) and add details with screenshots as a comment on this issue

  • If it is possible to perform an incremental rollout, this should be preferred. Proposed increments are: 10%, 50%, 100%. Proposed minimum time between increments is 15 minutes.

    • When setting percentages, make sure that the feature works correctly between feature checks. See #327117 (closed) for more information
    • For actor-based rollout: /chatops run feature set ci_runner_limits 10 --actors
    • For time-based rollout: /chatops run feature set ci_runner_limits 10
  • Make the feature flag enabled by default, i.e. change default_enabled to true

  • Cross-post the ChatOps Slack command to #support_gitlab-com (more guidance on when this is necessary is in the dev docs) and in your team channel
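
The difference between the actor-based and time-based percentage rollouts mentioned above can be illustrated with a short sketch. This is not the feature flag library's actual implementation; the function names, the hashing scheme, and the actor identifier format are assumptions chosen to show the idea. An actor-based check is deterministic per actor, while a time-based check is an independent random draw each time, so two consecutive checks in the same request may disagree.

```python
import random
import zlib


def enabled_for_actor(flag: str, actor_id: str, percentage: int) -> bool:
    """Actor-based: hash the flag and actor into a bucket from 0 to 99."""
    bucket = zlib.crc32(f"{flag}:{actor_id}".encode()) % 100
    # The same actor always lands in the same bucket, so the answer is stable.
    return bucket < percentage


def enabled_for_time(percentage: int, rng: random.Random) -> bool:
    """Time-based: every call draws a fresh random number."""
    return rng.random() * 100 < percentage
```

Because the bucket for an actor never changes, an actor that is enabled at 10% stays enabled at 50% and 100%. With the time-based variant, code that checks the flag twice has to behave correctly when the two checks return different results.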

Cleanup

This is an important phase that should be done either in the next milestone or as soon as possible. For the cleanup phase, please follow our documentation on how to clean up the feature flag.

  • Announce on the issue that the flag has been enabled

  • Remove :ci_runner_limits feature flag

    • Remove all references to the feature flag from the codebase
    • Remove the YAML definitions for the feature from the repository
    • Create a Changelog Entry
  • Clean up the feature flag from all environments by running this ChatOps command in the #production channel: /chatops run feature delete ci_runner_limits

Final Step

  • Close this rollout issue for the feature flag after the feature flag is removed from the codebase.

Rollback Steps

  • This feature can be disabled by running the following ChatOps command:
/chatops run feature set ci_runner_limits false
Edited by Pedro Pombeiro