Rollout [certificate_based_clusters] Disable certificate-based Clusters FF
Summary
This feature flag (FF) rollout is the reverse of what we usually do. Instead of turning the FF on, we will roll it out to off. That's because we're using this FF to remove a feature, not to introduce one.
Removing a feature behind a FF allows for an easier rollback if necessary. It also relieves some of the pressure of removing the code before the major milestone, as introducing a FF is simpler than cleaning up code.
This issue is to roll out, on production, the hiding of all the features that depend on the certificate-based cluster integration. This should be behind the certificate_based_clusters feature flag.
The Epic which tracks this work: gitlab-org/configure&8
Blog that introduced the deprecation intention: https://about.gitlab.com/blog/2021/11/15/deprecating-the-cert-based-kubernetes-integration/
MR that introduces this global FF: !81054 (merged)
Owners
- Team: ~"group::configure"
- Most appropriate Slack channel to reach out to: #s_configure
- Best individual to reach out to: @Alexand
- PM: @nagyv-gitlab
Stakeholders
- The Configure Team, for the core of the integration which connects the clusters and other features related to it (@nmezzopera)
- The ~"group::container security" for the associated features (@sam.white, @thiagocsf)
- The Quality Team, for the related QA specs (@svistas)
Expectations
What are we expecting to happen?
All the features related to the certificate-based cluster integration will stop working. They should no longer be available to users in either the UI or the API.
- Auto DevOps deploys won't create rollout jobs like production, staging, review apps, etc., for cert-based clusters.
- Serverless won't be available in the menu anymore.
- Environments won't show the deploy boards with pods and canary ingress status anymore.
- Service accounts won't be automatically created for GitLab-managed clusters during deployments, as the concept of a GitLab-managed cluster only existed for cert-based clusters.
- The REST APIs for project/group/instance cert-based clusters won't be available anymore.
- Pod logs won't be available anymore for these clusters through the GitLab UI.
- Pod web terminals, the ones used to run commands on your cluster, won't be available anymore through the GitLab UI.
- The UI for Canary ingress won't be available anymore.
Existing users
SaaS users
For anyone with a pre-existing Kubernetes cluster, we've created a special table to track those users and postpone the removal of the certificate-based cluster feature until 15.6: !87149 (merged). The feature will then be re-enabled for all the root-level groups and users that currently have at least one cluster associated with them or with any of their child groups.
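The selection rule described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the query or migration used in !87149: the `Group` struct, its fields, the helper names, and the sample data are all hypothetical, and the sketch covers groups only, not personal namespaces.

```ruby
# Hypothetical in-memory model; the field names are assumptions for illustration.
Group = Struct.new(:id, :parent_id, :cluster_count)

# Walk up the parent chain to find the root-level group of a given group.
def root_of(group, groups_by_id)
  current = group
  current = groups_by_id.fetch(current.parent_id) while current.parent_id
  current
end

# Return the sorted ids of root-level groups that have at least one cluster,
# either directly or through any descendant group.
def root_groups_keeping_feature(groups)
  groups_by_id = groups.to_h { |g| [g.id, g] }
  groups
    .select { |g| g.cluster_count > 0 }
    .map { |g| root_of(g, groups_by_id).id }
    .uniq
    .sort
end

groups = [
  Group.new(1, nil, 0), # root, no clusters of its own
  Group.new(2, 1, 0),   # child of 1
  Group.new(3, 2, 1),   # grandchild of 1 with one cluster
  Group.new(4, nil, 0), # root with no clusters anywhere in its hierarchy
  Group.new(5, nil, 2)  # root with its own clusters
]

p root_groups_keeping_feature(groups) # prints [1, 5]
```

In this made-up data set, root group 1 keeps the feature because a grandchild has a cluster, while root group 4 does not.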
GitLab Self-Managed users
The FF is already disabled by default, and users who want to re-enable it can still do so until 15.6. To re-enable it globally, execute this command in your Rails console:
Feature.enable(:certificate_based_clusters)
When is the feature viable?
Hiding all the cert-based clusters will be done by executing:
/chatops run feature set certificate_based_clusters false
This will only happen after we validate that !87149 (merged) was deployed, and that the temporary table with the collection of users which will have the feature extended was already loaded.
What might happen if this goes wrong?
Any of the features not being properly hidden could result in:
- features that should be removed still functioning.
- features that are only partially functioning.
What can we monitor to detect problems with this?
What can we check for monitoring production after rollouts?
The endpoints from the section above.
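One way to check the REST endpoints after the rollout is a small probe script. The sketch below is not an official check and rests on assumptions: it assumes the cert-based cluster endpoints answer with 404 once the flag is off (please confirm against a real response first), and the host, the IDs, and the `GITLAB_URL` and `GITLAB_TOKEN` environment variables are placeholders.

```ruby
require "net/http"
require "uri"

# Assumption: with the flag disabled, these endpoints respond with 404.
EXPECTED_STATUS_WHEN_DISABLED = 404

# Build the cert-based cluster API paths for a project and a group.
# The IDs are placeholders supplied by the caller.
def cluster_endpoints(project_id:, group_id:)
  [
    "/api/v4/projects/#{project_id}/clusters",
    "/api/v4/groups/#{group_id}/clusters",
    "/api/v4/admin/clusters"
  ]
end

# Classify an HTTP status code relative to the assumed "hidden" response.
# Anything that is neither the expected code nor a success is inconclusive
# (for example 401 or 403 caused by the token used for the probe).
def classify(status)
  case status
  when EXPECTED_STATUS_WHEN_DISABLED then :hidden
  when 200..299 then :still_exposed
  else :inconclusive
  end
end

# Perform one GET request and return the numeric status code.
def fetch_status(base_url, path, token)
  uri = URI.join(base_url, path)
  request = Net::HTTP::Get.new(uri)
  request["PRIVATE-TOKEN"] = token
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  response.code.to_i
end

# Only hit the network when run directly with a token in the environment.
if __FILE__ == $PROGRAM_NAME && ENV["GITLAB_TOKEN"]
  base_url = ENV.fetch("GITLAB_URL", "https://gitlab.example.com")
  cluster_endpoints(project_id: 1, group_id: 1).each do |path|
    status = fetch_status(base_url, path, ENV["GITLAB_TOKEN"])
    puts "#{path}: #{status} (#{classify(status)})"
  end
end
```

Any endpoint reported as `still_exposed` would be a candidate for the "not properly hidden" cases listed earlier.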
Rollout Steps
Rollout on non-production environments
- Ensure that the feature MRs have been deployed to non-production environments.
  - /chatops run feature get certificate_based_clusters
- Disable the feature flag globally on non-production environments.
  - /chatops run feature set certificate_based_clusters false --dev
  - /chatops run feature set certificate_based_clusters false --staging
- Verify that the feature works as expected. Posting the QA result in this issue is preferable.
Specific rollout on production
- Ensure that the feature MRs have been deployed to both production and canary.
  - /chatops run auto_deploy status <merge-commit-of-your-feature>
- **DISCLAIMER: we cannot disable selectively. This is a global turn on/off. See discussion here: !81054 (comment 848569947)**
- Verify that the feature works on the specific entries. Posting the QA result in this issue is preferable.
Preparation before global rollout
- Check if the feature flag change needs to be accompanied by a change management issue. Cross-link the issue here if it does.
- Ensure that you or a representative in development can be available for at least 2 hours after feature flag updates in production. If a different developer will be covering, or an exception is needed, please inform the oncall SRE by using the @sre-oncall Slack alias.
- Ensure that documentation has been updated (More info).
- Announce on the feature issue an estimated time this will be disabled on GitLab.com.
- Notify #support_gitlab-com and your team channel (more guidance when this is necessary in the dev docs).
Global rollout on production and Rollback
- Disable the FF globally. Run on #production:
  - /chatops run feature set certificate_based_clusters false
- Announce on the feature issue that the feature has been globally disabled.
- Wait for at least one day for the verification term.
Release the feature
After the rollout has been deemed stable, the clean up should be done as soon as possible to make the change permanent and reduce complexity in the codebase.
You can either create a follow-up issue for Feature Flag Cleanup or use the checklist below in this same issue.
- Create a merge request to remove the <feature-flag-name> feature flag. Ask for review and merge it.
  - Remove all references to the feature flag from the codebase.
  - Remove the YAML definitions for the feature from the repository.
  - Create a changelog entry.
- Ensure that the cleanup MR has been deployed to both production and canary. If the merge request was deployed before the code cutoff, the feature can be officially announced in a release blog post.
  - /chatops run auto_deploy status <merge-commit-of-cleanup-mr>
- Close the feature issue to indicate the feature will be released in the current milestone.
- Clean up the feature flag from all environments by running these chatops commands in the #production channel:
  - /chatops run feature delete <feature-flag-name> --dev
  - /chatops run feature delete <feature-flag-name> --staging
  - /chatops run feature delete <feature-flag-name>
- Close this rollout issue.
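For the "remove all references" step in the checklist above, a quick scan of a local checkout can help confirm nothing was missed. The following is a minimal sketch, not part of the official cleanup process: `flag_references` is a hypothetical helper, and a plain text search like this will also match comments and documentation.

```ruby
require "find"

# Return "path:line_number" for every line under `root` that mentions `flag_name`.
def flag_references(root, flag_name)
  hits = []
  Find.find(root) do |path|
    # Skip version-control metadata entirely.
    Find.prune if File.directory?(path) && File.basename(path) == ".git"
    next unless File.file?(path)

    File.foreach(path).with_index(1) do |line, number|
      # scrub guards against invalid byte sequences in binary files.
      hits << "#{path}:#{number}" if line.scrub.include?(flag_name)
    end
  end
  hits.sort
end

# Usage (hypothetical file name): ruby flag_references.rb <checkout-dir> <flag-name>
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  puts flag_references(ARGV[0], ARGV[1])
end
```

An empty result for `certificate_based_clusters` would suggest the cleanup MR removed every textual reference in that checkout.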
Rollback Steps
- Re-enable the FF globally on #production:
  - /chatops run feature set certificate_based_clusters true