Manual Rollback jobs for configuration changes in Kubernetes
Problem Statement
Currently, the only way to roll back a configuration change is to revert the offending MR and wait for the revert to be deployed.
Proposed Fix
Consider creating a set of manual jobs on each cluster deployment that perform a rollback in Helm. This would shorten time to remediation, because Helm rolls the release back to its previous configuration directly instead of waiting for a revert to pass through the pipeline.
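As a rough illustration of what such a manual job might execute, here is a minimal sketch in Python that assembles a `helm rollback` invocation. The function names, the release name, and the namespace are hypothetical placeholders and are not taken from our pipelines. The sketch relies on Helm 3 behaviour, where omitting the revision rolls back to the previous release.

```python
import subprocess
from typing import List, Optional


def build_rollback_command(
    release: str,
    namespace: str,
    revision: Optional[int] = None,
    dry_run: bool = False,
) -> List[str]:
    """Assemble the argument list for `helm rollback`.

    `release` and `namespace` are supplied by the caller; the values used
    in the usage note are placeholders, not real release names.
    """
    cmd = ["helm", "rollback", release]
    if revision is not None:
        # An explicit revision number, as listed by `helm history <release>`.
        cmd.append(str(revision))
    # --wait blocks until the rolled-back resources report ready.
    cmd += ["--namespace", namespace, "--wait"]
    if dry_run:
        # --dry-run simulates the rollback without changing the cluster.
        cmd.append("--dry-run")
    return cmd


def run_rollback(
    release: str,
    namespace: str,
    revision: Optional[int] = None,
    dry_run: bool = False,
) -> None:
    """Run the rollback; raises CalledProcessError if helm exits non-zero."""
    cmd = build_rollback_command(release, namespace, revision, dry_run)
    subprocess.run(cmd, check=True)
```

A manual job could call `run_rollback("my-release", "my-namespace", dry_run=True)` first to preview the rollback, then repeat the call without `dry_run` to apply it. Keeping command construction separate from execution lets the arguments be checked without access to a cluster.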
Risk Consideration
After a Helm rollback, the repository that holds these configurations will no longer match what is running on the cluster. We would therefore need to block auto-deployments until a revert is merged. We should also consider that the revert will trigger CI, which takes a resource lock for a short period of time.
It is unknown what will happen if an auto-deploy begins before the offending MR has been reverted.
Milestones
- Create auto-rollback jobs targeting the correct style of pipeline
- Update documentation/runbooks
- Open an issue with considerations to start the conversation on automating this process
Edited by John Skarbek