Automatic rollback in case of failure
Release notes
This feature is available in the GitLab Ultimate tier.
Problem to solve
As a developer, I want to make sure that no faulty deployment is ever active on my production environment, so that I do not unnecessarily expose my users to a subpar experience due to external dependencies.
Intended users
User experience goal
Proposal
If there is a problem in the pipeline that deploys, it would be nice if the pipeline would perform an automatic rollback.
For this iteration (and this specific issue)
- Any critical alert on the environment will initiate a rollback.
- The user must opt-in to this feature (setting defined below in acceptance criteria)
- There will only be one rollback attempt per alert (to avoid an endless loop of rollbacks)
Acceptance criteria
- Add a settings section to the CI/CD project settings, above Deploy freezes, where the user can turn Auto-rollback on or off
- Features a single checkbox with label and help text and a confirmation button.
- Rollback is to the last successful deployment (This will re-run the pipeline of the last successful deployment)
- Auto rollback must be logged in the audit log as an action done in the pipeline
- For the first iteration - any critical alert will trigger a rollback
- Confirmation is a primary success button
- Copy of settings UI:
Automatic deployment rollbacks
Automatically roll back to the last successful deployment when a critical problem is detected.
- [ ] Enable automatic rollbacks
Automatic rollbacks start when a critical alert is triggered. If the last successful deployment fails to roll back automatically, it can still be done manually. More information
Mockup (browser made) |
---|
![]() |
Note: Copy might differ from the mockup, see acceptance criteria above |
Out of scope for this issue:
- Metrics will be defined in a dedicated YAML file.
- Only metrics defined in this YAML file will initiate the auto-rollback, similar to the common_metrics.yml. ./gitlab/dashbords.yml
Engineering scope
Weight estimate: 3
- backend: Create a worker to trigger an automatic rollback, should a deployment fail.
- frontend: Create a switch in the project settings. Users should be able to choose whether rollbacks should be automatic or manual (as highlighted in link posted above)
Documentation guidelines
Technical proposal
How GitLab Re-deploy feature behaves today
- GitLab already knows the list of successful deployments on an environment.
- GitLab can deduce the latest successful deployment from the deployment list. (Let's say Deployment-A)
- GitLab can deduce the previous successful deployment from the deployment list and Deployment-A. (Let's say Deployment-B)
- When Deployment-B is re-deployed, GitLab creates Deployment-C. Deployment-B and C have the same metadata.
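- We can check if a deployment was re-deployed (excluding the deployment itself, which would otherwise always match):
project.deployments.where(ref: deployment.ref, sha: deployment.sha).where.not(id: deployment.id).exists?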
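The behavior above can be modeled in plain Ruby, without Rails, as a rough sketch. The `Deployment` struct, the `successful_deployments` and `redeployed?` helpers, and the assumption that a larger `id` means a newer deployment are all illustrative and are not GitLab's actual implementation.

```ruby
# Plain-Ruby stand-in for a deployment record (hypothetical, not GitLab's model).
Deployment = Struct.new(:id, :ref, :sha, :status, keyword_init: true)

# Successful deployments, newest first (assumes a larger id means newer).
def successful_deployments(deployments)
  deployments.select { |d| d.status == :success }.sort_by { |d| -d.id }
end

# True when some other deployment shares this deployment's ref and sha,
# which is how a re-deploy shows up in the list.
def redeployed?(deployments, deployment)
  deployments.any? do |other|
    other.id != deployment.id &&
      other.ref == deployment.ref &&
      other.sha == deployment.sha
  end
end

deployment_b = Deployment.new(id: 1, ref: "main", sha: "bbb", status: :success)
deployment_a = Deployment.new(id: 2, ref: "main", sha: "aaa", status: :success)

# Before the re-deploy: Deployment-A is the latest, Deployment-B the previous.
latest, previous = successful_deployments([deployment_b, deployment_a])

# Re-deploying Deployment-B creates Deployment-C with the same ref and sha.
deployment_c = Deployment.new(id: 3, ref: previous.ref, sha: previous.sha, status: :success)
history = [deployment_b, deployment_a, deployment_c]
```

With this history, `redeployed?(history, deployment_b)` is true, while the same check for Deployment-A is false.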
How we will extend
- Auto Rollback happens only once when a new critical alert is raised.
- We cannot simply re-deploy Deployment-B: the re-deploy creates Deployment-C, which makes Deployment-A the previous successful deployment and therefore the next rollback target. The feature would then keep deploying Deployment-A and Deployment-B alternately, which is probably not what we want. To illustrate:
- Deployment-C (latest, same content with Deployment-B)
- Deployment-A (previous successful)
- Deployment-B (previous previous successful)
- We need to persist the rollback history as follows:
- deployments.auto_redeployed_by_id (FK) ... The ID of the deployment that initiated the auto re-deploy.
- Let's say there are two deployments Deployment-A and Deployment-B and a critical alert is raised on the environment.
- What is the latest deployment? => Deployment-A
- What is the previous successful deployment? => Deployment-B
- Should GitLab re-deploy Deployment-B? => Yes
- Let's say a new critical alert is raised on the environment again.
- What is the latest deployment? => Deployment-C
- What is the previous successful deployment? => Deployment-A
- Should GitLab re-deploy Deployment-A? => No, because Deployment-C was triggered by A. Next.
- Should GitLab re-deploy Deployment-B? => No, because Deployment-C is identical to Deployment-B. Next.
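The two alert scenarios above can be traced with a minimal plain-Ruby sketch. It is one possible reading of the walk-through, not the final implementation: the `Deployment` struct and the `rollback_target` and `auto_redeploy` helpers are hypothetical, and only `auto_redeployed_by_id` comes from the proposal.

```ruby
# Hypothetical stand-in for a successful deployment record.
Deployment = Struct.new(:id, :ref, :sha, :auto_redeployed_by_id, keyword_init: true)

def same_content?(left, right)
  left.ref == right.ref && left.sha == right.sha
end

# successful: successful deployments on the environment, oldest first.
# Returns the deployment to re-deploy, or nil when no candidate qualifies.
def rollback_target(successful)
  latest, *older = successful.reverse
  return nil if latest.nil?

  older.find do |candidate|
    # Skip the deployment that initiated the auto re-deploy of `latest`.
    next false if latest.auto_redeployed_by_id == candidate.id
    # Skip a deployment whose content is already live as `latest`.
    next false if same_content?(latest, candidate)

    true
  end
end

# Builds the deployment created by an auto re-deploy of `target`.
# Assumes ids increase by one; a real system would let the database assign them.
def auto_redeploy(successful, target)
  latest = successful.last
  Deployment.new(id: latest.id + 1, ref: target.ref, sha: target.sha,
                 auto_redeployed_by_id: latest.id)
end

deployment_b = Deployment.new(id: 1, ref: "main", sha: "bbb")
deployment_a = Deployment.new(id: 2, ref: "main", sha: "aaa")

# First critical alert: Deployment-B is selected and re-deployed as Deployment-C.
history = [deployment_b, deployment_a]
first_target = rollback_target(history)
history << auto_redeploy(history, first_target)

# Second critical alert: both candidates are skipped, so nothing is re-deployed.
second_target = rollback_target(history)
```

Here `first_target` is Deployment-B and `second_target` is `nil`, which matches the "Yes" and "No, No" answers in the list.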
Anti-race condition
- If there is a running deployment on the environment when a critical alert is raised, this feature won't do anything. (Please see the "Constant Rollback" below)
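As a minimal sketch of this guard in plain Ruby: the `Deployment` struct, the `:running` status symbol, and the `rollback_allowed?` helper are assumptions made for illustration only.

```ruby
# Hypothetical stand-in for a deployment on the environment.
Deployment = Struct.new(:id, :status, keyword_init: true)

# Auto Rollback does nothing while any deployment is still running.
def rollback_allowed?(deployments)
  deployments.none? { |deployment| deployment.status == :running }
end

idle = [Deployment.new(id: 1, status: :success), Deployment.new(id: 2, status: :failed)]
busy = idle + [Deployment.new(id: 3, status: :running)]

puts rollback_allowed?(idle)
puts rollback_allowed?(busy)
```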
Which alert is considered as critical?
- environment.alert_management_alerts has a severity column that takes critical: 0, high: 1, medium: 2, low: 3, info: 4, unknown: 5. critical is the severity of an alert that triggers a rollback.
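The severity check, combined with the opt-in setting from the acceptance criteria, might look like the following plain-Ruby sketch. The severity values are taken from the list above; the `Alert` struct, the helper names, and the `auto_rollback_enabled` parameter are hypothetical.

```ruby
# Severity values as listed in the proposal.
SEVERITIES = { critical: 0, high: 1, medium: 2, low: 3, info: 4, unknown: 5 }.freeze

# Hypothetical stand-in for an alert; severity is stored as an integer.
Alert = Struct.new(:severity, keyword_init: true)

def critical?(alert)
  alert.severity == SEVERITIES.fetch(:critical)
end

# Only a critical alert on a project that opted in starts a rollback.
def triggers_rollback?(alert, auto_rollback_enabled:)
  auto_rollback_enabled && critical?(alert)
end

critical_alert = Alert.new(severity: SEVERITIES[:critical])
high_alert = Alert.new(severity: SEVERITIES[:high])

puts triggers_rollback?(critical_alert, auto_rollback_enabled: true)
puts triggers_rollback?(high_alert, auto_rollback_enabled: true)
```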
Rollback range (Next iteration)
- Rollback is a useful way to revert problematic code, but it also risks removing valid code or features, which can disrupt end users' requests.
- Operators should be able to set a rollback range to group backward compatible deployments. Auto Rollback should happen within the current range and should not cross it.
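One way to picture the range restriction is the plain-Ruby sketch below. How a range would actually be stored is not decided in this issue, so the `rollback_range` attribute and the `candidates_in_range` helper are purely hypothetical.

```ruby
# Hypothetical stand-in; rollback_range groups backward compatible deployments.
Deployment = Struct.new(:id, :sha, :rollback_range, keyword_init: true)

# successful: successful deployments, oldest first.
# Returns the rollback candidates that share the latest deployment's range,
# newest first, so a rollback never crosses a range boundary.
def candidates_in_range(successful)
  latest, *older = successful.reverse
  return [] if latest.nil?

  older.select { |deployment| deployment.rollback_range == latest.rollback_range }
end

history = [
  Deployment.new(id: 1, sha: "aaa", rollback_range: 1),
  Deployment.new(id: 2, sha: "bbb", rollback_range: 2),
  Deployment.new(id: 3, sha: "ccc", rollback_range: 2),
  Deployment.new(id: 4, sha: "ddd", rollback_range: 2)
]

candidates = candidates_in_range(history)
```

In this example only the deployments with ids 3 and 2 are candidates; the deployment with id 1 belongs to an older range and is excluded.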
Constant Rollback (Next iteration)
- If a critical alert still exists X minutes after the Auto Rollback point, the next Auto Rollback will be triggered.
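The timing rule can be sketched in plain Ruby as follows. The `next_rollback_due?` helper and its parameter names are hypothetical, and `wait_minutes` stands in for the unspecified X.

```ruby
# True when a critical alert is still open and at least wait_minutes have
# passed since the last Auto Rollback. Subtracting two Time objects yields
# the elapsed time in seconds.
def next_rollback_due?(last_rollback_at:, now:, wait_minutes:, critical_alert_open:)
  critical_alert_open && (now - last_rollback_at) >= wait_minutes * 60
end

rolled_back_at = Time.at(0)

puts next_rollback_due?(last_rollback_at: rolled_back_at, now: Time.at(300),
                        wait_minutes: 10, critical_alert_open: true)
puts next_rollback_due?(last_rollback_at: rolled_back_at, now: Time.at(600),
                        wait_minutes: 10, critical_alert_open: true)
```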
Future iterations
- Last version to rollback to (including gitlab-ci.yml file)
Further details
Spinnaker supports a similar feature that is specifically designed for rollbacks of Kubernetes rolling updates (see Links / references).
Permissions and Security
Documentation
Availability & Testing
What does success look like, and how can we measure that?
What is the type of buyer?
Is this a cross-stage feature?
Links / references
https://www.spinnaker.io/guides/user/kubernetes-v2/automated-rollbacks/