
Automatic rollback in case of failure

Release notes

This feature is part of the GitLab Ultimate tier.

Problem to solve

As a developer, I want to make sure that no faulty deployment is ever active on my production environment, so that I do not unnecessarily expose my users to a subpar experience caused by external dependencies.

Intended users

User experience goal

Proposal

If there is a problem with the deployment that a pipeline performs, the pipeline should roll back automatically.

For this iteration (and this specific issue):

  1. Any critical alert on the environment initiates a rollback.
  2. The user must opt in to this feature (the setting is defined below in the acceptance criteria).
  3. There is only one rollback attempt per alert (to avoid an endless loop of rollbacks).
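The three rules above can be sketched as a small gate placed in front of the rollback worker. This is a hypothetical Ruby sketch: `Alert`, `AutoRollbackGate`, and `allow?` are illustrative names, not GitLab classes, and it assumes severities are symbols and that alerts have stable IDs.

```ruby
require "set"

# Hypothetical alert record carrying only the fields the gate needs.
Alert = Struct.new(:id, :severity, keyword_init: true)

# Hypothetical gate applying the three rules of this iteration.
class AutoRollbackGate
  def initialize(enabled:)
    @enabled = enabled      # rule 2: the project must have opted in
    @attempted = Set.new    # rule 3: IDs of alerts that already caused a rollback
  end

  # Returns true at most once per alert, and only for critical alerts
  # on a project that has opted in.
  def allow?(alert)
    return false unless @enabled
    return false unless alert.severity == :critical # rule 1
    return false if @attempted.include?(alert.id)

    @attempted << alert.id
    true
  end
end

gate = AutoRollbackGate.new(enabled: true)
alert = Alert.new(id: 1, severity: :critical)
puts gate.allow?(alert) # => true (first attempt)
puts gate.allow?(alert) # => false (no second attempt for the same alert)
```

The in-memory set only illustrates the "one attempt per alert" rule; a real implementation would presumably persist this state, which is an assumption of the sketch.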

Acceptance criteria

  • Add a settings section, above Deploy freezes in the project's CI/CD settings, where the user can turn automatic rollback on or off.
  • The section features a single checkbox with a label and help text, and a confirmation button.
  • Rollback is to the last successful deployment (this re-runs the pipeline of the last successful deployment).
  • Automatic rollbacks must be recorded in the audit log as an action performed in the pipeline.
  • For the first iteration, any critical alert triggers a rollback.
  • The confirmation button is a primary success button.
  • Copy of settings UI:
Automatic deployment rollbacks

Automatically roll back to the last successful deployment when a critical problem is detected.

- [ ] Enable automatic rollbacks
      Automatic rollbacks start when a critical alert is triggered. If the last successful deployment fails to roll 
      back automatically, it can still be done manually. More information
Mockup (made in the browser): [image]
Note: The copy might differ from the mockup; see the acceptance criteria above.

Out of scope for this issue:

  • Metrics will be defined in a dedicated YML file.
    • Only metrics defined in this YML file, similar to common_metrics.yml, will initiate the automatic rollback. ./gitlab/dashbords.yml

Engineering scope

Weight estimate: 3

  • backend: Create a worker to trigger an automatic rollback, should a deployment fail.
  • frontend: Create a switch in the project settings. Users should be able to choose whether rollbacks are automatic or manual (as highlighted in the link posted above).
  • Documentation guidelines

Technical proposal

How the GitLab re-deploy feature behaves today

  • GitLab already knows the list of successful deployments on an environment.
  • GitLab can deduce the latest successful deployment from the deployment list. (Let's say Deployment-A)
  • GitLab can deduce the previous successful deployment from the deployment list and Deployment-A. (Let's say Deployment-B)
  • When Deployment-B is re-deployed, GitLab creates Deployment-C. Deployment-B and Deployment-C have the same metadata.
  • We can check whether a deployment was re-deployed: project.deployments.where(ref: deployment.ref, sha: deployment.sha).where.not(id: deployment.id).exists?
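The current behaviour can be modelled in memory to make the terms concrete. This Ruby sketch is hypothetical: `Deployment` is a plain struct rather than GitLab's ActiveRecord model, the list is assumed to be ordered oldest first, and a re-deploy is assumed to succeed immediately.

```ruby
# Hypothetical in-memory stand-in for the deployments table.
Deployment = Struct.new(:id, :ref, :sha, :status, keyword_init: true)

def successful(deployments)
  deployments.select { |d| d.status == :success }
end

# Latest successful deployment (Deployment-A in the text).
def latest_successful(deployments)
  successful(deployments).last
end

# Successful deployment before the latest one (Deployment-B in the text).
def previous_successful(deployments)
  successful(deployments)[-2]
end

# Re-deploying copies ref and sha into a new deployment (Deployment-C).
def redeploy(deployments, target)
  next_id = deployments.map(&:id).max + 1
  deployments + [Deployment.new(id: next_id, ref: target.ref, sha: target.sha, status: :success)]
end

# In-memory equivalent of the ActiveRecord check, excluding the deployment itself.
def redeployed?(deployments, deployment)
  deployments.any? do |d|
    d.id != deployment.id && d.ref == deployment.ref && d.sha == deployment.sha
  end
end

b = Deployment.new(id: 1, ref: "main", sha: "bbb", status: :success)
a = Deployment.new(id: 2, ref: "main", sha: "aaa", status: :success)
list = redeploy([b, a], previous_successful([b, a]))
puts redeployed?(list, b) # => true
```

Excluding the deployment's own ID matters: without it, every deployment would match itself and the check would always be true.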

How we will extend it

  • An automatic rollback happens only once per new critical alert.
  • We cannot simply re-deploy Deployment-B. Once it is re-deployed, Deployment-A becomes the previous successful deployment and therefore the next rollback target, so the feature would keep alternating between Deployment-A and Deployment-B, which is probably not what we want. To illustrate, the deployment list after the first rollback is:
    • Deployment-C (latest, same content as Deployment-B)
    • Deployment-A (previous successful)
    • Deployment-B (the successful deployment before Deployment-A)
  • We need to persist the rollback history as follows:
    • deployments.auto_redeployed_by_id (FK): the ID of the deployment that initiated the automatic re-deploy.
  • Let's say there are two deployments, Deployment-A and Deployment-B, and a critical alert is raised on the environment.
    • What is the latest deployment? => Deployment-A
    • What is the previous successful deployment? => Deployment-B
    • Should GitLab re-deploy Deployment-B? => Yes
  • Let's say a new critical alert is raised on the environment again.
    • What is the latest deployment? => Deployment-C
    • What is the previous successful deployment? => Deployment-A
    • Should GitLab re-deploy Deployment-A? => No, because Deployment-C was triggered by Deployment-A. Next.
    • Should GitLab re-deploy Deployment-B? => No, because Deployment-C is identical to Deployment-B. Next.
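The walkthrough above can be expressed as a target-selection function over the proposed auto_redeployed_by_id column. This is a hypothetical Ruby sketch: `Deployment`, `rollback_target`, and `auto_redeploy` are illustrative, and `history` is assumed to hold only the environment's successful deployments, oldest first. As an assumption that goes beyond the issue, the two checks are applied to the whole auto_redeployed_by_id history rather than only to the latest deployment, so that longer rollback chains never return to a version that was already rolled back.

```ruby
# Hypothetical stand-in for a deployment row with the proposed FK column.
Deployment = Struct.new(:id, :sha, :auto_redeployed_by_id, keyword_init: true)

# Returns the deployment to re-deploy, or nil when no safe target exists.
# `history` holds successful deployments on one environment, oldest first.
def rollback_target(history)
  latest = history.last
  return nil if latest.nil?

  # Deployments that an automatic rollback has already moved away from
  # (Deployment-A once Deployment-C exists).
  rolled_back_ids = history.map(&:auto_redeployed_by_id).compact
  # Content we must not return to: rolled-back versions plus what is live now.
  bad_shas = history.select { |d| rolled_back_ids.include?(d.id) }.map(&:sha)
  bad_shas << latest.sha

  # Walk older deployments from newest to oldest, skipping both cases.
  history[0...-1].reverse.find do |candidate|
    !rolled_back_ids.include?(candidate.id) && !bad_shas.include?(candidate.sha)
  end
end

# Re-deploys `target`, recording the deployment that triggered the rollback.
def auto_redeploy(history, target)
  new_id = history.map(&:id).max + 1
  history + [Deployment.new(id: new_id, sha: target.sha, auto_redeployed_by_id: history.last.id)]
end

b = Deployment.new(id: 1, sha: "bbb")
a = Deployment.new(id: 2, sha: "aaa")
history = [b, a]

first = rollback_target(history)        # first alert: Deployment-B
history = auto_redeploy(history, first) # creates Deployment-C (id 3, sha "bbb")
second = rollback_target(history)       # second alert: A and B are both skipped
puts first.id        # => 1
puts second.inspect  # => nil
```

The ID comparison implements "Deployment-C was triggered by Deployment-A", and the sha comparison implements "Deployment-C is identical to Deployment-B", because a re-deployed row carries the same sha as its source.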

Anti-race condition

  • If a deployment is running on the environment when a critical alert is raised, this feature does nothing (see "Constant Rollback" below).
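The anti-race rule reduces to a guard evaluated before any rollback target is chosen. In this hypothetical Ruby sketch, the set of in-progress statuses (`created`, `running`) is an assumption, not something the issue specifies.

```ruby
# Hypothetical deployment record with only the field the guard needs.
Deployment = Struct.new(:id, :status, keyword_init: true)

# Statuses assumed to mean "a deployment is still in progress".
IN_PROGRESS_STATUSES = %i[created running].freeze

# True when the alert handler should do nothing because a deployment
# on the environment has not finished yet.
def skip_auto_rollback?(environment_deployments)
  environment_deployments.any? { |d| IN_PROGRESS_STATUSES.include?(d.status) }
end

deployments = [Deployment.new(id: 1, status: :success), Deployment.new(id: 2, status: :running)]
puts skip_auto_rollback?(deployments) # => true
```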

Which alerts are considered critical?

  • environment.alert_management_alerts has a severity column that takes the values critical: 0, high: 1, medium: 2, low: 3, info: 4, unknown: 5. Only alerts with the critical severity trigger a rollback.
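The severity mapping above translates directly into a lookup table. The constant and helper names in this Ruby sketch are hypothetical; only the integer values come from the issue.

```ruby
# Integer values of alert_management_alerts.severity as listed in the issue.
ALERT_SEVERITIES = {
  critical: 0, high: 1, medium: 2, low: 3, info: 4, unknown: 5
}.freeze

# Only alerts stored with the critical severity trigger an automatic rollback.
def triggers_auto_rollback?(severity_value)
  severity_value == ALERT_SEVERITIES.fetch(:critical)
end

puts ALERT_SEVERITIES.key(1)     # => high
puts triggers_auto_rollback?(0)  # => true
```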

Rollback range (Next iteration)

  • Rolling back is a useful way to revert problematic code, but it also risks removing valid code or features, which can disrupt end users' requests.
  • Operators should be able to set a rollback range that groups backward-compatible deployments. Automatic rollbacks should happen within the current range and should not cross its boundary.
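One way to honour a rollback range is to filter rollback candidates before any other check runs. This Ruby sketch is hypothetical: it assumes each deployment carries an operator-assigned `range_id`, which the issue does not specify.

```ruby
# Hypothetical deployment record tagged with an operator-defined range.
Deployment = Struct.new(:id, :range_id, keyword_init: true)

# Older deployments that a rollback may target without leaving the range of
# the latest deployment. `history` is ordered oldest first.
def candidates_in_range(history)
  latest = history.last
  return [] if latest.nil?

  # Walk backwards and stop at the first deployment from another range,
  # so the rollback never crosses the range boundary.
  history[0...-1].reverse.take_while { |d| d.range_id == latest.range_id }
end

history = [
  Deployment.new(id: 1, range_id: "v1"),
  Deployment.new(id: 2, range_id: "v2"),
  Deployment.new(id: 3, range_id: "v2")
]
puts candidates_in_range(history).map(&:id).inspect # => [2]
```

Using `take_while` instead of a plain filter keeps the candidates contiguous, so even if the same range ID appeared again further back, the rollback would still stop at the boundary.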

Constant Rollback (Next iteration)

  • If a critical alert still exists X minutes after an automatic rollback, the next automatic rollback is triggered.
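The Constant Rollback rule is a time check layered on the alert state. In this hypothetical Ruby sketch, `wait_minutes` stands for the unspecified X, and the method and keyword names are illustrative.

```ruby
# True when another automatic rollback should start: the critical alert is
# still open and at least `wait_minutes` (the X in the issue) have passed
# since the previous automatic rollback.
def next_auto_rollback_due?(alert_open:, last_rollback_at:, now:, wait_minutes:)
  return false unless alert_open

  (now - last_rollback_at) >= wait_minutes * 60 # Time - Time yields seconds
end

rolled_back_at = Time.at(0)
puts next_auto_rollback_due?(alert_open: true, last_rollback_at: rolled_back_at,
                             now: rolled_back_at + 600, wait_minutes: 10) # => true
```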

Future iterations

  • The last version to roll back to (including the gitlab-ci.yml file)

Further details

Spinnaker supports a similar feature, specifically designed for rollbacks of Kubernetes rolling updates:

Description and screenshots:
  • A user can configure which version to roll back to (for example, the last version deployed to production). [screenshot]
  • For a proper rollback, we need to roll back:
    • Kubernetes config
    • Docker images
    • gitlab-ci.yml
  • View of the Kubernetes clusters. [screenshot]

Permissions and Security

Documentation

Availability & Testing

What does success look like, and how can we measure that?

What is the type of buyer?

Is this a cross-stage feature?

Links / references

https://www.spinnaker.io/guides/user/kubernetes-v2/automated-rollbacks/

Edited by Shinya Maeda