GitLab.org / GitLab issue #35404

Automatic rollback in case of failure

Release notes

This feature is available in the GitLab Ultimate tier.

Problem to solve

As a developer, I want to ensure that a faulty deployment is never active on my production environment, so that I do not unnecessarily expose my users to a subpar experience caused by external dependencies.

Intended users

  • Sasha (Software Developer)
  • Devon (DevOps Engineer)

User experience goal

Proposal

If there is a problem with a deployment, the pipeline should perform an automatic rollback.

For this iteration (and this specific issue):

  1. Any critical alert on the environment will initiate a rollback.
  2. The user must opt in to this feature (see the setting defined in the acceptance criteria below).
  3. There will be only one rollback attempt per alert (to avoid an endless loop of rollbacks).

Acceptance criteria

  • Add a settings section, above Deploy freezes in the CI/CD project settings, where the user can turn automatic rollback on or off.
  • The section features a single checkbox with a label and help text, and a confirmation button.
  • Rollback is to the last successful deployment (this re-runs the pipeline of the last successful deployment).
  • Automatic rollbacks must be recorded in the audit log as an action performed by the pipeline.
  • For the first iteration, any critical alert triggers a rollback.
  • The confirmation button is a primary success button.
  • Copy of settings UI:
Automatic deployment rollbacks

Automatically roll back to the last successful deployment when a critical problem is detected.

- [ ] Enable automatic rollbacks
      Automatic rollbacks start when a critical alert is triggered. If the last successful deployment fails to roll back automatically, it can still be done manually. More information
Mockup (browser made)
Note: Copy might differ from the mockup, see acceptance criteria above

Out of scope for this issue:

  • Metrics will be defined in a dedicated YML file.
    • Only metrics defined in this YML file will initiate the automatic rollback, similar to common_metrics.yml: ./gitlab/dashboards.yml

Engineering scope

Weight estimate: 3

  • backend: Create a worker to trigger an automatic rollback, should a deployment fail.
  • frontend: Create a switch in the project settings so users can choose whether rollbacks are automatic or manual (as highlighted in the link posted above).
  • Documentation guidelines

Technical proposal

How the GitLab re-deploy feature behaves today

  • GitLab already knows the list of successful deployments on an environment.
  • GitLab can deduce the latest successful deployment from the deployment list. (Let's say Deployment-A)
  • GitLab can deduce the previous successful deployment from the deployment list and Deployment-A. (Let's say Deployment-B)
  • When Deployment-B is re-deployed, GitLab creates Deployment-C. Deployment-B and Deployment-C have the same metadata.
  • We can check whether a deployment was re-deployed: project.deployments.where(ref: deployment.ref, sha: deployment.sha).where.not(id: deployment.id).exists?
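The bookkeeping above can be illustrated with a minimal in-memory sketch in plain Ruby. It is not GitLab's code: the `Deployment` struct, the `:success` status symbol, and the helper names are hypothetical stand-ins for the real deployments table and ActiveRecord queries, and the list is assumed to be ordered newest first.

```ruby
# Hypothetical in-memory stand-in for a row of the deployments table.
Deployment = Struct.new(:id, :ref, :sha, :status, keyword_init: true)

# Successful deployments, assuming `deployments` is ordered newest first.
def successful(deployments)
  deployments.select { |d| d.status == :success }
end

# "Deployment-A": the latest successful deployment.
def latest_successful(deployments)
  successful(deployments).first
end

# "Deployment-B": the successful deployment just before Deployment-A.
def previous_successful(deployments)
  successful(deployments)[1]
end

# In-memory equivalent of the ActiveRecord check: another deployment
# (different id) with the same ref and SHA means it was re-deployed.
def redeployed?(deployment, deployments)
  deployments.any? do |other|
    other.id != deployment.id &&
      other.ref == deployment.ref &&
      other.sha == deployment.sha
  end
end

b = Deployment.new(id: 1, ref: "main", sha: "bbb", status: :success)
a = Deployment.new(id: 2, ref: "main", sha: "aaa", status: :success)
c = Deployment.new(id: 3, ref: "main", sha: "bbb", status: :success) # re-deploy of B
history = [c, a, b]
```

Excluding the deployment's own id matters: without it, every deployment would match itself and always look re-deployed.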

How we will extend

  • Auto Rollback happens only once when a new critical alert is raised.
  • We cannot simply re-deploy Deployment-B, because Deployment-A then becomes the previous successful deployment and therefore the next rollback target. The feature would keep deploying Deployment-A and Deployment-B alternately, which is probably not what we want. To illustrate:
    • Deployment-C (latest, same content as Deployment-B)
    • Deployment-A (previous successful)
    • Deployment-B (second-previous successful)
  • We need to persist the rollback history as follows:
    • deployments.auto_redeployed_by_id (FK): the ID of the deployment that initiated the automatic re-deploy.
  • Let's say there are two deployments Deployment-A and Deployment-B and a critical alert is raised on the environment.
    • What is the latest deployment? => Deployment-A
    • What is the previous successful deployment? => Deployment-B
    • Should GitLab re-deploy Deployment-B? => Yes
  • Let's say a new critical alert is raised on the environment again.
    • What is the latest deployment? => Deployment-C
    • What is the previous successful deployment? => Deployment-A
    • Should GitLab re-deploy Deployment-A? => No, because Deployment-C was triggered by Deployment-A. Next.
    • Should GitLab re-deploy Deployment-B? => No, because Deployment-C is identical to Deployment-B. Next.
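The walkthrough above can be expressed as a small target-selection function. This is a sketch under stated assumptions, not the actual worker: the `Deployment` struct is a hypothetical in-memory model carrying the proposed `auto_redeployed_by_id` column, "identical" is taken to mean the same ref and SHA, and "Next." is read as moving on to the next older successful deployment.

```ruby
# Hypothetical in-memory model; the real feature would read the
# deployments table, including the proposed auto_redeployed_by_id column.
Deployment = Struct.new(:id, :ref, :sha, :status, :auto_redeployed_by_id,
                        keyword_init: true) do
  def success?
    status == :success
  end

  # "Identical" is assumed to mean the same ref and SHA.
  def same_content_as?(other)
    ref == other.ref && sha == other.sha
  end
end

# Returns the deployment to re-deploy, or nil when no candidate qualifies.
# `deployments` must be ordered newest first.
def rollback_target(deployments)
  latest, *older = deployments.select(&:success?)
  return nil if latest.nil?

  older.find do |candidate|
    # Skip the deployment whose alert caused the latest auto re-deploy.
    next false if latest.auto_redeployed_by_id == candidate.id
    # Skip deployments whose content is already live.
    next false if candidate.same_content_as?(latest)

    true
  end
end

b = Deployment.new(id: 1, ref: "main", sha: "bbb", status: :success)
a = Deployment.new(id: 2, ref: "main", sha: "aaa", status: :success)
# Deployment-C: the automatic re-deploy of B, initiated by A.
c = Deployment.new(id: 3, ref: "main", sha: "bbb", status: :success,
                   auto_redeployed_by_id: a.id)
```

With only A and B, the first alert selects B. After C exists, the second alert skips A (it initiated C) and B (same content as C), so no rollback happens, matching the "only once" rule.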

Anti-race condition

  • If a deployment is running on the environment when a critical alert is raised, this feature does nothing (see "Constant Rollback" below).
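The guard can be sketched as a check that runs before any rollback is scheduled. The `Deployment` struct, the `:running` status symbol, and `maybe_auto_rollback` with its return symbols are hypothetical; GitLab's real deployment statuses and worker interface may differ.

```ruby
# Hypothetical in-memory stand-in for a deployment on the environment.
Deployment = Struct.new(:id, :status, keyword_init: true)

# True when a deployment is still running on the environment.
def deployment_running?(deployments)
  deployments.any? { |d| d.status == :running }
end

# Hypothetical entry point: ignore the alert while a deployment runs,
# so the rollback never races an in-flight deployment.
def maybe_auto_rollback(deployments)
  return :skipped if deployment_running?(deployments)

  :rollback_scheduled
end
```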

Which alerts are considered critical?

  • environment.alert_management_alerts has a severity column that takes the values critical: 0, high: 1, medium: 2, low: 3, info: 4, unknown: 5. Only an alert with critical severity triggers a rollback.
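The severity mapping above translates directly into a lookup table. The integer values come from the list above; the `Alert` struct and the helper names are hypothetical stand-ins for the real alert model.

```ruby
# Severity values as listed for alert_management_alerts.severity.
SEVERITIES = { critical: 0, high: 1, medium: 2, low: 3, info: 4, unknown: 5 }.freeze

# Hypothetical alert record holding the raw integer stored in the column.
Alert = Struct.new(:id, :severity, keyword_init: true)

def severity_name(alert)
  SEVERITIES.key(alert.severity)
end

# Only critical alerts trigger an automatic rollback in this iteration.
def triggers_rollback?(alert)
  severity_name(alert) == :critical
end
```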

Rollback range (Next iteration)

  • Rollback is a useful way to revert problematic code, but it also risks removing valid code or features and disrupting end users' requests.
  • Operators should be able to set a rollback range that groups backward-compatible deployments. Automatic rollback should stay within the current range and never cross it.
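One way to picture a rollback range is as a label shared by backward-compatible deployments, with rollback candidates filtered to the live deployment's label. This is purely hypothetical: no `rollback_range` field exists today, and the issue does not specify how ranges would be stored.

```ruby
# Hypothetical: rollback_range is an operator-assigned label; no such
# column exists today.
Deployment = Struct.new(:id, :sha, :rollback_range, keyword_init: true)

# Keep only rollback candidates from the same range as the live deployment,
# so an automatic rollback never crosses a range boundary.
def candidates_in_range(latest, older)
  older.select { |d| d.rollback_range == latest.rollback_range }
end

v1  = Deployment.new(id: 1, sha: "a1", rollback_range: "v1")
v2a = Deployment.new(id: 2, sha: "b2", rollback_range: "v2")
v2b = Deployment.new(id: 3, sha: "c3", rollback_range: "v2")
```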

Constant Rollback (Next iteration)

  • If a critical alert still exists X minutes after the automatic rollback, the next automatic rollback is triggered.
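The timing rule can be sketched as a predicate. The issue leaves X unspecified, so it is a parameter here; the method name and keyword arguments are hypothetical.

```ruby
# Hypothetical check for the next-iteration "Constant Rollback" rule.
# wait_minutes is X, which the issue does not fix.
def constant_rollback_due?(rolled_back_at:, critical_alert_open:, now:, wait_minutes:)
  return false unless critical_alert_open

  # Time - Time yields elapsed seconds as a Float.
  (now - rolled_back_at) >= wait_minutes * 60
end
```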

Future iterations

  • Last version to roll back to (including the .gitlab-ci.yml file)

Further details

Spinnaker supports a similar feature, specifically designed for rollbacks of Kubernetes rolling updates:

Screenshots (not reproduced here):
  • A user can configure which version to roll back to (for example, the last version deployed to production).
  • View of the Kubernetes clusters.

We should note that for a proper rollback, we need to roll back:
  • Kubernetes config
  • Docker images
  • .gitlab-ci.yml

Permissions and Security

Documentation

Availability & Testing

What does success look like, and how can we measure that?

What is the type of buyer?

Is this a cross-stage feature?

Links / references

https://www.spinnaker.io/guides/user/kubernetes-v2/automated-rollbacks/

Edited Dec 03, 2020 by Shinya Maeda