What we can learn from Spinnaker

Background

Spinnaker is an open source, multi-cloud continuous delivery platform for releasing software changes with high velocity and confidence. Created at Netflix, it has been battle-tested in production by hundreds of teams over millions of deployments. It combines a powerful and flexible pipeline management system with integrations to the major cloud providers.

Research

The goal of this issue is to conduct research and gain a better understanding of possible integrations and/or missing features.

We should also confirm whether Spinnaker has more default integrations and whether it is easier to customize deeply (for example, adding custom links in the UI).

We should also contact a couple of users who are using Spinnaker with GitLab instead of our CD solution. I think @bjung can get us in touch.

Outputs

The output of this research should feed into updates to the CD vision page as well as https://about.gitlab.com/devops-tools/spinnaker-vs-gitlab.html (https://gitlab.com/gitlab-com/marketing/product-marketing/issues/1085 is the associated marketing issue).

Screenshots

- Homepage
- Pipelines
- Deploy to staging: each stage is colored green once complete
- A manual judgement stage determines whether to continue with the deployment to production
- Automated rollback when the validation tests fail, plus a hot standby period after successful validation in case something unexpected happens
- The Kubernetes resources Spinnaker is managing can be viewed, with one-click access to the deployed app
- Pipeline execution windows: you can configure where and when stages are allowed to run; the window can be skipped to enforce deployment to production
- Automatic rollback: an error was discovered, which triggers the rollback pipeline
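The execution-window behavior in the screenshots can be sketched roughly as a time-of-day gate. This is a deliberately simplified model, not Spinnaker's actual implementation (real execution windows also support day-of-week restrictions and multiple ranges per stage):

```python
from datetime import datetime, time

def in_execution_window(now: datetime, start: time, end: time) -> bool:
    """Return True if `now` falls inside the allowed execution window.

    Simplified illustration of a per-stage execution window: the stage
    may only run when the current time of day is within [start, end].
    """
    return start <= now.time() <= end

# A deploy attempted at 03:00 is held until the 09:00-17:00 window opens:
ok = in_execution_window(datetime(2019, 7, 1, 3, 0), time(9, 0), time(17, 0))
```

With this model, skipping the window (as shown in the screenshots) is just bypassing the check to force the deployment through.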

Canary Analysis

- The baseline is saved (version, number of pods, etc.); then a copy of production and the canary are rolled out at the same time at the same scale, so the metrics can be verified on equal terms
- Adding a metric (integration with Datadog)
- You can configure real-time statistics (the common use case) or a retrospective analysis over time. Some canaries run for days (long-running) to check that there are no memory leaks. You can set a wait time before metrics are collected, to allow the setup to complete, and you can change the baseline version being compared against.
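The equal-scale comparison described above boils down to collecting the same metric from both the baseline and the canary and measuring how far they diverge. The following is a toy stand-in for that per-metric comparison; the function name and the plain relative-difference approach are illustrative assumptions, not Spinnaker's actual statistical algorithm:

```python
from statistics import mean

def metric_deviation(baseline: list[float], canary: list[float]) -> float:
    """Relative difference between the mean canary value and the mean
    baseline value for a single metric.

    Toy illustration of why baseline and canary must run at the same
    scale: the two series are only comparable on equal terms.
    """
    b = mean(baseline)
    return (mean(canary) - b) / b

# Canary latency running 10% above the baseline:
dev = metric_deviation([100, 102, 98], [110, 112, 108])
```

A real analysis would run a statistical test per metric and aggregate the results into a single score, which is what the thresholds below act on.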

If a threshold test fails, the canary stops immediately. If the score is in between, the canary continues but is not promoted to production. If it is green, it is promoted to production.
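That three-way decision can be sketched as a simple classification of the aggregate canary score. The 40/75 thresholds here are hypothetical illustration values, not Spinnaker defaults:

```python
def judge_canary(score: float, marginal: float = 40.0, passing: float = 75.0) -> str:
    """Map an aggregate canary score (0-100) to an outcome, mirroring
    the behaviour described above. Threshold values are hypothetical.
    """
    if score < marginal:
        return "fail"      # stop the canary immediately
    if score < passing:
        return "marginal"  # keep running, but do not promote to production
    return "pass"          # promote to production

# The 66% score mentioned below would land in the marginal band:
outcome = judge_canary(66.0)
```

With these assumed thresholds, a 66% score is neither a hard failure nor a pass, which is exactly the grey case that requires manual review.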

In the example the score was grey at 66% and needed manual review.

Failed example

Links

Edited by Orit Golowinski