2022 in numbers: GitLab.com deployment and Delivery team overview
2022 has been a year of growth for the Delivery group. We welcomed four new team members and re-organized into two independent teams with a shared Delivery mission.
As well as forming teams and figuring out how best to support the Delivery domain, we've continued to iterate on our deployment and release processes and successfully lowered Mean Time To Production (MTTP) even as the number of changes from across GitLab keeps growing. This issue, similar to the 2021 review, takes a quick look back at our year.
GitLab.com deployments in 2022
A combination of faster deployments, fewer deployment blockers, and improved deployment scheduling allowed us to deploy over 1000 times during 2022! August was our busiest month with 113 deployments, closely followed by November with 109. Both months massively exceeded our busiest 2021 month - August with just 80 deployments.
The increased number of deployments again translated into MTTP improvements and we managed to achieve our lowest-ever MTTP twice during 2022 - first in February with 15.96hrs, then improved again during November with 14.7hrs.
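As a rough illustration of how a figure like this can be derived, the sketch below assumes MTTP is the mean elapsed time between a change being merged and that change reaching Production. The function name and the sample timestamps are hypothetical and are not taken from our tooling.

```python
from datetime import datetime
from statistics import mean


def mean_time_to_production_hours(changes):
    """Return the mean merge-to-production time in hours.

    `changes` is an iterable of (merged_at, deployed_at) datetime pairs.
    """
    durations = [
        (deployed_at - merged_at).total_seconds() / 3600
        for merged_at, deployed_at in changes
    ]
    if not durations:
        raise ValueError("no changes to measure")
    return mean(durations)


# Hypothetical sample data, not real deployment records.
sample_changes = [
    (datetime(2022, 11, 1, 9, 0), datetime(2022, 11, 1, 21, 0)),   # 12 hrs
    (datetime(2022, 11, 1, 15, 0), datetime(2022, 11, 2, 7, 30)),  # 16.5 hrs
    (datetime(2022, 11, 2, 8, 0), datetime(2022, 11, 2, 23, 0)),   # 15 hrs
]

mttp = mean_time_to_production_hours(sample_changes)
print(f"MTTP: {mttp:.2f} hrs")
```

Because this is a plain mean, a single change that waits a long time for a deployment slot pulls the figure up noticeably, which is one reason blocked deployments show up so clearly in the monthly numbers.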
Going deeper into MTTP
MTTP is affected by three factors and improvements across all three have allowed us to remain close to our 12hr target:
- Deployment pipeline duration - the time it takes to deploy a change affects how many deployments can take place each day. In 2022 we reduced pipeline duration by re-ordering the pipeline and by separating post-deploy migrations for better control. We're still working on metrics to track this accurately, but the existing Deployment SLO dashboard shows the drop from ~6hrs per deployment down to ~5hrs or less, with considerably fewer spikes from deployments being delayed by some kind of failure or incident. As an SLO we're seeing over 95% of pipelines complete in under 8hrs.
Source dashboard
- Deployment availability - excellent GitLab.com availability and changes to incident severity use have increased the amount of time available for deployments. In Delivery, we've been tracking the duration and root cause of deployment blockers on this epic. Unsurprisingly, months with fewer blockers have higher numbers of deployments and also show lower MTTP.
- Deployment frequency - improved deployment scheduling combined with shorter deployments and more deployment time has increased the number of deployments. As a proxy measure of how well our deployment scheduling is working, we track how many tagged packages actually get deployed to Production. Ideally, we would be deploying 100% of tagged packages, meaning there would be no wasted time or resources creating packages that we don't use. The graph below shows that in recent weeks most packages have been deployed.
Source dashboard
All of these measures tell us when we're being delayed but not always why. To bring these separate measures together and help us decide our next steps for MTTP we're working to improve deployment pipeline observability.
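Two of the proxy measures described above, the share of pipelines finishing inside the 8hr SLO and the share of tagged packages that reach Production, can be sketched as simple ratios. This is a minimal sketch under stated assumptions: the function names, package names, and sample durations are hypothetical and do not reflect how the dashboards compute these values.

```python
def share_within_slo(durations_hours, threshold_hours=8.0):
    """Return the fraction of pipeline durations below the SLO threshold."""
    if not durations_hours:
        raise ValueError("no pipeline durations supplied")
    within = sum(1 for d in durations_hours if d < threshold_hours)
    return within / len(durations_hours)


def package_utilization(tagged, deployed):
    """Return the fraction of tagged packages that were actually deployed."""
    tagged_set = set(tagged)
    if not tagged_set:
        raise ValueError("no tagged packages supplied")
    return len(tagged_set & set(deployed)) / len(tagged_set)


# Hypothetical sample data, not real pipeline or package records.
durations = [4.8, 5.1, 5.0, 9.2, 4.6, 5.3, 4.9, 5.2, 5.0, 4.7]
tagged_packages = ["pkg-a", "pkg-b", "pkg-c", "pkg-d"]
deployed_packages = ["pkg-a", "pkg-b", "pkg-d"]

slo_share = share_within_slo(durations)
utilization = package_utilization(tagged_packages, deployed_packages)
print(f"Pipelines under 8hrs: {slo_share:.0%}")
print(f"Tagged packages deployed: {utilization:.0%}")
```

Neither ratio explains why a pipeline was slow or why a package was skipped, which is the gap that better pipeline observability is meant to close.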
Releases for self-managed users
During 2022 we published over 72 releases (December numbers to be added). This is about 25% fewer than the 97 releases in 2021 and suggests that our monthly releases are of higher quality and need less patching.
Numbers of releases, compared to 2021:
Month | 2021 | 2022 |
---|---|---|
January | 9 | 5 |
February | 9 | 10 |
March | 11 | 10 |
April | 7 | 2 |
May | 3 | 6 |
June | 8 | 9 |
July | 11 | 9 |
August | 11 | 5 |
September | 9 | 5 |
October | 6 | 4 |
November | 7 | 10 |
December | 6 | 4 |
Project highlights
2022 has also been an interesting year for Delivery projects. We've worked on our platform, tools, and processes to improve speed, reliability, safety, and usability. Rather than attempt to detail everything we've worked on this year, this section highlights a few of our bigger projects:
GitLab.com Kubernetes migration
The Kubernetes migration has had another constructive year despite not being quite as visible as in previous years. With the stateless service migration completed at the end of 2021 we started this year collaborating with Scalability to test the feasibility of migrating Redis to Kubernetes. The results were positive and Scalability has been working on the migration. Make sure to read the Scalability Year in Review issue for all the details on the Redis migrations.
Camoproxy was the only service we ended up migrating this year. It was a small one and turned out to be an excellent training ground for several newer Delivery team members.
Alongside the hands-on work we've been evaluating other stateful services to decide whether they should be migrated or not. Gitaly and Praefect were the main focus areas as we hoped to reduce operating effort, remove some deployment complexity and possibly reduce deployment times. After a thorough evaluation with the Gitaly team, Quality, and Distribution, we decided not to include them in this migration because of other planned Gitaly architecture changes. With this decision made we're finally able to conclude the Kubernetes Migration Working Group and consider the GitLab.com migration to Kubernetes complete. Well done everyone, this has been a huge collaborative effort and a great outcome.
For a detailed list of the work and the services migrated, please look at the GitLab.com on Kubernetes epic.
Deployment pipeline re-order to support mixed-version testing
Improving mixed-version testing to reduce the chance of GitLab.com users experiencing problems caused by canary deployments was a huge cross-team effort. Quality provided guidance and new tests, Reliability created a new Staging-ref environment, and in Delivery, we created a new Staging Canary environment and re-ordered the deployment pipeline to allow a more realistic rollout to be tested on staging.
Deployment pipeline re-ordering is delicate work and we had relatively few rollback options as we were still running the normal deployment schedule alongside the re-order work. With some careful testing and a cautious rollout we successfully moved feature flags and other manual testing from Staging to Staging Canary, and added tools and processes to keep Staging and Production versions in sync for testing. One happy side-effect of the new deployment setup is a shorter MTTP; the new order, with some parallel stages, removed 1 hour of deployment time.
Separating post-deploy migrations away from deployments
Continuing with the theme of reducing MTTP, we decided to separate the post-deploy migrations from the main GitLab.com deployment pipeline. Post-deploy migrations are the point of no return for a deployment: once they have been executed on Production, we cannot roll back if we have problems. By splitting them out of the main deployment pipeline we gave ourselves more control over when we run the migrations and created a better chance of being able to roll back in a time of need. The new pipelines also reduced the impact of long-running post-deploy migrations on deployment frequency and reduced deployment duration by ~1hr.
Supporting other teams
Throughout 2022 Delivery has also been supporting other teams with their projects. Our collaboration with group::package registry continued the work from 2021 to test and migrate the Container Registry to use a new database. The project had many phases and both teams worked together brilliantly to successfully complete the migration.
In a different style of project, but with no less collaboration, we worked with group::source code to roll out the ssh-d service. The rollout turned out to be much harder than expected, and with limited Staging testing we were challenged to safely roll the service out to Production. In the end, we did succeed and the lessons learned are helping to shape an improved process for service onboarding.
What's coming up in 2023?
- Release Managers have worked incredibly hard to guarantee releases and to improve MTTP, but it comes at a cost. In 2023 we'll be reviewing our tools and processes to simplify things and reduce workload and stress. The first step towards this, figuring out a metric to measure and track our effort, is already in progress.
- Maintenance Policy extension and self-serve deployment and release options - extending support for bug fixes to cover the previous three release versions to match with security fixes. The solution to enable this is our first step towards supporting self-serve release processes. We'll be looking for more of these throughout 2023 and building on our Independent Deployment blueprint to allow Stage groups to have more control over their development processes.
- Improved deployment strategies - since completing the migration to Kubernetes we've been working on better cluster management to enable more flexible deployment strategies. In 2023 we'll be aiming to use this to provide a way for experimental features to be safely tested with real traffic.
- Metrics - to support all of this work, and hopefully to set things up for next year's Year in Review, we'll be extending and improving our Delivery group metrics. We're already working on adding better deployment pipeline observability and discussing ways to extend our team Performance Indicators (PIs) beyond MTTP to cover all of Delivery group's responsibilities.
Thank you!
Thanks to everyone who has contributed to this work! I think the length of this issue, and the difficulty in keeping it only to this length, is a huge testament to what we've achieved this year. Here's to an even bigger 2023!