2023 Delivery Group Impact Overview
Background
2023 brought several company and team changes. We welcomed a new team member and reorganized into a setup focused on Deployments (GitLab.com, Dedicated, and Cells) and on Releases for our internal and external (self-managed) customers.
Team changes
In November 2023, the Delivery Group reorganized into two teams, "Deployment" and "Releases" (official names are still under discussion; as we all know, Naming Things is Hard).
We are still defining the vision for these two teams, but first let's look at what they accomplished.
GitLab.com 2023 Deployments
A combination of faster deployments and improved deployment scheduling allowed us to deploy over 3090 times during 2023! August was our busiest month with 109 deployments, followed very closely by October with 108 and November with 106.
Our MTTP (Mean Time To Production) was lowest in March and October, at 17 hours and 16 hours respectively.
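To make the metric concrete, here is a minimal sketch of how an MTTP figure can be derived, assuming MTTP is measured per merge request as the elapsed time between merge and deployment to production, averaged over all merge requests. The function name and the sample timestamps are hypothetical and are not taken from our metrics pipeline.

```python
from datetime import datetime
from statistics import mean


def mean_time_to_production(merged_and_deployed):
    """Mean elapsed hours between merge and production deployment.

    Assumption: `merged_and_deployed` is a list of (merged_at, deployed_at)
    datetime pairs, one per merge request.
    """
    durations = [
        (deployed - merged).total_seconds() / 3600
        for merged, deployed in merged_and_deployed
    ]
    return mean(durations)


# Hypothetical sample: three merge requests and their deployment times.
samples = [
    (datetime(2023, 10, 2, 9, 0), datetime(2023, 10, 2, 23, 0)),  # 14 hours
    (datetime(2023, 10, 3, 12, 0), datetime(2023, 10, 4, 6, 0)),  # 18 hours
    (datetime(2023, 10, 5, 8, 0), datetime(2023, 10, 6, 0, 0)),   # 16 hours
]
print(mean_time_to_production(samples))  # 16.0
```

Because it is a mean, a single merge request stuck behind a long deployment blocker pulls the whole figure up, which is why blocked hours show up so directly in MTTP.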
Deployment Blockers
Deployment availability: in 2023, the number of hours that gprd (production) deployments were blocked increased by ~50% compared to 2022. This led to a higher MTTP overall, although improvements to other components of MTTP minimized the effect of this increase. In Delivery, we have been tracking the duration and root cause of 2023 deployment blockers on this epic. Unsurprisingly, months with fewer blockers show more deployments and a lower MTTP.
Deployment SLO
We greatly improved our Deployment SLO: currently, over 95% of deployments complete within our SLO of 8 hours.
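As an illustration of what the SLO percentage measures, the sketch below computes the share of deployments that finish within the 8-hour target. It assumes each deployment is reduced to a single duration in hours and that "within SLO" means at most 8 hours; the function name and the sample durations are hypothetical, not real 2023 data.

```python
def slo_compliance(durations_hours, slo_hours=8.0):
    """Percentage of deployments that finished within the SLO target."""
    if not durations_hours:
        raise ValueError("need at least one deployment")
    # Count deployments whose duration does not exceed the SLO threshold.
    within = sum(1 for d in durations_hours if d <= slo_hours)
    return 100.0 * within / len(durations_hours)


# Hypothetical durations (in hours) for 20 deployments; one breaches 8 hours.
durations = [5.5] * 19 + [9.25]
print(slo_compliance(durations))  # 95.0
```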
Releases for self-managed
During 2023, we published 53 releases, including monthly, patch, and security releases. This is 32% fewer than last year, with a clear downward trend in the final months of 2023 thanks to Delivery's efforts to establish a single planned release for bug and security fixes.
Month | 2022 | 2023 |
---|---|---|
January | 5 | 6 |
February | 10 | 2 |
March | 10 | 4 |
April | 2 | 6 |
May | 6 | 5 |
June | 9 | 8 |
July | 9 | 4 |
August | 5 | 5 |
September | 5 | 6 |
October | 4 | 2 |
November | 10 | 3 |
December | 4 | 2 |
Total | 76 | 53 |
Project Highlights
Release Environments
We built a new tool that creates complete environments for testing new releases covered by our maintenance policy, and this work is nearly complete. It allows us to quickly and easily spin up small GitLab environments running any built and packaged version of GitLab, in order to test deployments of code before and after release. The goal is full confidence that the stable branches are "green" and fully tested, from both a deployment and an application QA perspective.
Auto-Deploy KAS
Using existing tooling and patterns, we updated KAS so that it can be deployed without manual intervention from anyone on the Delivery team.
Build Pipeline observability foundations to extract data from our deployment pipelines
We augmented the deployment pipelines to collect metrics and traces, and built metrics for the key points of a complete auto-deploy pipeline. With this pipeline observability in place, we can now report on how our deployment mechanism performs and base future decisions on that data. It also lets us identify bottlenecks, inefficiencies, and negative trends, and act promptly before they significantly impact our deployments.
Release Date Switchover
For the first time in 10+ years, the monthly GitLab release moved off the 22nd of the month to a new cadence: the third Thursday of the month. A change like this required extensive adjustments, coordination with many stakeholders, and well-structured communication. Delivery tools and processes were adjusted to support the dynamic release date, and the switchover went smoothly with no issues.
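Moving to a dynamic release date means tooling can no longer hard-code the 22nd; it has to compute the date for each month. Below is a minimal Python sketch of that calculation, assuming the rule is simply "the third Thursday of the month" with no holiday exceptions. `third_thursday` is a hypothetical helper for illustration, not the actual Delivery tooling.

```python
import calendar
from datetime import date


def third_thursday(year: int, month: int) -> date:
    """Return the date of the third Thursday of the given month."""
    first_weekday = date(year, month, 1).weekday()  # Monday == 0
    # Number of days from the 1st to the first Thursday (0..6).
    offset = (calendar.THURSDAY - first_weekday) % 7
    # The third Thursday is two weeks after the first one.
    return date(year, month, 1 + offset + 14)


for month in (11, 12):
    print(third_thursday(2023, month))
# 2023-11-16
# 2023-12-21
```

Since the offset is at most 6, the result always falls between the 15th and the 21st, so it is valid in every month.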
Improve Security release automation to reduce release manager workload
To keep release management sustainable and protect the well-being of Release Managers, we automated parts of the security release process, reducing both workload and toil.
Adapted the Release Process to address customer needs
During the "releases" discussion (internal only) at the FY24Q1 Engineering offsite, on-demand security releases were highlighted as a desirable outcome to support FedRAMP SLAs.
This led to a series of efforts toward that overall goal.
Reduce Security release preparation to less than 24hrs
To allow for multiple security releases a month, we had to make our process and tooling more efficient. We decreased the active time spent on the early merge phase by 90% and reduced the wall time (start to finish) by 54%. Release Manager work decreased by 40%.
Automate combining bug fixes and security fixes into patch releases
As part of implementing the new release process, we built automation to combine bug fixes and security fixes into patch releases.
Pilot two scheduled security releases per month in preparation for planned releases
During the last quarter of 2023, we have been piloting the solution we designed and implemented. Through Adapt release process to address customer curre... (&1017 - closed), Reduce Security release preparation to less tha... (&1061 - closed), and Automate combining bug fixes and security fixes... (&1073 - closed), we updated release processes and tooling to support a more frequent release schedule, better meet bug and security SLAs, and get fixes to customers sooner.
Reduce downtime of Dedicated deployments
Cells (aka Tenant Scale) are the new direction GitLab is taking in the longer term. This means a different GitLab infrastructure organization, a different scale, and different capabilities to guarantee reliable deployments and rollbacks.
Experimentation to reduce downtime of Dedicated deployments
Delivery shifted its focus to Dedicated, onboarded rapidly, identified the possible causes of downtime during tenant upgrades, and laid out a plan to tackle them in the last quarter of the year.
Reduce Customer impact due to GitLab Dedicated Tenants upgrades
With the plan and Blueprint in place, the Delivery Group started implementing the new deployment solutions in the Dedicated stack.
Develop ability to dynamically route traffic within different cluster deployments
To support faster rollbacks and traffic routing methods that unlock new deployment capabilities, we invested time and effort in testing a Service Mesh for our Kubernetes infrastructure. Although we did not progress further in leveraging this work, we did jump-start the installation of a Service Mesh on our infrastructure.
Ruby and Rails Upgrades
During 2023, we safely upgraded GitLab.com to Ruby 3.0, Rails 7, and Ruby 3.1.
We applied the lessons of the Ruby 3.0 rollout, which needed a hard PCL (production change lock) and a team of 15+ engineers available over a 24-hour window. By the Rails 7 and Ruby 3.1 rollouts, no PCL was needed: they were rolled out through a normal CR (change request), with a team of three engineers simply monitoring dashboards.