Add GitLab.com strategy
@@ -33,13 +33,32 @@ GitLab.com is our multi-tenant SaaS platform, where we are able to offer the mos
GitLab currently has an availability target (SLO) of two 9s (99.80%). However, based on 3 years of historical data from August 2021 to August 2024, we achieve an average of three 9s (99.94%) in our availability SLI. Three 9s is the high end of availability targets set for complex SaaS applications across the industry including applications like GitHub. In order to win against GitHub in the long term, we need to increase our availability to be best in class across the industry.
GitLab currently has an availability target (SLO) of two 9s (99.80%). However, based on [3 years of historical data](https://handbook.gitlab.com/handbook/engineering/monitoring/#historical-service-level-availability) from August 2021 to August 2024, we achieve an average of three 9s (99.94%) in our availability SLI. Three 9s is the high end of availability targets set for complex SaaS applications across the industry including applications like GitHub. In order to win against GitHub in the long term, we need to increase our availability to be best in class across the industry.
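To make the gap between the target and the observed figure concrete, the two percentages can be converted into an allowed-downtime budget. The following is a minimal sketch rather than how GitLab's SLI is actually computed: the `downtime_minutes` helper and the 30-day window are assumptions chosen only for illustration.

```python
def downtime_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Return the minutes of unavailability implied by an availability
    percentage over a period. The 30-day default is an illustrative
    assumption, not a window defined by the source document."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # The two figures quoted above: the SLO target and the 3-year SLI average.
    for label, pct in [("SLO target", 99.80), ("3-year SLI average", 99.94)]:
        minutes = downtime_minutes(pct)
        print(f"{label}: {pct:.2f}% -> {minutes:.1f} min per 30 days")
```

Under these assumptions, the 99.80% target allows roughly 86.4 minutes of unavailability per 30 days, while the 99.94% historical average corresponds to roughly 25.9 minutes.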
As stewards of GitLab.com, Delivery, Scalability & Production Engineering have a responsibility to ensure that GitLab.com remains operational and reliable every minute of every day. Shifting more things left in the SDLC is commonly accepted in software development as the most efficient way to deliver higher-quality production systems.
Globally improving our product and software development practices will result in a higher-quality product with fewer errors and incidents. That will enable teams to be more efficient with their time and bias work toward preventative measures rather than reactive, post-incident measures. In order to do this, we essentially have 2 levers:
In FY 25, much of our activity as stewards requires ICs to engage with teams and onboard onto a problem in order to drive toward an optimal outcome. This has two major drawbacks. First, the process is slow and places a high burden on all team members involved to build context and understanding. Second, because the process is not scalable, many items that should have been reviewed by our stewards slip through and eventually impact customers, in some cases in the same way as previous incidents ([INC-18003](https://gitlab.com/gitlab-com/Product/-/issues/13406#note_1936034718), [INC-18548](https://gitlab.com/gitlab-com/gl-infra/production/-/issues/18548)).
In order to reduce the number and impact of incidents and increase the quality of our product offerings at the same time, we have to increase the level of investment in shared tools that make it easy for teams at GitLab to build quality in from the first iteration. Since taking on full responsibility for the operations of GitLab's SaaS platforms, SaaS Platforms has gained full accountability for much of GitLab's infrastructure, including provisioning, deploying, operating, logging, metrics, observability, maintenance and disaster recovery. Many of these domains, such as logging and rate limiting, require tight integration and collaboration with teams developing GitLab features. For example, rate limits are easy to implement as a feature is introduced, but become far harder to introduce as a feature gains adoption.
Improvements in this area will likely be driven by having [availability metrics better reflect the user experience](https://about.gitlab.com/direction/saas-platforms/scalability/#availability-metrics-better-reflect-the-user-experience), [enabling experimental deployments](https://about.gitlab.com/direction/saas-platforms/delivery/#enable-experimental-deployments) and [release channels](https://about.gitlab.com/direction/saas-platforms/delivery/#release-channels-on-com) as well as [increasing the number of paved roads](https://about.gitlab.com/direction/saas-platforms/scalability/#paved-roads-are-the-default-for-all-team-members) for team members to traverse.
Runway introduced a new paradigm for delivering software to customers. In FY 25, Runway supported [multi-region deployments](https://docs.runway.gitlab.com/guides/multi-region/) in many locations across the world for customers that wanted to use our hosted AI services that power Duo. In FY 26, Runway will GA a new runtime, aligned with our long-term vision, that will allow easy delivery of services built on Runway to Self-Managed customers and expand to support more workloads across GitLab.
We expect this new paradigm to unlock new possibilities for stage teams and accelerate our time to value for new and innovative products like [GitLab Secrets Manager](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/secret_manager/). This will also drive opportunities to "Land" in categories where we previously looked to "Expand".
Along with this, GitLab.com teams will be responsible for operating and orchestrating [Cells](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/cells/infrastructure/), which presents an opportunity to further innovate on our overall multi-tenant product offering. Growth in this area will likely take the form of product offerings like per-cell/customer Geo, exclusive customer cells, private connections and/or private runners. Cells will be the foundation on which these new product offerings are integrated into GitLab.com, and SaaS Platforms teams must design future solutions with this in mind.
@@ -52,5 +71,10 @@ Example of what we are not doing: