Commit b9d31096 authored by Alex Ives

Updates to database excellence stage

parent 1bc5590f
+84 −8
@@ -5,20 +5,96 @@ description: "The Database Excellence section ensures GitLab's databases run rel

## Mission

Keep GitLab's databases running reliably through proactive health management, operational excellence, and strategic enablement. We maintain operational runway by identifying and mitigating saturation points, operate infrastructure with automated and scalable processes, and provide tools and frameworks that help teams build features sustainably.
Keep GitLab's databases running reliably through proactive health management, operational excellence, and strategic enablement. We maintain operational runway by identifying and mitigating saturation points, operate infrastructure with automated and scalable processes, and provide tools and frameworks that help teams build features sustainably. While our primary focus is GitLab.com, we are expanding our scope to provide database health frameworks and tooling that benefit self-managed customers as well.

## Groups

This stage consists of the following group:
This stage consists of the following groups:

### Database Frameworks
### Database Architecture

The [Database Frameworks](/handbook/engineering/data-engineering/database-excellence/database-frameworks/) group manages the rails application code that interfaces and communicates with our database systems.
The [Database Architecture](/handbook/engineering/data-engineering/database-excellence/database-architecture/) group enables teams to build sustainably with data by providing decision frameworks for data placement, data growth controls, and coordinating the database review process across all datastores.

{{< group-by-slugs praba.m7n irina.bronipolsky jon_jenkins krasio l.rosa mattkasa maximeorefice stomlinson allisonbrowne  >}}
Priorities:

### Database Operations
* Enabling teams to make sustainable data architecture decisions
* Preventing database performance issues before they reach production
* Establishing and maintaining data lifecycle best practices

The [Database Operations](/handbook/engineering/data-engineering/database-excellence/database-operations) group manages the infrastructure and automation that power gitlab.com's postgres databases.
{{< group-by-slugs alexander-sosna praba.m7n l.rosa maximeorefice vporalla >}}

{{< group-by-slugs alexives alexander-sosna bshah11 rhenchen.gitlab vporalla saadullah707  >}}
### Database Health

The [Database Health](/handbook/engineering/data-engineering/database-excellence/database-health/) group provides the monitoring, observability, and health frameworks that keep databases healthy across both GitLab.com and self-managed deployments, including shift-left identification of saturation points.

Priorities:

* Maintaining operational runway by proactively managing database saturation points
* Providing visibility into database health across all deployment types
* Optimizing database resource utilization and cost efficiency

{{< group-by-slugs alexives allisonbrowne irina.bronipolsky krasio meiyang rhenchen.gitlab stomlinson >}}

### Database Automation

The [Database Automation](/handbook/engineering/data-engineering/database-excellence/database-automation) group owns the automation frameworks, tools, and templates that make GitLab's Postgres databases easier to operate at scale — replacing manual, bespoke processes with standardized, repeatable automation. All three teams contribute automations, but Database Automation owns the frameworks and manages the planning load for infrastructure changes.

Priorities:

* Replacing manual database operations with standardized, automated processes
* Building reusable tooling for database provisioning, configuration, and upgrades
* Enabling reliable, repeatable database operations across deployment types

{{< group-by-slugs bshah11 saadullah707 mattkasa jon_jenkins >}}

### Previous Teams

Previously, this stage consisted of two teams: Database Frameworks and Database Operations. These teams had a very large and overlapping scope covering our production database systems, but had different tools at their disposal. This caused difficulty in two respects: the teams would pursue separate projects with the same goals but different tools, and each team had more scope than it could reasonably plan for or accomplish.

In Q1 of FY27, we reorganized the teams into their current structure in order to accomplish a few things:

* Narrow each team's scope to prevent fatigue from jumping between projects and areas
* Provide more management support, allowing the teams to grow beyond their current size limitations
* Expand the department's overall scope to include topics that impact self-managed customers

#### Database Frameworks

The [Database Frameworks](/handbook/engineering/data-engineering/database-excellence/database-frameworks/) group managed the Rails application code that interfaces and communicates with our database systems.

#### Database Operations

The [Database Operations](/handbook/engineering/data-engineering/database-excellence/database-operations) group managed the infrastructure and automation that power GitLab.com's PostgreSQL databases.

## How We Work

Each team within Database Excellence is composed of a mix of backend engineers and reliability engineers (SRE/DBRE). The balance varies by team — Database Architecture and Database Health are primarily backend engineers, while Database Automation is primarily reliability engineers — but every team has both disciplines represented.

While each team has a distinct focus area, several responsibilities are shared across the entire stage. Database reviews are coordinated by Database Architecture but staffed by members of all three teams. On-call rotations draw from reliability engineers across the stage. Operational needs such as saturation mitigation and incident response are distributed across all teams rather than owned by any single group. Infrastructure management and database upgrades are also shared across teams, because the regional distribution of the three groups across AMER, EMEA, and APAC makes follow-the-sun coverage possible. This shared model ensures that operational knowledge stays broad and no single team becomes a bottleneck.

## Requesting Help

### Support Escalations

TBD

### Reliability Requests

TBD

### Tier-2 On-Call

[Database Tier-2](/handbook/engineering/infrastructure-platforms/incident-management/on-call/tier-2/#database-operations-dbo) is staffed as a 24/5 rotation, with team members responding on a "Best Effort" basis. This means pages to this rotation may occasionally go unacknowledged. The limited availability of database operators has made it difficult to commit beyond that.

We may revisit this rotation in FY27-Q2 in response to the recent reorganization.

### Long-Term Stable Counterpart or Reviewer Requests

Longer-term requests, such as for a stable counterpart or reviewers, are handled at the stage level. Submit these requests as a [counterpart request](https://gitlab.com/gitlab-org/database-team/team-tasks/-/work_items/new?description_template=counterpart_request).

### Triage Rotations

TBA

## Planning Process

TBA
+24 −0
---
title: "Database Architecture Team"
description: "The Database Architecture team enables GitLab engineering teams to build sustainably with data by providing decision frameworks for data placement, data growth controls, and coordinating the database review process across all datastores."
---

The Database Architecture team is the result of a split of the [Database Frameworks Team](/handbook/engineering/data-engineering/database-excellence/database-frameworks/).

## Mission

Enable teams to build sustainably with data by providing clear guidance, best practices, and frameworks for data placement, retention, and lifecycle management. We ensure that architectural decisions prevent future technical debt and support GitLab's long-term scalability across all datastores.

## Scope

The Database Architecture team is responsible for:

* **Data placement frameworks** — Providing decision frameworks and guidance that help teams choose the right datastore for their needs across PostgreSQL, Redis, OpenSearch, ClickHouse, Object Storage, and other application datastores.
* **Data growth controls** — Building frameworks and tooling that help teams manage data growth proactively, including retention policies, lifecycle management, and strategies for keeping database size sustainable over time.
* **Database review coordination** — Coordinating the database review process and supporting the database maintainer community. All three Database Excellence teams participate in reviews, but Database Architecture owns the process, tooling, and standards.

## Team

The team is composed primarily of backend engineers, with reliability engineers to help achieve its infrastructure and operational goals. Regardless of role, all team members share stage-level responsibilities, including database reviews, on-call rotations, and operational needs, alongside the other Database Excellence teams.

{{< group-by-slugs alexander-sosna praba.m7n l.rosa maximeorefice vporalla >}}
+27 −0
---
title: "Database Automation Team"
description: "The Database Automation team owns the automation frameworks, tools, and templates that make GitLab's Postgres databases easier to operate at scale, including configuration management, upgrade automation, and infrastructure provisioning."
---

The Database Automation team is the result of a reorganization of the [Database Operations Team](/handbook/engineering/data-engineering/database-excellence/database-operations/) and [Database Frameworks Team](/handbook/engineering/data-engineering/database-excellence/database-frameworks/).

## Mission

Replace manual, bespoke database operations with standardized, repeatable automation — transitioning GitLab's database infrastructure from individually managed systems to scalable, automated processes. The Database Automation team owns the automation frameworks, tools, and templates that make GitLab's PostgreSQL databases easier to operate at scale. While all three Database Excellence teams contribute automations and operational changes, Database Automation owns the underlying frameworks and manages the planning load for infrastructure changes.

Today, the team's primary focus is GitLab.com, with a longer-term goal of extending these capabilities to support Dedicated and self-managed deployments as well.

## Scope

The Database Automation team is responsible for:

* **Automation frameworks** — Owning the frameworks, tools, and templates that all three Database Excellence teams use to automate database operations, including managing the planning and prioritization of infrastructure changes.
* **Configuration management** — Standardizing and automating PostgreSQL configuration across clusters, replacing ad-hoc tuning with repeatable, version-controlled processes.
* **Upgrade automation** — Owning the tooling and frameworks that make PostgreSQL version upgrades safe, predictable, and increasingly automated. All three teams contribute to upgrade work using these frameworks.
* **Infrastructure provisioning** — Owning the patterns and tooling for creating and managing database clusters, replicas, and related infrastructure. All three teams contribute provisioning changes through standardized processes.

## Team

The team is composed primarily of reliability engineers, with backend engineers to help achieve its tooling and framework development goals. Regardless of role, all team members share stage-level responsibilities, including database reviews, on-call rotations, and operational needs, alongside the other Database Excellence teams.

{{< group-by-slugs bshah11 saadullah707 mattkasa jon_jenkins >}}
+24 −0
---
title: "Database Health Team"
description: "The Database Health team maintains operational runway for GitLab's databases through health monitoring, observability, shift-left saturation identification, and health frameworks for both GitLab.com and self-managed deployments."
---

The Database Health team is the result of a split of the [Database Frameworks Team](/handbook/engineering/data-engineering/database-excellence/database-frameworks/).

## Mission

Maintain operational runway for GitLab's databases by proactively identifying and mitigating saturation points before they impact customers. We provide the visibility, tooling, and frameworks that keep databases healthy across both GitLab.com and self-managed deployments.

## Scope

The Database Health team is responsible for:

* **Database health monitoring & observability** — Building and maintaining the dashboards, metrics, and monitoring systems that provide visibility into database health across GitLab.com and self-managed instances.
* **Shift-left saturation identification** — Developing tooling and processes that detect potential saturation points earlier in the development cycle, before they reach production. Identifying and mitigating active saturation points is a shared responsibility across all three Database Excellence teams.
* **Self-managed health frameworks** — Building frameworks that give self-managed customers insight into the health and operability of their GitLab database, bringing the same visibility available on GitLab.com to customer-managed environments.

## Team

The team is composed primarily of backend engineers, with reliability engineers to help achieve its infrastructure and operational goals. Regardless of role, all team members share stage-level responsibilities, including database reviews, on-call rotations, and operational needs, alongside the other Database Excellence teams.

{{< group-by-slugs alexives allisonbrowne irina.bronipolsky krasio meiyang rhenchen.gitlab stomlinson >}}