Commit 8cbb9280 authored by Rachel Nienaber

Create Production Engineering Page and move Observability team in

parent 252fe7a5
+2 −2
@@ -27,8 +27,8 @@ Long-term, we think of including Tamland in self-managed installations and think

### Background: Capacity planning for GitLab.com

-[Tamland](https://gitlab.com/gitlab-com/gl-infra/tamland) is an infrastructure resource forecasting project owned by the [Observability team](/handbook/engineering/infrastructure/team/observability/).
-It implements [capacity planning](../../../infrastructure/capacity-planning/) for GitLab.com, which is a [controlled activity covered by SOC 2](https://gitlab.com/gitlab-com/gl-security/security-assurance/security-compliance-commercial-and-dedicated/observation-management/-/issues/604).
+[Tamland](https://gitlab.com/gitlab-com/gl-infra/tamland) is an infrastructure resource forecasting project owned by the [Observability team](../../../infrastructure-platforms/production-engineering/observability/).
+It implements capacity planning for GitLab.com, which is a [controlled activity covered by SOC 2](https://gitlab.com/gitlab-com/gl-security/security-assurance/security-compliance-commercial-and-dedicated/observation-management/-/issues/604).
As of today, it is used exclusively for GitLab.com to predict upcoming SLO violations across hundreds of monitored infrastructure components.

Tamland produces a [report](https://gitlab-com.gitlab.io/gl-infra/tamland/intro.html) (internal link, hosted on GitLab Pages) containing forecast plots, information around predicted violations and other information around the components monitored.
+1 −1
@@ -333,7 +333,7 @@ See [this issue](https://gitlab.com/gitlab-com/gl-infra/gitlab-dedicated/team/-/

#### Observability

-In general, the lifecycle of observability components for cells will be owned by the [Observability team](/handbook/engineering/infrastructure/team/observability/).
+In general, the lifecycle of observability components for cells will be owned by the [Observability team](/handbook/engineering/infrastructure-platforms/production-engineering/observability/).

By default, each Dedicated tenant is provisioned with a fully functional Prometheus/Grafana stack. Cells will reuse this stack, with the intention of aggregating metrics so that queries can be run over multiple cells. More information can be found [here](https://gitlab-com.gitlab.io/gl-infra/gitlab-dedicated/team/engineering/observability/metrics.html).

+1 −1
@@ -466,7 +466,7 @@ An engineer might be assigned as a DRI to look into this.

The DRI is neither expected to determine a root cause nor propose a solution on their own.

-The DRI should instead reach out to the [Observability team](/handbook/engineering/infrastructure/team/observability/) for support.
+The DRI should instead reach out to the [Observability team](/handbook/engineering/infrastructure-platforms/production-engineering/observability/) for support.

## Async Issue Updates

+1 −1
@@ -201,7 +201,7 @@ Error budget events are attributed to stage groups via feature categorization. T

Updates to feature categories only change how future events are mapped to stage groups. Previously reported events will not be retroactively updated.

-The [Observability team](/handbook/engineering/infrastructure/team/observability/) owns keeping the mappings up to date when feature categories are changed in the website repository. When the categories are changed in `stages.yml`, a scheduled pipeline creates an issue ([example issue](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/2084)) on the [build board](https://gitlab.com/gitlab-com/gl-infra/scalability/-/boards/1697160). The issue contains the pipeline link and instructions to follow in the description. The categories need to be synced to two places:
+The [Observability team](/handbook/engineering/infrastructure-platforms/production-engineering/observability/) owns keeping the mappings up to date when feature categories are changed in the website repository. When the categories are changed in `stages.yml`, a scheduled pipeline creates an issue ([example issue](https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/2084)) on the [build board](https://gitlab.com/gitlab-com/gl-infra/scalability/-/boards/1697160). The issue contains the pipeline link and instructions to follow in the description. The categories need to be synced to two places:

1. The [Rails application](https://docs.gitlab.com/ee/development/feature_categorization/#updating-configfeature_categoriesyml).
1. The [Runbooks repository](https://gitlab.com/gitlab-com/runbooks/-/blob/master/services/stage-group-mapping.jsonnet).
+2 −2
@@ -310,7 +310,7 @@ All team members are encouraged to schedule time for personal development. The f
|-------|-------|
| SaaS Platforms | [Product direction](https://about.gitlab.com/direction/saas-platforms/) |
| Delivery Group | [Delivery Group](/handbook/engineering/infrastructure-platforms/gitlab-delivery/delivery/) |
-| Production Engineering Group| [Production Engineering](/handbook/engineering/infrastructure/team/production-engineering/) |
+| Production Engineering Group| [Production Engineering](/handbook/engineering/infrastructure-platforms/production-engineering/) |
| Dedicated Group | [Dedicated Group](/handbook/engineering/infrastructure/team/gitlab-dedicated/) |
| Tenant Scale | [Group Page](/handbook/engineering/infrastructure-platforms/tenant-scale/) |
