Commit ff51d234 authored by Steve Abrams

Replace SaaS Health Review with Operational Excellence meeting

parent 62edb066
+2 −2
@@ -296,7 +296,7 @@ The [team](/handbook/company/structure/#organizational-structure) involved is th
- Form the group of engineers working under the FCL. By default, it will be the owning team, but it could be a reduced group if there is not enough work for everyone.
- Plan and execute the FCL.
- Inform their manager (e.g. Senior Manager / Director) and Product counterpart that the team will focus efforts towards an FCL which may impact capacity planning.
-- Provides updates at the [SaaS Health Review](/handbook/engineering/infrastructure-platforms/saas-health-review).
+- Provides updates at the [Operational Excellence meeting](/handbook/engineering/infrastructure-platforms/operational-excellence).

Direct reports involved in an active [borrow](/handbook/product/product-processes/pm-procedures/#borrow) should be included if they were involved in the authorship or review of the change.

@@ -312,7 +312,7 @@ The following bulleted list provides a suggested timeline starting from incident
- Business day 1: open the FCL issue and begin planning. Request approval from VP of engineering if an FCL is not believed to be necessary.
- Business day 2-3: planning time
- Business days 2-9:  complete planned work
-- Business days 10-11:  closing ceremony, retrospective and report back to the SaaS Health Review
+- Business days 10-11:  closing ceremony, retrospective and report back to the Operational Excellence meeting

#### Activities

+2 −2
@@ -303,9 +303,9 @@ The call is recorded to the [Infrastructure Platforms Leads Demo Unfiltered Play
While the intention is for the call to be made public on GitLab Unfiltered, the default is for it to be published as private.
At the end of the call, a quick vote is held between the attendees and if all agree that the content is #SAFE, it can be published as public.

-##### SaaS Health Review
+##### Operational Excellence

-Engineering Managers from Infrastructure Platforms lead regular calls regarding SaaS Health. For more information, view the [SaaS Health Review page](./saas-health-review.md).
+Engineering leadership and Engineering Managers meet weekly to review the health and reliability of our systems. For more information, view the [Operational Excellence page](./operational-excellence.md).

### Requests for Help

+19 −0
---
title: "Operational Excellence"
description: "The Operational Excellence meeting is a weekly series for reviewing the health of our platforms"
---

## Operational Excellence

The Operational Excellence meeting is a weekly series where Engineering leadership and all Engineering Managers review the health and reliability of our systems, hold teams accountable for customer impact, and drive continuous improvement across the organization.

- **Cadence**: Weekly
- **Audience**: Engineering leadership and all Engineering Managers
- **Agenda**: Most recent agenda and dashboards can be found in the [Operational Excellence agenda document](https://docs.google.com/document/d/1WpiY-07KVXZx0wmgK1oz9i79-umODSJu64M2TOuKNwI/edit?usp=sharing).

### Meeting Objectives

- **Visibility**: Assess the overall health and reliability of our systems. Until we have a single dashboard that provides a unified reliability view of our system, we will review the Customer Metrics Dashboard, Pipeline Dashboard, SaaS Dashboard, and the infradev Tableau.
- **Accountability**: Review anomalous metrics, understand the customer impact, and discuss what is being done to mitigate that impact.
- **Learning**: Review two S1/S2 incident learnings from the previous weeks. Incidents will be picked by the SRE team.
- **Continuous Improvement**: Share operational wins and enforce preventive actions across teams.
+0 −33
---
title: "Infrastructure Platforms SaaS Health Review"
---

## SaaS Health Review

(Updated format from the SaaS Availability Weekly Call)

The SaaS Health Review is a set of regular meetings where the teams responsible for running GitLab.com and GitLab Dedicated share health information with the rest of the Engineering Department.

### Weekly Review

- **Meeting Owner**: Engineering Manager in Production Engineering or Dedicated
- **Audience**: Development Teams
- **Dashboard**: [SaaS Health Dashboard](https://saas-health-83948d.gitlab.io/)

The purpose of the meeting is to review incidents from that week and discuss critical time-bound problems.

The call will highlight the volume of preventative actions that are yet to be completed, but there is no discussion on specific items from the backlog.

### Monthly Review

- **Meeting Owner**: Senior EM or Director from Production Engineering or Dedicated
- **Audience**: Engineering Leadership Directs

The purpose of the meeting is to show incident trends for each group and to show trends in addressing the preventative actions.

### Quarterly Review

- **Meeting Owner**: Senior EM or Director from Production Engineering or Dedicated
- **Audience**: Engineering Leadership Team (ELT)

The purpose of the meeting is to show quarterly trends in incidents and quarterly trends in how preventative actions are addressed by department.