Implement the concept of Standing Squads to the Reliability Team

Anthony Fappiano requested to merge afappiano-standing-squads into master

Why is this change being made?

A big challenge for SREs, new and experienced alike, is understanding all of the different areas that the team is responsible for. To address this, the Reliability Team has been exploring options for deploying SREs in a more specialized way. A review of our backlog makes it clear that we have plenty of great ideas for improving our services, but we are rarely able to prioritize these issues in a way that ends up helping the organization.

Current Configuration

[Image: Screen_Shot_2022-09-06_at_11.35.04_AM]

Our current configuration, while allowing for flexibility, does have some downsides:

  • No built-in method for improving. Our only way to improve as it stands is to spin up a new project squad. We tend to do this only in reaction to things getting really bad.
  • Small improvements that could lead to toil reduction are never worked on.
  • Nothing within the current structure narrows the scope of responsibility for the Reliability Team in any way. All SREs on the Reliability Team are expected to have at least operational knowledge of every service, and this scope continues to grow.

Proposed Configuration

[Image: new_squads]

With this configuration we introduce the idea of an Observability Squad and a Reliability Foundations Squad. The goal is to realize the following benefits:

  • Standing resources dedicated to service improvement for Observability and Reliability Platform related work.
  • A narrower scope of responsibility for Reliability SREs. As we continue to grow the team, we should think about how to recalibrate the way alerts are handled across the team; this is the first step towards that.

Author Checklist

  • Provided a concise title for this Merge Request (MR)
  • Added a description to this MR explaining the reasons for the proposed change, per say why, not just what
    • Copy/paste the Slack conversation to document it for later, or upload screenshots. Verify that no confidential data is added.
  • Assign reviewers for this MR to the correct Directly Responsible Individual/s (DRI)
    • If the DRI for the page/s being updated isn’t immediately clear, then assign it to one of the people listed in the Maintained by section on the page being edited
    • If your manager does not have merge rights, please ask someone to merge it AFTER it has been approved by your manager in #mr-buddies
  • If the changes affect team members, or warrant an announcement in another way, please consider posting an update in #whats-happening-at-gitlab linking to this MR
    • If this is a change that directly impacts the majority of global team members, it should be a candidate for #company-fyi. Please work with internal communications and check the handbook for examples.

Edited by Anthony Fappiano
