SRE Infrastructure (Introduction) - Harish R

module-name: "SRE Introduction"
area: "Product Knowledge"
gitlab-group: "Enablement:Infrastructure"
maintainers:
  - rhassanein

Title: "SRE Infrastructure (Introduction) - your-name"

Follow the stages in order where possible, though this is not mandatory.

Stage 1: Commit to learning about Site Reliability Engineering (5 minutes)

  • Done with Stage 1
  1. Ping your manager on the issue to notify them you have started.
  2. Notify the team via one of the Support Channels.

Stage 2: Introduction to the Infrastructure Team (30 minutes)

  • Done with Stage 2

This stage is an introduction to the Infrastructure team structure at GitLab.

SREs from different divisions all use the same title (Site Reliability Engineer). This can be confusing, because they perform different roles depending on which team they belong to.

Understanding what each team does before picking a learning path is vital for choosing a specialization/deep-dive that you really like.

Stage 3: GitLab.com Incidents (30 minutes)

  • Done with Stage 3

  • Watch this video for an overview of incidents at GitLab. The video covers the following topics:

    • Definition of an Incident
    • Who attends the incidents?
    • What causes an incident?
    • Who creates incidents?
    • The Incident Room.
    • The Incident Lifecycle.
  • Review the Incident Management documentation page for more details about each of the points above.

Stage 4: Issue tracking (~)

  • Done with Stage 4

Spend as much time as you like reviewing the work of each of the sub-teams. Pick one or two issues for each team and read them in detail.

Stage 5: Slack Channels (20 minutes)

  • Done with Stage 5

There are a number of Slack channels that you can use to communicate with SREs or follow their work.

Review and, if interested, join the following channels:

  • #infrastructure-lounge: all infra-related casual questions, also where SREs perform an async standup.
  • Incidents channels:
    • #incident-management: a must-join!
    • #production
  • Specialized channels:
    • Reliability team: #sre_coreinfra, #sre_observability & #sre_datastores
    • Delivery team: #g_delivery
    • Scalability team: #g_scalability

Final Stage

  • Your manager needs to check this box to acknowledge that you have finished.
Edited by Harish Ramachandran