SRE Infrastructure (Introduction) - Harish R
module-name: "SRE Introduction"
area: "Product Knowledge"
gitlab-group: "Enablement:Infrastructure"
maintainers:
- rhassanein
Title: "SRE Infrastructure (Introduction) - your-name"
Preferably, follow the order of the stages, but it's not mandatory.
Stage 1: Commit to learning about Site Reliability Engineering (5 minutes)
- Done with Stage 1
- Ping your manager on the issue to notify them you have started.
- Notify the team via one of the Support Channels.
Stage 2: Introduction to the Infrastructure Team (30 minutes)
- Done with Stage 2
This stage is an introduction to the Infrastructure team structure at GitLab.
SREs across divisions all share the same title (Site Reliability Engineer), which can be confusing because they perform different roles depending on the team they belong to.
Understanding what each team does before picking a learning path is vital for choosing a specialization/deep-dive that you really like.
- Watch this 10-minute video for a brief introduction to each of the sub-teams and their work.
- Review each team's page, KPIs and team members:
  - The Reliability Engineering team page.
  - The Delivery team page.
  - The Scalability team page.
Stage 3: GitLab.com Incidents (30 minutes)
- Done with Stage 3
- Watch this video for an overview of incidents at GitLab. The video addresses the following topics:
  - Definition of an incident.
  - Who attends the incidents?
  - What causes an incident?
  - Who creates incidents?
  - The Incident Room.
  - The Incident Lifecycle.
- Review the Incident Management documentation page for more details about each of the points above.
Stage 4: Issue tracking (~)
- Done with Stage 4
Spend as much time as you like reviewing the work of each of the sub-teams. Pick one or two issues for each team and read them in detail.
- Review the Incidents Board.
- Review the Project work board of the Core Reliability team.
- Review the Project work board of the Observability team.
- Review the Project work board of the Datastores team.
- Review the Project work board of the Delivery team.
- Review the Project work board of the Scalability team.
- Leave a comment below about an issue (or more!) that you found interesting, and describe why you think it's interesting.
Stage 5: Slack Channels (20 minutes)
- Done with Stage 5
There are a number of Slack channels that you can use to communicate with SREs or follow their work.
Review and, if interested, join the following channels:
- #infrastructure-lounge: all infra-related casual questions, also where SREs perform an async standup.
- Incidents channels:
  - #incident-management: a must-join!
  - #production
- Specialized channels:
  - Reliability team: #sre_coreinfra, #sre_observability and #sre_datastores
  - Delivery team: #g_delivery
  - Scalability team: #g_scalability
Final Stage
- Your manager needs to check this box to acknowledge that you have finished.