Documentation and process improvements
While working on https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/19146, we found that the original issue was much larger, and had many more related follow-up items, than a single issue warrants. This epic consolidates, organizes, and coordinates that effort.

We have multiple areas of documentation that we'd like to improve, and other resources that need to be organized. The items below are intended to kickstart that process. We will create child epics and issues to refine this list into smaller pieces that we can schedule and complete in reasonable timeframes.

## Onboarding

We need a getting-started page for new team members that supplements the existing onboarding issue templates with guidance and additional context.

1. [ ] Review existing onboarding/offboarding issue templates to avoid duplicating topics or task items
   1. [ ] Consider notes from https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/19146#note_1350359925 for general structure/approach
2. [ ] Identify links to additional documentation, resources, etc. for services and technology managed by reliability
3. [ ] Identify links to commonly used dashboards, applications, Slack channels, etc.
4. [ ] Provide guidance and resources on where to go for additional help or to learn more about our platform

## Accounts and access

While most of the requirements are documented in access request templates, we should consider either a "meta-reference" that gives an overview of which accounts and access levels the team has, or a review of the onboarding and service-centric documentation to ensure the same information is included there instead.

## Service information and resources

1. [ ] Iterate on the [service grid](https://about.gitlab.com/handbook/engineering/infrastructure/team/reliability/observability.html#services-we-own) on the team page
   1. [ ] Discuss/develop a service maturity framework to guide our efforts in this space
2. [ ] Consolidate, relocate, and organize blueprints, readiness reviews, architecture documentation, general workflow/process documentation, links to vendor documentation, scripts, tooling, and operational runbooks
3. [ ] Set up a subgroup and project structure for resources the team uses and maintains to build and manage our systems and infrastructure

## Workflow and general processes

1. [ ] Incorporate [automated roll-up status reporting](https://about.gitlab.com/handbook/engineering/infrastructure/team/reliability/issues.html#epics) into our standard processes
2. [ ] Document the MR review process and standards
   1. [ ] Add/update `CODEOWNERS` files in all relevant projects
   2. [ ] Use assigned reviewers (vs. review requests in Slack)