Monitor Google Cloud Storage as an independent service
This stems from a discussion during an incident. Unfortunately, no issue was created during or after that incident, and I've been unable to find the incident itself, so I can't cross-link it. There are other incidents where this would have helped, though, such as production#3315 (closed).
Proposal
Treat Google Cloud Storage as its own service.
Why?
Google Cloud Storage is used by multiple services in GitLab. When GCS goes down, we receive alerts from several different systems and have to manually correlate them back to GCS.
"Extracting" GCS into its own service will help us quickly attribute problems to the upstream provider, so that we can open tickets with Google as soon as possible.
How?
- Add GCS to the Service Catalog (https://gitlab.com/gitlab-com/runbooks/blob/master/services/service-catalog.yml) and the Metrics Catalog (https://gitlab.com/gitlab-com/runbooks/blob/master/metrics-catalog/services/all.jsonnet)
- Move SLIs such as the Registry's `storage` SLI from the registry service over to the new GCS service: https://gitlab.com/gitlab-com/runbooks/blob/master/metrics-catalog/services/registry.jsonnet#L83-101
- Iteratively expand the monitoring to ensure that all services that monitor GCS are covered.
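
To make the intended effect concrete, here is a minimal Python sketch of the "move the SLI" step. It is only a toy model, not the runbooks' jsonnet format: the `catalog` dictionary, the `web` service, the `puma` and `object_storage` SLI names, and both helper functions are hypothetical and exist purely for illustration. Only the Registry's `storage` SLI comes from this issue.

```python
from collections import defaultdict

# Hypothetical, simplified stand-in for the metrics catalog: service -> SLI names.
# Only registry/storage is taken from the issue; everything else is invented.
catalog = {
    "registry": ["server", "storage"],
    "web": ["puma", "object_storage"],
}

# Hypothetical list of SLIs that actually measure GCS, as (service, sli) pairs.
GCS_BACKED = [("registry", "storage"), ("web", "object_storage")]


def extract_service(catalog, new_service, slis_to_move):
    """Return a copy of the catalog with the given SLIs moved under new_service."""
    result = {svc: list(slis) for svc, slis in catalog.items()}
    result.setdefault(new_service, [])
    for svc, sli in slis_to_move:
        result[svc].remove(sli)
        # Prefix with the originating service so the caller stays visible.
        result[new_service].append(f"{svc}_{sli}")
    return result


def alerting_services(catalog, failing_slis):
    """Group failing SLI names by the service that owns them."""
    owners = defaultdict(list)
    for svc, slis in catalog.items():
        for sli in slis:
            if sli in failing_slis:
                owners[svc].append(sli)
    return dict(owners)


if __name__ == "__main__":
    # Before: a GCS outage shows up as alerts on two unrelated services.
    print(alerting_services(catalog, {"storage", "object_storage"}))

    # After: the same outage is attributed to a single "gcs" service.
    extracted = extract_service(catalog, "gcs", GCS_BACKED)
    print(alerting_services(extracted, {"registry_storage", "web_object_storage"}))
```

In this model, a GCS outage before the extraction pages on `registry` and `web` separately. After it, the same failing SLIs are owned by one `gcs` service, which is the attribution this proposal is after.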