
Monitor Google Cloud Storage as an independent service

This stems from a discussion during an incident. Unfortunately, the issue was not created during or after that incident, and I've been unable to find it, so I can't cross-link. There are other incidents where this would help, such as production#3315 (closed).

Proposal

Treat Google Cloud Storage as its own service.

Why?

Google Cloud Storage is used by multiple services in GitLab. When GCS goes down, we receive alerts from several different systems and have to manually correlate them back to GCS.

"Extracting" GCS into its own service will help us quickly attribute problems to the upstream provider, so that we can open tickets with Google as soon as possible.

How?

  1. Add GCS to the Service Catalog (https://gitlab.com/gitlab-com/runbooks/blob/master/services/service-catalog.yml) and the Metrics Catalog (https://gitlab.com/gitlab-com/runbooks/blob/master/metrics-catalog/services/all.jsonnet).
  2. Move SLIs, such as the Registry's storage SLI, from the registry service over to the new GCS service: https://gitlab.com/gitlab-com/runbooks/blob/master/metrics-catalog/services/registry.jsonnet#L83-101
  3. Iteratively expand the monitoring until all services that depend on GCS are covered.
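
One way to keep track of step 3 is a simple coverage check. The following is a minimal sketch, not part of the runbooks: the service names other than `registry`, both sets, and the helper `uncovered_consumers` are hypothetical and only illustrate the bookkeeping.

```python
# Hypothetical inventory of services known to depend on GCS.
# Only "registry" comes from this issue; the other names are placeholders.
GCS_CONSUMERS = {"registry", "service-a", "service-b"}

# Hypothetical set of services whose storage SLIs have already been
# moved under the new GCS service.
COVERED_BY_GCS_SERVICE = {"registry"}


def uncovered_consumers(consumers, covered):
    """Return, in sorted order, the GCS consumers not yet monitored via the GCS service."""
    return sorted(set(consumers) - set(covered))


if __name__ == "__main__":
    for name in uncovered_consumers(GCS_CONSUMERS, COVERED_BY_GCS_SERVICE):
        print(f"GCS monitoring still missing for: {name}")
```

Each iteration would move one more service from the uncovered list into the covered set, until the list is empty.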

cc @brentnewton @cmcfarland