Investigate registry corruption

We regularly receive alerts for Docker clients attempting to pull manifests or layers that either don't exist or are corrupted in GCS (often stored as empty files). See https://gitlab.com/gitlab-com/runbooks/blob/master/troubleshooting/gitlab-registry.md for a description of common problems.
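As a starting point for spotting corrupted objects, registry blobs are content-addressed, so a blob's bytes can be checked against the `sha256:<hex>` digest it is stored under. The sketch below is a minimal illustration, not our tooling: `check_blob` is a hypothetical helper, it assumes the blob bytes have already been fetched from storage, and it treats any zero-byte blob as suspicious (an assumption based on the empty files we tend to see).

```python
import hashlib


def check_blob(digest: str, data: bytes) -> str:
    """Classify a blob as 'ok', 'empty', or 'digest_mismatch'.

    `digest` is the content digest the blob is addressed by, in the
    form 'sha256:<hex>'. `data` is the raw bytes read from storage.
    """
    # Assumption: a zero-byte manifest or layer is treated as corrupt,
    # matching the empty files described in this issue.
    if not data:
        return "empty"

    algorithm, _, expected_hex = digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")

    # Recompute the digest from the stored bytes and compare.
    actual_hex = hashlib.sha256(data).hexdigest()
    return "ok" if actual_hex == expected_hex else "digest_mismatch"
```

Running a check like this over the objects named in an alert would at least tell us whether the stored bytes are empty, altered, or intact.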

Our approach needs to be two-fold:

  1. Ensure that true client errors are distinguished from server errors (i.e. that 404s are not counted in our error SLO metrics)
  2. Investigate the origin of the corruption. This might require opening a CE ticket once we've ruled out an infra cause.
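To make point 1 concrete, here is a rough sketch of an error ratio that counts only 5xx responses against the SLO. The function names and the three-way split are illustrative assumptions, not how our metrics are currently defined.

```python
from collections import Counter
from typing import Iterable


def classify_status(status: int) -> str:
    """Bucket an HTTP status code for SLO purposes."""
    if 500 <= status <= 599:
        return "server_error"
    if 400 <= status <= 499:
        return "client_error"
    return "success"


def error_ratio(statuses: Iterable[int]) -> float:
    """Fraction of requests that count against the error SLO.

    Only server errors are counted; client errors such as 404 stay in
    the denominator but are excluded from the numerator.
    """
    counts = Counter(classify_status(s) for s in statuses)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts["server_error"] / total
```

Note that the status code alone cannot separate a 404 for an image that never existed from a 404 caused by a missing object in storage, so the second case still needs the investigation described in point 2.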

This is a bit of a placeholder and probably needs to be expanded on.

cc @craig @cmiskell who have troubleshot many of these (@council-of-craigs?).

cc @ahanselka as current on-call

Edited by Craig Furman