Investigate registry corruption
We regularly receive alerts for Docker clients attempting to pull manifests or layers that don't exist or are corrupted in GCS (often as empty files). See https://gitlab.com/gitlab-com/runbooks/blob/master/troubleshooting/gitlab-registry.md for a description of common problems.
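Registry blobs are content-addressed by their digest, so one way to confirm that a suspect object in GCS is corrupted is to re-hash its bytes and compare the result with the digest it is stored under. The sketch below assumes the blob bytes have already been downloaded and only handles `sha256:` digests; `verify_blob` is a hypothetical helper, not part of the registry or our tooling.

```python
import hashlib


def verify_blob(digest: str, data: bytes) -> str:
    """Classify a downloaded blob as 'ok', 'empty', or 'mismatch'.

    Hypothetical helper: `digest` is the registry digest (e.g. "sha256:<hex>")
    and `data` is the raw object content fetched from storage.
    """
    algo, _, expected = digest.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algo}")
    actual = hashlib.sha256(data).hexdigest()
    if actual == expected:
        return "ok"
    # A zero-length object whose digest doesn't match is the
    # "empty file" form of corruption described above.
    if len(data) == 0:
        return "empty"
    return "mismatch"
```

Distinguishing "empty" from "mismatch" may help separate truncated uploads from other kinds of damage when looking for the origin of the corruption.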
Our approach needs to be two-fold:
- Ensure that true client errors are distinguished from server errors (i.e. that 404s are not counted in our error SLO metrics)
- Investigate the origin of the corruption. This might require opening a CE ticket once we've ruled out an infra cause.
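For the first point, a minimal sketch of one possible classification policy: only 5xx responses count against the error SLO, while 4xx responses (including 404s) remain in the request total but are not treated as errors. The function names and the decision to keep 4xx in the denominator are assumptions for illustration. Note that this treats every 404 as a client error, so 404s caused by blobs that are genuinely missing from GCS would need to be tracked separately.

```python
from collections import Counter
from typing import Iterable


def classify_status(status: int) -> str:
    """Map an HTTP status code to an SLO bucket (assumed policy)."""
    if 500 <= status <= 599:
        return "server_error"
    if 400 <= status <= 499:
        return "client_error"
    return "success"


def error_ratio(statuses: Iterable[int]) -> float:
    """Fraction of requests that count against the error SLO.

    Only server errors are counted; client errors such as 404s still
    contribute to the total number of requests.
    """
    counts = Counter(classify_status(s) for s in statuses)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts["server_error"] / total
```

For example, `error_ratio([200, 404, 404, 500])` gives 0.25: the two 404s are excluded from the error count, and only the single 500 is counted.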
This issue is a placeholder for now and will probably need to be expanded.
cc @craig @cmiskell, who have troubleshot many of these (@council-of-craigs?).
cc @ahanselka as the current on-call