2018-07-03: Error response from daemon on https://registry.gitlab.com/v2/

As reported by @marin:

Error response from daemon: Get https://registry.gitlab.com/v2/: EOF

Also reported at https://gitlab.com/gitlab-com/support-forum/issues/3647

Incident working doc: https://docs.google.com/document/d/1V2IQ5ZVKU4QkWUiriyuDC9bdkC7kFtF9OKjNIV-pTAs/edit

Corrective Actions

  • The registry health check is currently "option httpchk GET / HTTP/1.1\r\nHost:\ registry.gitlab.com". Is this sufficient, or should we use https://github.com/docker/distribution/blob/master/docs/configuration.md#health instead?
  • Outside of VM stats, there is little to no monitoring of the registry service: no Prometheus metrics and no metrics collected from logs.
  • For monitoring, should we be using the debug endpoint (https://github.com/docker/distribution/blob/master/docs/configuration.md#debug) to expose Prometheus metrics?
  • There is no centralized logging, and the registry doesn’t use structured logs. Centralized logging for the unstructured logs is tracked in gitlab-com/migration#598 (moved).
  • Monitoring for td-agent: gitlab-com/migration#390 (closed)
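
To make the health check question above concrete, here is a minimal sketch of a probe that exercises the registry API itself instead of GET /. It assumes that a healthy registry answers GET /v2/ with either 200 or 401 (authentication required) and sets the Docker-Distribution-API-Version header, as described in the Docker Registry HTTP API V2 spec. The function names looks_healthy and probe are hypothetical, and this is not the check currently deployed.

```python
import http.client
import urllib.error
import urllib.request

# Header the Registry V2 API spec says a registry should set on its responses.
API_VERSION_HEADER = "Docker-Distribution-API-Version"


def looks_healthy(status, headers):
    """Return True if a response to GET /v2/ looks like a live V2 registry.

    Assumption: 200 (open access) and 401 (auth required) both mean the
    registry itself answered; anything else is treated as unhealthy.
    """
    # Header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    version = normalised.get(API_VERSION_HEADER.lower(), "")
    return status in (200, 401) and version.startswith("registry/2")


def probe(base_url, timeout=5.0):
    """Fetch <base_url>/v2/ and classify the response (hypothetical helper)."""
    url = base_url.rstrip("/") + "/v2/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return looks_healthy(resp.status, dict(resp.headers))
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but a 401 from /v2/ is still a valid answer.
        return looks_healthy(err.code, dict(err.headers))
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or the server closing the
        # connection early (the EOF case reported in this incident).
        return False
```

Calling probe("https://registry.gitlab.com") from a monitoring host would return False for the EOF failure reported here, whereas a check that only confirms something answers on GET / may not notice that the registry API behind it is broken.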
Edited Jul 03, 2018 by John Jarvis