2018-07-03: Error response from daemon on https://registry.gitlab.com/v2/
As reported by @marin:
Error response from daemon: Get https://registry.gitlab.com/v2/: EOF
Also reported at https://gitlab.com/gitlab-com/support-forum/issues/3647
Incident working doc: https://docs.google.com/document/d/1V2IQ5ZVKU4QkWUiriyuDC9bdkC7kFtF9OKjNIV-pTAs/edit
Corrective Actions
- Our health check for the registry is option httpchk GET / HTTP/1.1\r\nHost:\ registry.gitlab.com. Is this sufficient, or should we be using https://github.com/docker/distribution/blob/master/docs/configuration.md#health instead?
- Outside of VM stats, there is little to no monitoring of the registry service: no Prometheus metrics and no metrics collected from logs.
- For monitoring, shouldn't we be using https://github.com/docker/distribution/blob/master/docs/configuration.md#debug to expose Prometheus metrics?
- There is no centralized logging, and the registry doesn't use structured logs. Centralized logging for the unstructured logs is tracked in gitlab-com/migration#598 (moved)
- Monitoring for td-agent: gitlab-com/migration#390 (closed)
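To make the health check question above concrete, here is a minimal sketch of a probe that requests /v2/ (the path that failed in this incident) rather than /. It assumes the Docker Registry HTTP API V2 convention that the base endpoint answers 200 when reachable and 401 when it is up but requires a token. The names probe_registry, classify_status, and HEALTHY_STATUSES are hypothetical and are not part of our current tooling or of the registry itself.

```python
import urllib.error
import urllib.request

# Assumption: GET /v2/ returns 200 (no auth needed) or 401 (auth needed)
# when the registry is serving. Anything else is treated as unhealthy.
HEALTHY_STATUSES = {200, 401}


def classify_status(status):
    """Return True if an HTTP status from GET /v2/ suggests the registry is serving."""
    return status in HEALTHY_STATUSES


def probe_registry(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Probe <base_url>/v2/ and return (healthy, detail).

    `opener` is injectable so the probe can be exercised without a network.
    """
    url = base_url.rstrip("/") + "/v2/"
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; the status code is still meaningful here.
        status = exc.code
    except (urllib.error.URLError, OSError) as exc:
        # Covers refused or reset connections, such as a connection closed early.
        return False, f"connection error: {exc}"
    return classify_status(status), f"HTTP {status}"
```

Calling probe_registry("https://registry.gitlab.com") returns a (healthy, detail) pair. A connection that is closed before a response arrives is reported as unhealthy, which a check against / on a different backend path might not catch.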
Edited by John Jarvis