Container Registry DB load balancing DNS lookup failures
## Context

This was discovered while investigating https://app.incident.io/gitlab/incidents/2480.

## Problem

We're seeing a small but steady stream of errors when resolving DNS records for database load balancing on the container registry. The feature is documented in detail [here](https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs/spec/gitlab/database-load-balancing.md).

The registry starts by performing an SRV lookup against the configured record for replicas (https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/2bd12f9e9118e8f02c34e42136106d71bc9be05b/releases/gitlab/values/gprd.yaml.gotmpl#L116) and then performs a host lookup for each returned server.

We're seeing mostly the following errors:

1. `failed to resolve replica hosts: error resolving DNS SRV record: lookup replica.patroni-registry.service.consul. on 10.221.4.10:53: dial tcp: lookup consul-gl-consul-dns.consul.svc.cluster.local: i/o timeout`
2. `error resolving host "patroni-registry-v16-03-db-gprd.node.east-us-2.consul." address: lookup patroni-registry-v16-03-db-gprd.node.east-us-2.consul. on 10.67.0.10:53: no such host`

The list of errors can be found here: https://log.gprd.gitlab.net/app/r/s/0ILyE

## Ask

We need Infra help to determine the cause of these apparently random `i/o timeout` errors on SRV lookups, followed by `no such host` errors on host lookups, all performed against Consul.
issue