Intermittent registry.gitlab.com client timeouts from hetzner.de VPSes
Recently (roughly in the last 2-3 weeks) we've started seeing high failure rates in our CI build jobs (https://gitlab.com/prpl-foundation/prplos/feed-prpl/-/jobs?statuses=FAILED), and subjectively it seems to be getting worse each week. Here is one such example: https://gitlab.com/prpl-foundation/prplos/feed-prpl/-/jobs/3418282471.
Running with gitlab-runner 15.6.1 (133d7e76)
...
Pulling docker image registry.gitlab.com/prpl-foundation/prplos/prplos/prplos/sdk-ipq40xx-generic:latest ...
WARNING: Failed to pull image with policy "always": Error response from daemon: Get "https://registry.gitlab.com/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (manager.go:237:15s)
ERROR: Job failed: failed to pull image "registry.gitlab.com/prpl-foundation/prplos/prplos/prplos/sdk-ipq40xx-generic:latest" with specified policies [always]: Error response from daemon: Get "https://registry.gitlab.com/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (manager.go:237:15s)
Restarting the failing pipeline usually helped, but today, for example, re-running the failed pipeline three times was not enough, so I've decided to report this.
We're using a pool of 10 build workers, so I don't think this is an excessive use of resources; in any case, the error looks like a timeout rather than rate limiting. The issue seems to affect only the registry.gitlab.com service: once the container image has been pulled, the job performs a number of git clones from various gitlab.com repositories, and those never seem to fail.
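To separate a network-path timeout from rate limiting, a small probe could repeat the request the Docker daemon makes (`GET https://registry.gitlab.com/v2/`) from an affected VPS and tally the outcomes, using a gitlab.com git endpoint as a control, since git traffic to gitlab.com does not seem to fail. This is a minimal sketch using only the Python standard library. The `classify`, `probe_once` and `tally` helpers, the git control URL, the 15-second timeout (chosen to resemble the `15s` in the runner log), the attempt count and the pause are all hypothetical choices. Counting HTTP 401 as "answered" assumes the registry replies to unauthenticated `/v2/` requests with an authentication challenge, as the Docker Registry HTTP API V2 describes; a rate limit would be expected to show up as HTTP 429 rather than as a missing response.

```python
import http.client
import socket
import time
import urllib.error
import urllib.request

# Hypothetical probe targets: the registry endpoint from the failing log line,
# plus a git smart-HTTP endpoint on gitlab.com as a control (assumed path).
TARGETS = {
    "registry": "https://registry.gitlab.com/v2/",
    "git": "https://gitlab.com/prpl-foundation/prplos/feed-prpl.git/info/refs?service=git-upload-pack",
}
TIMEOUT_S = 15  # assumption: roughly the 15s seen in the runner log


def classify(status=None, error=None):
    """Map a single probe outcome to a coarse category."""
    if error is not None:
        # socket.timeout / TimeoutError: no connection or no headers in time
        if isinstance(error, (socket.timeout, TimeoutError)):
            return "timeout"
        return "network-error"
    if status == 429:
        return "rate-limited"
    if status is not None and status >= 500:
        return "server-error"
    # Any other status (200, 401 auth challenge, 403, ...) means headers arrived
    return "answered"


def probe_once(url, timeout=TIMEOUT_S):
    """Issue one GET and classify what happened."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(status=resp.status)
    except urllib.error.HTTPError as exc:
        # Non-2xx responses still carry headers, so the server did answer
        return classify(status=exc.code)
    except urllib.error.URLError as exc:
        # Failures during connect/TLS/sending are wrapped in URLError.reason
        reason = exc.reason if isinstance(exc.reason, BaseException) else exc
        return classify(error=reason)
    except (OSError, http.client.HTTPException) as exc:
        # A timeout while waiting for response headers is raised unwrapped
        return classify(error=exc)


def tally(attempts=20, pause_s=3.0):
    """Probe every target repeatedly and count outcomes per target."""
    counts = {name: {} for name in TARGETS}
    for _ in range(attempts):
        for name, url in TARGETS.items():
            outcome = probe_once(url)
            counts[name][outcome] = counts[name].get(outcome, 0) + 1
        time.sleep(pause_s)
    return counts
```

Calling `print(tally())` on one of the affected VPSes would show whether `timeout` outcomes cluster on the registry target while the git control keeps answering, which is the pattern described above.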