Gitlab Runner not able to resolve Gitlab URL - Kubernetes Executor
### Status update: 2021-05-13
- We believe the root cause is the Alpine DNS issue affecting the runner-helper image.
- This [MR](https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/2835) is in flight to provide an Ubuntu flavor of the runner-helper image and is at the maintainer review stage.
Eventually, users will hopefully be able to set:
`helper_image_flavor = "ubuntu"`
in their runner TOML config, and this issue will hopefully disappear.
### Summary
GitLab Runner is not able to resolve the GitLab URL. When GitLab and GitLab Runner are deployed in a Kubernetes cluster, runner jobs intermittently fail with the following:
```
Running with gitlab-runner 10.8.0 (079aad9e)
on big-runner-gitlab-runner-7bd85f8f65-qqvpj f533f16d
Using Kubernetes namespace: gitlab-runner
Using Kubernetes executor with image $IMAGE ...
Waiting for pod gitlab-runner/runner-f533f16d-project-38-concurrent-4ttnck to be running, status is Pending
Waiting for pod gitlab-runner/runner-f533f16d-project-38-concurrent-4ttnck to be running, status is Pending
Waiting for pod gitlab-runner/runner-f533f16d-project-38-concurrent-4ttnck to be running, status is Pending
Running on runner-f533f16d-project-38-concurrent-4ttnck via big-runner-gitlab-runner-7bd85f8f65-qqvpj...
Cloning repository for master with git depth set to 20...
Cloning into '/repo/proj'...
fatal: unable to access 'https://gitlab-ci-token:xxxxxxxxxxxxxxxxxxxx@code.repo.io/repo/proj.git/': Could not resolve host: code.repo.io
/bin/bash: line 114: cd: /repo/proj: No such file or directory
ERROR: Job failed: error executing remote command: command terminated with non-zero exit code: Error executing in Docker Container: 1
```
Most of the time the job resolves the domain name and continues successfully. However, sometimes it fails with the above error. It has become quite a pain to retry these jobs.
How can I debug this? I tried debugging Kubernetes DNS: I created a pod every minute and tried to resolve the URL, and it always resolved. I also checked whether my DNS server (AWS) was throttling queries; I was able to query at 1000 QPS without any failures.
I added a trailing dot to the URL (`https://code.repo.io.`), as mentioned [here](https://gitlab.com/gitlab-org/gitlab-runner/issues/2847), but it didn't help.
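
For anyone trying to reproduce the intermittent failure, a tighter probe loop than one pod per minute may help. The following is only a minimal sketch: `probe_dns` is a hypothetical helper name, not part of GitLab Runner, and it assumes Python is available in the container being tested. It uses the container's own libc resolver through `socket.getaddrinfo`, so results from a glibc-based image may differ from what the Alpine-based helper image sees.

```python
import socket
import time
from typing import Callable, List, Tuple


def probe_dns(
    host: str,
    attempts: int,
    interval_s: float = 0.0,
    resolver: Callable = socket.getaddrinfo,
) -> List[Tuple[int, str]]:
    """Resolve `host` repeatedly and collect the failed attempts.

    Returns a list of (attempt_index, error_message) tuples, one per
    lookup that raised socket.gaierror. `resolver` is injectable so the
    loop can be exercised without real DNS traffic.
    """
    failures: List[Tuple[int, str]] = []
    for i in range(attempts):
        try:
            # Port 443 matches the HTTPS clone URL in the job log.
            resolver(host, 443)
        except socket.gaierror as exc:
            failures.append((i, str(exc)))
        if interval_s:
            time.sleep(interval_s)
    return failures
```

Calling, for example, `probe_dns("code.repo.io", attempts=1000, interval_s=0.05)` from a pod in the same namespace as the job pods, and again with the trailing-dot form `code.repo.io.`, would show whether lookups fail sporadically from that image and whether search-domain expansion plays a role.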
### Steps to reproduce
1. Deploy GitLab on Kubernetes via the Helm chart: [gitlab](https://gitlab.com/charts/gitlab-rails)
2. Deploy GitLab Runner on Kubernetes via the Helm chart: [gitlab-runner](https://gitlab.com/charts/gitlab-runner.git)
3. Create a project in GitLab with a `.gitlab-ci.yml` file.
4. Run a job in CI/CD. The clone fails intermittently.
### What is the current *bug* behavior?
Runner job fails with:
```
fatal: unable to access 'https://gitlab-ci-token:xxxxxxxxxxxxxxxxxxxx@code.repo.io/repo/proj.git/': Could not resolve host: code.repo.io
```
### What is the expected *correct* behavior?
The clone shouldn't fail with the DNS resolution error above.
### Possible workarounds
- https://gitlab.com/gitlab-org/gitlab-foss/issues/47283#note_155177691
- https://gitlab.com/gitlab-org/gitlab-ce/issues/47283#note_164716469
- https://gitlab.com/gitlab-org/gitlab-runner/issues/4129#note_171501624