Adding services with sufficiently long alias + image name char lengths causes all higher-indexed services to become inaccessible
Summary
Our company attempted to upgrade GitLab Runner from version 15.11.1 to 16.10.0 (Kubernetes executors) and found that our integration test jobs were failing to start up the test environments properly.
Upon further investigation, we discovered that the hostAlias entries within the /etc/hosts files on the services had grown substantially. This is apparently due to a bugfix: previously, a bug prevented those entries from landing in the /etc/hosts file if the image name contained a variable (ours did).
Ironically, this fix now causes the hostAlias line in /etc/hosts to exceed the line length limit (apparently somewhere above ~460 characters) imposed by some low-level binaries on our images when a job relies heavily on services and aliases.
Steps to reproduce
Below is an example of the .gitlab-ci.yml and the resulting /etc/hosts by runner version:
.gitlab-ci.yml
stages:
  - build
test:
  stage: build
  image: alpine
  variables:
    REGISTRY_URL: registry.example.com/sm-docker-remote
  services:
    - name: $REGISTRY_URL/postgres:16.2-alpine3.19
      alias: postgres1
    - name: $REGISTRY_URL/postgres:16.2-alpine3.19
      alias: postgres2
    - name: $REGISTRY_URL/postgres:16.2-alpine3.19
      alias: postgres3
    - name: $REGISTRY_URL/postgres:16.2-alpine3.19
      alias: postgres4
    - name: $REGISTRY_URL/postgres:16.2-alpine3.19
      alias: postgres5
    - name: $REGISTRY_URL/postgres:16.2-alpine3.19
      alias: postgres6
    - name: $REGISTRY_URL/postgres:16.2-alpine3.19
      alias: postgres7
    - name: $REGISTRY_URL/postgres:16.2-alpine3.19
      alias: postgres8
    - name: $REGISTRY_URL/postgres:16.2-alpine3.19
      alias: postgres9
    - name: $REGISTRY_URL/postgres:16.2-alpine3.19
      alias: postgres10
    - name: $REGISTRY_URL/postgres:16.2-alpine3.19
      alias: postgres11
    - name: $REGISTRY_URL/postgres:16.2-alpine3.19
      alias: postgres12
    - name: $REGISTRY_URL/postgres:16.2-alpine3.19
      alias: postgres13
    - name: $REGISTRY_URL/postgres:16.2-alpine3.19
  script:
    - ping -c 1 postgres1 || true
    - ping -c 1 postgres2 || true
    - ping -c 1 postgres3 || true
    - ping -c 1 postgres4 || true
    - ping -c 1 postgres5 || true
    - ping -c 1 postgres6 || true
    - ping -c 1 postgres7 || true
    - ping -c 1 postgres8 || true
    - ping -c 1 postgres9 || true
    - ping -c 1 postgres10 || true
    - ping -c 1 postgres11 || true
    - ping -c 1 postgres12 || true
    - ping -c 1 postgres13 || true
Resulting /etc/hosts
GitLab Runner version 16.10.0
# cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.30.10.94 runner-kxfknsje-project-31022200-concurrent-0-9mwwsdwn
# Entries added by HostAliases.
127.0.0.1 registry.example.com-sm-docker-remote-postgres postgres1 registry.example.com-sm-docker-remote-postgres postgres2 registry.example.com-sm-docker-remote-postgres postgres3 registry.example.com-sm-docker-remote-postgres postgres4 registry.example.com-sm-docker-remote-postgres postgres5 registry.example.com-sm-docker-remote-postgres postgres6 registry.example.com-sm-docker-remote-postgres postgres7 registry.example.com-sm-docker-remote-postgres postgres8 registry.example.com-sm-docker-remote-postgres postgres9 registry.example.com-sm-docker-remote-postgres postgres10 registry.example.com-sm-docker-remote-postgres postgres11 registry.example.com-sm-docker-remote-postgres postgres12 registry.example.com-sm-docker-remote-postgres postgres13
Resulting /etc/hosts
GitLab Runner version 15.11.1
# cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.30.10.94 runner-kxfknsje-project-31022200-concurrent-0-9mwwsdwn
# Entries added by HostAliases.
127.0.0.1 postgres1 postgres2 postgres3 postgres4 postgres5 postgres6 postgres7 postgres8 postgres9 postgres10 postgres11 postgres12 postgres13
Actual behavior
When many services are added to a job and the combined character count of their image names and aliases pushes the hostAlias line in /etc/hosts past roughly 460 characters, hostname resolution starts to fail for every service listed beyond that point.
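To make the arithmetic concrete, here is a rough Python sketch that rebuilds the 16.10.0 hostAlias line from the example above and lists which aliases end past a given character offset. The helper names (hosts_line, names_past_limit) are made up for illustration, the single-character separator is an assumption, and 460 is only our approximate observation, not a documented limit; the real cutoff depends on the resolver in the image.

```python
# Hypothetical helpers for illustration; not part of GitLab Runner or Kubernetes.

def hosts_line(ip, names):
    """Lay out an IP and its hostnames as one hosts-file line.

    Assumes a single separator character between fields.
    """
    return "\t".join([ip, *names])


def names_past_limit(ip, names, limit):
    """Return the names whose last character falls beyond `limit` characters."""
    past = []
    offset = len(ip)
    for name in names:
        offset += 1 + len(name)  # one separator plus the name itself
        if offset > limit:
            past.append(name)
    return past


# Sanitized image name as it appears in the 16.10.0 /etc/hosts above.
IMAGE_HOST = "registry.example.com-sm-docker-remote-postgres"

# Each service contributes its sanitized image name followed by its alias.
names = []
for i in range(1, 14):
    names += [IMAGE_HOST, f"postgres{i}"]

line = hosts_line("127.0.0.1", names)

# 460 is the approximate threshold from this report (an assumption).
cut_off = [n for n in names_past_limit("127.0.0.1", names, 460) if n != IMAGE_HOST]

print(len(line))
print(cut_off)
```

With the example's names, the aliases that end past offset 460 are postgres8 through postgres13, which lines up with the failures in the 16.10.0 job log below.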
Expected behavior
The character length of the service image names + their aliases should not affect service-to-service communication. We should be able to operate as we did before, without hitting any technical limit on the number of services we had been using successfully in our large integration tests.
Relevant logs and/or screenshots
job log from 15.11.1
$ ping -c 1 postgres1 || true
PING postgres1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.025 ms
--- postgres1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.025/0.025/0.025 ms
$ ping -c 1 postgres2 || true
PING postgres2 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.020 ms
--- postgres2 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.020/0.020/0.020 ms
$ ping -c 1 postgres3 || true
PING postgres3 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.021 ms
--- postgres3 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.021/0.021/0.021 ms
$ ping -c 1 postgres4 || true
PING postgres4 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.018 ms
--- postgres4 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.018/0.018/0.018 ms
$ ping -c 1 postgres5 || true
PING postgres5 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.015 ms
--- postgres5 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.015/0.015/0.015 ms
$ ping -c 1 postgres6 || true
PING postgres6 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.016 ms
--- postgres6 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.016/0.016/0.016 ms
$ ping -c 1 postgres7 || true
PING postgres7 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.015 ms
--- postgres7 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.015/0.015/0.015 ms
$ ping -c 1 postgres8 || true
PING postgres8 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.014 ms
--- postgres8 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.014/0.014/0.014 ms
$ ping -c 1 postgres9 || true
PING postgres9 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.016 ms
--- postgres9 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.016/0.016/0.016 ms
$ ping -c 1 postgres10 || true
PING postgres10 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.015 ms
--- postgres10 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.015/0.015/0.015 ms
$ ping -c 1 postgres11 || true
PING postgres11 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.019 ms
--- postgres11 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.019/0.019/0.019 ms
$ ping -c 1 postgres12 || true
PING postgres12 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.014 ms
--- postgres12 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.014/0.014/0.014 ms
$ ping -c 1 postgres13 || true
PING postgres13 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.014 ms
--- postgres13 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.014/0.014/0.014 ms
job log from 16.10.0
$ ping -c 1 postgres1 || true
PING postgres1 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.032 ms
--- postgres1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.032/0.032/0.032 ms
$ ping -c 1 postgres2 || true
PING postgres2 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.018 ms
--- postgres2 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.018/0.018/0.018 ms
$ ping -c 1 postgres3 || true
PING postgres3 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.021 ms
--- postgres3 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.021/0.021/0.021 ms
$ ping -c 1 postgres4 || true
PING postgres4 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.014 ms
--- postgres4 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.014/0.014/0.014 ms
$ ping -c 1 postgres5 || true
PING postgres5 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.035 ms
--- postgres5 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.035/0.035/0.035 ms
$ ping -c 1 postgres6 || true
PING postgres6 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.019 ms
--- postgres6 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.019/0.019/0.019 ms
$ ping -c 1 postgres7 || true
PING postgres7 (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=127 time=0.024 ms
--- postgres7 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.024/0.024/0.024 ms
$ ping -c 1 postgres8 || true
ping: bad address 'postgres8'
$ ping -c 1 postgres9 || true
ping: bad address 'postgres9'
$ ping -c 1 postgres10 || true
ping: bad address 'postgres10'
$ ping -c 1 postgres11 || true
ping: bad address 'postgres11'
$ ping -c 1 postgres12 || true
ping: bad address 'postgres12'
$ ping -c 1 postgres13 || true
ping: bad address 'postgres13'
Environment description
Used GitLab Runner versions 15.11.1 and 16.10.0, both with Kubernetes executors.
Possible fixes
Splitting the single hostAlias entry into multiple entries, so that no individual /etc/hosts line exceeds the length limit, would potentially fix it.
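As a rough illustration of that idea (not the Runner's actual code, which is written in Go), the Python sketch below greedily packs hostnames into several lines for the same IP. The function name split_host_aliases and the 400-character budget are assumptions chosen for the example, and it further assumes that each resulting entry would be rendered as its own /etc/hosts line.

```python
# Hypothetical sketch; names and the length budget are illustrative assumptions.

def split_host_aliases(ip, names, max_len):
    """Greedily pack names into hosts-file lines of at most max_len characters.

    A single name that is longer than the budget still gets a line of its own.
    """
    lines = []
    current = ip
    for name in names:
        candidate = current + "\t" + name
        if len(candidate) > max_len and current != ip:
            # The current line is full: flush it and start a new one.
            lines.append(current)
            candidate = ip + "\t" + name
        current = candidate
    if current != ip:
        lines.append(current)
    return lines


# Same names as in the 16.10.0 /etc/hosts from this report.
IMAGE_HOST = "registry.example.com-sm-docker-remote-postgres"
names = []
for i in range(1, 14):
    names += [IMAGE_HOST, f"postgres{i}"]

# 400 is an arbitrary budget below the ~460 characters observed in this report.
lines = split_host_aliases("127.0.0.1", names, 400)
for entry in lines:
    print(len(entry), entry)
```

For the example job this produces two lines, each under the chosen budget, with every hostname preserved in its original order.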