keystone operator (and possibly others) cannot resolve the Kubernetes service due to an IPv4-only lookup
Summary
In an IPv6-only Kubernetes cluster, the keystone operator fails with KeyError: 'No DNS Response for kubernetes.default.svc. Maybe we are not running inside a cluster. You can set YAOOK_OP_CLUSTER_DOMAIN to override it'
Detailed Description
In an IPv6-only Kubernetes cluster, the keystone operator fails as follows:
2023-11-29 00:57:53,377 ERROR yaook.op.tasks task TaskItem(func=<bound method OperatorDaemon._reconcile_cr of <yaook.op.daemon.OperatorDaemon object at 0x7f2c821f7390>>, data=(<CustomResource keystonedeployments.yaook.cloud/v1>, 'yaook', 'keystone')) failed. retrying in 114.74451561237467s
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/api_utils.py", line 1101, in get_cluster_domain
    response = socket.gethostbyname_ex("kubernetes.default.svc")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -5] No address associated with hostname

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/yaook/op/tasks.py", line 313, in run_next_task
    requeue = await func(*data)
              ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/tracing.py", line 63, in wrapper
    return await function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaook/op/daemon.py", line 754, in _reconcile_cr
    await cr_obj.reconcile(ctx, generation)
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/customresource.py", line 742, in reconcile
    await super().reconcile(ctx, generation)
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/tracing.py", line 63, in wrapper
    return await function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/customresource.py", line 257, in reconcile
    blocking = await self.sm.ensure(ctx)
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/tracing.py", line 63, in wrapper
    return await function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/statemachine.py", line 152, in ensure
    ready = await self._ensure_state(state, ctx)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/tracing.py", line 63, in wrapper
    return await function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/statemachine.py", line 78, in _ensure_state
    await state.reconcile(
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/tracing.py", line 63, in wrapper
    return await function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/resources/k8s.py", line 762, in reconcile
    new_body = await self._make_body(ctx, dependencies)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/resources/k8s.py", line 966, in _make_body
    await self._get_template_parameters(ctx, dependencies),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/tracing.py", line 63, in wrapper
    return await function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/resources/k8s.py", line 984, in _get_template_parameters
    result = await super()._get_template_parameters(ctx, dependencies)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/resources/k8s.py", line 907, in _get_template_parameters
    "cluster_domain": api_utils.get_cluster_domain(),
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/yaook/statemachine/api_utils.py", line 1104, in get_cluster_domain
    raise KeyError("No DNS Response for kubernetes.default.svc. Maybe we "
KeyError: 'No DNS Response for kubernetes.default.svc. Maybe we are not running inside a cluster. You can set YAOOK_OP_CLUSTER_DOMAIN to override it'
Steps to reproduce the issue
- Run a Kubernetes cluster in an IPv6-only environment
- Start the keystone-operator
Result
The error above is raised.
Expected Result
Name resolution succeeds and no error is raised.
Additional Information
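This is likely caused by the call response = socket.gethostbyname_ex("kubernetes.default.svc"), which only returns IPv4 addresses. The Python documentation states that gethostbyname_ex() does not support IPv6 name resolution and that getaddrinfo() should be used instead for IPv4/IPv6 dual-stack support.
Replacing it with socket.getaddrinfo() (see https://pythontic.com/modules/socket/getaddrinfo), which handles both IPv4 and IPv6, would probably solve the issue in all cases. It is not a strict drop-in replacement, since its return value has a different shape.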
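A minimal sketch of the getaddrinfo() approach, assuming get_cluster_domain only needs the canonical name of kubernetes.default.svc in order to derive the cluster domain. Passing family AF_UNSPEC lets the resolver return IPv6 (AAAA) as well as IPv4 (A) results, and the AI_CANONNAME flag asks for the canonical name, which getaddrinfo() reports in the first result only. The function names, the injectable getaddrinfo parameter and the error messages are hypothetical and not yaook's actual code; the YAOOK_OP_CLUSTER_DOMAIN override is taken from the error message above. It is also an assumption that the resolver reports the search-domain-expanded name (e.g. kubernetes.default.svc.cluster.local) as the canonical name; this should be verified inside a cluster.

```python
import os
import socket
from typing import Callable

# The service name looked up in the traceback above.
SERVICE_NAME = "kubernetes.default.svc"


def canonical_name(hostname: str, getaddrinfo: Callable = socket.getaddrinfo) -> str:
    """Resolve `hostname` and return its canonical (fully qualified) name.

    family=AF_UNSPEC accepts both IPv4 (A) and IPv6 (AAAA) answers, so a name
    that only has an AAAA record still resolves. flags=AI_CANONNAME asks the
    resolver to report the canonical name.
    """
    infos = getaddrinfo(
        hostname,
        None,
        family=socket.AF_UNSPEC,
        type=socket.SOCK_STREAM,
        flags=socket.AI_CANONNAME,
    )
    # Each entry is (family, type, proto, canonname, sockaddr). Only the first
    # entry carries the canonical name; the others hold an empty string.
    return infos[0][3] or hostname


def get_cluster_domain(getaddrinfo: Callable = socket.getaddrinfo) -> str:
    """Return the cluster domain, e.g. "cluster.local" (hypothetical sketch)."""
    # Explicit override, as suggested by the existing error message.
    override = os.environ.get("YAOOK_OP_CLUSTER_DOMAIN")
    if override:
        return override
    try:
        fqdn = canonical_name(SERVICE_NAME, getaddrinfo).rstrip(".")
    except socket.gaierror as exc:
        raise KeyError(f"No DNS response for {SERVICE_NAME}") from exc
    prefix = SERVICE_NAME + "."
    if not fqdn.startswith(prefix):
        # The resolver did not expand the short name via a search domain.
        raise KeyError(f"Cannot derive cluster domain from {fqdn!r}")
    # "kubernetes.default.svc.cluster.local" -> "cluster.local"
    return fqdn[len(prefix):]
```

Inside the operator the default socket.getaddrinfo would be used, i.e. get_cluster_domain() without arguments; the parameter only exists so the logic can be exercised with a fake answer outside a cluster.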
Resolution
Use socket.getfqdn(), as we only need the FQDN. (Needs testing to confirm that this works with IPv6.)
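A minimal sketch of the socket.getfqdn() variant. Two points from the Python documentation shape it: getfqdn() finds the name by checking the hostname returned by gethostbyaddr() (followed by its aliases), and if no fully qualified name is available it returns the given name unchanged instead of raising. The first means the result depends on a reverse lookup of the resolved service address (in our understanding, a PTR record served by the cluster DNS), which is part of what needs testing with IPv6; the second means a failed lookup must be detected explicitly. The function name and the injectable getfqdn parameter are hypothetical, added so the logic can be exercised without a cluster.

```python
import socket
from typing import Callable

# The service name looked up in the traceback above.
SERVICE_NAME = "kubernetes.default.svc"


def get_cluster_domain_via_getfqdn(getfqdn: Callable[[str], str] = socket.getfqdn) -> str:
    """Derive the cluster domain from socket.getfqdn(SERVICE_NAME)."""
    fqdn = getfqdn(SERVICE_NAME).rstrip(".")
    prefix = SERVICE_NAME + "."
    # getfqdn() does not raise when the lookup fails; it returns its argument
    # unchanged. An unexpanded name therefore means "no usable answer".
    if not fqdn.startswith(prefix):
        raise KeyError(f"No fully qualified name found for {SERVICE_NAME}")
    # "kubernetes.default.svc.cluster.local" -> "cluster.local"
    return fqdn[len(prefix):]
```

Without the explicit prefix check, a failed lookup would silently yield the short name, and the operator would continue with a wrong cluster domain instead of failing loudly.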
Proposal
Use socket.getfqdn().
Specification
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this issue are to be interpreted in the spirit of RFC 2119, even though we're not technically doing protocol design.
- Name resolution MUST support IPv6
- Name resolution MUST still work with IPv4
- The function used MUST return the FQDN for a short hostname (e.g. kubernetes.default.svc.cluster.local for kubernetes.default.svc)
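The requirements above could be encoded as unit tests along these lines. This is a sketch, assuming a resolver built on socket.getaddrinfo() as suggested under Additional Information; resolve_fqdn, fake_dns and the addresses fd00::1 and 10.96.0.1 are hypothetical. The fake DNS only answers for the address family it has records for, and only fills in the canonical name when AI_CANONNAME is requested, so a resolver that restricts itself to IPv4 or never asks for the canonical name fails the corresponding test.

```python
import socket
import unittest
from unittest import mock

SERVICE_NAME = "kubernetes.default.svc"
FQDN = "kubernetes.default.svc.cluster.local"


def resolve_fqdn(hostname: str) -> str:
    """Hypothetical resolver under test, using socket.getaddrinfo()."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_UNSPEC,
        type=socket.SOCK_STREAM, flags=socket.AI_CANONNAME,
    )
    # Only the first entry carries the canonical name.
    return infos[0][3] or hostname


def fake_dns(record_family: int):
    """Simulate a cluster DNS that only has records of one address family."""
    def getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
        # A lookup restricted to the other family finds nothing
        # (compare the gaierror from gethostbyname_ex() in the traceback).
        if family not in (socket.AF_UNSPEC, record_family):
            raise socket.gaierror(socket.EAI_NONAME, "Name or service not known")
        if record_family == socket.AF_INET6:
            sockaddr = ("fd00::1", 0, 0, 0)
        else:
            sockaddr = ("10.96.0.1", 0)
        # The canonical name is only reported when AI_CANONNAME is requested.
        canonname = FQDN if flags & socket.AI_CANONNAME else ""
        return [(record_family, socket.SOCK_STREAM, socket.IPPROTO_TCP, canonname, sockaddr)]
    return getaddrinfo


class ResolutionSpecTest(unittest.TestCase):
    def test_must_support_ipv6(self):
        with mock.patch("socket.getaddrinfo", side_effect=fake_dns(socket.AF_INET6)):
            self.assertEqual(resolve_fqdn(SERVICE_NAME), FQDN)

    def test_must_still_work_with_ipv4(self):
        with mock.patch("socket.getaddrinfo", side_effect=fake_dns(socket.AF_INET)):
            self.assertEqual(resolve_fqdn(SERVICE_NAME), FQDN)

    def test_must_return_fqdn_for_short_name(self):
        with mock.patch("socket.getaddrinfo", side_effect=fake_dns(socket.AF_INET6)):
            fqdn = resolve_fqdn(SERVICE_NAME)
        self.assertNotEqual(fqdn, SERVICE_NAME)
        self.assertTrue(fqdn.startswith(SERVICE_NAME + "."))
```

The tests can be run with pytest or with python -m unittest pointed at the module that contains them.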