Doesn't work on OpenShift
Hello! I have some problems running a StackGres cluster in an OpenShift environment. Here are the details.
OpenShift (OKD) version: 4.6.9, based on Kubernetes 1.19.
StackGres version: 1.0.0-alpha1.
- I can install the operator without errors (using the Helm chart), but when I try to create a cluster (through the UI), the pods do not start. This is caused by a permission error in the /var/lib/postgresql/data directory in the setup-data-paths container. I did some digging and found that there is an environment variable called USE_ARBITRARY_USER in the webapi-deployment.yaml template, but not in operator-deployment.yaml. After moving this variable into operator-deployment.yaml, the permission error is gone. Is this a bug in the Helm chart?
- Now all cluster pods start fine and all liveness probes succeed. However, any operation in a psql client fails with this error:
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
There are also errors in the Patroni container logs:
2021-02-09 14:16:40,442 ERROR: Permission denied
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/patroni/dcs/kubernetes.py", line 87, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/patroni/dcs/kubernetes.py", line 436, in patch_or_create
ret = self.retry(func, self._namespace, body) if retry else func(self._namespace, body)
File "/usr/local/lib/python3.6/site-packages/patroni/dcs/kubernetes.py", line 257, in retry
return self._retry.copy()(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/patroni/utils.py", line 330, in __call__
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/patroni/dcs/kubernetes.py", line 76, in wrapper
return getattr(self._api, func)(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 14840, in patch_namespaced_endpoints
(data) = self.patch_namespaced_endpoints_with_http_info(name, namespace, body, **kwargs)
File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 14940, in patch_namespaced_endpoints_with_http_info
collection_formats=collection_formats)
File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 334, in call_api
_return_http_data_only, collection_formats, _preload_content, _request_timeout)
File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 168, in __call_api
_request_timeout=_request_timeout)
File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 393, in request
body=body)
File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 286, in PATCH
body=body)
File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 222, in request
raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '0050d2e9-fdc4-432e-a2e0-a4f8d1e86b64', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'e2a1e128-237e-4309-b223-22075ed53594', 'X-Kubernetes-Pf-Prioritylevel-Uid': '35ea24e5-3866-4c70-962e-ad4adae8cbc2', 'Date': 'Tue, 09 Feb 2021 14:16:40 GMT', 'Content-Length': '241'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \"pstest2\" is forbidden: endpoint address 10.116.0.47 is not allowed","reason":"Forbidden","details":{"name":"pstest2","kind":"endpoints"},"code":403}
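To make the relevant part of that response easier to see, here is a minimal Python sketch that pulls the key fields out of the body. The body string is copied from the log above; the function name summarize_status and the regular expression are my own and are not part of Patroni or the Kubernetes client.

```python
import json
import re

# Response body copied verbatim from the Patroni log above.
BODY = (
    r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",'
    r'"message":"endpoints \"pstest2\" is forbidden: '
    r'endpoint address 10.116.0.47 is not allowed",'
    r'"reason":"Forbidden","details":{"name":"pstest2","kind":"endpoints"},'
    r'"code":403}'
)


def summarize_status(body: str) -> dict:
    """Extract the fields that matter from a Kubernetes Status response body.

    Hypothetical helper for illustration only.
    """
    status = json.loads(body)
    message = status.get("message", "")
    # Assumption: the rejected address appears in the message in this wording.
    match = re.search(r"endpoint address (\S+) is not allowed", message)
    details = status.get("details", {})
    return {
        "code": status.get("code"),
        "reason": status.get("reason"),
        "kind": details.get("kind"),
        "name": details.get("name"),
        "rejected_address": match.group(1) if match else None,
    }


if __name__ == "__main__":
    print(summarize_status(BODY))
```

So the API server is rejecting the PATCH on the endpoints object "pstest2" because of the address 10.116.0.47, not because of a missing RBAC rule on the resource itself.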
I found a very similar issue report here. It seems that Patroni does not work on OpenShift in "endpoints mode". That mode is controlled by the PATRONI_KUBERNETES_USE_ENDPOINTS variable, but I can't easily switch it to use a ConfigMap because it is hardcoded in StackGres here. Changing the variable in the source code and rebuilding the operator does not work either, because (as far as I can tell) it breaks some other dependent logic.
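To illustrate what I mean by the mode switch, here is a simplified Python model of it. This is not Patroni's actual code: the function name leader_object_kind and the set of accepted "true" spellings are my assumptions. The only thing taken from the Patroni documentation is that the variable selects Endpoints instead of ConfigMaps, and that ConfigMaps are used when it is not set.

```python
import os
from typing import Mapping, Optional

# Assumption: these spellings count as "true"; Patroni's own parsing may differ.
TRUTHY = {"true", "1", "yes", "on"}


def leader_object_kind(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return which Kubernetes object kind would hold the leader/cluster state.

    Hypothetical helper that only models the documented behaviour of
    PATRONI_KUBERNETES_USE_ENDPOINTS.
    """
    env = os.environ if environ is None else environ
    raw = env.get("PATRONI_KUBERNETES_USE_ENDPOINTS", "")
    return "endpoints" if raw.strip().lower() in TRUTHY else "configmaps"


if __name__ == "__main__":
    print(leader_object_kind({"PATRONI_KUBERNETES_USE_ENDPOINTS": "true"}))
    print(leader_object_kind({}))
```

With the variable hardcoded to true, the pods always take the "endpoints" branch, which is the one that triggers the 403 above.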