The selector of the operator k8s services is too loose
What I wanted to do
Install GitLab runner on OpenShift.
Actions performed
- In Operators > OperatorHub, select GitLab Runner (certified)
- Complete the installation with the default values
- Create a registration secret
    apiVersion: v1
    kind: Secret
    metadata:
      name: gitlab-runner-secret
    type: Opaque
    stringData:
      runner-registration-token: REDACTED
- Create a Runner
    kind: Runner
    apiVersion: apps.gitlab.com/v1beta2
    metadata:
      name: example
      namespace: default
    spec:
      gitlabUrl: 'https://gitlab.com'
      imagePullPolicy: Always
      tags: 'openshift, test'
      token: gitlab-runner-secret
Expected result
The GitLab runner is installed.
Actual results
Error "failed calling webhook "mrunner.kb.io": failed to call webhook: Post "https://gitlab-runner-controller-manager-service.openshift-operators.svc:443/mutate-apps-gitlab-com-v1beta2-runner?timeout=10s": dial tcp 10.131.0.17:9443: connect: connection refused" for field "undefined".
Explanation
The selector of the gitlab-runner-webhook-service Service is too loose. As a result, the Service "adopts" pods from other operators that carry the same label and distributes the webhook requests across those pods. A request routed to a pod that does not listen on port 9443 is refused, which matches the error above.
In a newly provisioned environment, it works. But as soon as several operators are installed, it fails intermittently, depending on which pod receives the request.
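To illustrate the overlap, here is a trimmed sketch of the two objects involved. The Service name, namespace, and the 443 to 9443 port mapping are taken from the error message, and the selector is the one reported in this issue. The second object is a hypothetical pod from some other operator (its name and the `app` label are made up); it is only meant to show that a single shared label is enough for the Service to pick it up.

```yaml
# Fragment of the Service, reduced to the relevant fields.
apiVersion: v1
kind: Service
metadata:
  name: gitlab-runner-controller-manager-service
  namespace: openshift-operators
spec:
  selector:
    control-plane: controller-manager   # the only label the Service checks
  ports:
    - port: 443
      targetPort: 9443                  # inferred from the error message
---
# Hypothetical pod of an unrelated operator in the same namespace.
apiVersion: v1
kind: Pod
metadata:
  name: foo-operator-controller-manager-example   # made-up name
  namespace: openshift-operators
  labels:
    control-plane: controller-manager   # matches the selector above
    app: foo-operator                   # ignored by the selector above
```

Because the selector only tests `control-plane`, both the GitLab Runner operator pod and the unrelated pod would become endpoints of the Service.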
Environment
- OpenShift 4.10.34
- GitLab Runner 1.10.0
Possible fix
The gitlab-runner-webhook-service, gitlab-runner-controller-manager-service, and gitlab-runner-controller-manager-metrics-service Services all have a selector defined as:
control-plane=controller-manager
Other operators usually use selectors that combine two labels:
control-plane=controller-manager
app=foo-operator
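A minimal sketch of what such a two-label selector could look like for this operator. The `app: gitlab-runner-operator` label and the Deployment name are assumptions chosen for illustration, not values from the operator's actual manifests. Both objects are fragments reduced to the fields that matter here.

```yaml
# Service fragment: select on two labels instead of one.
# The same change would apply to all three Services listed above.
apiVersion: v1
kind: Service
metadata:
  name: gitlab-runner-controller-manager-service
  namespace: openshift-operators
spec:
  selector:
    control-plane: controller-manager
    app: gitlab-runner-operator         # hypothetical second label
---
# Deployment fragment: the pod template has to carry both labels,
# otherwise the tightened Service would match no pods at all.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-runner-controller-manager   # assumed name
  namespace: openshift-operators
spec:
  template:
    metadata:
      labels:
        control-plane: controller-manager
        app: gitlab-runner-operator     # hypothetical second label
```

Since these objects are created by the operator installation, a manual edit in the cluster may be reverted, so the change probably belongs in the operator's own manifests.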
