The selector of the operator k8s services is too loose

What I wanted to do

Install GitLab Runner on OpenShift.

Actions performed

  • In Operators > OperatorHub, select GitLab Runner (certified)
  • Complete the installation with the default values
  • Create a registration secret
apiVersion: v1
kind: Secret
metadata:
    name: gitlab-runner-secret
type: Opaque
stringData:
    runner-registration-token: REDACTED
  • Create a Runner
kind: Runner
apiVersion: apps.gitlab.com/v1beta2
metadata:
  name: example
  namespace: default
spec:
  gitlabUrl: 'https://gitlab.com'
  imagePullPolicy: Always
  tags: 'openshift, test'
  token: gitlab-runner-secret

Expected result

The GitLab runner is installed.

Actual results

Error "failed calling webhook "mrunner.kb.io": failed to call webhook: Post "https://gitlab-runner-controller-manager-service.openshift-operators.svc:443/mutate-apps-gitlab-com-v1beta2-runner?timeout=10s": dial tcp 10.131.0.17:9443: connect: connection refused" for field "undefined".

Explanation

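The selector of the gitlab-runner-webhook-service Service is too loose. Because a Service selects every pod in its namespace whose labels match, this Service also "adopts" pods belonging to other operators and distributes the webhook requests across all of them. Any request that lands on a pod of another operator is refused, because nothing on that pod listens on the webhook port (9443 in the error above).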


In a newly provisioned environment, it works. As soon as several operators are installed alongside it, creating a Runner fails intermittently, depending on which pod receives the webhook request.

Environment

  • OpenShift 4.10.34
  • GitLab Runner 1.10.0

Possible fix

The gitlab-runner-webhook-service, gitlab-runner-controller-manager-service, and gitlab-runner-controller-manager-metrics-service Services all define the same single-label selector:

  • control-plane=controller-manager

Other operators usually use selectors that combine two labels, which keeps each Service scoped to its own pods:

  • control-plane=controller-manager
  • app=foo-operator
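
As a rough illustration of the two-label approach, a tightened Service could look like the sketch below. The Service name, namespace, and ports are taken from the error message above. The `app` label and its value `gitlab-runner-operator` are hypothetical, and the same label would also have to be present on the operator's pod template for the selector to match anything.

```yaml
# Sketch only, not the operator's actual manifest.
apiVersion: v1
kind: Service
metadata:
  name: gitlab-runner-controller-manager-service
  namespace: openshift-operators
spec:
  selector:
    control-plane: controller-manager
    # Hypothetical second label; it must also be set on the operator pods.
    app: gitlab-runner-operator
  ports:
    - port: 443        # port used in the webhook URL
      targetPort: 9443 # port the failing connection was dialed on
```

With both labels required, pods of other operators that only carry control-plane=controller-manager would no longer be selected as endpoints.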