Make resource checking disabled by default

Romuald Atchadé requested to merge k8s-configurable-resource-checks into main

What does this MR do?

Make the resource checking added in MR !3399 (merged) disabled by default.

Why was this MR needed?

The resource checking requires additional permissions for the runner's service account, which breaks setups that lack those permissions.

This MR also adds a dedicated integration test that successfully runs a job using only the minimal permissions needed with a custom service account.
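
The check remains available as an opt-in. Judging from the test configuration below, it is enabled by setting resource_availability_check_max_attempts under [runners.kubernetes]; leaving it unset keeps the check disabled, which is the new default. A minimal excerpt:

[[runners]]
  [runners.kubernetes]
    # Opt back in to the resource availability check; the value is the
    # maximum number of lookup attempts before the job is failed.
    resource_availability_check_max_attempts = 3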

What's the best way to test this MR?

  1. Use the following config.toml. In this configuration, service_account is set to cs-sa, a service account that does not exist in the cluster:
config.toml
check_interval = 1
log_level = "debug"

[session_server]
  session_timeout = 1800

[[runners]]
  request_concurrency = 1
  url = "https://gitlab.com/"
  token = "__REDACTED__"
  executor = "kubernetes"
  [runners.custom_build_dir]
  [runners.kubernetes]
    service_account="cs-sa"
    pull_policy="always"
    image = "alpine:latest"
    namespace_overwrite_allowed = ""
    privileged = true
    allow_privilege_escalation = true
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    terminationGracePeriodSeconds = 30
    [runners.kubernetes.affinity]
    [runners.kubernetes.volumes]
    [runners.kubernetes.dns_config]
  2. Use the following .gitlab-ci.yml:
.gitlab-ci.yml
job:
  script:
  - sleep 15
  3. The job fails. The debug log contains no reference to ServiceAccount check attempts, since the check is disabled by default:
Preparing environment                   job=2675328079 project=25452826 runner=DzfSJrxx
Starting Kubernetes command with attach...          job=2675328079 project=25452826 runner=DzfSJrxx
Setting up secrets                                  job=2675328079 project=25452826 runner=DzfSJrxx
Feeding runners to channel                          builds=1
Setting up scripts config map                       job=2675328079 project=25452826 runner=DzfSJrxx
Setting up build pod                                job=2675328079 project=25452826 runner=DzfSJrxx
DNSPolicy string is blank, using "ClusterFirst" as default 
Checking for ImagePullSecrets or ServiceAccount existence  job=2675328079 project=25452826 runner=DzfSJrxx
Resources check has been disabled                   job=2675328079 project=25452826 runner=DzfSJrxx
Creating build pod                                  job=2675328079 project=25452826 runner=DzfSJrxx
ERROR: Job failed (system failure): prepare environment: setting up build pod: pods "runner-dzfsjrxx-project-25452826-concurrent-0" is forbidden: error looking up service account default/cs-sa: serviceaccount "cs-sa" not found. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information  duration_s=3.28418601 job=2675328079 project=25452826 runner=DzfSJrxx
  4. Use the following config.toml, which enables the check by setting resource_availability_check_max_attempts:
config.toml
check_interval = 1
log_level = "debug"

[session_server]
  session_timeout = 1800

[[runners]]
  request_concurrency = 1
  url = "https://gitlab.com/"
  token = "__REDACTED__"
  executor = "kubernetes"
  [runners.custom_build_dir]
  [runners.kubernetes]
    service_account="cs-sa"
    pull_policy="always"
    resource_availability_check_max_attempts=3
    image = "alpine:latest"
    namespace_overwrite_allowed = ""
    privileged = true
    allow_privilege_escalation = true
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    terminationGracePeriodSeconds = 30
    [runners.kubernetes.affinity]
    [runners.kubernetes.volumes]
    [runners.kubernetes.dns_config]
  5. The job fails. The debug log now shows the ServiceAccount check attempts, each followed by a pause of 5000000000 nanoseconds (5 seconds); a rough sketch of this retry behavior follows the log:
Preparing environment                   job=2675271396 project=25452826 runner=DzfSJrxx
Starting Kubernetes command with attach...          job=2675271396 project=25452826 runner=DzfSJrxx
Setting up secrets                                  job=2675271396 project=25452826 runner=DzfSJrxx
Setting up scripts config map                       job=2675271396 project=25452826 runner=DzfSJrxx
Feeding runners to channel                          builds=1
Setting up build pod                                job=2675271396 project=25452826 runner=DzfSJrxx
DNSPolicy string is blank, using "ClusterFirst" as default 
Checking for ImagePullSecrets or ServiceAccount existence  job=2675271396 project=25452826 runner=DzfSJrxx
Checking for ServiceAccount existence               job=2675271396 project=25452826 runner=DzfSJrxx
Appending trace to coordinator... ok                code=202 job=2675271396 job-log=0-640 job-status=running runner=DzfSJrxx sent-log=0-639 status=202 Accepted update-interval=3s
Pausing check of the ServiceAccount availability for 5000000000 (attempt 1)  job=2675271396 project=25452826 runner=DzfSJrxx
Pausing check of the ServiceAccount availability for 5000000000 (attempt 2)  job=2675271396 project=25452826 runner=DzfSJrxx
Pausing check of the ServiceAccount availability for 5000000000 (attempt 3)  job=2675271396 project=25452826 runner=DzfSJrxx
ERROR: Job failed (system failure): prepare environment: setting up build pod: Timed out while waiting for ServiceAccount/cs-sa to be present in the cluster. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information  duration_s=21.291653373 job=2675271396 project=25452826 runner=DzfSJrxx
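
The log above suggests a bounded retry with a fixed pause: up to resource_availability_check_max_attempts lookups, 5 seconds apart, then a timeout error. The sketch below illustrates that pattern only; it is not the runner's actual code, the function and variable names are invented, and the interval and attempt count are taken from the log. Printing the pause as a raw integer also matches the 5000000000 (nanoseconds) seen above.

package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitForResource retries lookup up to maxAttempts times, pausing for
// interval between attempts, and otherwise fails with a timeout-style error.
func waitForResource(ctx context.Context, lookup func(context.Context) error, maxAttempts int, interval time.Duration) error {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if lastErr = lookup(ctx); lastErr == nil {
			return nil
		}
		// Printing the duration as a raw integer yields nanoseconds
		// (5000000000 for a 5-second pause), matching the log above.
		fmt.Printf("Pausing check of the resource availability for %d (attempt %d)\n", int64(interval), attempt)
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(interval):
		}
	}
	return fmt.Errorf("timed out while waiting for the resource to be present in the cluster: %w", lastErr)
}

func main() {
	// Simulate a ServiceAccount that never becomes available.
	lookup := func(ctx context.Context) error { return errors.New("serviceaccount not found") }
	fmt.Println(waitForResource(context.Background(), lookup, 3, 5*time.Second))
}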
  6. Use the following config.toml, which leaves service_account unset so the namespace's default service account is used:
config.toml
check_interval = 1
log_level = "debug"

[session_server]
  session_timeout = 1800

[[runners]]
  request_concurrency = 1
  url = "https://gitlab.com/"
  token = "__REDACTED__"
  executor = "kubernetes"
  [runners.custom_build_dir]
  [runners.kubernetes]
    pull_policy="always"
    image = "alpine:latest"
    namespace_overwrite_allowed = ""
    privileged = true
    allow_privilege_escalation = true
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    terminationGracePeriodSeconds = 30
    [runners.kubernetes.affinity]
    [runners.kubernetes.volumes]
    [runners.kubernetes.dns_config]
  7. The job succeeds, since the build pod falls back to the namespace's existing default service account.

Test permissions

Using the Helm chart project, test the permissions required by each execution mode:
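
For each mode, first apply the Service Account YAML to the namespace the runner deploys into (for example, kubectl apply -f wi-sa.yaml -n <namespace>), then install or upgrade the chart with the matching values.yaml (for example, helm upgrade --install gitlab-runner -n <namespace> -f values.yaml <chart path or repo>). The release name, namespace, and chart source here are placeholders.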

Attach mode

  • Service Account YAML
wi-sa.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: wi-role
rules:
- apiGroups: [""]
  resources: ["serviceAccounts"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["pods/exec", "pods/attach"]
  verbs: ["create", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods", "services"]
  verbs: ["create", "get", "delete"]
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["create", "get", "update", "delete"]
---

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: wi-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: wi-role
subjects:
- kind: ServiceAccount
  name: wi-sa
---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: wi-sa
  • values.yaml
values.yaml
image:
  registry: registry.gitlab.com
  image: gitlab-org/gitlab-runner
  # tag: alpine-v11.6.0

imagePullPolicy: IfNotPresent

replicas: 1

gitlabUrl: https://gitlab.com/

runnerRegistrationToken: __YOUR_TOKEN_

terminationGracePeriodSeconds: 0

concurrent: 1

checkInterval: 1

logLevel: "debug"

sessionServer:
  enabled: false
  annotations: {}
  timeout: 1800
  internalPort: 8093
  externalPort: 9000
  # publicIP: ""
  # loadBalancerSourceRanges:
  #   - 1.2.3.4/32

## For RBAC support:
rbac:
  create: false

  clusterWideAccess: false

  serviceAccountName: wi-sa

  serviceAccountAnnotations: {}


  podSecurityPolicy:
    enabled: true
    resourceNames:
    - gitlab-runner

metrics:
  enabled: true

  portName: metrics

  port: 9252

  serviceMonitor:
    enabled: false

service:
  enabled: false

  type: ClusterIP

runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "alpine"

  cache: {}

  builds: {}

  services: {}

  helpers: {}

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
  runAsNonRoot: false
  # privileged: false
  # capabilities:
  #   drop: ["ALL"]

podSecurityContext:
  runAsUser: 100
  # runAsGroup: 65533
  # fsGroup: 65533
  # supplementalGroups: [65533]

  ## Note: values for the ubuntu image:
  # runAsUser: 999
  # fsGroup: 999

resources: {}
  # limits:
  #   memory: 256Mi
  #   cpu: 200m
  # requests:
  #   memory: 128Mi
  #   cpu: 100m

affinity: {}

nodeSelector: {}

tolerations: []

hostAliases: []
  # Example:
  # - ip: "127.0.0.1"
  #   hostnames:
  #   - "foo.local"
  #   - "bar.local"
  # - ip: "10.1.2.3"
  #   hostnames:
  #   - "foo.remote"
  #   - "bar.remote"

podAnnotations: {}

podLabels: {}
secrets: []
configMaps: {}

volumeMounts: []

volumes: []

Exec mode

  • Service Account YAML
wi-sa.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: wi-role
rules:
- apiGroups: [""]
  resources: ["serviceAccounts"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods", "services", "secrets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: wi-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: wi-role
subjects:
- kind: ServiceAccount
  name: wi-sa
---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: wi-sa
  • values.yaml
values.yaml
image:
  registry: registry.gitlab.com
  image: gitlab-org/gitlab-runner
  # tag: alpine-v11.6.0

imagePullPolicy: IfNotPresent

replicas: 1

gitlabUrl: https://gitlab.com/

runnerRegistrationToken: __YOUR_TOKEN_

terminationGracePeriodSeconds: 0

concurrent: 1

checkInterval: 1

logLevel: "debug"

sessionServer:
  enabled: false
  annotations: {}
  timeout: 1800
  internalPort: 8093
  externalPort: 9000
  # publicIP: ""
  # loadBalancerSourceRanges:
  #   - 1.2.3.4/32

## For RBAC support:
rbac:
  create: false

  clusterWideAccess: false

  serviceAccountName: wi-sa

  serviceAccountAnnotations: {}


  podSecurityPolicy:
    enabled: true
    resourceNames:
    - gitlab-runner

metrics:
  enabled: true

  portName: metrics

  port: 9252

  serviceMonitor:
    enabled: false

service:
  enabled: false

  type: ClusterIP

runners:
  config: |
    [[runners]]
      environment=["FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY=true"]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "alpine"

  cache: {}

  builds: {}

  services: {}

  helpers: {}

securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
  runAsNonRoot: false
  # privileged: false
  # capabilities:
  #   drop: ["ALL"]

podSecurityContext:
  runAsUser: 100
  # runAsGroup: 65533
  # fsGroup: 65533
  # supplementalGroups: [65533]

  ## Note: values for the ubuntu image:
  # runAsUser: 999
  # fsGroup: 999

resources: {}
  # limits:
  #   memory: 256Mi
  #   cpu: 200m
  # requests:
  #   memory: 128Mi
  #   cpu: 100m

affinity: {}

nodeSelector: {}

tolerations: []

hostAliases: []
  # Example:
  # - ip: "127.0.0.1"
  #   hostnames:
  #   - "foo.local"
  #   - "bar.local"
  # - ip: "10.1.2.3"
  #   hostnames:
  #   - "foo.remote"
  #   - "bar.remote"

podAnnotations: {}

podLabels: {}
secrets: []
configMaps: {}

volumeMounts: []

volumes: []

What are the relevant issue numbers?

#29101 (closed)
