Several pods fail to start on OpenShift 3.11

Summary

Several pods fail to start successfully on OpenShift 3.11.

Steps to reproduce

I've created an Ansible playbook that should reproduce this. You'll need an OpenShift 3.11 cluster.

See https://gitlab.com/snippets/1917612

Configuration used

Output of oc get scc anyuid -o yaml:


allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: true
allowedCapabilities: null
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups:
- system:cluster-admins
- system:serviceaccounts:gitlab
- system:serviceaccounts:gitlab-prometheus-server
- system:serviceaccount:gitlab:gitlab-prometheus-server
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: anyuid provides all features of the restricted SCC
      but allows users to run with any UID and any GID.
  creationTimestamp: 2019-11-24T21:24:22Z
  name: anyuid
  resourceVersion: "495273"
  selfLink: /apis/security.openshift.io/v1/securitycontextconstraints/anyuid
  uid: c8319bee-0f00-11ea-9768-005056aae1e8
priority: 10
readOnlyRootFilesystem: false
requiredDropCapabilities:
- MKNOD
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users:
- system:serviceaccount:kube-system:default
- system:serviceaccount:kube-system:gitlab-runner
- system:serviceaccount:gitlab:default
- system:serviceaccount:gitlab:gitlab-runner
- system:serviceaccount:gitlab:tiller
- system:serviceaccount:gitlab:gitlab-shared-secrets
- system:serviceaccount:gitlab:gitlab-gitlab-runner
- system:serviceaccount:gitlab:gitlab-prometheus-server
volumes:
- configMap
- downwardAPI
- emptyDir
- hostPath
- persistentVolumeClaim
- projected
- secret
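To make it easier to see which entries in this SCC actually apply to a given service account, here is a minimal sketch. It assumes the usual naming convention, where a service account's user name is system:serviceaccount:<namespace>:<name> and it belongs to the groups system:serviceaccounts and system:serviceaccounts:<namespace>. The function name scc_grants is hypothetical, and the users and groups lists are copied from the output above.

```python
# Users and groups copied from the anyuid SCC shown in this report.
ANYUID = {
    "groups": [
        "system:cluster-admins",
        "system:serviceaccounts:gitlab",
        "system:serviceaccounts:gitlab-prometheus-server",
        "system:serviceaccount:gitlab:gitlab-prometheus-server",
    ],
    "users": [
        "system:serviceaccount:kube-system:default",
        "system:serviceaccount:kube-system:gitlab-runner",
        "system:serviceaccount:gitlab:default",
        "system:serviceaccount:gitlab:gitlab-runner",
        "system:serviceaccount:gitlab:tiller",
        "system:serviceaccount:gitlab:gitlab-shared-secrets",
        "system:serviceaccount:gitlab:gitlab-gitlab-runner",
        "system:serviceaccount:gitlab:gitlab-prometheus-server",
    ],
}


def scc_grants(scc, namespace, service_account):
    """Return the (field, entry) pairs in the SCC that match a service account.

    Assumption: only the conventional service account user name and the two
    conventional service account groups are considered.
    """
    user = f"system:serviceaccount:{namespace}:{service_account}"
    sa_groups = {"system:serviceaccounts", f"system:serviceaccounts:{namespace}"}
    matches = []
    if user in (scc.get("users") or []):
        matches.append(("users", user))
    for group in scc.get("groups") or []:
        if group in sa_groups:
            matches.append(("groups", group))
    return matches


# The failing pods mount default-token-xd4ww, so check the default account.
for field, entry in scc_grants(ANYUID, "gitlab", "default"):
    print(field, entry)
```

Under these assumptions, the groups entry system:serviceaccounts:gitlab-prometheus-server would only cover a namespace with that name, and system:serviceaccount:gitlab:gitlab-prometheus-server has the form of a user name, so the sketch never matches it as a group.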

Current behavior

The following pods are failing to start successfully (from oc get pods -o json | jq -r '.items[] | select(.status.phase != "Running" or ([ .status.conditions[] | select(.type == "Ready" and .status == "False") ] | length ) == 1 ) | .metadata.namespace + "/" + .metadata.name'):

  • gitlab/gitlab-gitaly-0
  • gitlab/gitlab-gitlab-runner-6f4b6dc944-hb8gb
  • gitlab/gitlab-minio-8564669cf5-9wcl4
  • gitlab/gitlab-postgresql-9d75fd6f8-x22j6
  • gitlab/gitlab-prometheus-server-75585684b4-9wmrm
  • gitlab/gitlab-sidekiq-all-in-1-775b667679-6rw72
  • gitlab/gitlab-unicorn-79cb8b56c8-28n9g
  • gitlab/gitlab-unicorn-79cb8b56c8-6qqwp

Most of the problematic pods have permission issues. For gitlab-postgresql, however, it appears to be a configuration issue (see the logs below).
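For anyone who wants to run the same check without jq, the filter above can be sketched in Python. This is a rough equivalent, not a drop-in replacement: the function name not_ready_pods is hypothetical, it expects the already-parsed output of oc get pods -o json, and unlike the jq expression it treats a missing conditions list as empty instead of raising an error.

```python
def not_ready_pods(pod_list):
    """Return "namespace/name" for pods that are not Running or not Ready.

    Mirrors the jq filter: a pod is reported when its phase is not "Running"
    or when exactly one condition has type "Ready" and status "False".
    """
    failing = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        conditions = status.get("conditions") or []
        ready_false = [
            c for c in conditions
            if c.get("type") == "Ready" and c.get("status") == "False"
        ]
        if status.get("phase") != "Running" or len(ready_false) == 1:
            meta = pod["metadata"]
            failing.append(meta["namespace"] + "/" + meta["name"])
    return failing
```

To use it, pass the result of json.load(sys.stdin) when piping in the output of oc get pods -o json.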

Expected behavior

All pods should start successfully.

Versions

  • Chart: gitlab-2.5.1
  • Platform:
    • Self-hosted: OpenShift 3.11
  • Kubernetes: (kubectl version)
    • Client: v1.11.0+d4cacc0
    • Server: v1.11.0+d4cacc0
  • Helm: (helm version)
    • Client: &version.Version{SemVer:"v2.16.1", GitCommit:"bbdfe5e7803a12bbdf97e94cd847859890cf4050", GitTreeState:"clean"}
    • Server: &version.Version{SemVer:"v2.16.1", GitCommit:"bbdfe5e7803a12bbdf97e94cd847859890cf4050", GitTreeState:"clean"}

Relevant logs

'Describe' postgres

From oc describe pod gitlab-postgresql-9d75fd6f8-x22j6:

Name:               gitlab-postgresql-9d75fd6f8-x22j6
Namespace:          gitlab
Priority:           0
PriorityClassName:  <none>
Node:               wallets-mgnt-worker100.mgmt.wallets/10.8.32.2
Start Time:         Wed, 27 Nov 2019 22:41:30 +0000
Labels:             app=postgresql
                    pod-template-hash=583198294
                    release=gitlab
Annotations:        openshift.io/scc=anyuid
Status:             Pending
IP:                 10.129.1.220
Controlled By:      ReplicaSet/gitlab-postgresql-9d75fd6f8
Containers:
  gitlab-postgresql:
    Container ID:   
    Image:          postgres:9.6.8
    Image ID:       
    Port:           5432/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CreateContainerConfigError
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   256Mi
    Liveness:   exec [sh -c exec pg_isready --host $POD_IP] delay=120s timeout=5s period=10s #success=1 #failure=6
    Readiness:  exec [sh -c exec pg_isready --host $POD_IP] delay=5s timeout=3s period=5s #success=1 #failure=3
    Environment:
      POSTGRES_USER:           gitlab
      PGUSER:                  gitlab
      POSTGRES_DB:             gitlabhq_production
      POSTGRES_INITDB_ARGS:    
      PGDATA:                  /var/lib/postgresql/data/pgdata
      POSTGRES_PASSWORD_FILE:  /conf/postgres-password
      POD_IP:                   (v1:status.podIP)
    Mounts:
      /conf from password-file (ro)
      /var/lib/postgresql/data/pgdata from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xd4ww (ro)
  metrics:
    Container ID:   docker://c7f4485c2e816f53b2d54bbbcf27696917d358c0968c240801b5d343af8bb773
    Image:          wrouesnel/postgres_exporter:v0.1.1
    Image ID:       docker-pullable://docker.io/wrouesnel/postgres_exporter@sha256:d8bc6221112d77b2d7b7746b729f848b0db60823eb385355636943934c09d822
    Port:           9187/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 27 Nov 2019 22:41:32 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     100m
      memory:  256Mi
    Environment:
      DATA_SOURCE_NAME:  postgresql://gitlab@127.0.0.1:5432?sslmode=disable
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xd4ww (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  gitlab-postgresql
    ReadOnly:   false
  password-file:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  gitlab-postgresql-password
    Optional:    false
  default-token-xd4ww:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-xd4ww
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  gitlab=true
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason     Age                 From                                          Message
  ----     ------     ----                ----                                          -------
  Normal   Scheduled  1h                  default-scheduler                             Successfully assigned gitlab/gitlab-postgresql-9d75fd6f8-x22j6 to wallets-mgnt-worker100.mgmt.wallets
  Normal   Pulled     1h                  kubelet, wallets-mgnt-worker100.mgmt.wallets  Container image "wrouesnel/postgres_exporter:v0.1.1" already present on machine
  Normal   Created    1h                  kubelet, wallets-mgnt-worker100.mgmt.wallets  Created container
  Normal   Started    1h                  kubelet, wallets-mgnt-worker100.mgmt.wallets  Started container
  Normal   Pulled     45m (x182 over 1h)  kubelet, wallets-mgnt-worker100.mgmt.wallets  Container image "postgres:9.6.8" already present on machine
  Warning  Failed     10s (x386 over 1h)  kubelet, wallets-mgnt-worker100.mgmt.wallets  Error: stat //gitlab/gitlab_data/gitlab-postgresql: no such file or directory

Postgres configmap

From oc get configmaps I notice that gitlab-postgresql has no data:

NAME                                  DATA      AGE
gitlab-gitaly                         3         21h
gitlab-gitlab-chart-info              2         21h
gitlab-gitlab-exporter                2         21h
gitlab-gitlab-runner                  5         21h
gitlab-gitlab-shell                   2         21h
gitlab-migrations                     4         21h
gitlab-minio-config-cm                3         21h
gitlab-nginx-ingress-controller       8         21h
gitlab-nginx-ingress-custom-headers   1         21h
gitlab-nginx-ingress-tcp              1         21h
gitlab-postgresql                     0         21h
gitlab-prometheus-server              3         21h
gitlab-redis                          2         21h
gitlab-registry                       2         21h
gitlab-sidekiq                        6         21h
gitlab-sidekiq-all-in-1               1         21h
gitlab-task-runner                    6         21h
gitlab-unicorn                        7         21h
gitlab-unicorn-tests                  1         21h
gitlab-workhorse-config               3         21h

And oc edit configmap gitlab-postgresql -o yaml shows:

apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: 2019-11-27T03:42:12Z
  labels:
    app: postgresql
    chart: postgresql-0.12.0
    heritage: Tiller
    release: gitlab
  name: gitlab-postgresql
  namespace: gitlab
  resourceVersion: "381270"
  selfLink: /api/v1/namespaces/gitlab/configmaps/gitlab-postgresql
  uid: e5dfb5b2-10c7-11ea-8ad6-005056aae1e8

'Describe' gitlab-sidekiq-all-in-1

From oc describe pod gitlab-sidekiq-all-in-1-775b667679-6rw72:

Name:               gitlab-sidekiq-all-in-1-775b667679-6rw72
Namespace:          gitlab
Priority:           0
PriorityClassName:  <none>
Node:               wallets-mgnt-worker100.mgmt.wallets/10.8.32.2
Start Time:         Wed, 27 Nov 2019 03:42:13 +0000
Labels:             app=sidekiq
                    pod-template-hash=3316223235
                    release=gitlab
Annotations:        checksum/configmap=b84f6f00c232803d039bc31a74f0b85a86a69a20adcfc555235b983c21b729a7
                    checksum/configmap-pod=4968e0da5ba113227dabf98749e8fd04888c5f54f4fe10c60a6d7d36d97fbbb5
                    cluster-autoscaler.kubernetes.io/safe-to-evict=true
                    openshift.io/scc=anyuid
                    prometheus.io/port=3807
                    prometheus.io/scrape=true
Status:             Pending
IP:                 10.129.1.176
Controlled By:      ReplicaSet/gitlab-sidekiq-all-in-1-775b667679
Init Containers:
  certificates:
    Container ID:   docker://8aed490c6e84732d01ae132db8e3ae2b67195f0274951c378e8c55555e36b122
    Image:          registry.gitlab.com/gitlab-org/build/cng/alpine-certificates:20171114-r3
    Image ID:       docker-pullable://registry.gitlab.com/gitlab-org/build/cng/alpine-certificates@sha256:00ce9a585179e6b22c9bfea9ba82552630eab0bd25da4f13282b588b2ad022dc
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 27 Nov 2019 03:42:27 +0000
      Finished:     Wed, 27 Nov 2019 03:42:27 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        50m
    Environment:  <none>
    Mounts:
      /etc/ssl/certs from etc-ssl-certs (rw)
      /usr/local/share/ca-certificates from custom-ca-certificates (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xd4ww (ro)
  configure:
    Container ID:  docker://d2605f6df572e7794eb99b958062c73edb8875dce899632ed5f1f343b5d18829
    Image:         busybox:latest
    Image ID:      docker-pullable://docker.io/busybox@sha256:1303dbf110c57f3edf68d9f5a16c082ec06c4cf7604831669faf2c712260b5a0
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      /config/configure
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 27 Nov 2019 03:42:37 +0000
      Finished:     Wed, 27 Nov 2019 03:42:37 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        50m
    Environment:  <none>
    Mounts:
      /config from sidekiq-config (ro)
      /init-config from init-sidekiq-secrets (ro)
      /init-secrets from sidekiq-secrets (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xd4ww (ro)
  dependencies:
    Container ID:  docker://ed94dc9af34dac5066233013f294d261af2a2a9046bdf45c468ddcb9c56e1a75
    Image:         registry.gitlab.com/gitlab-org/build/cng/gitlab-sidekiq-ce:v12.5.0
    Image ID:      docker-pullable://registry.gitlab.com/gitlab-org/build/cng/gitlab-sidekiq-ce@sha256:0bada1ed67db739a88d7d11eeb652f8e8984793e8254ed61c5dcb5cca7dd98f3
    Port:          <none>
    Host Port:     <none>
    Args:
      /scripts/wait-for-deps
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 27 Nov 2019 23:55:12 +0000
      Finished:     Wed, 27 Nov 2019 23:59:32 +0000
    Ready:          False
    Restart Count:  129
    Requests:
      cpu:  50m
    Environment:
      GITALY_FEATURE_DEFAULT_ON:  1
      CONFIG_TEMPLATE_DIRECTORY:  /var/opt/gitlab/templates
      CONFIG_DIRECTORY:           /srv/gitlab/config
      SIDEKIQ_CONCURRENCY:        25
      SIDEKIQ_TIMEOUT:            5
    Mounts:
      /etc/gitlab from sidekiq-secrets (ro)
      /var/opt/gitlab/templates from sidekiq-config (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xd4ww (ro)
Containers:
  sidekiq:
    Container ID:   
    Image:          registry.gitlab.com/gitlab-org/build/cng/gitlab-sidekiq-ce:v12.5.0
    Image ID:       
    Port:           3807/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:      50m
      memory:   650M
    Liveness:   exec [pgrep -f sidekiq] delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  exec [pgrep -f sidekiq] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      prometheus_multiproc_dir:             /metrics
      GITALY_FEATURE_DEFAULT_ON:            1
      CONFIG_TEMPLATE_DIRECTORY:            /var/opt/gitlab/templates
      CONFIG_DIRECTORY:                     /srv/gitlab/config
      SIDEKIQ_CONCURRENCY:                  25
      SIDEKIQ_TIMEOUT:                      5
      SIDEKIQ_MEMORY_KILLER_MAX_RSS:        2000000
      SIDEKIQ_MEMORY_KILLER_GRACE_TIME:     900
      SIDEKIQ_MEMORY_KILLER_SHUTDOWN_WAIT:  30
    Mounts:
      /etc/gitlab from sidekiq-secrets (ro)
      /etc/ssl/certs/ from etc-ssl-certs (ro)
      /metrics from sidekiq-metrics (rw)
      /srv/gitlab/INSTALLATION_TYPE from sidekiq-config (rw)
      /srv/gitlab/config/initializers/smtp_settings.rb from sidekiq-config (rw)
      /srv/gitlab/config/secrets.yml from sidekiq-secrets (rw)
      /var/opt/gitlab/templates from sidekiq-config (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xd4ww (ro)
Conditions:
  Type              Status
  Initialized       False 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  sidekiq-metrics:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  Memory
  sidekiq-config:
  <unknown>
  init-sidekiq-secrets:
  <unknown>
  sidekiq-secrets:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  Memory
  etc-ssl-certs:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  Memory
  custom-ca-certificates:
  <unknown>
  default-token-xd4ww:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-xd4ww
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  gitlab=true
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason   Age                  From                                          Message
  ----     ------   ----                 ----                                          -------
  Normal   Pulled   1h (x123 over 20h)   kubelet, wallets-mgnt-worker100.mgmt.wallets  Container image "registry.gitlab.com/gitlab-org/build/cng/gitlab-sidekiq-ce:v12.5.0" already present on machine
  Warning  BackOff  1m (x2862 over 20h)  kubelet, wallets-mgnt-worker100.mgmt.wallets  Back-off restarting failed container

Logs from gitlab-prometheus-server container.

level=info ts=2019-11-27T23:58:11.999Z caller=main.go:329 msg="Starting Prometheus" version="(version=2.11.1, branch=HEAD, revision=e5b22494857deca4b806f74f6e3a6ee30c251763)"
level=info ts=2019-11-27T23:58:11.999Z caller=main.go:330 build_context="(go=go1.12.7, user=root@d94406f2bb6f, date=20190710-13:51:17)"
level=info ts=2019-11-27T23:58:11.999Z caller=main.go:331 host_details="(Linux 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 gitlab-prometheus-server-75585684b4-9wmrm (none))"
level=info ts=2019-11-27T23:58:11.999Z caller=main.go:332 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-11-27T23:58:11.999Z caller=main.go:333 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:652 msg="Starting TSDB ..."
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:521 msg="Stopping scrape discovery manager..."
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:535 msg="Stopping notify discovery manager..."
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:557 msg="Stopping scrape manager..."
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:531 msg="Notify discovery manager stopped"
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:517 msg="Scrape discovery manager stopped"
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:551 msg="Scrape manager stopped"
level=info ts=2019-11-27T23:58:12.000Z caller=manager.go:776 component="rule manager" msg="Stopping rule manager..."
level=info ts=2019-11-27T23:58:12.000Z caller=manager.go:782 component="rule manager" msg="Rule manager stopped"
level=info ts=2019-11-27T23:58:12.000Z caller=notifier.go:602 component=notifier msg="Stopping notification manager..."
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:722 msg="Notifier manager stopped"
level=info ts=2019-11-27T23:58:12.000Z caller=web.go:448 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=error ts=2019-11-27T23:58:12.001Z caller=main.go:731 err="opening storage failed: list block dirs in \"/data\": open /data: permission denied"

Logs from gitlab-gitaly-0

+ /scripts/set-config /etc/gitaly/templates /etc/gitaly
Begin parsing .erb files from /etc/gitaly/templates
Writing /etc/gitaly/config.toml
Writing /etc/gitaly/shell-config.yml
Copying other config files found in /etc/gitaly/templates
+ exec /bin/sh -c '"/scripts/process-wrapper"'
Starting Gitaly
==> /var/log/gitaly/gitaly.log <==

==> /var/log/gitaly/gitlab-shell.log <==

==> /var/log/gitaly/gitaly.log <==
time="2019-11-28T00:25:35Z" level=info msg="Starting Gitaly" version="Gitaly, version 1.72.0"
time="2019-11-28T00:25:35Z" level=warning msg="git path not configured. Using default path resolution" resolvedPath=/usr/local/bin/git
time="2019-11-28T00:25:35Z" level=warning msg="git path not configured. Using default path resolution" resolvedPath=/usr/local/bin/git
time="2019-11-28T00:25:35Z" level=info msg="clearing disk cache object folder" storage=default
time="2019-11-28T00:25:35Z" level=fatal msg="load config" config_path=/etc/gitaly/config.toml error="mkdir /home/git/repositories/+gitaly/tmp/diskcache623264096: permission denied"
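Both the Prometheus and Gitaly logs above end with a "permission denied" error on a specific path. A minimal sketch for pulling those paths out of pasted log text follows. The function name denied_paths and the regular expression are my own, and the pattern assumes the message format "<operation> <absolute path>: permission denied" seen in these two logs.

```python
import re

# Matches e.g. "open /data: permission denied" or "mkdir /some/dir: permission denied".
DENIED = re.compile(r"\b(\w+) (/\S*): permission denied")


def denied_paths(log_text):
    """Return (operation, path) pairs for each 'permission denied' error found."""
    return [(m.group(1), m.group(2)) for m in DENIED.finditer(log_text)]
```

Run over the two logs in this report, it should pick out /data for Prometheus and the diskcache directory under /home/git/repositories/+gitaly/tmp for Gitaly, which are the paths whose backing volume permissions are worth checking.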

SELinux logs

There don't appear to be any remaining SELinux issues. When I run sudo ausearch -m avc --start recent on the node that is running all the gitlab pods, no results are returned.

gitlab-unicorn

oc logs gitlab-unicorn-79cb8b56c8-6qqwp -c dependencies shows that it depends on postgres, so I won't worry about this for now.

...
	Is the server running on host "gitlab-postgresql" (172.30.119.70) and accepting

Permissions of PV

ls -alt ${PV_HOST_DIRECTORY} shows:

drwxrwxrwx. 13 root   root   221 Nov 27 02:48 gitlab-minio
drwxrwxrwx.  3 root   root    45 Nov 27 02:47 repo-data-gitlab-gitaly-0
drwxrwxrwx.  3 root   root    29 Nov 27 02:47 gitlab-prometheus-server
drwxrwxrwx.  6 deploy deploy 111 Nov 26 03:29 .
drwxrwxrwx.  2 root   root     6 Nov 26 02:56 gitlab-redis
drwxr-xr-x.  3 root   root    25 Nov 26 02:56 ..
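The listing can be cross-checked against the path in the kubelet warning for gitlab-postgresql ("stat //gitlab/gitlab_data/gitlab-postgresql: no such file or directory"). The sketch below does that comparison on the pasted text. It assumes, without confirmation from the report, that ${PV_HOST_DIRECTORY} is the parent directory that path refers to. The function names are hypothetical.

```python
import posixpath
import re

# Copied from this report: the kubelet warning and the ls -alt output.
EVENT = "Error: stat //gitlab/gitlab_data/gitlab-postgresql: no such file or directory"
LISTING = """\
drwxrwxrwx. 13 root   root   221 Nov 27 02:48 gitlab-minio
drwxrwxrwx.  3 root   root    45 Nov 27 02:47 repo-data-gitlab-gitaly-0
drwxrwxrwx.  3 root   root    29 Nov 27 02:47 gitlab-prometheus-server
drwxrwxrwx.  6 deploy deploy 111 Nov 26 03:29 .
drwxrwxrwx.  2 root   root     6 Nov 26 02:56 gitlab-redis
drwxr-xr-x.  3 root   root    25 Nov 26 02:56 ..
"""


def missing_path(event_message):
    """Extract the path from a 'stat <path>: no such file or directory' message."""
    match = re.search(r"stat (\S+): no such file or directory", event_message)
    return match.group(1) if match else None


def listed_directories(ls_output):
    """Return the names of the directory entries in long-format ls output."""
    names = set()
    for line in ls_output.splitlines():
        fields = line.split(None, 8)  # the ninth field is the entry name
        if len(fields) == 9 and fields[0].startswith("d"):
            names.add(fields[8])
    return names


name = posixpath.basename(missing_path(EVENT))
status = "listed" if name in listed_directories(LISTING) else "not listed"
print(name, status)
```

With the values above, gitlab-postgresql is reported as not listed, which is consistent with the CreateContainerConfigError but does not by itself establish why the directory is absent.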