Several pods fail to start on OpenShift 3.11
Summary
Several pods fail to start successfully on OpenShift 3.11.
Steps to reproduce
I've created an Ansible playbook that should reproduce this. You'll need an OpenShift 3.11 cluster.
See https://gitlab.com/snippets/1917612
Configuration used
From oc get scc anyuid -o yaml:
allowHostDirVolumePlugin: true
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: true
allowedCapabilities: null
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups:
- system:cluster-admins
- system:serviceaccounts:gitlab
- system:serviceaccounts:gitlab-prometheus-server
- system:serviceaccount:gitlab:gitlab-prometheus-server
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: anyuid provides all features of the restricted SCC
      but allows users to run with any UID and any GID.
  creationTimestamp: 2019-11-24T21:24:22Z
  name: anyuid
  resourceVersion: "495273"
  selfLink: /apis/security.openshift.io/v1/securitycontextconstraints/anyuid
  uid: c8319bee-0f00-11ea-9768-005056aae1e8
priority: 10
readOnlyRootFilesystem: false
requiredDropCapabilities:
- MKNOD
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users:
- system:serviceaccount:kube-system:default
- system:serviceaccount:kube-system:gitlab-runner
- system:serviceaccount:gitlab:default
- system:serviceaccount:gitlab:gitlab-runner
- system:serviceaccount:gitlab:tiller
- system:serviceaccount:gitlab:gitlab-shared-secrets
- system:serviceaccount:gitlab:gitlab-gitlab-runner
- system:serviceaccount:gitlab:gitlab-prometheus-server
volumes:
- configMap
- downwardAPI
- emptyDir
- hostPath
- persistentVolumeClaim
- projected
- secret
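As a cross-check on which service accounts the anyuid SCC above actually admits, here is a minimal Python sketch of the user/group matching. It rests on my understanding of how OpenShift identifies service accounts: the username is system:serviceaccount:<namespace>:<name>, and the account belongs to the groups system:serviceaccounts and system:serviceaccounts:<namespace>. The function names are hypothetical, the two sets are copied from the dump above, and the sketch ignores everything else that affects SCC admission (priority, the pod's requested security context, other SCCs the account may use).

```python
"""Sketch: which entries of the anyuid SCC match a given service account?"""
from __future__ import annotations

# Copied from the `users` field of the SCC dump.
SCC_USERS = {
    "system:serviceaccount:kube-system:default",
    "system:serviceaccount:kube-system:gitlab-runner",
    "system:serviceaccount:gitlab:default",
    "system:serviceaccount:gitlab:gitlab-runner",
    "system:serviceaccount:gitlab:tiller",
    "system:serviceaccount:gitlab:gitlab-shared-secrets",
    "system:serviceaccount:gitlab:gitlab-gitlab-runner",
    "system:serviceaccount:gitlab:gitlab-prometheus-server",
}

# Copied from the `groups` field of the SCC dump.
SCC_GROUPS = {
    "system:cluster-admins",
    "system:serviceaccounts:gitlab",
    "system:serviceaccounts:gitlab-prometheus-server",
    "system:serviceaccount:gitlab:gitlab-prometheus-server",
}


def sa_identity(namespace: str, name: str) -> tuple[str, set[str]]:
    """Assumed username and group memberships of a service account."""
    username = f"system:serviceaccount:{namespace}:{name}"
    groups = {"system:serviceaccounts", f"system:serviceaccounts:{namespace}"}
    return username, groups


def scc_matches(namespace: str, name: str) -> list[str]:
    """Return the SCC entries (user first, then groups) that match the account."""
    username, groups = sa_identity(namespace, name)
    matches = [f"user {username}"] if username in SCC_USERS else []
    matches += [f"group {g}" for g in sorted(groups & SCC_GROUPS)]
    return matches


def username_shaped_groups() -> list[str]:
    """Group entries shaped like a service-account username (assumed never to match a group)."""
    return sorted(g for g in SCC_GROUPS if g.startswith("system:serviceaccount:"))


if __name__ == "__main__":
    for sa in ("default", "gitlab-prometheus-server", "gitlab-runner"):
        print(f"gitlab/{sa}:", scc_matches("gitlab", sa) or "no match")
    print("username-shaped group entries:", username_shaped_groups())
```

Under those assumptions, the groups entry system:serviceaccount:gitlab:gitlab-prometheus-server has the shape of a username rather than a group, and system:serviceaccounts:gitlab-prometheus-server would only cover a namespace named gitlab-prometheus-server. The prometheus service account is also listed under users, so these entries alone would not explain its failure.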
Current behavior
The following pods are failing to start successfully (from oc get pods -o json | jq -r '.items[] | select(.status.phase != "Running" or ([ .status.conditions[] | select(.type == "Ready" and .status == "False") ] | length ) == 1 ) | .metadata.namespace + "/" + .metadata.name':
- gitlab/gitlab-gitaly-0
- gitlab/gitlab-gitlab-runner-6f4b6dc944-hb8gb
- gitlab/gitlab-minio-8564669cf5-9wcl4
- gitlab/gitlab-postgresql-9d75fd6f8-x22j6
- gitlab/gitlab-prometheus-server-75585684b4-9wmrm
- gitlab/gitlab-sidekiq-all-in-1-775b667679-6rw72
- gitlab/gitlab-unicorn-79cb8b56c8-28n9g
- gitlab/gitlab-unicorn-79cb8b56c8-6qqwp
Most of the failing pods hit permission errors. For gitlab-postgresql, however, the problem appears to be one of configuration; see the logs below.
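For readers less familiar with jq, the pod filter used above can be re-expressed as a short Python sketch (the script and function names are hypothetical). It reads the JSON produced by oc get pods -o json from a file and prints namespace/name for every pod that is not in the Running phase or whose Ready condition is False. It also tolerates a pod status with no conditions list.

```python
"""Sketch: Python equivalent of the jq filter for pods that are not Running or not Ready."""
from __future__ import annotations

import json
import sys


def not_ready_pods(pod_list: dict) -> list[str]:
    """Return "namespace/name" for each pod that is not Running or has Ready=False."""
    found = []
    for pod in pod_list.get("items", []):
        status = pod.get("status") or {}
        not_running = status.get("phase") != "Running"
        # A pod has at most one Ready condition, so `any` mirrors the jq `length == 1` test.
        ready_false = any(
            cond.get("type") == "Ready" and cond.get("status") == "False"
            for cond in status.get("conditions") or []
        )
        if not_running or ready_false:
            meta = pod["metadata"]
            found.append(f"{meta['namespace']}/{meta['name']}")
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: oc get pods -o json > pods.json && python3 not_ready_pods.py pods.json
    with open(sys.argv[1]) as fh:
        for entry in not_ready_pods(json.load(fh)):
            print(entry)
```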
Expected behavior
Expect all pods to start successfully.
Versions
- Chart: gitlab-2.5.1
- Platform:
  - Self-hosted: OpenShift 3.11
- Kubernetes (kubectl version):
  - Client: v1.11.0+d4cacc0
  - Server: v1.11.0+d4cacc0
- Helm (helm version):
  - Client: &version.Version{SemVer:"v2.16.1", GitCommit:"bbdfe5e7803a12bbdf97e94cd847859890cf4050", GitTreeState:"clean"}
  - Server: &version.Version{SemVer:"v2.16.1", GitCommit:"bbdfe5e7803a12bbdf97e94cd847859890cf4050", GitTreeState:"clean"}
Relevant logs
'Describe' postgres
From oc describe pod gitlab-postgresql-9d75fd6f8-x22j6.
Name:               gitlab-postgresql-9d75fd6f8-x22j6
Namespace:          gitlab
Priority:           0
PriorityClassName:  <none>
Node:               wallets-mgnt-worker100.mgmt.wallets/10.8.32.2
Start Time:         Wed, 27 Nov 2019 22:41:30 +0000
Labels:             app=postgresql
                    pod-template-hash=583198294
                    release=gitlab
Annotations:        openshift.io/scc=anyuid
Status:             Pending
IP:                 10.129.1.220
Controlled By:      ReplicaSet/gitlab-postgresql-9d75fd6f8
Containers:
  gitlab-postgresql:
    Container ID:
    Image:          postgres:9.6.8
    Image ID:
    Port:           5432/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CreateContainerConfigError
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     100m
      memory:  256Mi
    Liveness:   exec [sh -c exec pg_isready --host $POD_IP] delay=120s timeout=5s period=10s #success=1 #failure=6
    Readiness:  exec [sh -c exec pg_isready --host $POD_IP] delay=5s timeout=3s period=5s #success=1 #failure=3
    Environment:
      POSTGRES_USER:           gitlab
      PGUSER:                  gitlab
      POSTGRES_DB:             gitlabhq_production
      POSTGRES_INITDB_ARGS:
      PGDATA:                  /var/lib/postgresql/data/pgdata
      POSTGRES_PASSWORD_FILE:  /conf/postgres-password
      POD_IP:                  (v1:status.podIP)
    Mounts:
      /conf from password-file (ro)
      /var/lib/postgresql/data/pgdata from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xd4ww (ro)
  metrics:
    Container ID:   docker://c7f4485c2e816f53b2d54bbbcf27696917d358c0968c240801b5d343af8bb773
    Image:          wrouesnel/postgres_exporter:v0.1.1
    Image ID:       docker-pullable://docker.io/wrouesnel/postgres_exporter@sha256:d8bc6221112d77b2d7b7746b729f848b0db60823eb385355636943934c09d822
    Port:           9187/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 27 Nov 2019 22:41:32 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     100m
      memory:  256Mi
    Environment:
      DATA_SOURCE_NAME:  postgresql://gitlab@127.0.0.1:5432?sslmode=disable
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xd4ww (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:        PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:   gitlab-postgresql
    ReadOnly:    false
  password-file:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  gitlab-postgresql-password
    Optional:    false
  default-token-xd4ww:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-xd4ww
    Optional:    false
QoS Class:          Burstable
Node-Selectors:     gitlab=true
Tolerations:        node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason     Age                From                                          Message
  ----     ------     ----               ----                                          -------
  Normal   Scheduled  1h                 default-scheduler                             Successfully assigned gitlab/gitlab-postgresql-9d75fd6f8-x22j6 to wallets-mgnt-worker100.mgmt.wallets
  Normal   Pulled     1h                 kubelet, wallets-mgnt-worker100.mgmt.wallets  Container image "wrouesnel/postgres_exporter:v0.1.1" already present on machine
  Normal   Created    1h                 kubelet, wallets-mgnt-worker100.mgmt.wallets  Created container
  Normal   Started    1h                 kubelet, wallets-mgnt-worker100.mgmt.wallets  Started container
  Normal   Pulled     45m (x182 over 1h) kubelet, wallets-mgnt-worker100.mgmt.wallets  Container image "postgres:9.6.8" already present on machine
  Warning  Failed     10s (x386 over 1h) kubelet, wallets-mgnt-worker100.mgmt.wallets  Error: stat //gitlab/gitlab_data/gitlab-postgresql: no such file or directory
Postgres configmap
From oc get configmaps I notice that gitlab-postgresql has no data:
NAME                                  DATA   AGE
gitlab-gitaly                         3      21h
gitlab-gitlab-chart-info              2      21h
gitlab-gitlab-exporter                2      21h
gitlab-gitlab-runner                  5      21h
gitlab-gitlab-shell                   2      21h
gitlab-migrations                     4      21h
gitlab-minio-config-cm                3      21h
gitlab-nginx-ingress-controller       8      21h
gitlab-nginx-ingress-custom-headers   1      21h
gitlab-nginx-ingress-tcp              1      21h
gitlab-postgresql                     0      21h
gitlab-prometheus-server              3      21h
gitlab-redis                          2      21h
gitlab-registry                       2      21h
gitlab-sidekiq                        6      21h
gitlab-sidekiq-all-in-1               1      21h
gitlab-task-runner                    6      21h
gitlab-unicorn                        7      21h
gitlab-unicorn-tests                  1      21h
gitlab-workhorse-config               3      21h
And oc edit configmap gitlab-postgresql -o yaml shows:
apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: 2019-11-27T03:42:12Z
  labels:
    app: postgresql
    chart: postgresql-0.12.0
    heritage: Tiller
    release: gitlab
  name: gitlab-postgresql
  namespace: gitlab
  resourceVersion: "381270"
  selfLink: /api/v1/namespaces/gitlab/configmaps/gitlab-postgresql
  uid: e5dfb5b2-10c7-11ea-8ad6-005056aae1e8
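The same empty ConfigMap can be spotted without reading the table by eye. A minimal Python sketch, assuming the output of oc get configmaps -o json has been saved to a file; it treats a ConfigMap as empty when it has neither data nor binaryData entries, which is my understanding of what the DATA column counts. The function and script names are hypothetical.

```python
"""Sketch: list ConfigMaps that hold no keys, from `oc get configmaps -o json` output."""
from __future__ import annotations

import json
import sys


def empty_configmaps(cm_list: dict) -> list[str]:
    """Names of ConfigMaps with neither `data` nor `binaryData` entries."""
    return [
        cm["metadata"]["name"]
        for cm in cm_list.get("items", [])
        if not cm.get("data") and not cm.get("binaryData")
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: oc get configmaps -o json > cms.json && python3 empty_configmaps.py cms.json
    with open(sys.argv[1]) as fh:
        print("\n".join(empty_configmaps(json.load(fh))))
```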
'Describe' gitlab-sidekiq-all-in-1
From oc describe pod gitlab-sidekiq-all-in-1-775b667679-6rw72
Name:               gitlab-sidekiq-all-in-1-775b667679-6rw72
Namespace:          gitlab
Priority:           0
PriorityClassName:  <none>
Node:               wallets-mgnt-worker100.mgmt.wallets/10.8.32.2
Start Time:         Wed, 27 Nov 2019 03:42:13 +0000
Labels:             app=sidekiq
                    pod-template-hash=3316223235
                    release=gitlab
Annotations:        checksum/configmap=b84f6f00c232803d039bc31a74f0b85a86a69a20adcfc555235b983c21b729a7
                    checksum/configmap-pod=4968e0da5ba113227dabf98749e8fd04888c5f54f4fe10c60a6d7d36d97fbbb5
                    cluster-autoscaler.kubernetes.io/safe-to-evict=true
                    openshift.io/scc=anyuid
                    prometheus.io/port=3807
                    prometheus.io/scrape=true
Status:             Pending
IP:                 10.129.1.176
Controlled By:      ReplicaSet/gitlab-sidekiq-all-in-1-775b667679
Init Containers:
  certificates:
    Container ID:   docker://8aed490c6e84732d01ae132db8e3ae2b67195f0274951c378e8c55555e36b122
    Image:          registry.gitlab.com/gitlab-org/build/cng/alpine-certificates:20171114-r3
    Image ID:       docker-pullable://registry.gitlab.com/gitlab-org/build/cng/alpine-certificates@sha256:00ce9a585179e6b22c9bfea9ba82552630eab0bd25da4f13282b588b2ad022dc
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 27 Nov 2019 03:42:27 +0000
      Finished:     Wed, 27 Nov 2019 03:42:27 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:  50m
    Environment:    <none>
    Mounts:
      /etc/ssl/certs from etc-ssl-certs (rw)
      /usr/local/share/ca-certificates from custom-ca-certificates (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xd4ww (ro)
  configure:
    Container ID:   docker://d2605f6df572e7794eb99b958062c73edb8875dce899632ed5f1f343b5d18829
    Image:          busybox:latest
    Image ID:       docker-pullable://docker.io/busybox@sha256:1303dbf110c57f3edf68d9f5a16c082ec06c4cf7604831669faf2c712260b5a0
    Port:           <none>
    Host Port:      <none>
    Command:
      sh
      /config/configure
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 27 Nov 2019 03:42:37 +0000
      Finished:     Wed, 27 Nov 2019 03:42:37 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:  50m
    Environment:    <none>
    Mounts:
      /config from sidekiq-config (ro)
      /init-config from init-sidekiq-secrets (ro)
      /init-secrets from sidekiq-secrets (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xd4ww (ro)
  dependencies:
    Container ID:   docker://ed94dc9af34dac5066233013f294d261af2a2a9046bdf45c468ddcb9c56e1a75
    Image:          registry.gitlab.com/gitlab-org/build/cng/gitlab-sidekiq-ce:v12.5.0
    Image ID:       docker-pullable://registry.gitlab.com/gitlab-org/build/cng/gitlab-sidekiq-ce@sha256:0bada1ed67db739a88d7d11eeb652f8e8984793e8254ed61c5dcb5cca7dd98f3
    Port:           <none>
    Host Port:      <none>
    Args:
      /scripts/wait-for-deps
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 27 Nov 2019 23:55:12 +0000
      Finished:     Wed, 27 Nov 2019 23:59:32 +0000
    Ready:          False
    Restart Count:  129
    Requests:
      cpu:  50m
    Environment:
      GITALY_FEATURE_DEFAULT_ON:  1
      CONFIG_TEMPLATE_DIRECTORY:  /var/opt/gitlab/templates
      CONFIG_DIRECTORY:           /srv/gitlab/config
      SIDEKIQ_CONCURRENCY:        25
      SIDEKIQ_TIMEOUT:            5
    Mounts:
      /etc/gitlab from sidekiq-secrets (ro)
      /var/opt/gitlab/templates from sidekiq-config (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xd4ww (ro)
Containers:
  sidekiq:
    Container ID:
    Image:          registry.gitlab.com/gitlab-org/build/cng/gitlab-sidekiq-ce:v12.5.0
    Image ID:
    Port:           3807/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     50m
      memory:  650M
    Liveness:   exec [pgrep -f sidekiq] delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  exec [pgrep -f sidekiq] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      prometheus_multiproc_dir:             /metrics
      GITALY_FEATURE_DEFAULT_ON:            1
      CONFIG_TEMPLATE_DIRECTORY:            /var/opt/gitlab/templates
      CONFIG_DIRECTORY:                     /srv/gitlab/config
      SIDEKIQ_CONCURRENCY:                  25
      SIDEKIQ_TIMEOUT:                      5
      SIDEKIQ_MEMORY_KILLER_MAX_RSS:        2000000
      SIDEKIQ_MEMORY_KILLER_GRACE_TIME:     900
      SIDEKIQ_MEMORY_KILLER_SHUTDOWN_WAIT:  30
    Mounts:
      /etc/gitlab from sidekiq-secrets (ro)
      /etc/ssl/certs/ from etc-ssl-certs (ro)
      /metrics from sidekiq-metrics (rw)
      /srv/gitlab/INSTALLATION_TYPE from sidekiq-config (rw)
      /srv/gitlab/config/initializers/smtp_settings.rb from sidekiq-config (rw)
      /srv/gitlab/config/secrets.yml from sidekiq-secrets (rw)
      /var/opt/gitlab/templates from sidekiq-config (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xd4ww (ro)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  sidekiq-metrics:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:      Memory
  sidekiq-config:
    <unknown>
  init-sidekiq-secrets:
    <unknown>
  sidekiq-secrets:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:      Memory
  etc-ssl-certs:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:      Memory
  custom-ca-certificates:
    <unknown>
  default-token-xd4ww:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-xd4ww
    Optional:    false
QoS Class:          Burstable
Node-Selectors:     gitlab=true
Tolerations:        node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason   Age                 From                                          Message
  ----     ------   ----                ----                                          -------
  Normal   Pulled   1h (x123 over 20h)  kubelet, wallets-mgnt-worker100.mgmt.wallets  Container image "registry.gitlab.com/gitlab-org/build/cng/gitlab-sidekiq-ce:v12.5.0" already present on machine
  Warning  BackOff  1m (x2862 over 20h) kubelet, wallets-mgnt-worker100.mgmt.wallets  Back-off restarting failed container
Logs from gitlab-prometheus-server container.
level=info ts=2019-11-27T23:58:11.999Z caller=main.go:329 msg="Starting Prometheus" version="(version=2.11.1, branch=HEAD, revision=e5b22494857deca4b806f74f6e3a6ee30c251763)"
level=info ts=2019-11-27T23:58:11.999Z caller=main.go:330 build_context="(go=go1.12.7, user=root@d94406f2bb6f, date=20190710-13:51:17)"
level=info ts=2019-11-27T23:58:11.999Z caller=main.go:331 host_details="(Linux 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 gitlab-prometheus-server-75585684b4-9wmrm (none))"
level=info ts=2019-11-27T23:58:11.999Z caller=main.go:332 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2019-11-27T23:58:11.999Z caller=main.go:333 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:652 msg="Starting TSDB ..."
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:521 msg="Stopping scrape discovery manager..."
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:535 msg="Stopping notify discovery manager..."
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:557 msg="Stopping scrape manager..."
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:531 msg="Notify discovery manager stopped"
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:517 msg="Scrape discovery manager stopped"
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:551 msg="Scrape manager stopped"
level=info ts=2019-11-27T23:58:12.000Z caller=manager.go:776 component="rule manager" msg="Stopping rule manager..."
level=info ts=2019-11-27T23:58:12.000Z caller=manager.go:782 component="rule manager" msg="Rule manager stopped"
level=info ts=2019-11-27T23:58:12.000Z caller=notifier.go:602 component=notifier msg="Stopping notification manager..."
level=info ts=2019-11-27T23:58:12.000Z caller=main.go:722 msg="Notifier manager stopped"
level=info ts=2019-11-27T23:58:12.000Z caller=web.go:448 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=error ts=2019-11-27T23:58:12.001Z caller=main.go:731 err="opening storage failed: list block dirs in \"/data\": open /data: permission denied"
Logs from gitlab-gitaly-0
+ /scripts/set-config /etc/gitaly/templates /etc/gitaly
Begin parsing .erb files from /etc/gitaly/templates
Writing /etc/gitaly/config.toml
Writing /etc/gitaly/shell-config.yml
Copying other config files found in /etc/gitaly/templates
+ exec /bin/sh -c '"/scripts/process-wrapper"'
Starting Gitaly
==> /var/log/gitaly/gitaly.log <==
==> /var/log/gitaly/gitlab-shell.log <==
==> /var/log/gitaly/gitaly.log <==
time="2019-11-28T00:25:35Z" level=info msg="Starting Gitaly" version="Gitaly, version 1.72.0"
time="2019-11-28T00:25:35Z" level=warning msg="git path not configured. Using default path resolution" resolvedPath=/usr/local/bin/git
time="2019-11-28T00:25:35Z" level=warning msg="git path not configured. Using default path resolution" resolvedPath=/usr/local/bin/git
time="2019-11-28T00:25:35Z" level=info msg="clearing disk cache object folder" storage=default
time="2019-11-28T00:25:35Z" level=fatal msg="load config" config_path=/etc/gitaly/config.toml error="mkdir /home/git/repositories/+gitaly/tmp/diskcache623264096: permission denied"
SELinux logs
There don't appear to be any remaining SELinux issues: running sudo ausearch -m avc --start recent on the node that runs all the gitlab pods returns no results.
gitlab-unicorn
oc logs gitlab-unicorn-79cb8b56c8-6qqwp -c dependencies shows that it depends on postgres, so I won't worry about this one for now:
...
Is the server running on host "gitlab-postgresql" (172.30.119.70) and accepting
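To tell a postgres start-up failure apart from a service or networking problem, a plain TCP probe is enough. A minimal Python sketch (the function and script names are hypothetical), assuming it is run from a pod in the gitlab namespace so that the service name gitlab-postgresql resolves, and that postgres listens on 5432 as the container spec above shows. It only checks that something accepts TCP connections; unlike pg_isready it does not speak the PostgreSQL protocol.

```python
"""Sketch: does host:port accept TCP connections?"""
import socket
import sys


def accepts_connections(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS lookup failures are all OSError subclasses.
        return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (from a pod in the gitlab namespace): python3 probe.py gitlab-postgresql 5432
    host = sys.argv[1]
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 5432
    print(f"{host}:{port} accepting connections: {accepts_connections(host, port)}")
```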
Permissions of PV
ls -alt ${PV_HOST_DIRECTORY} shows:
drwxrwxrwx. 13 root root 221 Nov 27 02:48 gitlab-minio
drwxrwxrwx. 3 root root 45 Nov 27 02:47 repo-data-gitlab-gitaly-0
drwxrwxrwx. 3 root root 29 Nov 27 02:47 gitlab-prometheus-server
drwxrwxrwx. 6 deploy deploy 111 Nov 26 03:29 .
drwxrwxrwx. 2 root root 6 Nov 26 02:56 gitlab-redis
drwxr-xr-x. 3 root root 25 Nov 26 02:56 ..
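The listing above can be compared against the directories the pods need. A minimal Python sketch, assuming it is run on the node as a non-root user (root bypasses the permission checks being tested) with the PV base directory passed as the first argument. EXPECTED_DIRS is a hypothetical list built from the names in the listing plus gitlab-postgresql, which appears in the postgres stat error; adjust it to the real PV paths. Because UID, supplemental groups and SELinux context all differ between a node shell and the containers, a clean result here does not prove the pods can write; it only rules out missing directories and plain mode-bit problems.

```python
"""Sketch: check that expected hostPath PV directories exist and are writable."""
from __future__ import annotations

import os
import stat
import sys
import tempfile

# Hypothetical list: names from the ls listing plus the path in the postgres stat error.
EXPECTED_DIRS = [
    "gitlab-minio",
    "gitlab-postgresql",
    "gitlab-prometheus-server",
    "gitlab-redis",
    "repo-data-gitlab-gitaly-0",
]


def can_create_entry(path: str) -> bool:
    """Create and remove a temporary subdirectory, roughly what Gitaly's mkdir needs."""
    try:
        probe = tempfile.mkdtemp(dir=path)
    except OSError:
        return False
    os.rmdir(probe)
    return True


def check_dirs(base: str, names: list[str]) -> dict[str, str]:
    """Map each expected name to 'missing' or '<mode> writable' / '<mode> NOT writable'."""
    report = {}
    for name in names:
        path = os.path.join(base, name)
        if not os.path.isdir(path):
            report[name] = "missing"
            continue
        mode = stat.filemode(os.stat(path).st_mode)
        report[name] = f"{mode} {'writable' if can_create_entry(path) else 'NOT writable'}"
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (as a non-root user on the node): python3 check_pv_dirs.py "$PV_HOST_DIRECTORY"
    for name, result in check_dirs(sys.argv[1], EXPECTED_DIRS).items():
        print(f"{name}: {result}")
```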