gitlab-shell health check produces nonsense errors
Summary
Right now gitlab-shell is deployed with a readiness probe and a liveness probe that look like this:
Liveness
The liveness probe executes a health check script, which simply looks for the PID file that sshd produces and checks whether the process with that PID is still running.
livenessProbe:
  exec:
    command:
    - /scripts/healthcheck
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 3
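The contents of /scripts/healthcheck are not shown in this report, so the following is only a minimal sketch of the kind of PID-file check described above. The function name `pid_is_alive` and any PID file path passed to it are hypothetical, not taken from the image.

```shell
#!/bin/bash
# Sketch only: NOT the actual /scripts/healthcheck shipped in the image.

# pid_is_alive PIDFILE
# Returns 0 if PIDFILE is readable, contains a numeric PID, and that
# process can be signalled by the current user; returns non-zero otherwise.
pid_is_alive() {
  local pidfile="$1" pid
  [ -r "$pidfile" ] || return 1               # no readable PID file
  pid="$(tr -d '[:space:]' < "$pidfile")"     # strip newline/whitespace
  case "$pid" in
    ''|*[!0-9]*) return 1 ;;                  # empty or non-numeric content
  esac
  kill -0 "$pid" 2>/dev/null                  # signal 0 delivers nothing; it only tests the process
}
```

It could be called as `pid_is_alive /path/to/sshd.pid`, where the path is whatever `PidFile` is set to in sshd_config. A check like this only shows that the process exists, not that it is accepting connections.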
Readiness
The readiness check, however, looks to see if the TCP port that sshd listens on is accepting connections.
readinessProbe:
  failureThreshold: 2
  initialDelaySeconds: 10
  periodSeconds: 5
  successThreshold: 1
  tcpSocket:
    port: 22
  timeoutSeconds: 3
Steps to reproduce
Deploy gitlab-shell with the readiness probe enabled.
Configuration used
I don't have anything special in my gitlab-shell Helm values besides changing the port (which is not reflected in the YAML above):
## doc/charts/globals.md#configure-gitlab-shell-settings
shell:
  port: 2223
  authToken: { }
    # secret:
    # key:
  hostKeys: { }
    # secret:
The resulting Kubernetes YAML is:
# Source: gitlab/charts/gitlab/charts/gitlab/charts/gitlab-shell/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-gitlab-shell
  namespace: elliot-gitlab
  labels:
    app: gitlab-shell
    chart: gitlab-shell-7.1.2
    release: gitlab
    heritage: Helm
  annotations:
    app.gitlab.com/app: ""
    app.gitlab.com/env: ""
spec:
  selector:
    matchLabels:
      app: gitlab-shell
      release: gitlab
  template:
    metadata:
      labels:
        app: gitlab-shell
        chart: gitlab-shell-7.1.2
        release: gitlab
        heritage: Helm
      annotations:
        checksum/config: d208b4e7ec82e26f0b2ed6679e719b249f9f8038fc9c56711fc2e866d72eded7
        checksum/config-sshd: fa8dbe79bd486f3edb6056d1c714ea628319a09aa77887dade5ac24c82ab9012
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      initContainers:
        - name: certificates
          image: registry.gitlab.com/gitlab-org/build/cng/alpine-certificates:20191127-r2
          env:
          volumeMounts:
            - name: etc-ssl-certs
              mountPath: /etc/ssl/certs
              readOnly: false
            - name: etc-pki-ca-trust-extracted-pem
              mountPath: /etc/pki/ca-trust/extracted/pem
              readOnly: false
            - name: custom-ca-certificates
              mountPath: /usr/local/share/ca-certificates
              readOnly: true
          resources:
            requests:
              cpu: 50m
        - name: configure
          command: ['sh', '/config/configure']
          image: "registry.gitlab.com/gitlab-org/cloud-native/mirror/images/busybox:latest"
          env:
          volumeMounts:
            - name: shell-config
              mountPath: /config
              readOnly: true
            - name: shell-init-secrets
              mountPath: /init-config
              readOnly: true
            - name: shell-secrets
              mountPath: /init-secrets
              readOnly: false
          resources:
            requests:
              cpu: 50m
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                topologyKey: "kubernetes.io/hostname"
                labelSelector:
                  matchLabels:
                    app: gitlab-shell
                    release: gitlab
      automountServiceAccountToken: false
      containers:
        - name: gitlab-shell
          image: "registry.gitlab.com/gitlab-org/build/cng/gitlab-shell:v14.23.0"
          securityContext:
            runAsUser: 1000
          ports:
            - containerPort: 2223
              name: ssh
          env:
            - name: GITALY_FEATURE_DEFAULT_ON
              value: "1"
            - name: CONFIG_TEMPLATE_DIRECTORY
              value: '/etc/gitlab-shell'
            - name: CONFIG_DIRECTORY
              value: '/srv/gitlab-shell'
            - name: KEYS_DIRECTORY
              value: '/etc/gitlab-secrets/ssh'
            - name: SSH_DAEMON
              value: "openssh"
          volumeMounts:
            - name: shell-config
              mountPath: '/etc/gitlab-shell'
            - name: shell-secrets
              mountPath: '/etc/gitlab-secrets'
              readOnly: true
            - name: shell-config
              mountPath: '/etc/krb5.conf'
              subPath: krb5.conf
              readOnly: true
            - name: sshd-config
              mountPath: /etc/ssh/sshd_config
              subPath: sshd_config
              readOnly: true
            - name: etc-ssl-certs
              mountPath: /etc/ssl/certs/
              readOnly: true
            - name: etc-pki-ca-trust-extracted-pem
              mountPath: /etc/pki/ca-trust/extracted/pem
              readOnly: true
          livenessProbe:
            exec:
              command:
                - /scripts/healthcheck
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 3
            successThreshold: 1
            failureThreshold: 3
          readinessProbe:
            tcpSocket:
              port: 2223
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            successThreshold: 1
            failureThreshold: 2
          resources:
            requests:
              cpu: 0
              memory: 6M
      terminationGracePeriodSeconds: 30
      volumes:
        - name: shell-config
          configMap:
            name: gitlab-gitlab-shell
        - name: sshd-config
          configMap:
            name: gitlab-gitlab-shell-sshd
        - name: shell-init-secrets
          projected:
            defaultMode: 0440
            sources:
              - secret:
                  name: "gitlab-gitlab-shell-host-keys"
              - secret:
                  name: "gitlab-gitlab-shell-secret"
                  items:
                    - key: "secret"
                      path: shell/.gitlab_shell_secret
        # Actual config dirs that will be used in the container
        - name: shell-secrets
          emptyDir:
            medium: "Memory"
        - name: etc-ssl-certs
          emptyDir:
            medium: "Memory"
        - name: etc-pki-ca-trust-extracted-pem
          emptyDir:
            medium: "Memory"
        - name: custom-ca-certificates
          projected:
            defaultMode: 0440
            sources:
              - secret:
                  name: gitlab-wildcard-tls-ca
      nodeSelector:
        kubernetes.io/arch: amd64
(The manifest above was generated using helm template.)
Current behavior
As a result, every time the Kubernetes readiness probe checks whether the gitlab-shell pod is "ready", sshd logs the following message:
kex_exchange_identification: Connection closed by remote host
This ends up filling the pod's log output with entries like this:
{"component": "gitlab-shell","subcomponent":"ssh","level":"unknown","time":"2023-07-27T18:08:55Z","message":"kex_exchange_identification: Connection closed by remote host\r"}
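To illustrate the likely cause: a tcpSocket probe only opens a TCP connection and closes it again, so sshd never receives an SSH identification string from the peer. The sketch below imitates that behaviour with bash's `/dev/tcp` redirection. The function name `tcp_probe` is hypothetical, and it assumes a bash build with network redirections enabled.

```shell
#!/bin/bash
# Sketch: imitate a bare TCP connect-and-close, similar in spirit to a
# Kubernetes tcpSocket probe. No SSH identification string is ever sent.

# tcp_probe HOST PORT
# Returns 0 if a TCP connection could be opened, non-zero otherwise.
tcp_probe() {
  local host="$1" port="$2"
  # The subshell opens the socket on fd 3; the descriptor is closed
  # as soon as the subshell exits, dropping the connection.
  ( exec 3<>"/dev/tcp/${host}/${port}" ) 2>/dev/null
}
```

Running `tcp_probe 127.0.0.1 2223` inside the pod should, under these assumptions, produce the same kex_exchange_identification line in the sshd log, once per call.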
Expected behavior
Now that I understand the cause, I'm not terribly concerned, but at a glance these messages look as if another service that uses GitLab is not working properly, or as if a client is experiencing odd failures.
I'd expect the readiness probe to be implemented in a way that does not fill the logs with errors that are actually nothing to be concerned about.
Versions
- Chart: 7.1.2
- Platform:
  - Cloud: N/A
  - Self-hosted: 1.27.3
- Kubernetes: (kubectl version)
  - Client: Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:20:54Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
  - Server: Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3", GitCommit:"25b4e43193bcda6c7328a6d147b1fb73a33f1598", GitTreeState:"clean", BuildDate:"2023-06-14T09:47:40Z", GoVersion:"go1.20.5", Compiler:"gc", Platform:"linux/amd64"}
- Helm: (helm version)
  - Client: 3.11.1
  - Server: N/A
Relevant logs
(Provided above)
As a suggestion, the readiness probe could use a /scripts/readinesscheck that executes something like this:
/bin/bash -c 'ssh -T 127.0.0.1 -p 22 -o "StrictHostKeyChecking=no" || true' 2>&1 | grep Permission
This would actually validate that sshd is ready to serve traffic on that port, rather than just that the port is open. It would also avoid producing worrisome log entries that don't require any action.
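The suggested one-liner could be expanded into a small script along these lines. This is a sketch, not a tested replacement for the chart's probe. The function names are hypothetical, and it assumes an ssh client is available in the gitlab-shell image and that sshd answers an unauthenticated client with a "Permission denied" message.

```shell
#!/bin/bash
# Sketch of a hypothetical /scripts/readinesscheck.

# output_indicates_ready TEXT
# An sshd that completed key exchange and reached the authentication stage
# rejects an unauthenticated client with "Permission denied".
output_indicates_ready() {
  printf '%s\n' "$1" | grep -q 'Permission denied'
}

# readiness_check [PORT]
# Attempts a non-interactive SSH login against localhost and inspects the
# client's output. PORT defaults to 22, as in the original suggestion.
readiness_check() {
  local port="${1:-22}" output
  output="$(ssh -T \
      -p "$port" \
      -o BatchMode=yes \
      -o StrictHostKeyChecking=no \
      -o UserKnownHostsFile=/dev/null \
      -o ConnectTimeout=2 \
      127.0.0.1 2>&1 || true)"   # ssh exits non-zero on auth failure; that is expected
  output_indicates_ready "$output"
}
```

With the port from the values above, the script would end with `readiness_check 2223` and be wired in as an exec probe, the same way the liveness probe calls /scripts/healthcheck. What sshd logs for such a rejected login depends on its LogLevel and has not been verified here.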