healthcheck for gitlab-shell is not precise
Summary
From some investigative work here: #2356 (comment 430514122)
In the gitlab-shell Pod, if there are connected clients doing work and for some reason the sshd process (the service that accepts ssh connections) dies, the healthcheck script will continue to pass so long as there are active client connections. This is not good, as this Pod will continue to exist in the Service Endpoint but will be unable to process incoming connections. The reason that we successfully pass the healthcheck is that we are using pgrep, which the existing client connections also match. New clients will instead receive:
% git fetch minikube
ssh: connect to host gitlab.172.17.0.3.nip.io port 32022: Connection refused
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
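To illustrate why a name-only match keeps passing, here is a minimal sketch. It does not reproduce the actual healthcheck script; the `Proc` type, the PIDs, and the command lines are hypothetical and only model the situation described above, where the listening sshd is gone but per-connection sshd processes remain.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    name: str      # short process name, which a name-based match compares against
    cmdline: str   # full command line, shown only for context


def matches_by_name(procs, name):
    """Return True if any process has the given name (all a name-only check sees)."""
    return any(p.name == name for p in procs)


# Hypothetical snapshot taken after the listening sshd died mid-clone:
# only the processes serving already-connected clients are left.
after_listener_died = [
    Proc(57, "sshd", "sshd: git [priv]"),
    Proc(59, "sshd", "sshd: git@notty"),
]

# The name-only check still succeeds even though nothing accepts new connections.
print(matches_by_name(after_listener_died, "sshd"))
```

The check only turns negative once the last client disconnects, which is why the Pod can stay in the Service Endpoints while refusing every new connection.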
Steps to reproduce
- In a working gitlab installation
- Push/Clone a repository that takes enough time to perform this test (😉)
- During a push or clone operation, stop the sshd service that is running (normally I've seen this as PID 13)
- In another shell, make another attempt at a git client operation, this will fail with the aforementioned error
Current behavior
The Pod is passing its healthcheck despite a critical service not running.
Expected behavior
The Pod should fail the healthcheck, which removes it from the Service Endpoints.
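One way to get this behavior is to probe the listening socket instead of the process list. The sketch below is an assumption about how such a check could look, not necessarily what the merged CNG change does; the function name `ssh_port_accepts` and the default host and port (`127.0.0.1`, `2222`) are hypothetical and would need to match the port sshd actually listens on inside the container.

```python
import socket


def ssh_port_accepts(host="127.0.0.1", port=2222, timeout=2.0):
    """Return True only if something accepts a TCP connection on host:port.

    host and port are assumed defaults for illustration, not values
    taken from the gitlab-shell container.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError or
        # a timeout) when no listener is accepting connections.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A healthcheck built on this would exit non-zero when the function returns False, so a dead listener fails the probe regardless of how many client sessions are still running.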
Milestones
- CNG healthcheck is modified - gitlab-org/build/CNG!539 (merged)
- if accepted, modify the documentation as necessary
- Documentation is updated - !1649 (merged)
- readiness probe is added - !1649 (merged)
Edited by John Skarbek