Health check endpoints report OK status before GitLab has finished reconfiguring on startup

Summary

Using the health check endpoints to determine whether a GitLab instance/container is up and running isn't reliable: the endpoints already report OK status while the instance is still reconfiguring. I'm not sure whether this is expected, but I assumed an OK status would mean the instance is 100% ready.

For example, python-gitlab adds additional waits to ensure the GitLab instance is really available for integration tests against a live container.

I'm not even sure the instance is 100% ready when GitLab logs "gitlab Reconfigured!", as a few additional tasks are sometimes logged afterwards, depending on the configuration.

Steps to reproduce

Example:

#!/bin/bash
export GREP_COLORS='ms=01;32'

docker run --name gitlab-test -d -p 8000:80 \
    -e GITLAB_OMNIBUS_CONFIG="gitlab_rails['monitoring_whitelist'] = ['172.17.0.1'];" \
    gitlab/gitlab-ce:latest

i=0
while [[ $i -le 200 ]]; do
    echo "GitLab starting up.. ${i}s"
    echo ""
    # Each check prints the "200 OK" status line only once the endpoint responds successfully.
    # URLs are quoted so the shell never treats "?" as a glob character.
    echo "Login page: $(curl -IL 'http://localhost:8000/users/sign_in' 2>/dev/null | grep --color=always '200 OK')"
    echo "Health: $(curl -I 'http://localhost:8000/-/health' 2>/dev/null | grep --color=always '200 OK')"
    echo "Readiness: $(curl -I 'http://localhost:8000/-/readiness?all=1' 2>/dev/null | grep --color=always '200 OK')"
    echo "Liveness: $(curl -I 'http://localhost:8000/-/liveness' 2>/dev/null | grep --color=always '200 OK')"
    echo "Reconfigured: $(docker container logs gitlab-test 2>/dev/null | grep --color=always "gitlab Reconfigured!")"
    logs=$(docker container logs gitlab-test 2>/dev/null | grep --color=always "^==>")
    echo "GitLab logs: $logs"

    echo ""
    sleep 1
    i=$((i+1))
    [ "$logs" ] && break # We're only getting logs from the container now
done

docker rm -f gitlab-test

While the script is running, run docker container logs --follow gitlab-test in another shell to see what's still going on in the container.

What is the current bug behavior?

During container startup, the health checks start returning OK status (200) while the Chef cookbooks are still running, so tests/requests against the instance might fail.
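
For anyone hitting this in the meantime, here is a minimal sketch of the kind of extra wait a test setup could use: it only treats the instance as ready once the readiness endpoint returns 200 and the "gitlab Reconfigured!" line has appeared in the container logs. The helper names wait_until and gitlab_ready are made up for illustration, and the sketch assumes the same port and container name as the reproduction script. As noted above, the log line itself may not be a fully reliable marker, so this is a workaround under that assumption, not a fix.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        sleep "$delay"
        n=$((n + 1))
    done
    return 1
}

# Hypothetical check: succeeds only if the readiness endpoint returns HTTP 200
# AND the container logs already contain the reconfigure marker.
# Assumption: "gitlab Reconfigured!" is a good enough signal for the test setup.
gitlab_ready() {
    base_url="$1"
    container="$2"
    code=$(curl -s -o /dev/null -w '%{http_code}' "$base_url/-/readiness?all=1") || return 1
    [ "$code" = "200" ] || return 1
    docker container logs "$container" 2>&1 | grep -q 'gitlab Reconfigured!'
}

# Example call (commented out so the snippet has no side effects when sourced):
# wait_until 300 1 gitlab_ready http://localhost:8000 gitlab-test
```

Combining both conditions avoids the false positive described above, where the endpoints report OK while the cookbooks are still running.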

What is the expected correct behavior?

I'd expect that all three available endpoints (health, readiness, liveness) reporting OK means that GitLab is fully functional.
