Add health check for postgres in gitaly E2E
The 'Praefect connectivity commands' test has been failing intermittently, e.g. https://gitlab.com/gitlab-org/gitlab-qa-mirror/-/jobs/2003002013#L7223
Looking at the errors in the logs, two things stand out:
- The failure occurs shortly after running a test that stops the database: when we then call the start_all_nodes method, we get a failure.
- Instead of retrying the SQL query, the command exits immediately, suggesting that it exits as soon as an exception occurs.
This leads me to suspect that in some situations, although the postgres docker container has started, the database itself is still starting up. Usually this happens very quickly, so we don't encounter the error, but it may be somewhat more likely to occur in the pipelines due to resource constraints.
This is very difficult to reproduce locally, but I was able to reproduce the same errors by tampering with the postgres docker container startup script and adding a short sleep. This led to failures that have the same error logs as the failed jobs we have seen. Querying the database before it is ready returns a non-zero exit code, which then causes the test to fail. I suspect this is what has been happening.
With that in mind, this change adds a health check to ensure that the database has actually started and is accepting connections before proceeding further into the test. If the database is not ready, we allow for a non-zero exit code and retry. That gives some leeway for scenarios where the database is slower to start up.
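
For illustration, the retry behaviour described above could look roughly like the sketch below. This is not the code in this change: the helper names, the container name, the user, and the attempt and sleep values are all hypothetical, and using pg_isready through docker exec as the probe is an assumption about how readiness might be checked.

```ruby
# Hypothetical helper: runs the given readiness probe until it returns a
# truthy value, sleeping between attempts. Returns the number of attempts
# used, or raises once max_attempts is exhausted.
def wait_until_database_ready(max_attempts: 10, sleep_interval: 1)
  max_attempts.times do |attempt|
    return attempt + 1 if yield

    # No need to sleep after the final failed attempt.
    sleep(sleep_interval) if attempt < max_attempts - 1
  end

  raise "Database not ready after #{max_attempts} attempts"
end

# Hypothetical probe (container name and user are placeholders).
# pg_isready exits 0 only when the server is accepting connections.
# Kernel#system returns true on exit 0, false on a non-zero exit, and nil
# if the command could not be run, so a non-zero exit becomes "not ready"
# instead of an exception.
def postgres_accepting_connections?(container: 'postgres', user: 'postgres')
  system('docker', 'exec', container, 'pg_isready', '-U', user,
         out: File::NULL, err: File::NULL) == true
end
```

A caller would then wait before issuing any SQL, for example `wait_until_database_ready { postgres_accepting_connections? }`. Passing the probe as a block keeps the retry loop independent of how readiness is actually checked.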