The git workers (workhorse and ssh) do not recover from database failure
Summary
The application is not capable of recovering from a database restart.
Steps to reproduce
Generate a sustained stream of git traffic over both ssh and https, then restart the database.
What is the current bug behavior?
The git workers lock up and return 502 errors through the API, which prevents people from accessing repositories through git at all.
What is the expected correct behavior?
The application should recover from a database failure without needing a restart.
In our case it only recovered after we manually bounced all the workers.
Relevant logs and/or screenshots
High error rate until we manually bounced the git workers:
Coming from https://gitlab.com/gitlab-com/infrastructure/issues/1218
Possible fixes
Use exponential backoff all around, and eventually give up when a connection keeps failing. We didn't investigate further down the stack to find the specific error case, but in general it looks like we just keep retrying forever, leaving the application in a locked state.
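To make the suggestion concrete, here is a rough sketch of "back off exponentially, then give up". It is not taken from workhorse or gitlab-shell. The names `call_with_backoff` and `GiveUp`, the default delays, and the choice of `ConnectionError` as the retryable error are all assumptions made for illustration. The real fix would have to hook into whatever the workers actually use to reach the database.

```python
import random
import time


class GiveUp(Exception):
    """Hypothetical error raised once every attempt has failed."""


def call_with_backoff(connect, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, retry_on=(ConnectionError,)):
    """Call connect(), retrying with exponential backoff and a hard attempt cap.

    All parameter names and defaults are illustrative assumptions.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return connect()
        except retry_on as exc:
            last_error = exc
            if attempt == max_attempts - 1:
                break  # out of attempts: stop retrying instead of looping forever
            # The delay ceiling doubles on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many workers from reconnecting in lockstep.
            sleep(random.uniform(0, ceiling))
    # Surface the failure to the caller so the worker is not left stuck.
    raise GiveUp(f"gave up after {max_attempts} attempts") from last_error
```

The point of the attempt cap is that a worker that cannot reach the database fails fast and frees itself up, rather than holding a request open indefinitely. The jitter is there because all workers see the database go away at the same moment, and without it they would all retry at the same moment too.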
