Zero-downtime upgrade: GitLab is expected to be running when stopped
Recognized in a customer ticket (internal link).
During the zero-downtime upgrade process, the following sequence of events takes place:
- GitLab is explicitly stopped on the deploy node
- The deploy node is upgraded
- Puma is reloaded. Because Puma was not running, the reload does not start it, so it remains stopped.
- A health check is performed on the deploy node. GitLab is still stopped so this fails.
The problem is that GitLab was never explicitly restarted on the deploy node after it was stopped, so its services remain stopped and the health check fails.
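
The sequence above can be illustrated with a small toy model. This is only a sketch and not GitLab's actual upgrade tooling: the `DeployNode` class, its methods, and the `restart_after_upgrade` flag are hypothetical names introduced for illustration. It assumes, as the steps above describe, that reloading Puma has no effect when Puma is not running, and that the health check passes only when services are running. The explicit start step is an assumed remedy, not a confirmed fix.

```python
from dataclasses import dataclass


@dataclass
class DeployNode:
    """Toy stand-in for the deploy node (hypothetical, for illustration only)."""
    running: bool = True
    upgraded: bool = False

    def stop(self) -> None:
        # GitLab is explicitly stopped on the deploy node.
        self.running = False

    def upgrade(self) -> None:
        # The deploy node is upgraded; this does not start any services.
        self.upgraded = True

    def reload_puma(self) -> None:
        # Assumption taken from the report: a reload only affects a running
        # Puma, so on a stopped node it changes nothing.
        pass

    def start(self) -> None:
        # Hypothetical explicit restart that the current sequence lacks.
        self.running = True

    def health_check(self) -> bool:
        # The health check passes only if services are running.
        return self.running


def run_upgrade(node: DeployNode, restart_after_upgrade: bool) -> bool:
    """Run the sequence and return the health check result."""
    node.stop()
    node.upgrade()
    if restart_after_upgrade:
        node.start()
    node.reload_puma()
    return node.health_check()


if __name__ == "__main__":
    print(run_upgrade(DeployNode(), restart_after_upgrade=False))  # False
    print(run_upgrade(DeployNode(), restart_after_upgrade=True))   # True
```

In this model, the health check can only pass if something starts the services between the stop and the check, which mirrors the failure described above.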