Determine a definitive single source of truth for a deployed version
We have a few places where we could determine the version running on gitlab.com:
1. Prometheus - returns the version that a majority of hosts is running. Inconsistent while a deploy is in progress.
2. gitlab.com/api/v4/version - returns the version that the responding API host is running. Inconsistent while a deploy is in progress.
3. Chef - returns the version set after an upgrade is completed. Inconsistent if a deploy fails.
4. gitlab-org/gitlab deployments - created after a deployment is successfully completed. Inconsistent if a deploy fails.
Depending on whether a production deploy is in progress, and on how many hosts have completed the upgrade, the versions returned by these sources can differ.
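As a rough illustration of that disagreement, the sketch below groups sources by the version they report. The source names, version strings, and the `group_by_version` helper are all hypothetical; it does not query Prometheus, the API, Chef, or the deployments record, and simply assumes each has already been read into a plain string.

```python
def group_by_version(reported: dict[str, str]) -> dict[str, list[str]]:
    """Group source names by the version string each one reports."""
    groups: dict[str, list[str]] = {}
    for source, version in reported.items():
        groups.setdefault(version, []).append(source)
    return groups


def sources_agree(reported: dict[str, str]) -> bool:
    """True when every source reports the same version."""
    return len(group_by_version(reported)) <= 1


# Hypothetical snapshot taken partway through a deploy: most hosts are
# upgraded, but Chef and the deployments record still show the old version.
mid_deploy = {
    "prometheus": "12.1.0",
    "api": "12.1.0",
    "chef": "12.0.3",
    "deployments": "12.0.3",
}

print(sources_agree(mid_deploy))     # False
print(group_by_version(mid_deploy))
```

A check like this only tells us that the sources disagree; it does not say which one to trust, which is the open question here.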
Option 4 (the deployments record) is stable and never fails to get created, so I think that's the one we want to rely on, but I'm unsure.
If we deploy a new version to 90% of our fleet, and 10% fail to update, which version are we really running? I can see arguments for both "majority rules" and "if it didn't succeed, we're not actually running it."
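To make the two positions concrete, here is a minimal sketch of each as a policy over per-host versions. The host names, version strings, and function names are assumptions made up for illustration, not anything our tooling currently exposes.

```python
from collections import Counter
from typing import Optional


def majority_version(host_versions: dict[str, str]) -> Optional[str]:
    """'Majority rules': the version run by a strict majority of hosts, else None."""
    if not host_versions:
        return None
    version, count = Counter(host_versions.values()).most_common(1)[0]
    return version if count * 2 > len(host_versions) else None


def unanimous_version(host_versions: dict[str, str]) -> Optional[str]:
    """'If it didn't succeed, we're not running it': a version only if all hosts match."""
    versions = set(host_versions.values())
    return versions.pop() if len(versions) == 1 else None


# Hypothetical fleet of 10 hosts: 9 upgraded, 1 failed to update.
fleet = {f"web-{i:02d}": "12.1.0" for i in range(9)}
fleet["web-09"] = "12.0.3"

print(majority_version(fleet))   # 12.1.0
print(unanimous_version(fleet))  # None
```

Under the first policy the 90/10 case reports the new version; under the second it reports no definitive version at all until the stragglers are fixed or rolled back.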
Our production deploys can take as long as 2 hours, which is a long time to be unsure of what version is actually running.