Using consul for global state, deployment events, etc
@nolith raised this. GitLab.com already runs consul across the cluster, but we only use it for limited purposes.
The proposal is to use consul to store key-value pairs describing infrastructure events, such as canary drains and rolling deployments.
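
As a rough sketch of what recording such an event could look like, the snippet below writes a flag through consul's KV HTTP API (`PUT /v1/kv/<key>`). The key names, the agent address, and the helper names `build_kv_put` and `set_flag` are assumptions made for illustration, not an agreed convention. Values are stored as "1" or "0" because, as far as the consul_exporter README describes it, keys with non-numeric values are not exported; that is worth verifying before relying on it.

```python
import urllib.parse
import urllib.request

# Assumption: a local consul agent listening on the default HTTP port.
CONSUL_ADDR = "http://127.0.0.1:8500"


def build_kv_put(key: str, value: str, base_url: str = CONSUL_ADDR) -> urllib.request.Request:
    """Build a PUT request for consul's KV HTTP API (/v1/kv/<key>)."""
    # quote() leaves "/" intact, so nested keys keep their path structure.
    url = f"{base_url}/v1/kv/{urllib.parse.quote(key)}"
    return urllib.request.Request(url, data=value.encode("utf-8"), method="PUT")


def set_flag(key: str, active: bool) -> bool:
    """Store "1" or "0" under key; consul replies with the body "true" on success."""
    request = build_kv_put(key, "1" if active else "0")
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read().strip() == b"true"
```

A deployment tool could then call something like `set_flag("infra-events/canary/drained", True)` when draining the canary and set it back to `False` afterwards (the key name is hypothetical).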
Using consul_exporter, which we have already deployed across the fleet, we could expose selected values to prometheus via the
kv.prefix configuration options: https://github.com/prometheus/consul_exporter#keyvalue-checks
With this state available in prometheus, we could improve the accuracy of our alerts, for example by disabling alerts on our canary nodes while the canary is drained, or by allowing a slightly elevated error rate during deployments.
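
The intended alerting behaviour can be sketched as a small decision function. In practice this logic would live in prometheus alert rules rather than in Python; the function name, parameters, and the default multiplier of 2.0 are assumptions chosen only to make the idea concrete.

```python
def should_alert(
    error_rate: float,
    base_threshold: float,
    *,
    is_canary: bool,
    canary_drained: bool,
    deploying: bool,
    deploy_multiplier: float = 2.0,  # assumption: how much extra error rate a deploy may cause
) -> bool:
    """Decide whether an error-rate alert should fire, given infrastructure state."""
    # A drained canary serves no traffic we care about, so suppress its alerts.
    if is_canary and canary_drained:
        return False
    # During a deployment, tolerate a slightly elevated error rate.
    threshold = base_threshold * deploy_multiplier if deploying else base_threshold
    return error_rate > threshold
```

With a base threshold of 1%, an error rate of 1.5% would fire outside a deployment but stay silent while one is in progress.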