Health checks and update RollingUpdateStrategy for registry
We deployed the first registry configuration change while taking live traffic (production#1127 (closed)) and saw a large but brief error spike during the apply.
Watching the apply in real time, we saw that the pods serving production traffic were terminated very aggressively.
RollingUpdateStrategy: 25% max unavailable, 25% max surge
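
To make those percentages concrete, here is a rough sketch of how Kubernetes resolves them into pod counts: a percentage for max unavailable is rounded down, and a percentage for max surge is rounded up. The replica counts are hypothetical (the registry's actual replica count is not stated here), and the function name is only illustrative.

```python
def rolling_update_bounds(replicas, max_unavailable_pct, max_surge_pct):
    """Return (min_available, max_total) pod counts during a rolling update.

    Percentages are resolved the way Kubernetes documents them:
    maxUnavailable rounds down, maxSurge rounds up.
    """
    unavailable = (replicas * max_unavailable_pct) // 100      # round down
    surge = -((-replicas * max_surge_pct) // 100)              # round up
    return replicas - unavailable, replicas + surge


if __name__ == "__main__":
    # Hypothetical replica counts, for illustration only.
    for replicas in (4, 8, 20):
        min_available, max_total = rolling_update_bounds(replicas, 25, 25)
        print(f"replicas={replicas}: at least {min_available} available, "
              f"at most {max_total} total")
```

With 25% max unavailable, up to a quarter of the serving pods can be taken down at the same time, which would be consistent with the aggressive termination we observed.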
Proposed changes to lower errors during the rolling update
- Add readiness/liveness probes to the Helm chart: gitlab-org/charts/gitlab#1571 (closed), implemented in gitlab-org/charts/gitlab!932 (merged).
- Reduce the max unavailable to something much lower. I think we should set this to the absolute number of 1 or lower the percentage.
- Allow draintimeout (https://github.com/docker/distribution/blob/release/2.7/configuration/configuration.go#L90) to be set in the chart, and increase it. Draintimeout was added in gitlab-org/charts/gitlab!934 (merged).
- Configure terminationGracePeriod if necessary (this is set to 30 seconds by default): gitlab-org/charts/gitlab!935 (closed).
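
The last two items interact: once the termination grace period expires, the kubelet kills the container, so a drain timeout longer than the grace period would be cut short. The sketch below is one way to sanity-check a pair of values. The function name is hypothetical, the 60-second drain timeout is an assumed example value, and only the 30-second default comes from the list above.

```python
def grace_period_covers_drain(drain_timeout_s, grace_period_s, pre_stop_s=0):
    """Return True if the pod's grace period leaves room for the drain.

    Time spent in a preStop hook (if any) counts against the grace
    period, so it is added to the time the registry needs to drain.
    """
    return grace_period_s > drain_timeout_s + pre_stop_s


if __name__ == "__main__":
    # Assumed example: a 60s drain timeout against the 30s default.
    print(grace_period_covers_drain(drain_timeout_s=60, grace_period_s=30))
    # Same drain timeout with the grace period raised to 75s.
    print(grace_period_covers_drain(drain_timeout_s=60, grace_period_s=75))
```

If the drain timeout is increased, the grace period probably needs to be raised with it, which is why the fourth item is listed as "if necessary".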
Edited by John Jarvis
