Standard configuration for running helm from the console hosts
Given the way helm currently handles rollbacks, we will likely need to intervene manually at some point. Today this is possible for preprod but not easy for staging/production.
I think we should all probably have the ability to run helm from the console hosts, similar to kubectl. Past proposals to do this, like gitlab-com/runbooks!1967 (closed), were abandoned because, while we recognize the importance, we are prioritizing safety and don't want to encourage running manual commands on the cluster.
However, today in preprod we got into a deadlocked state where nothing was able to deploy properly. A failed deployment took a long time to finish because of unhealthy pods, and the following deployment, which fixed the issue, was issued while the previous one was still ongoing. If this had happened in staging/production, I'm not sure how we would have recovered without running helm directly from the console.
Example:
$ helm tiller run helm history gitlab
Installed Helm version v2.16.5
Installed Tiller version v2.16.5
Helm and Tiller are the same version!
Starting Tiller...
Tiller namespace: kube-system
Running: helm history gitlab
REVISION  UPDATED                   STATUS            CHART         APP VERSION  DESCRIPTION
331       Thu May 7 19:10:21 2020   SUPERSEDED        gitlab-3.3.1  12.10.1      Upgrade complete
332       Fri May 8 13:22:12 2020   SUPERSEDED        gitlab-3.3.1  12.10.1      Upgrade complete
333       Mon May 11 11:57:12 2020  SUPERSEDED        gitlab-3.3.1  12.10.1      Upgrade "gitlab" failed: failed to create resource: Deplo...
334       Mon May 11 11:57:22 2020  SUPERSEDED        gitlab-3.3.1  12.10.1      Rollback to 332
335       Mon May 11 12:09:40 2020  SUPERSEDED        gitlab-3.3.1  12.10.1      Upgrade complete
336       Wed May 13 19:39:58 2020  SUPERSEDED        gitlab-3.3.1  12.10.1      Upgrade complete
337       Thu May 14 14:11:10 2020  DEPLOYED          gitlab-3.3.1  12.10.1      Upgrade complete
338       Fri May 15 11:16:24 2020  PENDING_UPGRADE   gitlab-3.3.1  12.10.1      Preparing upgrade
339       Fri May 15 11:44:03 2020  FAILED            gitlab-3.3.1  12.10.1      Upgrade "gitlab" failed: timed out waiting for the condition
340       Fri May 15 12:14:13 2020  PENDING_ROLLBACK  gitlab-3.3.1  12.10.1      Rollback to 338
We were stuck trying to roll back to a deployment that was invalid: revision 340 targets revision 338, which never got past PENDING_UPGRADE.
The fix for this was simple:
$ helm tiller run helm rollback gitlab 337
This forced us back to the last known good version.
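
If we do end up running this from the console hosts, picking the rollback target by eye is error-prone. The snippet below is a rough sketch, not an agreed procedure: `last_deployed_revision` is a hypothetical helper name, and it assumes the table layout shown in the history output above, where the UPDATED column is five whitespace-separated tokens and STATUS is therefore the seventh field. It reads `helm history` output on stdin and prints the most recent revision whose status is DEPLOYED.

```shell
#!/bin/sh
# Hypothetical helper: print the newest revision whose STATUS is DEPLOYED.
# Assumes the table layout from the example above, where the date takes up
# fields 2-6 and STATUS is field 7. Lines that do not start with a numeric
# revision (the tiller plugin preamble, the header row) are ignored.
last_deployed_revision() {
  awk '
    $1 ~ /^[0-9]+$/ && $7 == "DEPLOYED" { rev = $1 }
    END {
      if (rev == "") exit 1   # no DEPLOYED revision found
      print rev
    }
  '
}

# Intended usage (commented out so that sourcing this file runs nothing):
#   rev=$(helm tiller run helm history gitlab | last_deployed_revision) &&
#     helm tiller run helm rollback gitlab "$rev"
```

Against the history above this would select 337, the same revision we rolled back to by hand. It exits non-zero when no revision is marked DEPLOYED, so the rollback in the usage line is skipped rather than run with an empty argument.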