Customer k8s implementation challenges
Zendesk Tickets
- https://gitlab.zendesk.com/agent/tickets/110817
- https://gitlab.zendesk.com/agent/tickets/111589
- https://gitlab.zendesk.com/agent/tickets/113839
- https://gitlab.zendesk.com/agent/tickets/114585
Request:
- From: < Diana Stanley/Support >
This customer bought a Premium license for one seat in order to get support help deploying GitLab and GitLab runners and using AutoDevOps, with k8s in every piece of the infrastructure. He has had at least 12 support calls, most of them multi-hour, to work through the issues he encountered while setting this up. The purpose of this ticket is to document all the problems he ran into and derive improvements in documentation and/or implementation to make it easier for future customers. Also, this is the first customer I'm aware of that has used k8s in every single piece of the infrastructure where we support it. Perhaps we should try to leverage this for marketing or reference implementation purposes.
Problems addressed in https://gitlab.zendesk.com/agent/tickets/110817:
- He was attempting to upgrade from CE to EE using the helm chart. He had installed CE with chart 1.2.4 and ran into https://gitlab.com/gitlab-org/gitlab-ee/issues/7980 trying to upgrade to EE. We got past this by upgrading to EE using the 1.4.1 helm chart.
- In order to upgrade to EE he had to set the image URLs explicitly in his config (`gitlab-rails-ee`, etc).
- When doing the upgrade he had actually started by installing a new CE and restoring data to it. After doing the restore we had to give him details about how to get secrets from one instance and apply them to another.
- There was still a problem getting his runner to register. He eventually got past this: "There was token presented was not correct the whole time. I requested a new reg token and the new runner is able to register."
Problems addressed in https://gitlab.zendesk.com/agent/tickets/111589:
- Storage volume for prometheus: https://gitlab.com/gitlab-org/gitlab-ce/issues/46762?
- Had to remove an old cluster that no longer existed
- "Adding the a project k8s tiller works, but installing project runner under the new k8s at the project level is failing". I think this was due to not having the Premium license installed. He was unable to use multiple k8s clusters until installing it.
Problems addressed in https://gitlab.zendesk.com/agent/tickets/113839:
- This appears to have been a problem with a webhook on one particular repo. Not explicitly k8s related. There were a lot of scary k8s-related suggestions coming from support.
Problems addressed in https://gitlab.zendesk.com/agent/tickets/114585:
- GitLab wasn't recognizing the active k8s cluster when attempting to auto-deploy to it. Seems to have been some non-printing character in the `.gitlab-ci.yml` file.
- Many questions about clusters and their association with projects and runners.
- "Trying to launch a chart to k8s" using AutoDevOps. Might be associated with https://gitlab.com/gitlab-org/gitlab-ce/issues/54760? Customer didn't reply to a suggestion to add --force to the `helm upgrade` command.
- Minio pod ran out of space, causing build pipelines to fail because they could not upload artifacts. Looking at disk usage, there was 6.5G of registry data and 1.5G of artifact data. Looked for but found no satisfactory method of cleaning up either of those. Worked with customer to configure registry with S3 object store. Found that we needed to document S3 permission scopes to get registry working: https://gitlab.com/gitlab-org/gitlab-ce/issues/58881 After cleaning up disk space, registry was working and builds proceeded. Still working on the previous issue, "Trying to launch a chart to k8s".
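
The non-printing character problem in the CI file noted above is hard to spot by eye, so a small scanner might help future customers rule it out early. The following is a minimal sketch, not something that was used on the ticket. The ticket does not record which character was involved, so the set of Unicode categories flagged here is an assumption, and the function names `find_suspicious_chars` and `scan_file` are made up for illustration.

```python
import unicodedata

# Invisible characters that are normal in a text file and should not be flagged.
ALLOWED = {"\n", "\r", "\t"}

# Unicode categories treated as suspicious (an assumption, see lead-in):
# Cc = control, Cf = format (zero-width space, BOM), Zl/Zp = line/paragraph separators.
SUSPICIOUS_CATEGORIES = {"Cc", "Cf", "Zl", "Zp"}


def find_suspicious_chars(text):
    """Return (line, column, codepoint, name) for each invisible character found."""
    hits = []
    # Split on "\n" only, so exotic separators stay in the line and get reported.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in ALLOWED:
                continue
            category = unicodedata.category(ch)
            # Zs covers space-like characters; anything but a plain space is flagged
            # (for example a no-break space pasted in from a web page).
            if category in SUSPICIOUS_CATEGORIES or (category == "Zs" and ch != " "):
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, f"U+{ord(ch):04X}", name))
    return hits


def scan_file(path):
    """Read a file as UTF-8 and return the suspicious characters it contains."""
    # newline="" keeps the original line endings instead of translating them.
    with open(path, encoding="utf-8", newline="") as handle:
        return find_suspicious_chars(handle.read())
```

Calling `scan_file(".gitlab-ci.yml")` from the repository root returns an empty list for a clean file, and otherwise one tuple per offending character with its line and column.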