DigitalOcean Kubernetes integration fails when deploying Prometheus
Summary
DigitalOcean now also offers Kubernetes (currently in limited availability), which I tried to add to GitLab in order to use AutoDevOps. Adding the existing cluster worked without any issues.
The cluster seemed fine, and I could install Tiller, Ingress, and the GitLab Runner. But installing Prometheus always fails with the error "Kubernetes error.", and the UI provides no further details.
If further logs are required, I would need guidance on how to get them.
Here are the DigitalOcean docs: https://www.digitalocean.com/docs/kubernetes/overview/
Steps to reproduce
When adding the existing cluster I tried two ways:
- Preparing the cluster by applying the following: kubectl apply -f http://x.co/rm082018 and then specifying gitlab-managed-apps as the namespace and leaving RBAC unchecked.
- Preparing the cluster by creating an admin service account: kubectl create serviceaccount cluster-admin-gitlab-sa and kubectl create clusterrolebinding cluster-admin-gitlab-sa --clusterrole=cluster-admin --serviceaccount=default:cluster-admin-gitlab-sa and then using the token of this service account and checking the RBAC checkbox.
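For the second approach, the token can be read from the service account's secret. This is a minimal sketch, assuming a cluster version that still auto-creates a token secret for each service account (newer Kubernetes releases no longer do this) and the default namespace used in the commands above. print_sa_token is a hypothetical helper name, not part of kubectl.

```shell
#!/bin/sh
# Hypothetical helper: print the decoded token of a service account.
# Assumes the service account has an auto-created token secret
# listed under .secrets[0] (true for older Kubernetes versions).
print_sa_token() {
  sa_name="$1"
  sa_namespace="${2:-default}"

  # Look up the name of the secret attached to the service account.
  secret_name=$(kubectl get serviceaccount "$sa_name" \
    --namespace "$sa_namespace" -o jsonpath='{.secrets[0].name}')

  # The token is stored base64-encoded in the secret's data.
  kubectl get secret "$secret_name" \
    --namespace "$sa_namespace" -o jsonpath='{.data.token}' | base64 --decode
}

# Example usage against a live cluster (not run here):
#   print_sa_token cluster-admin-gitlab-sa default
```

The printed value is what goes into the token field when adding the cluster in GitLab.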
Then I installed Tiller and Ingress (both succeed, and Ingress also shows the IP). Installing Prometheus afterwards fails with "Kubernetes error.".
I thought it might be because of a missing dynamic volume provisioner. But when I toggled AutoDevOps on in the project, the pipeline started running, and during the "staging" phase it deployed a PostgreSQL pod with dynamic persistent storage. The volume was correctly created by the default DigitalOcean provisioner (storage class "do-block-storage", which is also set as the default), and I could see it in the DigitalOcean UI. So I guess that's not the issue here.
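The default storage class can also be double-checked from the command line. This is a rough sketch, assuming kubectl get storageclass marks the default class with "(default)" after its name. default_storage_class is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get storageclass` output on stdin
# and print the name of every class flagged as "(default)".
# Assumes the flag appears as the second whitespace-separated field.
default_storage_class() {
  awk '$2 == "(default)" { print $1 }'
}

# Example usage against a live cluster (not run here):
#   kubectl get storageclass | default_storage_class
```

If this prints nothing, no class is marked as default and a chart that requests a persistent volume without naming a class would stay pending.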
Example Project
(Not needed)
What is the current bug behavior?
Prometheus installation fails with "Kubernetes error.". What I noticed from the Kubernetes events is that it also tries to start a load balancer and seems to fail in doing so.
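To pull only the failures out of the event list, a small filter along these lines may help. It is a sketch that assumes the column layout of the output pasted in this report (event type in the second column). warnings_only is a hypothetical helper name, not part of kubectl.

```shell
#!/bin/sh
# Hypothetical helper: keep only rows whose second column is "Warning".
# Assumes the layout shown in this report: age, type, reason, kind, message.
warnings_only() {
  awk '$2 == "Warning"'
}

# Example usage against a live cluster (not run here):
#   kubectl get events --namespace=gitlab-managed-apps | warnings_only
#
# The installer pod's own output may say more than the events do:
#   kubectl logs install-prometheus --namespace=gitlab-managed-apps
```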
What is the expected correct behavior?
Prometheus should install correctly or at least provide a useful error.
Relevant logs and/or screenshots
kubectl get events --namespace=gitlab-managed-apps
10m         Normal    Scheduled                    Pod          Successfully assigned gitlab-managed-apps/ingress-nginx-ingress-controller-f7bdf6c94-jwnzh to crazy-lederberg-hr4
10m         Normal    Pulled                       Pod          Container image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.10.2" already present on machine
10m         Normal    Created                      Pod          Created container
10m         Normal    Started                      Pod          Started container
10m         Normal    SuccessfulCreate             ReplicaSet   Created pod: ingress-nginx-ingress-controller-f7bdf6c94-jwnzh
8m57s       Normal    EnsuringLoadBalancer         Service      Ensuring load balancer
10m         Normal    ScalingReplicaSet            Deployment   Scaled up replica set ingress-nginx-ingress-controller-f7bdf6c94 to 1
9m2s        Warning   CreatingLoadBalancerFailed   Service      Error creating load balancer (will retry): Failed to ensure load balancer for service gitlab-managed-apps/ingress-nginx-ingress-controller: Get https://api.digitalocean.com/v2/load_balancers/19a8173f-6875-4c71-94b9-309f5e18a140: context deadline exceeded
8m54s       Normal    EnsuredLoadBalancer          Service      Ensured load balancer
10m         Normal    Scheduled                    Pod          Successfully assigned gitlab-managed-apps/ingress-nginx-ingress-default-backend-845f7f5785-d6bdp to crazy-lederberg-hr4
10m         Normal    Pulled                       Pod          Container image "k8s.gcr.io/defaultbackend:1.3" already present on machine
10m         Normal    Created                      Pod          Created container
10m         Normal    Started                      Pod          Started container
10m         Normal    SuccessfulCreate             ReplicaSet   Created pod: ingress-nginx-ingress-default-backend-845f7f5785-d6bdp
10m         Normal    ScalingReplicaSet            Deployment   Scaled up replica set ingress-nginx-ingress-default-backend-845f7f5785 to 1
12m         Normal    Scheduled                    Pod          Successfully assigned gitlab-managed-apps/install-helm to crazy-lederberg-hr4
12m         Normal    Pulled                       Pod          Container image "alpine:3.6" already present on machine
12m         Normal    Created                      Pod          Created container
12m         Normal    Started                      Pod          Started container
10m         Normal    Scheduled                    Pod          Successfully assigned gitlab-managed-apps/install-ingress to crazy-lederberg-hr4
10m         Normal    Pulled                       Pod          Container image "alpine:3.6" already present on machine
10m         Normal    Created                      Pod          Created container
10m         Normal    Started                      Pod          Started container
7m39s       Normal    Scheduled                    Pod          Successfully assigned gitlab-managed-apps/install-prometheus to crazy-lederberg-hr4
7m37s       Normal    Pulled                       Pod          Container image "alpine:3.6" already present on machine
7m37s       Normal    Created                      Pod          Created container
7m37s       Normal    Started                      Pod          Started container
12m         Normal    Scheduled                    Pod          Successfully assigned gitlab-managed-apps/tiller-deploy-6cc8b46cf-lkkws to crazy-lederberg-hr4
12m         Warning   FailedMount                  Pod          MountVolume.SetUp failed for volume "tiller-certs" : secret "tiller-secret" not found
12m         Normal    Pulled                       Pod          Container image "gcr.io/kubernetes-helm/tiller:v2.7.2" already present on machine
12m         Normal    Created                      Pod          Created container
12m         Normal    Started                      Pod          Started container
12m         Normal    SuccessfulCreate             ReplicaSet   Created pod: tiller-deploy-6cc8b46cf-lkkws
12m         Normal    ScalingReplicaSet            Deployment   Scaled up replica set tiller-deploy-6cc8b46cf to 1
Output of checks
This bug happens on a self-hosted GitLab CE instance (docker image).
Results of GitLab environment info
It's a self-hosted installation using the CE Docker image.
System information
System:
Current User:   git
Using RVM:      no
Ruby Version:   2.4.5p335
Gem Version:    2.7.6
Bundler Version:1.16.2
Rake Version:   12.3.1
Redis Version:  3.2.12
Git Version:    2.18.1
Sidekiq Version:5.2.1
Go Version:     unknown
GitLab information
Version:        11.4.3
Revision:       adea99b
Directory:      /opt/gitlab/embedded/service/gitlab-rails
DB Adapter:     postgresql
URL:            https://xx.xx.xx
HTTP Clone URL: https://xx.xx.xx/some-group/some-project.git
SSH Clone URL:  git@xx.xx.xx:some-group/some-project.git
Using LDAP:     no
Using Omniauth: no
GitLab Shell
Version:        8.3.3
Repository storage paths:
- default:      /var/opt/gitlab/git-data/repositories
Hooks:          /opt/gitlab/embedded/service/gitlab-shell/hooks
Git:            /opt/gitlab/embedded/bin/git
Results of GitLab application Check
Checking GitLab Shell ...
GitLab Shell version >= 8.3.3 ? ... OK (8.3.3)
Repo base directory exists?
default... yes
Repo storage directories are symlinks?
default... no
Repo paths owned by git:root, or git:git?
default... yes
Repo paths access is drwxrws---?
default... yes
hooks directories in repos are links: ...
2/1 ... ok
(many entries, all of them ok)
10/304 ... ok
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Check GitLab API access: OK
Redis available via internal API: OK
Access to /var/opt/gitlab/.ssh/authorized_keys: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Sidekiq ...
Running? ... yes
Number of Sidekiq processes ... 1
Checking Sidekiq ... Finished
Reply by email is disabled in config/gitlab.yml
Checking LDAP ...
LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab ...
Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ...
2/1 ... yes
(many entries, all of them yes)
10/304 ... yes
Redis version >= 2.8.0? ... yes
Ruby version >= 2.3.5 ? ... yes (2.4.5)
Git version >= 2.9.5 ? ... yes (2.18.1)
Git user has default SSH configuration? ... yes
Active users: ... 69
Checking GitLab ... Finished