
Various issues with Kubernetes integration in GitLab

Dear Gitlab team,

Context

We have 2 servers on which I built a Kubernetes cluster using the Rancher RKE tool. I'm now in the process of using it with GitLab, since GitLab offers a Kubernetes integration from gitlab.com.

So my context is using Gitlab's Kubernetes integration with our own servers, not with GKE.

About this issue

I encountered many permission issues which I solved in an ugly way (by granting far too many permissions). I'm opening this issue in order to summarise all of them so that they can be fixed and the Kubernetes integration feature improved for people using their own k8s cluster. I hope this will be well received.

Step by step with issue descriptions

Here I'm showing, step by step, how I installed my cluster, integrated it with GitLab, and each issue along with the dirty fix I applied.

Initial configurations

Here is the servers' configuration:

$ docker-machine ls
NAME            ACTIVE   DRIVER    STATE     URL                         SWARM   DOCKER        ERRORS
master          -        generic   Running   tcp://XX.XX.XXX.XXX:2376            v17.03.2-ce
worker          -        generic   Running   tcp://YYY.YYY.YY.YY:2376            v17.03.2-ce
...

And here is the cluster.yml file used by RKE to deploy the Kubernetes cluster:

# default k8s version: v1.8.10-rancher1-1
# default network plugin: canal
nodes:
  - address: XX.XX.XXX.XXX
    port: XXXXX
    ssh_key_path: '~/.ssh/gitlab_ci_id_rsa'
    user: root
    role: [controlplane,etcd]
  - address: YYY.YYY.YY.YY
    port: XXXXX
    ssh_key_path: '~/.ssh/gitlab_ci_worker_id_rsa'
    user: root
    role: [worker]
ingress:
  provider: none

Deploy the Kubernetes cluster

Now I'm deploying the k8s cluster:

$ rke up --config cluster.yml
INFO[0000] Building Kubernetes cluster
INFO[0000] [dialer] Setup tunnel for host [XX.XX.XXX.XXX]
INFO[0000] [dialer] Setup tunnel for host [YYY.YYY.YY.YY]
INFO[0001] [network] Deploying port listener containers
INFO[0001] [network] Pulling image [rancher/rke-tools:v0.1.4] on host [XX.XX.XXX.XXX]
...
INFO[0134] [addons] Executing deploy job..
INFO[0140] [addons] User addons deployed successfully
INFO[0140] Finished building Kubernetes cluster successfully

Finally, configure the local machine so that the kubectl command works:

$ rm -rf ~/.kube && mkdir ~/.kube
$ cp ./kube_config_cluster.yml ~/.kube/config

Checking that everything is fine:

$ kubectl cluster-info
Kubernetes master is running at https://XX.XX.XXX.XXX:6443
KubeDNS is running at https://XX.XX.XXX.XXX:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
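
An extra quick sanity check is to list the nodes; both should report a Ready status:

$ kubectl get nodes -o wide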

From here, I need the logs and the dashboard in order to have a complete view of what is happening.

Accessing kubernetes logs

In order to check the k8s logs, there is a very well done tool named kail which tails the k8s logs. Install it, run it, and you'll get the logs.
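
For example (a sketch, assuming kail is installed and using its --ns flag to filter by namespace):

$ kail                        # tail logs from every pod in the cluster
$ kail --ns kube-system       # only the kube-system namespace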

Deploy the kubernetes dashboard (Optional)

I'm deploying the Kubernetes dashboard in order to get a better view of the issues as they happen:

$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
secret "kubernetes-dashboard-certs" created
serviceaccount "kubernetes-dashboard" created
role.rbac.authorization.k8s.io "kubernetes-dashboard-minimal" created
rolebinding.rbac.authorization.k8s.io "kubernetes-dashboard-minimal" created
deployment.apps "kubernetes-dashboard" created
service "kubernetes-dashboard" created

In another terminal, run kubectl proxy and access the dashboard at the URL http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy. To authenticate, select the Token mode and copy/paste the token from the following:

$ kubectl get secrets
NAME                  TYPE                                  DATA      AGE
default-token-6gncr   kubernetes.io/service-account-token   3         8m

$ kubectl describe secret default-token-6gncr
Name:         default-token-6gncr
Namespace:    default
Labels:       <none>
Annotations:  kubernetes.io/service-account.name=default
              kubernetes.io/service-account.uid=0c05b98a-63e6-11e8-bb81-fa163ed63d33

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1017 bytes
namespace:  7 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.....
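
As a shortcut, the token can also be extracted and decoded in one go (a sketch, reusing the secret name above; the token is stored base64-encoded):

$ kubectl get secret default-token-6gncr -o jsonpath='{.data.token}' | base64 --decode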

Dashboard permissions issue

Here is the first permission issue, but I have already opened a separate issue for that one. To quickly fix it in a dirty way:

$ kubectl create clusterrolebinding --user system:serviceaccount:default:default default-sa-admin --clusterrole cluster-admin
clusterrolebinding.rbac.authorization.k8s.io "default-sa-admin" created

Now refresh the page and the dashboard loads without any remaining errors.
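
A slightly cleaner variant of this dirty fix (still cluster-admin, but bound to a dedicated service account rather than the default one; a sketch, the admin-user name is arbitrary):

# dashboard-admin.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: admin-user
    namespace: kube-system

You would then log in to the dashboard with the token of this admin-user service account instead of the default one.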

Heapster installation

Looking at the logs with kail (and as described in the dashboard project's README.md file), Heapster needs to be installed:

$ git clone https://github.com/kubernetes/heapster.git
$ cd heapster
$ kubectl create -f deploy/kube-config/influxdb/
deployment.extensions "monitoring-grafana" created
service "monitoring-grafana" created
serviceaccount "heapster" created
deployment.extensions "heapster" created
service "heapster" created
deployment.extensions "monitoring-influxdb" created
service "monitoring-influxdb" created
$ kubectl create -f deploy/kube-config/rbac/heapster-rbac.yaml
clusterrolebinding.rbac.authorization.k8s.io "heapster" created

In the k8s logs I can see InfluxDB booting and so on, and in the dashboard the graphs are now working.

Import the cluster in Gitlab

From the Kubernetes page of a GitLab project:

  1. Click the Add an existing Kubernetes cluster button
  2. Set the Kubernetes cluster name, the API URL to https://XX.XX.XXX.XXX:6443/, the CA Certificate to the output of ssh root@XX.XX.XXX.XXX cat /etc/kubernetes/ssl/kube-ca.pem, and the Token to the same token used to log in to the dashboard
  3. Click the Add Kubernetes cluster button

Install Helm Tiller

Here comes the first hard part. When I click the Install button, GitLab shows the following error:

Error: error installing: deployments.extensions is forbidden: User "system:serviceaccount:gitlab-managed-apps:default" cannot create deployments.extensions in the namespace "gitlab-managed-apps"

The gitlab-managed-apps namespace has been created correctly, but some permissions are missing, preventing GitLab from finishing the installation of Tiller in k8s. I fixed it the same way as for the dashboard:

$ kubectl create clusterrolebinding --user system:serviceaccount:gitlab-managed-apps:default default-gitlab-sa-admin --clusterrole cluster-admin
clusterrolebinding.rbac.authorization.k8s.io "default-gitlab-sa-admin" created

Refresh GitLab (otherwise the Install button stays disabled) and click Install again. You'll get the Helm Tiller was successfully installed on your Kubernetes cluster success message, and the Install buttons for Ingress, Prometheus and GitLab Runner become available.

Install Ingress

Clicking the Install button seems to work fine as I get the Ingress was successfully installed on your Kubernetes cluster success message, but looking closer at the dashboard I can see that the ingress deployment failed: the error message "Back-off restarting failed container" is visible, and here are the pod logs:

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.10.2
  Build:      git-fd7253a
  Repository: https://github.com/kubernetes/ingress-nginx
-------------------------------------------------------------------------------
 I0530 09:14:33.705106       7 flags.go:159] Watching for ingress class: nginx
I0530 09:14:33.706004       7 main.go:181] Creating API client for https://10.43.0.1:443
I0530 09:14:33.797413       7 main.go:193] Running in Kubernetes Cluster version v1.10 (v1.10.1) - git (clean) commit d4ab47518836c750f9949b9e0d387f20fb92260b - platform linux/amd64
F0530 09:14:33.817365       7 main.go:80] ✖ It seems the cluster it is running with Authorization enabled (like RBAC) and there is no permissions for the ingress controller. Please check the configuration

Update 5th of June: I solved the permission issue by removing all the ingress resources I could find and running the following command:

$ helm install --namespace gitlab-managed-apps --name ingress --set rbac.create=true stable/nginx-ingress
NAME:   ingress
LAST DEPLOYED: Tue Jun  5 15:00:55 2018
NAMESPACE: gitlab-managed-apps
STATUS: DEPLOYED

RESOURCES:
==> v1beta1/Role
NAME                   AGE
ingress-nginx-ingress  1s

==> v1/Service
NAME                                   TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)                     AGE
ingress-nginx-ingress-controller       LoadBalancer  XX.XX.XX.XXX   <pending>    80:30908/TCP,443:31375/TCP  1s
ingress-nginx-ingress-default-backend  ClusterIP     YY.YY.YYY.YYY  <none>       80/TCP                      1s

==> v1beta1/Deployment
NAME                                   DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
ingress-nginx-ingress-controller       1        1        1           0          1s
ingress-nginx-ingress-default-backend  1        1        1           0          1s

==> v1beta1/PodDisruptionBudget
NAME                                   MIN AVAILABLE  MAX UNAVAILABLE  ALLOWED DISRUPTIONS  AGE
ingress-nginx-ingress-controller       1              N/A              0                    1s
ingress-nginx-ingress-default-backend  1              N/A              0                    1s

==> v1/Pod(related)
NAME                                                    READY  STATUS             RESTARTS  AGE
ingress-nginx-ingress-controller-68f4d665bc-pmsl7       0/1    ContainerCreating  0         1s
ingress-nginx-ingress-default-backend-6f58fb5f56-24ttb  0/1    ContainerCreating  0         1s

==> v1/ConfigMap
NAME                              DATA  AGE
ingress-nginx-ingress-controller  1     1s

==> v1beta1/ClusterRole
NAME                   AGE
ingress-nginx-ingress  1s

==> v1beta1/RoleBinding
NAME                   AGE
ingress-nginx-ingress  1s

==> v1/ServiceAccount
NAME                   SECRETS  AGE
ingress-nginx-ingress  1        1s

==> v1beta1/ClusterRoleBinding
NAME                   AGE
ingress-nginx-ingress  1s


NOTES:
The nginx-ingress controller has been installed.
It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status by running 'kubectl --namespace gitlab-managed-apps get services -o wide -w ingress-nginx-ingress-controller'

An example Ingress that makes use of the controller:

  apiVersion: extensions/v1beta1
  kind: Ingress
  metadata:
    annotations:
      kubernetes.io/ingress.class: nginx
    name: example
    namespace: foo
  spec:
    rules:
      - host: www.example.com
        http:
          paths:
            - backend:
                serviceName: exampleService
                servicePort: 80
              path: /
    # This section is only required if TLS is to be enabled for the Ingress
    tls:
        - hosts:
            - www.example.com
          secretName: example-tls

If TLS is enabled for the Ingress, a Secret containing the certificate and key must also be provided:

  apiVersion: v1
  kind: Secret
  metadata:
    name: example-tls
    namespace: foo
  data:
    tls.crt: <base64 encoded cert>
    tls.key: <base64 encoded key>
  type: kubernetes.io/tls

After many minutes, all the ingress resources are green except the service, which stays in Pending. GitLab never gets the public IP address of my cluster, and the commands to retrieve it, found in the documentation, return an empty output.

Here is the status of the ingress controller:

$ kubectl --namespace gitlab-managed-apps get services -o wide -w ingress-nginx-ingress-controller
NAME                               TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE       SELECTOR
ingress-nginx-ingress-controller   LoadBalancer   XX.XX.XX.XXX   <pending>     80:30908/TCP,443:31375/TCP   40s       app=nginx-ingress,component=controller,release=ingress

So something is blocking it from finalising its initialisation; most likely there is simply no cloud provider on a bare-metal cluster to allocate an external IP to a LoadBalancer service.
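
For reference, two common workarounds on a bare-metal cluster (sketches, nothing GitLab does for you): install a bare-metal load-balancer implementation such as MetalLB so that the LoadBalancer service actually receives an IP, or switch the controller service to NodePort and point DNS at the nodes yourself, for example:

$ kubectl --namespace gitlab-managed-apps patch svc ingress-nginx-ingress-controller -p '{"spec":{"type":"NodePort"}}'

Note that with NodePort GitLab will most likely still not auto-detect an external IP, since it reads the LoadBalancer ingress address of the service.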

Install Prometheus

Clicking the Install button seems to work fine as I get the Prometheus was successfully installed on your Kubernetes cluster success message, but from the dashboard I see the deployment in a failure state with the error message "pod has unbound PersistentVolumeClaims (repeated 2 times)".

Here is a message found in the deployment events:

no persistent volumes available for this claim and no storage class is set

And the prometheus-prometheus-server Persistent Volume Claim stays in a Pending state forever.

Also, looking at the k8s logs using kail, I can see a lot of permission errors:

gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.031686       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list pods at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.231692       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list configmaps at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.431386       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.Node: nodes is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list nodes at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.536462       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1beta1.Deployment: deployments.extensions is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list deployments.extensions at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.536681       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.HorizontalPodAutoscaler: horizontalpodautoscalers.autoscaling is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list horizontalpodautoscalers.autoscaling at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.538056       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.Job: jobs.batch is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list jobs.batch at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.539061       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1beta1.CronJob: cronjobs.batch is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list cronjobs.batch at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.541892       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1beta1.DaemonSet: daemonsets.extensions is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list daemonsets.extensions at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.542716       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1beta1.ReplicaSet: replicasets.extensions is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list replicasets.extensions at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.543720       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1beta1.StatefulSet: statefulsets.apps is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list statefulsets.apps at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.631521       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list secrets at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.831602       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.PersistentVolume: persistentvolumes is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list persistentvolumes at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:36.036679       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.ResourceQuota: resourcequotas is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list resourcequotas at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:36.231563       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.Endpoints: endpoints is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list endpoints at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:36.431617       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.LimitRange: limitranges is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list limitranges at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:36.556225       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1beta1.Deployment: deployments.extensions is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list deployments.extensions at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:36.556593       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.HorizontalPodAutoscaler: horizontalpodautoscalers.autoscaling is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list horizontalpodautoscalers.autoscaling at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:36.557613       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.Job: jobs.batch is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list jobs.batch at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:36.558653       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1beta1.CronJob: cronjobs.batch is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list cronjobs.batch at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530
...

Same as before, I try to solve the issue:

$ kubectl create clusterrolebinding --user system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics default-gitlab-prometheus-sa-admin --clusterrole cluster-admin
clusterrolebinding.rbac.authorization.k8s.io "default-gitlab-prometheus-sa-admin" created

No more errors, but still no green status for Prometheus.

Update 5th of June: I fixed the Prometheus Persistent Volume Claim issue by creating a default Persistent Volume with the following YAML:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: hostpath2
  labels:
    type: local
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  hostPath:
    path: "/k8s/data1"

After a few seconds, everything Prometheus-related is green.

Install the GitLab Runner

Here, no issue was found and everything works fine (but don't forget that the permissions are wrongly set up because they are far too broad). The runner is available in the Runners section of the CI / CD Settings page.

Summary

As of today, when not using GKE (which works fine as far as I saw), it is impossible to get the GitLab Runner installed without spending a lot of time investigating and working around the issues. After going through all my steps, you'll get a running GitLab Runner, but Ingress and Prometheus remain in bad shape.

I'm available if you want me to test anything. I really think this Kubernetes integration is key for GitLab, and improving it shouldn't be that hard and should be relatively quick.
