Various issues with Kubernetes integration in GitLab
Dear GitLab team,

## Context

We have 2 servers on which I built a Kubernetes cluster using [the Rancher RKE tool](https://github.com/rancher/rke). I'm now in the process of using it with GitLab, since GitLab offers a Kubernetes integration on gitlab.com. So my context is using GitLab's Kubernetes integration with our own servers, not with GKE.

## About this issue

I encountered many permission issues, which I solved in an ugly way (by granting far too many permissions). I'm opening this issue to summarise all of them so that they can be fixed, improving the Kubernetes integration feature for people using their own k8s cluster.

I hope this will be well received.

## Step by step with issue descriptions

Here I show, step by step, how I installed my cluster and integrated it with GitLab, along with each issue I hit and the dirty fix I applied.

### Initial configurations

Here is the server configuration:

```
$ docker-machine ls
NAME     ACTIVE   DRIVER    STATE     URL                        SWARM   DOCKER        ERRORS
master   -        generic   Running   tcp://XX.XX.XXX.XXX:2376           v17.03.2-ce
worker   -        generic   Running   tcp://YYY.YYY.YY.YY:2376           v17.03.2-ce
...
```

And here is the `cluster.yml` file used by RKE to deploy the Kubernetes cluster:

```yaml
# default k8s version: v1.8.10-rancher1-1
# default network plugin: canal
nodes:
  - address: XX.XX.XXX.XXX
    port: XXXXX
    ssh_key_path: '~/.ssh/gitlab_ci_id_rsa'
    user: root
    role: [controlplane,etcd]
  - address: YYY.YYY.YY.YY
    port: XXXXX
    ssh_key_path: '~/.ssh/gitlab_ci_worker_id_rsa'
    user: root
    role: [worker]

ingress:
  provider: none
```

### Deploy the Kubernetes cluster

Now I deploy the k8s cluster:

```
$ rke up --config cluster.yml
INFO[0000] Building Kubernetes cluster
INFO[0000] [dialer] Setup tunnel for host [XX.XX.XXX.XXX]
INFO[0000] [dialer] Setup tunnel for host [YYY.YYY.YY.YY]
INFO[0001] [network] Deploying port listener containers
INFO[0001] [network] Pulling image [rancher/rke-tools:v0.1.4] on host [XX.XX.XXX.XXX]
...
INFO[0134] [addons] Executing deploy job..
INFO[0140] [addons] User addons deployed successfully
INFO[0140] Finished building Kubernetes cluster successfully
```

Finally, configure the local machine so that the `kubectl` command works:

```
$ rm -rf ~/.kube && mkdir ~/.kube
$ cp ./kube_config_cluster.yml ~/.kube/config
```

Check that everything is fine:

```
$ kubectl cluster-info
Kubernetes master is running at https://XX.XX.XXX.XXX:6443
KubeDNS is running at https://XX.XX.XXX.XXX:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
```

From here, I need the logs and the dashboard to have a complete view of what is happening.

#### Accessing Kubernetes logs

To check the k8s logs, there is a very well done tool named [kail](https://github.com/boz/kail), which tails the k8s logs. Install it, run it, and you'll get the logs.

#### Deploy the Kubernetes dashboard (Optional)

I deploy the Kubernetes dashboard to get a better look at the issues as they happen:

```
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
secret "kubernetes-dashboard-certs" created
serviceaccount "kubernetes-dashboard" created
role.rbac.authorization.k8s.io "kubernetes-dashboard-minimal" created
rolebinding.rbac.authorization.k8s.io "kubernetes-dashboard-minimal" created
deployment.apps "kubernetes-dashboard" created
service "kubernetes-dashboard" created
```

In another terminal, run `kubectl proxy` and access the dashboard at http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy.
To authenticate, select the `Token` mode and copy/paste the `token` from the following:

```
$ kubectl get secrets
NAME                  TYPE                                  DATA      AGE
default-token-6gncr   kubernetes.io/service-account-token   3         8m

$ kubectl describe secret default-token-6gncr
Name:         default-token-6gncr
Namespace:    default
Labels:       <none>
Annotations:  kubernetes.io/service-account.name=default
              kubernetes.io/service-account.uid=0c05b98a-63e6-11e8-bb81-fa163ed63d33

Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1017 bytes
namespace:  7 bytes
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.....
```

##### Dashboard permissions issue

This is the first permission issue, for which I have already opened [an issue](https://github.com/kubernetes/dashboard/issues/3064). To quickly fix it in a dirty way:

```
$ kubectl create clusterrolebinding --user system:serviceaccount:default:default default-sa-admin --clusterrole cluster-admin
clusterrolebinding.rbac.authorization.k8s.io "default-sa-admin" created
```

Now refresh, and the dashboard loads without any remaining errors.

##### Heapster installation

Looking at the logs with `kail` (and as described in the dashboard project's `README.md` file), Heapster needs to be installed:

```
$ git clone https://github.com/kubernetes/heapster.git
$ cd heapster
$ kubectl create -f deploy/kube-config/influxdb/
deployment.extensions "monitoring-grafana" created
service "monitoring-grafana" created
serviceaccount "heapster" created
deployment.extensions "heapster" created
service "heapster" created
deployment.extensions "monitoring-influxdb" created
service "monitoring-influxdb" created
$ kubectl create -f deploy/kube-config/rbac/heapster-rbac.yaml
clusterrolebinding.rbac.authorization.k8s.io "heapster" created
```

In the k8s logs I can see InfluxDB booting and so on, and in the dashboard I have working graphs.

### Import the cluster in GitLab

From the Kubernetes page of a GitLab project:

1. Click the `Add an existing Kubernetes cluster` button
2. Set the `Kubernetes cluster name`, the `API URL` to `https://XX.XX.XXX.XXX:6443/`, the `CA Certificate` to the output of `ssh root@XX.XX.XXX.XXX cat /etc/kubernetes/ssl/kube-ca.pem`, and the `Token` to the same token used to log in to the dashboard
3. Click the `Add Kubernetes cluster` button

### Install Helm Tiller

Here comes the first hard part. When I click the `Install` button, GitLab shows the following error:

```
Error: error installing: deployments.extensions is forbidden: User "system:serviceaccount:gitlab-managed-apps:default" cannot create deployments.extensions in the namespace "gitlab-managed-apps"
```

The `gitlab-managed-apps` namespace has been created correctly, but some permissions are missing, preventing GitLab from finishing the installation of Tiller in k8s. I fixed it the same way as for the dashboard:

```
$ kubectl create clusterrolebinding --user system:serviceaccount:gitlab-managed-apps:default default-gitlab-sa-admin --clusterrole cluster-admin
clusterrolebinding.rbac.authorization.k8s.io "default-gitlab-sa-admin" created
```

Refresh GitLab (otherwise the `Install` button is disabled) and click `Install` again. You'll get the `Helm Tiller was successfully installed on your Kubernetes cluster` success message, and the `Install` buttons for Ingress, Prometheus and GitLab Runner become available.
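
A narrower alternative to binding `cluster-admin` for the Tiller installation might be a namespace-scoped RoleBinding. The following is an untested sketch: it assumes that the built-in `admin` ClusterRole, bound only inside `gitlab-managed-apps`, is enough for GitLab to create the Tiller deployment there. The binding name `gitlab-managed-apps-default-admin` is hypothetical.

```yaml
# Untested sketch: grants the namespace's default service account the built-in
# "admin" ClusterRole, but only within gitlab-managed-apps (not cluster-wide).
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gitlab-managed-apps-default-admin  # hypothetical name
  namespace: gitlab-managed-apps
subjects:
  - kind: ServiceAccount
    name: default
    namespace: gitlab-managed-apps
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
```

Saved to a file, this could be applied with `kubectl apply -f`. Charts that create cluster-scoped resources (ClusterRoles, for example) would probably still need additional permissions, so this may not cover every application GitLab installs through Tiller.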
### Install Ingress

Clicking the `Install` button seems to work fine, as I get the `Ingress was successfully installed on your Kubernetes cluster` success message. But looking more closely at the dashboard, I can see that the ingress deployment failed with the error message "Back-off restarting failed container". Here are the pod logs:

```
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:    0.10.2
  Build:      git-fd7253a
  Repository: https://github.com/kubernetes/ingress-nginx
-------------------------------------------------------------------------------

I0530 09:14:33.705106 7 flags.go:159] Watching for ingress class: nginx
I0530 09:14:33.706004 7 main.go:181] Creating API client for https://10.43.0.1:443
I0530 09:14:33.797413 7 main.go:193] Running in Kubernetes Cluster version v1.10 (v1.10.1) - git (clean) commit d4ab47518836c750f9949b9e0d387f20fb92260b - platform linux/amd64
F0530 09:14:33.817365 7 main.go:80] ✖ It seems the cluster it is running with Authorization enabled (like RBAC) and there is no permissions for the ingress controller.
Please check the configuration
```

**Update 5th of June**

I solved the permission issue by removing all the ingress resources I could find and running the following command:

```
$ helm install --namespace gitlab-managed-apps --name ingress --set rbac.create=true stable/nginx-ingress
NAME:   ingress
LAST DEPLOYED: Tue Jun 5 15:00:55 2018
NAMESPACE: gitlab-managed-apps
STATUS: DEPLOYED

RESOURCES:
==> v1beta1/Role
NAME                   AGE
ingress-nginx-ingress  1s

==> v1/Service
NAME                                   TYPE          CLUSTER-IP     EXTERNAL-IP  PORT(S)                     AGE
ingress-nginx-ingress-controller       LoadBalancer  XX.XX.XX.XXX   <pending>    80:30908/TCP,443:31375/TCP  1s
ingress-nginx-ingress-default-backend  ClusterIP     YY.YY.YYY.YYY  <none>       80/TCP                      1s

==> v1beta1/Deployment
NAME                                   DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
ingress-nginx-ingress-controller       1        1        1           0          1s
ingress-nginx-ingress-default-backend  1        1        1           0          1s

==> v1beta1/PodDisruptionBudget
NAME                                   MIN AVAILABLE  MAX UNAVAILABLE  ALLOWED DISRUPTIONS  AGE
ingress-nginx-ingress-controller       1              N/A              0                    1s
ingress-nginx-ingress-default-backend  1              N/A              0                    1s

==> v1/Pod(related)
NAME                                                    READY  STATUS             RESTARTS  AGE
ingress-nginx-ingress-controller-68f4d665bc-pmsl7       0/1    ContainerCreating  0         1s
ingress-nginx-ingress-default-backend-6f58fb5f56-24ttb  0/1    ContainerCreating  0         1s

==> v1/ConfigMap
NAME                              DATA  AGE
ingress-nginx-ingress-controller  1     1s

==> v1beta1/ClusterRole
NAME                   AGE
ingress-nginx-ingress  1s

==> v1beta1/RoleBinding
NAME                   AGE
ingress-nginx-ingress  1s

==> v1/ServiceAccount
NAME                   SECRETS  AGE
ingress-nginx-ingress  1        1s

==> v1beta1/ClusterRoleBinding
NAME                   AGE
ingress-nginx-ingress  1s


NOTES:
The nginx-ingress controller has been installed.
It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status by running 'kubectl --namespace gitlab-managed-apps get services -o wide -w ingress-nginx-ingress-controller'

An example Ingress that makes use of the controller:

  apiVersion: extensions/v1beta1
  kind: Ingress
  metadata:
    annotations:
      kubernetes.io/ingress.class: nginx
    name: example
    namespace: foo
  spec:
    rules:
      - host: www.example.com
        http:
          paths:
            - backend:
                serviceName: exampleService
                servicePort: 80
              path: /
    # This section is only required if TLS is to be enabled for the Ingress
    tls:
      - hosts:
          - www.example.com
        secretName: example-tls

If TLS is enabled for the Ingress, a Secret containing the certificate and key must also be provided:

  apiVersion: v1
  kind: Secret
  metadata:
    name: example-tls
    namespace: foo
  data:
    tls.crt: <base64 encoded cert>
    tls.key: <base64 encoded key>
  type: kubernetes.io/tls
```

After many minutes, all the ingress resources are green except the service, which stays pending. GitLab never gets the public IP address of my cluster, and the commands from the documentation to retrieve it return empty output. Here is the status of the ingress controller:

```
$ kubectl --namespace gitlab-managed-apps get services -o wide -w ingress-nginx-ingress-controller
NAME                               TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE   SELECTOR
ingress-nginx-ingress-controller   LoadBalancer   XX.XX.XX.XXX   <pending>     80:30908/TCP,443:31375/TCP   40s   app=nginx-ingress,component=controller,release=ingress
```

So something is blocking it from finalising its initialisation.

### Install Prometheus

Clicking the `Install` button seems to work fine, as I get the `Prometheus was successfully installed on your Kubernetes cluster` success message, but in the dashboard I see the deployment in a failure state with the error message "pod has unbound PersistentVolumeClaims (repeated 2 times)".
Here is a message found in the deployment events:

```
no persistent volumes available for this claim and no storage class is set
```

And the `prometheus-prometheus-server` Persistent Volume Claim stays in a Pending state forever.

Also, looking at the k8s logs using kail, I can see a lot of permission errors:

```
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.031686 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list pods at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.231692 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list configmaps at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.431386 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.Node: nodes is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list nodes at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.536462 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1beta1.Deployment: deployments.extensions is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list deployments.extensions at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.536681 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.HorizontalPodAutoscaler: horizontalpodautoscalers.autoscaling is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list horizontalpodautoscalers.autoscaling at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.538056 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.Job: jobs.batch is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list jobs.batch at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.539061 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1beta1.CronJob: cronjobs.batch is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list cronjobs.batch at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.541892 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1beta1.DaemonSet: daemonsets.extensions is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list daemonsets.extensions at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.542716 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1beta1.ReplicaSet: replicasets.extensions is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list replicasets.extensions at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.543720 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1beta1.StatefulSet: statefulsets.apps is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list statefulsets.apps at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.631521 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.Secret: secrets is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list secrets at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:35.831602 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.PersistentVolume: persistentvolumes is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list persistentvolumes at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:36.036679 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.ResourceQuota: resourcequotas is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list resourcequotas at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:36.231563 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.Endpoints: endpoints is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list endpoints at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:36.431617 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.LimitRange: limitranges is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list limitranges at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:36.556225 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1beta1.Deployment: deployments.extensions is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list deployments.extensions at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:36.556593 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.HorizontalPodAutoscaler: horizontalpodautoscalers.autoscaling is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list horizontalpodautoscalers.autoscaling at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:36.557613 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1.Job: jobs.batch is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list jobs.batch at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 09:33:36.558653 1 reflector.go:205] k8s.io/kube-state-metrics/collectors/collectors.go:62: Failed to list *v1beta1.CronJob: cronjobs.batch is forbidden: User "system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics" cannot list cronjobs.batch at the cluster scope
gitlab-managed-apps/prometheus-kube-state-metrics-6584885ccf-crwmj[prometheus-kube-state-metrics]: E0530 ...
```

Same as before, I tried to solve the issue:

```
$ kubectl create clusterrolebinding --user system:serviceaccount:gitlab-managed-apps:prometheus-kube-state-metrics default-gitlab-prometheus-sa-admin --clusterrole cluster-admin
clusterrolebinding.rbac.authorization.k8s.io "default-gitlab-prometheus-sa-admin" created
```

No more errors, but still no green status for Prometheus.

**Update 5th of June**

I have fixed the Prometheus Persistent Volume Claim issue by creating a default Persistent Volume with the following YAML:

```yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: hostpath2
  labels:
    type: local
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  # The PersistentVolume spec field is persistentVolumeReclaimPolicy,
  # and it takes a plain string rather than a list.
  persistentVolumeReclaimPolicy: Recycle
  hostPath:
    path: "/k8s/data1"
```

After a few seconds all the Prometheus components are green.

### Install the GitLab Runner

No issue found here; everything works fine (but don't forget that the permissions are set up wrongly, as they are far too broad). The runner is available in the `Runners settings` part of the `CI / CD Settings` page.

## Summary

As of today, when not using GKE (which works fine as far as I saw), it is impossible to get the GitLab Runner installed without spending a lot of time investigating and working around the issues. After following all my steps, you'll get a running GitLab Runner, but Ingress and Prometheus are in bad shape.

I'm available if you want me to test anything. I really think this Kubernetes integration is key for GitLab, and improving it shouldn't be that hard and could be done fairly quickly.
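
For the `prometheus-kube-state-metrics` permission errors shown in the Install Prometheus section, a read-only ClusterRole might be a narrower alternative to `cluster-admin`. The sketch below is untested and makes several assumptions: the resource list is taken only from the "forbidden" log lines quoted in this issue (the log is truncated, so more resources may be needed), the `watch` verb is added on the assumption that the collectors watch after listing, and the name `kube-state-metrics-read` is hypothetical.

```yaml
# Untested sketch: list/watch access to the resources named in the
# "forbidden" log lines, instead of cluster-admin.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics-read  # hypothetical name
rules:
  - apiGroups: [""]
    resources: [pods, configmaps, nodes, secrets, persistentvolumes, resourcequotas, endpoints, limitranges]
    verbs: [list, watch]  # "watch" is an assumption; the logs only show "list"
  - apiGroups: [extensions]
    resources: [deployments, daemonsets, replicasets]
    verbs: [list, watch]
  - apiGroups: [apps]
    resources: [statefulsets]
    verbs: [list, watch]
  - apiGroups: [batch]
    resources: [jobs, cronjobs]
    verbs: [list, watch]
  - apiGroups: [autoscaling]
    resources: [horizontalpodautoscalers]
    verbs: [list, watch]
---
# Binds the role to the service account named in the log lines.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics-read  # hypothetical name
subjects:
  - kind: ServiceAccount
    name: prometheus-kube-state-metrics
    namespace: gitlab-managed-apps
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics-read
```

A ClusterRole is used rather than a namespaced Role because every error in the log refuses access "at the cluster scope".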