Verified Commit c2ce9f07 authored by Mikhail Mazurskiy

kas: update runbooks

parent bb592627
......@@ -53,8 +53,8 @@
* [Web IDE Assets](../gitlab-static/web-ide-assets.md)
* [HostedRunnersServiceCiRunnerJobsApdexSLOViolationSingleShard](../hosted-runners/jobs_apdex_slo_violation.md)
* [Missing Metrics in HTTP Router Dashboard](../http-router/missing-metrics.md)
* [Kubernetes-Agent Basic Troubleshooting](../kas/kubernetes-agent-basic-troubleshooting.md)
* [Kubernetes-Agent Disable Integrations](../kas/kubernetes-agent-disable-integrations.md)
* [`kas` Basic Troubleshooting](../kas/kas-basic-troubleshooting.md)
* [`kas` Disable Integrations](../kas/kas-disable-integrations.md)
* [Helm Upgrade is Stuck](../kube/helm-upgrade-stuck.md)
* [Ad hoc observability tools on Kubernetes nodes](../kube/k8s-adhoc-observability.md)
* [../kube/k8s-oncall-setup.md](../kube/k8s-oncall-setup.md)
......
......@@ -36,7 +36,7 @@
* [Disabling routing requests through `http-router`](../http-router/disable-http-router.md)
* [HTTP Router Worker Logs](../http-router/logging.md)
* [Missing Metrics in HTTP Router Dashboard](../http-router/missing-metrics.md)
* [Kubernetes-Agent Basic Troubleshooting](../kas/kubernetes-agent-basic-troubleshooting.md)
* [`kas` Basic Troubleshooting](../kas/kas-basic-troubleshooting.md)
* [Upgrading Monitoring Components](../monitoring/upgrades.md)
* [Session: Application architecture](../onboarding/architecture.md)
* [Packagecloud Infrastructure and Backups](../packagecloud/infrastructure.md)
......
......@@ -53,7 +53,7 @@
* [Gitaly multi-project migration](../gitaly/multi-project-migration.md)
* [GitLab Storage Re-balancing](../gitaly/storage-rebalancing.md)
* [HostedRunnersServiceRunnerManagerDownSingleShard](../hosted-runners/runners_manager_is_down.md)
* [Kubernetes-Agent Basic Troubleshooting](../kas/kubernetes-agent-basic-troubleshooting.md)
* [`kas` Basic Troubleshooting](../kas/kas-basic-troubleshooting.md)
* [GKE Cluster Upgrade Procedure](../kube/k8s-cluster-upgrade.md)
* [../kube/k8s-oncall-setup.md](../kube/k8s-oncall-setup.md)
* [../kube/k8s-operations.md](../kube/k8s-operations.md)
......
......@@ -52,7 +52,7 @@
* [Adding new storage capacity](new-storage.md)
* [GitLab Storage Re-balancing](storage-rebalancing.md)
* [Managing GitLab Storage Shards (Gitaly)](storage-sharding.md)
* [Kubernetes-Agent Disable Integrations](../kas/kubernetes-agent-disable-integrations.md)
* [`kas` Disable Integrations](../kas/kas-disable-integrations.md)
* [Service-Level Monitoring](../metrics-catalog/service-level-monitoring.md)
* [Tuning and Modifying Alerts](../monitoring/alert_tuning.md)
* [../monitoring/apdex-alerts-guide.md](../monitoring/apdex-alerts-guide.md)
......
# Kubernetes-Agent Basic Troubleshooting
# `kas` Basic Troubleshooting
**Table of Contents**
[TOC]
## Kas deployment manifest location
## `kas` deployment manifest location
Kas is running inside our regional GKE cluster, in the `gitlab` namespace. It is deployed via the Gitlab Helm chart through CI jobs at the [k8s-workloads/gitlab-com](https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com) repository
`kas` is running inside our regional GKE cluster, in the `gitlab` namespace. It is deployed via the GitLab Helm chart through CI jobs at the [k8s-workloads/gitlab-com](https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com) repository.
## Changing the number of running pods
As kas is deployed as part of the Gitlab helm chart, you need to to modify the helm values that get passed to it in order to change the minimum and maximum number of running pods. The helm values in question are
As `kas` is deployed as part of the Gitlab helm chart, you need to modify the helm values that get passed to it in order to change the minimum and maximum number of running pods. The helm values in question are
`gitlab.kas.minReplicas` and `gitlab.kas.maxReplicas`
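The dotted names above are paths into the nested Helm values structure. A minimal Python sketch of that mapping, assuming the usual Helm convention that each dot descends one level; `expand_dotted` is a hypothetical helper and the replica counts are placeholders, not production values:

```python
def expand_dotted(values):
    """Expand {"a.b.c": 1} into {"a": {"b": {"c": 1}}}."""
    nested = {}
    for dotted_key, value in values.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        # Walk (or create) one nested mapping per path segment.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Placeholder replica counts, chosen only for illustration.
overrides = expand_dotted({
    "gitlab.kas.minReplicas": 2,
    "gitlab.kas.maxReplicas": 10,
})
print(overrides)
```

The resulting nested mapping has the same shape as the keys you would look for in the values file.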
......@@ -22,7 +22,7 @@ Log onto a console server and get access to the cluster [as documented here](../
## Tail the logs
As Kas is a standard pod in our Gitlab helm chart, logs are being sent to Kibana/elasticsearch at <https://log.gprd.gitlab.net/goto/b8204a41999cc1a136fa12c885ce8d22>
As `kas` is a standard pod in our Gitlab helm chart, logs are being sent to Kibana/elasticsearch at <https://log.gprd.gitlab.net/goto/b8204a41999cc1a136fa12c885ce8d22>
If you need to get the logs from Kubernetes directly, you can do so by logging onto a console server, getting access to the cluster [as documented here](../../uncategorized/k8s-oncall-setup.md), and running the following command
......@@ -30,13 +30,15 @@ If you need to get the logs from Kubernetes directly, you can do so by logging o
## Debugging ingress
As Kas uses a [GCP Ingress](https://cloud.google.com/kubernetes-engine/docs/concepts/ingress) and [Google managed certificates](https://cloud.google.com/kubernetes-engine/docs/how-to/managed-certs) it is different from other services, as there is no haproxy nor cloudflare involved. The GCP ingress object is defined in the [k8s-workloads/gitlab-com](https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com) repository, and a specific helm release called `gitlab-extras`. The definition can be seen [here](https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/master/releases/gitlab-extras/values.yaml.gotmpl).
THIS SECTION IS OUT OF DATE.
As `kas` uses a [GCP Ingress](https://cloud.google.com/kubernetes-engine/docs/concepts/ingress) and [Google managed certificates](https://cloud.google.com/kubernetes-engine/docs/how-to/managed-certs) it is different from other services, as there is no haproxy nor cloudflare involved. The GCP ingress object is defined in the [k8s-workloads/gitlab-com](https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com) repository, and a specific helm release called `gitlab-extras`. The definition can be seen [here](https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/master/releases/gitlab-extras/values.yaml.gotmpl).
GCP Ingress objects are implemented by a [GCP External HTTPS Load balancer](https://cloud.google.com/load-balancing/docs/https), and you can find the exact GCP Load balancer in use by `kas` using the following command
`gcloud --project gitlab-production compute forwarding-rules list | grep gitlab-gitlab-kas`
To see the the forwarding rule use
To see the forwarding rule use
`gcloud --project gitlab-production compute url-maps list | grep gitlab-gitlab-kas`
......@@ -54,7 +56,7 @@ The best way to view all this information however, is through the web ui. Simply
If you get reports the agent is not working, and you see the following error in the Kubernetes Agent logs
```
```json
{"level":"warn","time":"2020-11-26T09:44:47.943+1100","msg":"GetConfiguration.Recv failed","error":"rpc error: code = Unauthenticated desc = unauthenticated"}
```
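When scanning many log lines for this error, a small filter can help. This is a minimal Python sketch, assuming each log line is a single JSON object shaped like the sample above; `is_unauthenticated` is a hypothetical helper, not part of `kas` or `agentk`:

```python
import json


def is_unauthenticated(log_line):
    """Return True if a JSON log line reports a gRPC Unauthenticated error."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return False  # Not a JSON log line, so ignore it.
    if not isinstance(entry, dict):
        return False
    return "code = Unauthenticated" in str(entry.get("error", ""))


sample = (
    '{"level":"warn","time":"2020-11-26T09:44:47.943+1100",'
    '"msg":"GetConfiguration.Recv failed",'
    '"error":"rpc error: code = Unauthenticated desc = unauthenticated"}'
)
print(is_unauthenticated(sample))
```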
......
# Kubernetes-Agent Disable Integrations
# `kas` Disable Integrations
**Table of Contents**
[TOC]
In case of incidents where kas might be inadvertedly be affecting services it
In case of incidents where `kas` might inadvertently be affecting services it
integrates with, including API, Gitaly, and Redis, it is possible to temporarily
disable these integrations until proper diagnosis and remediation of problems
can occur.
......@@ -14,9 +14,8 @@ can occur.
There are multiple ways to do this, but one of the simplest is to use the
[Kubernetes Network Policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
to stop the `kas` pods from being able to access the GitLab API. To do this
change the helm value `gitlab.kas.networkpolicy.egress.rules` to remove the the
rule that allows access to Gitlab API. e.g. <https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/master/releases/gitlab/values/values.yaml.gotmpl#L1253-1263>
through a merge request and apply to production.
change the helm value `gitlab.kas.networkpolicy.egress.rules`, through a merge request, to remove the
rule that allows access to the GitLab API, and apply the change to production.
When this access is disabled, all GitLab users' `agentk` agents will be unable
to authenticate to `kas` and thus will be unable to leverage any and all functionality
......@@ -27,20 +26,20 @@ that `kas` provides.
If access to all Gitaly nodes needs to be temporarily disabled, this can be done
through changing the [Kubernetes Network Policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
to stop the `kas` pods from being able to access Gitaly. To do this
change the helm value `gitlab.kas.networkpolicy.egress.rules` to remove the the
rule that allows access to Gitlab API. e.g. <https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/master/releases/gitlab/values/values.yaml.gotmpl#L1264-1277>
through a merge request and apply to production.
change the helm value `gitlab.kas.networkpolicy.egress.rules`, through a merge request, to remove the
rule that allows access to Gitaly, and apply the change to production.
When this access is disabled, Gitlab users will be unable to use `agentk`/`kas` for applying
Kubernetes manifests via gitops.
When this access is disabled, GitLab users will be unable to use most features of `agentk`
since `kas` will not be able to fetch agent configuration.
## Disabling access to redis
## Disabling access to Redis
If access to redis/the `kas` redis integration needs to be temporarily disabled,
the best way to do this is to change the helm value `gitlab.kas.redis.enabled`
to `false`. e.g. <https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/master/releases/gitlab/values/values.yaml.gotmpl#L1228>
through a merge request and apply to production.
If access to Redis needs to be temporarily disabled, this can be done
through changing the [Kubernetes Network Policy](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
to stop the `kas` pods from being able to access Redis. To do this
change the helm value `gitlab.kas.networkpolicy.egress.rules`, through a merge request, to remove the
rule that allows access to Redis, and apply the change to production.
When this is disabled, it would stop `kas` from being able to do IP and token
based rate limiting, instead falling back to a global rate limit for all operations
which might bottleneck users.
When this access is disabled, `kas` can no longer do token-based rate limiting and
instead falls back to a global rate limit for all operations, which might bottleneck users.
Request proxying will also work only partially.
......@@ -20,7 +20,7 @@
* [../elastic/kibana.md](../elastic/kibana.md)
* [../git/gitlab-review-app-certs.md](../git/gitlab-review-app-certs.md)
* [`gitalyctl`](../gitaly/gitalyctl.md)
* [Kubernetes-Agent Basic Troubleshooting](../kas/kubernetes-agent-basic-troubleshooting.md)
* [`kas` Basic Troubleshooting](../kas/kas-basic-troubleshooting.md)
* [Rebuilding a kubernetes cluster](k8s-cluster-rebuild.md)
* [k8s-oncall-setup.md](k8s-oncall-setup.md)
* [k8s-operations.md](k8s-operations.md)
......
......@@ -33,7 +33,7 @@
* [../hosted-runners/runner_system_failure.md](../hosted-runners/runner_system_failure.md)
* [HTTP Router Worker Logs](../http-router/logging.md)
* [GitLab Production Onboarding for Incident.io](../incident-io-onboard/incident-management.md)
* [Kubernetes-Agent Basic Troubleshooting](../kas/kubernetes-agent-basic-troubleshooting.md)
* [`kas` Basic Troubleshooting](../kas/kas-basic-troubleshooting.md)
* [Kubernetes](../kube/kubernetes.md)
* [logging_gcs_archive_bigquery.md](logging_gcs_archive_bigquery.md)
* [Scaling Elastic Cloud Clusters](scaling.md)
......
......@@ -21,7 +21,7 @@
* [Chef Guidelines](../config_management/chef-guidelines.md)
* [Zonal and Regional Recovery Guide](../disaster-recovery/recovery.md)
* [../elastic/advanced-search-in-gitlab.md](../elastic/advanced-search-in-gitlab.md)
* [Kubernetes-Agent Disable Integrations](../kas/kubernetes-agent-disable-integrations.md)
* [`kas` Disable Integrations](../kas/kas-disable-integrations.md)
* [Ad hoc observability tools on Kubernetes nodes](../kube/k8s-adhoc-observability.md)
* [How to take a snapshot of an application running in a StatefulSet](../kube/k8s-sts-snapshot.md)
* [StatefulSet Guidelines](../kube/sts-guidelines.md)
......
......@@ -46,7 +46,7 @@
* [Gitaly unusual activity alert](../gitaly/gitaly-unusual-activity.md)
* [Gitaly multi-project migration](../gitaly/multi-project-migration.md)
* [Web IDE Assets](../gitlab-static/web-ide-assets.md)
* [Kubernetes-Agent Basic Troubleshooting](../kas/kubernetes-agent-basic-troubleshooting.md)
* [`kas` Basic Troubleshooting](../kas/kas-basic-troubleshooting.md)
* [../kube/k8s-oncall-setup.md](../kube/k8s-oncall-setup.md)
* [GKE/Kubernetes Administration](../kube/kube-administration.md)
* [Scaling Elastic Cloud Clusters](../logging/scaling.md)
......