Gitaly should run well in Kubernetes (and similar environments)
## Problem

We have experienced multiple challenges with customers attempting to use Gitaly in Kubernetes. Some examples include:

* https://gitlab.com/gitlab-org/distribution/team-tasks/-/issues/791
* https://gitlab.com/gitlab-org/distribution/team-tasks/-/issues/414
* https://gitlab.com/gitlab-org/distribution/team-tasks/-/issues/431

## Current timeline

The Gitaly team is aware that this is a pain point for many, and as such will be focusing on it over the course of the FY25 fiscal year. These plans are subject to change.

**FY25Q1 (Feb 2024 - Apr 2024)**

The Gitaly team validated and documented the limitations described in this epic, tracked in https://gitlab.com/groups/gitlab-org/-/epics/12732. The goal of this effort was to determine the scope of work required to support a fully cloud-native Gitaly. Summary of output:

- [Disk IO](https://gitlab.com/gitlab-org/gitaly/-/issues/5813 "Benchmark Gitaly disk i/o when running in Kubernetes") and [network bandwidth/latency](https://gitlab.com/gitlab-org/gitaly/-/issues/5830 "Benchmark Gitaly network i/o when running in Kubernetes") were ruled out as concerns: a pod can achieve the same performance as a VM.
- [Infrastructure and benchmarking tooling](https://gitlab.com/gitlab-org/gitaly/-/issues/5837 "Create Kubernetes repetitive testing code (IaC)") were developed to test Gitaly in more detail on Kubernetes.
- [OOM events were tested](https://gitlab.com/gitlab-org/gitaly/-/issues/5831 "Document OOM scenarios running Gitaly in Kubernetes") and best practices are [being documented](https://gitlab.com/gitlab-org/gitaly/-/issues/6043 "Document Gitaly on Kubernetes best practices").
- An approach to support cgroups in Kubernetes was [designed and tested](https://gitlab.com/gitlab-org/gitaly/-/issues/5833#note_1852767923 "Investigate Gitaly's behaviour with cgroups running in Kubernetes"); this addresses most OOM concerns.
- Some retry logic on clients was implemented as part of [Zero downtime upgrades](https://gitlab.com/groups/gitlab-org/-/epics/10328 "Zero-downtime upgrades in Gitaly").

**FY25Q2 (May 2024 - Jul 2024)**

The Gitaly team will provide initial experimental support for Gitaly on Kubernetes with cgroups v2, limited to standalone or sharded Gitaly (Gitaly Cluster is out of scope for now). Implementation will be tracked at https://gitlab.com/groups/gitlab-org/-/epics/13623+

At the same time, GitLab will start to dogfood Gitaly on Kubernetes in non-production environments as a step towards production-grade support, tracked in https://gitlab.com/groups/gitlab-org/-/epics/13624+

**FY25Q3 (Aug 2024 - Oct 2024)**

The Gitaly team will work throughout FY25Q3 to resolve as many technical issues as possible, with the goal of delivering some form of Cloud Native Gitaly by the end of FY25Q3. This schedule is optimistic, and subject to the results of the investigation mentioned above as well as the breadth and depth of required changes, but we do feel an ambitious approach is warranted.

**FY26Q2 (Apr 2025 - Jul 2025)**

The Gitaly team will validate operational readiness in preparation for deploying Gitaly on K8s in Dedicated.

**FY26Q3 (Aug 2025 - Oct 2025)**

The Gitaly team will work with the Dedicated team to onboard a new customer running Gitaly on K8s.

**FY26Q4 (Dec 2025 - Jan 2026)**

The Dedicated team will begin onboarding new customers running Gitaly on K8s. We will also release Gitaly on K8s as GA.
To summarize:

* Gitaly doesn't do well with memory constraints when under load (this is getting better).
* Out-of-memory (OOM) events can lead to loss of data during transactions, so pod sizing is critical.
* Gitaly-managed cgroups are necessary to isolate Git resource usage; this isn't currently supported by Kubernetes.

## Known limitations

- [Zero downtime upgrades](https://gitlab.com/groups/gitlab-org/-/epics/10328 "Zero-downtime upgrades in Gitaly") will not work, as they rely on restarting processes in-place.
- Only a standalone or sharded Gitaly setup is supported. In other words, Praefect (aka Gitaly Cluster) and its PostgreSQL database are not supported. We _might_ consider [Raft](https://gitlab.com/groups/gitlab-org/-/epics/8903 "Implement a Raft-based decentralized architecture for Gitaly Cluster") once it's ready and tested.
- `git` has an unpredictable memory footprint depending on the repository and the request at hand. Kubernetes will kill the entire pod if a forked `git` process allocates too much memory. We have [basic support](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/5547 "cgroup: Add support for cgroups v2") for cgroups v2 that could be used to mitigate this, and [experimental Kubernetes support](https://gitlab.com/groups/gitlab-org/-/epics/13623 "Support Gitaly on Kubernetes") is being worked on.

### Memory management

The behavior of OOM events varies between Kubernetes versions. The latest Kubernetes versions kill the whole pod if a `git` process exceeds the memory limit.

On a VM, Gitaly supports two layers of cgroups: a single parent cgroup and multiple repository cgroups. When a single `git` process exceeds its repository cgroup limit, that process is killed independently. However, when enough `git` processes reach the limit, the parent cgroup triggers the OOM killer and the Gitaly process itself can be killed, although this is much less likely. Currently, on Kubernetes, Gitaly can't control the cgroup hierarchy, so the Gitaly process and all `git` processes belong to the pod's single flat cgroup.
When the memory limit is reached, the following steps occur:

- New memory allocations are rejected. Usually, `git` processes exit with a non-zero status code, while the Gitaly process tries its best to reclaim memory via GC.
- The OOM killer may kick in. It picks a victim by calculating an OOM score for each process in the cgroup. Typically, the process dominating memory usage is killed first. If that is a forked `git` process, the pod continues working as usual and Kubernetes won't restart it.
- If the Gitaly process (PID 1 of the pod) is targeted, the whole pod is killed. We can adjust the OOM score of the Gitaly process to prevent it from being killed.

Unfortunately, with cgroups v2, Kubernetes kills the whole pod as soon as a subprocess exceeds the memory limit ([PR](https://github.com/kubernetes/kubernetes/pull/117793)). On the other hand, cgroups v2 makes it easier to delegate a cgroup to a user: if a cgroup inside the pod/container could be managed by Gitaly, Gitaly could control resource usage of the `git` processes and prevent a pod OOM event. Kubernetes doesn't natively support cgroup delegation, but cgroups can be manipulated by an init container with the right mountpoint. This is a viable approach for experimental cgroup support; details are available at https://gitlab.com/gitlab-org/gitaly/-/issues/5833#note_1852767923

Overall, native cgroup delegation in Kubernetes would be the ideal solution; this is being explored at https://gitlab.com/gitlab-org/gitaly/-/issues/6006+

### Technical notes

* Within Omnibus GitLab, we have the [ability to make use of cgroups to constrain processes](https://docs.gitlab.com/omnibus/settings/memory_constrained_envs.html#optimize-gitaly) for Gitaly. The same restriction is not directly applicable within container runtimes, for security reasons. (This also [affects `gitlab/gitlab-ee` containers](https://gitlab.com/gitlab-org/distribution/team-tasks/-/issues/834#note_578451799 "Call with CGI to reduce memory consumption").)
* Kubernetes does make direct use of cgroups on its own, but to use them properly we need to understand expected memory/CPU patterns and calculate these based on worker/process configuration. Getting this wrong leads to termination of the full container/pod, not simply of a single "hungry" `git` command that was spawned.

## What is not planned right now?

* **Support for cgroups v1**: cgroups v1 is considered legacy now, and no new features will be added to it. cgroups v1 OOM handling is not cgroup-aware: when a container runs out of memory and the system needs to find a victim process to kill, it will kill processes from all containers in the same hierarchy. In contrast, v2 only kills processes within that container, without affecting other processes in the same hierarchy. v1 also doesn't allow safe delegation of controllers to unprivileged processes.
* [**Support for memory QoS**](https://kubernetes.io/blog/2023/05/05/qos-memory-resources/): throttling memory when usage reaches `memory.high` can avoid direct OOM kills, but work on this feature is on hold due to [undesired behavior around allocation of large chunks of memory leading to stuck processes](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2570-memory-qos/#latest-update-stalled). Since it is still in alpha, it is out of scope for now.
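The pod-sizing arithmetic mentioned under Technical notes can be sketched as follows. The request must cover Gitaly's own footprint plus the worst-case sum of concurrently forked `git` processes, otherwise the OOM killer can take down the whole pod. All figures below are illustrative assumptions, not GitLab recommendations:

```go
package main

import "fmt"

// podMemoryRequestMiB estimates a pod memory request from a Gitaly baseline,
// an assumed per-git-process headroom, and the expected peak number of
// concurrently forked git processes. Undersizing this is what turns a single
// "hungry" git command into a full-pod termination.
func podMemoryRequestMiB(gitalyBaselineMiB, perGitProcessMiB, maxConcurrentGit int) int {
	return gitalyBaselineMiB + perGitProcessMiB*maxConcurrentGit
}

func main() {
	// Hypothetical numbers: 300 MiB Gitaly baseline, 200 MiB headroom per
	// git process, up to 20 concurrent git processes.
	fmt.Printf("request %d MiB\n", podMemoryRequestMiB(300, 200, 20)) // request 4300 MiB
}
```

In practice the per-process headroom is the hard part to estimate, since `git`'s memory footprint depends on the repository and the request at hand, which is exactly why Gitaly-managed cgroups are preferred over static sizing alone.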