Allow removal of Gitaly fsGroup setting
Summary
When running Gitaly in a Kubernetes environment that uses a CSI driver for persistent storage (in my case, NetApp Trident), the combination of securityContext.fsGroup being set and a large number of files on the volume makes the kubelet's recursive chown of the persistent volume contents painfully slow. In my case, our Gitaly StatefulSet takes a minimum of 30 minutes to start in production when the fsGroup setting is applied.
This is a known issue and is documented here: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods
By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted. For large volumes, checking and changing ownership and permissions can take a lot of time, slowing Pod startup.
Steps to reproduce
- Use a Kubernetes environment where persistent volume orchestration is handled by a CSI driver
- Add a large number of small files to the Gitaly volume (in my case, 24GB of total data)
- Restart the Gitaly StatefulSet
Configuration used
We are using the default Gitaly configuration for securityContext.
Current behavior
- The kube-scheduler selects a node for the workload
- The volume is mounted to the worker node
- The kubelet tries to mount the volume into the container, but repeatedly times out while the recursive chown operation runs
Expected behavior
The volume should mount without a recursive chown operation, which removing fsGroup would achieve. Kubernetes 1.18 has an alpha feature that works around this (fsGroupChangePolicy: "OnRootMismatch"), but an alpha workaround is not currently acceptable.
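For reference, a minimal sketch of what the alpha workaround looks like in a Pod spec. The pod name, image, and numeric IDs are illustrative placeholders, not values taken from the chart. On 1.18 the field is, as far as I know, only honored when the ConfigurableFSGroupPolicy feature gate is enabled.

```yaml
# Illustrative Pod only; metadata, image, and IDs are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-policy-example
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 1000
    # Skip the recursive chown when the volume root already has
    # the expected ownership and permissions.
    fsGroupChangePolicy: "OnRootMismatch"
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
```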
Relevant logs
Definition of done
We need to support:
- Having a value by default
- Allowing override, or removal of that default.
Currently, we explicitly have this in the template:
securityContext:
  runAsUser: {{ .Values.securityContext.runAsUser }}
  fsGroup: {{ .Values.securityContext.fsGroup }}
The immediate approach is to wrap each item in an if block based on its presence. A user can then override the default by setting a value, or remove it by setting the value to nil, empty, or false.
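A minimal sketch of the conditional wrapping, assuming the values stay under .Values.securityContext as in the current template. This is one possible shape, not the chart's final implementation, and the fragment's indentation would need to match the surrounding Pod spec.

```yaml
{{- /* Render securityContext only if at least one field is set. */}}
{{- with .Values.securityContext }}
{{- if or .runAsUser .fsGroup }}
securityContext:
  {{- /* Caveat: a plain `if` treats 0 as unset, so runAsUser: 0 would be omitted. */}}
  {{- if .runAsUser }}
  runAsUser: {{ .runAsUser }}
  {{- end }}
  {{- if .fsGroup }}
  fsGroup: {{ .fsGroup }}
  {{- end }}
{{- end }}
{{- end }}
```

With a template like this, a user could drop the default from their own values file. Where this block is nested depends on how the Gitaly subchart is included, so the top-level key here is an assumption:

```yaml
securityContext:
  # null removes the chart default, so the fsGroup line is not rendered
  fsGroup: null
```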