Slow mount times for very large CSI volumes (MinIO & more); modify Helm chart(s) with workaround
Summary
With a lot of storage (160+ GiB), MinIO pods can take 30 minutes to an hour to start, because the MinIO sub-chart sets a securityContext, which causes Kubernetes/CSI to recursively chown every file stored by MinIO when mounting the volume.
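The mechanism can be sketched as follows: any pod-level `fsGroup` makes the kubelet recursively chown/chmod every file on the mounted volume before the container starts. The manifest below is illustrative only; the UID/GID values and names are assumptions, not the sub-chart's exact rendered output:

```yaml
# Illustrative pod spec fragment; names and UID/GID values are examples,
# not the MinIO sub-chart's exact defaults.
apiVersion: v1
kind: Pod
metadata:
  name: minio-example
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 1000   # presence of fsGroup triggers the recursive chown at mount time
  containers:
    - name: minio
      image: minio/minio
      volumeMounts:
        - name: export
          mountPath: /export
  volumes:
    - name: export
      persistentVolumeClaim:
        claimName: minio-pvc   # with ~1 TiB of files, the chown can take a very long time
```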
Steps to reproduce
Use the GitLab Helm chart to install GitLab with the bundled MinIO, configure hundreds of gigabytes of persistent storage, and populate repositories and registries so that a large amount of that storage is actually used.
Configuration used
```yaml
---
postgresql:
  image:
    tag: 13.6.0
certmanager:
  install: false
gitlab:
  webservice:
    ingress:
      tls:
        secretName: ##SANITIZED##
  migrations:
    enabled: true
  toolbox:
    persistence:
      enabled: true
      size: 200Gi
    backups:
      cron:
        enabled: true
        schedule: 0 21 * * *
        extraArgs: "--maximum-backups 3"
        persistence:
          enabled: true
          accessMode: 'ReadWriteOnce'
          size: '500Gi'
        resources:
          requests:
            cpu: '500m'
            memory: '1000M'
gitlab-runner:
  install: false
  runners:
    privileged: true
global:
  kas:
    enabled: true
  grafana:
    enabled: true
  appConfig:
    ldap:
      servers:
        main:
          active_directory: true
          base: ##SANITIZED##
          bind_dn: ##SANITIZED##
          encryption: plain
          host: ##SANITIZED##
          label: LDAP
          password:
            key: password
            secret: ##SANITIZED##
          port: 389
          uid: sAMAccountName
  edition: ee
  hosts:
    domain: ##SANITIZED##
    externalIP: ##SANITIZED##
    https: true
    gitlab:
      name: ##SANITIZED##
  ingress:
    # class: nginx
    configureCertmanager: false
    enabled: true
    tls:
      enabled: true
      secretName: ##SANITIZED##
  time_zone: UTC
nginx-ingress:
  enabled: True
minio:
  persistence:
    enabled: True
    size: 1Ti
  replicas: 4
```
Current behavior
The MinIO pod takes about 30-60 minutes (variable) to start and become operational, and `kubectl describe` shows many Kubernetes events about timeouts mounting the volume until it finally mounts.
Related to:
- https://github.com/longhorn/longhorn/issues/2131#issuecomment-778897129
- https://kubernetes.io/blog/2020/12/14/kubernetes-release-1.20-fsgroupchangepolicy-fsgrouppolicy/
Expected behavior
The MinIO pod starts up quickly after an upgrade, pod deletion, etc.
A workaround that should be added to speed up mounting is to expose securityContext.fsGroupChangePolicy in the Helm chart (not available as a value right now):
```yaml
securityContext:
  runAsUser: 1000
  fsGroup: 1000
  fsGroupChangePolicy: "OnRootMismatch"  # new value to be added to minio sub chart as an optional value
```
- Reference 1: https://kubernetes.io/blog/2020/12/14/kubernetes-release-1.20-fsgroupchangepolicy-fsgrouppolicy/
- Reference 2: https://github.com/longhorn/longhorn/issues/2168#issuecomment-756995869
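Once the sub-chart exposes the value, it could be set from the top-level chart values roughly like this. Note that `minio.securityContext.fsGroupChangePolicy` is the proposed, not-yet-existing key (chart 5.5.4 does not accept it), and the underlying Kubernetes field requires a cluster version that supports it (beta since v1.20):

```yaml
# Sketch of a values override; fsGroupChangePolicy is the requested
# addition and does not exist in chart 5.5.4.
minio:
  securityContext:
    runAsUser: 1000
    fsGroup: 1000
    # "OnRootMismatch" skips the recursive chown when the volume root
    # already has the expected owner/permissions; "Always" is the default.
    fsGroupChangePolicy: "OnRootMismatch"
```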
Versions
- Chart: 5.5.4
- Platform:
  - Self-hosted: Kubernetes (vanilla from kubeadm)
- Kubernetes (`kubectl version`):
  - Client: 1.17.17
  - Server: 1.17.17
- Helm (`helm version`):
  - Client: 3.9.3
  - Server: N/A (Helm 3 has no server component)
Relevant logs
No logs were collected, but it was verified that the pod is waiting for the recursive chown of the remote storage to finish, caused by the securityContext that is set by default.