Random registry push failures
Summary
I've installed the GitLab Helm chart on a 2-node DigitalOcean cluster with a custom installation of the ingress-nginx Helm chart from https://kubernetes.github.io/ingress-nginx. Pushes to the registry fail at random while uploading blobs (with buildah or genuinetools/img):
error copying layers and metadata from "containers-storage:[overlay@/var/lib/containers/storage+/run/containers/storage:overlay.mountopt=nodev,metacopy=on]<REGISTRY_IMG_URL>" to "REGISTRY_IMG_URL": Error writing blob: Error uploading layer chunked: received unexpected HTTP status: 504 Gateway Time-out
or
error pushing image "REGISTRY_IMG_URL" to "docker://REGISTRY_IMG_URL": error copying layers and metadata from "containers-storage:[overlay@/var/lib/containers/storage+/run/containers/storage:overlay.mountopt=nodev,metacopy=on]REGISTRY_IMG_URL" to "docker://REGISTRY_IMG_URL": Error trying to reuse blob sha256:c95d2191d7773c6e29188f92922bc9547e1f0b6130e85dfc2f5e4eae13137c7c at destination: Requesting bear token: invalid status code from registry 500 (Internal Server Error)
Note that these errors are random and occur about 50% of the time. Simply rerunning the GitLab CI job often resolves the issue.
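Since a rerun usually succeeds, the manual retry can be automated as a stopgap with the `retry` keyword in `.gitlab-ci.yml`. The following is a minimal sketch, not the job from this pipeline: the job name `build-image`, the stage, the image, and the tag scheme are all assumptions chosen for illustration. It only masks the failures and does not address the underlying 504/500 responses.

```yaml
# .gitlab-ci.yml (illustrative sketch; job name, stage, and image are assumptions)
build-image:
  stage: build
  image: quay.io/buildah/stable
  retry:
    max: 2                  # GitLab permits at most 2 automatic retries
    when: script_failure    # a failed push exits non-zero, i.e. a script failure
  script:
    # Predefined CI variables; CI_REGISTRY_USER resolves to gitlab-ci-token
    - buildah login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - buildah bud -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - buildah push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
```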
Steps to reproduce
Deploy the GitLab Helm chart to a managed DigitalOcean cluster. I have a DigitalOcean Load Balancer set up via the ingress-nginx Helm chart from https://kubernetes.github.io/ingress-nginx. I also use a custom installation of the cert-manager chart from https://charts.jetstack.io.
Proxy Protocol is disabled on the Load Balancer (I don't know whether that is related). Enabling it was tricky with cert-manager, so I left it disabled for now.
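For reference, enabling Proxy Protocol would involve two settings in the ingress-nginx chart values that have to be switched together: the DigitalOcean Load Balancer annotation on the controller Service and the matching nginx ConfigMap option. This is a sketch of what that would look like, not something applied on this cluster, and whether it has any bearing on the registry errors is untested.

```yaml
# ingress-nginx chart values (sketch only; NOT enabled on this cluster)
controller:
  config:
    # nginx then expects a PROXY protocol header on every incoming connection
    use-proxy-protocol: "true"
  service:
    annotations:
      # tells the DigitalOcean Load Balancer to send the PROXY protocol header
      service.beta.kubernetes.io/do-loadbalancer-enable-proxy-protocol: "true"
```

If only one of the two is set, the Load Balancer and nginx disagree about the connection format and requests fail, so both need to be changed in the same rollout.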
Configuration used
```yaml
certmanager:
  install: false
gitlab:
  task-runner:
    backups:
      cron:
        enabled: true
        schedule: 0 5 * * *
      objectStorage:
        config:
          key: config
          secret: gitlab-s3cmd-secret
    enabled: true
  webservice:
    ingress:
      tls:
        secretName: gitlab-gitlab-tls
gitlab-runner:
  install: true
  runners:
    config: "[[runners]]\n [runners.kubernetes]\n cpu_request = \"1.5\"\n memory_request = \"2.5Gi\"\n service_cpu_limit = \"1\"\n service_memory_limit = \"2Gi\" \n privileged = true\n poll_timeout = 600\n [runners.kubernetes.pod_labels]\n component = \"gitlab-ci-job\"\n [runners.kubernetes.pod_annotations]\n \"container.apparmor.security.beta.kubernetes.io/build\" = \"unconfined\"\n \"container.seccomp.security.alpha.kubernetes.io/build\" = \"unconfined\"\n [runners.kubernetes.node_selector]\n role = \"ci-runner\"\n [runners.kubernetes.node_tolerations]\n \"dedicated_role=ci-runner\" = \"NoSchedule\"\n [[runners.kubernetes.volumes.empty_dir]]\n name = \"buildah\"\n mount_path = \"/var/lib/containers/\"\n"
global:
  appConfig:
    artifacts:
      bucket: company-gitlab-artifacts
      connection:
        secret: gitlab-rails-s3-secret
    backups:
      bucket: company-gitlab-backups
      tmpBucket: company-gitlab-backups-tmp
    dependencyProxy:
      bucket: company-gitlab-dependency-proxy
      connection:
        secret: gitlab-rails-s3-secret
    externalDiffs:
      bucket: company-gitlab-external-diffs
      connection:
        secret: gitlab-rails-s3-secret
    initialDefaults:
      signupEnabled: false
    lfs:
      bucket: company-gitlab-lfs
      connection:
        secret: gitlab-rails-s3-secret
    omniauth:
      allowSingleSignOn: true
      autoLinkLdapUser: true
      blockAutoCreatedUsers: false
      enabled: true
      providers:
      - secret: gitlab-azuread-oauth
    packages:
      bucket: company-gitlab-packages
      connection:
        secret: gitlab-rails-s3-secret
    pseudonymizer:
      bucket: company-gitlab-pseudonymizer
      connection:
        secret: gitlab-rails-s3-secret
    terraformState:
      bucket: company-gitlab-terraform-state
      connection:
        secret: gitlab-rails-s3-secret
    uploads:
      bucket: company-gitlab-uploads
      connection:
        secret: gitlab-rails-s3-secret
  hosts:
    domain: git.company.com
    gitlab:
      name: git.company.com
    hostSuffix: bas
  ingress:
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
      kubernetes.io/tls-acme: true
    class: nginx
    configureCertmanager: false
  minio:
    enabled: false
  operator:
    enabled: false
  registry:
    bucket: company-gitlab-registry
nginx-ingress:
  enabled: false
prometheus:
  install: false
registry:
  enabled: true
  ingress:
    tls:
      secretName: gitlab-registry-tls
  relativeurls: true
  storage:
    key: config
    secret: gitlab-registry-space
```
Current behavior
Pushing randomly results in these errors:
error copying layers and metadata from "containers-storage:[overlay@/var/lib/containers/storage+/run/containers/storage:overlay.mountopt=nodev,metacopy=on]<REGISTRY_IMG_URL>" to "REGISTRY_IMG_URL": Error writing blob: Error uploading layer chunked: received unexpected HTTP status: 504 Gateway Time-out
or
error pushing image "REGISTRY_IMG_URL" to "docker://REGISTRY_IMG_URL": error copying layers and metadata from "containers-storage:[overlay@/var/lib/containers/storage+/run/containers/storage:overlay.mountopt=nodev,metacopy=on]REGISTRY_IMG_URL" to "docker://REGISTRY_IMG_URL": Error trying to reuse blob sha256:c95d2191d7773c6e29188f92922bc9547e1f0b6130e85dfc2f5e4eae13137c7c at destination: Requesting bear token: invalid status code from registry 500 (Internal Server Error)
In the gitlab-workhorse pod logs I see a lot of status 500 responses for the registry token endpoint (/jwt/auth), but no further explanation as to why:
{"content_type":"text/html; charset=utf-8","correlation_id":"01EZ73N23DQR4JCXG1QD8HQJB1","duration_ms":41,"host":"git.company.com","level":"info","method":"GET","msg":"access","proto":"HTTP/1.1","referrer":"","remote_addr":"10.244.0.88:44428","remote_ip":"10.244.0.88","route":"","status":500,"system":"http","time":"2021-02-23T09:31:18Z","ttfb_ms":41,"uri":"/jwt/auth?account=gitlab-ci-token\u0026scope=repository%3Acompany%2Ffoundation%2Fmelodic%3Apull%2Cpush\u0026service=container_registry","user_agent":"Buildah/1.19.6","written_bytes":2926}
Expected behavior
I should be able to push images to the registry reliably, without intermittent 500 or 504 errors.
Versions
- Chart: 4.9.0 (also happens with 4.8.4)
- Platform:
  - Cloud: DigitalOcean
- Kubernetes: (`kubectl version`)
  - Client: 1.20.2
  - Server: 1.20.2
- Helm: (`helm version`)
  - Client: v3.5.2
  - Server: -
Relevant logs
See above.