Backup container fails to initialize when persistence is enabled

Summary

With task runner persistence enabled, the backup container started by the cronjob fails to initialize with the following error:

Events:
  Type     Reason              Age    From                                                Message
  ----     ------              ----   ----                                                -------
  Normal   Scheduled           2m11s  default-scheduler                                   Successfully assigned default/gitlab-task-runner-backup-1562515200-bsk22 to gke-prod-stage-default-pool-9a7ba391-4dhr
  Warning  FailedAttachVolume  2m11s  attachdetach-controller                             Multi-Attach error for volume "pvc-8855fbde-9ebe-11e9-8147-42010af00050" Volume is already used by pod(s) gitlab-task-runner-544cb695cc-6j5ln
  Warning  FailedMount         8s     kubelet, gke-prod-stage-default-pool-9a7ba391-4dhr  Unable to mount volumes for pod "gitlab-task-runner-backup-1562515200-bsk22_default(a8439528-a114-11e9-8147-42010af00050)": timeout expired waiting for volumes to attach or mount for pod "default"/"gitlab-task-runner-backup-1562515200-bsk22". list of unmounted volumes=[task-runner-tmp]. list of unattached volumes=[task-runner-config task-runner-tmp init-task-runner-secrets task-runner-secrets etc-ssl-certs default-token-gdvwh]

With persistence disabled, the backup job starts successfully, but we are still unable to create a backup. The job is sometimes canceled abruptly (possibly a timeout, we are not sure) or evicted due to low resources.

Steps to reproduce

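  1. Enable task runner persistence (gitlab.task-runner.persistence.enabled: true)
  2. Enable the task runner backup cronjob (gitlab.task-runner.backups.cron.enabled: true)
  3. Trigger the backup job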
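
The steps above can be sketched as a short shell script. This is only a rough illustration, not the exact commands used for this report. The release name (gitlab), the chart reference (gitlab/gitlab), and the manual job name (manual-backup-test) are assumptions. The cronjob name and namespace are inferred from the pod name in the events above. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: RELEASE, CHART and JOB_NAME are assumed values, not taken from the report.
RELEASE="${RELEASE:-gitlab}"                      # assumed Helm release name
CHART="${CHART:-gitlab/gitlab}"                   # assumed chart reference
NAMESPACE="${NAMESPACE:-default}"                 # namespace shown in the events
CRONJOB="${CRONJOB:-gitlab-task-runner-backup}"   # inferred from the backup pod name
JOB_NAME="${JOB_NAME:-manual-backup-test}"        # hypothetical name for the manual run
DRY_RUN="${DRY_RUN:-1}"                           # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reproduce() {
  # Steps 1 and 2: enable persistence and the backup cronjob.
  run helm upgrade "$RELEASE" "$CHART" --reuse-values \
    --set gitlab.task-runner.persistence.enabled=true \
    --set gitlab.task-runner.backups.cron.enabled=true

  # Step 3: trigger a backup immediately instead of waiting for the schedule.
  run kubectl create job --namespace "$NAMESPACE" \
    --from="cronjob/$CRONJOB" "$JOB_NAME"
}

reproduce
```

After the job is created, `kubectl describe pod` on the new backup pod should show events like the ones in the summary if the issue reproduces.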

Configuration used

This is the task runner configuration we are using. If you need anything else, let me know:

gitlab:
  task-runner:
    backups:
      cron:
        enabled: true
        # schedule is in UTC
        schedule: "0 16 * * *"
      objectStorage:
        backend: gcs
        config:
          gcpProject: ...
          secret: gitlab-storage-config
          key: config
    persistence:
      enabled: true
      size: 50Gi

Current behavior

The backup container fails to initialize with a "Multi-Attach error", as shown in the events above.

Expected behavior

The backup container initializes, starts, and takes a backup successfully.

Versions

  • Chart: v2.0.3
  • Platform:
    • Cloud: GKE
  • Kubernetes:
    • Client: v1.12.9-gke.7 (version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.9-gke.7", GitCommit:"b6001a5d99c235723fc19342d347eee4394f2005", GitTreeState:"clean", BuildDate:"2019-06-24T19:47:32Z", GoVersion:"go1.10.8b4", Compiler:"gc", Platform:"windows/amd64"})
    • Server: v1.11.10-gke.5 (version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.10-gke.5", GitCommit:"5aa3a95d828fe45aab3611dfc4ebdc0341fe1507", GitTreeState:"clean", BuildDate:"2019-05-29T17:25:39Z", GoVersion:"go1.10.8b4", Compiler:"gc", Platform:"linux/amd64"})
  • Helm: (helm version)
    • Client: v2.13.1 (&version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"})
    • Server: v2.13.1 (&version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"})

Relevant logs

(Same as above)

Edited by Dominik Montada