
Geo: Document the backup-utility --s3tool awscli option for both manual and cron-run jobs

Notice: this issue is more closely related to https://gitlab.com/gitlab-org/build/CNG/-/tree/master/gitlab-toolbox, but I don't have permission to create issues there.

Summary

We configured GitLab to use an S3 bucket as CI job artifact storage. The backup-utility uses s3cmd to download objects from S3 buckets, which, in our case, is less reliable than awscli. Most of our backup jobs crashed with this error:

```
WARNING: Remote file ''. S3Error: 404 (NoSuchKey): The specified key does not exist.
ERROR: S3 error: 404 (NoSuchKey): The specified key does not exist.
```

This is probably related to how s3cmd works internally. On startup, the tool first scans the whole bucket and builds a file list in memory. It only starts downloading objects after finishing the full scan: https://github.com/s3tools/s3cmd/blob/v2.2.0/s3cmd#L1341

Per the documentation, Sidekiq deletes expired artifacts every 7 minutes: https://docs.gitlab.com/ee/administration/job_artifacts.html#expiring-artifacts.

The problem, I suspect, is that unless the backup job finishes within 7 minutes, some files will be deleted from the remote bucket after s3cmd has added them to its in-memory list but before it gets a chance to download them.
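
If that is what is happening, the download step would need to tolerate keys that disappear between the listing and the download. Below is a minimal sketch of that idea using boto3; the bucket name, prefix, and destination paths are placeholders, and this is not the actual backup-utility, s3cmd, or awscli code.

```python
# Minimal sketch (placeholder names, not backup-utility's code): download
# objects page by page and skip keys that were expired after being listed.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket = "gitlab-artifacts"  # placeholder bucket name

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix="artifacts/"):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        try:
            s3.download_file(bucket, key, "/backups/" + key.replace("/", "_"))
        except ClientError as err:
            if err.response["Error"]["Code"] in ("NoSuchKey", "404"):
                # The artifact expired between listing and download:
                # skip it instead of failing the whole backup.
                continue
            raise
```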

It is also worth noting that backup-utility uses tar to pack everything into a single tarball and uploads that. The packaging step uses a lot of memory. It may be worth looking into splitting the backup into multiple tarballs and/or applying compression.
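
As an illustration of the compression idea, a gzip-compressed tarball can be written to disk in a streaming fashion, so the whole archive never has to sit in memory. The sketch below uses Python's tarfile module with placeholder paths and is not the backup-utility implementation.

```python
# Minimal sketch (placeholder paths): stream a staging directory into a
# gzip-compressed tarball on disk, file by file, instead of building the
# archive in memory.
import os
import tarfile

backup_dir = "/srv/gitlab-backup-staging"  # placeholder staging directory

with tarfile.open("/backups/artifacts.tar.gz", mode="w:gz") as tar:
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            path = os.path.join(root, name)
            # Keep archive paths relative to the staging directory
            tar.add(path, arcname=os.path.relpath(path, backup_dir))
```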

Steps to reproduce

(Please provide the steps to reproduce the issue)

I haven't had a chance to confirm my theory, but after we switched from s3cmd to awscli, our backup jobs ran successfully without problems. I believe awscli runs the scan and download in parallel, so it is less likely to hit s3cmd's problem.

Configuration used

(Please provide a sanitized version of the configuration used wrapped in a code block (```yaml))

Between the working and non-working configurations, the only difference is the choice of s3tool used by backup-utility:

Not working:

```yaml
- args:
    - /bin/bash
    - '-c'
    - cp /etc/gitlab/.s3cfg $HOME/.s3cfg && backup-utility
```

Working:

```yaml
- args:
    - /bin/bash
    - '-c'
    - cp /etc/gitlab/.s3cfg $HOME/.s3cfg && backup-utility --s3tool awscli
```

Current behavior

Without explicitly setting --s3tool to awscli, the backup job crashes with the error:

```
WARNING: Remote file ''. S3Error: 404 (NoSuchKey): The specified key does not exist.
ERROR: S3 error: 404 (NoSuchKey): The specified key does not exist.
```

Expected behavior

The backup job should not crash when some objects disappear from the remote bucket. It is normal for artifacts to be deleted from the S3 bucket while a backup is running.

Versions

  • Chart: 5.9.1 (GitLab version: 14.9.1)
  • Platform:
    • Cloud: EKS
  • Kubernetes: (kubectl version)
    • Client:
    • Server:
  • Helm: (helm version)
    • Client:
    • Server:

Relevant logs

(Please provide any relevant log snippets you have collected, using code blocks (```) to format)
