Disable `rake gitlab:cleanup:remote_upload_files` with bucket prefix

What does this MR do and why?

In GitLab 15.0, !91307 (merged) added official support for configuring an object storage bucket with a prefix. However, this Rake task doesn't take the bucket prefix into account and attempts to iterate through all files in the bucket. If the dry-run flag is disabled, the task also moves all files into the lost and found directory.

Unfortunately, Fog does not appear to provide an easy, cloud-agnostic way to list all files in a bucket with a prefix filter. In addition, at least for Azure Blob Storage, there isn't a standardized method to distinguish a directory from a regular file using Fog.

For these reasons, this commit disables this Rake task if a prefix is configured to prevent data loss.
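
As a rough illustration of the guard described above, here is a minimal sketch. It assumes the configured bucket value has the form `bucket/prefix` (as in `test1/uploads`). The helper names `split_bucket_prefix` and `ensure_no_bucket_prefix!` are hypothetical and are not taken from the GitLab codebase; only the error text mirrors the output shown later in this description.

```ruby
# Hypothetical helper: split a configured bucket value such as
# "test1/uploads" into the bucket name and an optional prefix.
def split_bucket_prefix(configured_bucket)
  bucket, prefix = configured_bucket.to_s.split('/', 2)
  [bucket, prefix]
end

# Hypothetical guard: abort before touching the bucket when a prefix is set,
# so that no files can be moved by mistake.
def ensure_no_bucket_prefix!(configured_bucket)
  _bucket, prefix = split_bucket_prefix(configured_bucket)
  return if prefix.nil? || prefix.empty?

  raise "Uploads are configured with a bucket prefix '#{prefix}'.\n" \
        "Unfortunately, prefixes are not supported for this Rake task.\n"
end
```

Calling `ensure_no_bucket_prefix!("test1")` returns without error, while `ensure_no_bucket_prefix!("test1/uploads")` raises before any listing or move would happen.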

This Rake task should probably be dropped for a number of reasons:

  1. It's not used very much.
  2. It requires bucket permissions to list all files. Our documented permissions for object storage buckets don't grant these privileges.
  3. It requires walking through the entire bucket and running a database query for each batch. This is quite slow, and it doesn't scale well as more objects are added.
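
To make the scaling concern in the last point concrete, here is a small sketch of a batched walk. It is an in-memory stand-in, not the task's actual code: `keys` plays the role of the bucket listing, `known_paths` plays the role of the uploads table, and `find_untracked` and the batch size are hypothetical.

```ruby
# Hypothetical stand-in for a batched bucket walk. Every batch costs one
# lookup, so the number of lookups grows linearly with the object count.
def find_untracked(keys, known_paths, batch_size: 100)
  queries = 0
  untracked = []

  keys.each_slice(batch_size) do |batch|
    queries += 1 # stands in for one database query per batch
    untracked.concat(batch.reject { |key| known_paths.include?(key) })
  end

  [untracked, queries]
end
```

With 250 keys and a batch size of 100, the walk performs three lookups. The count rises in proportion to the number of objects in the bucket.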

Relates to https://gitlab.com/gitlab-org/gitlab/-/issues/415537

How to set up and validate locally

  1. Configure object storage with a bucket prefix. For example, in gdk.yml:
object_store:
  connection:
    provider: AzureRM
    azure_storage_account_name: REDACTED-STORAGE
    azure_storage_access_key: REDACTED-KEY
  consolidated_form: true
  enabled: true
  objects:
    artifacts:
      bucket: test1/artifacts
    external_diffs:
      bucket: test1/external_diffs
    lfs:
      bucket: test1/lfs
    uploads:
      bucket: test1/uploads
    packages:
      bucket: test1/packages
    dependency_proxy:
      bucket: test1/dependency-proxy
    terraform_state:
      bucket: test1/terraform
    pages:
      bucket: test1/pages
    ci_secure_files:
      bucket: test1/ci_secure_files
  2. Run `gdk reconfigure`.
  3. Run `bin/rake gitlab:cleanup:remote_upload_files`.

You should see:

% bin/rake gitlab:cleanup:remote_upload_files
rake aborted!
Uploads are configured with a bucket prefix 'uploads'.
Unfortunately, prefixes are not supported for this Rake task.
/Users/stanhu/gdk-ee/gitlab/lib/gitlab/cleanup/remote_uploads.rb:24:in `run!'
/Users/stanhu/gdk-ee/gitlab/lib/tasks/gitlab/cleanup.rake:47:in `block (3 levels) in <main>'
Tasks: TOP => gitlab:cleanup:remote_upload_files
(See full trace by running task with --trace)

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Stan Hu
