Add support for Google CDN

Stan Hu requested to merge sh-add-google-cdn into master

What does this MR do and why?

This commit adds base support for using Google CDN in front of an object storage bucket to save costs. Note that this commit does not actually enable the use of the CDN yet; additional configuration and changes to the callers must be made for this to work (see the patch below).

To enable a bucket to use the Google CDN:

  1. A CarrierWave uploader should include ObjectStorage::CDN::Concern.
  2. Instead of calling #url, a caller can now call #use_cdn? and #cdn_signed_url.
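The two integration steps can be sketched in plain Ruby to show the call pattern. `CdnConcern`, `StubUploader`, and `download_url_for` are hypothetical stand-ins for `ObjectStorage::CDN::Concern`, a CarrierWave uploader, and the API helper. The predicate and signer are deliberately stubbed, and the storage URL is a placeholder, so this illustrates only how a caller chooses between `#cdn_signed_url` and `#url`.

```ruby
# Hypothetical stand-in for ObjectStorage::CDN::Concern: mixing it into an
# uploader adds #use_cdn? and #cdn_signed_url alongside the existing #url.
module CdnConcern
  # Stubbed rule: use the CDN only if a CDN URL is configured and the
  # client is not considered internal.
  def use_cdn?(ip_address)
    !cdn_base_url.nil? && !internal_ip?(ip_address)
  end

  # Placeholder signer; it does not produce a real signature.
  def cdn_signed_url
    "#{cdn_base_url}/#{path}?Signature=stub"
  end
end

# Hypothetical uploader; CarrierWave is not needed for this sketch.
class StubUploader
  include CdnConcern

  attr_reader :path, :cdn_base_url

  def initialize(path, cdn_base_url: nil)
    @path = path
    @cdn_base_url = cdn_base_url
  end

  # What callers used before this change (placeholder storage URL).
  def url
    "https://storage.googleapis.com/stanhu-test/#{path}"
  end

  private

  # Stub only: treat 10.x addresses as internal for illustration.
  def internal_ip?(ip_address)
    ip_address.to_s.start_with?("10.")
  end
end

# Caller side: prefer the CDN URL when allowed, otherwise fall back to #url.
def download_url_for(file, ip_address)
  file.use_cdn?(ip_address) ? file.cdn_signed_url : file.url
end

file = StubUploader.new("artifacts/job.zip", cdn_base_url: "https://stanhu-cdn.example.org")
puts download_url_for(file, "203.0.113.7")
```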

At the moment, transfers within the Google Cloud Platform (GCP) network between Google Cloud Storage and virtual machines are free, so serving artifacts to GCP machines through the CDN would waste money on CDN egress traffic.

To avoid that, #use_cdn? will fetch the list of public Google IP ranges and return false for client IPs that are private or fall within those Google ranges.
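The check can be sketched with Ruby's standard `ipaddr` library. The sample JSON follows the `prefixes` / `ipv4Prefix` / `ipv6Prefix` shape of Google's published IP range files, but which file the MR fetches, and how it is cached and refreshed, are not described here and are left out. Treating loopback and link-local clients like private ones is an added assumption, and `parse_google_ranges` is a hypothetical helper.

```ruby
require "ipaddr"
require "json"

# Sample in the shape of Google's published IP range files. The real list
# would be fetched and cached; that part is out of scope for this sketch.
SAMPLE_GOOGLE_RANGES_JSON = <<~JSON
  {
    "prefixes": [
      { "ipv4Prefix": "34.80.0.0/15" },
      { "ipv6Prefix": "2600:1900::/35" }
    ]
  }
JSON

# Parses the prefixes into IPAddr ranges, skipping malformed entries.
def parse_google_ranges(json)
  JSON.parse(json).fetch("prefixes", []).filter_map do |entry|
    prefix = entry["ipv4Prefix"] || entry["ipv6Prefix"]
    IPAddr.new(prefix) if prefix
  rescue IPAddr::InvalidAddressError
    nil
  end
end

# Returns true only for public client addresses outside Google's ranges.
def use_cdn?(ip_address, google_ranges)
  return false if ip_address.to_s.empty?

  ip = IPAddr.new(ip_address.to_s)
  # Private, loopback, and link-local clients are internal: skip the CDN.
  return false if ip.private? || ip.loopback? || ip.link_local?

  # Clients inside Google's network fetch from GCS for free: skip the CDN.
  google_ranges.none? { |range| range.include?(ip) }
rescue IPAddr::InvalidAddressError
  # Unparseable addresses fall back to the regular storage URL.
  false
end

ranges = parse_google_ranges(SAMPLE_GOOGLE_RANGES_JSON)
p use_cdn?("203.0.113.7", ranges) # => true
p use_cdn?("34.80.1.1", ranges)   # => false
```

Failing closed (returning false on bad input) means an unexpected address costs at most some GCS egress rather than producing a broken download.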

#cdn_signed_url will create a pre-signed URL valid for 10 minutes by default. The Google CDN must be set up so that a Google Cloud Storage bucket is behind a unique URL.
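For reference, the signing scheme from Google's Cloud CDN signed-URL guide (linked in the setup steps below) can be sketched as follows: append `Expires` and `KeyName`, compute an HMAC-SHA1 over that URL with the URL-safe-base64-decoded key, then append the URL-safe-base64 `Signature`. This is not the MR's code. The method signature, the `now:` parameter, and the object path are hypothetical; only the 10-minute (600-second) default comes from the description above.

```ruby
require "base64"
require "openssl"

# Builds a Cloud CDN-style signed URL. `base64_key` is the URL-safe base64
# key registered with the CDN backend under `key_name`.
def cdn_signed_url(url, key_name:, base64_key:, expires_in: 600, now: Time.now)
  expires = now.to_i + expires_in
  separator = url.include?("?") ? "&" : "?"
  url_to_sign = "#{url}#{separator}Expires=#{expires}&KeyName=#{key_name}"

  key = Base64.urlsafe_decode64(base64_key)
  digest = OpenSSL::HMAC.digest("SHA1", key, url_to_sign)
  signature = Base64.urlsafe_encode64(digest)

  "#{url_to_sign}&Signature=#{signature}"
end

demo_key = Base64.urlsafe_encode64("0123456789abcdef") # demo key only
puts cdn_signed_url("https://stanhu-cdn.example.org/artifacts/job.zip",
                    key_name: "stanhu-key", base64_key: demo_key,
                    now: Time.at(1_700_000_000))
```

With the fixed `now:` above, the output starts with `https://stanhu-cdn.example.org/artifacts/job.zip?Expires=1700000600&KeyName=stanhu-key&Signature=`.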

Relates to https://gitlab.com/gitlab-org/gitlab/-/issues/360462

How to set up and validate locally

As described in https://gitlab.com/gitlab-org/container-registry/-/issues/535#note_792288038:

Setting up Google CDN

  1. Created a GCS test bucket.
  2. Followed https://cloud.google.com/cdn/docs/setting-up-cdn-with-bucket to create an HTTPS load balancer with a static IP. I let Google create the HTTPS certs and assigned the domain stanhu-cdn.example.org.
  3. Registered the load balancer IP with that domain.
  4. Continued with https://cloud.google.com/cdn/docs/using-signed-urls to register a signing key and grant Cloud CDN permission to read the bucket.
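Step 4 needs a signing key to register. Google's signed-URL guide asks for a random 128-bit key encoded as URL-safe base64; a Ruby equivalent might look like this (the method name is hypothetical). The printed value is presumably what gets registered with the backend bucket and placed in the `'key'` field of the `cdn` settings in the Omnibus config.

```ruby
require "base64"
require "securerandom"

# Generates a random 128-bit (16-byte) key, encoded as URL-safe base64.
def generate_cdn_signing_key
  Base64.urlsafe_encode64(SecureRandom.random_bytes(16))
end

puts generate_cdn_signing_key
```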

Testing this merge request

  1. Create a Google Compute Engine (GCE) VM and install the latest GitLab nightly build.
  2. Per https://docs.gitlab.com/ee/administration/object_storage.html#google-example-with-adc-consolidated-form, I had to stop the VM and grant it the "Allow full access to all Cloud APIs" access scope.
  3. Restricted the default service account's permissions to the Service Account Token Creator role plus read/write access to storage buckets.
  4. Enabled the IAM Service Account Credentials API at https://console.cloud.google.com/apis/library/iamcredentials.googleapis.com. (This wasn't documented; I got error messages until I enabled it.)
  5. In my Omnibus config, I have:
external_url 'https://gitlab.example.com'
gitlab_rails['object_store']['enabled'] = true
gitlab_rails['object_store']['connection'] = {
    'provider' => 'Google',
    'google_project' => 'stan-redacted',
    'google_application_default' => true
}
gitlab_rails['object_store']['proxy_download'] = false

bucket = 'stanhu-test'
gitlab_rails['object_store']['objects']['artifacts']['bucket'] = "#{bucket}/artifacts"

gitlab_rails['object_store']['objects']['artifacts']['cdn'] = {
  'provider' => 'Google',
  'url' => 'https://stanhu-cdn.example.org',
  'key_name' => 'stanhu-key',
  'key' => '<REDACTED KEY>'
}

gitlab_rails['object_store']['objects']['external_diffs']['bucket'] = "#{bucket}/external_diffs"
gitlab_rails['object_store']['objects']['lfs']['bucket'] = "#{bucket}/lfs"
gitlab_rails['object_store']['objects']['uploads']['bucket'] = "#{bucket}/uploads"
gitlab_rails['object_store']['objects']['packages']['bucket'] = "#{bucket}/packages"
gitlab_rails['object_store']['objects']['dependency_proxy']['bucket'] = "#{bucket}/dependency_proxy"
gitlab_rails['object_store']['objects']['terraform_state']['bucket'] = "#{bucket}/terraform_state"
gitlab_rails['object_store']['objects']['ci_secure_files']['bucket'] = "#{bucket}/ci_secure_files"
  6. Apply this patch:
diff --git a/app/uploaders/job_artifact_uploader.rb b/app/uploaders/job_artifact_uploader.rb
index 83dc1030606..b38e7d93eac 100644
--- a/app/uploaders/job_artifact_uploader.rb
+++ b/app/uploaders/job_artifact_uploader.rb
@@ -3,6 +3,7 @@
 class JobArtifactUploader < GitlabUploader
   extend Workhorse::UploadPath
   include ObjectStorage::Concern
+  include ObjectStorage::CDN::Concern
 
   UnknownFileLocationError = Class.new(StandardError)
 
diff --git a/config/object_store_settings.rb b/config/object_store_settings.rb
index 3280bc284ad..ca26551f27e 100644
--- a/config/object_store_settings.rb
+++ b/config/object_store_settings.rb
@@ -3,7 +3,7 @@
 # Set default values for object_store settings
 class ObjectStoreSettings
   SUPPORTED_TYPES = %w(artifacts external_diffs lfs uploads packages dependency_proxy terraform_state pages secure_files).freeze
-  ALLOWED_OBJECT_STORE_OVERRIDES = %w(bucket enabled proxy_download).freeze
+  ALLOWED_OBJECT_STORE_OVERRIDES = %w(bucket enabled proxy_download cdn).freeze
 
   # To ensure the one Workhorse credential matches the Rails config, we
   # enforce consolidated settings on those accelerated
diff --git a/lib/api/helpers.rb b/lib/api/helpers.rb
index e29d76a5950..8d9d84d6fcf 100644
--- a/lib/api/helpers.rb
+++ b/lib/api/helpers.rb
@@ -612,11 +612,20 @@ def present_carrierwave_file!(file, supports_direct_download: true)
       return not_found! unless file&.exists?
 
       if file.file_storage?
-        present_disk_file!(file.path, file.filename)
-      elsif supports_direct_download && file.class.direct_download_enabled?
-        redirect(file.url)
+        return present_disk_file!(file.path, file.filename)
+      end
+
+      url =
+        if file.use_cdn?(ip_address)
+          file.cdn_signed_url
+        else
+          file.url
+        end
+
+      if supports_direct_download && file.class.direct_download_enabled?
+        redirect(url)
       else
-        header(*Gitlab::Workhorse.send_url(file.url))
+        header(*Gitlab::Workhorse.send_url(url))
         status :ok
         body '' # to avoid an error from API::APIGuard::ResponseCoercerMiddleware
       end
  7. Run a CI job with multiple stages, or use the artifacts API (e.g. https://gitlab.example.com/api/v4/projects/2/jobs/4/artifacts) to download the file. Notice that the download URL now points to the CDN host (stanhu-cdn.example.org).
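The patch adds `cdn` to `ALLOWED_OBJECT_STORE_OVERRIDES` because, in the consolidated form, per-type keys outside that list are presumably not carried into the effective settings, so the `['artifacts']['cdn']` block in the Omnibus config would otherwise be ignored. A simplified, hypothetical model of that filtering (not the actual `ObjectStoreSettings` code; `effective_settings` and its `allowed:` parameter are illustrative):

```ruby
# Simplified model: per-type overrides are filtered through an allowlist
# and merged over the common object_store settings.
ALLOWED_OBJECT_STORE_OVERRIDES = %w[bucket enabled proxy_download cdn].freeze

def effective_settings(common, type_overrides, allowed: ALLOWED_OBJECT_STORE_OVERRIDES)
  common.merge(type_overrides.slice(*allowed))
end

common = { "enabled" => true, "proxy_download" => false }
artifacts = {
  "bucket" => "stanhu-test/artifacts",
  "cdn" => { "provider" => "Google", "url" => "https://stanhu-cdn.example.org" },
  "connection" => { "provider" => "Google" } # not allowlisted: dropped
}

p effective_settings(common, artifacts).key?("cdn") # => true
p effective_settings(common, artifacts, allowed: %w[bucket enabled proxy_download]).key?("cdn") # => false
```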

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
