
Disable Content-Type sniffing when using Go Cloud to upload

What does this MR do and why?

It turned out that the gocloud client used to upload objects to GCP and Azure determines the content type of an uploaded object by sniffing its first bytes. For the Dependency Proxy this is incorrect and results in a wrong Content-Type on the uploaded objects.

To prevent the gocloud client from setting the Content-Type for us, we can set it ourselves.

This MR pipes the Content-Type header from Rails to Workhorse for direct uploads to Google Cloud Storage through gocloud when the workhorse_google_client feature flag is enabled. The gocloud client then picks up the Content-Type header from the request.

How to set up and validate locally

  1. Create a Google Cloud Storage bucket named dependency-proxy and obtain its credentials (JSON key) file.

  2. Update your GDK gdk.yml to:

    object_store:
      enabled: true
      proxy_download: false
      direct_upload: true
      connection:
        provider: Google
        google_project: <google project id>
        google_json_key_location: <credentials key location>
      consolidated_form: true

    and also enable the Container Registry:

    registry:
      enabled: true
  3. Update the Workhorse config to:

    [object_storage]
      provider = "Google"
    
    [object_storage.google]
      google_json_key_location = "<credentials key location>"
  4. In the Rails console, enable the feature flag:

    Feature.enable(:workhorse_google_client)
  5. Prepare a group with the Dependency Proxy enabled.

  6. Copy the Dependency Proxy image prefix from:

    http://gdk.test:3000/groups/<path to the group>/-/dependency_proxy

  7. Pull a new image using the Dependency Proxy image prefix (first remove any local alpine images with the same prefix, if they exist):

    docker pull <dependency proxy image prefix>/alpine
  8. Verify the Content-Type property of the recently uploaded objects in Google Cloud Storage.

    The Content-Type of blobs should now be application/octet-stream.
    For manifests it should be either application/vnd.docker.distribution.manifest.list.v2+json or application/vnd.docker.distribution.manifest.v2+json, depending on the manifest type.
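When checking the uploaded objects, the expected manifest Content-Type can be read from the manifest itself, since Docker image manifest v2 and manifest list documents carry a top-level mediaType field. The helper below, expectedContentType, is a hypothetical verification aid and not part of this MR: it returns application/octet-stream for blobs and the declared mediaType for manifests, rejecting anything other than the two manifest types listed above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	blobType       = "application/octet-stream"
	manifestV2     = "application/vnd.docker.distribution.manifest.v2+json"
	manifestListV2 = "application/vnd.docker.distribution.manifest.list.v2+json"
)

// expectedContentType returns the Content-Type an uploaded object should
// carry: blobs are always application/octet-stream, manifests use the
// mediaType declared in their JSON body.
func expectedContentType(isManifest bool, body []byte) (string, error) {
	if !isManifest {
		return blobType, nil
	}
	var m struct {
		MediaType string `json:"mediaType"`
	}
	if err := json.Unmarshal(body, &m); err != nil {
		return "", fmt.Errorf("decode manifest: %w", err)
	}
	switch m.MediaType {
	case manifestV2, manifestListV2:
		return m.MediaType, nil
	default:
		return "", fmt.Errorf("unexpected manifest mediaType %q", m.MediaType)
	}
}

func main() {
	list := []byte(`{"schemaVersion": 2, "mediaType": "` + manifestListV2 + `", "manifests": []}`)
	ct, err := expectedContentType(true, list)
	fmt.Println(ct, err) // application/vnd.docker.distribution.manifest.list.v2+json <nil>
}
```

The result can then be compared with the Content-Type property shown for the object in the storage console.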

Azure Blob Storage validation (optional)

Additionally, I validated the existing upload to Azure Blob Storage, since that path is not behind a feature flag and the changes in this MR take effect there immediately.

  1. Configure Azure as the object storage provider.

  2. Pull a new image using the Dependency Proxy image prefix (first remove any local alpine images with the same prefix, if they exist):

    docker pull <dependency proxy image prefix>/alpine
  3. Verify the Content-Type property of the recently uploaded objects in Azure Storage.

    The Content-Type of blobs should now be application/octet-stream.
    For manifests it should be either application/vnd.docker.distribution.manifest.list.v2+json or application/vnd.docker.distribution.manifest.v2+json, depending on the manifest type.

Screenshot from my test:

[Screenshot: Screenshot_2023-12-11_at_18.08.04]

The Content-Type of the manifest file is set to application/vnd.docker.distribution.manifest.v2+json.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #422377 (closed)

Edited by Dzmitry (Dima) Meshcharakou
