
Add support for the workhorse google client

David Fernandez requested to merge 10io-workhorse-google-client into master

🃏 Context

During direct uploads, there is a step where Workhorse asks Rails (the /authorize request): "Hey, I want to upload this file. Tell me where to upload it." Rails replies: "Sure, upload it here."

In the initial version, the "here" was a presigned URL: a URL that is presigned for the given object storage, against which Workhorse can simply issue a standard PUT request to upload the file.

The consolidated configuration of object storage allows us to go a step further:

  • Workhorse builds the configured object storage client on its side.
  • When Rails replies, the "here" is basically a bucket name + key. No more presigned URL.

By using native clients, uploads become more reliable.

The above is used for Azure Blob Storage.

While checking how to add encryption for Google Cloud Storage, I noticed that we are not using the workhorse client for the Google provider: we still use the presigned URL.

This is not ideal: the client used for Azure Blob Storage is a "general" library that also supports GCS. So why not use it?
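That "general" library is the go-cloud blob package (gocloud.dev/blob), which addresses buckets by URL scheme and dispatches to a provider-specific driver. A minimal sketch of why one client can serve several providers (the helper function is illustrative, not Workhorse's code; the scheme names follow the gocloud.dev drivers):

```go
package main

import "fmt"

// bucketURL is a hypothetical helper: go-cloud picks the driver from the URL
// scheme, so supporting another provider is mostly a matter of emitting the
// right URL and importing the corresponding driver package.
func bucketURL(provider, bucket string) (string, error) {
	schemes := map[string]string{
		"AzureRM": "azblob", // Azure Blob Storage driver
		"Google":  "gs",     // Google Cloud Storage driver
		"AWS":     "s3",     // Amazon S3 driver
	}
	scheme, ok := schemes[provider]
	if !ok {
		return "", fmt.Errorf("unsupported provider: %s", provider)
	}
	return fmt.Sprintf("%s://%s", scheme, bucket), nil
}

func main() {
	u, _ := bucketURL("Google", "my-bucket")
	fmt.Println(u) // gs://my-bucket
}
```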

This MR aims to support the workhorse google client.

The workhorse Google client can also bring more reliable uploads. See this, from the Go client documentation:

By default, resumable uploads occur automatically when the file is larger than 16 MiB. You change the cutoff for performing resumable uploads with Writer.ChunkSize. Resumable uploads are always chunked when using the Go client library.

The important word here is automatically: the underlying Go client does the heavy lifting for us.
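The actual knob is Writer.ChunkSize on the Go storage client's writer; the snippet below is only back-of-the-envelope arithmetic illustrating the 16 MiB cutoff and the chunking behavior described in the quote, not the client's real code:

```go
package main

import "fmt"

const defaultChunkSize = 16 << 20 // 16 MiB, the documented default cutoff

// chunks is an illustrative helper: at or below the cutoff the upload is a
// single request; above it, the client switches to a resumable upload and
// sends the file in chunkSize pieces.
func chunks(fileSize, chunkSize int64) int64 {
	if fileSize <= chunkSize {
		return 1
	}
	return (fileSize + chunkSize - 1) / chunkSize // ceiling division
}

func main() {
	fmt.Println(chunks(8<<20, defaultChunkSize))  // 1: small file, single request
	fmt.Println(chunks(40<<20, defaultChunkSize)) // 3: resumable, chunked upload
}
```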

This is issue #372593 (closed).

🔬 What does this MR do and why?

  • workhorse
    • Support google as an object storage provider configuration.
      • Load and configure the proper URL opener when that configuration is read.
      • Support the google_json_key_location, google_json_key_string and google_application_default parameters from the GitLab config.
        • Credentials are checked in this order: google_application_default, google_json_key_string and google_json_key_location.
    • Create the background context earlier so that it's available when loading the configuration.
    • Add/Update the related tests.
  • rails
    • Update Rails so that the /authorize response indicates whether the workhorse client should be used or not.
    • Add/Update the related specs.
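The credential precedence above can be sketched as a simple first-match selection. The parameter names come from the GitLab config; the struct and function themselves are illustrative, not Workhorse's implementation:

```go
package main

import "fmt"

// googleConfig mirrors the three supported credential parameters.
type googleConfig struct {
	GoogleApplicationDefault bool   // google_application_default
	GoogleJSONKeyString      string // google_json_key_string
	GoogleJSONKeyLocation    string // google_json_key_location
}

// credentialSource sketches the check order: google_application_default,
// then google_json_key_string, then google_json_key_location.
func credentialSource(c googleConfig) (string, error) {
	switch {
	case c.GoogleApplicationDefault:
		return "application default credentials", nil
	case c.GoogleJSONKeyString != "":
		return "inline JSON key", nil
	case c.GoogleJSONKeyLocation != "":
		return "JSON key file at " + c.GoogleJSONKeyLocation, nil
	}
	return "", fmt.Errorf("no Google credentials configured")
}

func main() {
	// When both a key string and a key location are set, the string wins.
	src, _ := credentialSource(googleConfig{GoogleJSONKeyString: "{...}", GoogleJSONKeyLocation: "/path/key.json"})
	fmt.Println(src) // inline JSON key
}
```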

This change is behind a feature flag: workhorse_google_client.

We will probably need an update in https://gitlab.com/gitlab-org/omnibus-gitlab so that the Google object storage configuration from GitLab Rails is "translated" into the workhorse configuration.

🖥 Screenshots or screen recordings

Here are the uploads I tested. I chose a list of uploads where direct upload is used for some and not for others.

Each of the following was tested with the feature flag disabled and with it enabled (screen recordings omitted here):

  • nuget package
  • maven package
  • generic package
  • npm package
  • graphql
  • CI artifact
  • user avatar
  • git LFS

The change looks stable.

How to set up and validate locally

  1. Have GDK ready with object storage support.
  2. Enable consolidated configuration.
  3. Create a Google Cloud Storage Bucket and get the credentials file.
  4. Update the GitLab config to:
    object_store:
      enabled: true
      proxy_download: false
      direct_upload: true
      remote_directory: <bucket name>
      connection:
        provider: Google
        google_project: <project name>
        google_client_email: <client email>
        google_json_key_location: <credentials key location>
      objects: {"artifacts":{"bucket":"artifacts"},"external_diffs":{"bucket":"external-diffs"},"lfs":{"bucket":"lfs-objects"},"uploads":{"bucket":"uploads"},"packages":{"bucket":"<bucket name>"},"dependency_proxy":{"bucket":"dependency-proxy"},"terraform_state":{"bucket":"terraform"},"pages":{"bucket":"pages"}}
  5. Update the Workhorse config to:
    [object_storage]
      provider = "Google"
    
    [object_storage.google]
      google_json_key_location = "<credentials key location>"

We're now ready to play! We're going to use the Generic Package Registry for the test, as we can upload files there with simple curl commands.

  1. Create a project.

  2. Create a dummy.txt file with whatever content you want.

  3. Upload it:

    curl --header "PRIVATE-TOKEN: <pat>" --upload-file ./dummy.txt "http://gdk.test:8000/api/v4/projects/<project_id>/packages/generic/my_awesome_package/1.3.7/ananas.txt"
  4. Check the workhorse logs:

    {"client_mode":"presigned_put","copied_bytes":8,"correlation_id":"01GBZE29AK2EN4WFK2RSA36S51","filename":"upload","is_local":false,"is_multipart":false,"is_remote":true,"level":"info","msg":"saved file","remote_id":"1662133548-89391-0001-1404-0e0c60b088070249d1ab665f10bb5864","remote_temp_object":"","time":"2022-09-02T17:45:49+02:00"}
    • Check the client_mode: presigned_put 😿
  5. Now, let's enable the feature flag:

    Feature.enable(:workhorse_google_client)
  6. Re-upload the same file (that's fine, duplicate uploads are allowed in the Generic Package Registry):

    curl --header "PRIVATE-TOKEN: <pat>" --upload-file ./dummy.txt "http://gdk.test:8000/api/v4/projects/<project_id>/packages/generic/my_awesome_package/1.3.7/ananas.txt"
  7. Check the workhorse logs:

    {"client_mode":"go_cloud:Google","copied_bytes":8,"correlation_id":"01GBZE573K7Z18EK801XV52Y41","filename":"upload","is_local":false,"is_multipart":false,"is_remote":true,"level":"info","msg":"saved file","remote_id":"1662133644-89390-0001-0079-24772d1d8c49573933c5d7a673f113b6","remote_temp_object":"tmp/uploads/1662133644-89390-0001-0079-24772d1d8c49573933c5d7a673f113b6","time":"2022-09-02T17:47:25+02:00"}
    • Check the client_mode: go_cloud:Google, woot! That means the workhorse client for Google has been used! 🎉
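If you want to check those log lines programmatically rather than by eye, a quick sketch (assuming only the JSON log format shown above; the helper is hypothetical):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// workhorseLog captures just the fields we care about from the upload log line.
type workhorseLog struct {
	ClientMode string `json:"client_mode"`
	Msg        string `json:"msg"`
}

// clientMode extracts client_mode from a single JSON log line.
func clientMode(line string) (string, error) {
	var entry workhorseLog
	if err := json.Unmarshal([]byte(line), &entry); err != nil {
		return "", err
	}
	return entry.ClientMode, nil
}

func main() {
	line := `{"client_mode":"go_cloud:Google","msg":"saved file"}`
	mode, _ := clientMode(line)
	fmt.Println(mode) // go_cloud:Google
}
```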

🚥 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
