
Add support for the workhorse google storage client configuration

David Fernandez requested to merge 10io-workhorse-google-native-client into master

Context

In gitlab!96891 (merged), workhorse was updated so that a google cloud storage client could be set up. This makes uploads more reliable and unblocks bucket encryption. See #7324 (closed).

This configuration should be used in workhorse only when:

  • A consolidated object storage configuration is used.
  • A Google provider is used.
  • One of these parameters is set:
    • google_application_default
    • google_json_key_string
    • google_json_key_location
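
As a rough illustration of the three conditions above, here is a minimal Ruby sketch. The helper name `workhorse_google_client?`, its keyword arguments, and the rule that a nil, false, or empty value counts as "not set" are assumptions made for this example; this is not the code from this MR.

```ruby
# Hypothetical helper, not the actual cookbook code: returns true when the
# three conditions listed above hold for a given connection hash.
GOOGLE_CREDENTIAL_KEYS = %w[
  google_application_default
  google_json_key_string
  google_json_key_location
].freeze

def workhorse_google_client?(consolidated:, connection:)
  # Condition 1: a consolidated object storage configuration is used.
  return false unless consolidated
  # Condition 2: the provider is Google.
  return false unless connection['provider'] == 'Google'

  # Condition 3: at least one credential parameter is set.
  # Assumption: nil, false and empty strings all count as "not set".
  GOOGLE_CREDENTIAL_KEYS.any? do |key|
    value = connection[key]
    !(value.nil? || value == false || value.to_s.empty?)
  end
end

connection = {
  'provider' => 'Google',
  'google_project' => '<project>',
  'google_application_default' => true
}
puts workhorse_google_client?(consolidated: true, connection: connection) # prints "true"
```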

Lastly, note that this part of workhorse is gated behind a feature flag in rails. Basically, rails will instruct workhorse to use either:

  • a presigned url (this is what is used today and what is used when the feature flag is disabled)
  • the workhorse google cloud storage client (used when the feature flag is enabled).

Since the feature flag is currently disabled by default, this MR will have no impact.

🔬 What does this MR do?

  • Map the object storage configuration from the rails config to the workhorse config file when the conditions are met.
  • Update the related specs.

How to validate this locally

🔧 Setup

  1. Set up an omnibus development environment as described in https://gitlab.com/gitlab-org/omnibus-gitlab/-/blob/master/doc/development/setup.md.
  2. Make sure to pull the changes of this MR branch as described in https://gitlab.com/gitlab-org/omnibus-gitlab/-/blob/master/doc/development/setup.md#get-the-source-of-omnibus-gitlab.
  3. Make sure that you have a Google Cloud Storage bucket ready with a service account and its related json file.
  4. Enable the related feature flag in a # gitlab-rails console:
    Feature.enable(:workhorse_google_client)
    • This step is important. If the flag is not enabled, the rails backend will not instruct workhorse to use its google client and will use a presigned url instead.

Now that we have an omnibus "instance" running, let's configure object storage.

  1. In /etc/gitlab/gitlab.rb:
    gitlab_rails['object_store']['enabled'] = true
    gitlab_rails['object_store']['proxy_download'] = true
    gitlab_rails['object_store']['connection'] = {
       <this is what we will update through our scenarios>
    }
    gitlab_rails['object_store']['objects']['artifacts']['bucket'] = '<bucket>'
    gitlab_rails['object_store']['objects']['artifacts']['proxy_download'] = false
    gitlab_rails['object_store']['objects']['external_diffs']['bucket'] = '<bucket>'
    gitlab_rails['object_store']['objects']['lfs']['bucket'] = '<bucket>'
    gitlab_rails['object_store']['objects']['uploads']['bucket'] = '<bucket>'
    gitlab_rails['object_store']['objects']['packages']['bucket'] = '<bucket>'
    gitlab_rails['object_store']['objects']['dependency_proxy']['enabled'] = false
    gitlab_rails['object_store']['objects']['dependency_proxy']['bucket'] = '<bucket>'
    gitlab_rails['object_store']['objects']['terraform_state']['bucket'] = '<bucket>'
    gitlab_rails['object_store']['objects']['pages']['bucket'] = '<bucket>'

The testing scenario

We are going to keep it nice and simple and use the generic package registry. Basically, we're going to upload a dummy file to the GitLab generic package registry and assert that workhorse used its google cloud storage client to upload that file to object storage.

  1. Have a project + personal access token ready.
  2. Execute (from outside the omnibus instance) $ curl --upload-file <dummy file> "http://<user>:<pat>@<base_url>/api/v4/projects/<project_id>/packages/generic/my/1.1.2/file.txt"
  3. Check the workhorse logs ($ tail -f /var/log/gitlab/gitlab-workhorse/current). They should contain a line similar to this one:
    {"client_mode":"go_cloud:Google","copied_bytes":8,"correlation_id":"01GJG2WCGK5TFARQSY6QM7DJSV","filename":"upload","is_local":false,"is_multipart":false,"is_remote":true,"level":"info","msg":"saved file","remote_id":"1669134693-23742-0001-4032-0dde1427ae53d9167356b065ff491342","remote_temp_object":"tmp/uploads/1669134693-23742-0001-4032-0dde1427ae53d9167356b065ff491342","time":"2022-11-22T16:31:34Z"}
    • The important part is client_mode. It MUST be set to go_cloud:Google. This is workhorse saying that it is using its own google cloud storage client to upload the file, which is what we want.
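
The log check in step 3 can also be scripted. The following is a minimal Ruby sketch that parses one log line and tests the two fields that matter here. The helper name `google_client_upload?` is made up for this example, and the sample line is a shortened copy of the one shown above. Lines that are not JSON objects are simply treated as non-matching.

```ruby
require 'json'

# Hypothetical helper: true when a workhorse log line reports a "saved file"
# event that was handled by the google cloud storage client.
def google_client_upload?(log_line)
  entry = JSON.parse(log_line)
  entry.is_a?(Hash) &&
    entry['msg'] == 'saved file' &&
    entry['client_mode'] == 'go_cloud:Google'
rescue JSON::ParserError
  # Not a JSON line: treat it as "no match" instead of failing.
  false
end

# Shortened version of the log line shown above.
sample = '{"client_mode":"go_cloud:Google","copied_bytes":8,"is_remote":true,"level":"info","msg":"saved file"}'
puts google_client_upload?(sample) # prints "true"
```

To scan a whole log file, something like `File.foreach('/var/log/gitlab/gitlab-workhorse/current').any? { |line| google_client_upload?(line) }` should do, assuming the process is allowed to read that file.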

1️⃣ With google_application_default

This configuration is the trickiest of the three: in this mode, the credentials are not passed explicitly, and the google libraries look them up in a set of default locations.

Fortunately, one of these locations is an environment variable. As such, we can configure it and point to the json file.

  • Put the json file somewhere reachable:
    # nano /etc/gitlab/object_storage.json
  • Update the /etc/gitlab/gitlab.rb file with this line:
gitlab_rails['object_store']['connection'] = {
   'provider' => 'Google',
   'google_project' => 'dfernandez-5494dd2c',
   'google_application_default' => true
}
  • Now, update the /etc/gitlab/gitlab.rb file to set environment variables. We have to do this for both the rails and workhorse services:
gitlab_rails['env'] = {
    'GOOGLE_APPLICATION_CREDENTIALS' => '/etc/gitlab/object_storage.json'
}
gitlab_workhorse['env'] = {
    'GOOGLE_APPLICATION_CREDENTIALS' => '/etc/gitlab/object_storage.json'
}
  • Reconfigure with: # gitlab-ctl reconfigure. (# gitlab-ctl restart might be needed.)
  • Check the workhorse configuration with # less /var/opt/gitlab/gitlab-workhorse/config.toml. The google_application_default parameter should be set to true.
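
Since this mode depends on the environment variable pointing at a usable file, a quick pre-flight check can save a debugging round trip. Below is a minimal Ruby sketch. The helper name `credential_problems` is made up for this example, and the three fields it looks for (type, client_email, private_key) are simply fields commonly found in a service account json file, not a complete validation.

```ruby
require 'json'

# Hypothetical pre-flight check: returns a list of problems found with the
# file referenced by GOOGLE_APPLICATION_CREDENTIALS (empty list = looks fine).
def credential_problems(env = ENV)
  path = env['GOOGLE_APPLICATION_CREDENTIALS']
  return ['GOOGLE_APPLICATION_CREDENTIALS is not set'] if path.nil? || path.empty?
  return ["#{path} is not a readable file"] unless File.file?(path) && File.readable?(path)

  key = JSON.parse(File.read(path))
  return ["#{path} does not contain a JSON object"] unless key.is_a?(Hash)

  # Assumption: these fields are enough for a quick sanity check.
  missing = %w[type client_email private_key].reject { |field| key.key?(field) }
  missing.map { |field| "missing field: #{field}" }
rescue JSON::ParserError
  ["#{path} is not valid JSON"]
end

problems = credential_problems
puts problems.empty? ? 'credentials file looks usable' : problems
```

The check has to run with the same environment as the service being tested, otherwise it only reports that the variable is not set.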

Try the testing scenario. It should work.

2️⃣ With google_json_key_string

In this configuration, the parameter holds the entire json file contents.

  • Update the /etc/gitlab/gitlab.rb file with this line:
gitlab_rails['object_store']['connection'] = {
   'provider' => 'Google',
   'google_project' => 'dfernandez-5494dd2c',
   'google_json_key_string' => '
     <the exact contents of the json service account file>
   '
}
  • Reconfigure with: # gitlab-ctl reconfigure. (# gitlab-ctl restart might be needed.)
  • Check the workhorse configuration with # less /var/opt/gitlab/gitlab-workhorse/config.toml. The content of the json file should be there.
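
One detail worth knowing when pasting the json contents into gitlab.rb: inside a single-quoted Ruby string, sequences such as \n are kept as two literal characters, so the pasted text is still the same json document. The minimal Ruby sketch below illustrates this with a made-up, shortened stand-in for a service account file (not a real key).

```ruby
require 'json'

# Made-up, shortened stand-in for a service account file; not a real key.
key_string = '
  {
    "type": "service_account",
    "client_email": "example@example.iam.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\nabc\n-----END PRIVATE KEY-----\n"
  }
'

# The single-quoted Ruby string kept each \n as backslash + n, so the text
# is still valid JSON. The JSON parser then turns each one into a newline.
parsed = JSON.parse(key_string)
puts parsed['type']
puts parsed['private_key'].lines.length
```

A file whose contents include a single quote would need that quote escaped, which is one reason the google_json_key_location variant can be easier to handle.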

Try the testing scenario. It should work.

3️⃣ With google_json_key_location

In this configuration, the parameter points to the json file location path.

  • Put the json file somewhere reachable:
    # nano /etc/gitlab/object_storage.json
  • Update the /etc/gitlab/gitlab.rb file with this line:
gitlab_rails['object_store']['connection'] = {
   'provider' => 'Google',
   'google_project' => 'dfernandez-5494dd2c',
   'google_json_key_location' => '/etc/gitlab/object_storage.json'
}
  • Reconfigure with: # gitlab-ctl reconfigure. (# gitlab-ctl restart might be needed.)
  • Check the workhorse configuration with # less /var/opt/gitlab/gitlab-workhorse/config.toml. The path to the json file should be there.

Try the testing scenario. It should work.

Related issues

#7324 (closed)

Checklist

See Definition of done.

For anything in this list which will not be completed, please provide a reason in the MR discussion

Required

  • Merge Request Title, and Description are up to date, accurate, and descriptive
  • MR targeting the appropriate branch
  • MR has a green pipeline on GitLab.com
  • Pipeline is green on dev.gitlab.org if the change is touching anything besides documentation or internal cookbooks
  • trigger-package has a green pipeline running against latest commit

Expected (please provide an explanation if not completing)

Edited by Jason Young
