Add support for the workhorse GCS client

David Fernandez requested to merge 4009-workhorse-gcs-client-support into master

🏀 Context

In gitlab-org/gitlab!96891 (merged), workhorse was updated so that a google cloud storage client could be set up. This makes uploads more reliable and unblocks bucket encryption. See #4009 (closed).

This configuration should be used in workhorse only when:

  • A consolidated object storage configuration is used.
  • A Google provider is used.
  • One of these parameters is set:
    • google_application_default
    • google_json_key_string
    • google_json_key_location
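
The three conditions above can be read as a single predicate. The Python sketch below is only an illustration: the function name and the dictionary layout (a `consolidated` flag plus a `connection` mapping) are assumptions made for this example, not the chart's actual template logic.

```python
# Hypothetical helper: mirrors the three conditions listed above.
GOOGLE_CREDENTIAL_KEYS = (
    "google_application_default",
    "google_json_key_string",
    "google_json_key_location",
)


def should_configure_gcs_client(object_store: dict) -> bool:
    """Return True when workhorse should get a Google object storage config."""
    # Condition 1: a consolidated object storage configuration is used.
    if not object_store.get("consolidated", False):
        return False
    connection = object_store.get("connection", {})
    # Condition 2: a Google provider is used.
    if connection.get("provider") != "Google":
        return False
    # Condition 3: at least one of the credential parameters is set.
    return any(connection.get(key) for key in GOOGLE_CREDENTIAL_KEYS)
```

If any of the three checks fails, the sketch returns False, which corresponds to leaving the workhorse configuration without a Google section.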

Lastly, note that this part of workhorse is gated behind a feature flag in rails. Basically, rails will instruct workhorse to use either:

  • a presigned url (this is what is used today and what is used when the feature flag is disabled)
  • the workhorse google cloud storage client (used when the feature flag is enabled).

Since the feature flag is currently disabled by default, this MR will have no impact on uploads.

🔬 What does this MR do?

  • Update the workhorse.object_storage.config template so that if the proper conditions are detected, it will generate the correct workhorse configuration file for google cloud storage.
  • Update a related spec.

Related issues

#4009 (closed)

This is the mirror change of this omnibus change: gitlab-org/omnibus-gitlab!6530 (merged)

🤔 How to validate this locally?

As we can see here, we have 3 different settings.

Now, we don't need all 3. It's actually the opposite: only one of them is needed. We thus have 3 configurations to test here.

We're going to need:

  • a k8s cluster ready.
  • a GCS bucket.
  • a google service account that can write to that bucket.
  • a google key associated with that service account, in the json format.

To have a look in logs, we use kail.

As we will see, only one of the parameters can be used without updating deployment files (use case 2️⃣) but for completeness here, we go through all 3 possible parameters.

The testing scenario

We are going to keep it nice and simple and use the generic package registry. Basically, we're going to upload a dummy file to the GitLab generic package registry and assert that workhorse used its google cloud storage client to upload that file to object storage.

  1. Have a project + personal access token ready.

  2. Execute (from outside the cluster)

    $ curl --upload-file <dummy file> "http://<user>:<pat>@<base_url>/api/v4/projects/<project_id>/packages/generic/my/1.1.2/file.txt"
  3. Check the workhorse logs ($ kail -c gitlab-workhorse); they should contain a line similar to this one:

    default/gitlab-webservice-default-6d9bbc8864-k25zv[gitlab-workhorse]: {"client_mode":"presigned_put","copied_bytes":8,"correlation_id":"01GWVPHHNP1HN3MV5RVV13FS1S","filename":"upload","is_local":false,"is_multipart":false,"is_remote":true,"level":"info","msg":"saved file","remote_id":"1680261826-132-0003-4213-603ba2ba17befa66fde116f5253fcb9e","remote_temp_object":"","time":"2023-03-31T11:23:47Z"}
    • This is the proof that the upload was successful. Please note the client_mode: we are not yet using the workhorse gcs client that this MR enables. That's because this decision is made by rails and it's currently behind a feature flag that is disabled by default.
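
To avoid reading the log line by eye, the check can be scripted. The Python sketch below assumes the kail output format shown above (a pod/container prefix followed by a JSON payload); the function name is hypothetical and not part of workhorse or kail.

```python
import json


def client_mode_from_log_line(line: str):
    """Return the client_mode of a workhorse "saved file" entry, else None."""
    # kail prefixes each line with "<namespace>/<pod>[<container>]: ";
    # the JSON payload starts at the first "{".
    start = line.find("{")
    if start == -1:
        return None
    try:
        entry = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    # Only "saved file" entries describe a finished upload.
    if entry.get("msg") != "saved file":
        return None
    return entry.get("client_mode")
```

Fed the log line above, this returns presigned_put, which is the expected value while the feature flag is disabled.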

Another way to confirm that the scenario went well is to download the file:

$ curl "http://<user>:<pat>@<base_url>/api/v4/projects/<project_id>/packages/generic/my/1.1.2/file.txt"

You should get the file contents back.
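
Both curl commands target the same URL, so it can be built once and reused. The helper below is a small Python sketch of that URL layout; its name and parameters are assumptions for illustration, and credentials are expected to be part of `base_url` as in the curl examples.

```python
from urllib.parse import quote


def generic_package_url(base_url: str, project_id: int, name: str,
                        version: str, file_name: str) -> str:
    """Build the generic package registry URL used for upload and download."""
    # Percent-encode each path segment so unusual characters stay in place.
    segments = [quote(part, safe="") for part in (name, version, file_name)]
    return (
        f"{base_url.rstrip('/')}/api/v4/projects/{project_id}"
        f"/packages/generic/{'/'.join(segments)}"
    )
```

For the example used here, the arguments would be the package name my, the version 1.1.2 and the file name file.txt.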

🐰 Going further

So you want to use the workhorse gcs client? Fine, let's enable the feature flag:

$ kubectl exec -it <gitlab-webservice pod name> -c webservice -- /bin/bash

(in the container) $ cd /srv/gitlab/

$ ./bin/rails c

irb(main):001:0> Feature.enable(:workhorse_google_client)
irb(main):002:0> exit

$ exit

Try to upload the file with curl again.

This time around, workhorse logs will show this:

default/gitlab-webservice-default-6d9bbc8864-k25zv[gitlab-workhorse]: {"client_mode":"go_cloud:Google","copied_bytes":8,"correlation_id":"01GWVQCNH2MZ5YKR43WD29TBHW","filename":"upload","is_local":false,"is_multipart":false,"is_remote":true,"level":"info","msg":"saved file","remote_id":"1680262715-199-0001-4337-4e293e8145637540b9eb6b965d95ef30","remote_temp_object":"tmp/uploads/1680262715-199-0001-4337-4e293e8145637540b9eb6b965d95ef30","time":"2023-03-31T11:38:36Z"}

Notice the client_mode. It's go_cloud:Google. That means that workhorse used its own GCS client to upload the file 🎉

If you still have doubts, you can always check the bucket on GCS. Your file will be there 😸

Setting 1️⃣ google_application_default

This configuration is the trickiest one to test because, in this mode, the google libraries look for credentials in a set of default locations.

Fortunately, one of these locations is an environment variable. As such, we can configure it and point to the json file.

To keep this simple, we're going to create a k8s secret that holds the contents of the google json key file, mount that secret as a file, then point to that file with the GOOGLE_APPLICATION_CREDENTIALS environment variable.
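
A quick pre-flight check can confirm that the variable really points at a readable JSON file. The Python sketch below is an assumption-laden illustration (the function name is hypothetical, and it only checks that the file parses as a JSON object, not that Google accepts the key); it can be run wherever Python is available, for example against a local copy of the key.

```python
import json
import os


def check_application_credentials(environ=os.environ) -> str:
    """Return the key file path if the variable points at a JSON file."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    # open() raises OSError if the file is missing or unreadable, and
    # json.load() raises json.JSONDecodeError if it is not valid JSON.
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)
    if not isinstance(key, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return path
```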

  1. Update charts/gitlab/charts/webservice/templates/deployment.yaml with this:

    Diff
    diff --git a/charts/gitlab/charts/webservice/templates/deployment.yaml b/charts/gitlab/charts/webservice/templates/deployment.yaml
    index 95111a72a..58017fd20 100644
    --- a/charts/gitlab/charts/webservice/templates/deployment.yaml
    +++ b/charts/gitlab/charts/webservice/templates/deployment.yaml
    @@ -203,6 +203,8 @@ spec:
                   value: '/var/opt/gitlab/templates'
                 - name: CONFIG_DIRECTORY
                   value: '/srv/gitlab/config'
    +            - name: GOOGLE_APPLICATION_CREDENTIALS
    +              value: '/etc/secret-volume/key'
                 {{- if $.Values.metrics.enabled }}
                 - name: prometheus_multiproc_dir
                   value: /metrics
    @@ -262,6 +264,9 @@ spec:
                 - name: webservice-secrets
                   mountPath: '/etc/gitlab'
                   readOnly: true
    +            - name: secret-volume
    +              mountPath: /etc/secret-volume
    +              readOnly: true
                 - name: webservice-secrets
                   mountPath: /srv/gitlab/config/secrets.yml
                   subPath: rails-secrets/secrets.yml
    @@ -359,6 +364,8 @@ spec:
                   value: '/var/opt/gitlab/templates'
                 - name: CONFIG_DIRECTORY
                   value: '/srv/gitlab/config'
    +            - name: GOOGLE_APPLICATION_CREDENTIALS
    +              value: '/etc/secret-volume/key'
                 {{- if .workhorse.sentryDSN }}
                 - name: GITLAB_WORKHORSE_SENTRY_DSN
                   value: {{ .workhorse.sentryDSN }}
    @@ -372,6 +379,9 @@ spec:
                 - name: workhorse-secrets
                   mountPath: '/etc/gitlab'
                   readOnly: true
    +            - name: secret-volume
    +              mountPath: /etc/secret-volume
    +              readOnly: true
                 - name: shared-upload-directory
                   mountPath: /srv/gitlab/public/uploads/tmp
                   readOnly: false
    @@ -429,6 +439,9 @@ spec:
           - name: workhorse-config
             configMap:
                 name: {{ $.Release.Name }}-workhorse-{{ .name }}
    +      - name: secret-volume
    +        secret:
    +          secretName: google-key-json
           - name: init-webservice-secrets
             projected:
               defaultMode: 0400
  2. Let's create a rails.gcs.yaml:

    provider: Google
    google_project: <google project id>
    google_application_default: true
  3. Let's create the object storage secret:

    $ kubectl create secret generic gitlab-object-storage --from-file=connection=rails.gcs.yaml
  4. Let's create a secret with the google key json file:

    $ kubectl create secret generic google-key-json --from-file=key=<full path to google key json file>
  5. Lastly, let's create an additional values.yml file to read that object storage secret (and also disable minio):

    global:
      minio:
        enabled: false
      registry:
        bucket: <bucket name>
      appConfig:
        object_store:
          enabled: true
          connection:
            secret: gitlab-object-storage
            key: connection
        lfs:
          bucket: <bucket name>
        artifacts:
          bucket: <bucket name>
        uploads:
          bucket: <bucket name>
        packages:
          bucket: <bucket name>
        backups:
          bucket: <bucket name>
  6. Let's deploy the gitlab chart with the additional file (we use the "minikube minimum" base):

    $ helm upgrade --install gitlab . --timeout 600s -f ./examples/values-minikube-minimum.yaml -f values.yml 

Checking the workhorse logs ($ kail -c gitlab-workhorse):

default/gitlab-webservice-default-675c6cddc5-9d46l[gitlab-workhorse]: {"address":"0.0.0.0:8181","level":"info","msg":"Running upstream server","network":"tcp","time":"2023-03-30T13:03:54Z"}
default/gitlab-webservice-default-675c6cddc5-9d46l[gitlab-workhorse]: {"address":"/tmp/gitlab/workhorse.sock","level":"info","msg":"Running upstream server","network":"unix","time":"2023-03-30T13:03:54Z"}

Workhorse booted normally

Let's check its config:

$ kubectl exec -it <gitlab-webservice pod name> -c gitlab-workhorse -- /bin/bash 

(inside the gitlab-workhorse container) $ cat /srv/gitlab/config/workhorse-config.toml 

We get this config content:

shutdown_timeout = "61s"
[redis]
URL = "redis://gitlab-redis-master.default.svc:6379"
Password = "xxx"
[object_storage]
provider = "Google"
# Google storage configuration.
[object_storage.google]
google_application_default = true
[image_resizer]
max_scaler_procs = 2
max_filesize = 250000
[[listeners]]
network = "tcp"
addr = "0.0.0.0:8181"

The object_storage and object_storage.google sections are properly configured

The testing scenario is working with this config

Setting 2️⃣ google_json_key_string

Alright, this is the easiest configuration to test because it's the one in the charts example file.

Basically, we pass the contents of the google key file.

With a k8s cluster ready (and empty):

  1. Create a rails.gcs.yaml file with:
    provider: Google
    google_project: <google project id>
    google_json_key_string: |
      <exact contents of the json key file>
  2. Create a k8s secret out of that file:
    $ kubectl create secret generic gitlab-object-storage --from-file=connection=rails.gcs.yaml
  3. Create an additional values.yml file to read that secret (and also disable minio):
    global:
      minio:
        enabled: false
      registry:
        bucket: <bucket name>
      appConfig:
        object_store:
          enabled: true
          connection:
            secret: gitlab-object-storage
            key: connection
        lfs:
          bucket: <bucket name>
        artifacts:
          bucket: <bucket name>
        uploads:
          bucket: <bucket name>
        packages:
          bucket: <bucket name>
        backups:
          bucket: <bucket name>
  4. Let's deploy the gitlab chart with the additional file (we use the "minikube minimum" base):
    $ helm upgrade --install gitlab . --timeout 600s -f ./examples/values-minikube-minimum.yaml -f values.yml 
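
In step 1 above, the key contents sit under a YAML block scalar (the `|`), so every line of the JSON key has to be indented deeper than google_json_key_string. The Python sketch below shows one way to produce the file contents; the function name is hypothetical and the two-space indent is an assumption that matches the example layout.

```python
import textwrap


def render_connection_yaml(project_id: str, key_json: str) -> str:
    """Render the rails.gcs.yaml content with the key as a block scalar."""
    # Every line of the key must be indented deeper than the
    # google_json_key_string key to be treated as part of the "|" block.
    indented_key = textwrap.indent(key_json.strip("\n"), "  ")
    return (
        "provider: Google\n"
        f"google_project: {project_id}\n"
        "google_json_key_string: |\n"
        f"{indented_key}\n"
    )
```

The returned string can be written to rails.gcs.yaml before creating the k8s secret in step 2.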

Checking the workhorse logs ($ kail -c gitlab-workhorse):

default/gitlab-webservice-default-745f57c88d-9ck7c[gitlab-workhorse]: {"address":"0.0.0.0:8181","level":"info","msg":"Running upstream server","network":"tcp","time":"2023-03-30T11:59:48Z"}
default/gitlab-webservice-default-745f57c88d-9ck7c[gitlab-workhorse]: {"address":"/tmp/gitlab/workhorse.sock","level":"info","msg":"Running upstream server","network":"unix","time":"2023-03-30T11:59:48Z"}

Workhorse was able to boot normally 👍

Let's check its config:

$ kubectl exec -it <gitlab-webservice pod name> -c gitlab-workhorse -- /bin/bash 

(inside the gitlab-workhorse container) $ cat /srv/gitlab/config/workhorse-config.toml 

We get this config content:

shutdown_timeout = "61s"
[redis]
URL = "redis://gitlab-redis-master.default.svc:6379"
Password = "xxx"
[object_storage]
provider = "Google"
# Google storage configuration.
[object_storage.google]
google_json_key_string = '''
<exact google key json file contents>
'''
[image_resizer]
max_scaler_procs = 2
max_filesize = 250000
[[listeners]]
network = "tcp"
addr = "0.0.0.0:8181"

That's the expected config for object_storage and object_storage.google.

The testing scenario is working with this config

Setting 3️⃣ google_json_key_location

This time around this value needs to point to the location of the google key json file.

For this, we're going to use the same approach as for 1️⃣ but instead of having an environment variable, we directly point to the expected file location.

  1. Update charts/gitlab/charts/webservice/templates/deployment.yaml with this:

    Diff
    diff --git a/charts/gitlab/charts/webservice/templates/deployment.yaml b/charts/gitlab/charts/webservice/templates/deployment.yaml
    index 95111a72a..58017fd20 100644
    --- a/charts/gitlab/charts/webservice/templates/deployment.yaml
    +++ b/charts/gitlab/charts/webservice/templates/deployment.yaml
    @@ -262,6 +264,9 @@ spec:
                 - name: webservice-secrets
                   mountPath: '/etc/gitlab'
                   readOnly: true
    +            - name: secret-volume
    +              mountPath: /etc/secret-volume
    +              readOnly: true
                 - name: webservice-secrets
                   mountPath: /srv/gitlab/config/secrets.yml
                   subPath: rails-secrets/secrets.yml
    @@ -372,6 +379,9 @@ spec:
                 - name: workhorse-secrets
                   mountPath: '/etc/gitlab'
                   readOnly: true
    +            - name: secret-volume
    +              mountPath: /etc/secret-volume
    +              readOnly: true
                 - name: shared-upload-directory
                   mountPath: /srv/gitlab/public/uploads/tmp
                   readOnly: false
    @@ -429,6 +439,9 @@ spec:
           - name: workhorse-config
             configMap:
                 name: {{ $.Release.Name }}-workhorse-{{ .name }}
    +      - name: secret-volume
    +        secret:
    +          secretName: google-key-json
           - name: init-webservice-secrets
             projected:
               defaultMode: 0400
  2. Let's create a rails.gcs.yaml:

    provider: Google
    google_project: <google project id>
    google_json_key_location: /etc/secret-volume/key
  3. Let's create the object storage secret:

    $ kubectl create secret generic gitlab-object-storage --from-file=connection=rails.gcs.yaml
  4. Let's create a secret with the google key json file:

    $ kubectl create secret generic google-key-json --from-file=key=<full path to google key json file>
  5. Lastly, let's create an additional values.yml file to read that object storage secret (and also disable minio):

    global:
      minio:
        enabled: false
      registry:
        bucket: <bucket name>
      appConfig:
        object_store:
          enabled: true
          connection:
            secret: gitlab-object-storage
            key: connection
        lfs:
          bucket: <bucket name>
        artifacts:
          bucket: <bucket name>
        uploads:
          bucket: <bucket name>
        packages:
          bucket: <bucket name>
        backups:
          bucket: <bucket name>
  6. Let's deploy the gitlab chart with the additional file (we use the "minikube minimum" base):

    $ helm upgrade --install gitlab . --timeout 600s -f ./examples/values-minikube-minimum.yaml -f values.yml 

Checking the workhorse logs ($ kail -c gitlab-workhorse):

default/gitlab-webservice-default-7b65945595-h4r8p[gitlab-workhorse]: {"address":"0.0.0.0:8181","level":"info","msg":"Running upstream server","network":"tcp","time":"2023-03-30T13:25:52Z"}
default/gitlab-webservice-default-7b65945595-h4r8p[gitlab-workhorse]: {"address":"/tmp/gitlab/workhorse.sock","level":"info","msg":"Running upstream server","network":"unix","time":"2023-03-30T13:25:52Z"}

Workhorse was able to boot normally

Let's check its config:

$ kubectl exec -it <gitlab-webservice pod name> -c gitlab-workhorse -- /bin/bash 

(inside the gitlab-workhorse container) $ cat /srv/gitlab/config/workhorse-config.toml 

We get this config content:

shutdown_timeout = "61s"
[redis]
URL = "redis://gitlab-redis-master.default.svc:6379"
Password = "xxx"
[object_storage]
provider = "Google"
# Google storage configuration.
[object_storage.google]
google_json_key_location = "/etc/secret-volume/key"
[image_resizer]
max_scaler_procs = 2
max_filesize = 250000
[[listeners]]
network = "tcp"
addr = "0.0.0.0:8181"

That's the expected config for object_storage and object_storage.google.

The testing scenario is working with this config

Checklist

See Definition of done.

For anything in this list which will not be completed, please provide a reason in the MR discussion.

Required

  • Merge Request Title and Description are up to date, accurate, and descriptive
  • MR targeting the appropriate branch
  • MR has a green pipeline on GitLab.com
  • When ready for review, MR is labeled "~workflow::ready for review" per the Distribution MR workflow

Expected (please provide an explanation if not completing)

  • Test plan indicating conditions for success has been posted and passes
  • Documentation created/updated
  • Tests added
  • Integration tests added to GitLab QA
  • Equivalent MR/issue for omnibus-gitlab opened
  • Validate potential values for new configuration settings. Formats such as integer 10, duration 10s, URI scheme://user:passwd@host:port may require quotation or other special handling when rendered in a template and written to a configuration file.
Edited by Jason Plum
