
Augment GCS signed URLs with GitLab metadata for package registry

Context

The container registry's epic on the same topic explains in detail how the data transfer can be instrumented.

However, unlike the container registry, the package registry doesn't use Cloud CDN on gitlab.com. Instead, it redirects downloads to GCS signed URLs.

So, according to the instrumentation blueprint, the package registry needs to send some metadata to GCS when a package file is downloaded.

The metadata consists of:

  • the package file's root namespace id
  • the package file's project id (if any)
  • the package file's size

When a package file is downloaded, GCS includes this metadata in its log entry for the download request. Those logs are aggregated and processed to get the data transfer usage statistics.

Implementation

When a package file is requested for download, CarrierWave's url method is called to generate the signed URL. To append the metadata to the download URL, we override the url method, add the metadata, and then call super.

What does this MR do?

  • Create a module named Packages::GcsSignedUrlMetadata. This module contains the logic for overriding the url method and appending the needed metadata.
  • Include the Packages::GcsSignedUrlMetadata module in each package file uploader. Each uploader is a class that inherits from CarrierWave::Uploader::Base, so it is where the url method is called. Including the module in the uploader allows us to override the url method.
  • Modify the underlying model of each uploader to make sure it implements the three methods that provide the metadata:
    • project_id
    • root_namespace
    • size
  • Add the needed specs.
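The override described above can be illustrated with a minimal, self-contained sketch. It is not the code from this MR: FakeBaseUploader stands in for the CarrierWave uploader so the example runs without CarrierWave or GCS, the `query:` keyword is an assumed way of passing extra query parameters to `url`, and the sketch assumes root_namespace returns an object that responds to `id`. All class and module names below are hypothetical. The parameter names are the ones that appear in the audit log entry in this description.

```ruby
require 'uri'

# Hypothetical stand-in for the uploader's parent class. A real CarrierWave
# uploader would generate and sign the GCS URL here.
class FakeBaseUploader
  attr_reader :model

  def initialize(model)
    @model = model
  end

  # Assumption: extra query parameters arrive through a `query:` keyword.
  def url(query: {})
    base = 'https://storage.example.com/bucket/file'
    query.empty? ? base : "#{base}?#{URI.encode_www_form(query)}"
  end
end

# Sketch of the override: collect the metadata from the model, merge it into
# any query parameters the caller already passed, then defer to super.
module GcsSignedUrlMetadataSketch
  def url(**kwargs)
    metadata = {
      'x-goog-custom-audit-gitlab-project' => model.project_id,
      'x-goog-custom-audit-gitlab-namespace' => model.root_namespace&.id,
      'x-goog-custom-audit-gitlab-size-bytes' => model.size
    }.compact # the project id is optional, so nil values are dropped

    super(**kwargs.merge(query: (kwargs[:query] || {}).merge(metadata)))
  end
end

class PackageFileUploaderSketch < FakeBaseUploader
  include GcsSignedUrlMetadataSketch
end

# Hypothetical model objects exposing the three required methods.
SketchNamespace = Struct.new(:id)
SketchPackageFile = Struct.new(:project_id, :root_namespace, :size)

file = SketchPackageFile.new(2, SketchNamespace.new(24), 10)
puts PackageFileUploaderSketch.new(file).url
```

Because the module is included in the uploader class, its url method sits ahead of the parent class in the method lookup chain, which is what makes the call to super reach the original implementation.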

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before After

How to set up and validate locally

Testing this feature requires using Google Cloud Storage as the Object Storage.

  1. Create a new GCS project or use an existing one.
  2. Create a bucket in your GCS project and set up the permissions needed to access the audit logs (I can help in setting this up).
  3. Create a service account on GCS and download its credentials JSON file (needed to connect GDK to GCS).
  4. Configure your GDK to use your GCS as the Object Storage:
    • in your gitlab.yml, update the packages section as follows:
      ## Packages (maven repository, npm registry, etc...)
      packages:
        enabled: true
        dpkg_deb_path: /opt/homebrew/bin/dpkg-deb
        object_store:
          enabled: true
          remote_directory: <name of gcs bucket>
          direct_upload: true
          connection:
            provider: 'Google'
            google_project: '<your gcs project id>'
            google_json_key_location: '<path to your gcs service account json file>'
    • Restart your GDK.
  5. In the Rails console, create a package that we can test with:
 # stub file upload
 def fixture_file_upload(*args, **kwargs)
   Rack::Test::UploadedFile.new(*args, **kwargs)
 end

 FactoryBot.create(:generic_package)
  6. Download the package from its UI page.
  7. In your GCS project, check the logs of your service account: IAM & Admin => Service Accounts => Click on your service account => LOGS tab
  8. The requests made against the bucket should be logged. The latest log entry should be the package file download request: storage.objects.get. In the log entry details, the metadata we send in the signed URL should be present:
 {
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {},
    "authenticationInfo": {
      "principalEmail": "XXXX@XXX.iam.gserviceaccount.com"
    },
    "requestMetadata": {
      "callerIp": "XXXX",
      "requestAttributes": {
        "time": "2024-03-18T18:40:55.423539123Z",
        "auth": {}
      },
      "destinationAttributes": {}
    },
    "serviceName": "storage.googleapis.com",
    "methodName": "storage.objects.get",
    "authorizationInfo": [
      {
        "resource": "path_to_file/ananas.txt",
        "permission": "storage.objects.get",
        "granted": true,
        "resourceAttributes": {}
      }
    ],
    "resourceName": "path_to_file/ananas.txt",
    "metadata": {
      "audit_context": {
        "app_context": "EXTERNAL",
        "audit_info": {
          "x-goog-custom-audit-gitlab-size-bytes": "10",
          "x-goog-custom-audit-gitlab-namespace": "24",
          "x-goog-custom-audit-gitlab-project": "2"
        }
      }
    },
    "resourceLocation": {
      "currentLocations": [
        "eu"
      ]
    }
  },
  "insertId": "XXXX",
  "resource": {
    "type": "gcs_bucket",
    "labels": {
      "location": "eu",
      "bucket_name": "XXXXX",
      "project_id": "XXXXX"
    }
  },
  "timestamp": "2024-03-18T18:40:55.414995195Z",
  "severity": "INFO",
  "logName": "projects/XXXX/logs/cloudaudit.googleapis.com%2Fdata_access",
  "receiveTimestamp": "2024-03-18T18:40:56.685214421Z"
}

As we can see in the log entry, the metadata is present:

"x-goog-custom-audit-gitlab-size-bytes": "10",
"x-goog-custom-audit-gitlab-namespace": "24",
"x-goog-custom-audit-gitlab-project": "2"
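
To make the processing step more concrete, here is a small sketch of how the metadata could be read back from such log entries and summed per root namespace. It is only an illustration, not the actual aggregation pipeline: the helper names are hypothetical, the entry is a trimmed copy of the example above, and the sketch assumes every logged request transferred the whole file.

```ruby
require 'json'

# Path to the custom audit fields inside a GCS data access log entry,
# taken from the example entry in this description.
AUDIT_INFO_PATH = %w[protoPayload metadata audit_context audit_info].freeze

# Returns the custom audit fields of one log entry, or an empty hash when
# the entry carries none (hypothetical helper).
def audit_info(entry)
  entry.dig(*AUDIT_INFO_PATH) || {}
end

# Sums the reported file sizes per root namespace id (hypothetical helper).
def bytes_by_namespace(entries)
  entries.each_with_object(Hash.new(0)) do |entry, totals|
    info = audit_info(entry)
    namespace = info['x-goog-custom-audit-gitlab-namespace']
    next unless namespace

    totals[namespace] += info['x-goog-custom-audit-gitlab-size-bytes'].to_i
  end
end

# Trimmed version of the log entry shown above.
entry = JSON.parse(<<~JSON)
  {
    "protoPayload": {
      "methodName": "storage.objects.get",
      "metadata": {
        "audit_context": {
          "app_context": "EXTERNAL",
          "audit_info": {
            "x-goog-custom-audit-gitlab-size-bytes": "10",
            "x-goog-custom-audit-gitlab-namespace": "24",
            "x-goog-custom-audit-gitlab-project": "2"
          }
        }
      }
    }
  }
JSON

p bytes_by_namespace([entry, entry])
```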

Related to #443335

Edited by Moaz Khalifa
