Read Package Metadata via Package Metadata DB (License DB) Load Balancer Instead of GCP Ruby Storage Library in GitLab Rails
Problem definition
A customer's self-managed GitLab instance could not synchronize with the package data stored in the License-db public buckets because of a network connectivity error. The customer had difficulty determining the complete set of Google Cloud Platform (GCP) IP addresses that needed to be allowed for access.
Zendesk Ticket - internal only - for GitLab team members who can view the ticket
Proposal
We can put an External Application Load Balancer in front of the Cloud Storage buckets. Anyone can then reach the load balancer, and through it the bucket, at a static IP address, so users only need to add one IP address to their allowlist.
With a load balancer behind a domain name, we can no longer use google-apis-storage_v1 and google-cloud-storage to access files in the Cloud Storage buckets. Instead, we send a simple HTTP GET request to the domain name to get a list of all the files and their paths in the bucket. A second HTTP GET request then downloads any file from the bucket.
Example
This example contains test data:
curl http://<domain_name_license_bucket>/
Response:
<ListBucketResult
xmlns="http://doc.s3.amazonaws.com/2006-03-01">
<Name>dev-export-license-bucket-331c9d2e1a6e5f5e</Name>
<Prefix/>
<Marker/>
<IsTruncated>false</IsTruncated>
<Contents>
<Key>cat.txt</Key>
<Generation>1698320010322791</Generation>
<MetaGeneration>1</MetaGeneration>
<LastModified>2023-10-26T11:33:30.372Z</LastModified>
<ETag>"fa831f2879d63a61893905681f5d4c41"</ETag>
<Size>9</Size>
</Contents>
<Contents>
<Key>v2/</Key>
<Generation>1698325712572459</Generation>
<MetaGeneration>1</MetaGeneration>
<LastModified>2023-10-26T13:08:32.610Z</LastModified>
<ETag>"d41d8cd98f00b204e9800998ecf8427e"</ETag>
<Size>0</Size>
</Contents>
<Contents>
<Key>v2/something/</Key>
<Generation>1698325725581015</Generation>
<MetaGeneration>1</MetaGeneration>
<LastModified>2023-10-26T13:08:45.620Z</LastModified>
<ETag>"d41d8cd98f00b204e9800998ecf8427e"</ETag>
<Size>0</Size>
</Contents>
<Contents>
<Key>v2/something/sheep.txt</Key>
<Generation>1698325762772492</Generation>
<MetaGeneration>1</MetaGeneration>
<LastModified>2023-10-26T13:09:22.811Z</LastModified>
<ETag>"9de4f74f0688948bd58f77e1bbf97c4f"</ETag>
<Size>10</Size>
</Contents>
</ListBucketResult>
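A listing like the one above can be turned into file entries with a few lines of Ruby. This is a minimal sketch, not the connector's actual code: `BucketEntry`, `parse_listing`, and `files_only` are hypothetical names, REXML is used only because it ships with Ruby, and the sketch assumes keys ending in `/` are zero-byte folder placeholders (as in the test data) that should be skipped. Elements are matched by local name so the default XML namespace does not matter.

```ruby
require "rexml/document"
require "time"

# Hypothetical value object for one <Contents> element of the listing.
BucketEntry = Struct.new(:key, :size, :last_modified)

# Parse a ListBucketResult body into BucketEntry objects.
def parse_listing(xml_body)
  root = REXML::Document.new(xml_body).root
  entries = []
  root.each_element do |contents|
    next unless contents.name == "Contents"

    # Collect the child elements (Key, Size, LastModified, ...) by name.
    fields = {}
    contents.each_element { |child| fields[child.name] = child.text }
    entries << BucketEntry.new(
      fields["Key"],
      fields["Size"].to_i,
      Time.iso8601(fields["LastModified"])
    )
  end
  entries
end

# Assumption: keys ending in "/" are folder placeholders, not files.
def files_only(entries)
  entries.reject { |entry| entry.key.end_with?("/") }
end

# Trimmed copy of the test data shown in the response.
SAMPLE = <<~XML
  <ListBucketResult xmlns="http://doc.s3.amazonaws.com/2006-03-01">
    <Name>dev-export-license-bucket-331c9d2e1a6e5f5e</Name>
    <Prefix/>
    <IsTruncated>false</IsTruncated>
    <Contents>
      <Key>cat.txt</Key>
      <LastModified>2023-10-26T11:33:30.372Z</LastModified>
      <Size>9</Size>
    </Contents>
    <Contents>
      <Key>v2/</Key>
      <LastModified>2023-10-26T13:08:32.610Z</LastModified>
      <Size>0</Size>
    </Contents>
    <Contents>
      <Key>v2/something/sheep.txt</Key>
      <LastModified>2023-10-26T13:09:22.811Z</LastModified>
      <Size>10</Size>
    </Contents>
  </ListBucketResult>
XML

puts files_only(parse_listing(SAMPLE)).map(&:key)
```

The sketch ignores `IsTruncated`; a real implementation would probably need to follow the marker when a listing spans more than one page.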
Domain names
You can reach the advisories bucket at https://advisories.gitlab-package-metadata.com/
You can reach the licenses bucket at https://licenses.gitlab-package-metadata.com
You can reach a file by specifying a full path: https://advisories.gitlab-package-metadata.com/v2/apk/1700646799/000000000.ndjson
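As a rough illustration of the second request, the sketch below builds a full file URL from a domain name and an object key, downloads it with Net::HTTP, and splits the body into records. The helper names (`file_url`, `download`, `parse_ndjson`) are hypothetical, and the sketch assumes each line of an `.ndjson` file is one JSON document, as the extension suggests. Keys are assumed not to need percent-encoding.

```ruby
require "json"
require "net/http"
require "uri"

# Join a bucket domain and an object key, tolerating a trailing slash
# on the domain (the two domain names above differ in that respect).
def file_url(base_uri, key)
  URI("#{base_uri.chomp('/')}/#{key}")
end

# Plain HTTP GET; raises unless the load balancer answers with a 2xx.
def download(uri)
  response = Net::HTTP.get_response(uri)
  raise "GET #{uri} returned #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  response.body
end

# Assumption: one JSON document per non-empty line.
def parse_ndjson(body)
  body.each_line.map(&:strip).reject(&:empty?).map { |line| JSON.parse(line) }
end

# Example (performs a network request, so it is left commented out):
# uri = file_url("https://advisories.gitlab-package-metadata.com/",
#                "v2/apk/1700646799/000000000.ndjson")
# records = parse_ndjson(download(uri))
```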
Related links
- Add a load balancer in front of License-DB stor... (#429483 - closed) • Nick Ilieskou • 16.7
- #429483 (comment 1622189423)
Implementation Plan
Update connector gcp.rb
- Update PackageMetadata::SyncConfiguration to set a base_uri for gcp configs of https://advisories.gitlab-package-metadata.com/ or https://licenses.gitlab-package-metadata.com depending on the data_type.
- Update Gitlab::PackageMetadata::Connector::Gcp#all_files call to issue a plain GET request to sync_config.base_uri instead of bucket.files
  - Issued with argument withPrefix set to #file_prefix.
  - Parse file list response as XML and yield each entry's mediaLink uri to the existing GcpFileWrapper creation logic.
- Remove google cloud ruby gem
  - Remove calls to Google::Cloud
  - Remove requires for google cloud ruby gem
  - Check if the gem is still used elsewhere in the codebase, and if not - remove it (potentially a separate MR)
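The connector change in this plan could look roughly like the sketch below. It is not the actual GitLab code: `HttpFileLister`, `BASE_URIS`, the `data_type` values, and the injected `http_get` callable are all hypothetical. Two further assumptions are worth checking before relying on it. First, the query parameter is named `prefix` here, as in the Cloud Storage XML API, whereas the plan calls the argument withPrefix. Second, the example listing contains no mediaLink element, so the sketch builds each file URI from the base URI and the entry's `Key`.

```ruby
require "rexml/document"
require "uri"

# Assumed data_type values mapped to the domain names from this issue.
BASE_URIS = {
  "advisories" => "https://advisories.gitlab-package-metadata.com/",
  "licenses" => "https://licenses.gitlab-package-metadata.com"
}.freeze

# Hypothetical stand-in for the connector's file listing.
class HttpFileLister
  # http_get: any callable that takes a URL string and returns the body.
  def initialize(data_type:, file_prefix:, http_get:)
    @base_uri = BASE_URIS.fetch(data_type).chomp("/")
    @file_prefix = file_prefix
    @http_get = http_get
  end

  # Yields the URL of every file under file_prefix.
  def all_files
    return to_enum(:all_files) unless block_given?

    # Assumption: the listing is filtered with a "prefix" query parameter.
    query = URI.encode_www_form("prefix" => @file_prefix)
    body = @http_get.call("#{@base_uri}/?#{query}")

    REXML::Document.new(body).root.each_element do |contents|
      next unless contents.name == "Contents"

      key = nil
      contents.each_element { |child| key = child.text if child.name == "Key" }
      # Skip entries without a key and folder placeholders.
      next if key.nil? || key.end_with?("/")

      yield "#{@base_uri}/#{key}"
    end
  end
end
```

Injecting `http_get` keeps the listing logic testable without network access; in the Rails codebase the callable would presumably wrap whatever HTTP client the connector already uses. Each yielded URL could then be handed to the existing GcpFileWrapper creation logic.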