Geo blob downloads blocked by network filtering for tenants using S3 and AWS DNS

Summary

Since GitLab 18.10.4, Geo blob downloads on the Gitlab::HTTP code path (gated by the :geo_blob_download_with_gitlab_http ops feature flag from !230361 (merged)) fail with Gitlab::HTTP_V2::BlockedUrlError ("URL is blocked: Requests to hosts and IP addresses not on the Allow List are denied") on instances where both of the following hold:

  1. ApplicationSetting#deny_all_requests_except_allowed is true, AND
  2. Object stores are configured using default AWS S3 region-based DNS (no explicit connection.endpoint).

ee/lib/gitlab/geo/replication/blob_downloader.rb#stream_from_url passes allow_object_storage: true, which Gitlab::HTTP (lib/gitlab/http.rb) translates to extra_allowed_uris = ObjectStoreSettings.enabled_endpoint_uris. But enabled_endpoint_uris only returns object stores with an explicit endpoint:

endpoint = object_store_setting.dig('connection', 'endpoint')
next unless endpoint     # <-- silently drops default-DNS AWS S3
URI(endpoint)

Default AWS S3 configurations derive the host from region plus remote_directory instead of setting an explicit endpoint, so enabled_endpoint_uris returns [], the allow_object_storage bypass becomes a no-op, and validate_resolved_uri falls through to validate_deny_all_requests_except_allowed!.
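A minimal, self-contained sketch of the filtering behavior (simplified stand-in; the real method lives in ObjectStoreSettings and reads parsed GitLab config): a default-DNS AWS S3 store with region and remote_directory but no connection.endpoint contributes nothing to the result.

```ruby
require 'uri'

# Simplified stand-in for ObjectStoreSettings.enabled_endpoint_uris:
# it collects explicit endpoints only, silently skipping stores without one.
def enabled_endpoint_uris(object_store_settings)
  object_store_settings.filter_map do |setting|
    endpoint = setting.dig('connection', 'endpoint')
    next unless endpoint # default-DNS AWS S3 configs are dropped here

    URI(endpoint)
  end
end

# Typical Dedicated-style config: region-based DNS, no explicit endpoint.
artifacts = {
  'remote_directory' => 'gitlab-artifacts',
  'connection' => { 'provider' => 'AWS', 'region' => 'us-east-1' }
}

enabled_endpoint_uris([artifacts]) # => [] — so the allowlist bypass is a no-op
```

With an empty list, the extra_allowed_uris passed to the URL blocker contain nothing, and the request is evaluated purely against the deny-all policy.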

The old http-gem download path didn't use Gitlab::HTTP_V2::UrlBlocker, which masked this gap. The 18.10.4 backport is correct in routing through Gitlab::HTTP; it just exposes a pre-existing bug.

Distinct from but related to #544821 (closed).

Impact

  • Contributed to a customer-facing S1 affecting multiple GitLab Dedicated tenants. All blob types (job artifacts, uploads, LFS, packages, MR diffs, terraform state, pipeline artifacts, CI secure files) failed to replicate.
  • Affects any instance running 18.10.4+ with the FF enabled, deny_all_requests_except_allowed = true, and default AWS S3 DNS for one or more object stores. This is the standard GitLab Dedicated configuration.
  • Workaround (per-instance): add every object-store bucket hostname (<bucket>.s3.<region>.amazonaws.com) to outbound_local_requests_whitelist.
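The allowlist entry follows AWS virtual-hosted-style addressing. A throwaway helper (hypothetical, for illustration only, not part of GitLab) showing the expected hostname shape:

```ruby
# Builds the virtual-hosted-style S3 hostname that must be allowlisted.
# Hypothetical helper for illustration; GitLab does not ship this method.
def s3_bucket_host(bucket, region)
  "#{bucket}.s3.#{region}.amazonaws.com"
end

s3_bucket_host('gitlab-artifacts', 'us-east-1')
# => "gitlab-artifacts.s3.us-east-1.amazonaws.com"
```

One entry per configured bucket is required, so the workaround scales poorly on instances with many object-store types.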

Recommendation

Fix ObjectStoreSettings.enabled_endpoint_uris to derive the hostname when connection.endpoint is absent (for AWS: https://<remote_directory>.s3.<region>.amazonaws.com). Equivalent treatment is likely needed for Google Cloud Storage and Azure Blob Storage default-DNS configurations.
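A sketch of the proposed derivation, assuming the standard Fog-style connection keys GitLab already parses ('provider', 'region', 'azure_storage_account_name'); the Google and Azure branches are illustrative guesses at the default-DNS host formats, not verified against those providers' configs:

```ruby
require 'uri'

# Sketch: return the explicit endpoint if present, otherwise derive the
# provider's default-DNS endpoint so it can join extra_allowed_uris.
def derived_endpoint_uri(setting)
  endpoint = setting.dig('connection', 'endpoint')
  return URI(endpoint) if endpoint

  connection = setting['connection'] || {}
  bucket = setting['remote_directory']

  case connection['provider']
  when 'AWS'
    region = connection['region']
    URI("https://#{bucket}.s3.#{region}.amazonaws.com") if bucket && region
  when 'Google'
    # GCS virtual-hosted-style default DNS (illustrative)
    URI("https://#{bucket}.storage.googleapis.com") if bucket
  when 'AzureRM'
    account = connection['azure_storage_account_name']
    # Azure Blob default DNS (illustrative)
    URI("https://#{account}.blob.core.windows.net") if account
  end
end

config = {
  'remote_directory' => 'gitlab-artifacts',
  'connection' => { 'provider' => 'AWS', 'region' => 'us-east-1' }
}
derived_endpoint_uri(config).to_s
# => "https://gitlab-artifacts.s3.us-east-1.amazonaws.com"
```

With this in place the existing allow_object_storage: true plumbing needs no changes; the derived URI simply appears in extra_allowed_uris alongside explicit endpoints.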

Add a regression spec asserting that an enabled AWS S3 object store with region set and no endpoint yields a non-empty enabled_endpoint_uris result, and that BlobDownloader#execute succeeds against it with deny_all_requests_except_allowed: true.

Backport target: 18.10 (alongside the existing geo_blob_download_with_gitlab_http fix).

Verification

Reproduced and fix verified on a GitLab Dedicated tenant running 18.10.4-ee:

  1. Diagnostic state: FF enabled, deny_all_requests_except_allowed? == true, ObjectStoreSettings.enabled_endpoint_uris == [], affected bucket hostname missing from outbound_local_requests_whitelist.
  2. Synchronous BlobDownloader#execute raised Gitlab::HTTP_V2::BlockedUrlError. Backtrace originates in Gitlab::HTTP_V2::NewConnectionAdapter#validate_url_with_proxy!, called from blob_downloader.rb#stream_from_url via download_file_with_gitlab_http.
  3. Adding the bucket hostname to outbound_local_requests_whitelist and re-running BlobDownloader#execute succeeded.