Static repository objects caching

Static repository objects such as repository archives and raw blobs can be served from external storage such as a CDN, relieving the application of having to serve a fresh version of the object for every request.

This is achieved by redirecting requests for an object endpoint (e.g. /archive/ or /raw/) to the external storage; the external storage in turn makes a request to the origin, caches the response, and serves it to subsequent requests as long as the object hasn't expired yet. An example of the request flow can be found in the docs.
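
As a rough illustration of that flow, here is a minimal Cloudflare Worker sketch in TypeScript. The X-Gitlab-External-Storage-Token header name and the TOKEN binding are assumptions for illustration, and the logic is simplified compared to the production worker:

```typescript
// Minimal sketch of the cache-then-origin flow, not the production worker.
declare const TOKEN: string; // shared secret, assumed to be a worker binding

addEventListener('fetch', (event: FetchEvent) => {
  event.respondWith(handleRequest(event));
});

async function handleRequest(event: FetchEvent): Promise<Response> {
  const cache = caches.default;

  // Serve from the data-center-local cache while the object is still fresh.
  const cached = await cache.match(event.request);
  if (cached) {
    return cached;
  }

  // Cache miss: request the object from the origin. The token header lets
  // the application recognize the external storage and serve the object
  // instead of redirecting the request back to the worker.
  const originRequest = new Request(event.request);
  originRequest.headers.set('X-Gitlab-External-Storage-Token', TOKEN);
  const response = await fetch(originRequest);

  // Store a copy without delaying the client; freshness is governed by the
  // response's Cache-Control header.
  event.waitUntil(cache.put(event.request, response.clone()));

  return response;
}
```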

Enabling/Disabling external caching

Follow the documented steps to enable external caching. The arbitrary token can be found in 1Password (or should be stored there if this is an initial setup) under the "[Environment] static objects external storage token" item (replace [Environment] with "Staging" or "Production"). This token is also used by Terraform; more on that below.

The base URL is the endpoint that will serve the cached objects; it depends on the CDN used. Currently, we use an entry point URL to a Cloudflare worker, which is provisioned by Terraform (more on that below).

To disable external caching, simply set the External storage URL field in the admin panel to an empty value. This causes the application to stop redirecting requests to the external storage and reverts the static object paths to their original form. In the Terraform module configuration, set enabled to false to stop requests from reaching the worker.

Provisioning the external storage

The application makes no assumptions about the external storage; it only expects a certain header to be set correctly in order to identify requests originating from the external storage. As such, the external storage can be a Fastly service, a FaaS, or a Cloudflare Worker. We use the latter for GitLab.com.

Using Terraform, we provision a worker, a worker route, and a proxied DNS A record, all in Cloudflare.

The DNS record and the worker route serve primarily cosmetic purposes, as a raw worker domain may not be aesthetically pleasing to users. This DNS record is provided to the application as the entry point URL (see above).

We can't use worker routes directly to handle caching because a route pattern doesn't allow multiple wildcards in the path segment (i.e. we can't have patterns such as */-/archive/* or */raw/*). If the zone of the entry point domain is not hosted by Cloudflare, we can't use worker routes at all and the raw worker domain has to be used. In that case, due to limitations in Terraform's Cloudflare provider, the provisioned worker is not deployed automatically; it has to be deployed manually through Cloudflare's dashboard.

Deploying a Cloudflare worker

Operation modes

The worker can be configured to work in one of two modes: conservative and aggressive. These modes govern cache invalidation, not caching itself. The worker can also be configured to either cache private repository objects or not.

Both are configured through Terraform, via the cache_private_objects and mode variables.
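
As a hypothetical sketch, these variables could surface in the worker as plain global bindings; the binding names below are illustrative, not the actual names used in the deployed worker:

```typescript
// Hypothetical bindings mirroring the Terraform variables.
declare const MODE: string;                  // "conservative" | "aggressive"
declare const CACHE_PRIVATE_OBJECTS: string; // "true" | "false"

const aggressiveMode = MODE === 'aggressive';
const cachePrivateObjects = CACHE_PRIVATE_OBJECTS === 'true';
```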

Public repository objects

In conservative mode, the worker immediately serves public objects if they haven't expired yet. Expiry time is influenced by the Cache-Control header returned by the origin, specifically the max-age directive. Once an object expires, it is evicted from the cache and the worker requests it from the origin in full. This may be fine for small objects but may put stress on the origin for larger ones.

In aggressive mode, the worker revalidates the object every time it's requested, using the ETag value present in the cached response. The Cache-Control header and its directives are ignored in this mode, which means objects live in the cache for a longer period at the expense of frequent revalidation against the origin.
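
A sketch of what that per-request revalidation could look like, assuming a cached response is already in hand (the helper name is illustrative):

```typescript
// Revalidate a cached object against the origin using its ETag. A 304 from
// the origin means the cached body is still valid and can be served without
// transferring the object again.
async function revalidate(request: Request, cached: Response): Promise<Response> {
  const headers = new Headers(request.headers);
  const etag = cached.headers.get('ETag');
  if (etag) {
    headers.set('If-None-Match', etag);
  }

  const originResponse = await fetch(new Request(request, { headers }));

  if (originResponse.status === 304) {
    return cached; // unchanged: serve the cached body
  }

  // The object changed: replace the cached copy and serve the new version.
  await caches.default.put(request, originResponse.clone());
  return originResponse;
}
```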

Private repository objects

The worker can be configured to either cache private repository objects or not. If not, the worker acts as a proxy, without touching or caching the response. The worker identifies private objects by looking for the private directive in the Cache-Control header.

If private object caching is enabled, any private object requested is revalidated regardless of the current mode, to enforce authentication and authorization.
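
A small sketch of the detection step (the function name is illustrative); combined with the revalidation sketch above, a private object would always take the revalidation path:

```typescript
// Private objects carry the `private` directive in their Cache-Control header.
function isPrivateObject(response: Response): boolean {
  const cacheControl = response.headers.get('Cache-Control') ?? '';
  return /\bprivate\b/i.test(cacheControl);
}
```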

Cloudflare caching behavior

We use the Cache API in the worker script, which means cached objects are not replicated across Cloudflare data centers. This is important to know because, in aggressive mode, if a repository object is suddenly in high demand across the globe, we may observe a small surge of 200 responses as opposed to the expected 304 ones. The 200s are individual Cloudflare data centers warming their caches; afterwards there should be a steady flow of 304s.

Protection against cache bypassing

The worker script checks the query segment of each request, and only allows query parameters expected by the application to go through. This is to prevent malicious users from bypassing the cache by adding arbitrary query parameters.

The following rules are applied (see the sketch after the list):

  • For /raw/ requests
    • inline query parameter is only allowed if its value is either true or false
  • For /archive/ requests
    • append_sha query parameter is only allowed if its value is either true or false
    • path query parameter is allowed regardless of its value
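
A sketch of those rules as a single predicate; the function name is illustrative, and what the real worker does with a rejected request (block it or strip the parameters) is not shown here:

```typescript
// Returns true only when every query parameter is one the application
// expects for the given endpoint, with an expected value.
const BOOLEAN_VALUES = ['true', 'false'];

function queryStringAllowed(url: URL): boolean {
  for (const [name, value] of url.searchParams) {
    if (url.pathname.includes('/raw/')) {
      if (name === 'inline' && BOOLEAN_VALUES.includes(value)) continue;
      return false;
    }
    if (url.pathname.includes('/archive/')) {
      if (name === 'append_sha' && BOOLEAN_VALUES.includes(value)) continue;
      if (name === 'path') continue;
      return false;
    }
    return false; // any parameter on an unrecognized endpoint is rejected
  }
  return true;
}
```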

Logging

Every request to the worker is logged in Elasticsearch, in an index with this name format: <environment>-static-objects-cache-<date>. A scheduled CI pipeline archives old indexes to the logs archive bucket in GCS.

The Elasticsearch endpoint and credentials are provided through Terraform.
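
As an illustration, logging from the worker could look like the sketch below; the binding names and document shape are assumptions, not the real configuration:

```typescript
// Fire-and-forget logging of a request/response pair to Elasticsearch.
declare const ENVIRONMENT: string; // environment name used in the index
declare const ES_URL: string;      // Elasticsearch endpoint, provided via Terraform
declare const ES_AUTH: string;     // base64-encoded "user:password"

function logRequest(request: Request, response: Response): Promise<Response> {
  const day = new Date().toISOString().slice(0, 10);            // YYYY-MM-DD
  const index = `${ENVIRONMENT}-static-objects-cache-${day}`;   // <environment>-static-objects-cache-<date>

  return fetch(`${ES_URL}/${index}/_doc`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Basic ${ES_AUTH}`,
    },
    body: JSON.stringify({
      url: request.url,
      status: response.status,
      timestamp: new Date().toISOString(),
    }),
  });
}

// In the fetch handler, logging must not delay the response:
// event.waitUntil(logRequest(event.request, response));
```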

Cloudflare Logs wasn't used because it doesn't provide a way to filter logs for specific routes or workers. Using it would cause logging redundancy if the site is completely behind Cloudflare (as is the case with staging), and it would be difficult to gain immediate visibility into the worker, as logs would need to be imported from GCS (after being exported from Cloudflare) into BigQuery for analysis.