Improve caching on raw endpoints
As part of https://gitlab.com/gitlab-com/gl-infra/production/-/issues/2797 we noticed that caching for the raw
endpoint (and most likely also for the archive endpoint) is implemented inefficiently.
There are a few main points:
- We don't serve strong ETags, making it impossible for Cloudflare (CF) to cache those resources.
- We set inappropriate headers on responses.
- Our Cloudflare rules are not optimized to cache. (https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/11554)
I understand we need to set headers to prevent caching when the request targets a branch, but this is unnecessary when using strong ETags.
Right now we specify:
cache-control: public, no-cache, no-store
etag: W/"5b1faf106e7b6186e336e76fb741f969"
expires: Fri, 01 Jan 1990 00:00:00 GMT
pragma: no-cache
vary: Accept-Encoding
We need to consider two cases: regular public files, and files in private projects.
We can create a strong ETag by combining the commit SHA that the branch/tag/ref resolves to, the filename (or a hash thereof), and the value of the content-encoding
header if present. For example, a link like https://gitlab.com/T4cC0re/freebsd-ci/-/raw/master/installer/init.sh
should produce these headers:
etag: "2b02fde12e155eb65409bbf1b4c9a5b305082e85-installer_init.sh"
cache-control: public, must-revalidate, max-age=0
pragma: no-cache
vary: Accept-Encoding
Or, in the case of a request with accept-encoding: gzip that was answered with content-encoding: gzip:
etag: "2b02fde12e155eb65409bbf1b4c9a5b305082e85-installer_init.sh-gzip"
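As a rough sketch of the ETag construction described above (not GitLab's actual implementation), the function below joins the commit SHA, a flattened file path, and the content encoding. The name build_etag and the rule of replacing "/" with "_" are assumptions taken from the example headers; a hash of the path could be substituted.

```python
from typing import Optional


def build_etag(commit_sha: str, path: str, content_encoding: Optional[str] = None) -> str:
    """Return a strong ETag (quoted, without the W/ prefix) for a raw file."""
    # Assumption: path separators are flattened to underscores,
    # matching "installer/init.sh" -> "installer_init.sh" in the example.
    parts = [commit_sha, path.strip("/").replace("/", "_")]
    # Only append the encoding when the response body is actually encoded.
    if content_encoding and content_encoding != "identity":
        parts.append(content_encoding)
    return '"' + "-".join(parts) + '"'


sha = "2b02fde12e155eb65409bbf1b4c9a5b305082e85"
print(build_etag(sha, "installer/init.sh"))
print(build_etag(sha, "installer/init.sh", "gzip"))
```

Because the encoding is part of the tag, the gzip and identity representations never share a validator, which is what a strong ETag requires.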
The expires header is rendered irrelevant by max-age=0 and must-revalidate (but can be kept, if desired). The no-cache and no-store directives together prevent caches from storing the response, but this is not what we want. We want everyone in the cache chain to revalidate using the ETag. Thus must-revalidate replaces them.
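To illustrate what revalidating using the ETag means for the origin, here is a minimal sketch of the If-None-Match check. The function name revalidate is hypothetical, and the comma split is a simplification that assumes tags contain no commas.

```python
from typing import Optional


def _opaque(tag: str) -> str:
    """Drop surrounding whitespace and an optional weak prefix (W/)."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def revalidate(if_none_match: Optional[str], current_etag: str) -> int:
    """Return 304 if the validator sent by the cache still matches, else 200."""
    if if_none_match is None:
        return 200  # unconditional request: send the full response
    if if_none_match.strip() == "*":
        return 304  # "*" matches any current representation
    # If-None-Match uses weak comparison, so W/ prefixes are ignored.
    candidates = [_opaque(t) for t in if_none_match.split(",")]
    return 304 if _opaque(current_etag) in candidates else 200


print(revalidate('"abc-file"', '"abc-file"'))
print(revalidate('"old-file"', '"abc-file"'))
```

With max-age=0 and must-revalidate, every cache in the chain sends this conditional request, and a 304 lets it reuse the stored body without transferring it again.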
pragma: no-cache is deprecated, but still used by HTTP/1.0 caches. In HTTP/1.1 and later, the value specified here is only used when cache-control is not set.
For a private resource we just need to change the headers a tiny bit:
etag: "2b02fde12e155eb65409bbf1b4c9a5b305082e85-installer_init.sh"
cache-control: private, must-revalidate, max-age=0
pragma: no-cache
vary: Accept-Encoding
Changing cache-control to private ensures that shared caches along the way do not store the response, while the browser cache still may. The browser will need to revalidate before using the resource, which means bypassing caches along the way and revalidating directly at the origin.
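The two cases differ only in the first cache-control directive, which could be sketched as follows. The function cache_headers and its is_public flag are hypothetical names used for illustration only.

```python
from typing import Dict


def cache_headers(etag: str, is_public: bool) -> Dict[str, str]:
    """Build the proposed response headers for a raw file."""
    # public: shared caches (e.g. a CDN) may store and revalidate.
    # private: only the end user's browser cache may store the response.
    scope = "public" if is_public else "private"
    return {
        "etag": etag,
        "cache-control": f"{scope}, must-revalidate, max-age=0",
        "pragma": "no-cache",
        "vary": "Accept-Encoding",
    }


for name, value in cache_headers('"2b02fde1-installer_init.sh"', is_public=False).items():
    print(f"{name}: {value}")
```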