Fix ETag caching for job artifact downloads
What does this MR do?
Every API endpoint that delegates a response body to Workhorse sets body '' on the Rails side. Rack::ETag then computes its weak digest from the empty body, emitting a constant W/"12ae32cb1ec02d01eda3581b127c1fee" for every response. Combined with Rack::ConditionalGet, this causes the API to return 304 Not Modified whenever a client sends If-None-Match with that value, even when the underlying content has actually changed. This affects the CI artifact endpoints from the issue, but the same root cause silently broke every other Workhorse-delegated download too (packages, avatars, exports, LFS, vulnerability/dependency exports, virtual registries, …).
This MR addresses it in two layers, gated behind the workhorse_download_etag_caching feature flag (rollout: #601345).
1. Default-off ETag in the Workhorse-delegation helpers.
present_carrierwave_file!, send_artifacts_entry, and legacy_send_artifacts_entry now accept an etag: kwarg. When omitted, they set Last-Modified: '0' so Rack::ETag skips its empty-body digest entirely — matching the existing pattern in app/controllers/application_controller.rb#stream_headers for the same Rack::ETag interaction. The result: every caller that doesn't have a content digest stops emitting the bogus constant ETag and emits no ETag at all, which is the correct HTTP semantics when no validator is available.
2. Opt-in strong ETag for the four CI artifact endpoints.
A small artifact_etag(build, path:) helper returns a strong, content-derived ETag and is passed through the new kwarg:
- archive endpoints (
/jobs/:job_id/artifacts,/jobs/artifacts/:ref_name/download) →"<archive_sha256>" - raw-file endpoints (
/jobs/:job_id/artifacts/*artifact_path,/jobs/artifacts/:ref_name/raw/*artifact_path) →"<sha256(archive_sha256:path)>"
Strong is correct here because file_sha256 is a byte digest of the stored archive (cf. app/controllers/projects/repositories_controller.rb#set_cache_headers, which uses fresh_when(strong_etag: …) for repo archive downloads — the closest analog). Rack::ConditionalGet then returns 304 only when the value actually matches.
Feature flag
| State | Behaviour |
|---|---|
workhorse_download_etag_caching disabled (default) |
apply_etag_or_suppress_rack_etag! is a no-op. Every Workhorse-delegated API response continues to emit W/"12ae32cb1ec02d01eda3581b127c1fee" exactly as before this MR. |
workhorse_download_etag_caching enabled |
Callers that pass etag: emit a content-derived ETag; callers that don't emit Last-Modified: '0' so Rack::ETag skips. The four CI artifact endpoints become correctly cacheable. |
Type: gitlab_com_derisk, default false. Rollout plan in #601345.
How to test
Feature.enable(:workhorse_download_etag_caching)in the Rails console.- Trigger a CI pipeline that produces an artifact, wait for it to succeed.
curl -i .../jobs/:job_id/artifacts— note the new content-basedETag(strong, noW/prefix).- Re-run with
If-None-Match: <etag>—304 Not Modified. - Push a change that produces a new artifact, request again with the old
If-None-Match—200 OKwith a differentETag. - Hit any other Workhorse-delegated endpoint (e.g.
/api/v4/groups/:id/avatar, an LFS object, or a package file). Confirm the bogus constantW/"12ae32cb..."is gone — the response now hasLast-Modified: 0and noETag. Feature.disable(:workhorse_download_etag_caching)and repeat step 6 — the bogus constant is back, confirming the flag gates the behaviour.
References
Notes
- For artifacts where
file_sha256isnil(very old rows that predate the column),artifact_etagreturnsniland the helper falls back to suppression — no ETag emitted, no caching, no spurious 304s. - Two-thirds of the other
present_carrierwave_file!callers already have a usable digest on their model (Packages::PackageFile#file_sha256,LfsObject#oid,Packages::Rpm::RepositoryFile#file_sha256, etc.) and can opt into proper ETag caching in follow-ups by passingetag:. They are no worse off in the meantime than they were before this MR.