Fix ETag caching for job artifact downloads

What does this MR do?

Every API endpoint that delegates a response body to Workhorse sets body '' on the Rails side. Rack::ETag then computes its weak digest from the empty body, emitting a constant W/"12ae32cb1ec02d01eda3581b127c1fee" for every response. Combined with Rack::ConditionalGet, this causes the API to return 304 Not Modified whenever a client sends If-None-Match with that value, even when the underlying content has actually changed. This affects the CI artifact endpoints from the issue, but the same root cause silently broke every other Workhorse-delegated download too (packages, avatars, exports, LFS, vulnerability/dependency exports, virtual registries, …).

This MR addresses it in two layers, gated behind the workhorse_download_etag_caching feature flag (rollout: #601345).

1. Default-off ETag in the Workhorse-delegation helpers. present_carrierwave_file!, send_artifacts_entry, and legacy_send_artifacts_entry now accept an etag: kwarg. When omitted, they set Last-Modified: '0' so Rack::ETag skips its empty-body digest entirely — matching the existing pattern in app/controllers/application_controller.rb#stream_headers for the same Rack::ETag interaction. The result: every caller that doesn't have a content digest stops emitting the bogus constant ETag and emits no ETag at all, which is the correct HTTP semantics when no validator is available.

2. Opt-in strong ETag for the four CI artifact endpoints. A small artifact_etag(build, path:) helper returns a strong, content-derived ETag and is passed through the new kwarg:

  • archive endpoints (/jobs/:job_id/artifacts, /jobs/artifacts/:ref_name/download) → "<archive_sha256>"
  • raw-file endpoints (/jobs/:job_id/artifacts/*artifact_path, /jobs/artifacts/:ref_name/raw/*artifact_path) → "<sha256(archive_sha256:path)>"

Strong is correct here because file_sha256 is a byte digest of the stored archive (cf. app/controllers/projects/repositories_controller.rb#set_cache_headers, which uses fresh_when(strong_etag: …) for repo archive downloads — the closest analog). Rack::ConditionalGet then returns 304 only when the value actually matches.

Feature flag

State Behaviour
workhorse_download_etag_caching disabled (default) apply_etag_or_suppress_rack_etag! is a no-op. Every Workhorse-delegated API response continues to emit W/"12ae32cb1ec02d01eda3581b127c1fee" exactly as before this MR.
workhorse_download_etag_caching enabled Callers that pass etag: emit a content-derived ETag; callers that don't emit Last-Modified: '0' so Rack::ETag skips. The four CI artifact endpoints become correctly cacheable.

Type: gitlab_com_derisk, default false. Rollout plan in #601345.

How to test

  1. Feature.enable(:workhorse_download_etag_caching) in the Rails console.
  2. Trigger a CI pipeline that produces an artifact, wait for it to succeed.
  3. curl -i .../jobs/:job_id/artifacts — note the new content-based ETag (strong, no W/ prefix).
  4. Re-run with If-None-Match: <etag>304 Not Modified.
  5. Push a change that produces a new artifact, request again with the old If-None-Match200 OK with a different ETag.
  6. Hit any other Workhorse-delegated endpoint (e.g. /api/v4/groups/:id/avatar, an LFS object, or a package file). Confirm the bogus constant W/"12ae32cb..." is gone — the response now has Last-Modified: 0 and no ETag.
  7. Feature.disable(:workhorse_download_etag_caching) and repeat step 6 — the bogus constant is back, confirming the flag gates the behaviour.

References

Notes

  • For artifacts where file_sha256 is nil (very old rows that predate the column), artifact_etag returns nil and the helper falls back to suppression — no ETag emitted, no caching, no spurious 304s.
  • Two-thirds of the other present_carrierwave_file! callers already have a usable digest on their model (Packages::PackageFile#file_sha256, LfsObject#oid, Packages::Rpm::RepositoryFile#file_sha256, etc.) and can opt into proper ETag caching in follow-ups by passing etag:. They are no worse off in the meantime than they were before this MR.
Edited by Marc Shaw

Merge request reports

Loading