
Fix assets caching in scheduled cache-assets:production job

What does this MR do and why?

Background

One of the optimizations for the build and deploy process is to cache assets as a generic package that can then be consumed by the build process.

Assets in this context refers to frontend assets built by the gitlab:assets:compile rake task, which calls out to yarn. We compute a cached-assets-hash over all frontend files. If none of these source files changed, the build can reuse the previously compiled assets and save approximately 40 minutes of build time.

This process is intended to work via a scheduled pipeline on gitlab-org/gitlab that runs every 2 hours. It computes the cached-assets-hash and, if no package exists for that hash, builds an assets package and publishes it to the package registry on gitlab-org/gitlab.
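The intended flow can be sketched roughly as follows. This is a minimal illustration only: a temporary directory stands in for the package registry, and `build_assets_package`, `cache_assets`, and the file naming are hypothetical names, not the actual job script.

```shell
set -eu

# Assumption: a local directory stands in for the package registry.
registry="$(mktemp -d)"
builds=0

# Hypothetical placeholder for the real compile-and-publish step.
build_assets_package() {
  builds=$((builds + 1))
  printf 'compiled assets for %s\n' "$1" > "$registry/assets-$1.tar.gz"
}

# One scheduled run: build and publish only when no package exists for the hash.
cache_assets() {
  assets_hash="$1"
  if [ -f "$registry/assets-$assets_hash.tar.gz" ]; then
    echo "cache hit for $assets_hash, skipping build"
  else
    echo "cache miss for $assets_hash, building"
    build_assets_package "$assets_hash"
  fi
}

cache_assets "abc123"   # first run: no package yet, so it builds
cache_assets "abc123"   # sources unchanged: package exists, build is skipped
cache_assets "def456"   # sources changed: new hash, so it builds again

rm -rf "$registry"
```

The key property is that the hash is the only cache key: whatever string ends up in that position decides whether a build happens.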

The bug

This logic was introduced by !96297 (merged). It was most recently updated by !179950 (merged).

That MR introduced a subtle bug: by changing the order of setting $GITLAB_ASSETS_HASH and sourcing scripts/gitlab_component_helpers.sh, it left the helper library unable to read $GITLAB_ASSETS_HASH, so the library falls back to the string "NO_HASH".

Nothing fails when no hash is supplied, so we compute a package URL containing the string NO_HASH. The job then publishes a package to that URL, and on every subsequent run it skips re-compiling assets because a package already exists under NO_HASH.
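The ordering problem can be reproduced in isolation. The sketch below assumes a helper that reads the hash at the moment it is sourced and falls back to NO_HASH; the temporary helper file and the `ASSETS_PACKAGE_NAME` variable are hypothetical stand-ins, not the contents of scripts/gitlab_component_helpers.sh.

```shell
set -eu

# Hypothetical helper: captures the hash once, at the time it is sourced.
helper="$(mktemp)"
cat > "$helper" <<'EOF'
ASSETS_PACKAGE_NAME="assets-production-ee-${GITLAB_ASSETS_HASH:-NO_HASH}-v2.tar.gz"
EOF

# Broken order: source the helper first, set the hash afterwards.
broken_name="$(
  unset GITLAB_ASSETS_HASH
  . "$helper"
  GITLAB_ASSETS_HASH="abc123"
  echo "$ASSETS_PACKAGE_NAME"
)"

# Fixed order: set the hash first, then source the helper.
fixed_name="$(
  GITLAB_ASSETS_HASH="abc123"
  . "$helper"
  echo "$ASSETS_PACKAGE_NAME"
)"

rm -f "$helper"
echo "broken: $broken_name"
echo "fixed:  $fixed_name"
```

In the broken order the late assignment has no effect, because the package name was already expanded with the fallback value.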

We can see that behaviour here.

The current cached assets package is 9 days old:

➜  ~ curl -I https://gitlab.com/api/v4/projects/278964/packages/generic/assets/production-ee-NO_HASH/assets-production-ee-NO_HASH-v2.tar.gz

last-modified: Wed, 05 Feb 2025 22:06:11 GMT

Bug impact

The saving grace is that this bug was only introduced for the scheduled job, and not for the jobs consuming that cache. Thus we avoid building and deploying omnibus packages or CNG images which contain a stale cache. We got lucky here.

The only real consequence is that we no longer get any cache hits, so the build process will always need to rebuild assets, even if none changed. This was surfaced as part of gitlab-com/gl-infra/production#19280 (closed).

The fix

This patch fixes the bug by restoring the original order. The cache-assets:production job can then produce valid assets cache packages again, which speeds up builds and deploys whenever no assets changed. That matters most when rolling forward urgent fixes, as it cuts approximately 40 minutes from time-to-production.

Further considerations

Additional measures we should consider for more safety:

  • Check for NO_HASH and bail out.
  • After downloading an assets archive, validate the contained cached-assets-hash against the one from the filesystem.
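Both measures could look roughly like the sketch below. The function names and the idea of a hash file extracted from the archive are assumptions for illustration; neither exists in the current scripts as far as this MR describes.

```shell
set -eu

# Hypothetical guard: bail out when the hash is empty or the fallback value.
require_assets_hash() {
  candidate="${1:-}"
  if [ -z "$candidate" ] || [ "$candidate" = "NO_HASH" ]; then
    echo "Refusing to continue: assets hash is missing or NO_HASH" >&2
    return 1
  fi
}

# Hypothetical check: compare the hash recorded in a downloaded archive
# (passed as a path to the extracted hash file) with the locally computed one.
assets_hash_matches() {
  archive_hash_file="$1"
  expected_hash="$2"
  [ -f "$archive_hash_file" ] && [ "$(cat "$archive_hash_file")" = "$expected_hash" ]
}
```

A job would call `require_assets_hash "$GITLAB_ASSETS_HASH"` before computing any package URL, and `assets_hash_matches` after extracting a downloaded archive, falling back to a full compile when the check fails.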

References

Please include cross links to any resources that are relevant to this MR. This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.

Kudos to @skarbek for highlighting this!

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

n/a

How to set up and validate locally

n/a, we will need to test it in the context of all of the pipelines.

We can look at the pipeline schedules for the next run of [2-hourly] [maintenance] Full test run, Repo caching, Review Apps cleanup, Caches update. It should contain a cache-assets:production job, and that job should not try to download from a NO_HASH package URL. See broken example.
