
ci: Download assets from generic package

What does this MR do and why?

The goal is to implement something similar to !79766 (merged) but for assets.

Related to #371244 (closed).

Current state

This essentially reproduces the built-in caching mechanism that we already have, but with this MR the "cache key" includes the hash sum of all asset files.

Currently, the assets cache has the key assets-debian-${DEBIAN_VERSION}-ruby-${RUBY_VERSION}-node-${NODE_ENV}-v2 and we check the hash sum of all asset files to determine if the cache can be used as-is, or if we need to recompile all assets.

This is pretty inefficient as the asset files change often, so in a lot of cases, an MR cannot use the latest cache from master since it's rebuilt only every 2 hours.
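To make the legacy freshness check concrete, here is a minimal Python sketch of hashing a set of asset files and comparing the result with the hash stored alongside a downloaded cache. The function names, the glob patterns, and the choice of SHA-256 are assumptions for illustration; the actual check lives in the project's own CI scripts and may differ.

```python
import hashlib
from pathlib import Path


def assets_hash(root, patterns):
    """Return one SHA-256 digest covering every file matched by the glob patterns."""
    root = Path(root)
    files = sorted({p for pattern in patterns for p in root.glob(pattern) if p.is_file()})
    digest = hashlib.sha256()
    for path in files:
        # Hash the relative path as well, so that a rename changes the result.
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def cache_is_fresh(hash_stored_in_cache, current_hash):
    """Legacy check: a downloaded cache is reusable only if both hashes match."""
    return hash_stored_in_cache == current_hash
```

Under the legacy strategy this comparison can only happen after the cache has been downloaded, which is why a stale cache costs a download and a full recompilation.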

What this MR improves

With this new strategy, we build a "cache" package on each master pipeline as soon as any asset file is touched. Cache building is also available as a manual job on other master commits and on MRs.

That way, an MR is more likely to find a fresh cache, and nothing is downloaded if the cache package doesn't exist (since we compute the hash sum beforehand instead of downloading "the latest cache" and then checking whether it's usable).
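The decision flow described above can be sketched roughly as follows. The package naming scheme and the `package_exists`, `download`, and `compile_assets` callables are hypothetical placeholders, not the MR's actual implementation; the point is only that the hash is known before any download is attempted.

```python
def package_name(debian_version, ruby_version, node_env, assets_hash):
    """Build a package name that embeds the asset hash (hypothetical scheme)."""
    return (
        f"assets-debian-{debian_version}-ruby-{ruby_version}"
        f"-node-{node_env}-{assets_hash}"
    )


def fetch_or_compile(name, package_exists, download, compile_assets):
    """Download the package when one exists for this hash, otherwise compile."""
    if package_exists(name):
        download(name)
        return "downloaded"
    # No package for this exact hash: skip the download entirely and compile.
    compile_assets()
    return "compiled"
```

Because the hash is part of the name, a package that exists is usable by construction, so no post-download freshness check is needed.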

Notes:

  • The new strategy is gated by the CACHE_ASSETS_AS_PACKAGE variable being set to true, to ensure we can fall back to the legacy strategy in case we have any issue.
  • We continue to download the "legacy" cache until we switch 100% to this new strategy, so that if we fall back to the legacy strategy, everything continues to work as it does today. This means that in some cases we'd download an outdated cache and then download a fresh cache from the package registry, but that's temporary.
  • If we were able to include all the asset files as dependencies for the cache key, we could use the native caching feature, but it's currently limited to 2 files.
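As a rough illustration of the first two notes, the sketch below lists the cache steps a job would run depending on the gate. Only the CACHE_ASSETS_AS_PACKAGE variable name and its "true" value come from this description; the function name and step labels are made up for the example.

```python
def cache_steps(env):
    """Return the ordered cache steps for a job, given its environment mapping."""
    # The legacy cache is always fetched for now, so that falling back
    # to the legacy strategy keeps working.
    steps = ["download legacy cache"]
    if env.get("CACHE_ASSETS_AS_PACKAGE") == "true":
        steps.append("download assets package")
    return steps
```

In a real job the mapping would be `os.environ`; passing it in explicitly keeps the sketch easy to test.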

Performance improvements

The new strategy removes the need to run gettext:po_to_json before checking whether the cache is fresh. gettext:po_to_json generates files under app/assets/javascripts/locale/**/app.js, which are currently part of the hash sum calculation. With the new strategy, we calculate the hash sum before downloading the cache, so the app/assets/javascripts/locale/**/app.js files aren't part of the cache key and don't need to be generated before calculating it. Note that locale/gitlab.pot was added to the dependency files for the cache key since app/assets/javascripts/locale/**/app.js depends on it.
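One way to picture the change in hash inputs: generated locale files are dropped from the list, and their source, locale/gitlab.pot, is added instead. The two paths come from this description; the helper names and the list-based interface are assumptions for the sketch.

```python
GENERATED_LOCALE_PREFIX = "app/assets/javascripts/locale/"
LOCALE_SOURCE = "locale/gitlab.pot"


def is_generated_locale_file(path):
    """True for files matching app/assets/javascripts/locale/**/app.js."""
    return path.startswith(GENERATED_LOCALE_PREFIX) and path.endswith("/app.js")


def cache_key_inputs(asset_paths):
    """Drop generated locale files and depend on their source file instead."""
    inputs = {p for p in asset_paths if not is_generated_locale_file(p)}
    inputs.add(LOCALE_SOURCE)
    return sorted(inputs)
```

Since none of the remaining inputs are generated, the hash can be computed on a fresh checkout without running gettext:po_to_json first.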

This should save 34 seconds for test assets compilation, and 48 seconds for production assets compilation.

With a fresh cache from package, the performance should be similar to what's currently happening with a fresh legacy cache, but the point here is that a lot more MRs will be able to use a fresh cache.

Test matrix

{
    "fields" : [
        {"key": "a", "label": "Legacy cache"},
        {"key": "b", "label": "New strategy"},
        {"key": "c", "label": "Assets package"},
        {"key": "d", "label": "`compile-test-assets` job"},
        {"key": "e", "label": "Duration", "sortable": true}
    ],
    "items" : [
      {"a": "Empty", "b": "Disabled", "c": "N/A", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3070853080", "e": "7m 49s"},
      {"a": "Empty", "b": "Enabled", "c": "Absent", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3071663274", "e": "8m 6s"},
      {"a": "Empty", "b": "Enabled", "c": "Present (caching job: https://gitlab.com/gitlab-org/gitlab/-/jobs/3071742224)", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3071776607", "e": "2m 56s"},
      {"a": "Fresh (caching job: https://gitlab.com/gitlab-org/gitlab/-/jobs/3074737586)", "b": "Disabled", "c": "N/A", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3074844815", "e": "1m 59s"},
      {"a": "Fresh", "b": "Enabled", "c": "Absent", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3075027161", "e": "2m 33s"},
      {"a": "Fresh", "b": "Enabled", "c": "Present (caching job: https://gitlab.com/gitlab-org/gitlab/-/jobs/3075027146)", "d": "https://gitlab.com/gitlab-org/gitlab/-/jobs/3075150789", "e": "2m 48s"}
    ]
}
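For readers who want to compare the rows numerically, this small sketch converts the durations above to seconds. The durations are copied from the table; the helper name is arbitrary.

```python
import re


def to_seconds(duration):
    """Convert a duration such as '7m 49s' into a number of seconds."""
    match = re.fullmatch(r"(\d+)m (\d+)s", duration)
    if match is None:
        raise ValueError(f"unexpected duration format: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


# Empty legacy cache: new strategy disabled vs. enabled with a package present.
saved = to_seconds("7m 49s") - to_seconds("2m 56s")
print(saved)  # 293

# Empty legacy cache: extra cost when the new strategy finds no package.
overhead = to_seconds("8m 6s") - to_seconds("7m 49s")
print(overhead)  # 17
```

These are single job runs, so the differences are indicative rather than precise measurements.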

Previous tests

Legacy cache | New strategy | Assets package | compile-test-assets job | Duration
Empty | Disabled | N/A | https://gitlab.com/gitlab-org/gitlab/-/jobs/2941006395 | 7m 34s
Empty | Enabled | Absent | https://gitlab.com/gitlab-org/gitlab/-/jobs/2941033418 | 8m 26s
Empty | Enabled | Present | https://gitlab.com/gitlab-org/gitlab/-/jobs/2941118730 | 2m 26s
Fresh | Disabled | N/A | https://gitlab.com/gitlab-org/gitlab/-/jobs/2941371887 | 3m 33s
Fresh | Enabled | Absent | https://gitlab.com/gitlab-org/gitlab/-/jobs/2941376574 | 2m 25s
Fresh | Enabled | Present | https://gitlab.com/gitlab-org/gitlab/-/jobs/2941399334 | 2m 56s
  1. With no legacy cache

Next steps

  • Periodically delete old packages: https://docs.gitlab.com/ee/api/packages.html#delete-a-project-package => #375606 (closed)
  • What about mirrors (including on other instances?) and forks: these should always try to download from the canonical package registry, and never upload.
  • Stop downloading/uploading the "legacy" cache (for now we keep downloading/uploading it so that we can fall back to it if needed), and remove the CACHE_ASSETS_AS_PACKAGE == "true" gatekeeper
  • We could generate cache packages for every MR commit, to maximize the chances for MRs to use a fresh cache, but that would increase the number of cache packages a lot (so we should also be aggressive in deleting cache packages if we do that)
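For the first step above, the selection of packages to delete could look something like this sketch. It assumes each package is a dict with `id` and `created_at` (ISO 8601) fields, as a listing from the packages API might provide, and the retention period is a made-up parameter; the actual deletion calls are left out.

```python
from datetime import datetime, timedelta, timezone


def packages_to_delete(packages, now, max_age):
    """Return the ids of packages created more than max_age before now."""
    stale = []
    for package in packages:
        # Normalize a trailing "Z" so that fromisoformat accepts the timestamp
        # on older Python versions as well.
        created_at = datetime.fromisoformat(package["created_at"].replace("Z", "+00:00"))
        if now - created_at > max_age:
            stale.append(package["id"])
    return stale
```

A scheduled job could call this with `datetime.now(timezone.utc)` and, for example, `timedelta(days=7)`, then delete each returned id. A shorter retention period would be needed if packages were built for every MR commit.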

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Rémy Coutable
