Improve documentation for caching/artifacts

Description

People are really confused about caching and artifacts on GitLab CI. Let's improve the documentation to make it clear how to get the most out of the cache and artifacts, and when to use each.

We do support caching of gems and other files. There are a few tricks about file locations, but once you get the hang of it, it works well. See https://gitlab.com/gitlab-org/gitlab-ci-yml/blob/master/Ruby.gitlab-ci.yml for a Ruby example. It's actually one of the sources for our new CI configuration templates! But sometimes you don't really want caching, you want artifacts. Caching is an optimization and isn't guaranteed to always work, so you need to be prepared to regenerate any cached files in each job that needs them.
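
To make the "file locations" trick concrete, here is a minimal sketch of gem caching. It is not a copy of the linked template. The `vendor/bundle` path, the image tag, and the job name are assumptions for illustration. The important part is that cached paths have to live inside the project directory, so Bundler is pointed at a local folder instead of the system gem location.

```yaml
# Sketch only: the path, image tag, and job name are illustrative assumptions.
image: ruby:latest

cache:
  paths:
    - vendor/bundle        # must be inside the project directory to be cacheable

rspec:
  script:
    # Always run the install step: if the cache is missing, this regenerates it;
    # if the cache is present, this is fast because the gems are already there.
    # (Newer Bundler versions prefer `bundle config set path vendor/bundle`.)
    - bundle install --path vendor/bundle
    - bundle exec rspec
```

Because the cache may be absent on any given run, the job never assumes the gems are already there. It installs them every time and simply gets faster when the cache hits.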

Artifacts, on the other hand, are guaranteed to be available. It's sometimes confusing because the name "artifact" sounds like something that is only useful outside of the build, like a final image you download. But artifacts are also available in between stages within a build. So if you "build" your application by downloading all the required modules, you might want to declare them as artifacts so that each subsequent stage can depend on them being there. There are some optimizations, like declaring an expiry time so you don't keep artifacts around too long, and using dependencies to control exactly which jobs the artifacts are passed to. Again, it's a complicated subject that is poorly documented. I'd be happy to help you figure it out for your use case (and then publish the learnings).
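
A rough sketch of what that could look like, assuming an npm-based project. The stage and job names, the one-week expiry, and `./deploy.sh` are hypothetical and only there to show the shape:

```yaml
# Sketch only: job names, expiry value, and deploy.sh are illustrative assumptions.
stages:
  - build
  - test
  - deploy

build:
  stage: build
  script:
    - npm install
  artifacts:
    paths:
      - node_modules/      # passed on to jobs in later stages
    expire_in: 1 week      # don't keep the artifacts around longer than needed

test:
  stage: test
  dependencies:
    - build                # only fetch artifacts from the build job
  script:
    - npm test

deploy:
  stage: deploy
  dependencies: []         # this job needs no artifacts, so skip the download
  script:
    - ./deploy.sh          # hypothetical deploy script
```

Here `test` can rely on `node_modules/` being present, which a cache would not guarantee, while `deploy` opts out of downloading artifacts altogether.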

Proposal

  • Document good caching practices, including setting a constant cache key to re-use the cache across branches and jobs.
  • Document how to cache gems and npm modules.
  • Document what caches are good for (not guaranteed, best-effort, local-only by default).
  • Document how to use S3 to share cache between runners. (Admin)
  • Document what artifacts are good for (guaranteed, stored on GitLab server).
  • Document using artifacts for passing files between jobs within the same pipeline.
  • Encourage artifact expiry for controlling disk usage.
  • Document dependencies to minimize downloading artifacts.
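
As a starting point for the first two items, a constant cache key for npm modules could be sketched like this. The key string, image tag, and job name are made up for illustration; any fixed value would behave the same way:

```yaml
# Sketch only: the key string, image tag, and job name are illustrative assumptions.
image: node:latest

cache:
  key: "shared-node-modules"   # constant key: the same cache is reused across branches and jobs
  paths:
    - node_modules/

test:
  script:
    - npm install              # still runs every time, since the cache is best-effort
    - npm test
```

The trade-off is that branches with very different dependencies will share (and overwrite) one cache, so the documentation should probably mention when a per-branch key is the better choice.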

Links / references

/cc @axil