Improve documentation for caching/artifacts
Description
People are really confused about caching and artifacts on GitLab CI. Let's improve the documentation to make it clear how to get the most out of the cache and artifacts, and when to use each.
We do support caching of gems and other files. There are tricks about file locations, but once you get the hang of it, it works well. See https://gitlab.com/gitlab-org/gitlab-ci-yml/blob/master/Ruby.gitlab-ci.yml for a Ruby example. It's actually one of the sources for our new CI configuration templates! But sometimes you don't really want caching; you want artifacts. Caching is an optimization and isn't guaranteed to always work, so you need to be prepared to regenerate any cached files in each job that needs them.
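As a rough illustration of the gem caching described above, here is a minimal sketch of a `.gitlab-ci.yml`. The job name `rspec` and the cache key `gems` are made up for the example, and it assumes a Bundler version that still accepts `--path` (newer versions prefer `bundle config set --local path vendor`). The file-location trick is that cached paths have to live inside the project directory, which is why the gems are installed under `vendor/`.

```yaml
# Sketch only: job name and cache key are illustrative assumptions.
cache:
  key: "gems"
  paths:
    - vendor/ruby        # must be inside the project directory to be cacheable

rspec:
  script:
    # Always run the install step. If the cache was restored this is fast;
    # if it was not, the gems are simply downloaded again.
    - bundle install --path vendor
    - bundle exec rspec
```

Because the job always runs `bundle install`, it still passes when the cache is missing, which matches the "be prepared to regenerate" advice.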
Artifacts, on the other hand, are guaranteed to be available. It's sometimes confusing because the name `artifact` sounds like something that is only useful outside of the build, like for downloading a final image. But artifacts are also available in between stages within a build. So if you "build" your application by downloading all the required modules, you might want to declare them as `artifacts` so that each subsequent stage can depend on them being there. There are some optimizations like declaring an expiry time so you don't keep artifacts around too long, and using `dependencies` to control exactly where artifacts are passed around. Again, complicated subject that is poorly documented. I'd be happy to help you figure it out for your use case (and then publish the learnings).
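To make the artifacts idea more concrete, here is a hedged sketch of passing downloaded modules from a build stage to a later stage, with an expiry time and `dependencies`. The job names (`install_modules`, `unit_tests`, `check_docs`), the one-hour expiry, and the use of npm are all assumptions for illustration, not a recommended setup.

```yaml
# Sketch only: job names, stages, and the expiry value are illustrative.
stages:
  - build
  - test

install_modules:
  stage: build
  script:
    - npm install
  artifacts:
    paths:
      - node_modules/
    expire_in: 1 hour    # don't keep the modules on the server longer than needed

unit_tests:
  stage: test
  dependencies:
    - install_modules    # fetch artifacts from this job only
  script:
    - npm test

check_docs:
  stage: test
  dependencies: []       # this job needs no artifacts, so skip the download
  script:
    - echo "nothing from earlier stages is required here"
```

Without `dependencies`, jobs in later stages download the artifacts of all jobs from earlier stages, so listing only what a job needs saves transfer time.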
Proposal
- Document good caching practices, including setting a constant cache key to re-use the cache across branches and jobs.
- Document how to cache gems and npm modules.
- Document what caches are good for (not guaranteed, best-effort, local to the runner by default).
- Document how to use S3 to share the cache between runners. (Admin)
- Document what artifacts are good for (guaranteed, stored on GitLab server).
- Document using artifacts for passing files between jobs within the same pipeline.
- Encourage artifact expiry for controlling disk usage.
- Document `dependencies` to minimize downloading artifacts.
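For the first two bullets, something like the following could serve as a starting point for the docs: a minimal sketch of a constant cache key shared across branches and jobs, here for npm modules. The key name `npm-shared` and the job name `test` are hypothetical, and the default key behaviour differs between GitLab versions, so setting the key explicitly is the safer assumption.

```yaml
# Sketch only: the key and job name are illustrative assumptions.
cache:
  key: "npm-shared"      # constant key, so every branch and job reuses one cache
  paths:
    - node_modules/

test:
  script:
    - npm install        # cheap when the cache was restored, complete when it was not
    - npm test
```

A per-branch cache would instead use a variable in the key, for example `key: "$CI_COMMIT_REF_SLUG"`, trading a lower hit rate for isolation between branches.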
Links / references
/cc @axil