Should we implement a (remote) CAS-based SourceCache?
Preliminary
This issue is an alternative response to #418 (closed), namely, @sstriker's closing remark:
Now, closing remark, as I went off topic for this issue. "Should we consider using a CAS-based source cache as a mirror?" I think the answer to that is No, in the current form of the proposal, and with the notion of mirrors having full and perpetual history.
I think we can close this issue, and potentially file a new one that is more explicitly addressing Source Caching as per @tristanvb's response.
Overview
The question, "Should we use a CAS-based SourceCache as a mirror?" has been asked, and after a lengthy discussion (#418 (closed)), the answer has ultimately boiled down to: No.
However, another question could be: "Should we use a (remote) CAS-based SourceCache?", that is, one that developers across a BuildStream project could share and that is explicitly not meant for the preservation of source objects.
This issue aims to provide an overview of the various use-cases which could arise upon the implementation of a remote SourceCache.
@sstriker has previously given a similar overview.
Technical details
Ultimately, we would want to be able to stage sources in the SourceCache in the same way that they would be used by their elements. For example, if we have a tar source, we would like to stage the unpacked tarball into the SourceCache.
Sticking to the tar source example, say element foo, which has a tar source, depends on bar. My understanding is that, to build foo, we first stage the artifact of bar, and then fetch foo's sources from the SourceCache and stage these into foo's buildroot. Hence, we'll need foo's source in its unpacked form if we want to simply 'fetch' it.
So with an empty SourceCache, bst fetch should fetch the upstream sources and stage them into the SourceCache in a ready-to-be-used (by an element) state.
If upstream changes, bst track will take care of this.
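To make the "stage the unpacked tarball" idea concrete, here is a minimal sketch, not BuildStream's actual implementation. It assumes a plain dict as a toy content-addressed store and SHA-256 as the digest; the function names stage_tarball and make_example_tarball are hypothetical. Each regular file in the tarball is stored under its content hash, and the returned mapping of path to digest stands in for the staged tree that an element would later check out.

```python
import hashlib
import io
import tarfile


def stage_tarball(tar_bytes: bytes, cas: dict) -> dict:
    """Unpack a tarball in memory and store each regular file by content hash.

    `cas` is a plain dict used as a toy object store (an assumption of this
    sketch). Returns a mapping of member path -> hex digest.
    """
    tree = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # this sketch ignores directories, symlinks, etc.
            data = tar.extractfile(member).read()
            digest = hashlib.sha256(data).hexdigest()
            cas[digest] = data          # identical content is stored only once
            tree[member.name] = digest  # the tree references content by digest
    return tree


def make_example_tarball() -> bytes:
    """Build a small in-memory tarball so the sketch is self-contained."""
    files = [
        ("foo/main.c", b"int main(void) { return 0; }\n"),
        ("foo/README", b"hello\n"),
    ]
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, content in files:
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


if __name__ == "__main__":
    cas = {}
    tree = stage_tarball(make_example_tarball(), cas)
    for path, digest in sorted(tree.items()):
        print(path, digest[:12])
```

A real SourceCache would also need to record directories, permissions and symlinks; the sketch only shows that what gets cached is the unpacked content rather than the tarball itself.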
Default use-case
We use the local source cache (typically stored at ~/.cache/buildstream/sources), the remote SourceCache and the upstream source repos together, with the default behaviour being:
1. Pull source from the local cache
   - if the source is not present in the local cache
     i) Fetch from remote SourceCache
     ii) Store into the local cache
   - if the source is not present in the remote SourceCache
     i) Fetch from upstream
     ii) Store into remote SourceCache AND local cache
Note: As mentioned in Technical details, the source should be stored into the SourceCache in a state that is ready to be used by the element.
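The default lookup order above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed API: both caches are modelled as plain dicts keyed by a source key, and fetch_source and fetch_upstream are hypothetical names. fetch_upstream is assumed to return the source already staged in its ready-to-be-used form.

```python
from typing import Callable, Dict, Tuple


def fetch_source(
    key: str,
    local: Dict[str, str],
    remote: Dict[str, str],
    fetch_upstream: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (staged_source, origin) following the default use-case order."""
    # 1. Pull from the local cache when possible.
    if key in local:
        return local[key], "local"

    # Local miss: fetch from the remote SourceCache and store locally.
    if key in remote:
        local[key] = remote[key]
        return local[key], "remote"

    # Remote miss: fetch from upstream, then populate both caches.
    staged = fetch_upstream(key)
    remote[key] = staged
    local[key] = staged
    return staged, "upstream"


if __name__ == "__main__":
    local, remote = {}, {}
    upstream = lambda key: "staged-" + key
    print(fetch_source("foo", local, remote, upstream)[1])  # first call goes upstream
    print(fetch_source("foo", local, remote, upstream)[1])  # second call hits local
```

The further use-cases below amount to skipping one or more of these three tiers.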
Further caching use-cases
- Giving preference to the remote SourceCache but allowing access to upstream, no local cache
- Only upstream
- Only local
- Only ever use remote SourceCache, no ability to use upstream
1. Preference to remote SourceCache (but allowing access to upstream)
As a user with little to no local storage space, I want to be able to pull all sources from the SourceCache without having them stored on my local machine, but also to have the ability to fetch/track missing sources from upstream and place them into the SourceCache.
2. Only use upstream sources
Probably an unlikely use-case, but... As a user who only wants to run CI jobs cheaply, and hasn't got access to a remote SourceCache, I want to be able to use only the upstream repo.
3. Only local
Another unlikely use-case, but... As a user who will be completely offline and out of office, I want to be able to run everything off of a local cache.
4. No accessing upstream sources
As a user with access to a remote SourceCache with a fast network, I want to fetch only from the remote SourceCache, and not fetch upstream. This should raise a warning if the source is missing.
5. Remote Execution
As a user with access to a remote SourceCache, I want to fetch only what is strictly necessary to perform remote builds.
Background: remote execution only requires the Directory to perform virtual staging, so we can avoid transferring the actual source files from the remote SourceCache.
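As a rough illustration of why only the Directory is needed, the sketch below walks a tree of directory nodes by digest without ever downloading a file blob. It is a toy model: directory nodes are JSON-encoded dicts standing in for the protobuf Directory message, the remote is a plain dict keyed by SHA-256 digest, and put_blob, put_directory and fetch_directories are hypothetical names.

```python
import hashlib
import json


def put_blob(store: dict, data: bytes) -> str:
    """Store raw bytes under their SHA-256 digest and return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


def put_directory(store: dict, files: dict, directories: dict) -> str:
    """Store a directory node: file name -> digest, subdirectory name -> digest."""
    node = {"files": files, "directories": directories}
    return put_blob(store, json.dumps(node, sort_keys=True).encode())


def fetch_directories(remote: dict, root_digest: str) -> dict:
    """Fetch every directory node reachable from the root, and no file blobs."""
    fetched = {}
    pending = [root_digest]
    while pending:
        digest = pending.pop()
        if digest in fetched:
            continue
        node = json.loads(remote[digest])
        fetched[digest] = node
        # Only subdirectory digests are followed; file digests are recorded
        # in the node but their content is never requested.
        pending.extend(node["directories"].values())
    return fetched


if __name__ == "__main__":
    remote = {}
    main_c = put_blob(remote, b"int main(void) { return 0; }\n")
    src = put_directory(remote, {"main.c": main_c}, {})
    root = put_directory(remote, {}, {"src": src})
    fetched = fetch_directories(remote, root)
    print(len(fetched), "directory nodes fetched;", main_c in fetched)
```

The client ends up knowing every file's name and digest, which is enough to describe the staged tree to a remote worker, while the file contents stay in the remote SourceCache.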