WIP: Refactor artifactcache
Update: turns out the latter half of this isn't needed for source cache and so has been postponed for now
Description
Raised in #802 (closed) to help implement #440 (closed), this refactors the artifact cache and CAS-related modules so that a remote CAS can more easily be used to save to a local source cache.
This involves some new API to allow the artifact cache to interact with the remote and local CAS separately. Existing methods that were local only will remain in CASCache, while those that were remote only will move into CASRemote. For example, add_object will remain in CASCache while get_reference is moved entirely to CASRemote.
I've only listed API that has changed, so fewer methods are listed for CASCache.
Remote API:
- get_reference: Given an element's reference, return the root directory digest.
- update_reference: Replace a reference with a new digest.
- get_tree_blob: Fetch a Tree message given its digest.
- yield_directory_digests: Provide an iterator over the blob digests, given a root directory digest.
- yield_tree_digests: Similarly provide an iterator over blob digests, but this time given a Tree message.
- request_blob: Given a blob digest, decide whether to add the blob to a batch (and maybe trigger downloading the batch) or use bytestream. Downloads go to a temporary directory, tmp, located in the buildstream cache.
- get_blobs: Iterator over any downloaded blobs, returning the temporary file path. The complete_batch argument is used for the final request and will trigger a batch download regardless of current size.
- upload_blob: Upload a blob; decides whether to add the blob to a batch (and maybe trigger sending the batch) or use the bytestream.
- send_update_batch: Upload anything left in the batch; must be called after using upload_blob.
- find_missing_blob: Makes a FindMissingBlobs request and returns a dictionary of missing blobs.
Local API:
- yield_directory_digests: Similarly provide an iterator over the blob digests, this time reading from the local CAS.
- check_blob: Given a digest, check whether a blob is in the local cache.
- read_blob: Read a blob from a given digest and yield its data.
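To make the local side concrete, here is a small sketch of check_blob and read_blob over a hash-addressed directory. It is an illustration under assumptions rather than the CASCache implementation: the LocalCAS class, the objects/<two hex chars>/<rest> layout, the use of plain SHA-256 hex strings as digests, the buffer-only add_object and the chunk_size parameter are all hypothetical.

```python
import hashlib
import os
import tempfile


class LocalCAS:
    """Toy content-addressed store; the on-disk layout here is an assumption."""

    def __init__(self, root):
        self._root = root

    def _objpath(self, digest_hash):
        # Assumed layout: objects/<first two hex chars>/<remaining chars>
        return os.path.join(self._root, "objects", digest_hash[:2], digest_hash[2:])

    def add_object(self, *, buffer):
        # Simplified: only accepts in-memory bytes and returns the hex digest.
        digest_hash = hashlib.sha256(buffer).hexdigest()
        path = self._objpath(digest_hash)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(buffer)
        return digest_hash

    def check_blob(self, digest_hash):
        # A blob is "in the cache" if its object file exists.
        return os.path.exists(self._objpath(digest_hash))

    def read_blob(self, digest_hash, chunk_size=64 * 1024):
        # Yield the blob's data in chunks so large blobs are never fully in memory.
        with open(self._objpath(digest_hash), "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    return
                yield chunk
```

Yielding chunks from read_blob keeps memory use bounded, which matters if the data is handed straight to an upload call.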
As an example, in ArtifactCache there is a private method, _fetch_directory, which, given a remote, a root directory digest and excluded subdirs, downloads the blobs in that directory and adds them to the local cache:
    def _fetch_directory(self, remote, root_digest, excluded_subdirs):
        for blob_digest in remote.yield_directory_digests(
                root_digest, excluded_subdirs=excluded_subdirs):
            if self.cas.check_blob(blob_digest):
                continue

            remote.request_blob(blob_digest)
            for blob_file in remote.get_blobs():
                self.cas.add_object(path=blob_file.name, link_directly=True)

        # Request final CAS batch
        for blob_file in remote.get_blobs(complete_batch=True):
            self.cas.add_object(path=blob_file.name, link_directly=True)
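The push direction would presumably mirror this using the local yield_directory_digests together with the remote find_missing_blob, upload_blob and send_update_batch. The sketch below is a guess at that shape, not code from this merge request: the push_directory function, the argument lists of every method, the FakeCAS and FakeRemote test doubles and the use of plain strings as digests are all assumptions made for illustration.

```python
def push_directory(cas, remote, root_digest):
    """Upload every blob under root_digest that the remote lacks (sketch)."""
    digests = list(cas.yield_directory_digests(root_digest))
    # Assumed: find_missing_blob takes digests and returns a dict keyed by digest.
    missing = remote.find_missing_blob(digests)
    for digest in missing:
        remote.upload_blob(digest, b"".join(cas.read_blob(digest)))
    # Flush whatever is still sitting in a partial batch.
    remote.send_update_batch()
    return len(missing)


class FakeCAS:
    """Hypothetical in-memory stand-in for the local CAS."""

    def __init__(self, trees, blobs):
        self._trees = trees   # root digest -> list of blob digests
        self._blobs = blobs   # blob digest -> bytes

    def yield_directory_digests(self, root_digest):
        yield from self._trees[root_digest]

    def read_blob(self, digest):
        yield self._blobs[digest]


class FakeRemote:
    """Hypothetical in-memory stand-in for the remote CAS."""

    def __init__(self, stored=None):
        self.stored = dict(stored or {})
        self._batch = []

    def find_missing_blob(self, digests):
        return {d: d for d in digests if d not in self.stored}

    def upload_blob(self, digest, data):
        # Queue only; nothing reaches the remote until the batch is sent.
        self._batch.append((digest, data))

    def send_update_batch(self):
        self.stored.update(self._batch)
        self._batch = []


cas = FakeCAS({"root": ["a", "b", "c"]}, {"a": b"1", "b": b"2", "c": b"3"})
remote = FakeRemote(stored={"a": b"1"})
uploaded = push_directory(cas, remote, "root")
```

Asking the remote what it is missing before uploading avoids resending blobs it already holds, and the final send_update_batch call matters because upload_blob alone may leave a partial batch unsent.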
Changes proposed in this merge request:
- Rename directory _artifactcache to _cas and move artifactcache.py to a root level module _artifactcache.py.
- Move CASRemote into its own module casremote.py.
- Move remote logic out of CASCache and into CASRemote
  - To replace pull functionality
    - remote yield_directory_digests
    - remote request_blob
    - remote get_blobs
    - local check_blob
  - To replace pull_tree functionality
    - remote yield_tree_digests
  - To replace push functionality
    - remote upload_blobs
    - local yield_directory_digests
  - To replace
- Update artifact cache to use new API
  - pull
  - pull_tree
  - push
  - push_directory
- Squash commits
- Add and/or modify tests to ensure the tmp folder is cleared out
This merge request, when approved, will close: #802 (closed)