WIP: Refactor artifactcache
Update: it turns out the latter half of this isn't needed for the source cache, so it has been postponed for now.
Description
Raised in #802 (closed) to help the implementation of #440 (closed), this will be a refactor of the artifact cache and CAS-related modules that should make it much easier to use a remote CAS to save to a local source cache.
This involves some new API that lets the artifact cache interact with the remote and local CAS separately. Methods that were local-only will remain in CASCache, while those that were remote-only will move into CASRemote; for example, add_object will remain in CASCache, while get_reference moves entirely to CASRemote.
I've only listed the API that has changed, so fewer methods are listed for CASCache.
Remote API:
- get_reference: Given an element's reference, return the root directory digest.
- update_reference: Replace a reference with a new digest.
- get_tree_blob: Fetch a Tree message given its digest.
- yield_directory_digests: Provide an iterator over the blob digests, given a root directory digest.
- yield_tree_digests: Similarly, provide an iterator over blob digests, but this time given a Tree message.
- request_blob: Given a blob digest, decide whether to add the blob to a batch (possibly triggering a download of the batch) or to use ByteStream. Downloads go to a temporary directory, tmp, located in the BuildStream cache.
- get_blobs: Iterate over any downloaded blobs, returning the temporary file path. The complete_batch argument is used for the final request and triggers a batch download regardless of the current batch size.
- upload_blob: Upload a blob, deciding whether to add it to a batch (possibly triggering sending the batch) or to use ByteStream.
- send_update_batch: Upload anything left in the batch; must be called after using upload_blob.
- find_missing_blob: Perform a FindMissingBlobs request and return a dictionary of missing blobs.
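To make the request_blob / get_blobs contract more concrete, here is a minimal sketch of download-side batching. It is not the CASRemote implementation: the in-memory server dict stands in for the remote CAS, batch_limit is an assumed maximum batch size in bytes, the Digest namedtuple stands in for the REAPI Digest message, and SketchRemote, make_digest and batch_requests are hypothetical names. Blobs over the limit are fetched on their own (where the real code would use ByteStream); everything else is queued until adding the next blob would overflow the batch.

```python
import hashlib
import tempfile
from collections import namedtuple

# Stand-in for the REAPI Digest message (assumption for this sketch).
Digest = namedtuple("Digest", ["hash", "size_bytes"])


def make_digest(data):
    return Digest(hashlib.sha256(data).hexdigest(), len(data))


class SketchRemote:
    """Hypothetical remote: `server` maps hash -> bytes in place of a real
    CAS server, and `batch_limit` is an assumed maximum batch size."""

    def __init__(self, server, tmpdir, batch_limit=1024):
        self.server = server
        self.tmpdir = tmpdir
        self.batch_limit = batch_limit
        self._batch = []          # digests queued for the next batch read
        self._batch_size = 0
        self._ready = []          # downloaded blobs not yet handed out
        self.batch_requests = 0   # counts simulated batch reads

    def _write_tmp(self, data):
        # Downloads land in the cache's tmp directory as temporary files.
        f = tempfile.NamedTemporaryFile(dir=self.tmpdir, delete=False)
        f.write(data)
        f.close()
        return f

    def _flush_batch(self):
        if not self._batch:
            return
        self.batch_requests += 1
        for digest in self._batch:
            self._ready.append(self._write_tmp(self.server[digest.hash]))
        self._batch = []
        self._batch_size = 0

    def request_blob(self, digest):
        if digest.size_bytes > self.batch_limit:
            # Too large for a batch: fetch it individually (ByteStream in REAPI).
            self._ready.append(self._write_tmp(self.server[digest.hash]))
            return
        if self._batch_size + digest.size_bytes > self.batch_limit:
            # This blob would overflow the batch, so download the batch first.
            self._flush_batch()
        self._batch.append(digest)
        self._batch_size += digest.size_bytes

    def get_blobs(self, complete_batch=False):
        if complete_batch:
            # Final call: download whatever is still queued.
            self._flush_batch()
        ready, self._ready = self._ready, []
        yield from ready
```

In this sketch a queued blob only becomes available from get_blobs() once its batch has been downloaded, which is why the caller needs the final get_blobs(complete_batch=True) call to drain the last, partially filled batch.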
Local API:
- yield_directory_digests: Similarly, provide an iterator over the blob digests, this time reading from the local CAS.
- check_blob: Given a digest, check whether a blob is in the local cache.
- read_blob: Read a blob from a given digest and yield its data.
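The local side can be sketched in the same way. This assumes a file-based CAS that stores each blob under objects/<first two hex characters>/<remaining characters> of its SHA-256 hash, and it encodes directories as JSON purely so the example runs without protobuf; the real CAS stores Directory protobuf messages. SketchLocalCAS, add_bytes and the JSON layout are hypothetical, and the sketch assumes excluded_subdirs applies only to the root directory's immediate children.

```python
import hashlib
import json
import os
from collections import namedtuple

# Stand-in for the REAPI Digest message (assumption for this sketch).
Digest = namedtuple("Digest", ["hash", "size_bytes"])


class SketchLocalCAS:
    """Hypothetical local CAS rooted at `casdir`."""

    def __init__(self, casdir):
        self.casdir = casdir

    def objpath(self, digest):
        # objects/<first two hex chars>/<rest of the hash>
        return os.path.join(self.casdir, "objects", digest.hash[:2], digest.hash[2:])

    def add_bytes(self, data):
        # Example-only helper for populating the store.
        digest = Digest(hashlib.sha256(data).hexdigest(), len(data))
        path = self.objpath(digest)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        return digest

    def check_blob(self, digest):
        return os.path.exists(self.objpath(digest))

    def read_blob(self, digest, chunk_size=65536):
        # Yield the blob's data in chunks rather than reading it all at once.
        with open(self.objpath(digest), "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk

    def yield_directory_digests(self, root_digest, excluded_subdirs=None):
        excluded = set(excluded_subdirs or ())
        # The directory blob itself must be transferred too.
        yield root_digest
        directory = json.loads(b"".join(self.read_blob(root_digest)))
        for entry in directory["files"]:
            yield Digest(entry["hash"], entry["size"])
        for entry in directory["directories"]:
            if entry["name"] in excluded:
                continue
            # Recurse; exclusions only apply at the root level.
            yield from self.yield_directory_digests(Digest(entry["hash"], entry["size"]))
```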
As an example, ArtifactCache has a private method, _fetch_directory, which, given a remote, a root directory digest and a list of excluded subdirectories, downloads the blobs in that directory and adds them to the local cache:
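```python
def _fetch_directory(self, remote, root_digest, excluded_subdirs):
    # Walk every blob under root_digest on the remote, skipping excluded subdirs
    for blob_digest in remote.yield_directory_digests(
            root_digest, excluded_subdirs=excluded_subdirs):
        # Skip blobs that are already in the local cache
        if self.cas.check_blob(blob_digest):
            continue
        # Queue the blob; this may trigger a batch download
        remote.request_blob(blob_digest)
        # Add any blobs that have finished downloading to the local cache
        for blob_file in remote.get_blobs():
            self.cas.add_object(path=blob_file.name, link_directly=True)

    # Request final CAS batch
    for blob_file in remote.get_blobs(complete_batch=True):
        self.cas.add_object(path=blob_file.name, link_directly=True)
```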
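The push direction can be composed from the same pieces: walk the local directory with the local yield_directory_digests, ask the remote which blobs it lacks with find_missing_blob, upload only those with upload_blob, then flush with send_update_batch. The sketch below assumes upload_blob takes the digest and the blob's bytes and that find_missing_blob returns a dict keyed by hash; send_directory, FakeRemote and FakeLocalCAS are hypothetical in-memory stand-ins, not BuildStream classes.

```python
from collections import namedtuple

# Stand-in for the REAPI Digest message (assumption for this sketch).
Digest = namedtuple("Digest", ["hash", "size_bytes"])


class FakeLocalCAS:
    """Hypothetical local CAS: `blobs` maps hash -> bytes and `tree` maps
    a root hash -> the digests beneath it."""

    def __init__(self, blobs, tree):
        self.blobs = blobs
        self.tree = tree

    def yield_directory_digests(self, root_digest):
        yield root_digest
        yield from self.tree[root_digest.hash]

    def read_blob(self, digest):
        yield self.blobs[digest.hash]


class FakeRemote:
    """Hypothetical in-memory remote with an assumed batch size limit."""

    def __init__(self, batch_limit=1024):
        self.stored = {}
        self.batch_limit = batch_limit
        self._batch = []
        self._batch_size = 0
        self.batch_updates = 0    # counts simulated batch uploads

    def find_missing_blob(self, digests):
        # Assumed return shape: {hash: digest} for blobs the server lacks.
        return {d.hash: d for d in digests if d.hash not in self.stored}

    def upload_blob(self, digest, data):
        if digest.size_bytes > self.batch_limit:
            # Too large for a batch: upload it individually (ByteStream in REAPI).
            self.stored[digest.hash] = data
            return
        if self._batch_size + digest.size_bytes > self.batch_limit:
            # This blob would overflow the batch, so send the batch first.
            self.send_update_batch()
        self._batch.append((digest, data))
        self._batch_size += digest.size_bytes

    def send_update_batch(self):
        if self._batch:
            self.batch_updates += 1
            for digest, data in self._batch:
                self.stored[digest.hash] = data
        self._batch = []
        self._batch_size = 0


def send_directory(cas, remote, root_digest):
    """Push every blob under root_digest that the remote is missing."""
    digests = list(cas.yield_directory_digests(root_digest))
    missing = remote.find_missing_blob(digests)
    for digest in missing.values():
        remote.upload_blob(digest, b"".join(cas.read_blob(digest)))
    # Send whatever is still queued in the last batch.
    remote.send_update_batch()
```

Calling find_missing_blob once for the whole directory keeps the number of round trips independent of the directory size; that is a choice made in this sketch, and a real implementation might need to split very large directories across several requests.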
Changes proposed in this merge request:
- Rename the directory _artifactcache to _cas and move artifactcache.py to a root-level module, _artifactcache.py.
- Move CASRemote into its own module, casremote.py.
- Move remote logic out of CASCache and into CASRemote.
  - To replace pull functionality:
    - remote yield_directory_digests
    - remote request_blob
    - remote get_blobs
    - local check_blob
  - To replace pull_tree functionality:
    - remote yield_tree_digests
  - To replace push functionality:
    - remote upload_blob
    - local yield_directory_digests
  - To replace
- Update the artifact cache to use the new API:
  - pull
  - pull_tree
  - push
  - push_directory
- Squash commits.
- Add and/or modify tests to ensure the tmp folder is cleared out.
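For the last item, one way to test that the tmp folder is cleared out is to assert that it is empty once downloaded blobs have been added to the object store. This sketch assumes that add_object(path=..., link_directly=True) hard-links the temporary file into the object store, so the temporary name must be removed separately; add_object_linked and test_tmp_cleared_after_add are hypothetical helpers, not BuildStream code.

```python
import os
import tempfile


def add_object_linked(tmp_path, objects_dir):
    """Hypothetical stand-in for add_object(path=..., link_directly=True)."""
    dest = os.path.join(objects_dir, os.path.basename(tmp_path))
    try:
        # Hard-link the downloaded file into the object store.
        os.link(tmp_path, dest)
    finally:
        # Always drop the tmp name, even if linking failed.
        os.unlink(tmp_path)
    return dest


def test_tmp_cleared_after_add():
    with tempfile.TemporaryDirectory() as root:
        tmpdir = os.path.join(root, "tmp")
        objects_dir = os.path.join(root, "objects")
        os.makedirs(tmpdir)
        os.makedirs(objects_dir)

        # Simulate a downloaded blob sitting in tmp.
        with tempfile.NamedTemporaryFile(dir=tmpdir, delete=False) as f:
            f.write(b"blob data")

        dest = add_object_linked(f.name, objects_dir)

        # Nothing should be left behind in tmp, and the blob must survive.
        assert os.listdir(tmpdir) == []
        with open(dest, "rb") as fh:
            assert fh.read() == b"blob data"
```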
This merge request, when approved, will close: #802 (closed)