
WIP: Refactor artifactcache

Update: it turns out the latter half of this isn't needed for the source cache, so it has been postponed for now.

Description

Raised in #802 (closed) to help with the implementation of #440 (closed), this is a refactor of the artifact cache and CAS-related modules. It should make it much easier to use a remote CAS to populate a local source cache.

This involves some new API that allows the artifact cache to interact with the remote and local CAS separately. Existing methods that are local only will remain in CASCache, while those that are remote only will move into CASRemote. For example, add_object will remain in CASCache while get_reference is moved entirely to CASRemote. I've only listed API that has changed, so fewer methods are listed for CASCache.

Remote API:

  • get_reference: Given an element's reference, return the root directory digest.
  • update_reference: Replace a reference with a new digest.
  • get_tree_blob: Fetch a Tree message given its digest.
  • yield_directory_digests: This will provide an iterator over the blob digests, given a root directory digest.
  • yield_tree_digests: Similarly provide an iterator over blob digests, but this time given a tree message.
  • request_blob: Given a blob digest, decide whether to add the blob to a batch (possibly triggering a download of that batch) or to fetch it over bytestream. Downloads go to a temporary directory, tmp, located in the buildstream cache.
  • get_blobs: Iterator over any downloaded blobs, yielding the temporary file for each. The complete_batch argument is used for the final request and triggers a batch download regardless of the current batch size.
  • upload_blob: Upload a blob, deciding whether to add it to a batch (possibly triggering sending of that batch) or to use bytestream.
  • send_update_batch: Upload anything left in the batch. Must be called once after the last call to upload_blob.
  • find_missing_blob: Makes a FindMissingBlobs request and returns a dictionary of missing blobs.
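
To make the request_blob / get_blobs interplay concrete, here is a minimal in-memory sketch. It is not the CASRemote implementation: ToyRemote, MAX_BATCH_BYTES, the Digest tuple and the download counters are all made up for illustration, and no network calls are made. It only shows the decision described above, where small blobs are queued into a batch, oversized blobs are fetched on their own (standing in for bytestream), and complete_batch=True flushes whatever is left.

```python
import tempfile
from collections import namedtuple

# Hypothetical stand-in for a CAS digest (hash plus size in bytes).
Digest = namedtuple("Digest", ["hash", "size_bytes"])

# Hypothetical limit; a real client would take this from the server.
MAX_BATCH_BYTES = 1024


class ToyRemote:
    """In-memory stand-in for a remote CAS, for illustration only."""

    def __init__(self, blobs, tmpdir):
        self._blobs = blobs        # maps digest hash -> bytes
        self._tmpdir = tmpdir      # where "downloads" are written
        self._batch = []           # digests queued for the next batch
        self._batch_size = 0
        self._downloaded = []      # temporary files not yet collected
        self.batch_downloads = 0
        self.stream_downloads = 0

    def _store(self, digest):
        # Write the blob to a temporary file, as a download would.
        tmp = tempfile.NamedTemporaryFile(dir=self._tmpdir, delete=False)
        tmp.write(self._blobs[digest.hash])
        tmp.close()
        self._downloaded.append(tmp)

    def _download_batch(self):
        if not self._batch:
            return
        for digest in self._batch:
            self._store(digest)
        self._batch = []
        self._batch_size = 0
        self.batch_downloads += 1

    def request_blob(self, digest):
        if digest.size_bytes > MAX_BATCH_BYTES:
            # Too large for any batch: fetch individually.
            self._store(digest)
            self.stream_downloads += 1
            return
        if self._batch_size + digest.size_bytes > MAX_BATCH_BYTES:
            # The batch would overflow: download it before queueing more.
            self._download_batch()
        self._batch.append(digest)
        self._batch_size += digest.size_bytes

    def get_blobs(self, complete_batch=False):
        if complete_batch:
            self._download_batch()
        while self._downloaded:
            yield self._downloaded.pop(0)
```

Because get_blobs only yields what has already been downloaded, a caller that forgets the final complete_batch=True call would silently miss the blobs still sitting in the last batch.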

Local API:

  • yield_directory_digests: Similarly provide an iterator over the blob digests, this time reading from the local CAS.
  • check_blob: Given a digest check whether a blob is in the local cache.
  • read_blob: Read a blob from a given digest and yield its data.
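
As a rough illustration of check_blob and read_blob, the sketch below models a local CAS as a directory of files named after their SHA-256 hash. ToyLocalCAS, its add_blob helper, the chunk_size parameter and the objects/<first two hex characters>/<rest> layout are assumptions made for this example, not a description of CASCache.

```python
import hashlib
import os
from collections import namedtuple

# Hypothetical stand-in for a CAS digest (hash plus size in bytes).
Digest = namedtuple("Digest", ["hash", "size_bytes"])


class ToyLocalCAS:
    """Directory-backed stand-in for a local CAS, for illustration only."""

    def __init__(self, casdir):
        self.casdir = casdir

    def objpath(self, digest):
        # Assumed layout: objects/<first two hex chars>/<remaining chars>
        return os.path.join(self.casdir, "objects",
                            digest.hash[:2], digest.hash[2:])

    def add_blob(self, data):
        # Hypothetical helper so the example can populate the store.
        digest = Digest(hashlib.sha256(data).hexdigest(), len(data))
        path = self.objpath(digest)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        return digest

    def check_blob(self, digest):
        # True if the blob is already present locally.
        return os.path.exists(self.objpath(digest))

    def read_blob(self, digest, chunk_size=64 * 1024):
        # Yield the blob's data in chunks rather than loading it at once.
        with open(self.objpath(digest), "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk
```

Yielding chunks keeps memory use bounded for large blobs, which matters if the data is being streamed straight into an upload.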

As an example, in ArtifactCache there is a private method _fetch_directory which, given a remote, a root directory digest and excluded subdirs, downloads the blobs in that directory and adds them to the local cache.

def _fetch_directory(self, remote, root_digest, excluded_subdirs):
    for blob_digest in remote.yield_directory_digests(
            root_digest, excluded_subdirs=excluded_subdirs):
        # Skip blobs that are already in the local CAS
        if self.cas.check_blob(blob_digest):
            continue
        # Queue the blob; this may trigger a batch or bytestream download
        remote.request_blob(blob_digest)
        # Add anything downloaded so far to the local CAS
        for blob_file in remote.get_blobs():
            self.cas.add_object(path=blob_file.name, link_directly=True)

    # Request final CAS batch
    for blob_file in remote.get_blobs(complete_batch=True):
        self.cas.add_object(path=blob_file.name, link_directly=True)

Changes proposed in this merge request:

  • - Rename directory _artifactcache to _cas and move artifactcache.py to a root level module _artifactcache.py.
  • - Move CASRemote into its own module casremote.py.
  • Move remote logic out of CASCache and into CASRemote
    • To replace pull functionality.
    • - remote yield_directory_digests
    • - remote request_blob
    • - remote get_blobs
    • - local check_blob
    • To replace pull_tree functionality
    • - remote yield_tree_digests
    • To replace push functionality
    • - remote upload_blob
    • - local yield_directory_digests
  • Update artifact cache to use new API
    • - pull
    • - pull_tree
    • - push
    • - push_directory
  • - Squash commits
  • - Add and/or modify tests to ensure the tmp folder is cleared out
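
For the push side of the list above, the flow could look roughly like the following sketch, which combines the local yield_directory_digests with the remote find_missing_blob, upload_blob and send_update_batch. Everything here is a toy under stated assumptions: ToyLocal, ToyRemote and push_directory are made-up stand-ins, the method signatures (for example upload_blob taking the digest and its data, and the missing-blob dictionary being keyed by hash) are guesses, and the local "directory" is a flat dictionary rather than real Directory messages.

```python
from collections import namedtuple

# Hypothetical stand-in for a CAS digest (hash plus size in bytes).
Digest = namedtuple("Digest", ["hash", "size_bytes"])


class ToyLocal:
    """Stand-in for the local CAS; holds blobs in a dict."""

    def __init__(self, blobs):
        self.blobs = blobs  # maps Digest -> bytes

    def yield_directory_digests(self, root_digest):
        # A real implementation would walk the directory tree below
        # root_digest; this toy simply yields every blob it holds.
        yield from self.blobs

    def read_blob(self, digest):
        yield self.blobs[digest]


class ToyRemote:
    """Stand-in for the remote CAS; records what has been uploaded."""

    def __init__(self, stored_hashes=()):
        self.stored = set(stored_hashes)
        self.pending = []  # blobs queued in the current batch

    def find_missing_blob(self, digests):
        # Assumed shape: a dict of missing digests keyed by hash.
        return {d.hash: d for d in digests if d.hash not in self.stored}

    def upload_blob(self, digest, data):
        # Queue only; nothing is stored until the batch is sent.
        self.pending.append((digest, data))

    def send_update_batch(self):
        for digest, _data in self.pending:
            self.stored.add(digest.hash)
        self.pending = []


def push_directory(local, remote, root_digest):
    """Upload every blob under root_digest that the remote lacks."""
    digests = list(local.yield_directory_digests(root_digest))
    missing = remote.find_missing_blob(digests)
    for digest in missing.values():
        remote.upload_blob(digest, b"".join(local.read_blob(digest)))
    # Flush whatever is still queued after the last upload_blob call.
    remote.send_update_batch()
    return len(missing)
```

Asking the remote what is missing before uploading avoids resending blobs it already has, and the trailing send_update_batch mirrors the final complete_batch=True call on the pull side.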

This merge request, when approved, will close: #802 (closed)


Edited by Raoul Hidalgo Charman
