
Use buildbox-casd for CAS access

Background

BuildStream currently has its own CAS implementation, covering the local cache as well as client and server support. This implementation has two major issues:

  • Local cache expiry doesn't scale; cleanup takes far too much time: #734
  • Connections to remote CAS servers are not shared among job subprocesses: #810 (closed). This applies to both artifact servers and remote execution.

The BuildBox project is working on a local, caching CAS server called buildbox-casd. It will implement a gRPC protocol called 'LocalCAS' in addition to the standard 'CAS', providing efficient access to the local cache. Local cache expiry will use a simple approach that is expected to be much faster than BuildStream's current local cache expiry.

Task description

Using buildbox-casd for CAS access in BuildStream would solve both major issues. Job subprocesses would still not be able to share a connection to buildbox-casd, but establishing multiple local connections on the same host is much less of an issue. All remote connections would be handled by a single buildbox-casd process.
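
Because each job subprocess opens its own local connection, a connection opened in the parent must never be reused after fork(). The "(gRPC vs. fork())" item in the task list refers to the same problem. Below is a minimal sketch of one way to handle this. It assumes a caller-supplied `connect` factory, for example one that opens a gRPC channel to casd's local socket. `PerProcessConnection` is a hypothetical name, not part of BuildStream.

```python
import os
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class PerProcessConnection(Generic[T]):
    """Hypothetical helper: hand out one local casd connection per process.

    A connection opened in the parent must not be reused after fork(),
    so the cached object is tied to the PID that created it.
    """

    def __init__(self, connect: Callable[[], T]):
        self._connect = connect  # assumed factory, e.g. opens a channel to casd
        self._pid: Optional[int] = None
        self._conn: Optional[T] = None

    def get(self) -> T:
        pid = os.getpid()
        if self._conn is None or self._pid != pid:
            # First use in this process, or first use after a fork:
            # open a fresh connection instead of using the inherited one.
            self._conn = self._connect()
            self._pid = pid
        return self._conn
```

Keying on `os.getpid()` instead of registering fork hooks keeps the helper simple. The check runs on every `get()`, but it costs only an integer comparison.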

The move to buildbox-casd should also improve the performance of CAS operations, because buildbox-casd is written in C++, i.e., compiled code rather than interpreted Python.

BuildStream

  • casd process/socket management
    • Basic start/stop and channel setup
    • Terminate buildbox-casd when bst raises an error (context manager)
  • Use casd for CAS writes
    • Use casd for add_object()
  • Use casd to access remote CAS
    • Use GetInstanceNameForRemote to initialize remote
    • Use FetchMissingBlobs for individual and batch downloads
    • Use UploadMissingBlobs for individual and batch uploads
    • Check availability of remote CAS via casd
  • Avoid CAS writes in the main process (gRPC vs. fork())
    • Use scheduler to fetch subprojects !1414 (merged)
    • bst source checkout: Do not import sources into CAS in main process !1427 (merged)
    • Fix non-frontend tests that write to CAS in the main process
  • CAS expiry
    • Remove old expiry code
    • Quota configuration
    • Local cache / quota status display
  • bst-artifact-server
    • Use casd for timestamp updates
    • Forward requests to casd instead of reimplementing them
    • Verify proper operation with casd
  • Infrastructure
    • Add buildbox-casd to CI images
    • Update installation instructions to include buildbox-casd
    • Consider adding static buildbox-casd binary to Python wheel
  • Optimizations (some may be required for the initial MR)
    • Benchmark
    • Batch CAS writes (add_object() calls) to reduce round trips #1132 (closed)
    • Use FetchTree/UploadTree for directory downloads/uploads
  • Documentation
    • Update architecture docs (user-facing docs do not need updating)
  • Future work (not blocking the initial MR)
    • Use casd for staging
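
The "Terminate buildbox-casd when bst raises an error (context manager)" item can be sketched with `contextlib.contextmanager`. This is a minimal, hypothetical sketch: `casd_process` is not a BuildStream function, and the caller supplies the full command line because buildbox-casd's actual arguments are not covered here. `DEMO_ARGV` is an assumed stand-in that starts a sleeping Python process in place of casd.

```python
import contextlib
import subprocess
import sys

# Stand-in for the real buildbox-casd command line (assumption, not casd's CLI).
DEMO_ARGV = [sys.executable, "-c", "import time; time.sleep(60)"]


@contextlib.contextmanager
def casd_process(argv, timeout=10):
    """Start a long-running helper process and always stop it on exit."""
    proc = subprocess.Popen(argv)
    try:
        yield proc
    finally:
        # Runs on normal exit *and* when the body raises, so an error in
        # bst does not leave an orphaned casd process behind.
        if proc.poll() is None:
            proc.terminate()  # SIGTERM on POSIX
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()  # escalate if the process ignores SIGTERM
                proc.wait()
```

In the real implementation, the body of the `with` block would wrap the whole bst session, including channel setup to casd's socket.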

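For the "Batch CAS writes (add_object() calls) to reduce round trips" item, one approach is to compute each blob's digest locally so that `add_object()` can return immediately, then send the queued blobs to casd in a single request per batch. This is a minimal sketch under stated assumptions: `BatchedWriter`, its `flush` callback, and the 4 MiB default limit are all hypothetical. In practice, the callback might wrap an REAPI `BatchUpdateBlobs` request to casd. `Digest` mirrors the REAPI digest (hex SHA-256 hash plus size in bytes).

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Digest:
    # Mirrors the REAPI Digest message: hex hash plus size in bytes.
    hash: str
    size_bytes: int


def digest_of(data: bytes) -> Digest:
    return Digest(hashlib.sha256(data).hexdigest(), len(data))


class BatchedWriter:
    """Hypothetical helper: queue add_object() payloads, flush in batches."""

    def __init__(self, flush: Callable[[List[Tuple[Digest, bytes]]], None],
                 max_batch_bytes: int = 4 * 1024 * 1024):  # assumed limit
        self._flush = flush
        self._max = max_batch_bytes
        self._pending: List[Tuple[Digest, bytes]] = []
        self._pending_bytes = 0

    def add_object(self, data: bytes) -> Digest:
        # The digest is computed locally, so no round trip is needed here.
        digest = digest_of(data)
        if self._pending and self._pending_bytes + len(data) > self._max:
            self.flush()
        # A blob larger than the limit is still queued, alone in its own
        # batch; a real implementation would stream such blobs instead.
        self._pending.append((digest, data))
        self._pending_bytes += len(data)
        return digest

    def flush(self) -> None:
        if self._pending:
            self._flush(self._pending)  # one round trip per batch
            self._pending = []
            self._pending_bytes = 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Send whatever is left, but only if the block completed normally.
        if exc_type is None:
            self.flush()
        return False
```

A blob is not actually stored in CAS until its batch is flushed. Callers must therefore flush, or leave the `with` block, before handing the returned digests to anything that reads from casd.
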
buildbox-casd

Edited by Jürg Billeter