Use buildbox-casd for CAS access
Background
BuildStream currently has its own CAS implementation with a local cache, client and server support. There are two major issues with this implementation:
- Local cache expiry doesn't scale; cleanup takes far too much time: #734
- Connections to remote CAS servers are not shared among job subprocesses: #810 (closed). This applies to both artifact servers and remote execution.
The BuildBox project is working on a local, caching CAS server called buildbox-casd. It will implement a gRPC protocol called 'LocalCAS' in addition to the standard 'CAS', providing efficient access to the local cache. Local cache expiry will use a simple approach that is expected to be much faster than BuildStream's current local cache expiry.
Task description
Using buildbox-casd in BuildStream for CAS access would solve both issues. While job subprocesses would still not be able to share the connection to buildbox-casd, establishing multiple local connections on the same host is much less of an issue. All remote connections would be handled by a single buildbox-casd process.
The move to buildbox-casd should also improve the performance of CAS operations, since it is written in C++ and compiled to native code rather than running as interpreted Python.
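Keeping a single buildbox-casd process alive for the duration of a bst invocation, and making sure it is terminated when bst raises an error, could be handled with a context manager. The following is a minimal sketch only: `casd_process`, `STAND_IN` and `demo` are hypothetical names, the 15 second timeout is an arbitrary choice, and a sleeping Python child stands in for the real buildbox-casd command line, which is not shown here.

```python
import contextlib
import subprocess
import sys


@contextlib.contextmanager
def casd_process(argv):
    """Start a local daemon and guarantee that it is stopped on exit.

    `argv` is the full command line. A real caller would pass the
    buildbox-casd binary and its arguments (assumption, not shown).
    """
    process = subprocess.Popen(argv)
    try:
        yield process
    finally:
        # Runs on normal exit and when the body raises.
        process.terminate()
        try:
            process.wait(timeout=15)
        except subprocess.TimeoutExpired:
            # Escalate if the daemon does not exit after terminate().
            process.kill()
            process.wait()


# Stand-in child process so the sketch runs without buildbox-casd installed.
STAND_IN = [sys.executable, "-c", "import time; time.sleep(60)"]


def demo():
    """Simulate bst raising an error while the daemon is running."""
    try:
        with casd_process(STAND_IN) as proc:
            raise RuntimeError("simulated bst error")
    except RuntimeError:
        pass
    # A non-None return code means the child has exited.
    return proc.returncode is not None
```

Because cleanup lives in the `finally` clause, the daemon is stopped on both the success path and the error path without any extra handling in the caller.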
BuildStream
- casd process/socket management
  - Basic start/stop and channel setup
  - Terminate buildbox-casd when bst raises an error (context manager)
- Use casd for CAS writes
  - Use casd for add_object()
- Use casd to access remote CAS
  - Use GetInstanceNameForRemote to initialize remote
  - Use FetchMissingBlobs for individual and batch downloads
  - Use UploadMissingBlobs for individual and batch uploads
  - Check availability of remote CAS via casd
- Avoid CAS writes in main process (gRPC vs. fork())
  - Use scheduler to fetch subprojects !1414 (merged)
  - bst source checkout: Do not import sources into CAS in main process !1427 (merged)
  - Fix non-frontend tests that write to CAS in main process
- CAS expiry
  - Remove old expiry code
  - Quota configuration
  - Local cache / quota status display
- bst-artifact-server
  - Use casd for timestamp updates
  - Forward requests to casd instead of reimplementing them
  - Verify proper operation with casd
- Infrastructure
  - Add buildbox-casd to CI images
  - Update installation instructions to include buildbox-casd
  - Consider adding static buildbox-casd binary to Python wheel
- Optimizations (some may be required for initial MR)
  - Benchmark
  - Batch CAS writes (add_object() calls) to reduce round trips #1132 (closed)
  - Use FetchTree/UploadTree for directory downloads/uploads
- Documentation
  - Update the architecture docs (user-facing docs do not need changes)
- Future work (not blocking initial MR)
  - Use casd for staging
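The "Batch CAS writes" item above could look roughly like the sketch below, which collects blobs from successive add_object() calls and hands them over in groups, so that many small writes share one round trip. This is only an illustration under stated assumptions: `BlobBatcher`, `send_batch` and `max_batch_bytes` are hypothetical names, SHA-256 is assumed as the digest function, and `send_batch` stands in for whatever single request would carry a batch to casd.

```python
import hashlib


class BlobBatcher:
    """Collect blobs and send them in batches (hypothetical helper)."""

    def __init__(self, send_batch, max_batch_bytes=1024 * 1024):
        # Callable taking a list of ((hash, size), data) entries.
        self._send_batch = send_batch
        self._max_batch_bytes = max_batch_bytes
        self._pending = []
        self._pending_bytes = 0

    def add_object(self, data):
        """Queue one blob and return its (hash, size) digest."""
        # Assumption: blobs are identified by SHA-256 hex digest plus size.
        digest = (hashlib.sha256(data).hexdigest(), len(data))
        # Flush first if this blob would push the batch over the limit.
        if self._pending and self._pending_bytes + len(data) > self._max_batch_bytes:
            self.flush()
        self._pending.append((digest, data))
        self._pending_bytes += len(data)
        return digest

    def flush(self):
        """Send all queued blobs in a single call, if there are any."""
        if self._pending:
            self._send_batch(self._pending)
            self._pending = []
            self._pending_bytes = 0


# Usage: three 4-byte blobs with a 10-byte limit produce two batches.
sent_batches = []
batcher = BlobBatcher(sent_batches.append, max_batch_bytes=10)
for blob in (b"aaaa", b"bbbb", b"cccc"):
    batcher.add_object(blob)
batcher.flush()
```

A caller has to call flush() before relying on the blobs being present in the CAS, which is the main trade-off of batching compared with one request per add_object() call.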
buildbox-casd
- Essential
  - Use 0644 file permissions BuildGrid/buildbox/buildbox-casd!37 (merged)
  - Implement GetInstanceNameForRemote()
  - Support dynamic remotes in CaptureFiles() and CaptureTree()
  - Implement FetchMissingBlobs()
  - Implement UploadMissingBlobs()
  - Fix FindMissingBlobs BuildGrid/buildbox/buildbox-casd#21 (closed)
  - Support dynamic remotes in FindMissingBlobs() BuildGrid/buildbox/buildbox-casd#31 (closed)
  - Write tests for new methods, ensure proper error handling
- Optimizations
  - Implement FetchTree()
  - Implement UploadTree()