Indexed CAS
Context
As part of the CAS cleanup effort, and to speed up FindMissingBlobs(), we should add an index to BuildGrid's CAS. See the linked mailing list (ML) post for details on how the index would help with cleanup.
One small note: in the ML thread, I mentioned this:
One question that I haven’t figured out the answer to is whether the index should sit in between the CAS server and the CAS backend or whether it should sit off to the side. In other words, should the CAS server "consult" the index for locations of blobs, whether blobs exist, etc. and then talk to the CAS backend; or should it treat the index as a CAS implementation and just forward all of the requests to the index, and let the index do the communication with the CAS backend? I think both are valid approaches.
I think it would be best for the index to live as a member of the CAS server; we can then pass it to the various storage methods when they need to clean up.
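A minimal sketch of the "index as a member of the CAS server" layout, under stated assumptions: `CASServer`, `InMemoryIndex`, and `clean_up_oldest` are hypothetical names, digests are plain `(hash, size_bytes)` tuples rather than the REAPI `Digest` message, and both the index and the storage backend are plain dicts. It only illustrates the wiring: the server owns the index, answers FindMissingBlobs from the index alone, and hands the index to a cleanup routine.

```python
import time
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for the REAPI Digest message: (hash, size_bytes).
Digest = Tuple[str, int]


class InMemoryIndex:
    """Hypothetical index mapping each digest to its last-access time."""

    def __init__(self) -> None:
        self._entries: Dict[Digest, float] = {}

    def add_blob(self, digest: Digest) -> None:
        # A monotonic clock is fine here because this index is never persisted.
        self._entries[digest] = time.monotonic()

    def delete_blob(self, digest: Digest) -> None:
        self._entries.pop(digest, None)

    def missing_blobs(self, digests: Iterable[Digest]) -> List[Digest]:
        return [d for d in digests if d not in self._entries]

    def least_recent_blobs(self) -> List[Digest]:
        # Oldest access time first, which is the order cleanup wants.
        return sorted(self._entries, key=self._entries.get)


class CASServer:
    """Hypothetical CAS server that owns both the storage and the index."""

    def __init__(self, storage: Dict[Digest, bytes], index: InMemoryIndex) -> None:
        self.storage = storage
        self.index = index

    def write_blob(self, digest: Digest, data: bytes) -> None:
        # Write to the backend first, then record the blob in the index.
        self.storage[digest] = data
        self.index.add_blob(digest)

    def find_missing_blobs(self, digests: Iterable[Digest]) -> List[Digest]:
        # Only the index is consulted; the backend is not touched.
        return self.index.missing_blobs(digests)


def clean_up_oldest(storage: Dict[Digest, bytes], index: InMemoryIndex, count: int) -> None:
    """Hypothetical cleanup routine that is handed the server's index."""
    for digest in index.least_recent_blobs()[:count]:
        # Drop the index entry before the data, so the index never reports
        # a blob as present after its data has been deleted.
        index.delete_blob(digest)
        storage.pop(digest, None)
```

Deleting from the index before the backend errs on the safe side: in the worst case a client is told a blob is missing and uploads it again, rather than being told a deleted blob still exists.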
Task Description
- Define the interface for the index. This could be very similar to the current interface between the CAS server and the storage backend. It will need a couple of additions, such as a function to delete an entry from the index and a function to list the blobs in timestamp order.
- Provide a preliminary SQL implementation of the index. Keep cleanup in mind when designing the schema (see the "Functions" section of the ML post).
  - We can use SQLAlchemy for the database interface, since the plan is to use it for bounceability as well.
- Add YAML configuration support for selecting the index layer.
- Make FindMissingBlobs only reach out to the index and not the backend (this will probably be pretty easy).
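The interface and SQL items above can be sketched roughly as follows. This is a minimal sketch under assumptions: `IndexABC`, `SQLIndex`, the method set, and the table and column names are all hypothetical; it uses Python's built-in `sqlite3` module instead of SQLAlchemy so it runs without extra dependencies; and digests are `(hash, size_bytes)` tuples standing in for the REAPI `Digest` message. The timestamp column gets its own SQL index so that listing blobs oldest-first for cleanup does not need a full table scan.

```python
import abc
import sqlite3
import time
from typing import Iterable, List, Tuple

# Hypothetical stand-in for the REAPI Digest message: (hash, size_bytes).
Digest = Tuple[str, int]


class IndexABC(abc.ABC):
    """Hypothetical index interface: a storage-like API plus the two
    additions the task calls for (delete, and timestamp-ordered listing)."""

    @abc.abstractmethod
    def add_blob(self, digest: Digest) -> None: ...

    @abc.abstractmethod
    def delete_blob(self, digest: Digest) -> None: ...

    @abc.abstractmethod
    def missing_blobs(self, digests: Iterable[Digest]) -> List[Digest]: ...

    @abc.abstractmethod
    def least_recent_blobs(self, limit: int) -> List[Digest]: ...


# Hypothetical schema: one row per blob, keyed by its digest.
SCHEMA = """
CREATE TABLE IF NOT EXISTS index_entries (
    digest_hash TEXT NOT NULL,
    digest_size_bytes INTEGER NOT NULL,
    accessed_timestamp REAL NOT NULL,
    PRIMARY KEY (digest_hash, digest_size_bytes)
);
CREATE INDEX IF NOT EXISTS accessed_timestamp_idx
    ON index_entries (accessed_timestamp);
"""


class SQLIndex(IndexABC):
    """Hypothetical SQL-backed index (sqlite3 here; the plan is SQLAlchemy)."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._conn.executescript(SCHEMA)

    def add_blob(self, digest: Digest) -> None:
        # Insert a new entry, or refresh the timestamp of an existing one.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO index_entries VALUES (?, ?, ?)",
                (digest[0], digest[1], time.time()),
            )

    def delete_blob(self, digest: Digest) -> None:
        with self._conn:
            self._conn.execute(
                "DELETE FROM index_entries"
                " WHERE digest_hash = ? AND digest_size_bytes = ?",
                digest,
            )

    def missing_blobs(self, digests: Iterable[Digest]) -> List[Digest]:
        # One query per digest keeps the sketch simple; a real
        # implementation would batch these lookups.
        missing = []
        for digest in digests:
            row = self._conn.execute(
                "SELECT 1 FROM index_entries"
                " WHERE digest_hash = ? AND digest_size_bytes = ?",
                digest,
            ).fetchone()
            if row is None:
                missing.append(digest)
        return missing

    def least_recent_blobs(self, limit: int) -> List[Digest]:
        # Oldest first: the order in which cleanup would evict blobs.
        rows = self._conn.execute(
            "SELECT digest_hash, digest_size_bytes FROM index_entries"
            " ORDER BY accessed_timestamp ASC LIMIT ?",
            (limit,),
        ).fetchall()
        return [(h, s) for h, s in rows]
```

With this shape, FindMissingBlobs becomes a single call to `missing_blobs` on the index, and cleanup can page through `least_recent_blobs` without asking the storage backend anything.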
Maybe:
- Provide a SQLite implementation that works "out of the box". In other words, a user should be able to just specify a path to a database file on disk, and the index should create the database file and tables if necessary and work with that.
- Provide a way to "sync" the index with the storage backend on server start.
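The two "maybe" items could look something like the sketch below. It assumes the same hypothetical `index_entries` schema as a SQL index would use, the built-in `sqlite3` module, and, importantly, that the storage backend can enumerate its digests (`backend_digests` is a hypothetical input; not every backend may support listing). `open_sqlite_index` and `sync_index` are hypothetical names. Blobs discovered during sync get the current time as their timestamp, since the backend may not record access times.

```python
import os
import sqlite3
import time
from typing import Iterable, Tuple

# Hypothetical stand-in for the REAPI Digest message: (hash, size_bytes).
Digest = Tuple[str, int]

# Hypothetical schema; IF NOT EXISTS makes it safe to apply on every start.
SCHEMA = """
CREATE TABLE IF NOT EXISTS index_entries (
    digest_hash TEXT NOT NULL,
    digest_size_bytes INTEGER NOT NULL,
    accessed_timestamp REAL NOT NULL,
    PRIMARY KEY (digest_hash, digest_size_bytes)
);
CREATE INDEX IF NOT EXISTS accessed_timestamp_idx
    ON index_entries (accessed_timestamp);
"""


def open_sqlite_index(path: str) -> sqlite3.Connection:
    """Hypothetical helper: open the index file, creating it if needed."""
    # sqlite3.connect creates a missing database file, but not its parent
    # directories, so create those first.
    os.makedirs(os.path.dirname(os.path.abspath(path)), exist_ok=True)
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def sync_index(conn: sqlite3.Connection, backend_digests: Iterable[Digest]) -> None:
    """Hypothetical start-up sync: make the index match the backend."""
    now = time.time()
    backend = set(backend_digests)
    with conn:  # one transaction; committed on success
        indexed = set(conn.execute(
            "SELECT digest_hash, digest_size_bytes FROM index_entries"))
        # Blobs the backend has but the index does not know about.
        conn.executemany(
            "INSERT INTO index_entries VALUES (?, ?, ?)",
            [(h, s, now) for h, s in backend - indexed])
        # Index entries whose blobs are gone from the backend.
        conn.executemany(
            "DELETE FROM index_entries"
            " WHERE digest_hash = ? AND digest_size_bytes = ?",
            list(indexed - backend))
```

Because the schema uses `IF NOT EXISTS`, pointing a fresh server at an empty path and restarting it against an existing file go through the same code path.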
Acceptance Criteria
All of the above items are complete with tests where appropriate. The index (and in particular, the SQL schema) has been documented.