Indexed CAS
Description
This MR adds support for an index to CAS. Currently only a SQL-based implementation is provided.
Changes proposed in this merge request:
- Add a new interface, index_abc.py, which is an extension of storage_abc.py.
- Add a SQL implementation for the index, as well as models for the database.
- Tests
- Configuration support is included in !236 (merged)
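
As a rough illustration of the "index as an extension of storage" idea, here is a minimal sketch. The class and method names (StorageABC, IndexABC, put_blob, and so on) are assumptions made for this example and are not taken from index_abc.py or storage_abc.py; the toy index keeps known hashes in a set where the SQL implementation would use a table.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Optional, Set


class StorageABC(ABC):
    """Hypothetical stand-in for the interface defined in storage_abc.py."""

    @abstractmethod
    def has_blob(self, digest_hash: str) -> bool: ...

    @abstractmethod
    def get_blob(self, digest_hash: str) -> Optional[bytes]: ...

    @abstractmethod
    def put_blob(self, digest_hash: str, data: bytes) -> None: ...

    @abstractmethod
    def missing_blobs(self, digest_hashes: Iterable[str]) -> List[str]: ...


class IndexABC(StorageABC):
    """Hypothetical index interface: a storage that fronts one backing storage."""

    def __init__(self, storage: StorageABC) -> None:
        self._storage = storage


class DictStorage(StorageABC):
    """In-memory backend, used here only to exercise the index."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def has_blob(self, digest_hash: str) -> bool:
        return digest_hash in self._blobs

    def get_blob(self, digest_hash: str) -> Optional[bytes]:
        return self._blobs.get(digest_hash)

    def put_blob(self, digest_hash: str, data: bytes) -> None:
        self._blobs[digest_hash] = data

    def missing_blobs(self, digest_hashes: Iterable[str]) -> List[str]:
        return [h for h in digest_hashes if h not in self._blobs]


class SetIndex(IndexABC):
    """Toy index: answers existence queries without touching the backend."""

    def __init__(self, storage: StorageABC) -> None:
        super().__init__(storage)
        self._known: Set[str] = set()

    def has_blob(self, digest_hash: str) -> bool:
        return digest_hash in self._known

    def get_blob(self, digest_hash: str) -> Optional[bytes]:
        # Only consult the backend for blobs the index knows about.
        return self._storage.get_blob(digest_hash) if digest_hash in self._known else None

    def put_blob(self, digest_hash: str, data: bytes) -> None:
        self._storage.put_blob(digest_hash, data)
        self._known.add(digest_hash)

    def missing_blobs(self, digest_hashes: Iterable[str]) -> List[str]:
        return [h for h in digest_hashes if h not in self._known]
```

Because the index is itself a storage in this sketch, it can be dropped in wherever a storage is expected, which is the main reason for modelling it as an extension of the storage interface.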
Current limitations
- The SQL index currently only supports a single storage backend. Multilevel storage can still be accomplished with the with_cache storage implementation. For increased flexibility in the future, however, we may want to allow it to support more storages.
- As a result of the above, the table schema does not currently have a "location" field, and it only keeps track of the digest hash and size as well as the last updated time.
- The missing blobs query currently fails if too many blobs are given to it. SQL implementations limit the number of bound parameters, with SQLite defaulting to 999 and PostgreSQL defaulting to (I think) 32767. There are also limits on query length. As a result I suspect the SQL index needs to be made aware of the SQL implementation, and it needs to split up large FindMissingBlobs requests into multiple SQL queries based on the implementation limit. I will defer this until implementation of the YAML parsing, where I can add a field to specify the implementation and break large FindMissingBlobs requests based on that. This has been handled in this MR.
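
The schema and the chunked missing-blobs lookup described above can be sketched roughly as follows. This is not the code in this MR: it uses the standard-library sqlite3 module, and the table name, column names, and helper functions are all hypothetical. The chunk size of 999 is the SQLite default quoted above; a real index would presumably choose it per SQL implementation.

```python
import sqlite3
import time
from typing import Iterable, List

# Assumed cap on bound parameters per statement (the SQLite default quoted above).
MAX_BOUND_PARAMS = 999


def create_index(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: digest hash, digest size, last updated time.
    # There is no "location" column, since only one storage backend is supported.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_entries ("
        " digest_hash TEXT PRIMARY KEY,"
        " digest_size_bytes INTEGER NOT NULL,"
        " last_updated REAL NOT NULL)"
    )


def add_blob(conn: sqlite3.Connection, digest_hash: str, size_bytes: int) -> None:
    # INSERT OR REPLACE is SQLite syntax; other databases need their own upsert.
    conn.execute(
        "INSERT OR REPLACE INTO index_entries VALUES (?, ?, ?)",
        (digest_hash, size_bytes, time.time()),
    )


def find_missing(
    conn: sqlite3.Connection,
    digest_hashes: Iterable[str],
    chunk_size: int = MAX_BOUND_PARAMS,
) -> List[str]:
    """Return the hashes not present in the index, one query per chunk."""
    hashes = list(digest_hashes)
    missing: List[str] = []
    for start in range(0, len(hashes), chunk_size):
        chunk = hashes[start:start + chunk_size]
        # One "?" placeholder per hash keeps each statement under the limit.
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            f"SELECT digest_hash FROM index_entries WHERE digest_hash IN ({placeholders})",
            chunk,
        )
        present = {row[0] for row in rows}
        missing.extend(h for h in chunk if h not in present)
    return missing
```

Splitting by parameter count alone does not address the query-length limits mentioned above, so a production version may need a second bound on statement size.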
TODO
- More tests
- SQL implementation of BatchUpdateBlobs and BatchReadBlobs.
- Allow this to handle a large number of blobs.
- Performance testing
This merge request, when merged, will address issue/bug:
Edited by Rohit Kothur