
WIP Feature: Support storing blobs directly in the index

Ed Baunton requested to merge edbaunton/features/inline-index-blobs into master

Previously, all blobs managed by the index were stored in a separate backing store. For example, one might set up a Postgres index on top of S3. The index tracks the access and size of blobs in the backing store.

Therefore, to retrieve a blob from the backing store, the index is first checked and updated as necessary. For small blobs in particular, the additional overhead of reaching out to S3 to retrieve the blob is something that can be reduced.

This commit adds support in BuildGrid for allowing users to specify that blobs below a certain size should be stored within the index itself. This allows flexibility around how much space is occupied by these 'inlined-blobs' and allows the user to speed up blob access by leveraging the database.
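As a rough illustration of the write path described above, here is a minimal sketch. It is not BuildGrid's actual code: the class `InliningIndex`, the `max_inline_size` parameter, and the in-memory dicts standing in for Postgres and S3 are all hypothetical. It also assumes that an inlined blob is written only to the index and not to the backing store, which the description does not state.

```python
from typing import Dict


class InliningIndex:
    """Toy index that records blob sizes and holds small blobs inline.

    Hypothetical names throughout; dicts stand in for the database
    and the backing store.
    """

    def __init__(self, backing_store: Dict[str, bytes], max_inline_size: int) -> None:
        self.backing_store = backing_store
        self.max_inline_size = max_inline_size
        self.sizes: Dict[str, int] = {}      # digest -> blob size
        self.inline: Dict[str, bytes] = {}   # digest -> inlined blob data

    def put(self, digest: str, data: bytes) -> None:
        # The index always tracks the blob's size.
        self.sizes[digest] = len(data)
        if len(data) < self.max_inline_size:
            # Blobs below the threshold live in the index itself
            # (assumption: they are not also written to the backing store).
            self.inline[digest] = data
        else:
            # Everything else goes to the separate backing store.
            self.backing_store[digest] = data
```

With a threshold of 4 bytes, for example, a 3-byte blob would land in `inline` while a 6-byte blob would go to `backing_store`.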

The implementation of this feature is deliberately forgiving: if a blob that should be inlined in the index is not found there, BuildGrid falls back to the underlying storage. This facilitates enabling the feature without an expensive "backfill" of all blobs meeting the size requirement. It also makes it possible to change the size threshold to lower or higher values after the index has been populated, without ill effects. It is worth noting, however, that when a blob is missing from the inline storage, BuildGrid does not go back and add it. Only new blobs are inlined.
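The forgiving read path can be sketched roughly as follows. This is an illustration under assumptions, not the code in this merge request: `get_blob` and its dict arguments are hypothetical stand-ins for the index's inline storage and the backing store.

```python
from typing import Dict, Optional


def get_blob(
    digest: str,
    inline: Dict[str, bytes],
    backing_store: Dict[str, bytes],
) -> Optional[bytes]:
    """Return the blob for `digest`, preferring the inline copy."""
    data = inline.get(digest)
    if data is not None:
        # Fast path: the blob was inlined in the index.
        return data
    # Fallback: the blob may predate the feature or a threshold change.
    # Note that it is deliberately NOT copied into `inline` here.
    return backing_store.get(digest)
```

Because the fallback never writes to `inline`, a blob stored before the feature was enabled keeps being served from the backing store, which matches the "no backfill" behaviour described above.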

Edited by Ed Baunton
