Cached response partitioning
The following discussion from !157055 (merged) should be addressed:
- @DylanGriffith started a discussion: (+2 comments)

Thought (non-blocking):
I'm wondering if we've considered a partitioning strategy for this table. I assume this table will grow very large over time and it might be nice to have a strategy for dealing with that.
I think the access pattern might lend itself well to either of:
- Hash partitioning => In this case we could hash by the keys used to look up a cached response. That would allow us to predefine the number of partitions and ensure that no individual partition ever gets too large.
- Date range partitioning => In this case we might want to take advantage of a kind of "cache expiration". If we decided that we always expire cached responses after X months, then we could easily delete partitions after X months. Of course this might not be that simple, because these records correspond to files stored somewhere that might also need to be cleaned up; maybe expiry policies on our buckets could help with that. In this case we might still want a strategy to bump the timestamp of a record when it is read, but that comes with performance tradeoffs of its own.
Since I don't have a strong proposal here for now, I just wanted to note this as an idea to consider. The earlier you make a decision about partitioning (especially before rolling this out to customers), the easier it will be to implement.
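For reference, a minimal sketch of the first option above (hash partitioning by the lookup key) using PostgreSQL declarative partitioning. The table and column names (`cached_responses`, `relative_path`, `downloaded_at`, etc.) are placeholders, not the actual schema:

```sql
-- Sketch only: names are illustrative, not the real schema.
CREATE TABLE cached_responses (
  id                 bigint GENERATED BY DEFAULT AS IDENTITY,
  group_id           bigint NOT NULL,
  relative_path      text NOT NULL,
  object_storage_key text NOT NULL,
  downloaded_at      timestamptz NOT NULL DEFAULT now(),
  PRIMARY KEY (id, relative_path)   -- the PK must include the partition key
) PARTITION BY HASH (relative_path);

-- A fixed, predefined number of partitions (16 here) caps the size of any
-- single partition; a lookup by relative_path is routed to exactly one partition.
CREATE TABLE cached_responses_p00 PARTITION OF cached_responses
  FOR VALUES WITH (MODULUS 16, REMAINDER 0);
CREATE TABLE cached_responses_p01 PARTITION OF cached_responses
  FOR VALUES WITH (MODULUS 16, REMAINDER 1);
-- ...and so on for remainders 2 through 15.
```

Because the modulus is fixed up front, increasing the partition count later means repartitioning, so the number should be chosen with expected growth in mind.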
Involve the `group::database` team to evaluate the available options.
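For comparison, a sketch of the second option (date-range partitioning with partition-drop expiry), under the same assumptions and placeholder names:

```sql
-- Sketch only: placeholder names; shown as an alternative to the hash layout above.
CREATE TABLE cached_responses (
  id                 bigint GENERATED BY DEFAULT AS IDENTITY,
  group_id           bigint NOT NULL,
  relative_path      text NOT NULL,
  object_storage_key text NOT NULL,
  downloaded_at      timestamptz NOT NULL DEFAULT now(),
  PRIMARY KEY (id, downloaded_at)   -- the PK must include the partition key
) PARTITION BY RANGE (downloaded_at);

CREATE TABLE cached_responses_2024_07 PARTITION OF cached_responses
  FOR VALUES FROM ('2024-07-01') TO ('2024-08-01');
CREATE TABLE cached_responses_2024_08 PARTITION OF cached_responses
  FOR VALUES FROM ('2024-08-01') TO ('2024-09-01');

-- Expiring a month of cached responses becomes a cheap metadata operation.
-- The files referenced by the dropped rows still live in object storage and
-- need separate cleanup, e.g. via a bucket lifecycle/expiration rule.
ALTER TABLE cached_responses DETACH PARTITION cached_responses_2024_07;
DROP TABLE cached_responses_2024_07;
```

This also makes the trade-off in the discussion concrete: bumping `downloaded_at` on read to keep hot entries alive turns reads into writes (and can move rows between partitions), which is the performance cost mentioned above.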