Cache deduplicated "self" size of container repositories in Redis
Context
Related to &9146. Please read through the epic's description for the required context.
Proposal
When asked to calculate the deduplicated size of a repository, with the size query parameter set to "self", we should first look up a cached value in Redis:
- If a value is cached in Redis, exit early and use it for the response;
- If no value is cached in Redis, proceed with the calculation and cache the resulting value in Redis.
Data structure
We should use Redis strings for this purpose. The key name must be derived from the full path of the target repository, and the value should be set to the repository's deduplicated size in bytes.
Additionally, following the documented best practices, we should prefix key names to ensure CROSSSLOT compatibility, avoid name clashes (in case of Redis instances shared by multiple applications), and aid discoverability. These keys should therefore follow the registry:api:{repository:<namespace path>:<path hash>} naming convention. See the linked doc for more details.
Invalidation
After successfully processing a write request against a repository with path a/b/c, we should invalidate any cached size data in Redis using the corresponding key (registry:api:{repository:<namespace path>:<path hash>}).
We only need to do this for successful "tag create", "tag delete" and "manifest delete" operations. These are the only ones that will impact storage usage, as we only account for tagged artifacts.
Any error when trying to invalidate cached values should be reported to Sentry so that we are notified about it.