Estimated registry usage for very large namespaces
## Context

In https://gitlab.com/gitlab-org/container-registry/-/issues/779+ we identified a performance/scaling issue that causes failures when calculating the deduplicated registry usage with maximum precision for very large namespaces (\~1%).

Among the mitigation strategies identified in https://gitlab.com/gitlab-org/container-registry/-/issues/779#note_1179923688, this epic focuses on delivering the first one, `Option A`.

`Option A` consists of falling back to a faster but less accurate usage calculation method when the main (maximum accuracy) method fails due to the size of the target namespace. The alternative method _ignores_ whether images are tagged or not, whereas the default method only accounts for image layers that are tagged at least once.

While we benefit from online and continuous garbage collection on the GitLab.com registry, a dangling image is only garbage collected after it has been left unreferenced (no tags remaining) for 24h or more. As a consequence, the accuracy of the usage measured by the alternative method is _inversely_ related to the number of tag deletions performed in the last 24h across all repositories in a given namespace: the more tags were recently deleted, the more dangling layers are still counted.

## Tasks

To make this happen we need to:

- **Registry (https://gitlab.com/gitlab-org/container-registry/-/issues/853):**
  - Automatically fall back to the alternative method if the primary one fails (timeout);
  - If falling back to the alternative method, the API response should indicate that the provided size value is an estimate (using an additional response body attribute).
- **Rails:**
  - Look for the "estimated" flag in the registry API responses;
  - Flag the measured namespace usage as an estimate in the database;
  - Whenever an "estimated" usage is received, schedule a usage refresh for 25h ahead. This ensures that the namespace usage is refreshed as soon as we expect online GC to have pruned any dangling images (24h delay plus 1h of slack);
  - If usage has been flagged as an estimate, display a warning/sign when showing it in the UI so that users know it is an estimate, with the caveat described above.

## Proposal

Updated 2022-12-21 per https://gitlab.com/groups/gitlab-org/-/epics/9413#note_1209330288

- [ ] The UI will indicate that the registry usage is estimated and will link to GitLab docs with more information (why it is estimated, [how to reduce container registry storage](https://docs.gitlab.com/ee/user/packages/container_registry/reduce_container_registry_storage.html), how to purchase storage). https://gitlab.com/gitlab-org/gitlab/-/issues/386468+
- [ ] Ahead of enforcement, we will reach out to these namespaces and work with them to reduce their storage usage. https://gitlab.com/gitlab-org/gitlab/-/issues/386458+

NOTE: Overall, we only expect \~80 namespaces to be impacted by this.

1. These namespaces' registry usage is very, very large, so much so that we struggle to calculate it precisely.
2. They will be locked if their estimated storage usage is above the namespace limit after storage enforcement begins. By then, they will have been notified both in-app and through our outreach, with enough time to show progress towards reducing their storage usage.