Only cache Gitaly pack-objects cache entries that occur more than N times
In #2208 we are discussing the write IO workload generated by the Gitaly pack-objects cache. We can improve the implementation so that it writes fewer cache entries.
The current implementation optimistically caches every response. About 75% of responses are never read back a second time so it would be better to not store them in the cache in the first place.
To address this, we could add a configurable minimum number of occurrences for cache keys. The proposed minimum would be 1. This means we write a key into the cache only when it occurs for the second time; the first occurrence is served without being stored.
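The idea can be sketched in Go as a small occurrence counter that sits in front of the cache write path. This is a minimal illustration under stated assumptions, not the actual Gitaly middleware: the names `occurrenceFilter`, `newOccurrenceFilter`, and `shouldCache` are hypothetical, and the counter map is unbounded here, whereas a real implementation would need to expire or bound it.

```go
package main

import (
	"fmt"
	"sync"
)

// occurrenceFilter counts how often each cache key has been requested.
// Hypothetical type for illustration only; not Gitaly's real middleware.
type occurrenceFilter struct {
	mu   sync.Mutex
	min  int            // minimum number of prior occurrences before caching
	seen map[string]int // occurrences per key (unbounded in this sketch)
}

func newOccurrenceFilter(min int) *occurrenceFilter {
	return &occurrenceFilter{min: min, seen: make(map[string]int)}
}

// shouldCache records one occurrence of key and reports whether the key
// has now occurred more than min times, i.e. whether the response should
// be written to the cache.
func (f *occurrenceFilter) shouldCache(key string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.seen[key]++
	return f.seen[key] > f.min
}

func main() {
	f := newOccurrenceFilter(1)
	for i, key := range []string{"a", "b", "a"} {
		fmt.Printf("request %d key=%s cache=%v\n", i+1, key, f.shouldCache(key))
	}
}
```

With a minimum of 1, the first request for each key is served without a cache write, and the second request for the same key is the first one that gets written. A minimum of 0 reproduces the current behavior of caching every response.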
Plan
- Add "MinOccurrences" middleware to Gitaly, configurable via env var gitlab-org/gitaly!5501 (merged)
- Set min occurrences to 1 on file-cny-01, observe impact production#8580 (closed)
- Set min occurrences to 1 across all gprd Gitaly servers production#8580 (closed)
- Set default minimum to 1 in Gitaly application code gitlab-org/gitaly!5540 (merged)
- Remove environment variable used for testing https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/3128
- Update pack-objects cache documentation gitlab-org/gitlab!116303 (merged)
- Optional: make minimum configurable in Omnibus, Helm. Automatically supported via gitlab-org/omnibus-gitlab!6621 (merged)
The expected outcome is that skipping the first occurrence of each key is better for everyone, and that it should become the default behavior for Gitaly. Depending on our confidence in this default, we can elect to not do the Omnibus/Helm work to make it configurable.
Status 2023-03-29
The default behavior of Gitaly is now to not cache fetches until they have occurred more than once. What is left is to update the documentation and to remove from chef-repo the now-ignored environment variable that acted as a feature flag.