Cache Repository Info as one hash
Summary
Redis is used extensively to cache data which otherwise would've required an RPC to Gitaly, as such shielding
Gitaly from lots of loads. (Thanks Redis!
* 136K (16%) had key prefix: "cache:gitlab:exists?:"
* 99K (12%) had key prefix: "cache:gitlab:projects/"
* 59K (7%) had prefix: "cache:gitlab:has_visible_content?:"
* 50K (6%) had prefix: "cache:gitlab:root_ref:"
* 26K (3%) had prefix: "cache:gitlab:avatar:"
* 23K (3%) had prefix: "cache:gitlab:readme_path:"
First thing to notice is that has_visible_content? and exists? depend on each other, as does root_ref (default branch).
It will happen that some part of the application will need to know the default branch for a repository, and with a cold cache
will first ask if the root_ref is cached, than ask Redis if it has a has_visible_content value cached, if it doesn't we end
up checking if the repository exists. That's 3 cache misses, and 3 values to cache mere milliseconds later.
Improvements
If we were to create a repository info cache, that caches these structs as one entity it could potentially remove a lot of lookups from the cache. My proposal would be to have a hash like:
{
exists: true,
empty: false,
default_branch: `refs/heads/master`
}
Ideally we'd create a Gitaly RPC GetRepository that would send this data in on RPC response.
Risks
When only data is required for the default branch, and this data is cached, this would now load additional data we might or might not need later. So transferring more data in one lookup.
Additionally, this structure is invalidated as one. So updating the default branch would lead to invalidate the whole struct. In the proposal I've picked relatively stale data.
Third and last potential risk I see, is that this structure might need additional keys later. Which leads to cached values which
do not have the key yet just after a deploy. The quick solution is to invalidate the whole structure and repopulate the cache.
However, this leads to conditional code execution paths, which might not be ideal. Example for new_key to be added could look
like:
def new_key
return repo_info[:new_key] if repo_info.key?(:new_key)
invalidate_repo_info
new_key #recursive call as the cache needs repopulation
end
Involved components
- Redis
- GitLab-Rails
- Gitaly, although could be a second iteration
/cc @reprazent @andrewn