Cache Repository Info as one hash

Summary

Redis is used extensively to cache data which otherwise would've required an RPC to Gitaly, as such shielding Gitaly from lots of loads. (Thanks Redis! ❤) Data in gitlab-com/www-gitlab-com#5708 (comment 239116963) suggests there's a couple of closely related keys fetched:

* 136K (16%) had key prefix: "cache:gitlab:exists?:"
* 99K (12%) had key prefix: "cache:gitlab:projects/"
* 59K (7%) had prefix: "cache:gitlab:has_visible_content?:"
* 50K (6%) had prefix: "cache:gitlab:root_ref:"
* 26K (3%) had prefix: "cache:gitlab:avatar:"
* 23K (3%) had prefix: "cache:gitlab:readme_path:"

First thing to notice is that has_visible_content? and exists? depend on each other, as does root_ref (default branch). It will happen that some part of the application will need to know the default branch for a repository, and with a cold cache will first ask if the root_ref is cached, than ask Redis if it has a has_visible_content value cached, if it doesn't we end up checking if the repository exists. That's 3 cache misses, and 3 values to cache mere milliseconds later.

Improvements

If we were to create a repository info cache, that caches these structs as one entity it could potentially remove a lot of lookups from the cache. My proposal would be to have a hash like:

{
  exists: true,
  empty: false,
  default_branch: `refs/heads/master`
}

Ideally we'd create a Gitaly RPC GetRepository that would send this data in on RPC response.

Risks

When only data is required for the default branch, and this data is cached, this would now load additional data we might or might not need later. So transferring more data in one lookup.

Additionally, this structure is invalidated as one. So updating the default branch would lead to invalidate the whole struct. In the proposal I've picked relatively stale data.

Third and last potential risk I see, is that this structure might need additional keys later. Which leads to cached values which do not have the key yet just after a deploy. The quick solution is to invalidate the whole structure and repopulate the cache. However, this leads to conditional code execution paths, which might not be ideal. Example for new_key to be added could look like:

def new_key
  return repo_info[:new_key] if repo_info.key?(:new_key)

  invalidate_repo_info
  new_key #recursive call as the cache needs repopulation
end

Involved components

  1. Redis
  2. GitLab-Rails
  3. Gitaly, although could be a second iteration

/cc @reprazent @andrewn

Edited Nov 29, 2019 by Zeger-Jan van de Weg
Assignee Loading
Time tracking Loading