Update outdated namespace descendant cache records

Adam Hegyi requested to merge 428500-update-denormalized-data into master

What does this MR do and why?

This change implements a scheduled cron worker that updates outdated namespace descendant records. When a group hierarchy changes (a subgroup or project is added, removed, or moved), the associated Namespaces::Descendants record is marked as outdated by setting its outdated_at column. The association is optional: it is only present for groups that have the optimization enabled (see the snippet below for more details). The worker batches over the outdated records and updates the hierarchy cache.

The hierarchy cache is updated as follows:

  1. Take N outdated Namespaces::Descendants records.
  2. Iterate over the records.
  3. Invoke the UpdateDenormalizedDescendantsService for each record, which does the following:
     1. Determine whether the given record needs to be updated or deleted (deleted when the Group is already gone).
     2. Collect all descendant namespace ids using an iterator.
     3. Group the namespace ids by subgroups and projects (ProjectNamespace).
     4. Pluck the record ids (namespaces.id or projects.id).
     5. Update the Namespaces::Descendants record with the new data and mark it as up to date.
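
The steps above can be sketched in plain Ruby against an in-memory model. This is only an illustration under stated assumptions, not the code in this MR: `NamespaceRow`, `CacheRow`, `self_and_descendants`, `update_cache_record`, `process_outdated` and the `limit` default are all hypothetical names, and the real service works on database records through an iterator rather than on arrays.

```ruby
# Hypothetical in-memory stand-ins for namespaces and Namespaces::Descendants rows.
NamespaceRow = Struct.new(:id, :parent_id, :type, :project_id, keyword_init: true)
CacheRow = Struct.new(:namespace_id, :outdated_at,
                      :self_and_descendant_group_ids, :all_project_ids,
                      keyword_init: true)

# Walk the hierarchy breadth-first, starting at the root namespace.
def self_and_descendants(namespaces, root_id)
  children = namespaces.group_by(&:parent_id)
  queue = namespaces.select { |n| n.id == root_id }
  result = []
  until queue.empty?
    current = queue.shift
    result << current
    queue.concat(children.fetch(current.id, []))
  end
  result
end

# Refresh one cache record, or report that it should be deleted.
def update_cache_record(cache, namespaces)
  # The group is gone, so the cache record has nothing left to describe.
  return :deleted unless namespaces.any? { |n| n.id == cache.namespace_id }

  # Split the hierarchy into groups and project namespaces.
  groups, project_namespaces =
    self_and_descendants(namespaces, cache.namespace_id).partition { |n| n.type == 'Group' }

  cache.self_and_descendant_group_ids = groups.map(&:id).sort
  cache.all_project_ids = project_namespaces.map(&:project_id).sort
  cache.outdated_at = nil # mark the record as up to date
  :updated
end

# Process at most `limit` outdated records and return how many were handled.
def process_outdated(caches, namespaces, limit: 50)
  outdated = caches.select(&:outdated_at).first(limit)
  outdated.each do |cache|
    caches.delete(cache) if update_cache_record(cache, namespaces) == :deleted
  end
  outdated.size
end
```

Calling `process_outdated` repeatedly until it returns 0 mirrors how a scheduled worker would drain the outdated records in batches.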

Next steps: optionally enable the optimization on staging and production and monitor the performance changes.

Database

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

This snippet can be run in the Rails console. It does the following:

  1. Iterates over all groups and collects their descendant group ids and project ids without using the cache.
  2. Enables the hierarchy cache and invokes the cache updater worker.
  3. Verifies that the data stored in the cache matches the data collected in the first step.
  4. Cleans up the cache records.

Snippet:

non_cached_values = {}
Group.all.each do |group|
  non_cached_values[group.id] = {
    self_and_descendant_group_ids: group.self_and_descendant_ids.pluck(:id).sort,
    all_project_ids: group.all_projects.pluck(:id).sort
  }
end

# Enable caching, create outdated records

Namespaces::Descendants.delete_all
Group.all.each do |group|
  Namespaces::Descendants.create!(
    namespace_id: group.id,
    outdated_at: Time.current
  )
end

# Invoke the worker

# Repeat until no outdated records remain
loop do
  Namespaces::ProcessOutdatedNamespaceDescendantsCronWorker.new.perform
  break if Namespaces::Descendants.where.not(outdated_at: nil).none?
end

# Collect the cached data

cached_values = {}
Group.all.each do |group|
  cache = Namespaces::Descendants.find(group.id)
  # Sort so the comparison does not depend on the order in which ids were stored
  cached_values[group.id] = {
    self_and_descendant_group_ids: cache.self_and_descendant_group_ids.sort,
    all_project_ids: cache.all_project_ids.sort
  }
end

# Compare the values
puts cached_values == non_cached_values

# cleanup
Namespaces::Descendants.delete_all
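
If the comparison prints false, it helps to see which groups differ. The following is a minimal sketch of a helper for that; `cache_mismatches` is a hypothetical name that is not part of this MR, and it only assumes two hashes shaped like non_cached_values and cached_values from the snippet above.

```ruby
# Returns { group_id => { key => { expected: [...], actual: [...] } } }
# for every key whose id lists differ, ignoring the order of the ids.
def cache_mismatches(expected, actual)
  expected.each_with_object({}) do |(group_id, values), diff|
    cached = actual.fetch(group_id, {})
    changed = values.each_with_object({}) do |(key, ids), acc|
      expected_ids = ids.sort
      cached_ids = Array(cached[key]).sort
      acc[key] = { expected: expected_ids, actual: cached_ids } unless expected_ids == cached_ids
    end
    diff[group_id] = changed unless changed.empty?
  end
end
```

Running `pp cache_mismatches(non_cached_values, cached_values)` before the cleanup step would list only the groups whose cached ids do not match; an empty hash means the cache is consistent.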

Related to #428500 (closed)

Edited by Adam Hegyi
