Update outdated namespace descendant cache records
What does this MR do and why?
This change implements a scheduled cron worker that updates outdated namespace descendant records. When a group hierarchy changes (a subgroup or project is added, removed, or moved), the associated Namespaces::Descendants record is marked as outdated by setting its outdated_at column. The association is optional: it is only present for groups that have enabled the optimization (see the snippet below for more details). The worker batches over the outdated records and updates the hierarchy cache.
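The outdated-flag life cycle described above can be sketched in plain Ruby over an in-memory hash. This is a minimal illustration, not the real implementation. DescendantsRow, mark_outdated!, and outdated_batch are hypothetical names. The sketch also rests on two assumptions. First, a hierarchy change invalidates the cache row of the changed group and of every ancestor, because each of their descendant sets changes. Second, the ancestors are known from a traversal_ids-style array of ids ordered from the root down. Real code would issue UPDATE and SELECT queries instead of iterating over an array.

```ruby
# Hypothetical stand-in for a Namespaces::Descendants row; not the real model.
DescendantsRow = Struct.new(:namespace_id, :outdated_at)

# Assumption: a hierarchy change alters the descendant set of the changed
# group and of every ancestor, so each of them that has a cache row is
# flagged. traversal_ids lists namespace ids from the root down.
def mark_outdated!(rows_by_namespace_id, traversal_ids, now)
  traversal_ids.each do |namespace_id|
    row = rows_by_namespace_id[namespace_id]
    # The cache row is optional: groups without the optimization have none.
    row.outdated_at = now if row
  end
end

# Load at most `limit` outdated rows, oldest first (ties broken by id), as a
# cron worker might do for one batch. The ordering is an assumption.
def outdated_batch(rows_by_namespace_id, limit)
  rows_by_namespace_id.values
    .reject { |row| row.outdated_at.nil? }
    .sort_by { |row| [row.outdated_at, row.namespace_id] }
    .first(limit)
end

# Example: groups 1 (root) and 3 (nested) have cache rows; group 2 does not.
rows = {
  1 => DescendantsRow.new(1, nil),
  3 => DescendantsRow.new(3, nil),
  5 => DescendantsRow.new(5, nil) # unrelated group
}
# A project is added under group 3, whose traversal_ids are [1, 2, 3].
mark_outdated!(rows, [1, 2, 3], Time.utc(2024, 1, 1))
p outdated_batch(rows, 10).map(&:namespace_id) # prints [1, 3]
```

Because the flag is only a timestamp, the write path stays cheap. The expensive recomputation is deferred to the worker's batches.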
Updating the hierarchy cache works as follows:
- Take N outdated Namespaces::Descendants records.
- Iterate over the records.
- For each record, invoke the UpdateDenormalizedDescendantsService, which does the following:
  - Determine whether the record needs to be updated or deleted (deleted when the group no longer exists).
  - Collect all descendant namespace IDs using an iterator.
  - Group the namespace IDs into subgroups and projects (ProjectNamespace).
  - Pluck the record IDs (namespaces.id for subgroups, projects.id for projects).
  - Update the Namespaces::Descendants record with the new data and mark it as up to date.
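The service steps above can be sketched in plain Ruby over an in-memory model. Everything here is hypothetical: NamespaceNode, CacheRow, refresh_cache_row, and the project_id_by_namespace_id mapping are illustrative names, not GitLab's classes. A simple parent_id walk stands in for the iterator the service uses. The mapping models how a ProjectNamespace resolves to a projects.id.

```ruby
# Hypothetical in-memory model; not GitLab's actual classes.
NamespaceNode = Struct.new(:id, :parent_id, :type) # type is "Group" or "Project"
CacheRow = Struct.new(:namespace_id, :self_and_descendant_group_ids,
                      :all_project_ids, :outdated_at)

# Refreshes one cache row in place (or deletes it); returns :updated or :deleted.
def refresh_cache_row(row, namespaces_by_id, project_id_by_namespace_id, cache_rows)
  root = namespaces_by_id[row.namespace_id]

  # Step 1: the group no longer exists, so the cache row is deleted.
  if root.nil?
    cache_rows.delete(row.namespace_id)
    return :deleted
  end

  # Index children by parent id for the walk below.
  children = Hash.new { |hash, key| hash[key] = [] }
  namespaces_by_id.each_value do |ns|
    children[ns.parent_id] << ns unless ns.parent_id.nil?
  end

  # Step 2: iterative depth-first walk collecting the group and all descendants.
  collected = []
  stack = [root]
  until stack.empty?
    node = stack.pop
    collected << node
    stack.concat(children[node.id])
  end

  # Step 3: split the namespaces into groups and project namespaces.
  groups, project_namespaces = collected.partition { |ns| ns.type == "Group" }

  # Step 4: groups keep their namespace ids; project namespaces map to projects.id.
  row.self_and_descendant_group_ids = groups.map(&:id).sort
  row.all_project_ids = project_namespaces.map { |ns| project_id_by_namespace_id.fetch(ns.id) }.sort

  # Step 5: store the new data and mark the row as up to date.
  row.outdated_at = nil
  :updated
end

# Example batch: group 1 has subgroup 2 holding project namespace 10 (project 500);
# group 99 has already been deleted.
namespaces = {
  1 => NamespaceNode.new(1, nil, "Group"),
  2 => NamespaceNode.new(2, 1, "Group"),
  10 => NamespaceNode.new(10, 2, "Project")
}
project_ids = { 10 => 500 }
rows = {
  1 => CacheRow.new(1, [], [], Time.now),
  99 => CacheRow.new(99, [], [], Time.now)
}
rows.values.each { |row| refresh_cache_row(row, namespaces, project_ids, rows) }
p rows.keys                              # prints [1]
p rows[1].self_and_descendant_group_ids  # prints [1, 2]
p rows[1].all_project_ids                # prints [500]
```

The IDs are sorted before they are stored. This makes the cached arrays directly comparable with freshly computed values, as in the validation snippet further down.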
Next steps: optionally enable the optimization on staging and production and monitor the performance changes.
Database
- Loading a batch: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/25691/commands/81104
- Updating/upserting data: https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/25691/commands/81105
- Apart from these queries, there are only a few primary key lookups, so I don't think query plans are necessary for them.
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
This snippet can be run in the Rails console. It does the following:
- Collects the descendant group IDs and project IDs of every group without using the cache.
- Enables the hierarchy cache and invokes the cache updater worker.
- Verifies that the data in the cache matches the data collected in the first step.
- Cleans up.
Snippet:
```ruby
# Collect the expected values without using the cache
non_cached_values = {}
Group.find_each do |group|
  non_cached_values[group.id] = {
    self_and_descendant_group_ids: group.self_and_descendant_ids.pluck(:id).sort,
    all_project_ids: group.all_projects.pluck(:id).sort
  }
end

# Enable caching: create an outdated cache record for every group
Namespaces::Descendants.delete_all
Group.find_each do |group|
  Namespaces::Descendants.create!(
    namespace_id: group.id,
    outdated_at: Time.current
  )
end

# Invoke the worker until no outdated records remain
loop do
  Namespaces::ProcessOutdatedNamespaceDescendantsCronWorker.new.perform
  break unless Namespaces::Descendants.where.not(outdated_at: nil).exists?
end

# Collect the cached data (sorted so the comparison does not depend on storage order)
cached_values = {}
Group.find_each do |group|
  cache = Namespaces::Descendants.find(group.id)
  cached_values[group.id] = {
    self_and_descendant_group_ids: cache.self_and_descendant_group_ids.sort,
    all_project_ids: cache.all_project_ids.sort
  }
end

# Compare the values (prints true when the cache matches)
puts cached_values == non_cached_values

# Cleanup
Namespaces::Descendants.delete_all
```
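When validating locally, two small helpers can make the snippet easier to debug. The first reports which groups differ, instead of printing a single boolean. The second bounds the worker loop so that a record which never becomes up to date cannot hang the console. Both cache_mismatches and drain are hypothetical helpers, not part of this MR. They are plain Ruby and do not depend on Rails.

```ruby
# Hypothetical helper: list the groups whose cached values differ from the
# values computed without the cache.
def cache_mismatches(expected, actual)
  (expected.keys | actual.keys).sort.filter_map do |group_id|
    next if expected[group_id] == actual[group_id]

    { group_id: group_id, expected: expected[group_id], actual: actual[group_id] }
  end
end

# Hypothetical helper: call `step` until `done` returns true, giving up after
# `max_runs` attempts. Returns the number of runs it took.
def drain(max_runs:, step:, done:)
  max_runs.times do |run|
    step.call
    return run + 1 if done.call
  end
  raise "records still outdated after #{max_runs} runs"
end
```

In the Rails console, the loop could become `drain(max_runs: 100, step: -> { Namespaces::ProcessOutdatedNamespaceDescendantsCronWorker.new.perform }, done: -> { !Namespaces::Descendants.where.not(outdated_at: nil).exists? })`. The final comparison could become `pp cache_mismatches(non_cached_values, cached_values)`. The limit of 100 runs is an arbitrary illustrative value.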
Related to #428500