Skip to content

Implement cached hierarchical queries

Adam Hegyi requested to merge 428488-implement-new-hierarchy-queries into master

What does this MR do and why?

Note: the query changes are behind a feature flag.

This MR implements cached hierarchical queries. There are two methods that are overridden:

  • self_and_descendant_ids
  • all_project_ids: here I changed the original method to return AR scope.

How:

The database query implements an OR condition based on the availability of the cache. If there is an up-to-date Namespaces::Descendants record for the given namespace, then take take the self_and_descendant_group_ids or all_project_ids array and unnest them. These will be the yielded values.

When the cache is outdated or not available, fall back to the original implementation that uses the traversal_ids column. The OR condition is implemented with COALESCE: COALESCE(id_array_from_cache, id_array_from_original_implementation). The trick here is SQL won't evaluate the 2nd query if the 1st query returns records.

Simple demonstration:

-- Cached case
SELECT COALESCE((SELECT ARRAY[1,2,3]), (SELECT ARRAY[1] FROM pg_sleep(5)));

-- Uncached case, runs for 5 s
SELECT COALESCE((SELECT null::integer[]), (SELECT ARRAY[1] FROM pg_sleep(5)));

Database

Self and descendant ids:

Cache was populated with this query:

INSERT INTO namespace_descendants (namespace_id, outdated_at, self_and_descendant_group_ids, all_project_ids)
SELECT 9970,
       NULL, 
(
  -- subgroups
SELECT array_agg(namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] ORDER BY 1 ASC)
FROM "namespaces"
WHERE "namespaces"."type" = 'Group'
  AND (traversal_ids @> ('{9970}'))
), 
(
 -- projects
SELECT array_agg(projects.id ORDER BY 1 ASC)
FROM "projects"
WHERE "projects"."namespace_id" IN
    (SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
     FROM "namespaces"
     WHERE "namespaces"."type" = 'Group'
       AND (traversal_ids @> ('{9970}')))
)

All project ids

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

Cached and uncached queries should return the same data:

group = Group.find_by_path("gitlab-org")

group_ids = group.self_and_descendant_ids.pluck(:id)
project_ids = group.all_projects.pluck(:id)

Feature.enable(:group_hierarchy_optimization)

cached_group_ids = group.self_and_descendant_ids.pluck(:id)
cached_project_ids = group.all_projects.pluck(:id)

puts group_ids == cached_group_ids
puts project_ids == cached_project_ids

Related to #428488 (closed)

Edited by Adam Hegyi

Merge request reports