Skip to content

Add recursive approach for Namespace#all_projects

Corinna Gogolok requested to merge refactor_namespace_all_projects into master

What does this MR do?

Part of #260327 (closed)

Add another approach to load all projects for a namespace. The new approach is a recursive approach instead of the route path wildcard approach. It is behind the feature flag :recursive_approach_for_all_projects.

The rollout issue for the feature flag can be found in #263442 (closed).

Query plan:

Formatted Query:
EXPLAIN SELECT
    "projects".*
FROM "projects"
WHERE "projects"."namespace_id" IN (
    WITH RECURSIVE "base_and_descendants" AS
    (
        (
            SELECT "namespaces".*
            FROM "namespaces"
            WHERE "namespaces"."type" = 'Group'
                AND "namespaces"."id" = 9970
        )
        UNION
        (
            SELECT "namespaces".* 
            FROM "namespaces", "base_and_descendants"
            WHERE "namespaces"."type" = 'Group' 
                AND "namespaces"."parent_id" = "base_and_descendants"."id"
        )
    )
    SELECT id FROM "base_and_descendants" AS "namespaces"
)
Explain Output (https://explain.depesz.com/s/IUad):
Nested Loop  (cost=1422.75..5681.77 rows=2725 width=727) (actual time=704.658..3313.697 rows=1101 loops=1)
   Buffers: shared hit=964 read=1660 dirtied=103
   I/O Timings: read=3241.175
   ->  HashAggregate  (cost=1422.32..1423.83 rows=151 width=4) (actual time=695.635..696.126 rows=186 loops=1)
         Group Key: namespaces.id
         Buffers: shared hit=605 read=319
         I/O Timings: read=685.938
         ->  CTE Scan on base_and_descendants namespaces  (cost=1417.41..1420.43 rows=151 width=4) (actual time=5.148..695.222 rows=186 loops=1)
               Buffers: shared hit=605 read=319
               I/O Timings: read=685.938
               CTE base_and_descendants
                 ->  Recursive Union  (cost=0.43..1417.41 rows=151 width=346) (actual time=5.144..694.302 rows=186 loops=1)
                       Buffers: shared hit=605 read=319
                       I/O Timings: read=685.938
                       ->  Index Scan using index_namespaces_on_type_and_id_partial on public.namespaces namespaces_1  (cost=0.43..3.45 rows=1 width=346) (actual time=5.120..5.121 rows=1 loops=1)
                             Index Cond: (((namespaces_1.type)::text = 'Group'::text) AND (namespaces_1.id = 9970))
                             Buffers: shared read=4
                             I/O Timings: read=5.080
                       ->  Nested Loop  (cost=0.56..141.09 rows=15 width=346) (actual time=7.047..137.422 rows=37 loops=5)
                             Buffers: shared hit=605 read=315
                             I/O Timings: read=680.858
                             ->  WorkTable Scan on base_and_descendants  (cost=0.00..0.20 rows=10 width=4) (actual time=0.001..0.028 rows=37 loops=5)
                             ->  Index Scan using index_namespaces_on_parent_id_and_id on public.namespaces namespaces_2  (cost=0.56..14.07 rows=2 width=346) (actual time=2.011..3.689 rows=1 loops=186)
                                   Index Cond: (namespaces_2.parent_id = base_and_descendants.id)
                                   Filter: ((namespaces_2.type)::text = 'Group'::text)
                                   Rows Removed by Filter: 0
                                   Buffers: shared hit=605 read=315
                                   I/O Timings: read=680.858
   ->  Index Scan using index_projects_on_namespace_id_and_id on public.projects  (cost=0.43..28.02 rows=18 width=727) (actual time=3.478..14.058 rows=6 loops=186)
         Index Cond: (projects.namespace_id = namespaces.id)
         Buffers: shared hit=359 read=1341 dirtied=103
         I/O Timings: read=2555.237
Summary:
Cold cache:
Time: 3.317 s
  - planning: 1.765 ms
  - execution: 3.315 s
    - I/O read: 3.241 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 964 (~7.50 MiB) from the buffer pool
  - reads: 1660 (~13.00 MiB) from the OS file cache, including disk I/O
  - dirtied: 103 (~824.00 KiB)
  - writes: 0
Warm cache:
Time: 7.747 ms
  - planning: 1.380 ms
  - execution: 6.367 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 2588 (~20.20 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Some additional info provided by @ahegyi in #260327 (comment 427899547) why we want to move away from the path based lookup to the recursive approach:

On the long term we should move away from the path based lookup and have one standard way to look up nested groups. Hopefully !36025 (closed) will land soon which should improve the performance of self_and_descendants further.

gitlab-org namespace:

gbobject namespace (using the suggested index):

A group with 40K projects is probably one of the "worst" case scenario.

Note: all executions are "cached".

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Corinna Gogolok

Merge request reports