Skip to content

Query multiple group descendants at once

Alex Pooley requested to merge 335733-recursive-ci-runners-query into master

What does this MR do?

Provide the ability to perform traversal queries straight off an ActiveRecord::Relation object.

Currently, we can query a single Group which will flip between a linear or recursive algorithm depending on the use_traversal_ids feature flag:

group.self_and_descendants

Or, we can query an ActiveRecord::Relation object recursively:

Gitlab::ObjectHierarchy.new(owned_groups).base_and_descendants

This format is a bit cumbersome and not possible for linear queries.

With this MR we can query multiple groups at once easily:

some_groups.self_and_descendants

Which will flip between linear or recursive based on the use_traversal_ids feature flag.

Along with .self_and_descendants I've also added a .self_and_descendant_ids scope along with .as_ids and .without_sti_condition as supporting scopes.

SQL Changes

.as_ids

Namespace.where(id: [1,2,3]).as_ids

Linear

SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id 
FROM "namespaces" 
WHERE "namespaces"."id" IN (1, 2, 3)

Recursive

SELECT "namespaces"."id" 
FROM "namespaces" 
WHERE "namespaces"."id" IN (1, 2, 3)

.self_and_descendants

Namespace.where(id: [1,2,3]).self_and_descendants

Only the Linear and Recursive versions exist in code.

Linear

SELECT 
  "namespaces".* 
FROM 
  (
    SELECT 
      DISTINCT on(namespaces.id) namespaces.* 
    FROM 
      namespaces, 
      (
        SELECT 
          "namespaces"."id" 
        FROM 
          "namespaces" 
        WHERE 
          "namespaces"."id" IN (1,2,3)
      ) base 
    WHERE 
      (
        namespaces.traversal_ids @> ARRAY[base.id]
      )
  ) namespaces
Time: 113.256 ms  
  - planning: 3.369 ms  
  - execution: 109.887 ms  
    - I/O read: 78.460 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 99 (~792.00 KiB) from the buffer pool  
  - reads: 2395 (~18.70 MiB) from the OS file cache, including disk I/O  
  - dirtied: 60 (~480.00 KiB)  
  - writes: 0

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5637/commands/19131

Recursive

WITH RECURSIVE "base_and_descendants" AS (
  (
    SELECT 
      "namespaces".* 
    FROM 
      "namespaces" 
    WHERE 
      "namespaces"."id" IN (1, 2, 3)
  ) 
  UNION 
    (
      SELECT 
        "namespaces".* 
      FROM 
        "namespaces", 
        "base_and_descendants" 
      WHERE 
        "namespaces"."parent_id" = "base_and_descendants"."id"
    )
) 
SELECT 
  "namespaces".* 
FROM 
  "base_and_descendants" AS "namespaces"
Time: 187.805 ms  
  - planning: 3.671 ms  
  - execution: 184.134 ms  
    - I/O read: 121.994 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 25670 (~200.50 MiB) from the buffer pool  
  - reads: 3670 (~28.70 MiB) from the OS file cache, including disk I/O  
  - dirtied: 62 (~496.00 KiB)  
  - writes: 0    

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5637/commands/19107

Alternative plans that complement MR discussions

Linear with duplicates

SELECT 
  "namespaces".* 
FROM 
  namespaces, 
  (
    SELECT 
      "namespaces"."traversal_ids" 
    FROM 
      "namespaces" 
    WHERE 
      "namespaces"."id" IN (1,2,3)
  ) base 
WHERE 
  (
    namespaces.traversal_ids @> base.traversal_ids
  )
Time: 148.181 ms  
  - planning: 3.476 ms  
  - execution: 144.705 ms  
    - I/O read: 118.994 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 115 (~920.00 KiB) from the buffer pool  
  - reads: 4455 (~34.80 MiB) from the OS file cache, including disk I/O  
  - dirtied: 2560 (~20.00 MiB)  
  - writes: 0  

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5213/commands/18573

Linear without duplicates without type='Group' condition

10% slower than Linear without duplicates from above which includes WHERE type='Group' condition.

SELECT 
  namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1) ] AS id 
FROM 
  namespaces, 
  (
    SELECT 
      jsonb_array_elements(
        traversal_ids_superset(
          (
            SELECT 
              jsonb_agg(traversal_ids) 
            FROM 
              "namespaces" 
            WHERE 
              "namespaces"."id" IN (1,2,3)
          )
        )
      ) as traversal_ids
  ) base 
WHERE 
    namespaces.traversal_ids @> jsonb_array_castint(base.traversal_ids)
Time: 107.645 ms  
  - planning: 2.791 ms  
  - execution: 104.854 ms  
    - I/O read: 80.495 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 209 (~1.60 MiB) from the buffer pool  
  - reads: 2411 (~18.80 MiB) from the OS file cache, including disk I/O  
  - dirtied: 47 (~376.00 KiB)  
  - writes: 0

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5503/commands/18936

DISTINCT without type='Group' condition

SELECT DISTINCT "namespaces".* 
FROM 
  namespaces, 
  (
    SELECT id
    FROM 
      "namespaces" 
    WHERE 
      "namespaces"."id" IN (1,2,3)
  ) base 
WHERE 
  namespaces.traversal_ids @> ARRAY[base.id]
Time: 157.216 ms  
  - planning: 3.837 ms  
  - execution: 153.379 ms  
    - I/O read: 98.318 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 122 (~976.00 KiB) from the buffer pool  
  - reads: 2411 (~18.80 MiB) from the OS file cache, including disk I/O  
  - dirtied: 47 (~376.00 KiB)  
  - writes: 0  

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5503/commands/18926

DISTINCT with type='Group' condition

SELECT DISTINCT "namespaces".* 
FROM 
  namespaces, 
  (
    SELECT id
    FROM 
      "namespaces" 
    WHERE 
      "namespaces"."id" IN (1,2,3)
  ) base 
WHERE 
  namespaces.type = 'Group' AND
  namespaces.traversal_ids @> ARRAY[base.id]

First run, very slow for some reason:

Time: 4.873 s  
  - planning: 3.910 ms  
  - execution: 4.869 s  
    - I/O read: 3.829 s  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 24288 (~189.80 MiB) from the buffer pool  
  - reads: 26523 (~207.20 MiB) from the OS file cache, including disk I/O  
  - dirtied: 39 (~312.00 KiB)  
  - writes: 0  

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5503/commands/18943

Second run (after reset):

Time: 1.474 s  
  - planning: 4.169 ms  
  - execution: 1.470 s  
    - I/O read: 603.074 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 24288 (~189.80 MiB) from the buffer pool  
  - reads: 26523 (~207.20 MiB) from the OS file cache, including disk I/O  
  - dirtied: 39 (~312.00 KiB)  
  - writes: 0

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5503/commands/18945

.self_and_descendant_ids

Only the Linear without duplicates and Recursive versions exist in code. The Linear with duplicates has been included for interests sake.

Namespace.where(id: [1,2,3]).self_and_descendant_ids

Linear without duplicates

SELECT 
  DISTINCT namespaces.id 
FROM 
  namespaces, 
  (
    SELECT 
      "namespaces"."id" 
    FROM 
      "namespaces" 
    WHERE 
      "namespaces"."id" IN (1,2,3)
  ) base 
WHERE 
  (
    namespaces.traversal_ids @> ARRAY[base.id]
  )
Time: 110.481 ms  
  - planning: 2.409 ms  
  - execution: 108.072 ms  
    - I/O read: 79.643 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 96 (~768.00 KiB) from the buffer pool  
  - reads: 2395 (~18.70 MiB) from the OS file cache, including disk I/O  
  - dirtied: 60 (~480.00 KiB)  
  - writes: 0 

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5637/commands/19136

Recursive

WITH RECURSIVE "base_and_descendants" AS (
  (
    SELECT 
      "namespaces".* 
    FROM 
      "namespaces" 
    WHERE 
      "namespaces"."id" IN (1,2,3)
  ) 
  UNION 
    (
      SELECT 
        "namespaces".* 
      FROM 
        "namespaces", 
        "base_and_descendants" 
      WHERE 
        "namespaces"."parent_id" = "base_and_descendants"."id"
    )
) 
SELECT 
  id 
FROM 
  "base_and_descendants" AS "namespaces"
Time: 224.935 ms  
  - planning: 3.640 ms  
  - execution: 221.295 ms  
    - I/O read: 159.523 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 28195 (~220.30 MiB) from the buffer pool  
  - reads: 5576 (~43.60 MiB) from the OS file cache, including disk I/O  
  - dirtied: 2971 (~23.20 MiB)  
  - writes: 0  

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5213/commands/18564

Linear with duplicates

SELECT 
  namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1) ] AS id 
FROM 
  namespaces, 
  (
    SELECT 
      "namespaces"."traversal_ids" 
    FROM 
      "namespaces" 
    WHERE 
      "namespaces"."id" IN (1,2,3)
  ) base 
WHERE 
  (
    namespaces.traversal_ids @ > base.traversal_ids
  )
Time: 140.360 ms  
  - planning: 2.612 ms  
  - execution: 137.748 ms  
    - I/O read: 114.227 ms  
    - I/O write: 0.000 ms  
  
Shared buffers:  
  - hits: 115 (~920.00 KiB) from the buffer pool  
  - reads: 4455 (~34.80 MiB) from the OS file cache, including disk I/O  
  - dirtied: 2560 (~20.00 MiB)  
  - writes: 0    

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5213/commands/18558

.without_sti_condition

There is only one version of this scope.

Group.where(id: [1,2,3]).without_sti_condition

Linear

SELECT "namespaces".* 
FROM "namespaces" 
WHERE "namespaces"."id" IN (1, 2, 3)

Recursive

SELECT "namespaces".* 
FROM "namespaces" 
WHERE "namespaces"."id" IN (1, 2, 3)

Timings above were performed using actual groups. Please click through to the specific postgres.ai links for further details.


This work is part of the linear namespace queries work to improve performance of group related queries. The work here is gated behind the <code data-sourcepos="492:194-492:210">use_traversal_ids</code> feature flag.

Screenshots or Screencasts (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to #335733 (closed)

Edited by Alex Pooley

Merge request reports