Query multiple group descendants at once
What does this MR do?
Provide the ability to perform traversal queries straight off an ActiveRecord::Relation
object.
Currently, we can query a single Group
which will flip between a linear or recursive algorithm depending on the use_traversal_ids
feature flag:
group.self_and_descendants
Or, we can query an ActiveRecord::Relation
object recursively:
Gitlab::ObjectHierarchy.new(owned_groups).base_and_descendants
This format is a bit cumbersome and not possible for linear queries.
With this MR we can query multiple groups at once easily:
some_groups.self_and_descendants
Which will flip between linear or recursive based on the use_traversal_ids
feature flag.
Along with .self_and_descendants
I've also added a .self_and_descendant_ids
scope along with .as_ids
and .without_sti_condition
as supporting scopes.
SQL Changes
.as_ids
Namespace.where(id: [1,2,3]).as_ids
Linear
SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id
FROM "namespaces"
WHERE "namespaces"."id" IN (1, 2, 3)
Recursive
SELECT "namespaces"."id"
FROM "namespaces"
WHERE "namespaces"."id" IN (1, 2, 3)
.self_and_descendants
Namespace.where(id: [1,2,3]).self_and_descendants
Only the Linear and Recursive versions exist in code.
Linear
SELECT
"namespaces".*
FROM
(
SELECT
DISTINCT on(namespaces.id) namespaces.*
FROM
namespaces,
(
SELECT
"namespaces"."id"
FROM
"namespaces"
WHERE
"namespaces"."id" IN (1,2,3)
) base
WHERE
(
namespaces.traversal_ids @> ARRAY[base.id]
)
) namespaces
Time: 113.256 ms
- planning: 3.369 ms
- execution: 109.887 ms
- I/O read: 78.460 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 99 (~792.00 KiB) from the buffer pool
- reads: 2395 (~18.70 MiB) from the OS file cache, including disk I/O
- dirtied: 60 (~480.00 KiB)
- writes: 0
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5637/commands/19131
Recursive
WITH RECURSIVE "base_and_descendants" AS (
(
SELECT
"namespaces".*
FROM
"namespaces"
WHERE
"namespaces"."id" IN (1, 2, 3)
)
UNION
(
SELECT
"namespaces".*
FROM
"namespaces",
"base_and_descendants"
WHERE
"namespaces"."parent_id" = "base_and_descendants"."id"
)
)
SELECT
"namespaces".*
FROM
"base_and_descendants" AS "namespaces"
Time: 187.805 ms
- planning: 3.671 ms
- execution: 184.134 ms
- I/O read: 121.994 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 25670 (~200.50 MiB) from the buffer pool
- reads: 3670 (~28.70 MiB) from the OS file cache, including disk I/O
- dirtied: 62 (~496.00 KiB)
- writes: 0
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5637/commands/19107
Alternative plans that complement MR discussions
Linear with duplicates
SELECT
"namespaces".*
FROM
namespaces,
(
SELECT
"namespaces"."traversal_ids"
FROM
"namespaces"
WHERE
"namespaces"."id" IN (1,2,3)
) base
WHERE
(
namespaces.traversal_ids @> base.traversal_ids
)
Time: 148.181 ms
- planning: 3.476 ms
- execution: 144.705 ms
- I/O read: 118.994 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 115 (~920.00 KiB) from the buffer pool
- reads: 4455 (~34.80 MiB) from the OS file cache, including disk I/O
- dirtied: 2560 (~20.00 MiB)
- writes: 0
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5213/commands/18573
Linear without duplicates without type='Group' condition
10% slower than Linear without duplicates from above which includes WHERE type='Group'
condition.
SELECT
namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1) ] AS id
FROM
namespaces,
(
SELECT
jsonb_array_elements(
traversal_ids_superset(
(
SELECT
jsonb_agg(traversal_ids)
FROM
"namespaces"
WHERE
"namespaces"."id" IN (1,2,3)
)
)
) as traversal_ids
) base
WHERE
namespaces.traversal_ids @> jsonb_array_castint(base.traversal_ids)
Time: 107.645 ms
- planning: 2.791 ms
- execution: 104.854 ms
- I/O read: 80.495 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 209 (~1.60 MiB) from the buffer pool
- reads: 2411 (~18.80 MiB) from the OS file cache, including disk I/O
- dirtied: 47 (~376.00 KiB)
- writes: 0
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5503/commands/18936
DISTINCT without type='Group' condition
SELECT DISTINCT "namespaces".*
FROM
namespaces,
(
SELECT id
FROM
"namespaces"
WHERE
"namespaces"."id" IN (1,2,3)
) base
WHERE
namespaces.traversal_ids @> ARRAY[base.id]
Time: 157.216 ms
- planning: 3.837 ms
- execution: 153.379 ms
- I/O read: 98.318 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 122 (~976.00 KiB) from the buffer pool
- reads: 2411 (~18.80 MiB) from the OS file cache, including disk I/O
- dirtied: 47 (~376.00 KiB)
- writes: 0
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5503/commands/18926
DISTINCT with type='Group' condition
SELECT DISTINCT "namespaces".*
FROM
namespaces,
(
SELECT id
FROM
"namespaces"
WHERE
"namespaces"."id" IN (1,2,3)
) base
WHERE
namespaces.type = 'Group' AND
namespaces.traversal_ids @> ARRAY[base.id]
First run, very slow for some reason:
Time: 4.873 s
- planning: 3.910 ms
- execution: 4.869 s
- I/O read: 3.829 s
- I/O write: 0.000 ms
Shared buffers:
- hits: 24288 (~189.80 MiB) from the buffer pool
- reads: 26523 (~207.20 MiB) from the OS file cache, including disk I/O
- dirtied: 39 (~312.00 KiB)
- writes: 0
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5503/commands/18943
Second run (after reset):
Time: 1.474 s
- planning: 4.169 ms
- execution: 1.470 s
- I/O read: 603.074 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 24288 (~189.80 MiB) from the buffer pool
- reads: 26523 (~207.20 MiB) from the OS file cache, including disk I/O
- dirtied: 39 (~312.00 KiB)
- writes: 0
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5503/commands/18945
.self_and_descendant_ids
Only the Linear without duplicates and Recursive versions exist in code. The Linear with duplicates has been included for interests sake.
Namespace.where(id: [1,2,3]).self_and_descendant_ids
Linear without duplicates
SELECT
DISTINCT namespaces.id
FROM
namespaces,
(
SELECT
"namespaces"."id"
FROM
"namespaces"
WHERE
"namespaces"."id" IN (1,2,3)
) base
WHERE
(
namespaces.traversal_ids @> ARRAY[base.id]
)
Time: 110.481 ms
- planning: 2.409 ms
- execution: 108.072 ms
- I/O read: 79.643 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 96 (~768.00 KiB) from the buffer pool
- reads: 2395 (~18.70 MiB) from the OS file cache, including disk I/O
- dirtied: 60 (~480.00 KiB)
- writes: 0
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5637/commands/19136
Recursive
WITH RECURSIVE "base_and_descendants" AS (
(
SELECT
"namespaces".*
FROM
"namespaces"
WHERE
"namespaces"."id" IN (1,2,3)
)
UNION
(
SELECT
"namespaces".*
FROM
"namespaces",
"base_and_descendants"
WHERE
"namespaces"."parent_id" = "base_and_descendants"."id"
)
)
SELECT
id
FROM
"base_and_descendants" AS "namespaces"
Time: 224.935 ms
- planning: 3.640 ms
- execution: 221.295 ms
- I/O read: 159.523 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 28195 (~220.30 MiB) from the buffer pool
- reads: 5576 (~43.60 MiB) from the OS file cache, including disk I/O
- dirtied: 2971 (~23.20 MiB)
- writes: 0
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5213/commands/18564
Linear with duplicates
SELECT
namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1) ] AS id
FROM
namespaces,
(
SELECT
"namespaces"."traversal_ids"
FROM
"namespaces"
WHERE
"namespaces"."id" IN (1,2,3)
) base
WHERE
(
namespaces.traversal_ids @ > base.traversal_ids
)
Time: 140.360 ms
- planning: 2.612 ms
- execution: 137.748 ms
- I/O read: 114.227 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 115 (~920.00 KiB) from the buffer pool
- reads: 4455 (~34.80 MiB) from the OS file cache, including disk I/O
- dirtied: 2560 (~20.00 MiB)
- writes: 0
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/5213/commands/18558
.without_sti_condition
There is only one version of this scope.
Group.where(id: [1,2,3]).without_sti_condition
Linear
SELECT "namespaces".*
FROM "namespaces"
WHERE "namespaces"."id" IN (1, 2, 3)
Recursive
SELECT "namespaces".*
FROM "namespaces"
WHERE "namespaces"."id" IN (1, 2, 3)
Timings above were performed using actual groups. Please click through to the specific postgres.ai links for further details.
This work is part of the linear namespace queries work to improve performance of group related queries. The work here is gated behind the <code data-sourcepos="492:194-492:210">use_traversal_ids</code> feature flag.
Screenshots or Screencasts (strongly suggested)
Does this MR meet the acceptance criteria?
Conformity
-
I have included changelog trailers, or none are needed. (Does this MR need a changelog?) -
I have added/updated documentation, or it's not needed. (Is documentation required?) -
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) -
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides. -
This change is backwards compatible across updates, or this does not apply.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) -
I have tested this MR in all supported browsers, or it's not needed. -
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Related to #335733 (closed)