Optimize top-bound lineage search to use index-only scan (!144835) · Merge requests · GitLab.org / GitLab

Brian Williams requested to merge bwill/optimize-top-bound-lineage-search into master Feb 15, 2024

What does this MR do and why?

During backend pair programming (timestamp 15:54), I demonstrated a method of reading a namespace hierarchy in a linear manner by using >= and < operators on traversal_ids arrays. It was noted that this is different from the current query which uses @>, and I kinda glossed over it saying that they do the same thing. While both methods do have the same outcome, they actually achieve them in different ways and I was curious which one was better. So, I tested both methods in DB lab and compared them. >= and < can work with b-tree indices, while @> must use a gin index. When we use a b-tree index, we can perform an index-only scan which saves us a few MB of memory.

How the current method (`@>`) works

The current method uses the clause WHERE traversal_ids @> '{namespace_id}'. This means "find all traversal_ids that contain namespace_id". It's equivalent to the following ruby code:

namespaces.find { |namespace| namespace.traversal_ids.include?(id) }

Given a hierarchy like the following:

 id  | traversal_ids 
-----+---------------
 108 | {108}
 109 | {108,109}
 110 | {108,109,110}
 111 | {108,111}
 112 | {108,111,112}
 113 | {108,113}
 114 | {108,113,114}

To find all namespaces under 108, we look for arrays containing 108. To find namespaces under 109, we look for arrays containing 109, and so on.

This works because each namespace can only have one parent and so we don't need to worry about something like {999, 109} appearing in the table. However, it means that we need to search the whole traveral_ids array for each row.

How the new method (`>= AND <`) works

The new method uses the clause WHERE traversal_ids >= '{traversal_ids}' AND traversal_ids < {traversal_ids_with_rightmost_number_incremented}. Postgres knows how to compare arrays and will do so by comparing each item from left-to-right. So, when we consider a table like this:

 id  | traversal_ids 
-----+---------------
 81  | {81}
 83  | {81,83}
 85  | {81,83,85}
 87  | {81,83,84,87}
 88  | {81,83,85,88}
 84  | {81,83,84}
 108 | {108}
 109 | {108,109}
 110 | {108,109,110}
 111 | {108,111}
 112 | {108,111,112}
 113 | {108,113}
 114 | {108,113,114}

If we SELECT * FROM namespaces WHERE traversal_ids >= '{108}' AND traversal_ids < '{109}'; Postgres will check the first element of every array in the traversal_ids columns and read it if it is 108 and skip it if it is not.

If we SELECT * FROM namespaces WHERE traversal_ids >= '{108, 109}' AND traversal_ids < '{108, 110}';, then Postgres will do the following:

Check if the first element is 108, skip the row if not
Check if the second element is 109, skip the row if not
Read the row

This is faster because when checking row 88, we can skip that row after checking the first element, while @> will need to check the entire array.

Query comparisons

Description	Query	Postgres.ai link	Execution time	Buffer Reads
`gitlab-org` before	`SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id FROM namespaces WHERE namespaces.type = 'Group' AND traversal_ids @> '{9970}';`	https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/26298/commands/82646	702.698 ms (684.318 ms on I/O reads)	~7.00 MiB
`gitlab-org` after	`SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id FROM namespaces WHERE namespaces.type = 'Group' AND traversal_ids >= '{9970}' AND traversal_ids < '{9971}';`	https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/26298/commands/82648	27.121 ms (23.910 ms on I/O reads)	~1.50 MiB
`gitlab-org/govern` before	`SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id FROM namespaces WHERE namespaces.type = 'Group' AND traversal_ids @> '{11787569}';`	https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/26298/commands/82663	1.044 s (1.022 s on I/O reads)	~2.20 MiB
`gitlab-org/govern` after	`SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id FROM namespaces WHERE namespaces.type = 'Group' AND traversal_ids >= '{9970, 11787569}' AND traversal_ids < '{9970, 11787570}';`	https://postgres.ai/console/gitlab/gitlab-production-main/sessions/26319/commands/82666	108.029 ms (102.110 ms on I/O reads)	~496.00 KiB
4249178 (customer with deep hierarchy) before	`SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id FROM namespaces WHERE namespaces.type = 'Group' AND traversal_ids @> '{4249178}';`	https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/26298/commands/82664	15.316 s (14.797 s on I/O reads)	~61.50 MiB
4249178 (customer with deep hierarchy) after	`SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id FROM namespaces WHERE namespaces.type = 'Group' AND traversal_ids >= '{4249178}' AND traversal_ids < '{4249179}';`	https://postgres.ai/console/gitlab/gitlab-production-main/sessions/26319/commands/82667	3.883 s (3.750 s on I/O reads)	~11.90 MiB
5778500 (customer with many projects) before	`SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id FROM namespaces WHERE namespaces.type = 'Group' AND traversal_ids @> '{5778500}';`	https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/26298/commands/82665	2.123 s (2.064 s on I/O reads)	~7.50 MiB
5778500 (customer with many projects) after	`SELECT namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id FROM namespaces WHERE namespaces.type = 'Group' AND traversal_ids >= '{5778500}' AND traversal_ids < '{5778501}';`	https://postgres.ai/console/gitlab/gitlab-production-main/sessions/26319/commands/82669	487.244 ms (476.877 ms on I/O reads)	~1.70 MiB

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited Feb 20, 2024 by Brian Williams

Optimize top-bound lineage search to use index-only scan

What does this MR do and why?

How the current method (@>) works

How the new method (>= AND <) works

Query comparisons

MR acceptance checklist

Merge request reports

How the current method (`@>`) works

How the new method (`>= AND <`) works