Add UNION optimization for keyset pagination
What does this MR do?
This MR adds UNION optimization for keyset paginated queries. The optimization is currently not enabled by default.
Additionally there is an Iterator
utility class which behaves like EachBatch, this will be very useful for running batched queries. (It is needed for this MR: !59175 (merged))
Why
Keyset pagination is an efficient way to iterate over records in batches, however it has some limitations. When ordering by more than one column (tie breaker order) the generated queries are not efficient enough due to the OR
conditions.
Example:
SELECT "projects".*
FROM "projects"
WHERE (("projects"."id" < 7
AND "projects"."name" = 'Typeahead.Js')
OR ("projects"."name" > 'Typeahead.Js')
OR ("projects"."name" IS NULL))
ORDER BY "projects"."name" ASC,
"projects"."id" DESC
LIMIT 5
If we have the correct index available we can turn this query to a UNION
query which significantly improves the performance of the queries.
Example:
SELECT "projects".*
FROM (
(SELECT "projects".*
FROM "projects"
WHERE ("projects"."id" < 7
AND "projects"."name" = 'Typeahead.Js')
ORDER BY "projects"."name" ASC, "projects"."id" DESC
LIMIT 5)
UNION ALL
(SELECT "projects".*
FROM "projects"
WHERE ("projects"."name" > 'Typeahead.Js')
ORDER BY "projects"."name" ASC, "projects"."id" DESC
LIMIT 5)
UNION ALL
(SELECT "projects".*
FROM "projects"
WHERE ("projects"."name" IS NULL)
ORDER BY "projects"."name" ASC, "projects"."id" DESC
LIMIT 5)) projects
ORDER BY "projects"."name" ASC,
"projects"."id" DESC
LIMIT 5
With the correct index, this query will scan maximum 3 x LIMIT rows from the index.
Snippet
This snippet generates the queries with and without UNION optimization
max_pages = 5
original_scope = Project.order(name: :asc, id: :desc)
scope = Gitlab::Pagination::Keyset::SimpleOrderBuilder.build(original_scope).first
i = 0
Gitlab::Pagination::Keyset::Iterator.new(scope: scope).each_batch(of: 5) do |records|
break if i == max_pages
puts records.map(&:id)
i += 1
end
i = 0
Gitlab::Pagination::Keyset::Iterator.new(scope: scope, use_union_optimization: true).each_batch(of: 5) do |records|
break if i == max_pages
puts records.map(&:id)
i += 1
end
Plans
Before: https://explain.depesz.com/s/bUwe
- Notice that the index was picked up but we have
Rows Removed by Filter: 25
- This number will increase as we paginate forward, so at some point it will just time out
After: https://explain.depesz.com/s/ADOG
- No filters
- There is an extra in memory sorting after the UNION query which will be pretty cheap. Sorts maximum 3 x LIMIT rows.
Does this MR meet the acceptance criteria?
Conformity
-
📋 Does this MR need a changelog?-
I have included a changelog entry. -
I have not included a changelog entry because not user facing change.
-
-
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team