Skip to content

Add UNION optimization for keyset pagination

Adam Hegyi requested to merge union-optimization-for-keyset-pagination into master

What does this MR do?

This MR adds UNION optimization for keyset paginated queries. The optimization is currently not enabled by default.

Additionally there is an Iterator utility class which behaves like EachBatch, this will be very useful for running batched queries. (It is needed for this MR: !59175 (merged))

Why

Keyset pagination is an efficient way to iterate over records in batches, however it has some limitations. When ordering by more than one column (tie breaker order) the generated queries are not efficient enough due to the OR conditions.

Example:

SELECT "projects".*
FROM "projects"
WHERE (("projects"."id" < 7
        AND "projects"."name" = 'Typeahead.Js')
       OR ("projects"."name" > 'Typeahead.Js')
       OR ("projects"."name" IS NULL))
ORDER BY "projects"."name" ASC,
         "projects"."id" DESC
LIMIT 5

If we have the correct index available we can turn this query to a UNION query which significantly improves the performance of the queries.

Example:

SELECT "projects".*
FROM (
        (SELECT "projects".*
         FROM "projects"
         WHERE ("projects"."id" < 7
                AND "projects"."name" = 'Typeahead.Js')
         ORDER BY "projects"."name" ASC, "projects"."id" DESC
         LIMIT 5)
      UNION ALL
        (SELECT "projects".*
         FROM "projects"
         WHERE ("projects"."name" > 'Typeahead.Js')
         ORDER BY "projects"."name" ASC, "projects"."id" DESC
         LIMIT 5)
      UNION ALL
        (SELECT "projects".*
         FROM "projects"
         WHERE ("projects"."name" IS NULL)
         ORDER BY "projects"."name" ASC, "projects"."id" DESC
         LIMIT 5)) projects
ORDER BY "projects"."name" ASC,
         "projects"."id" DESC
LIMIT 5

With the correct index, this query will scan maximum 3 x LIMIT rows from the index.

Snippet

This snippet generates the queries with and without UNION optimization

max_pages = 5
original_scope = Project.order(name: :asc, id: :desc)
scope = Gitlab::Pagination::Keyset::SimpleOrderBuilder.build(original_scope).first

i = 0
Gitlab::Pagination::Keyset::Iterator.new(scope: scope).each_batch(of: 5) do |records|
  break if i == max_pages

  puts records.map(&:id)

  i += 1
end

i = 0

Gitlab::Pagination::Keyset::Iterator.new(scope: scope, use_union_optimization: true).each_batch(of: 5) do |records|
  break if i == max_pages

  puts records.map(&:id)

  i += 1
end

Plans

Before: https://explain.depesz.com/s/bUwe

  • Notice that the index was picked up but we have Rows Removed by Filter: 25
  • This number will increase as we paginate forward, so at some point it will just time out

After: https://explain.depesz.com/s/ADOG

  • No filters
  • There is an extra in memory sorting after the UNION query which will be pretty cheap. Sorts maximum 3 x LIMIT rows.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Michael Kozono

Merge request reports